AI Engineering · September 15, 2024 · 8 min read

Building an AI-Powered Product Recommendation System

Recommendation systems are one of the highest-leverage places to add AI to an existing product. Done well, they drive measurable uplift in engagement and revenue. Done poorly, they surface irrelevant content, erode trust, and create a maintenance burden that outweighs the benefit.

This post walks through the architecture we use for production recommendation systems — the kind that improves over time rather than degrading silently.

The baseline problem with most recommendation systems

Most teams implement recommendations by starting with collaborative filtering (users who liked X also liked Y) or simple popularity rankings. These work at first. They stop working when:

  • The catalog is large and sparse (most items have few interactions)
  • The user base is heterogeneous (new users vs. long-term users need different signals)
  • The catalog changes frequently (new items have no interaction history)
  • There's no feedback loop — the system can't learn from what it got wrong

Embeddings solve most of these problems. The key insight: represent both users and items in the same vector space, so that recommendation becomes a nearest-neighbour lookup.
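As a concrete sketch of that lookup — plain NumPy and brute-force cosine similarity, standing in for a real ANN index:

```python
import numpy as np

def top_k_similar(user_vec: np.ndarray, item_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k items nearest to the user vector by cosine similarity."""
    user_norm = user_vec / np.linalg.norm(user_vec)
    item_norms = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = item_norms @ user_norm          # cosine similarity per item
    return np.argsort(scores)[::-1][:k].tolist()

# Toy example: four items in a 3-d space; the user vector sits near items 0 and 2
items = np.array([[1.0, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.9, 0.2, 0.1],
                  [0.0, 0.1, 1.0]])
user = np.array([1.0, 0.2, 0.0])
print(top_k_similar(user, items, k=2))  # → [0, 2]
```

At production scale you'd swap the brute-force scan for the vector store's ANN query, but the shape of the operation is the same.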

Architecture overview

The system has four components:

  • Embedding service — converts items and user behaviour into dense vectors
  • Vector store — stores and indexes embeddings for fast similarity search (we use Pinecone in production, pgvector works fine at smaller scale)
  • Retrieval layer — given a user context, fetches candidate items via ANN search
  • Re-ranking layer — applies business rules and freshness signals to the candidate set before returning final recommendations

Building the embedding service

For product recommendations, we embed item content (title, description, category, attributes) using a pre-trained sentence transformer. OpenAI's text-embedding-3-small is cost-effective and performant for most use cases. For domain-specific catalogs (medical, legal, technical), fine-tuning or a domain-specific model often outperforms general embeddings.
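A minimal sketch of the item side, assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the field names (title, description, category, attributes) are illustrative, not a fixed schema:

```python
def item_to_text(item: dict) -> str:
    """Flatten the catalog fields we embed into a single string."""
    attrs = ", ".join(f"{k}: {v}" for k, v in item.get("attributes", {}).items())
    parts = [item["title"], item.get("description", ""), item.get("category", ""), attrs]
    return " | ".join(filter(None, parts))

def embed_items(items: list[dict], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed a batch of catalog items in one API call."""
    from openai import OpenAI  # deferred so the pure helper above works without the SDK
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    resp = client.embeddings.create(model=model, input=[item_to_text(i) for i in items])
    return [d.embedding for d in resp.data]
```

Batching items into a single `embeddings.create` call matters at catalog scale — per-item calls are slow and rate-limited.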

User embeddings are computed from interaction history: the weighted average of the item embeddings the user has interacted with, weighted by recency and engagement signal (click < add-to-cart < purchase < review).
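That weighted average might look like the following; the event weights and the 30-day half-life are hypothetical placeholders, not tuned values:

```python
import numpy as np

# Illustrative engagement weights reflecting click < add-to-cart < purchase < review
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 8.0, "review": 10.0}

def user_embedding(events: list[dict], item_vecs: dict[str, np.ndarray],
                   half_life_days: float = 30.0) -> np.ndarray:
    """Weighted average of interacted-item vectors, decayed by recency."""
    total, weight_sum = None, 0.0
    for e in events:
        decay = 0.5 ** (e["age_days"] / half_life_days)   # halve weight per half-life
        w = EVENT_WEIGHTS[e["type"]] * decay
        vec = item_vecs[e["item_id"]] * w
        total = vec if total is None else total + vec
        weight_sum += w
    return total / weight_sum
```

Because the user vector lives in the same space as the item vectors, it plugs straight into the nearest-neighbour retrieval step.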

The feedback loop

This is the part most teams skip. A recommendation system without a feedback loop will degrade over time as the catalog evolves and user preferences shift. You need:

  • Impression logging — what did you show, to whom, when
  • Click logging — what was clicked from a recommendation surface
  • Conversion logging — what downstream action followed
  • A scheduled job that retrains or updates embeddings using recent interaction data
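The three log streams can share one schema, keyed by a request ID so clicks and conversions join back to the impression that produced them. A sketch — field names are illustrative, and the `print` is a stand-in for whatever event pipeline you already run:

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class RecEvent:
    """One row in the recommendation event log (hypothetical schema)."""
    event_type: str   # "impression" | "click" | "conversion"
    request_id: str   # joins clicks/conversions back to their impression
    user_id: str
    item_id: str
    surface: str      # e.g. "homepage", "product_page"
    ts: float

def log_impressions(user_id: str, item_ids: list[str], surface: str) -> str:
    """Log one impression row per shown item; return the shared request ID."""
    request_id = str(uuid.uuid4())
    for item_id in item_ids:
        event = RecEvent("impression", request_id, user_id, item_id, surface, time.time())
        print(json.dumps(asdict(event)))  # stand-in for your event pipeline
    return request_id
```

Propagate the returned request ID to the client so click and conversion events carry it back — without that join key, the retraining job can't tell which impressions led to which outcomes.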

The retraining frequency depends on catalog velocity. For a slow-moving catalog, weekly retraining is fine. For a fast-moving catalog (daily new items), you need online updates — embedding new items immediately and indexing them into the vector store without a full rebuild.
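An online update is just embed-and-upsert on item creation. A sketch against a Pinecone-style `upsert` interface; the `embed` callable and the metadata fields are assumptions:

```python
from typing import Callable

def index_new_item(item: dict, embed: Callable[[str], list[float]], index) -> None:
    """Embed a newly created catalog item and upsert it into the vector store
    immediately, rather than waiting for the next batch rebuild.

    `index` is assumed to expose upsert(vectors=[(id, vector, metadata), ...]).
    """
    text = " ".join(filter(None, [item["title"], item.get("description", "")]))
    vec = embed(text)
    index.upsert(vectors=[(item["id"], vec, {"category": item.get("category", "")})])
```

Hooking this into the item-creation path makes new items recommendable within seconds instead of waiting for the next scheduled retrain.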

Re-ranking for business logic

The most semantically similar items aren't always the ones you want to surface. A recommendation system that only optimises for similarity will over-index on safe choices. The re-ranking layer is where business rules live:

  • Freshness decay — boost newer items, reduce weight of items older than X days
  • Inventory filtering — don't recommend out-of-stock items
  • Diversity enforcement — avoid showing 10 variations of the same item
  • Business promotions — allow manual boosts for specific items

Keeping business logic in the re-ranking layer (rather than baked into the model) means it can be updated without retraining.

What to instrument from day one

Before you go live, instrument these metrics:

  • Click-through rate per recommendation surface
  • Conversion rate from recommendation click
  • Coverage — what percentage of your catalog ever appears in recommendations
  • Diversity — average pairwise distance between items in a recommendation set

Coverage and diversity are the early warning signals for the most common failure modes: a system that concentrates recommendations on a small subset of popular items, and a system that returns repetitive results.
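Both metrics are cheap to compute offline from the impression logs. A sketch, assuming you can look up the embedding vectors for a recommendation set:

```python
import itertools
import numpy as np

def coverage(recommendation_sets: list[list[str]], catalog_ids: set[str]) -> float:
    """Fraction of the catalog that appeared in at least one recommendation set."""
    shown = set(itertools.chain.from_iterable(recommendation_sets))
    return len(shown & catalog_ids) / len(catalog_ids)

def diversity(item_vecs: list[np.ndarray]) -> float:
    """Average pairwise cosine distance within one recommendation set."""
    dists = []
    for a, b in itertools.combinations(item_vecs, 2):
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        dists.append(1.0 - cos)
    return sum(dists) / len(dists)
```

Coverage trending down means recommendations are concentrating on popular items; diversity trending toward zero means users are seeing near-duplicates.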

The short version

Embeddings + vector search is the right foundation. Build the feedback loop before you launch, not after. Keep business logic in re-ranking, not in the model. Instrument coverage and diversity from day one.

Still reading? Good. Book a 30-minute call.

No sales pitch. We'll ask what's on fire and tell you if we can help. If we can't, we'll name three firms who can.

Book a call →