Building an AI-Powered Product Recommendation System
Recommendation systems are one of the highest-leverage places to add AI to an existing product. Done well, they drive measurable uplift in engagement and revenue. Done poorly, they surface irrelevant content, erode trust, and create a maintenance burden that outweighs the benefit.
This post walks through the architecture we use for production recommendation systems — the kind that improves over time rather than degrading silently.
The baseline problem with most recommendation systems
Most teams implement recommendations by starting with collaborative filtering (users who liked X also liked Y) or simple popularity rankings. These work at first. They stop working when:
- The catalog is large and sparse (most items have few interactions)
- The user base is heterogeneous (new users vs. long-term users need different signals)
- The catalog changes frequently (new items have no interaction history)
- There's no feedback loop — the system can't learn from what it got wrong
Embeddings solve most of these problems. The key insight: represent both users and items in the same vector space, and recommendations become nearest-neighbour lookups.
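In miniature, that lookup is just a cosine-similarity scan. This sketch uses toy three-dimensional vectors and a brute-force scan; a production system swaps the scan for an ANN index, but the shape of the operation is the same:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy catalog: item id -> embedding (real embeddings have hundreds of dimensions).
items = {
    "running-shoes": [0.9, 0.1, 0.0],
    "trail-shoes":   [0.8, 0.2, 0.1],
    "coffee-maker":  [0.0, 0.1, 0.9],
}

def recommend(user_vec, k=2):
    # Because users and items share one vector space, recommending is just
    # ranking items by similarity to the user vector.
    ranked = sorted(items, key=lambda i: cosine(user_vec, items[i]), reverse=True)
    return ranked[:k]

# A user whose history skews towards footwear lands nearest the shoe items.
print(recommend([0.85, 0.15, 0.05]))  # ['running-shoes', 'trail-shoes']
```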
Architecture overview
The system has four components:
- Embedding service — converts items and user behaviour into dense vectors
- Vector store — stores and indexes embeddings for fast similarity search (we use Pinecone in production, pgvector works fine at smaller scale)
- Retrieval layer — given a user context, fetches candidate items via ANN search
- Re-ranking layer — applies business rules and freshness signals to the candidate set before returning final recommendations
Building the embedding service
For product recommendations, we embed item content (title, description, category, attributes) using a pre-trained embedding model. OpenAI's text-embedding-3-small is cost-effective and performant for most use cases. For domain-specific catalogs (medical, legal, technical), fine-tuning or a domain-specific model often outperforms general embeddings.
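A minimal sketch of the item side, using the `openai` Python SDK and a hypothetical catalog schema (the `title`, `description`, `category`, and `attributes` field names are illustrative; adapt them to your own catalog):

```python
def item_to_text(item: dict) -> str:
    """Flatten catalog fields into one string for the embedding model.
    Field names here are illustrative, not a fixed schema."""
    parts = [item["title"], item["description"], item["category"]]
    parts += [f"{k}: {v}" for k, v in item.get("attributes", {}).items()]
    return "\n".join(parts)

def embed_items(texts: list[str]) -> list[list[float]]:
    # Requires `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

item = {
    "title": "Trail running shoes",
    "description": "Lightweight shoes with aggressive grip",
    "category": "Footwear",
    "attributes": {"colour": "blue", "drop": "4mm"},
}
print(item_to_text(item))
```

Keeping the text-assembly step separate from the API call makes it easy to test and to version: when you change which fields go into the embedding, you know a re-index is required.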
User embeddings are computed from interaction history: a weighted average of the embeddings of items the user has interacted with, where the weights reflect recency and engagement strength (click < add-to-cart < purchase < review).
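A sketch of that weighted average. The engagement weights and the 30-day recency half-life below are illustrative starting points, not production values; both need tuning against your own metrics:

```python
# Engagement weights reflecting the signal ordering above:
# click < add-to-cart < purchase < review. Exact values are tunable.
EVENT_WEIGHT = {"click": 1.0, "add_to_cart": 2.0, "purchase": 4.0, "review": 6.0}
HALF_LIFE_DAYS = 30.0  # recency decay half-life (an assumption; tune per product)

def user_embedding(interactions, item_vecs):
    """interactions: list of (item_id, event_type, age_days)."""
    dim = len(next(iter(item_vecs.values())))
    acc, total = [0.0] * dim, 0.0
    for item_id, event, age_days in interactions:
        # Each interaction contributes its item vector, scaled by engagement
        # strength and exponentially decayed by age.
        w = EVENT_WEIGHT[event] * 0.5 ** (age_days / HALF_LIFE_DAYS)
        acc = [a + w * v for a, v in zip(acc, item_vecs[item_id])]
        total += w
    return [a / total for a in acc] if total else acc

item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
# A recent purchase of "a" outweighs a 60-day-old click on "b".
print(user_embedding([("a", "purchase", 1), ("b", "click", 60)], item_vecs))
```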
The feedback loop
This is the part most teams skip. A recommendation system without a feedback loop will degrade over time as the catalog evolves and user preferences shift. You need:
- Impression logging — what was shown, to whom, and when
- Click logging — what was clicked from a recommendation surface
- Conversion logging — what downstream action followed
- A scheduled job that retrains or updates embeddings using recent interaction data
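The three log streams can share one minimal event shape. The field names here are illustrative, and the in-memory `sink` stands in for whatever events table or stream you actually write to; the important part is the `request_id`, which joins clicks and conversions back to the impression that caused them:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class RecEvent:
    event_type: str   # "impression" | "click" | "conversion"
    user_id: str
    item_id: str
    surface: str      # which recommendation surface served it
    request_id: str   # joins clicks/conversions back to the impression
    ts: str

def log_event(event_type, user_id, item_id, surface, request_id, sink):
    # In production, `sink` would be an events table or stream writer.
    sink.append(asdict(RecEvent(event_type, user_id, item_id, surface,
                                request_id,
                                datetime.now(timezone.utc).isoformat())))

events = []
log_event("impression", "u1", "sku-42", "home_feed", "req-1", events)
log_event("click", "u1", "sku-42", "home_feed", "req-1", events)
print(len(events))  # 2
```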
The retraining frequency depends on catalog velocity. For a slow-moving catalog, weekly retraining is fine. For a fast-moving catalog (daily new items), you need online updates — embedding new items immediately and indexing them into the vector store without a full rebuild.
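An online update is just an embed-and-upsert per new item. This sketch uses a toy embedding function and a dict in place of the vector store; with Pinecone the last line corresponds to an upsert call, with pgvector to an `INSERT ... ON CONFLICT`:

```python
def embed(text: str) -> list[float]:
    # Placeholder: a deterministic toy embedding so the sketch is runnable.
    # In practice this is the same model call used for the rest of the catalog.
    return [float(len(text) % 7), float(text.count(" "))]

index: dict[str, list[float]] = {}

def upsert_item(item_id: str, text: str) -> None:
    # Insert-or-overwrite: the new item is immediately recommendable,
    # with no full index rebuild.
    index[item_id] = embed(text)

upsert_item("sku-99", "New trail running shoes")
print("sku-99" in index)  # True
```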
Re-ranking for business logic
Pure semantic similarity isn't always what you want to surface. A recommendation system that only optimises for similarity will over-index on safe choices. The re-ranking layer is where business rules live:
- Freshness decay — boost newer items, reduce weight of items older than X days
- Inventory filtering — don't recommend out-of-stock items
- Diversity enforcement — avoid showing 10 variations of the same item
- Business promotions — allow manual boosts for specific items
Keeping business logic in the re-ranking layer (rather than baked into the model) means it can be updated without retraining.
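The four rule types can be sketched as a single re-ranking pass over the candidate set. The half-life, per-group cap, and field names below are illustrative, not production values:

```python
def rerank(candidates, in_stock, boosts, half_life_days=14, per_group_cap=2):
    """candidates: list of dicts with 'id', 'score', 'age_days', 'group'."""
    ranked = []
    for c in candidates:
        if c["id"] not in in_stock:                           # inventory filtering
            continue
        freshness = 0.5 ** (c["age_days"] / half_life_days)   # freshness decay
        score = c["score"] * freshness + boosts.get(c["id"], 0.0)  # manual boosts
        ranked.append({**c, "final": score})
    ranked.sort(key=lambda c: c["final"], reverse=True)
    out, seen = [], {}
    for c in ranked:                                          # diversity enforcement
        if seen.get(c["group"], 0) < per_group_cap:
            out.append(c["id"])
            seen[c["group"]] = seen.get(c["group"], 0) + 1
    return out

candidates = [
    {"id": "a", "score": 0.9, "age_days": 60, "group": "shoes"},
    {"id": "b", "score": 0.8, "age_days": 2,  "group": "shoes"},
    {"id": "c", "score": 0.7, "age_days": 2,  "group": "shoes"},
    {"id": "d", "score": 0.6, "age_days": 2,  "group": "bags"},
    {"id": "e", "score": 0.9, "age_days": 1,  "group": "shoes"},
]
# "c" is out of stock, "d" carries a promotional boost, old "a" decays away.
print(rerank(candidates, in_stock={"a", "b", "d", "e"}, boosts={"d": 0.3}))
# ['e', 'd', 'b']
```

Because all four rules live in this one function, changing a boost or the freshness half-life is a deploy, not a retrain.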
What to instrument from day one
Before you go live, instrument these metrics:
- Click-through rate per recommendation surface
- Conversion rate from recommendation click
- Coverage — what percentage of your catalog ever appears in recommendations
- Diversity — average pairwise distance between items in a recommendation set
Coverage and diversity are the early warning signals for the two most common failure modes: falling coverage means the system is concentrating recommendations on a small subset of popular items, and falling diversity means it is returning repetitive results.
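Both metrics are cheap to compute offline from logged recommendation sets and item embeddings; a minimal sketch:

```python
import itertools
import math

def coverage(rec_sets, catalog_size):
    """Fraction of the catalog that ever appears in recommendations."""
    shown = set(itertools.chain.from_iterable(rec_sets))
    return len(shown) / catalog_size

def diversity(vectors):
    """Average pairwise cosine distance within one recommendation set."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    pairs = list(itertools.combinations(vectors, 2))
    return sum(1 - cos(a, b) for a, b in pairs) / len(pairs)

# Two recommendation sets over a 10-item catalog touch 4 distinct items.
print(coverage([["a", "b", "c"], ["b", "c", "d"]], catalog_size=10))  # 0.4
# Orthogonal item vectors are maximally diverse: mean pairwise distance 1.0.
print(diversity([[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```

Tracked over time, a downward trend in either number is the signal to investigate before users notice.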
The short version
Embeddings + vector search is the right foundation. Build the feedback loop before you launch, not after. Keep business logic in re-ranking, not in the model. Instrument coverage and diversity from day one.