Technology · Apr 15, 2026 · 3 min read

How I Served 80,000+ Recommendations in Under 50ms

DEV Community
by Mayank Parashar

Every recommendation tutorial I found was either a Netflix black box or a 1,000-row Jupyter notebook toy. I wanted something in between — real, deployable, and something I actually understood.
That's how Inkpick was born: a hybrid recommendation engine across cinema, music, and courses with sub-50ms inference on 80,000+ items. Just NumPy, FastAPI, and deliberate design choices.

What "Hybrid" Means

Content-Based Filtering — works on day one, no user history needed. But it traps users in a bubble.
Collaborative Filtering — discovers surprising cross-user patterns. Falls apart for new users (cold-start problem).

A hybrid blends both:

score_hybrid(i) = α · score_cb(i) + (1 - α) · score_cf(i)

Inkpick defaults to α = 0.65 — biased toward content for cold-start users, shifting toward collaborative as history grows.
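In NumPy the blend is essentially one line. A minimal sketch (the function name and score values here are illustrative, not Inkpick's actual code):

```python
import numpy as np

def hybrid_scores(cb_scores: np.ndarray, cf_scores: np.ndarray,
                  alpha: float = 0.65) -> np.ndarray:
    """Blend content-based and collaborative scores.

    alpha = 1.0 -> pure content-based; alpha = 0.0 -> pure collaborative.
    """
    return alpha * cb_scores + (1.0 - alpha) * cf_scores

cb = np.array([0.9, 0.2, 0.5])  # content-based scores for three items
cf = np.array([0.1, 0.8, 0.5])  # collaborative scores for the same items
print(hybrid_scores(cb, cf))    # -> [0.62 0.41 0.5 ]
```

Because both inputs are plain arrays, shifting α per user later is a one-argument change.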

The Architecture

```plaintext
Client (Vanilla JS)
       │
  FastAPI (Async)
  ┌────┴────┬──────────┐
TF-IDF   Latent    Levenshtein
+ CSR    Factor    Fuzzy Search
  └────┬────┘
  Hybrid Layer
       │
 Service Registry
(cinema / audio / edu)
```

Each domain is fully decoupled. Adding a new domain = one new service file.

Content-Based: TF-IDF + Cosine Similarity
TF-IDF turns item metadata (title, genre, tags) into vectors: words unique to one item get high weight, while common words like "the" are penalized.

Similarity between items is then a dot product:

similarity(q, i) = (q · i) / (‖q‖ · ‖i‖)
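As a toy illustration (the three-document corpus and variable names below are made up, not Inkpick's pipeline), TF-IDF plus cosine similarity fits in a few lines of NumPy:

```python
import numpy as np

docs = [
    "the godfather crime drama",
    "the godfather part ii crime drama",
    "space opera adventure",
]
vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Smoothed idf: rare words get high weight, "the" gets almost none.
df = (tf > 0).sum(axis=0)
idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
X = tf * idf

# Cosine similarity = dot product of L2-normalized rows.
X /= np.linalg.norm(X, axis=1, keepdims=True)
sim = X @ X.T
print(sim[0, 1], sim[0, 2])  # the two Godfather entries are far more alike
```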

Why not SciPy? Inkpick implements CSR (Compressed Sparse Row) ops directly in NumPy — cutting a ~30MB dependency, reducing memory, and keeping full control over the pipeline. An 80,000-item matrix is ~98% zeros; CSR stores only non-zero values.
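The CSR idea itself is simple. Here's a hand-rolled sketch of the three-array layout and a row dot product — a toy on a 3×5 matrix, not Inkpick's actual implementation:

```python
import numpy as np

# Toy matrix, mostly zeros (the real one: 80,000 rows, ~98% sparse).
dense = np.array([
    [0.0, 1.2, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [3.1, 0.0, 0.0, 0.5, 0.0],
])

# CSR stores only the non-zeros plus two index arrays.
rows, cols = np.nonzero(dense)
data = dense[rows, cols]   # non-zero values, row-major order
indices = cols             # column index of each stored value
indptr = np.searchsorted(rows, np.arange(dense.shape[0] + 1))  # row boundaries

def csr_row_dot(r: int, v: np.ndarray) -> float:
    """Dot product of CSR row r with a dense vector v — touches only non-zeros."""
    start, end = indptr[r], indptr[r + 1]
    return float(data[start:end] @ v[indices[start:end]])

print(csr_row_dot(0, np.ones(5)))  # 1.2 + 0.7 = 1.9
```

With unit-normalized rows, that same row dot product is exactly the cosine similarity from the formula above.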

Collaborative Filtering: Latent Factors
CF decomposes the user–item interaction matrix into lower-dimensional embeddings:

R ≈ U × Vᵀ

These latent dimensions learn hidden patterns — "likes slow-burn thrillers" — without being told. In Inkpick, this module is a production-ready stub awaiting a trained ALS/BPR model. Honest limitation, next on the roadmap.
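Once factors exist, scoring is just a matrix product. A sketch with random stand-in embeddings — a real system would load trained ALS/BPR factors here, and the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 500, 1000, 32   # k = number of latent dimensions

# Stand-ins for trained factors; only the shapes matter for this sketch.
U = rng.normal(size=(n_users, k))   # user embeddings
V = rng.normal(size=(n_items, k))  # item embeddings

def cf_scores(user_id: int) -> np.ndarray:
    """Predicted affinity of one user for every item: one row of U @ V.T."""
    return U[user_id] @ V.T

top5 = np.argsort(cf_scores(7))[::-1][:5]
print(top5)  # item ids with the highest predicted score for user 7
```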

Fuzzy Search Fallback
Search "Godfater" → no match → system fails. Not ideal.
Inkpick uses Levenshtein edit-distance as a safety net:

"Godfater" → "Godfather" = 1 edit

When exact search fails, fuzzy kicks in and returns the closest matches. Small addition, big UX improvement.
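For reference, the classic dynamic-programming edit distance is about a dozen lines of Python — this is the textbook algorithm, not necessarily Inkpick's exact code:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic DP: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

print(levenshtein("Godfater", "Godfather"))  # 1 (insert the missing 'h')
```

Ranking candidate titles by this distance and returning the closest few is all the fallback needs.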

The API
```
GET /recommend/cinema?item_id=tt0111161&top_k=5&mode=hybrid
```

```json
{
  "domain": "cinema",
  "results": [{ "title": "The Godfather", "score": 0.94 }],
  "latency_ms": 38
}
```

The mode param accepts content, collaborative, or hybrid — handy for debugging.

What I'd Fix in v2

Train the CF model — ALS or BPR. The hybrid is only as good as both components.

SBERT over TF-IDF — semantic similarity that keyword matching completely misses.

Add evaluation metrics — Precision@K, NDCG. Fast latency is measurable; recommendation quality currently isn't.

Dynamic α — learn the blend weight per user instead of hardcoding 0.65.

Diversity control — MMR to avoid returning "10 Batman movies."

Try It

Live: inkpick.vercel.app
GitHub: github.com/MayankParashar28/inkpick

Drop a comment if you're building something similar — would love to exchange notes.

Source

This article was originally published by DEV Community and written by Mayank Parashar.
