Technology · Apr 15, 2026 · 1 min read

I Built an LLM Gateway That Learns Which Model to Use — Here's How the Routing Works


DEV Community
by Nicholas Blanchard

How it works:

Request arrives at an OpenAI-compatible endpoint
A classifier detects the task type and complexity
The adaptive router picks the highest-scoring model for that (task type × complexity) cell
Quality feedback (user ratings + an LLM judge) continuously improves the routing
Change 2 lines in your code. That's it.
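As a sketch of what that looks like from the client side: the gateway speaks the OpenAI chat-completions wire format, so the switch is essentially repointing your SDK's base URL. The localhost address and the `"auto"` model name below are placeholders I'm assuming for illustration, not details from the post:

```python
import json

# Hypothetical gateway address -- substitute your own deployment.
GATEWAY_URL = "http://localhost:8080/v1"

def build_chat_request(prompt: str, model: str = "auto"):
    """Build an OpenAI-compatible /chat/completions request for the gateway.

    With a routing model name like "auto" (assumed here), the gateway's
    classifier + adaptive router pick the concrete model. In an existing
    app, the "2 lines" would typically be the client's base_url and the
    model name.
    """
    url = f"{GATEWAY_URL}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(payload)
```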

But it's more than a router. Full platform:

Request logs with replay + diff view
Time-series analytics (cost, latency p50/p95/p99)
A/B testing between models
Guardrails (PII redaction)
Prompt template versioning
Spend/latency alerting
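One common way an A/B split like the one above can be implemented (a sketch of the general technique, not Provara's documented internals): hash a stable request or user ID so each request lands deterministically and reproducibly in one arm.

```python
import hashlib

def ab_assign(request_id: str, variants: list[str], split_pct: int = 50) -> str:
    """Deterministically bucket a request into variant A or B.

    Hashing the ID (rather than random choice) keeps assignment stable
    across retries and replays, which matters when comparing models.
    """
    bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 100
    return variants[0] if bucket < split_pct else variants[1]
```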

Self-hosted with Docker. Your data never leaves your infrastructure.

Supports OpenAI, Anthropic, Google, Mistral, xAI, Ollama, and any OpenAI-compatible provider.

BYOK — bring your own API keys.

The routing gets smarter over time. An LLM judge automatically scores responses, building quality data per model, per task type.

After enough feedback, the router stops sending complex prompts to the cheapest model and starts picking the best one.

No manual configuration needed.
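A minimal sketch of that feedback loop, under my own assumptions about the mechanics (a running-mean score per model per cell, plus a little exploration so new models still get tried); the post doesn't specify the actual algorithm:

```python
import random
from collections import defaultdict

class AdaptiveRouter:
    """Toy adaptive router: pick the best-scoring model per (task, complexity) cell."""

    def __init__(self, models: list[str], explore: float = 0.1):
        self.models = models
        self.explore = explore  # probability of trying a random model
        # (model, cell) -> running stats from judge scores / user ratings
        self.stats = defaultdict(lambda: {"n": 0, "mean": 0.0})

    def pick(self, cell: tuple[str, str]) -> str:
        if random.random() < self.explore:
            return random.choice(self.models)  # keep gathering data
        return max(self.models, key=lambda m: self.stats[(m, cell)]["mean"])

    def feedback(self, model: str, cell: tuple[str, str], score: float) -> None:
        """Fold a quality score (e.g. from an LLM judge) into the running mean."""
        s = self.stats[(model, cell)]
        s["n"] += 1
        s["mean"] += (score - s["mean"]) / s["n"]
```

Once the cheap model's mean for a ("code", "hard") cell falls below a stronger model's, `pick` stops choosing it for that cell — no manual configuration, just accumulated scores.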

GitHub: https://github.com/syndicalt/provara
Live demo: https://www.provara.xyz

Source

This article was originally published by DEV Community and written by Nicholas Blanchard.
