Most developers jump straight to chat completions when they think "AI + backend." But the feature that's quietly changing how products work — semantic search — is more powerful, cheaper, and honestly more fun to build.
The Problem with Keyword Search
Imagine you're building a knowledge base for a SaaS product. A user types: "my account got locked". Your keyword search returns nothing because your docs say "authentication failure" and "access denied." Same meaning. Zero matches.
This is the gap that semantic search closes — and you can wire it into an ASP.NET Core API in an afternoon.
Instead of matching words, semantic search matches meaning. It does this using embeddings: numerical vectors that represent the semantic content of text. Similar meanings produce vectors that are close together in high-dimensional space.
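As a toy illustration (the three-dimensional vectors below are invented for the example; real embeddings have hundreds or thousands of dimensions), phrases with similar meaning get vectors that point in roughly the same direction, which a simple dot product can detect:

```csharp
using System;

// Toy 3-D "embeddings" (made up for illustration; real models emit 1,000+ dims)
float[] lockedOut   = { 0.9f, 0.1f, 0.2f };  // "my account got locked"
float[] authFailure = { 0.8f, 0.2f, 0.1f };  // "authentication failure"
float[] exportCsv   = { 0.1f, 0.9f, 0.8f };  // "export data as CSV"

// Related phrases score higher than unrelated ones
Console.WriteLine(Dot(lockedOut, authFailure)); // ≈ 0.76
Console.WriteLine(Dot(lockedOut, exportCsv));   // ≈ 0.34

static float Dot(float[] a, float[] b)
{
    float sum = 0;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}
```

The real system works the same way, just with vectors produced by a trained model instead of hand-picked numbers.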
Let's build it from scratch.
What We're Building
A minimal ASP.NET Core Web API that:
- Accepts a list of documents and stores their embeddings
- Accepts a search query and returns the most semantically relevant documents
- Uses OpenAI's text-embedding-3-small model (fast and cheap)
- Keeps everything in-memory for simplicity (swap in a vector DB like Qdrant later)
Prerequisites
- .NET 8 SDK
- An OpenAI API key
- Basic familiarity with ASP.NET Core minimal APIs
Step 1: Create the Project
dotnet new webapi -n SemanticSearchApi
cd SemanticSearchApi
dotnet add package OpenAI
Add your API key to appsettings.Development.json:
{
"OpenAI": {
"ApiKey": "sk-..."
}
}
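Putting a real key in appsettings.Development.json risks committing it to source control. For local development, the .NET Secret Manager stores the same OpenAI:ApiKey configuration path outside the project directory:

```shell
dotnet user-secrets init
dotnet user-secrets set "OpenAI:ApiKey" "sk-..."
```

The IConfiguration lookup in the next step works unchanged either way.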
Step 2: The Embedding Service
Create Services/EmbeddingService.cs:
using OpenAI.Embeddings;
public class EmbeddingService
{
private readonly EmbeddingClient _client;
public EmbeddingService(IConfiguration config)
{
var apiKey = config["OpenAI:ApiKey"]!;
_client = new EmbeddingClient("text-embedding-3-small", apiKey);
}
public async Task<float[]> GetEmbeddingAsync(string text)
{
var result = await _client.GenerateEmbeddingAsync(text);
return result.Value.ToFloats().ToArray();
}
}
This wraps the OpenAI call and returns a float array — the raw vector representation of your text.
Step 3: The Vector Store
Create Services/VectorStore.cs:
public class DocumentEntry
{
public string Id { get; set; } = Guid.NewGuid().ToString();
public string Text { get; set; } = string.Empty;
public float[] Embedding { get; set; } = [];
}
public class VectorStore
{
private readonly List<DocumentEntry> _documents = new();
public void Add(DocumentEntry entry) => _documents.Add(entry);
public IEnumerable<(DocumentEntry Doc, float Score)> Search(
float[] queryVector,
int topK = 5)
{
return _documents
.Select(doc => (doc, Score: CosineSimilarity(queryVector, doc.Embedding)))
.OrderByDescending(x => x.Score)
.Take(topK);
}
private static float CosineSimilarity(float[] a, float[] b)
{
float dot = 0, magA = 0, magB = 0;
for (int i = 0; i < a.Length; i++)
{
dot += a[i] * b[i];
magA += a[i] * a[i];
magB += b[i] * b[i];
}
return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}
}
Cosine similarity is the key formula here. It measures the angle between two vectors: if the angle is small (the vectors point in the same direction), the texts are semantically similar. The score ranges from -1 to 1, and higher means more similar. What counts as a "strong match" varies by embedding model, so calibrate your threshold against real queries rather than relying on a fixed cutoff.
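A quick sanity check of the formula, reusing the same CosineSimilarity method with hand-picked 2-D vectors (not real embeddings):

```csharp
using System;

// Both inputs are unit vectors, so the score is just the dot product: ≈ 0.6
Console.WriteLine(CosineSimilarity(new float[] { 1f, 0f }, new float[] { 0.6f, 0.8f }));

// Same direction → 1; perpendicular → 0
Console.WriteLine(CosineSimilarity(new float[] { 1f, 0f }, new float[] { 2f, 0f })); // 1
Console.WriteLine(CosineSimilarity(new float[] { 1f, 0f }, new float[] { 0f, 1f })); // 0

static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}
```

Note that magnitude doesn't matter, only direction: (1, 0) and (2, 0) still score a perfect 1.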
Step 4: Register Services in Program.cs
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<EmbeddingService>();
builder.Services.AddSingleton<VectorStore>();
var app = builder.Build();
Step 5: The API Endpoints
Still in Program.cs, add two endpoints:
Index Documents
app.MapPost("/documents", async (
IndexRequest request,
EmbeddingService embedder,
VectorStore store) =>
{
foreach (var text in request.Documents)
{
var embedding = await embedder.GetEmbeddingAsync(text);
store.Add(new DocumentEntry { Text = text, Embedding = embedding });
}
return Results.Ok(new { indexed = request.Documents.Count });
});
The request record goes at the bottom of Program.cs, after app.Run(), since C# requires type declarations to come after all top-level statements:
record IndexRequest(List<string> Documents);
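The indexing loop above makes one API round-trip per document. The OpenAI package also exposes a batch method, GenerateEmbeddingsAsync, which sends every input in a single request. A sketch of a batch variant of the embedding service (the class and method names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using OpenAI.Embeddings;

public static class BatchEmbedding
{
    // One HTTP call for many texts: cheaper and faster than embedding in a loop
    public static async Task<List<float[]>> GetEmbeddingsAsync(
        EmbeddingClient client, IEnumerable<string> texts)
    {
        var result = await client.GenerateEmbeddingsAsync(texts);
        return result.Value.Select(e => e.ToFloats().ToArray()).ToList();
    }
}
```

The returned vectors come back in the same order as the inputs, so they can be zipped with the original texts when building DocumentEntry records.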
Search
app.MapGet("/search", async (
string query,
EmbeddingService embedder,
VectorStore store) =>
{
var queryVector = await embedder.GetEmbeddingAsync(query);
var results = store.Search(queryVector, topK: 3);
return Results.Ok(results.Select(r => new
{
text = r.Doc.Text,
score = Math.Round(r.Score, 4)
}));
});
app.Run();
Step 6: See It in Action
First, index some documents:
curl -X POST http://localhost:5000/documents \
-H "Content-Type: application/json" \
-d '{
"documents": [
"How to reset your password in the settings menu",
"Your account may be locked after 5 failed login attempts",
"Contact support to upgrade your subscription plan",
"Two-factor authentication setup guide",
"How to export your data as a CSV file"
]
}'
Now search with natural language:
curl "http://localhost:5000/search?query=my+account+got+locked"
Response:
[
{ "text": "Your account may be locked after 5 failed login attempts", "score": 0.8921 },
{ "text": "How to reset your password in the settings menu", "score": 0.7634 },
{ "text": "Two-factor authentication setup guide", "score": 0.7102 }
]
The top result is exactly what the user meant, even though they used completely different words. That's the power of embeddings.
What's Happening Under the Hood
When you call OpenAI's embedding model, it processes your text through a neural network trained on massive amounts of human-written content. The output isn't some magic black box — it's a list of 1,536 floating-point numbers (for text-embedding-3-small) that encode the semantic position of your text in a high-dimensional concept space.
Texts that humans consider similar end up geometrically close in this space. That's it. No fine-tuning, no training on your data, no complex setup.
Taking It Further
This in-memory implementation is a great starting point. Here's what the production path looks like:
1. Use a Real Vector Database
For anything beyond a few thousand documents, swap the VectorStore for Qdrant, Weaviate, or pgvector (PostgreSQL extension). They handle indexing and similarity search at scale efficiently.
2. Persist Embeddings
Embedding generation costs API calls. Store vectors in your DB so you only compute them once per document.
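One simple way to persist them (an illustrative sketch, not tied to any particular database): serialize each float[] to a byte[] blob for a binary column, and deserialize on startup. The round-trip is lossless.

```csharp
using System;
using System.Linq;

float[] original = { 0.12f, -0.5f, 0.98f };
float[] restored = FromBlob(ToBlob(original));
Console.WriteLine(original.SequenceEqual(restored)); // True

// Pack a vector into raw bytes suitable for a BLOB / bytea column
static byte[] ToBlob(float[] vector)
{
    var bytes = new byte[vector.Length * sizeof(float)];
    Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
    return bytes;
}

// Reverse the packing when loading documents back into the VectorStore
static float[] FromBlob(byte[] bytes)
{
    var vector = new float[bytes.Length / sizeof(float)];
    Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
    return vector;
}
```

If you later move to pgvector or Qdrant, those stores accept float arrays directly and this shim disappears.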
3. Add a Re-ranker
After retrieving the top 20 results by cosine similarity, pass them through a cross-encoder or a second LLM call to re-rank by relevance. This retrieve-then-rerank step is a standard upgrade to the retrieval half of a RAG pipeline.
4. Combine with LLM Generation
Feed your search results as context into a chat completion call. Now you have a full Retrieval-Augmented Generation (RAG) system — your AI answers questions grounded in your data.
// After semantic search, pass the top results to a chat model as context
var context = string.Join("\n", topResults.Select(r => r.Doc.Text));
var completion = await chatClient.CompleteChatAsync(
    $"Answer the user's question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}");
var answer = completion.Value.Content[0].Text;
Why This Matters for ASP.NET Developers
You don't need to become an ML engineer. You don't need to run models locally. The entire semantic intelligence lives in an API call — and the plumbing is just clean C# code you already know how to write.
What you do need to understand is the architecture: embeddings are just data, your API is just infrastructure, and the AI layer is just a smart service you can mock, test, and version like any other dependency.
That mental model — AI as a service, not magic — is what separates developers who build real AI features from those who just wrap a chatbot.
Full Source
The complete project is ~100 lines of code. No complex dependencies, no heavy frameworks. Just a clean ASP.NET Core minimal API doing something genuinely useful.
Start here. Add pgvector next week. Add RAG the week after. That's how production AI features actually get built — one solid layer at a time.
Happy building. If you hit issues or want to discuss scaling this pattern, reach out on LinkedIn.
This article was originally published by DEV Community and written by Zeyad Mohammed.