You don't need an API key or a cloud subscription to use LLMs. Ollama lets you run models locally on your machine — completely free, completely private. Here's how to set it up and start building with it.
What is Ollama?
Ollama is a tool that downloads, manages, and serves LLMs locally. It exposes an OpenAI-compatible API at localhost:11434, so code written against the OpenAI API works with Ollama once you point the base URL at your local server, usually with no other changes.
Installation
# Linux / WSL
curl -fsSL https://ollama.com/install.sh | sh
# macOS
brew install ollama
# Windows
# Download from https://ollama.com/download
Start the server:
ollama serve
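Before making calls, it helps to confirm the server is actually reachable. A minimal TypeScript check, assuming the default port (adjust the URL if you changed OLLAMA_HOST):

```typescript
// Ping the Ollama server; the root endpoint answers with a 200 when running.
// Port 11434 is Ollama's default.
async function ollamaIsUp(baseUrl = "http://localhost:11434"): Promise<boolean> {
  try {
    const res = await fetch(baseUrl);
    return res.ok;
  } catch {
    return false; // connection refused: server not running
  }
}
```

If this returns false, `ollama serve` is not running (or is bound to a different host/port).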
Pick a Model
# Code-focused (best for dev tools)
ollama pull qwen2.5-coder:7b # 4.7GB, good balance
ollama pull qwen2.5-coder:1.5b # 1.0GB, fast, good enough for many tasks
ollama pull deepseek-coder-v2 # 8.9GB, top quality
# General purpose
ollama pull llama3.1:8b # 4.7GB, Meta's latest
ollama pull mistral:7b # 4.1GB, fast and capable
My recommendation: start with qwen2.5-coder:1.5b for speed, upgrade to 7b when you need quality.
Your First API Call
Ollama serves an OpenAI-compatible endpoint. Here's a call with plain fetch:
const response = await fetch("http://localhost:11434/v1/chat/completions", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "qwen2.5-coder:7b",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain what a closure is in JavaScript." },
],
temperature: 0,
stream: false,
}),
});
const data = await response.json();
console.log(data.choices[0].message.content);
That's it. No API key, no SDK, no account.
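With stream: true instead, the endpoint emits OpenAI-style server-sent events: one `data: {json}` line per token chunk, terminated by `data: [DONE]`. Here is a small helper to pull the token text out of complete SSE lines — a sketch, assuming the standard OpenAI chat-completions delta field names:

```typescript
// Extract the incremental token text from a block of complete SSE lines.
// Each event looks like: data: {"choices":[{"delta":{"content":"..."}}]}
function extractDeltas(sse: string): string[] {
  const out: string[] = [];
  for (const line of sse.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) out.push(delta);
  }
  return out;
}
```

Feed it decoded chunks read from response.body (keeping any trailing partial line in a buffer between reads) and you get tokens as they arrive instead of waiting for the full completion.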
Structured Output (JSON Mode)
The key to building real tools with LLMs is getting structured output. Tell the model to respond with JSON:
const response = await fetch("http://localhost:11434/v1/chat/completions", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "qwen2.5-coder:7b",
messages: [
{
role: "system",
content: `Respond with ONLY valid JSON matching this schema:
{ "summary": "string", "topics": ["string"], "difficulty": "beginner|intermediate|advanced" }`,
},
{
role: "user",
content: "Analyze this article topic: Building REST APIs with Express.js",
},
],
temperature: 0,
stream: false,
}),
});
Tip: always validate the response with Zod or a similar schema validator. Smaller models sometimes return invalid JSON.
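As a concrete example of that validation step, here is a dependency-free type guard for the schema above — a minimal stand-in for Zod with no install step (Zod's z.object equivalent would also tell you which field failed):

```typescript
interface Analysis {
  summary: string;
  topics: string[];
  difficulty: "beginner" | "intermediate" | "advanced";
}

// Returns the parsed object if the model's reply matches the schema, else null.
function parseAnalysis(raw: string): Analysis | null {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // model emitted invalid JSON
  }
  const levels = ["beginner", "intermediate", "advanced"];
  const ok =
    data !== null &&
    typeof data === "object" &&
    typeof data.summary === "string" &&
    Array.isArray(data.topics) &&
    data.topics.every((t: unknown) => typeof t === "string") &&
    levels.includes(data.difficulty);
  return ok ? (data as Analysis) : null;
}
```

On a null result, a common tactic is to retry the request once before failing, since small models are flaky rather than consistently wrong.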
Building a Provider Abstraction
If you want your app to work with both Ollama (local) and Claude/OpenAI (cloud), create a simple interface:
interface LlmProvider {
chat(system: string, messages: Message[]): Promise<string>;
}
class OllamaProvider implements LlmProvider {
constructor(private model: string) {}
async chat(system: string, messages: Message[]): Promise<string> {
const response = await fetch("http://localhost:11434/v1/chat/completions", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: this.model,
messages: [{ role: "system", content: system }, ...messages],
temperature: 0,
stream: false,
}),
});
const data = await response.json();
return data.choices[0].message.content;
}
}
Now your code doesn't care where the model runs. Swap OllamaProvider for AnthropicProvider with a flag.
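That flag-based swap can be sketched as a small factory. The spec format mirrors the article's --model ollama:qwen2.5-coder:1.5b flag; the AnthropicProvider body is left as a stub here, since the real one would call Anthropic's Messages API instead of localhost:

```typescript
type Message = { role: "user" | "assistant"; content: string };

interface LlmProvider {
  chat(system: string, messages: Message[]): Promise<string>;
}

class OllamaProvider implements LlmProvider {
  constructor(private model: string) {}
  async chat(system: string, messages: Message[]): Promise<string> {
    const response = await fetch("http://localhost:11434/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: this.model,
        messages: [{ role: "system", content: system }, ...messages],
        temperature: 0,
        stream: false,
      }),
    });
    const data = await response.json();
    return data.choices[0].message.content;
  }
}

// Stub: same interface, cloud endpoint. Fill in the Anthropic API call here.
class AnthropicProvider implements LlmProvider {
  constructor(private model: string) {}
  async chat(_system: string, _messages: Message[]): Promise<string> {
    throw new Error(`not implemented for ${this.model}`);
  }
}

// Dispatch on a spec like "ollama:qwen2.5-coder:1.5b" or "anthropic:<model>".
function createProvider(spec: string): LlmProvider {
  const [kind, ...rest] = spec.split(":");
  const model = rest.join(":"); // model names may themselves contain ":"
  if (kind === "ollama") return new OllamaProvider(model);
  if (kind === "anthropic") return new AnthropicProvider(model);
  throw new Error(`unknown provider: ${kind}`);
}
```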
Performance Tips
- First call is slow — the model loads into memory. Subsequent calls are fast.
- Keep the server running — don't start/stop per request.
- Use smaller models for dev — 1.5b for iteration, 7b for production quality.
- Set temperature: 0 for deterministic output (important for structured responses).
- Add a timeout — local models on CPU can take minutes for long prompts.
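The timeout tip can be sketched with an AbortController, which works with the fetch calls shown earlier (the two-minute budget in the usage comment is an arbitrary example, not a recommendation):

```typescript
// Abort a fetch that exceeds a time budget. Local CPU inference can stall
// for minutes, so cap the wait rather than hanging forever.
async function fetchWithTimeout(
  url: string,
  init: RequestInit,
  ms: number,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // don't leak the timer on success
  }
}

// Usage: give the model two minutes before giving up.
// const res = await fetchWithTimeout(
//   "http://localhost:11434/v1/chat/completions", { method: "POST", /* ... */ }, 120_000);
```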
When to Use Local vs Cloud
| Use Case | Local (Ollama) | Cloud (Claude/GPT) |
|---|---|---|
| Development | Great | Expensive |
| Privacy-sensitive data | Required | Risky |
| Production quality | Good (7b+) | Best |
| Speed | Depends on hardware | Fast |
| Cost | Free | Per-token |
What I Built With It
spectr-ai — an AI smart contract auditor that works with both Claude and Ollama. The --model ollama:qwen2.5-coder:1.5b flag runs everything locally, free, no API key.
Local LLMs are good enough for real developer tools. The quality gap is closing fast.
This article was originally published on DEV Community and written by Pavel Espitia.