I wanted to understand how AI coding tools actually work under the hood. Not just use them — but build one myself.
So I built AgentCode: an open-source, multi-model agentic coding CLI. You type a request in plain English, and it reads your codebase, writes code, runs tests, manages git — all autonomously.
Here's what I learned building it.
The Core Insight: It's Just a Loop
Every agentic coding tool — no matter how polished — runs the same fundamental pattern:
```
while needs_follow_up:
    1. Send conversation + tools → LLM
    2. If LLM returns tool calls → execute them, append results, loop
    3. If LLM returns text → done
```
That's it. The "magic" of AI coding agents is a while loop with function calling. The other 95% is context management, tool execution, error handling, and permissions.
Here's the simplified version of my agentic loop:
```python
import json

from litellm import completion

def run_agent_loop(user_input, conversation, config):
    conversation.add_user(user_input)
    for iteration in range(config.max_iterations):
        stream = completion(
            model=routed_model,  # chosen by the complexity router (see below)
            messages=conversation.messages,
            tools=TOOL_DEFINITIONS,
            stream=True,
        )
        text, tool_calls = process_stream(stream)
        if not tool_calls:
            # No tools called: the model is done
            conversation.add_assistant(content=text)
            break
        # Execute each tool, feed results back, and loop again.
        # (The full version also records the assistant's tool-call turn here.)
        for tc in tool_calls.values():
            result = execute_tool(tc["name"], json.loads(tc["arguments"]))
            conversation.add_tool_result(tc["id"], result)
```
When a user says "fix the bug in app.py", the LLM doesn't magically fix anything. It calls read_file("app.py"), sees the code, calls edit_file(...) with the fix, then calls run_command("pytest") to verify. Each step is a tool call that the loop executes and feeds back.
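In message terms, that run leaves the conversation looking roughly like this. The shapes follow the OpenAI chat format that LiteLLM speaks; the specific contents here are illustrative:

```python
# conversation.messages after "fix the bug in app.py", roughly:
[
    {"role": "user", "content": "fix the bug in app.py"},
    {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function",
        "function": {"name": "read_file", "arguments": '{"path": "app.py"}'}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "<contents of app.py>"},
    # ...the edit_file and run_command rounds follow the same shape...
    {"role": "assistant", "content": "Fixed the bug; pytest passes."},
]
```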
Architecture
```
┌─────────────────────────────────────────────────┐
│                   cli.py (UI)                   │
│  REPL loop · slash commands · Rich terminal UI  │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                agent.py (Brain)                 │
│ Agentic loop · context management · permissions │
│                                                 │
│   LiteLLM ──→ Claude / GPT / Gemini / Ollama    │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                tools.py (Hands)                 │
│       read_file · write_file · edit_file        │
│     run_command · git_commit · search_text      │
└─────────────────────────────────────────────────┘
```
Three files, three responsibilities:
- cli.py — the terminal UI (REPL, slash commands, session management)
- agent.py — the brain (agentic loop, streaming, permissions, context compaction)
- tools.py — the hands (file I/O, bash execution, git, search)
The Feature I'm Most Proud Of: Cost-Aware Routing
Most AI coding tools lock you into one model. You pay the same price whether you're asking "what does this function do" or "refactor the entire auth system."
AgentCode classifies every prompt by complexity and automatically picks the cheapest model that can handle it:
| Tier | Example Prompt | Model | Why |
|---|---|---|---|
| Light | "what does this function do" | Haiku | Fast, cheap — just reading and explaining |
| Medium | "write unit tests for app.py" | Sonnet | Needs to understand code and generate new code |
| Heavy | "refactor the entire auth system" | Opus | Multi-file, multi-step, architectural thinking |
The classification uses pattern matching on the input — words like "refactor", "migrate", "entire codebase" trigger heavy; "write", "create", "fix" trigger medium; "explain", "what is", "show me" trigger light.
```python
import re

# Illustrative subsets; the real lists are longer
HEAVY_PATTERNS = [r"\brefactor\b", r"\bmigrate\b", r"\bentire\b"]
MEDIUM_PATTERNS = [r"\bwrite\b", r"\bcreate\b", r"\bfix\b"]

def classify_complexity(user_input):
    text = user_input.lower()
    heavy_score = sum(1 for p in HEAVY_PATTERNS if re.search(p, text))
    medium_score = sum(1 for p in MEDIUM_PATTERNS if re.search(p, text))
    if heavy_score >= 2:
        return "heavy"
    elif medium_score >= 1:
        return "medium"
    else:
        return "light"
```
Simple, transparent, and saves real money. You can always override with /model if you disagree with the routing.
Streaming: The UX Difference
The first version waited for the full LLM response before showing anything. You'd stare at a blank terminal for 5-10 seconds. Adding streaming was a night-and-day improvement.
The tricky part with streaming in an agentic loop: the LLM can return text AND tool calls in the same response. Text tokens arrive one at a time, but tool call arguments arrive as fragments that need to be assembled.
```python
def process_stream(stream):
    full_text = ""
    tool_calls_acc = {}
    for chunk in stream:
        delta = chunk.choices[0].delta
        # Text tokens: print immediately
        if delta.content:
            print(delta.content, end="", flush=True)
            full_text += delta.content
        # Tool call fragments: accumulate silently until the stream ends
        if delta.tool_calls:
            for tc_delta in delta.tool_calls:
                idx = tc_delta.index
                if idx not in tool_calls_acc:
                    tool_calls_acc[idx] = {"id": "", "name": "", "arguments": ""}
                if tc_delta.id:
                    tool_calls_acc[idx]["id"] = tc_delta.id
                if tc_delta.function.name:
                    tool_calls_acc[idx]["name"] = tc_delta.function.name
                if tc_delta.function.arguments:
                    tool_calls_acc[idx]["arguments"] += tc_delta.function.arguments
    return full_text, tool_calls_acc
```
Text streams to the screen in real-time. Tool calls assemble in the background. The user sees words appearing instantly while the agent figures out what to do next.
Multi-Model Support
AgentCode uses LiteLLM as an abstraction layer. This means I write one set of tool definitions in OpenAI's format, and LiteLLM translates them to whatever the provider expects.
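Concretely, a single entry in TOOL_DEFINITIONS looks something like this. This is an abbreviated sketch in the OpenAI function-calling schema, not the exact descriptions AgentCode ships:

```python
TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string",
                             "description": "Path relative to the project root."},
                },
                "required": ["path"],
            },
        },
    },
    # ...write_file, edit_file, run_command, git_commit, search_text...
]
```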
Switch models mid-conversation:
```
❯ /model gpt-4o
✓ Switched to gpt-4o
❯ /model claude-opus-4-6
✓ Switched to claude-opus-4-6
❯ /model ollama/qwen2.5-coder
✓ Switched to ollama/qwen2.5-coder
```
Same tools, same loop, different brain. The local Ollama option means you can run the entire thing with zero API cost.
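There is no per-provider special casing behind that: LiteLLM routes on the model string, so a /model command only has to swap one field. A rough sketch, with the handler name being mine rather than AgentCode's:

```python
def handle_model_command(name, session):
    # "/model gpt-4o": every later completion() call uses the new string.
    # LiteLLM infers the provider from it ("ollama/...", "gpt-...", "claude-...")
    # and translates the OpenAI-format tool definitions for that provider.
    session.model = name
    print(f"✓ Switched to {name}")
```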
The Permission System
Any tool that writes files or executes commands asks before acting:
```
🔒 Permission Required
Tool: write_file
Args: {"path": "src/handler.py", "content": "..."}
Allow this action? [y/n] (y):
```
Read-only tools (read_file, list_directory, search) auto-approve. This keeps the flow fast while preventing the agent from doing anything destructive without your consent.
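Here is a minimal sketch of how such a gate can sit in front of tool execution; the helper name and the read-only set are illustrative rather than AgentCode's exact code:

```python
READ_ONLY_TOOLS = {"read_file", "list_directory", "search_text"}

def guarded_execute(name, args):
    # Read-only tools run immediately; anything that mutates state asks first.
    if name not in READ_ONLY_TOOLS:
        answer = input(f"Allow {name} with {args}? [y/n] (y): ").strip().lower()
        if answer not in ("", "y", "yes"):
            return "Action denied by user."
    return execute_tool(name, args)
```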
What I Learned
1. Context management is the hard problem. The agentic loop itself is trivial. Managing what's in the context window — compacting old messages, summarizing, keeping the right information available — that's where the real engineering is. One way compaction can work is sketched after this list.
2. Tool definitions matter more than the prompt. A well-described tool with clear parameter descriptions outperforms a clever system prompt. The LLM reads the tool schema like documentation.
3. Streaming changes everything. The difference between "wait 8 seconds for a response" and "see words appearing instantly" is the difference between a frustrating tool and one you enjoy using.
4. Multi-model flexibility is underrated. Different models excel at different tasks. Being able to hot-swap between them — or let the router decide — means you always have the right tool for the job.
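To make the first point concrete, here is a minimal sketch of one compaction strategy: keep the system prompt and the most recent turns verbatim, and collapse everything in between into a single summary message. This illustrates the idea rather than AgentCode's exact implementation, and summarize() stands in for whatever produces the summary:

```python
def compact(messages, keep_recent=10):
    # Preserve the system prompt and the newest turns; summarize the middle.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return messages  # nothing worth compacting yet
    older, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = summarize(older)  # hypothetical helper, e.g. one cheap LLM call
    note = {"role": "user",
            "content": "[Summary of earlier conversation]\n" + summary}
    return system + [note] + recent
```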
Try It
```bash
pip install agentcode-cli
export ANTHROPIC_API_KEY="your-key"
agentcode
```
The codebase is readable Python — no frameworks, no abstractions. If you're curious how agentic coding tools work, clone it and read through agent.py. The entire loop is about 50 lines.
GitHub: github.com/vigp17/AgentCode
PyPI: pypi.org/project/agentcode-cli
MIT licensed. Feedback and contributions welcome.
Tags: python, ai, opensource, tutorial