Over the past few days, OpenBlob changed a lot.
Not just visually, but fundamentally.
This is a proper progress update on where things are heading.
Quick recap
OpenBlob is a local-first desktop AI companion that:
- lives on your desktop
- understands your context
- can see your screen (via vision models)
- reacts in real-time
- executes actions directly on your system
Repo: https://github.com/southy404/openblob
Rebuilding the core (this was the big one)
The biggest update isn't something you see. It's how everything works underneath. OpenBlob now has a much cleaner and more scalable structure:
Core pipeline
input (voice / text / screen)
→ intent detection
→ command router
→ execution (local first)
→ AI fallback if needed
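The pipeline above can be sketched in a few lines. This is a minimal illustration, not OpenBlob's actual code: the names (`CommandRouter`, `detect_intent`, `ai_fallback`) and the keyword-based intent detection are assumptions standing in for the real implementation.

```python
def detect_intent(text: str) -> str:
    """Naive keyword-based intent detection (placeholder for a real model)."""
    if text.startswith("play "):
        return "spotify.play"
    if text.startswith("open "):
        return "system.open"
    return "unknown"

class CommandRouter:
    def __init__(self):
        self.handlers = {}  # intent -> local handler

    def register(self, intent, handler):
        self.handlers[intent] = handler

    def dispatch(self, text: str) -> str:
        intent = detect_intent(text)
        handler = self.handlers.get(intent)
        if handler is not None:
            return handler(text)       # execute locally first
        return self.ai_fallback(text)  # hand off to the model only if needed

    def ai_fallback(self, text: str) -> str:
        # Stand-in for a real model call.
        return f"[AI] no local handler for: {text!r}"

router = CommandRouter()
router.register("system.open", lambda t: f"opening {t.removeprefix('open ')}")
print(router.dispatch("open terminal"))        # handled locally
print(router.dispatch("what's the weather?"))  # falls back to AI
```

The key property is the "local first" branch: the model is a fallback, not the front door.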
What changed
- Clear separation of responsibilities
- Proper command routing system
- Modular capabilities instead of chaos
- Easier to extend without breaking everything
This turns OpenBlob into something bigger than a chatbot: a runtime layer for your desktop.
Open-source friendly structure
One goal became very clear: this needs to be hackable. So the architecture is moving towards a module system like this:
modules/
├── discord/
├── spotify/
├── browser/
└── system/
Each module:
- exposes commands
- runs locally
- can be extended independently
This makes it much easier to:
- build plugins
- integrate APIs
- experiment without touching the core
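One way a module system like that can look in practice: each module exposes its commands through a tiny registry, so plugins merge into one command table without touching the core. The `Module` base class and `commands()` method here are illustrative assumptions, not OpenBlob's real interface.

```python
from typing import Callable, Dict, List

class Module:
    """A module is just a named bag of commands."""
    name: str = "base"

    def commands(self) -> Dict[str, Callable[[str], str]]:
        return {}

class SpotifyModule(Module):
    name = "spotify"

    def commands(self):
        return {"spotify.play": lambda arg: f"playing {arg}"}

class SystemModule(Module):
    name = "system"

    def commands(self):
        return {"system.open": lambda arg: f"opening {arg}"}

def build_registry(modules: List[Module]) -> dict:
    """Merge every module's commands into one flat command table."""
    registry = {}
    for mod in modules:
        registry.update(mod.commands())
    return registry

registry = build_registry([SpotifyModule(), SystemModule()])
print(sorted(registry))                        # ['spotify.play', 'system.open']
print(registry["spotify.play"]("lofi beats"))  # playing lofi beats
```

Adding a new integration then means dropping in one more `Module` subclass; the core never has to know it exists.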
New UI (cleaner, faster, more alive)
The UI got a big upgrade:
- Floating bubble interface
- Glassmorphism style
- Smoother, more organic animations
- Faster interaction
Interaction now feels like:
- CTRL + SPACE → instant open
- Global voice toggle
- Minimal friction
Less "tool". More presence.
NEW: Just Chatting mode
Sometimes you don't want commands. You just want to talk. So OpenBlob now has a Just Chatting mode:
- Pure conversation with your AI companion
- No command routing
- No execution layer
- Just dialogue
This is important because the companion shouldn't only do things; it should also be there.
Use cases:
- Thinking out loud
- Asking questions
- Casual conversation
- Testing personality / tone
Screenshot assistant (more usable now)
The screen pipeline is getting more solid:
screenshot
→ OCR
→ context extraction
→ reasoning
→ answer
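The flow of that chain can be sketched with stubbed stages, so the shape is visible without real dependencies. In OpenBlob the OCR step would use an actual engine and the reasoning step a vision model or LLM call; every function here is a hypothetical placeholder.

```python
def take_screenshot() -> bytes:
    return b"<fake image bytes>"  # stand-in for a real screen capture

def run_ocr(image: bytes) -> str:
    return "Error: null pointer at line 42"  # stand-in for OCR output

def extract_context(text: str) -> dict:
    # Classify what kind of content is on screen.
    kind = "error" if "Error" in text else "text"
    return {"kind": kind, "raw": text}

def reason(context: dict) -> str:
    # Stand-in for the model call that turns context into an answer.
    if context["kind"] == "error":
        return f"Looks like a crash: {context['raw']}"
    return f"On screen: {context['raw']}"

def answer_from_screen() -> str:
    """Run the full screenshot → OCR → context → reasoning chain."""
    return reason(extract_context(run_ocr(take_screenshot())))

print(answer_from_screen())
```

Each stage is swappable, which is what makes the pipeline easy to harden one step at a time.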
Already useful for:
- Debugging
- UI understanding
- Games
- Quick research
Still improving, but getting reliable.
NEW: real-time transcript system
This is one of the biggest new additions. OpenBlob can now:
- Listen to system audio
- Listen to microphone input
- Generate live transcripts
- Store structured sessions
Pipeline
audio (system / mic)
→ transcription
→ segmented timeline
→ structured session
→ saved as text
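The "segmented timeline → structured session → saved as text" part can be modeled with two small dataclasses. The field names and text layout here are illustrative assumptions, not OpenBlob's actual session schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    start: float  # seconds from session start
    end: float
    source: str   # "mic" or "system"
    text: str

@dataclass
class Session:
    title: str
    segments: List[Segment] = field(default_factory=list)

    def add(self, start: float, end: float, source: str, text: str):
        self.segments.append(Segment(start, end, source, text))

    def to_text(self) -> str:
        """Render the segmented timeline as a plain-text transcript."""
        lines = [f"# {self.title}"]
        for s in self.segments:
            lines.append(f"[{s.start:06.1f}-{s.end:06.1f}] ({s.source}) {s.text}")
        return "\n".join(lines)

session = Session("weekly sync")
session.add(0.0, 4.2, "system", "Okay, let's start with the roadmap.")
session.add(4.2, 7.9, "mic", "I pushed the transcript prototype yesterday.")
print(session.to_text())
```

Keeping segments structured (rather than one growing string) is what later makes summaries, search, and a memory layer possible.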
What it already works for
- Meetings (Meet, Zoom, etc.)
- YouTube / podcasts
- Lectures
- General audio capture
Current prototype
- Live text appearing in real-time
- Segmented transcript blocks
- Session tracking
- Simple overlay UI
It's still early. But it works.
Where transcripts are going
This is not just speech-to-text. Next steps:
Meeting assistant
- Summaries
- Key points
- Action items
Memory layer
- Link transcripts to context
- Searchable history
Real-time help
- Explain while listening
- Highlight important info
- Suggest responses
Philosophy (still the same)
- Local-first
- Context > Prompt
- System-level AI
- Playful + useful
Current state
- Still experimental
- Still buggy sometimes
- Evolving very fast
But now: much better structure, clearer direction, and easier to contribute.
If you want to join
Now is actually a great time. You can:
- Build modules (Discord, Spotify, browser, etc.)
- Improve transcription
- Design UI
- Experiment with AI
Join here: https://github.com/southy404/openblob
Final thought
I'm starting to believe the future of AI is not a chat window in a browser, but something that lives on your system, understands your context, and can both act and talk.
OpenBlob is slowly getting there.
This article was originally published by DEV Community and written by southy404.