Introduction
"A neural network can recognize digits" — but what's actually happening inside?
I built a tool where you draw a digit with your finger or mouse, and watch the CNN (Convolutional Neural Network) recognize it in real time, with the internal signal flow visualized as it happens.
Try the Demo (runs in your browser — no install needed)
What Is This?
A tool that lets you see how a neural network makes its decisions as you draw handwritten digits.
Three Visualizations
Dial-style Heatmap — Digits 0–9 arranged like a phone dial, with color intensity showing confidence in real time. As you draw, you can see the network thinking: "looks like an 8... wait, now it's a 3."
Network Diagram — Input → Conv1 → Conv2 → FC → Output nodes and links light up orange based on signal strength. You can trace exactly which pathways the signal took to reach the answer.
CNN Input Preview — Shows how your drawing gets downscaled to 28×28 pixels. This is what the network actually "sees."
Not an Emulation — The Real Thing
This is not a simulation or replay. A real CNN with 27,690 parameters is running in your browser. Every time you draw a stroke, actual convolutions, ReLU activations, max-pooling, and fully-connected layer computations are executed, and the intermediate values are visualized directly.
Why I Built This
My previous project, Transformer Emulator, visualized the internals of a Transformer. But that was a "watch" experience — replaying pre-computed results.
This time, I wanted a "touch" experience. You draw a digit, and the network reacts instantly. The probabilities shift as you draw. The moment when "I'm drawing a 3 but the network thinks it's an 8" — that's something no textbook can give you.
What Happens While You Draw
On every pointermove event while drawing, the following pipeline runs:
- Canvas → 28×28 downscale — Bounding box detection, center-of-mass alignment. Same preprocessing as MNIST.
- CNN inference (JavaScript) — Conv → ReLU → MaxPool → Conv → ReLU → MaxPool → FC → Softmax. Pure matrix operations in vanilla JavaScript.
- Visualization update — Intermediate activations from each layer drive the dial colors and network diagram node/link brightness.
For a CNN this size on 28×28 input, inference completes in a few milliseconds — fast enough to run on every stroke without dropping frame rate.
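The pipeline above can be made concrete with a minimal NumPy sketch of the same forward pass (the demo itself runs this math in vanilla JavaScript). Layer shapes follow the architecture described in the Model Architecture section; the weights below are random placeholders, so the output probabilities are meaningless, but the shapes and operations are the real ones.

```python
import numpy as np

def conv2d(x, w, b):
    """Valid convolution: x is (H, W, Cin), w is (k, k, Cin, Cout), b is (Cout,)."""
    k = w.shape[0]
    H, W = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = x[i:i+k, j:j+k, :]            # (k, k, Cin) window
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out

def maxpool2(x):
    """2x2 max pooling with stride 2."""
    H, W, C = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2, C).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())                       # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random((28, 28, 1))                       # stands in for the 28x28 preview
w1, b1 = rng.normal(0, 0.1, (5, 5, 1, 8)),  np.zeros(8)
w2, b2 = rng.normal(0, 0.1, (3, 3, 8, 16)), np.zeros(16)
w3, b3 = rng.normal(0, 0.1, (400, 64)),     np.zeros(64)
w4, b4 = rng.normal(0, 0.1, (64, 10)),      np.zeros(10)

h = maxpool2(np.maximum(conv2d(x, w1, b1), 0))    # 28 -> 24 -> 12
h = maxpool2(np.maximum(conv2d(h, w2, b2), 0))    # 12 -> 10 -> 5
h = np.maximum(h.reshape(-1) @ w3 + b3, 0)        # flatten 5*5*16 = 400
probs = softmax(h @ w4 + b4)                      # 10 class probabilities
```

The explicit double loop in `conv2d` is slow but readable; that trade-off is fine at this scale, which is also why inference on 28×28 input stays in the millisecond range.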
Things I Learned
1. CNNs "Hesitate"
Watching the probability shift while drawing a "3":
| Drawing stage | Prediction |
|---|---|
| Drew a vertical line | 1: 30%, 7: 25% |
| Closed the top curve | 8: 55%, 9: 20% |
| Opened the bottom | 3: 60%, 8: 22% |
| Finished drawing | 3: 92% |
The intermediate states genuinely look like an 8. The CNN's "hesitation" matches human intuition — it's making rational judgments.
2. Preprocessing Makes or Breaks Accuracy
In the first version, drawing a "4" got classified as "7". The cause: missing preprocessing. MNIST data is center-of-mass aligned, but I was just naively downscaling the canvas to 28×28. Adding MNIST-compliant preprocessing (bounding box detection → center alignment → fit into 20×20 region) fixed it immediately.
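That preprocessing can be sketched in NumPy as follows, assuming a grayscale canvas array where ink is nonzero. The resize here is nearest-neighbor for brevity and the function names are my own; the actual implementation may differ in detail.

```python
import numpy as np

def preprocess(canvas):
    """MNIST-style preprocessing sketch: crop the ink's bounding box,
    fit it into a 20x20 box, then place it on a 28x28 grid shifted so
    the center of mass lands near the center. Assumes canvas has ink."""
    ys, xs = np.nonzero(canvas)
    crop = canvas[ys.min():ys.max()+1, xs.min():xs.max()+1]

    # Resize so the longer side becomes 20 pixels (nearest-neighbor).
    scale = 20 / max(crop.shape)
    h = max(1, round(crop.shape[0] * scale))
    w = max(1, round(crop.shape[1] * scale))
    ri = (np.arange(h) * crop.shape[0] / h).astype(int)
    ci = (np.arange(w) * crop.shape[1] / w).astype(int)
    digit = crop[np.ix_(ri, ci)]

    # Paste into 28x28 so the center of mass sits at (14, 14).
    yy, xx = np.mgrid[:h, :w]
    total = digit.sum()
    cy, cx = (yy * digit).sum() / total, (xx * digit).sum() / total
    top = min(max(int(round(14 - cy)), 0), 28 - h)
    left = min(max(int(round(14 - cx)), 0), 28 - w)
    out = np.zeros((28, 28))
    out[top:top+h, left:left+w] = digit
    return out
```

The key point is the last step: without the center-of-mass shift, a digit drawn in a corner of the canvas lands in a region the network rarely saw during training.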
3. 27,690 Parameters, 98% Accuracy
GPT-4 reportedly has ~1.8 trillion parameters; this CNN is roughly 1/65,000,000 of that size. Yet it achieves 98.04% test accuracy. "Choose the right architecture (convolutions) and you can get high accuracy with minimal parameters": that is the essence of CNNs, and here you can feel it.
Tech Stack
| Component | Technology | Why |
|---|---|---|
| Training | Python / pure NumPy | No PyTorch — all backpropagation implemented from scratch. Educational purpose |
| Inference | Vanilla JavaScript | Runs entirely in the browser. No external libraries |
| Visualization | SVG + Canvas + CSS | Network diagram in SVG, drawing and preview in Canvas |
| Output | Single HTML file (~620KB) | Trained weights embedded as JSON. Easy to distribute |
Model Architecture
Conv(5x5, 8ch) → ReLU → MaxPool(2) # Detect 8 types of features from the image
Conv(3x3, 16ch) → ReLU → MaxPool(2) # Combine into 16 higher-level features
Flatten(400) → FC(64) → ReLU # Integrate all features for judgment
FC(10) → Softmax # Output probabilities for 0–9
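The 27,690 figure checks out against this architecture. Counting weights plus biases per layer:

```python
# Parameter count for the architecture above (biases included).
conv1 = 5 * 5 * 1 * 8 + 8        # 5x5 kernels, 1 -> 8 channels:   208
conv2 = 3 * 3 * 8 * 16 + 16      # 3x3 kernels, 8 -> 16 channels: 1,168
fc1   = 400 * 64 + 64            # flatten(5*5*16=400) -> 64:    25,664
fc2   = 64 * 10 + 10             # 64 -> 10 logits:                 650
total = conv1 + conv2 + fc1 + fc2
print(total)  # 27690
```

Note that the two fully-connected layers hold about 95% of the parameters; the convolutions themselves are remarkably cheap.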
Network Diagram Implementation
Nodes for each layer are placed in SVG, with `<line>` elements connecting adjacent layers. During inference, activation values update each node's `fill` and each link's `stroke-opacity`, making the signal flow visible.
There are 552 links total, but most have opacity near 0 — visually, only the active pathways light up.
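The activation-to-opacity mapping is essentially a normalization step. Here is one way to do it, sketched in Python (the real version is JavaScript, and the power curve is an illustrative choice on my part, not necessarily what the demo uses):

```python
import numpy as np

def link_opacities(activations, gamma=2.0):
    """Map raw activation magnitudes to [0, 1] stroke-opacity values.
    A power curve (gamma > 1) pushes weak links toward 0, so only the
    strongest pathways stand out visually."""
    a = np.abs(np.asarray(activations, dtype=float))
    if a.max() == 0:
        return np.zeros_like(a)        # nothing drawn yet: all links dark
    return (a / a.max()) ** gamma
```

With `gamma=2.0`, a link carrying a quarter of the peak signal renders at only 6% opacity, which is why most of the 552 links stay invisible at any given moment.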
Multilingual Support
A toggle button next to the title switches between Japanese and English. The initial language is auto-detected from the browser's language setting, and can also be set via URL parameter (?lang=en).
Since there are few text elements, a JS dictionary holds both languages and a button click swaps all text instantly — even mid-drawing.
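The dictionary approach can be sketched like this (shown in Python for consistency with the rest of the article; the keys and strings here are hypothetical, and the actual demo uses a JavaScript object):

```python
# Dictionary-based i18n sketch; keys and strings are hypothetical.
STRINGS = {
    "en": {"title": "Handwritten Digit Recognizer", "clear": "Clear"},
    "ja": {"title": "手書き数字認識", "clear": "クリア"},
}

def t(lang: str, key: str) -> str:
    """Look up a UI string, falling back to English for unknown languages."""
    return STRINGS.get(lang, STRINGS["en"])[key]
```

Swapping languages is then just re-running the lookup for every labeled element, which is why it works instantly even mid-drawing.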
Try It
Live Demo
https://tomoiura.github.io/digit_recognizer/
Just open it in your browser.
Build from Source
git clone https://github.com/Tomoiura/digit_recognizer.git
cd digit_recognizer
pip install numpy
python main.py
First run downloads MNIST data and trains the model (takes a few minutes). Subsequent runs use cached weights and complete in seconds.
Wrapping Up
My previous Transformer Emulator was about "watching AI learn." This project is about "drawing with your own hand and feeling AI react in real time."
Instead of formulas or diagrams, the answer to "what is a neural network doing?" comes through touching, seeing, and feeling. That's the experience I was aiming for.
If you find technical errors or have suggestions, Issues and PRs are welcome.
Related: Transformer Emulator — Visualize the internals of a Transformer decoder, also running in the browser.
This article was originally published by DEV Community and written by Tomohisa Iura.