Technology Apr 26, 2026 · 6 min read

How we used MongoDB as the persistence layer in TradeEval — a real-time stock behavior analysis platform

by Venkata Lakshmi Vishali Kalvakolanu, DEV Community

TradeEval is a simulation-based stock analyzer that combines historical price data, live news sentiment, and ML-driven risk classification to show how stocks behave around real-world events.
Team Members: This project was developed by @lakshmivishali_kalvakolanu, @maitreyi_indarapu, and @ramireddy_bha...

We would like to express our sincere gratitude to @chanda_rajkumar for his valuable guidance and support throughout this project.

Here's how MongoDB fits into that architecture and why we chose it over a relational database.

Why MongoDB over a relational database:

When we started building TradeEval, the first design question was: what does our data actually look like? A backtest result for Apple looks completely different from a risk prediction result, which looks completely different from an event analysis. Each contains nested objects, variable-length arrays of news articles, and optional fields that only exist for certain analysis types.

Forcing that into a relational schema would mean either one massive table with dozens of nullable columns, or a complex join structure that changes every time we add a new analysis type. MongoDB's document model lets each result carry exactly the fields it needs — no nulls, no joins, no schema migrations when we add a new feature.

MongoDB stores each analysis result as a self-contained document. A backtest result, an event analysis, and a risk prediction all live in the same collection — differentiated by a type field, not by separate tables.
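The single-collection pattern can be illustrated without a running MongoDB instance. This pure-Python sketch (the `find` helper and the sample documents are ours, not TradeEval's code) shows how documents of different shapes coexist in one list and are filtered by the `type` discriminator, exactly the way a `find({"type": ...})` query would behave:

```python
# Pure-Python sketch (no MongoDB required): heterogeneous "documents"
# in one "collection", distinguished only by a type field.
results = [
    {"type": "backtest", "symbol": "AAPL",
     "result": {"total_return": "18.4%", "sharpe_ratio": 1.42}},
    {"type": "risk_prediction", "symbol": "AAPL",
     "result": {"risk_label": "Medium", "confidence": 0.783}},
    {"type": "event_analysis", "symbol": "TSLA",
     "result": {"news_signal": "bullish", "avg_sentiment_score": 0.312}},
]

def find(collection, query):
    """Mimic a simple equality-match find() over top-level fields."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

backtests = find(results, {"type": "backtest"})  # one document
```

Each document carries only the fields its analysis type needs; no nulls pad out the shapes that don't apply.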

Our Data Model:

All results are written to a single results collection in the tradeeval_db database. Each document follows a loose shared envelope with a type discriminator:

{
  "type":      "full_analysis",
  "symbol":    "AAPL",
  "strategy":  "moving_average",
  "timestamp": "2026-04-10T14:32:11Z",
  "result": {
    "behavior_summary":    "strongly bullish",
    "news_signal":         "bullish",
    "volatility_pct":      18.4,
    "recent_return_pct":   3.2,
    "avg_sentiment_score": 0.312,
    "news_breakdown": {
      "positive": 6,
      "negative": 1,
      "neutral":  3
    },
    "top_news": [
      {
        "title":           "Apple beats Q2 earnings estimates",
        "source":          "Reuters",
        "sentiment":       "positive",
        "sentiment_score": 0.48,
        "impact_weight":   2.0
      }
    ],
    "backtest": {
      "total_return":  "18.4%",
      "max_drawdown":  "-7.2%",
      "sharpe_ratio":  1.42,
      "win_rate":      "61.9%"
    },
    "risk_label":    "Medium",
    "risk_level":    1,
    "confidence":    0.783
  }
}

The same collection holds backtest-only results, event-only results, and risk predictions — each a slightly different document shape, all queryable with the same pymongo client.
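One way to keep those slightly different shapes consistent is a small builder that wraps any analysis payload in the shared envelope. This is a hypothetical helper sketched for illustration (`make_envelope` is our name, not part of TradeEval's codebase):

```python
from datetime import datetime, timezone
from typing import Optional

def make_envelope(result_type: str, symbol: str, payload: dict,
                  strategy: Optional[str] = None) -> dict:
    """Wrap an analysis payload in the shared document envelope.
    Hypothetical helper -- TradeEval's actual code may differ."""
    doc = {
        "type": result_type,   # discriminator: backtest, event_analysis, ...
        "symbol": symbol,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "result": payload,     # shape varies freely by type
    }
    if strategy is not None:   # optional field: present only when relevant
        doc["strategy"] = strategy
    return doc

doc = make_envelope("backtest", "AAPL",
                    {"total_return": "18.4%", "sharpe_ratio": 1.42},
                    strategy="moving_average")
```

Because the envelope is just a dict, adding a new analysis type means adding a new `result` shape, not migrating a schema.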

The Database Service Layer:

We connect to MongoDB lazily through a thin service module database.py so a cold MongoDB instance doesn't crash Django at startup. The connection is established only on the first actual write or read:

from datetime import datetime, timezone
import os

from pymongo import MongoClient

_client     = None
_collection = None

def _get_collection():
    """Connect lazily: nothing touches MongoDB until the first read/write."""
    global _client, _collection
    if _collection is not None:
        return _collection
    mongo_uri   = os.environ.get("MONGODB_URI", "mongodb://127.0.0.1:27017/")
    _client     = MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)
    _collection = _client["tradeeval_db"]["results"]
    return _collection

def save_result(data: dict) -> bool:
    data["timestamp"] = datetime.now(timezone.utc)  # utcnow() is deprecated in 3.12
    _get_collection().insert_one(data)
    return True

The MONGODB_URI is read from an environment variable — mongodb://127.0.0.1:27017/ locally, and mongodb://mongo:27017/tradeval inside Docker, where mongo is the service name in docker-compose.yml.

What gets stored and why:

  1. type: Discriminator — backtest / event_analysis / risk_prediction / full_analysis
  2. symbol: Stock ticker — used to query history for a specific stock
  3. timestamp: Auto-added on write — enables time-based sorting and an audit trail
  4. result: Nested document — shape varies by type, no fixed schema required
  5. news[]: Array of analyzed articles with sentiment scores per article
  6. risk_label: Top-level field for fast querying — Low / Medium / High

Reading results back:

We also expose a get_results() function that lets the frontend query stored analyses — for example, to show a history of all backtests run on Tesla, or all high-risk predictions from the last week:

def get_results(result_type=None, limit=50):
    query  = {"type": result_type} if result_type else {}
    cursor = _get_collection().find(query).sort("timestamp", -1).limit(limit)
    return [
        {k: v for k, v in doc.items() if k != "_id"}
        for doc in cursor
    ]

Stripping _id before returning keeps the response JSON-serializable — MongoDB's ObjectId type isn't natively serializable by Python's json module.
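If clients do need a stable document reference, an alternative to stripping _id is serializing with a fallback converter: json.dumps(..., default=str) stringifies any value the encoder doesn't understand, including ObjectId and datetime. A sketch using datetime, which trips the same TypeError and is already present in every TradeEval document:

```python
import json
from datetime import datetime, timezone

doc = {"type": "backtest", "symbol": "AAPL",
       "timestamp": datetime(2026, 4, 10, 14, 32, 11, tzinfo=timezone.utc)}

# Plain json.dumps raises TypeError on datetime (and on ObjectId):
try:
    json.dumps(doc)
    serializable = True
except TypeError:
    serializable = False

# default=str stringifies anything json doesn't know how to encode:
payload = json.dumps(doc, default=str)
```

The trade-off: default=str produces strings like "2026-04-10 14:32:11+00:00" rather than a typed value, so dropping _id is the cleaner choice when the frontend never needs to reference individual documents.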

Docker setup:

In our docker-compose.yml, MongoDB runs as a named service with a persistent volume so data survives container restarts:

services:
  mongo:
    image: mongo:6.0
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
    environment:
      MONGO_INITDB_DATABASE: tradeval
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
      interval: 10s
      retries: 5

volumes:
  mongo_data:

The Django backend declares depends_on with condition: service_healthy on the mongo service, so it only starts after MongoDB is confirmed ready — avoiding the race condition where Django tries to write a result before Mongo is up.
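In docker-compose terms, that dependency looks roughly like the fragment below (a sketch: the backend service name and build context are assumptions, not taken from TradeEval's actual compose file):

```yaml
services:
  backend:                           # Django service name is an assumption
    build: .
    depends_on:
      mongo:
        condition: service_healthy   # wait for the healthcheck above to pass
    environment:
      MONGODB_URI: mongodb://mongo:27017/tradeval
```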

Key insights from building this way

  1. Document flexibility: one collection handles multiple analysis types without schema changes
  2. Nested data support: arrays of news articles stored naturally, no join tables needed
  3. Development speed: zero schema migrations — adding fields is just writing them
  4. Query patterns: type + symbol + timestamp — fast filtering by analysis type, stock, or recency

Live demonstration:

The original post includes a demo of TradeEval in action — from analyzing a stock's behavior through news sentiment to storing the complete analysis in MongoDB and retrieving it for later review.

What made MongoDB the right choice:

MongoDB was the right choice for TradeEval for three specific reasons. First, our result documents have variable shapes — a backtest has different fields from a risk prediction — and the document model handles that naturally. Second, nested arrays of news articles with per-article sentiment scores are a natural fit for a document store and would require a separate join table in SQL. Third, the combination of pymongo, lazy connection, and environment-variable-driven URI made it trivial to run identically in local development and inside Docker with zero code changes.

The lazy connection pattern in particular saved us from a common frustration: having Django crash at startup just because MongoDB wasn't ready yet. By deferring the connection until the first actual database operation, the web server can start even if the database container is still spinning up — a small trick that made local development much smoother.

One more thing: We keep all results in a single collection with a type discriminator. This avoids collection sprawl while still letting us query efficiently. For example, db.results.find({"type": "risk_prediction", "result.risk_label": "High"}) finds all high-risk predictions in one go.

What's next:
TradeEval continues to evolve. The MongoDB-backed persistence layer already handles our core analysis storage, and we're looking at adding two things: first, a lightweight caching layer for expensive backtests using MongoDB's TTL indexes; second, aggregated views that pre-compute weekly summary documents for faster dashboard loading. Both fit naturally into the document model without needing to re-architect the database layer.

TradeEval is a project built with Django, MongoDB, yfinance, TextBlob, and scikit-learn. The full source is available on GitHub.

Source

This article was originally published by DEV Community and written by Venkata Lakshmi Vishali Kalvakolanu.