Apr 21, 2026 · 8 min read

How I Ship systemd Logs to CloudWatch for $0 (Django + Celery on EC2)

by Hemanth (DEV Community)

Running Django and Celery as systemd services on EC2 and tired of SSH-ing in to debug? Here's the exact setup I used to ship logs to CloudWatch Logs for free, without touching a single production service. Real commands, real configs, real gotchas included.

I was SSH-ing into production to debug. Every. Single. Time.

journalctl -u gunicorn.service -f was my monitoring stack. It worked until I needed to see what happened three hours ago on a Celery task that silently failed. SSH in, scroll up, hope journald hadn't rotated it yet.

I needed logs off the server, queryable, and retained without paying $40/month for a logging SaaS I'd barely use. So I set up CloudWatch Logs with the Amazon CloudWatch Agent. Total monthly cost: $0.

This is exactly how I did it.

What the CloudWatch Agent Actually Does

Before installing anything, understand the mechanism.

The agent is not a real-time forwarder. It wakes up every 5 seconds, reads new bytes from your log file since the last read using a stored file offset, compresses them, and ships one HTTPS batch to CloudWatch.

Your application has zero awareness the agent exists. It reads from outside your process, like tail -f does. Django doesn't know. Daphne doesn't know. Celery doesn't know.
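You can sketch that offset mechanic in a few lines of shell. This is an illustration of the idea, not the agent's actual implementation; the paths and the poll function are made up:

```shell
#!/bin/sh
# Demonstrate offset-based reads: each poll emits only the bytes appended
# since the previous poll, the way the agent (or tail -f) does.
LOG=$(mktemp); STATE=$(mktemp)
echo 0 > "$STATE"                      # stored file offset starts at zero

poll() {
  offset=$(cat "$STATE")
  size=$(wc -c < "$LOG" | tr -d ' ')
  if [ "$size" -gt "$offset" ]; then
    tail -c +"$((offset + 1))" "$LOG"  # only the new bytes (1-indexed)
    echo "$size" > "$STATE"            # remember where we stopped
  fi
}

echo "first line"  >> "$LOG"; poll     # prints: first line
echo "second line" >> "$LOG"; poll     # prints only: second line
rm -f "$LOG" "$STATE"
```

The state file is why the agent survives restarts without re-shipping the whole log: it just resumes from the last stored offset.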

Resource profile on a production EC2 instance: roughly 0.1% CPU at steady state with a tiny spike on each 5-second flush, 35-60 MB of RAM regardless of log volume, and a few KB of compressed network traffic per flush. Not worth worrying about.

Logs show up in CloudWatch within 5-15 seconds of being written. Not real-time, near-real-time. Good enough for debugging a failed task after the fact.

The Cost Reality

CloudWatch Logs free tier per account per month: 5 GB ingestion, 5 GB storage.

A typical Django/Celery setup at INFO log level generates roughly 80-100 MB/month. With 7-day retention, only about 20 MB sits in storage at any given moment. The free tier covers this comfortably.

The only thing that'll push you over is leaving Django at DEBUG level in production. That multiplies log volume 10-50x instantly. Keep Django at INFO, Celery at ERROR. You won't see a bill.

Why Not Point the Agent Directly at journald

Both my services, Daphne and Celery, log through journald. The obvious config would be a journald collector block in the agent JSON:

"journald": {
  "collect_list": [
    { "units": ["gunicorn.service"], "log_group_name": "/prod/daphne" }
  ]
}

I tried this first. The agent threw:

E! Invalid Json input schema.
Under path : /logs/logs_collected | Error : Additional property journald is not allowed

Agent version 1.300064 doesn't support the journald collector in its config schema. The docs don't make this obvious upfront.

So before starting the setup, understand the actual flow you're building:

[Flowchart: systemd service logs flow from journald, through a piping service and a log file on disk, to the CloudWatch Agent, and finally into AWS CloudWatch Logs on the EC2 instance.]

Two hops instead of one, but completely non-destructive. You don't touch your production services at all during setup. journald still captures everything in parallel, so you keep local history and get CloudWatch shipping simultaneously.

Step 1: IAM Role on the EC2

The agent needs AWS credentials to call the CloudWatch API. Attach an IAM role to the EC2 instance directly — not access keys in a config file.

Create a role in IAM with EC2 as the trusted entity and attach the CloudWatchAgentServerPolicy managed policy. Then on your EC2 instance: Actions > Security > Modify IAM role, and attach it.

Verify it worked from the server:

curl -s http://169.254.169.254/latest/meta-data/iam/info | python3 -m json.tool

You should see your InstanceProfileArn in the output. That IP 169.254.169.254 is the EC2 Instance Metadata Service — a link-local address only reachable from inside the instance itself. The agent hits this on startup to get temporary credentials automatically. No keys, no secrets files sitting on disk.
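One caveat: newer instances often have IMDSv2 enforced, in which case the bare curl above comes back with a 401. The agent negotiates this itself; for your own verification, grab a session token first:

```shell
# IMDSv2: PUT for a short-lived token, then send it with the metadata request
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/info" | python3 -m json.tool
```

This only works from inside the instance, since 169.254.169.254 is link-local.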

Step 2: Install the Agent

wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i amazon-cloudwatch-agent.deb

Step 3: Create the Log Files

sudo mkdir -p /var/log/app
sudo touch /var/log/app/daphne.log /var/log/app/celery.log
sudo chown ubuntu:www-data /var/log/app/daphne.log /var/log/app/celery.log
sudo chmod 644 /var/log/app/daphne.log /var/log/app/celery.log
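These files will grow forever unless something rotates them; the agent never truncates anything. A minimal logrotate sketch (the weekly schedule and keep-count are my assumptions, tune them to your volume):

```
# /etc/logrotate.d/app
/var/log/app/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate    # truncate in place; the journalctl pipe keeps its open fd
}
```

copytruncate matters here: the piping services you create in Step 5 hold the file open via >>, so a rename-based rotation would leave them writing into the rotated file.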

Step 4: Write the Agent Config

sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null << 'EOF'
{
  "agent": {
    "run_as_user": "cwagent"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/app/daphne.log",
            "log_group_name": "/prod/daphne",
            "log_stream_name": "{instance_id}",
            "retention_in_days": 7
          },
          {
            "file_path": "/var/log/app/celery.log",
            "log_group_name": "/prod/celery",
            "log_stream_name": "{instance_id}",
            "retention_in_days": 7
          },
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "/prod/nginx-access",
            "log_stream_name": "{instance_id}",
            "retention_in_days": 7
          },
          {
            "file_path": "/var/log/nginx/error.log",
            "log_group_name": "/prod/nginx-error",
            "log_stream_name": "{instance_id}",
            "retention_in_days": 7
          }
        ]
      }
    },
    "force_flush_interval": 5
  }
}
EOF

retention_in_days set here means the agent creates the log group with that retention automatically. No manual console clicks needed.
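Once the agent is up (Step 6), you can confirm the groups and their retention landed, for example with a quick describe call (region and credentials assumed from your environment):

```shell
# List the /prod log groups with their retention settings
aws logs describe-log-groups \
  --log-group-name-prefix /prod \
  --query 'logGroups[*].[logGroupName,retentionInDays]' \
  --output table
```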

Step 5: Create the Piping Services

One small systemd service per application. Each one runs journalctl -f for that unit and appends output to the log file the agent reads.

For Daphne:

sudo tee /etc/systemd/system/daphne-log-pipe.service > /dev/null << 'EOF'
[Unit]
Description=Pipe gunicorn journald logs to file
After=gunicorn.service
BindsTo=gunicorn.service

[Service]
ExecStart=/bin/bash -c "journalctl -u gunicorn.service -f --no-pager -o short-iso >> /var/log/app/daphne.log"
Restart=always
RestartSec=3
User=ubuntu

[Install]
WantedBy=multi-user.target
EOF

For Celery:

sudo tee /etc/systemd/system/celery-log-pipe.service > /dev/null << 'EOF'
[Unit]
Description=Pipe celery journald logs to file
After=celery.service
BindsTo=celery.service

[Service]
ExecStart=/bin/bash -c "journalctl -u celery.service -f --no-pager -o short-iso >> /var/log/app/celery.log"
Restart=always
RestartSec=3
User=ubuntu

[Install]
WantedBy=multi-user.target
EOF

BindsTo=gunicorn.service means if the main service stops, the piping service stops with it. Clean dependency, no orphaned processes.

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable daphne-log-pipe.service celery-log-pipe.service
sudo systemctl start daphne-log-pipe.service celery-log-pipe.service

Verify lines are flowing into the files:

sleep 10 && tail -5 /var/log/app/daphne.log && tail -5 /var/log/app/celery.log

You should see actual request lines from Daphne and task output from Celery. If the files are empty after 10 seconds, check systemctl status daphne-log-pipe.service.

Step 6: Start the Agent

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
  -s

Check status:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status

Expected output:

{
  "status": "running",
  "starttime": "2026-04-13T05:39:11+00:00",
  "configstatus": "configured",
  "version": "1.300064.1b1344"
}

Step 7: Verify Logs Are in CloudWatch

From AWS CloudShell:

aws logs filter-log-events \
  --log-group-name /prod/daphne \
  --region us-west-2 \
  --limit 5 \
  --output table \
  --query 'events[*].message'

Your actual API request lines should appear. If the console shows storedBytes: 0, wait 2-3 minutes and refresh. The agent flushes every 5 seconds but CloudWatch takes a moment to index.

Where Logs Live Now

Three places simultaneously, independently:

Location                     Retention                        Purpose
journald on EC2              capped (set via journald.conf)   fast local debugging
/var/log/app/*.log on EC2    logrotate weekly                 agent reads from here
CloudWatch Logs              7 days                           off-server, queryable

The agent is a photocopier. It never deletes, never moves, never touches the original log. Stop the agent tomorrow and your EC2 logs are completely unaffected. If your EC2 dies, CloudWatch still has the last 7 days.

Cleaning This Up Later

The piping services are temporary scaffolding. Once you've confirmed everything is stable over a few days, do the cleaner version during a low-traffic window.

Add these lines under [Service] in your gunicorn unit file (and the same pair pointing at /var/log/app/celery.log in the celery unit):

StandardOutput=append:/var/log/app/daphne.log
StandardError=append:/var/log/app/daphne.log

Reload and restart both services, verify logs still flow, then remove the piping services:

sudo systemctl stop daphne-log-pipe.service celery-log-pipe.service
sudo systemctl disable daphne-log-pipe.service celery-log-pipe.service
sudo rm /etc/systemd/system/daphne-log-pipe.service
sudo rm /etc/systemd/system/celery-log-pipe.service
sudo systemctl daemon-reload

Direct write, one less moving part. The reason I didn't do this on day one is that it requires a production service restart. Validate the full pipeline first, clean up after.

What You Can Build on Top

With logs in CloudWatch, the real value starts.

Metric filters let you turn log patterns into metrics. Every line matching " 500 " in your daphne logs increments an HTTP5xxErrors counter. Every ERROR in Celery increments a CeleryTaskFailures counter. From there you can build dashboards, set alarms, get SNS notifications when error rates spike.
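As a sketch, the 5xx filter can be created from the CLI like this (the filter name and metric namespace are made up; adjust to your conventions):

```shell
# Count every " 500 " line in /prod/daphne as a custom CloudWatch metric
aws logs put-metric-filter \
  --log-group-name /prod/daphne \
  --filter-name http-5xx \
  --filter-pattern '" 500 "' \
  --metric-transformations \
      metricName=HTTP5xxErrors,metricNamespace=App/Prod,metricValue=1,defaultValue=0
```

defaultValue=0 makes the metric emit zero when no lines match, which keeps alarms from going into INSUFFICIENT_DATA during quiet periods.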

Logs Insights is the other useful piece. It's SQL for your log groups:

fields @timestamp, @message
| filter @message like /api\/auth\/login/
| filter @message like "403"
| stats count() by bin(5m)

That query shows login failures per 5-minute window. Took 30 seconds to write, runs in 2 seconds. Try doing that with journalctl.

The piping service pattern isn't the cleanest architecture forever, but it got logs off the server without a single production restart on day one. If you're still SSH-ing into EC2 to read logs, this setup takes about 30 minutes and the free tier covers most production workloads. Do it before you actually need it at 2 AM.

Source

This article was originally published on DEV Community and written by Hemanth.