I needed to add vocal removal to an app last week without shipping a 300 MB Demucs model with the binary. The shortest path I found was StemSplit's API — three endpoints, one model (HTDemucs), and a free tier big enough to prototype on.
This is the working tutorial I wish I'd had. Single file, batch, webhooks, and a Flask wrapper at the end. All copy-paste runnable.
What You'll Learn
- ✅ How to remove vocals from any audio file in 5 lines of Python
- ✅ How the async job pattern works (upload → poll → download)
- ✅ Batch processing with bounded concurrency
- ✅ Webhook callbacks instead of polling
- ✅ Wrapping the API as a Flask backend for your own frontend
- ✅ Cost math, rate limits, and the production gotchas
Prerequisites
pip install requests python-dotenv tenacity flask
You'll need a free API key from the AI vocal remover dashboard — signing up gives you 10 minutes of processing, no card required. Drop it in .env:
STEMSPLIT_API_KEY=your_key_here
# config.py
import os
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.environ["STEMSPLIT_API_KEY"]
API_BASE = "https://api.stemsplit.io/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
The 5-Line Version
If you just want to see it work before committing to anything:
import requests, time
from config import HEADERS

job = requests.post(
    "https://api.stemsplit.io/v1/separate",
    headers=HEADERS,
    files={"audio": open("song.mp3", "rb")},
    data={"stems": "2", "format": "wav"},  # json= is silently dropped when files= is present
).json()

status = {"status": "processing"}
while status["status"] not in ("completed", "failed"):
    time.sleep(3)
    status = requests.get(f"https://api.stemsplit.io/v1/jobs/{job['job_id']}", headers=HEADERS).json()

open("instrumental.wav", "wb").write(requests.get(status["stems"]["instrumental"]).content)
That's the entire flow. Upload, poll, download. The rest of this article is making that production-grade.
How the API Works
Three endpoints. That's the whole surface area:
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/separate | Upload audio + start a job |
| GET | /v1/jobs/{id} | Poll job status |
| POST | /v1/webhooks | Register a callback URL (skip the polling) |
The `stems` field on the upload controls what you get back:

| stems | Output |
|---|---|
| 2 | vocals + instrumental (vocal removal) |
| 4 | vocals + drums + bass + other |
| 6 | adds guitar + piano |
For vocal removal you want stems: 2. The instrumental is the "everything except vocals" file.
📝 BPM and musical key come back in every job response at no extra cost. Useful if you're piping into a DJ tool, practice app, or recommender.
Single File: The Production Version
Same flow as the 5-liner, but with timeouts, retries on flaky network calls, exponential backoff for polling, and proper file streaming for large files.
import requests
import time
from pathlib import Path

from tenacity import retry, stop_after_attempt, wait_exponential

from config import API_BASE, HEADERS


@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=2, max=20))
def _post_with_retry(url: str, **kwargs):
    resp = requests.post(url, headers=HEADERS, timeout=60, **kwargs)
    resp.raise_for_status()
    return resp


def remove_vocals(
    audio_path: str,
    output_dir: str = "output",
    poll_interval: float = 3.0,
    timeout: float = 300.0,
) -> dict:
    """
    Remove vocals from a single audio file using the StemSplit API.

    Args:
        audio_path: Path to input file. MP3, WAV, FLAC, M4A, OGG, WEBM up to 100 MB.
        output_dir: Where to write the downloaded stems.
        poll_interval: Seconds between status checks.
        timeout: Maximum seconds to wait before giving up.

    Returns:
        Dict with 'vocals' and 'instrumental' file paths, plus 'bpm' and 'key'.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    with open(audio_path, "rb") as f:
        job = _post_with_retry(
            f"{API_BASE}/separate",
            files={"audio": (Path(audio_path).name, f)},
            data={"stems": "2", "format": "wav"},
        ).json()

    job_id = job["job_id"]
    deadline = time.time() + timeout

    while time.time() < deadline:
        status = requests.get(f"{API_BASE}/jobs/{job_id}", headers=HEADERS, timeout=30).json()
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            raise RuntimeError(f"Job {job_id} failed: {status.get('error')}")
        time.sleep(poll_interval)
    else:
        # while/else: runs only if the loop ended without a break
        raise TimeoutError(f"Job {job_id} did not complete within {timeout}s")

    out = {}
    for stem, url in status["stems"].items():
        path = Path(output_dir) / f"{Path(audio_path).stem}_{stem}.wav"
        with requests.get(url, stream=True, timeout=120) as r:
            r.raise_for_status()
            with open(path, "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        out[stem] = str(path)

    out["bpm"] = status.get("bpm")
    out["key"] = status.get("key")
    return out
Usage:
result = remove_vocals("song.mp3")
print(result)
# {
# 'vocals': 'output/song_vocals.wav',
# 'instrumental': 'output/song_instrumental.wav',
# 'bpm': 124.0,
# 'key': 'C# minor'
# }
Notes on what this gives you that the 5-liner doesn't:

- Streaming download. Stems can be 30–80 MB each. Don't `.content` them into memory if you're processing many files.
- Bounded wait. A `timeout` parameter means a stuck job can't hang your worker forever.
- Retried uploads. `tenacity` wraps the upload in three attempts with exponential backoff, so it survives transient network blips.
Batch Processing
The naive batch loop is slow because most of the wall-clock time is the model running on the server. You want to upload several files, then poll them all in parallel.
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import glob

# remove_vocals is the single-file function from the previous section


def remove_vocals_batch(
    input_dir: str,
    output_dir: str = "output",
    max_concurrent: int = 3,
) -> list[dict]:
    """
    Remove vocals from every audio file in a directory.

    Keep max_concurrent low — the free tier rate-limits aggressively.
    Bump it to 5–10 once you're on a paid plan.
    """
    files = []
    for ext in ("mp3", "wav", "flac", "m4a", "ogg", "webm"):
        files.extend(glob.glob(f"{input_dir}/*.{ext}"))
    print(f"Found {len(files)} files to process")

    results = []
    with ThreadPoolExecutor(max_workers=max_concurrent) as ex:
        futures = {ex.submit(remove_vocals, f, output_dir): f for f in files}
        for future in as_completed(futures):
            src = futures[future]
            try:
                result = future.result()
                results.append({"input": src, **result, "status": "ok"})
                print(f"✅ {Path(src).name} ({result.get('bpm', '?')} BPM)")
            except Exception as e:
                results.append({"input": src, "status": "error", "error": str(e)})
                print(f"❌ {Path(src).name}: {e}")

    ok = sum(1 for r in results if r["status"] == "ok")
    print(f"\nDone. {ok}/{len(files)} succeeded.")
    return results
# Usage
results = remove_vocals_batch("./music")
A 50-file batch on the free tier with max_concurrent=3 finishes in around 12–15 minutes wall-clock for ~4-minute songs. Most of that is wait time on the GPU queue, not network.
⚠️ Set `max_concurrent` to your tier's allowed parallel jobs. The free tier rate-limits at 3 concurrent. You'll get HTTP 429 above that, which will retry-storm if you don't catch it.
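That 12–15 minute figure is easy to sanity-check: jobs run in waves of `max_concurrent`, and each wave takes roughly one job's processing time. Here's a rough estimator; the per-job figure is an assumption based on the 40–60 second processing time quoted later, and `estimate_batch_minutes` is my helper, not part of the API:

```python
import math


def estimate_batch_minutes(
    n_files: int,
    concurrency: int = 3,
    seconds_per_job: float = 50.0,  # assumed average for a ~4-minute track
) -> float:
    """Rough wall-clock estimate: jobs complete in waves of `concurrency`."""
    waves = math.ceil(n_files / concurrency)
    return waves * seconds_per_job / 60.0


print(round(estimate_batch_minutes(50), 1))  # → 14.2
```

Real batches land a bit above or below depending on queue depth, which is why the observed range is 12–15 minutes rather than a point estimate.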
Skip the Polling: Webhooks
Polling is fine for scripts. For a backend, you want webhooks so you're not burning HTTP requests waiting around.
Register a callback URL when you create the job:
from pathlib import Path

import requests

from config import API_BASE, HEADERS


def remove_vocals_async(audio_path: str, callback_url: str) -> str:
    """
    Start a vocal removal job. The API will POST to callback_url when done.
    Returns the job_id immediately.
    """
    with open(audio_path, "rb") as f:
        resp = requests.post(
            f"{API_BASE}/separate",
            headers=HEADERS,
            files={"audio": (Path(audio_path).name, f)},
            data={
                "stems": "2",
                "format": "wav",
                "webhook_url": callback_url,
            },
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["job_id"]
The callback payload looks like:
{
"job_id": "job_a8f2c1",
"status": "completed",
"stems": {
"vocals": "https://stems.stemsplit.io/...",
"instrumental": "https://stems.stemsplit.io/..."
},
"bpm": 124.0,
"key": "C# minor",
"duration_seconds": 218.5
}
Receive it in Flask:
from flask import Flask, request

app = Flask(__name__)


@app.route("/webhook/stemsplit", methods=["POST"])
def stemsplit_webhook():
    payload = request.get_json()

    if payload["status"] != "completed":
        # log + alert; payload['error'] has the reason
        return "", 204

    job_id = payload["job_id"]
    instrumental_url = payload["stems"]["instrumental"]

    # Download in a background task — don't block the webhook response.
    # queue_download is a placeholder for your task queue (e.g. a Celery task).
    queue_download.delay(job_id, instrumental_url)
    return "", 200
Two rules for webhooks that bite people:

- Always return 2xx fast. The webhook caller won't wait for you to download the stem. Queue the download, return, then process out-of-band.
- Verify the payload. Set a webhook secret in your dashboard and check the `X-StemSplit-Signature` header. The format matches Stripe-style HMAC-SHA256 over the raw body.
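A minimal verification sketch, assuming the header carries the hex digest of HMAC-SHA256 over the raw body (the exact encoding may differ; check the dashboard docs):

```python
import hashlib
import hmac


def verify_stemsplit_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in constant time."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the comparison
    return hmac.compare_digest(expected, signature)
```

In the Flask handler, call it with `request.get_data()` (the raw bytes) before parsing JSON; verifying against re-serialized JSON will fail on whitespace and key-order differences.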
Wrapping It as Your Own API
If you're building a frontend for vocal removal, you usually don't want browsers calling the StemSplit API directly — your key would leak. Wrap it.
import io
import time
from pathlib import Path

import requests
from flask import Flask, request, jsonify, send_file

from config import API_BASE, HEADERS

app = Flask(__name__)


@app.route("/api/remove-vocals", methods=["POST"])
def remove_vocals_endpoint():
    """
    POST a multipart audio file. Returns the instrumental as the response body.
    """
    if "audio" not in request.files:
        return jsonify({"error": "missing 'audio' file"}), 400

    upload = request.files["audio"]
    if upload.content_length and upload.content_length > 100 * 1024 * 1024:
        return jsonify({"error": "file too large (max 100 MB)"}), 413

    job = requests.post(
        f"{API_BASE}/separate",
        headers=HEADERS,
        files={"audio": (upload.filename, upload.stream)},
        data={"stems": "2", "format": "wav"},
        timeout=60,
    ).json()
    job_id = job["job_id"]

    deadline = time.time() + 300  # match the API's 5-minute job SLA
    while time.time() < deadline:
        status = requests.get(f"{API_BASE}/jobs/{job_id}", headers=HEADERS, timeout=30).json()
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            return jsonify({"error": status.get("error", "job failed")}), 500
        time.sleep(3)  # don't hammer the status endpoint
    else:
        return jsonify({"error": "job timed out"}), 504

    instrumental_url = status["stems"]["instrumental"]
    audio_bytes = requests.get(instrumental_url, timeout=120).content

    return send_file(
        io.BytesIO(audio_bytes),
        mimetype="audio/wav",
        as_attachment=True,
        download_name=f"{Path(upload.filename).stem}_instrumental.wav",
    )


if __name__ == "__main__":
    app.run(port=8000)
For real production, swap the inline polling for a Celery task and webhook the result back to the user via WebSocket or SSE. The pattern in the Stem Splitter API with FastAPI and Celery article drops in cleanly here — same architecture, different upstream.
Cost Math
The pricing is $0.10 per minute of input audio. So a 4-minute song costs $0.40 to process. The 10-minute free tier on signup works out to ~2–3 average tracks, enough to wire up the integration before you commit a card.
A few back-of-envelope numbers I ran for the side project:
| Workload | Math | Monthly |
|---|---|---|
| Personal app, 50 songs/month | 50 × 4 min × $0.10 | $20 |
| Side project, 1,000 songs/month | 1000 × 4 × $0.10 | $400 |
| Production batch, 10,000 songs/month | 10000 × 4 × $0.10 | $4,000 |
Above ~5,000 songs/month, running Demucs yourself on a $0.40/hr GPU starts to make sense if you have someone willing to babysit the queue. Below that, the API is the cheaper option because you're not paying for idle GPU time.
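The table above is one multiplication per row, so it's easy to plug in your own numbers. A throwaway helper; the $0.10/minute rate comes from the pricing above, and `monthly_cost` is my name for it:

```python
RATE_PER_MINUTE = 0.10  # USD per minute of input audio (from the pricing above)


def monthly_cost(songs: int, avg_minutes: float = 4.0) -> float:
    """Estimated monthly spend for `songs` tracks averaging `avg_minutes` each."""
    return round(songs * avg_minutes * RATE_PER_MINUTE, 2)


for songs in (50, 1_000, 10_000):
    print(f"{songs:>6} songs/month -> ${monthly_cost(songs):,.2f}")
```

Adjust `avg_minutes` if your catalog skews long; DJ mixes at 60 minutes each change the math by 15x.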
Credits don't expire, so buying $50 of credits and burning through them slowly is fine — useful for hobby projects.
Common Issues
"Job stuck on processing for >2 minutes"
A 4-minute track typically completes in 40–60 seconds. If it's been over two minutes, the file is probably long (10+ min mixes are slower) or the queue is backed up. The API has a hard 5-minute SLA per job; my polling code times out at 300 s for a reason.
"HTTP 429 on every other request"
You're hitting the concurrency cap. Drop `max_concurrent` to 3 and add a backoff:

import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(requests.HTTPError),
    wait=wait_exponential(min=2, max=60),
    stop=stop_after_attempt(5),  # without a stop condition, tenacity retries forever
)
def safe_post(*args, **kwargs):
    r = requests.post(*args, **kwargs)
    r.raise_for_status()
    return r
"The instrumental still has a faint vocal in the chorus"
Heavily layered choruses are the hard case for any AI vocal remover. Two things help:
- Try `stems: 6` instead of `2` and re-mix without the vocal stem. Backing harmonies sometimes get bucketed into `other` or `guitar` when the model isn't sure they're lead vocal.
- Convert your input to WAV first if it's an MP3 below 192 kbps. Lossy compression strips frequency detail the model relies on.
"Webhook never fires"
Three things to check, in order:
- Is your webhook URL publicly reachable? (Use a tunnel like `cloudflared` for local dev.)
- Are you returning 2xx within 10 seconds? Slow handlers get marked failed.
- Is the URL on HTTPS? The API won't POST to plain HTTP.
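The first and third checks can be automated before you ever register the URL. A hypothetical pre-flight helper (`check_webhook_url` is my name, not part of any SDK):

```python
from urllib.parse import urlparse


def check_webhook_url(url: str) -> list[str]:
    """Return a list of problems with a candidate webhook URL (empty list = looks OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("must be HTTPS; the API won't POST to plain HTTP")
    if parsed.hostname in ("localhost", "127.0.0.1"):
        problems.append("not publicly reachable; use a tunnel like cloudflared for local dev")
    if not parsed.netloc:
        problems.append("missing host")
    return problems
```

Run it in a startup check or a test so a bad callback URL fails loudly instead of producing a webhook that silently never fires.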
"Big files time out on upload"
Default timeout=60 for requests.post covers maybe a 50 MB upload on a decent connection. For 100 MB files, bump to 300:
requests.post(..., timeout=300)
One caveat: `requests` builds the multipart body in memory even when you pass an open file handle, so a 100 MB upload briefly costs about 100 MB of RAM per worker. If that matters, the `MultipartEncoder` from the `requests-toolbelt` package can stream the body from disk instead.
Summary
| Use Case | Pattern |
|---|---|
| Quick script to remove vocals from one file | The 5-line version above |
| CLI tool for a folder of files | `remove_vocals_batch` with bounded concurrency |
| Backend for a vocal-removal frontend | Flask wrapper + webhook callbacks |
| Production pipeline | Celery + webhooks + retry-with-backoff |
| 5,000+ songs/month | Reconsider local Demucs on your own GPU |
The whole AI vocal remover workflow boils down to three HTTP calls. The interesting part is what you build around them — error handling, batching, and how you stream results back to the user without blocking your workers.
This article was originally published on DEV Community and written by StemSplit.