Technology Apr 17, 2026 · 7 min read

How Voice Data Travels: With Internet vs Without Internet 📞🌐

DEV Community
by Munna Thakur

A developer's deep dive into what actually happens when you make a phone call

So you're building a voice call feature in your app. You pick up a library, maybe WebRTC or a third-party SDK, and things just... work. But then a question hits you mid-implementation:

"Wait, how is voice data actually being sent? And how is this different from a regular phone call?"

That exact thought led me down a rabbit hole. This article breaks it all down in plain English, with real technical depth underneath.

The Big Picture First

When you speak into a phone, your voice is just air vibrations (an analog signal). Before it can travel anywhere, whether through cell towers or the internet, it must be converted into digital data. Both call types do this. The difference is how that data travels afterward.

Your Voice (Analog)
      ↓
  Digitize + Compress
      ↓
  ┌─────────────┐         ┌─────────────────┐
  │ No Internet │         │  With Internet  │
  │ GSM/VoLTE   │         │  WebRTC/VoIP    │
  └─────────────┘         └─────────────────┘
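
To make the "digitize" step concrete, here is a minimal sketch. It assumes a pure 440 Hz tone sampled at 8 kHz (the classic narrowband telephony rate) and quantized to 16-bit linear PCM. The function names `sampleSine` and `toPcm16` are illustrative, not from any library, and a real phone does this in hardware before the codec ever sees the data.

```javascript
// Sample a sine tone: returns floats in [-1, 1], one per sampling instant.
function sampleSine(freqHz, sampleRateHz, durationMs) {
  const count = Math.round((sampleRateHz * durationMs) / 1000);
  const samples = new Float32Array(count);
  for (let n = 0; n < count; n++) {
    samples[n] = Math.sin((2 * Math.PI * freqHz * n) / sampleRateHz);
  }
  return samples;
}

// Quantize floats in [-1, 1] to signed 16-bit integers (linear PCM).
function toPcm16(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range input
    pcm[i] = Math.round(s < 0 ? s * 32768 : s * 32767);
  }
  return pcm;
}

const pcm = toPcm16(sampleSine(440, 8000, 20));
console.log(pcm.length); // 160 samples for 20 ms at 8 kHz
```

Those 160 raw samples take 320 bytes. The codec's job in the next step is to shrink that to a few dozen bytes.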

Part 1: Normal Phone Calls (No Internet) 📞

What's happening under the hood?

A regular phone call uses your telecom operator's infrastructure (towers, cables, switching centers), completely independent of the internet.

Step-by-Step Flow

You speak 🎤
    ↓
Microphone captures analog audio
    ↓
ADC (Analog-to-Digital Converter) → digital signal
    ↓
Codec compresses it (AMR / AMR-WB / EVS)
    ↓
Sent to nearest Cell Tower 📡
    ↓
Telecom Core Network (routes the call)
    ↓
Receiver's Cell Tower 📡
    ↓
Receiver's phone decodes → plays audio 🔊

The Codec: AMR (Adaptive Multi-Rate)

This is the compression algorithm used in traditional calls. It's smart: it adapts the bitrate based on network conditions.

| AMR Mode          | Bitrate    | Quality              |
|-------------------|------------|----------------------|
| AMR 4.75          | 4.75 kbps  | Low (weak signal)    |
| AMR 12.2          | 12.2 kbps  | High (strong signal) |
| AMR-WB (HD Voice) | 23.85 kbps | HD quality           |
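
The bitrates in the table translate directly into how much speech data each 20 ms frame carries. A small sketch of that arithmetic follows. The helper name `bitsPerFrame` is made up for illustration, and the result counts speech bits only, with no framing or header overhead.

```javascript
// kbit/s multiplied by ms gives bits, so a 12.2 kbps codec emits 244 bits every 20 ms.
function bitsPerFrame(bitrateKbps, frameMs = 20) {
  return Math.round(bitrateKbps * frameMs);
}

for (const [mode, kbps] of [['AMR 4.75', 4.75], ['AMR 12.2', 12.2], ['AMR-WB 23.85', 23.85]]) {
  const bits = bitsPerFrame(kbps);
  console.log(`${mode}: ${bits} bits (~${Math.ceil(bits / 8)} bytes) per 20 ms frame`);
}
// AMR 4.75: 95 bits (~12 bytes) per 20 ms frame
// AMR 12.2: 244 bits (~31 bytes) per 20 ms frame
// AMR-WB 23.85: 477 bits (~60 bytes) per 20 ms frame
```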

What does the data look like?

Under the hood, voice is not sent as one big audio file. It's split into tiny chunks, each representing about 20 milliseconds of audio.

[20ms chunk] → [20ms chunk] → [20ms chunk] → [20ms chunk] → ...
     #1             #2             #3             #4
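
As a rough sketch of that chunking, the snippet below slices a buffer of raw samples into 20 ms frames before they would be handed to a codec. The 8 kHz sample rate and the `splitIntoFrames` helper are assumptions for illustration. Real pipelines usually receive audio from the hardware already in fixed-size blocks.

```javascript
// Split PCM samples into fixed-duration frames. A trailing partial frame is dropped.
function splitIntoFrames(samples, sampleRateHz, frameMs = 20) {
  const frameSize = (sampleRateHz * frameMs) / 1000; // samples per frame
  const frames = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    frames.push(samples.slice(start, start + frameSize));
  }
  return frames;
}

const oneSecond = new Int16Array(8000); // 1 s of silence at 8 kHz
const frames = splitIntoFrames(oneSecond, 8000);
console.log(frames.length, frames[0].length); // 50 160
```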

Each frame looks something like this conceptually:

{
  "type": "voice_frame",
  "codec": "AMR",
  "sequence": 101,
  "timestamp": 2003400,
  "payload": "<compressed binary audio bytes>"
}

⚠️ In reality it's binary, not JSON, but this structure represents what's inside each packet.

Circuit Switching vs VoLTE

Old GSM (2G/3G) → Circuit Switching

  • A dedicated "pipe" is reserved just for your call
  • Like booking a private road: no one else uses it during your call
  • Very stable, but inefficient (resources wasted during silence)

VoLTE (4G) and VoNR (5G) → Packet Switching (but controlled)

  • Voice is broken into packets like internet data
  • But the network gives it priority (QoS, Quality of Service)
  • Lower latency, HD quality, still uses telecom infrastructure

Part 2: Internet Calls (WhatsApp, WebRTC) 🌐

What's happening under the hood?

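Apps like WhatsApp, Google Meet, and Discord use the internet to carry voice. The key technology here is WebRTC (Web Real-Time Communication), an open standard built into modern browsers and available to mobile apps as a native library. Some apps run their own VoIP stack instead of WebRTC itself, but the pipeline is broadly the same.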

Step-by-Step Flow

You speak 🎤
    ↓
Microphone captures analog audio
    ↓
ADC → digital signal
    ↓
Opus codec compresses it
    ↓
Packetized into RTP packets, carried over UDP
    ↓
Sent via Internet (WiFi / 4G / 5G)
    ↓
Direct peer-to-peer path (found with STUN), or relayed via a TURN server
    ↓
Receiver reorders packets → decodes → plays audio 🔊

The Codec: Opus

Opus is the go-to codec for internet voice/audio. It's open-source, low-latency, and adaptive.

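| Feature                | Opus                                                        |
|------------------------|-------------------------------------------------------------|
| Bitrate range          | 6 kbps – 510 kbps                                           |
| Latency                | 20 ms frames by default (~26.5 ms algorithmic delay)        |
| Handles packet loss?   | ✅ Yes (built-in FEC)                                       |
| Quality at low bitrate | Excellent                                                   |
| Used by                | WhatsApp, Discord, Zoom, WebRTC                             |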

Opus has Forward Error Correction (FEC) built in: a packet can carry a low-bitrate copy of the previous frame, so if one packet is lost, the decoder can still reconstruct that audio from the next one. That's why internet calls still sound okay even with minor packet loss.
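
The idea behind this kind of redundancy can be sketched with a toy model. This is not Opus's actual bitstream. It simply assumes every packet carries its own frame plus a lower-quality copy of the previous one, and the names `packetize` and `reconstruct` are hypothetical.

```javascript
// Sender: each packet carries its own frame plus a low-quality copy of the previous frame.
function packetize(frames) {
  return frames.map((frame, i) => ({
    seq: i,
    frame,
    redundant: i > 0 ? `${frames[i - 1]}(lo)` : null,
  }));
}

// Receiver: rebuild the frame sequence, filling a lost frame from the following packet.
function reconstruct(received, totalFrames) {
  const bySeq = new Map(received.map((p) => [p.seq, p]));
  const out = [];
  for (let seq = 0; seq < totalFrames; seq++) {
    const own = bySeq.get(seq);
    const next = bySeq.get(seq + 1);
    if (own) out.push(own.frame);
    else if (next && next.redundant !== null) out.push(next.redundant);
    else out.push('<concealed>'); // nothing usable: the decoder has to guess
  }
  return out;
}

const packets = packetize(['A', 'B', 'C', 'D', 'E']);
const arrived = packets.filter((p) => p.seq !== 2 && p.seq !== 3); // C and D lost in a row
console.log(reconstruct(arrived, 5));
// [ 'A', 'B', '<concealed>', 'D(lo)', 'E' ]
```

The output shows the limit of the scheme: an isolated loss is recovered at reduced quality, while a burst of consecutive losses still leaves a gap.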

Why UDP and not TCP?

This is one of the most important decisions in real-time audio.

TCP (used in HTTP, file downloads):

  • Guarantees delivery: if a packet is lost, it resends it
  • Problem: Resending takes time → delay → unacceptable in real-time voice

UDP (used in WebRTC voice):

  • No guarantee of delivery
  • No resending lost packets
  • But it's fast: packets go out and don't wait

In voice calls, an audio packet that arrives 200 ms late is useless anyway. Better to skip it and keep playing forward than wait for a retry.

TCP mindset: "Wait, I need packet #47 before I continue"  ❌ (for voice)
UDP mindset: "Packet #47 is gone? Fine, move on."        ✅ (for voice)
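
A small simulation makes the difference visible. It assumes packets are sent every 20 ms with a 50 ms one-way delay, that packet 1 is lost on its first attempt, and that a retransmission would arrive at 270 ms. All timings and function names here are invented for illustration.

```javascript
const packets = [
  { seq: 0, arrival: 50 },
  { seq: 1, arrival: null, retransmitArrival: 270 }, // lost on the first try
  { seq: 2, arrival: 90 },
  { seq: 3, arrival: 110 },
  { seq: 4, arrival: 130 },
];

// TCP-like: data reaches the app strictly in order, so one loss stalls everything behind it.
function tcpLikeDelivery(pkts) {
  let latest = 0;
  return pkts.map((p) => {
    latest = Math.max(latest, p.arrival ?? p.retransmitArrival);
    return latest;
  });
}

// UDP-like: each packet reaches the app when it arrives; a lost one never does (null).
function udpLikeDelivery(pkts) {
  return pkts.map((p) => p.arrival);
}

console.log(tcpLikeDelivery(packets)); // [ 50, 270, 270, 270, 270 ]
console.log(udpLikeDelivery(packets)); // [ 50, null, 90, 110, 130 ]
```

With in-order delivery, packets 2 to 4 sit in a queue until 270 ms even though they arrived on time. With datagrams, only the one lost frame is missing.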

How WebRTC Establishes Connection (Simplified)

  1. Signaling - Both peers exchange metadata (IP, codec support) via a server
  2. ICE (Interactive Connectivity Establishment) - Finding the best network path
  3. STUN Server - Figures out your public IP (you're usually behind a router/NAT)
  4. TURN Server - Relays traffic if direct P2P fails (firewall situations)
  5. DTLS Handshake - Encrypted connection established
  6. SRTP - Voice packets flow securely, peer-to-peer

Caller                  Signaling Server               Receiver
  |                           |                            |
  |----offer (SDP)----------->|                            |
  |                           |-------offer (SDP)--------->|
  |                           |<------answer (SDP)---------|
  |<---answer (SDP)-----------|                            |
  |                           |                            |
  |<==================ICE Candidates exchanged============>|
  |                                                        |
  |<================P2P Voice (SRTP/UDP)==================>|
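
One piece of the "finding the best network path" step can be shown as plain arithmetic: ICE gives every candidate a priority so that direct paths are tried before relayed ones. The sketch below uses the priority formula and recommended type preferences from RFC 8445. The candidate list and the default local preference of 65535 are assumptions, and a real ICE agent goes on to pair candidates and run connectivity checks rather than simply sorting them.

```javascript
// Recommended type preferences from RFC 8445 (higher means preferred).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

const candidates = [
  { type: 'relay', via: 'TURN server' },
  { type: 'srflx', via: 'public address learned from STUN' },
  { type: 'host', via: 'local network interface' },
];

const ranked = candidates
  .map((c) => ({ ...c, priority: candidatePriority(c.type) }))
  .sort((a, b) => b.priority - a.priority);

console.log(ranked.map((c) => `${c.type}:${c.priority}`));
// [ 'host:2130706431', 'srflx:1694498815', 'relay:16777215' ]
```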

What does the data look like?

{
  "type": "audio_packet",
  "codec": "opus",
  "ssrc": 3892741023,
  "sequence": 4821,
  "timestamp": 96000,
  "payload": "<opus encoded binary>"
}

This is an RTP (Real-time Transport Protocol) packet. WebRTC wraps it in SRTP (Secure RTP) for encryption.
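
On the wire, those fields live in a 12-byte binary header in front of the payload. Here is a minimal parser sketch for the fixed RTP header. It builds a sample packet with the values shown above, uses payload type 111 as an assumption (a dynamic type often negotiated for Opus), and ignores header extensions for brevity.

```javascript
// Parse the fixed 12-byte RTP header from a Uint8Array (all fields are big-endian).
function parseRtpHeader(bytes) {
  if (bytes.length < 12) throw new Error('RTP header needs at least 12 bytes');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const csrcCount = bytes[0] & 0x0f;
  return {
    version: bytes[0] >> 6,
    padding: (bytes[0] & 0x20) !== 0,
    extension: (bytes[0] & 0x10) !== 0,
    marker: (bytes[1] & 0x80) !== 0,
    payloadType: bytes[1] & 0x7f,
    sequence: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payloadOffset: 12 + 4 * csrcCount, // does not account for a header extension
  };
}

// Build a sample packet: 12 header bytes followed by 3 placeholder payload bytes.
const packet = new Uint8Array(12 + 3);
const writer = new DataView(packet.buffer);
packet[0] = 0x80; // version 2, no padding, no extension, no CSRCs
packet[1] = 111; // payload type (assumed value)
writer.setUint16(2, 4821);
writer.setUint32(4, 96000);
writer.setUint32(8, 3892741023);

const header = parseRtpHeader(packet);
console.log(header.sequence, header.timestamp, header.ssrc); // 4821 96000 3892741023
```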

Part 3: Side-by-Side Comparison

| Feature              | Normal Call 📞                        | Internet Call 🌐                                        |
|----------------------|---------------------------------------|---------------------------------------------------------|
| Network              | Telecom (Jio, Airtel)                 | Internet (WiFi / Mobile data)                           |
| Protocol             | GSM / VoLTE                           | WebRTC (RTP over UDP)                                   |
| Codec                | AMR / AMR-WB / EVS                    | Opus                                                    |
| Latency              | ~100–150 ms                           | ~150–300 ms (network-dependent)                         |
| Data path            | Operator controlled                   | Peer-to-peer (mostly)                                   |
| Delivery             | Reserved circuit or prioritized (QoS) | Best-effort (UDP)                                       |
| Encryption           | Limited (operator can see)            | Encrypted (DTLS + SRTP); end-to-end on direct P2P calls |
| Packet loss handling | Network-level QoS                     | Opus FEC + packet loss concealment                      |
| Works without data?  | ✅ Yes                                | ❌ No                                                   |
| Cost                 | Per minute or bundled                 | Uses ~0.3–0.5 MB/min                                    |
| Emergency calls      | ✅ Works                              | ❌ Generally cannot call 112/911                        |
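
The data-usage figure in the table can be sanity-checked with some quick arithmetic. The sketch assumes one direction of audio, 20 ms packets, and 40 bytes of headers per packet (20 IPv4 + 8 UDP + 12 RTP, ignoring the SRTP authentication tag and any TURN overhead). The codec bitrates of 24 and 48 kbps are example values, not measurements of any particular app.

```javascript
// Estimate one-way data usage of a voice stream in megabytes per minute.
function megabytesPerMinute(codecKbps, frameMs = 20, headerBytes = 40) {
  const packetsPerSecond = 1000 / frameMs;
  const payloadBytes = (codecKbps * frameMs) / 8; // kbit/s * ms = bits per packet
  const bytesPerSecond = (payloadBytes + headerBytes) * packetsPerSecond;
  return (bytesPerSecond * 60) / 1e6;
}

console.log(megabytesPerMinute(24).toFixed(2)); // 0.30
console.log(megabytesPerMinute(48).toFixed(2)); // 0.48
```

Note how much of the total is headers: at 24 kbps, 40 of every 100 bytes on the wire are protocol overhead.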

Part 4: Why Voice Sometimes Breaks on Internet Calls 🤖

Ever heard someone sound like a robot during a WhatsApp call? Here's exactly why:

1. Packet Loss

Some UDP packets don't arrive. If too many are lost in a row, the audio decoder has gaps → robotic or stuttering sound.

2. Jitter

Packets arrive out of order or unevenly spaced. WebRTC uses a jitter buffer to smooth this out, but if jitter is too high, packets miss their playout deadline (or the buffer has to grow, adding delay) and the audio gets chopped.

Sent:     [P1]--[P2]--[P3]--[P4]--[P5]
Received: [P1]------[P3][P2]----[P5]  ← P4 lost, P2 P3 swapped
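
A jitter buffer can be sketched as a playout schedule: the receiver delays playback by a fixed amount, then plays one frame every 20 ms and checks whether each packet made it in time. The arrival times below are invented to match the diagram above, and `schedulePlayout` is a hypothetical helper. Real jitter buffers adapt their depth continuously and handle sequence-number wraparound.

```javascript
// Decide, for each sequence number, whether the packet is played, arrived too late, or was lost.
function schedulePlayout(packets, bufferMs, frameMs = 20) {
  const bySeq = new Map(packets.map((p) => [p.seq, p]));
  const seqs = [...bySeq.keys()].sort((a, b) => a - b);
  const firstSeq = seqs[0];
  const lastSeq = seqs[seqs.length - 1];
  const startTime = bySeq.get(firstSeq).arrival + bufferMs; // playback begins after the buffer delay
  const out = [];
  for (let seq = firstSeq; seq <= lastSeq; seq++) {
    const deadline = startTime + (seq - firstSeq) * frameMs;
    const p = bySeq.get(seq);
    if (!p) out.push(`P${seq}: lost`);
    else if (p.arrival > deadline) out.push(`P${seq}: late`);
    else out.push(`P${seq}: play`);
  }
  return out;
}

const received = [
  { seq: 1, arrival: 0 },
  { seq: 3, arrival: 45 },
  { seq: 2, arrival: 50 }, // overtaken by P3
  { seq: 5, arrival: 85 }, // P4 never arrives
];

console.log(schedulePlayout(received, 40));
// [ 'P1: play', 'P2: play', 'P3: play', 'P4: lost', 'P5: play' ]
console.log(schedulePlayout(received, 20));
// [ 'P1: play', 'P2: late', 'P3: play', 'P4: lost', 'P5: play' ]
```

The two runs show the trade-off: a deeper buffer absorbs the reordering at the cost of 20 ms more delay, while a shallow one keeps latency low but turns a late packet into an audible gap.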

3. Network Handoff

When you're moving (driving, walking), your phone switches between towers or WiFi ↔ 4G. During handoff, packets drop → brief audio glitch.

4. Congestion

Your internet is shared. If someone starts a big download in parallel, your voice packets compete for bandwidth → delay spikes.

Part 5: As a Developer: What Should You Know?

If you're building a voice feature, here are the key decisions:

Choosing your approach

Use WebRTC if:

  • Building for web/mobile app
  • Need P2P, low cost at scale
  • Want E2E encryption
  • Don't need emergency call support

Use VoIP / SIP if:

  • Need PSTN (real phone number) integration
  • Need to call regular phones
  • Enterprise telephony

Use a managed SDK if:

  • Fast shipping matters
  • Examples: Twilio, Agora, Daily.co, Vonage

Key WebRTC APIs to know

// Run inside an async function or an ES module (top-level await)

// Get user's microphone (requires HTTPS or localhost)
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Create peer connection
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Add audio track to connection
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Create and send offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// → Send offer to other peer via your signaling server

// When you receive their answer (a { type: 'answer', sdp } object):
await pc.setRemoteDescription(answer);
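
The snippet above covers the calling side. For completeness, here is a hedged sketch of the answering side plus ICE candidate trickling. `sendToPeer` is a hypothetical callback standing in for whatever signaling channel you use (WebSocket, HTTP, and so on), and the message shapes are assumptions. The `RTCPeerConnection` methods themselves are standard.

```javascript
// Answering side: `pc` is an RTCPeerConnection, `offer` is the caller's { type: 'offer', sdp }.
async function answerCall(pc, offer, sendToPeer) {
  await pc.setRemoteDescription(offer); // apply the caller's offer
  const answer = await pc.createAnswer(); // generate a matching answer
  await pc.setLocalDescription(answer); // also starts local ICE gathering
  sendToPeer({ type: answer.type, sdp: answer.sdp });
}

// Both sides: forward each locally discovered ICE candidate to the other peer.
function wireIceCandidates(pc, sendToPeer) {
  pc.onicecandidate = (event) => {
    // A null candidate signals that gathering has finished.
    if (event.candidate) sendToPeer({ type: 'candidate', candidate: event.candidate });
  };
}

// On receiving a 'candidate' message from the other side:
//   await pc.addIceCandidate(message.candidate);
```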

Monitor call quality in real time

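// Get audio stats
const stats = await pc.getStats();
stats.forEach(report => {
  // Stats for the audio we receive
  if (report.type === 'inbound-rtp' && report.kind === 'audio') {
    console.log('Packets lost:', report.packetsLost);
    console.log('Jitter (s):', report.jitter);
  }
  // Round-trip time is reported by the remote peer about the audio we send
  if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
    console.log('Round trip time (s):', report.roundTripTime);
  }
});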
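
Raw counters are hard to act on, so it can help to reduce them to a loss percentage and a coarse rating. The sketch below takes an object shaped like an `inbound-rtp` report. The thresholds (1% and 5% loss, 30 ms and 50 ms jitter) are illustrative assumptions, not standardized values, so tune them against your own call data.

```javascript
// Summarize an inbound-rtp style report. Thresholds are illustrative assumptions.
function summarizeAudioQuality({ packetsLost, packetsReceived, jitter }) {
  const total = packetsLost + packetsReceived;
  const lossPercent = total > 0 ? (100 * packetsLost) / total : 0;
  const jitterMs = Math.round(jitter * 1000); // getStats() reports jitter in seconds
  let rating = 'good';
  if (lossPercent > 5 || jitterMs > 50) rating = 'poor';
  else if (lossPercent > 1 || jitterMs > 30) rating = 'fair';
  return { lossPercent, jitterMs, rating };
}

console.log(summarizeAudioQuality({ packetsLost: 10, packetsReceived: 490, jitter: 0.012 }));
// { lossPercent: 2, jitterMs: 12, rating: 'fair' }
```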

Quick Summary

Both call types:
  Voice → Digitize → Compress → Send in 20ms chunks → Decode → Play

Without Internet (Normal Call):
  Codec: AMR | Path: Telecom towers | Protocol: GSM/VoLTE | Stable + Guaranteed

With Internet (WhatsApp/WebRTC):
  Codec: Opus | Path: Internet P2P | Protocol: RTP over UDP | Flexible + Encrypted

The biggest conceptual difference:

  • Normal call = a lane reserved or prioritized for you by the operator (a private road on 2G/3G, a priority lane on VoLTE)
  • Internet call = many small packets racing through shared roads, reassembled on arrival

If this helped you understand what's actually happening under the hood when you make a call, drop a ❤️. And if you're building something with WebRTC, feel free to ask questions in the comments!

Tags: #webrtc #voip #networking #javascript #webdev #beginners

Source

This article was originally published by DEV Community and written by Munna Thakur.
