Commit 049cdd1

Add "For Developers" guides - Debugging, Encryption and SDP
1 parent e339453 commit 049cdd1

File tree: 4 files changed (+277, -2 lines)

guides/for_developers/fd_debugging.md

Lines changed: 132 additions & 0 deletions
# WebRTC debugging

**It is also worth taking a look at [debugging](../advanced/debugging.md).**

In most cases, when **something** does not work, we try to find the problem according to the following workflow:

1. Check whether the session has been negotiated successfully.
1. Check whether the connection (ICE and DTLS) has been established.
1. Check whether RTP packets are demuxed, frames assembled and decoded.
1. Check QoE statistics: freezes, jitter, packet loss, bitrate, fps.

```mermaid
flowchart TD
    S["Types of problems in WebRTC"] --> Session["Session negotiation (number of tracks, codecs)"]
    S --> Connection["Connection establishment (ICE and DTLS)"]
    S --> Playout["Playout (demuxing, packetization, decoding)"]
    S --> QoE["QoE (freezes, low quality, low fps)"]
```

## Session Negotiation

Here, we just validate that the SDP offer/answer looks the way it should.
In particular:

1. Check the number of audio and video m-lines.
1. Check if any m-lines are rejected (either by port 0 in the m= line or by a=inactive).
   In most cases the port is set to 9 (which means automatic negotiation by ICE); if ICE is already in progress or this is a subsequent negotiation, it might be set to the port currently used by the ICE agent. Port 0 appears when someone stops a transceiver via [`stop()`](https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpTransceiver/stop).
1. Check m-line directions (a=sendrecv/sendonly/recvonly/inactive).
1. Check codecs, their profiles and payload types.
1. The number of m-lines cannot change between offer and answer.
   This means that if one side offers that it is only willing to receive a single audio track,
   all the other side can do is either confirm it will be sending or decline and say it won't be sending.
   If the other side also wants to send, additional negotiation has to be performed **in this case**.

The SDP offer/answer can easily be checked in chrome://webrtc-internals (in Chromium-based browsers) or in about:webrtc (in Firefox).

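
The m-line checks above can also be scripted against a raw SDP string. The sketch below is a hypothetical helper (not part of ex_webrtc or any browser API) that summarizes each m-line and flags rejected ones:

```typescript
type MLine = { media: string; port: number; direction: string; rejected: boolean };

// Parse an SDP blob into a summary of its m-lines.
function summarizeMLines(sdp: string): MLine[] {
  // Each media section starts at an "m=" line.
  const sections = sdp.split(/\r?\n(?=m=)/).filter((s) => s.startsWith("m="));
  return sections.map((section) => {
    const [mediaDesc] = section.split(/\r?\n/);
    // "m=audio 9 UDP/TLS/RTP/SAVPF 111" -> media "audio", port 9
    const [media, portStr] = mediaDesc.slice(2).split(" ");
    const port = parseInt(portStr, 10);
    // Direction defaults to sendrecv when no direction attribute is present.
    const dir = section.match(/^a=(sendrecv|sendonly|recvonly|inactive)\r?$/m);
    const direction = dir ? dir[1] : "sendrecv";
    return { media, port, direction, rejected: port === 0 || direction === "inactive" };
  });
}

const sdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=sendrecv",
  "m=video 0 UDP/TLS/RTP/SAVPF 96",
  "a=inactive",
].join("\r\n");

console.log(summarizeMLines(sdp));
// audio: active sendrecv; video: rejected (port 0 + a=inactive)
```
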
## Connection establishment

The WebRTC connection state (PeerConnection or PC state) is the sum of the ICE connection state and the DTLS state.
In particular, the PC is in state connected when both ICE and DTLS are in state connected.

The whole flow looks like this:

1. ICE searches for a pair of local and remote addresses that can be used to send and receive data.
1. Once a valid pair is found, ICE changes its state to connected and the DTLS handshake is started.
1. Once the DTLS handshake finishes, DTLS changes its state to connected and so does the whole PC.
1. In the meantime, ICE continues checking other pairs of local and remote addresses in case there is a better path.
   If there is, ICE seamlessly switches to it - the transmission is not stopped or interrupted.

More on ICE, its state changes, failures and restarts can be found in the section devoted to ICE.

In most cases, the DTLS handshake works correctly. Most problems are related to ICE, as it's a pretty complex protocol.

Debugging ICE:

1. Check the ICE candidates grid in chrome://webrtc-internals or about:webrtc.
1. Turn on debug logs in ex_ice or Chromium (via a command line argument). Firefox exposes all ICE logs in about:webrtc -> Connection Log.
   Every implementation (ex_ice, Chromium, Firefox) is very verbose.
   You can compare what's happening on both sides.
1. Try to filter out some of the local network interfaces and remove STUN/TURN servers to reduce the complexity of the ICE candidate grid, the amount of logs and the number of connectivity checks.
   In ex_webrtc, this is possible via [configuration options](https://hexdocs.pm/ex_webrtc/0.14.0/ExWebRTC.PeerConnection.Configuration.html#t:options/0).
1. Use Wireshark.
   Use filters on src/dst IP/port and on the UDP and STUN protocols.
   This way you can analyze the whole STUN/ICE/TURN traffic between a single local and remote address.

Debugging DTLS:

This is really rare.
We used Wireshark or turned on [debug logs in ex_dtls](https://hexdocs.pm/ex_dtls/0.17.0/readme.html#debugging).

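
The "PC state is the sum of ICE and DTLS" rule can be expressed as a tiny state combinator. This is a simplified mental model, not the full connectionState aggregation algorithm from the WebRTC spec:

```typescript
type ConnState = "new" | "connecting" | "connected" | "failed" | "closed";

// Simplified aggregation: the PC is connected only when both ICE and DTLS are.
// The real spec algorithm also handles "disconnected" and per-transport
// edge cases; this only illustrates the happy path described above.
function pcState(ice: ConnState, dtls: ConnState): ConnState {
  if (ice === "failed" || dtls === "failed") return "failed";
  if (ice === "closed" || dtls === "closed") return "closed";
  if (ice === "connected" && dtls === "connected") return "connected";
  if (ice === "new" && dtls === "new") return "new";
  return "connecting";
}

console.log(pcState("connected", "connecting")); // "connecting" - DTLS handshake still in progress
console.log(pcState("connected", "connected")); // "connected"
```
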
## Playout

If both session negotiation and connection establishment went well and you can observe packets flowing, but nothing is visible in the web browser, the problem might be in RTP packet demuxing, frame assembly or frame decoding on the client side.

1. We heavily rely on chrome://webrtc-internals here.
1. Check counters: packetsReceived, framesReceived, framesDecoded, framesDropped.
1. E.g. if packetsReceived increases but framesReceived does not, it means that there is a problem in assembling video frames from RTP packets. This can happen when:
   1. the web browser is not able to correctly demux incoming RTP streams, possibly because the sender uses an incorrect payload type in RTP packets (different than the one announced in the SDP) or does not include the MID in RTP headers.
      Keep in mind that the MID MAY be sent only at the beginning of the transmission to save bandwidth.
      This is enough to create a mapping between SSRC and MID on the receiver side.
   1. the marker bit in the RTP header is incorrectly set by the sender (although this is codec dependent; in the case of video, the marker bit is typically set when an RTP packet contains the end of a video frame)
   1. media is incorrectly packed into the RTP packet payload because of bugs in the RTP payloader
1. E.g. if packetsReceived and framesReceived increase but framesDecoded does not, it probably means errors in the decoding process.
   In this case, framesDropped will probably also increase.
1. framesDropped may also increase when frames are assembled too late, i.e. their playout time has passed.
1. Check browser logs.
   Some of the errors (e.g. decoder errors) might be logged.

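
The counter reasoning above can be turned into a small heuristic that compares two getStats() snapshots of an inbound video stream. The thresholds and messages are illustrative, not an official diagnostic:

```typescript
type InboundVideoCounters = {
  packetsReceived: number;
  framesReceived: number;
  framesDecoded: number;
};

// Guess where the playout pipeline stalls by looking at which cumulative
// counters advanced between two snapshots.
function diagnose(prev: InboundVideoCounters, curr: InboundVideoCounters): string {
  const packets = curr.packetsReceived - prev.packetsReceived;
  const frames = curr.framesReceived - prev.framesReceived;
  const decoded = curr.framesDecoded - prev.framesDecoded;
  if (packets === 0) return "no RTP packets flowing - check the connection";
  if (frames === 0) return "frame assembly problem - check payload types, MID, marker bit, payloader";
  if (decoded === 0) return "decoding problem - check browser logs for decoder errors";
  return "playout pipeline looks healthy";
}

const prev = { packetsReceived: 1000, framesReceived: 300, framesDecoded: 300 };
const curr = { packetsReceived: 2000, framesReceived: 300, framesDecoded: 300 };
console.log(diagnose(prev, curr)); // packets flow but no frames -> assembly problem
```
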
## QoE

The hardest thing to debug.
Mostly because it very often depends on a lot of factors (network conditions, hardware, sender capabilities, mobile devices).
Problems with QoE are hard to reproduce and very often don't occur in a local/office environment.

1. We heavily rely on chrome://webrtc-internals here.
1. Check counters: nackCount, retransmittedPacketsSent, packetsLost.
   Retransmissions (RTX) are a must-have.
   Without RTX, even 1% of packet loss will have a very big impact on QoE.
1. Check the incoming/outgoing bitrate and its stability.
1. Check jitterBufferDelay/jitterBufferEmittedCount_in_ms - this is the average time each video frame spends in the jitter buffer before being emitted for playout.
1. The jitter buffer is adjusted dynamically.

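
That average can be computed from two consecutive getStats() snapshots; per the stats spec, jitterBufferDelay accumulates in seconds and jitterBufferEmittedCount in frames:

```typescript
type JitterBufferCounters = {
  jitterBufferDelay: number; // cumulative, in seconds
  jitterBufferEmittedCount: number; // cumulative, in frames
};

// Average time (ms) a frame spent in the jitter buffer between two snapshots.
// This is the value webrtc-internals plots as
// jitterBufferDelay/jitterBufferEmittedCount_in_ms.
function avgJitterBufferDelayMs(prev: JitterBufferCounters, curr: JitterBufferCounters): number {
  const delay = curr.jitterBufferDelay - prev.jitterBufferDelay;
  const emitted = curr.jitterBufferEmittedCount - prev.jitterBufferEmittedCount;
  return emitted === 0 ? 0 : (delay / emitted) * 1000;
}

console.log(avgJitterBufferDelayMs(
  { jitterBufferDelay: 10.0, jitterBufferEmittedCount: 500 },
  { jitterBufferDelay: 12.5, jitterBufferEmittedCount: 550 },
)); // 50 - each frame waited 50 ms on average
```
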
## Debugging in production

1. Dump WebRTC stats via getStats() into a DB for later analysis.
1. getStats() can still be called after the PC has failed or has been closed.
1. Continuous storage of WebRTC stats as time series might be challenging.
   We don't have a lot of experience doing it.
1. Come up with custom metrics that will allow you to observe the scale of a given problem or monitor how something changes over time.
   1. E.g. if you feel like you very often encounter ICE failures, count them and compare them to successful workflows or to the number of complete and successful SDP offer/answer exchanges.
      This way you will see the scale of the problem, and you can observe how it changes over time, after introducing fixes or new features.
   1. It's important to look at numbers instead of specific cases, as there will always be someone who needs to refresh the page, restart the connection, etc.
      What matters is the ratio of such problems and how it changes over time.
   1. E.g. this is a quote from Sean DuBois, working on WebRTC at OpenAI:

      > We have metrics of how many people post an offer compared to how many people get to connected [state]. It’s never alarmed on a lot of users.

      Watch the full interview [here](https://www.youtube.com/watch?v=HVsvNGV_gg8) and read the blog [here](https://webrtchacks.com/openai-webrtc-qa-with-sean-dubois/#h).
1. Collect user feedback (on a scale of 1-3/1-5, via emoji) and observe how it changes.

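
A minimal sketch of such a custom metric, assuming we record one outcome per ICE session (all names here are illustrative, not an existing API):

```typescript
// Track ICE outcomes and report the failure ratio, so we can watch the
// *ratio* over time rather than chase individual incidents.
class IceOutcomeMetric {
  private failed = 0;
  private connected = 0;

  record(outcome: "failed" | "connected"): void {
    if (outcome === "failed") this.failed++;
    else this.connected++;
  }

  failureRatio(): number {
    const total = this.failed + this.connected;
    return total === 0 ? 0 : this.failed / total;
  }
}

const metric = new IceOutcomeMetric();
metric.record("connected");
metric.record("connected");
metric.record("connected");
metric.record("failed");
console.log(metric.failureRatio()); // 0.25
```
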
## MOS

Initially, MOS (Mean Opinion Score) was simply asking people for their feedback on a scale from 1 to 5 and then computing the average.
Right now, we have algorithms that aim to calculate audio/video quality on the same scale but using WebRTC stats: jitter, bitrate, packet loss, resolution, codecs, freezes, etc.
An example can be found here: https://github.com/livekit/rtcscore-go

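
For illustration only, here is a deliberately naive stats-based score in the spirit of such algorithms; the weights are made up and this is not the rtcscore formula:

```typescript
// Toy MOS-style estimate on a 1-5 scale from two stats inputs.
// Real estimators also model codec, resolution, freezes, etc.
function toyMos(packetLossPct: number, jitterMs: number): number {
  const score = 5 - packetLossPct * 0.5 - jitterMs * 0.02; // made-up weights
  return Math.min(5, Math.max(1, score)); // clamp to the MOS range
}

console.log(toyMos(0, 5)); // ~4.9 - near-perfect network
console.log(toyMos(10, 100)); // 1 - heavy loss and jitter, clamped at the bottom
```
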
## chrome://webrtc-internals

1. Based on the [getStats()](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/getStats) API.
1. getStats() does not return derivatives.
   They depend on the frequency of calls to getStats() and have to be calculated by the user.
1. chrome://webrtc-internals can be dumped and then analyzed using: https://fippo.github.io/webrtc-dump-importer/

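
For example, a derivative such as the incoming bitrate has to be computed from two snapshots of the cumulative bytesReceived counter:

```typescript
type Snapshot = { timestampMs: number; bytesReceived: number };

// Incoming bitrate in kbps between two getStats() snapshots.
// getStats() only reports the cumulative byte counter; the rate is ours to derive.
function bitrateKbps(prev: Snapshot, curr: Snapshot): number {
  const bytes = curr.bytesReceived - prev.bytesReceived;
  const seconds = (curr.timestampMs - prev.timestampMs) / 1000;
  return seconds <= 0 ? 0 : (bytes * 8) / seconds / 1000;
}

console.log(bitrateKbps(
  { timestampMs: 0, bytesReceived: 0 },
  { timestampMs: 2000, bytesReceived: 250_000 },
)); // 1000 - i.e. 1 Mbps over the 2 s window
```
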
guides/for_developers/fd_encryption.md

Lines changed: 139 additions & 0 deletions
# WebRTC encryption

In WebRTC, there are two types of data:

* Media
* Arbitrary Data

## Media

Media is audio or video.
It's sent using RTP - a protocol that adds timestamps, sequence numbers and other information
that UDP lacks but that is needed for retransmissions (RTX), correct audio/video demultiplexing, sync and playout, and so on.

## Arbitrary data

By arbitrary data we mean anything that is not audio or video.
This can be chat messages, signalling in game dev, files, etc.
Arbitrary data is sent using SCTP - a transport protocol like UDP/TCP but with a lot of custom features.
In the context of WebRTC, two of them are the most important: reliability and transmission order.
They are configurable, and depending on the use case, we can send data reliably/unreliably and in order/unordered.
SCTP has never been widely adopted in the industry.
A lot of network devices don't support SCTP datagrams and are optimized for TCP traffic.
Hence, in WebRTC, SCTP is encapsulated into DTLS and then into UDP.
Users do not interact with SCTP directly; instead, they use an abstraction layer built on top of it called Data Channels.
Data Channels do not add additional fields/headers to the SCTP payload.

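
Reliability and ordering are configured per Data Channel through the standard `RTCDataChannelInit` options of `createDataChannel()`; the type alias below is local so the sketch runs outside a browser:

```typescript
// Local stand-in for the W3C RTCDataChannelInit dictionary.
type DataChannelInit = {
  ordered?: boolean; // false -> messages may be delivered out of order
  maxRetransmits?: number; // 0 -> no retransmissions (unreliable, UDP-like)
  maxPacketLifeTime?: number; // ms budget for retransmissions (alternative to a count)
};

// Unreliable and unordered - e.g. game-state updates where only the
// freshest message matters.
const gameState: DataChannelInit = { ordered: false, maxRetransmits: 0 };

// Reliable and ordered (the default) - e.g. chat messages or file transfer.
const chat: DataChannelInit = { ordered: true };

// In a browser: pc.createDataChannel("game-state", gameState)
console.log(gameState, chat);
```

Note that `maxRetransmits` and `maxPacketLifeTime` are mutually exclusive ways of bounding retransmissions.
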
## Encryption

```mermaid
flowchart TD
    subgraph Media - optional
    M(["Media"]) --> R["RTP/RTCP"]
    end
    subgraph ArbitraryData - optional
    A["Arbitrary Data"] --> SCTP["SCTP"]
    end
    R --> S["SRTP/SRTCP"]
    D["DTLS"] -- keying material --> S
    I["ICE"] --> U["UDP"]
    SCTP --> D
    S --> I
    D --> I
```

1. Media is encapsulated into RTP packets but not into DTLS datagrams.
1. In the context of media, DTLS is only used to obtain the keying material that is used to create the SRTP/SRTCP context.
1. RTP packet **payloads** are encrypted using SRTP.
1. RTP headers are not encrypted - we can see and analyze them in Wireshark without configuring encryption keys.
1. DTLS datagrams, among other fields, contain a 16-bit sequence number in their headers.

## E2E Flow

1. Establish an ICE connection.
2. Perform the DTLS handshake.
3. Create the SRTP/SRTCP context using keying material obtained from the DTLS context.
4. Encapsulate media into RTP, encrypt it using SRTP, and send it using ICE (UDP).
5. Encapsulate arbitrary data into SCTP, encrypt it using DTLS, and send it using ICE (UDP).

Points 1 and 2 are mandatory, no matter whether we send media, arbitrary data or both.
WebRTC communication is **ALWAYS** encrypted.

## TLS/DTLS handshake

See:

* https://tls12.xargs.org/
* https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange#
* https://webrtcforthecurious.com/docs/04-securing/
* https://www.ibm.com/docs/en/cloud-paks/z-modernization-stack/2023.4?topic=handshake-tls-12-protocol

1. TLS uses asymmetric cryptography, but depending on the TLS version and cipher suites, it is used for different purposes.
1. In TLS-RSA, we use the server's public key from the server's cert to encrypt the pre-master secret and send it from the client to the server.
   Then, both sides use the client random, server random, and pre-master secret to create the master secret.
1. In DH-TLS, the server's public key from the server's cert is not used to encrypt anything.
   Instead, both sides generate priv/pub key pairs and exchange pub keys with each other.
   The pub key is based on the priv key, and both of them are generated per connection.
   They are not related to e.g. the server's pub key that's included in the server's cert.
   All params are sent unencrypted.
1. Regardless of the TLS version, the server's cert is used to ensure the server's identity.
   This cert is signed by a Certificate Authority (CA).
   The CA computes a hash of the certificate and encrypts it using the CA's private key.
   The result is known as the digest and is included in the server's cert.
   The client takes the cert digest and verifies it using the CA's public key.
1. In a standard TLS handshake, the server MUST send its certificate to the client, but
   the client only sends its certificate when explicitly requested by the server.
1. In DTLS-SRTP in WebRTC, both sides MUST send their certificates.
1. In DTLS-SRTP in WebRTC, both sides generate self-signed certificates.
   Alternatively, certs can be configured when creating a peer connection: https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/RTCPeerConnection#certificates
1. Fingerprints of these certs are included in the SDP offer/answer and are checked once the DTLS-SRTP handshake is completed, i.e.
   we take the fingerprint from the SDP (which is assumed to be received via a secured channel) and check it against the fingerprint
   of the cert received during the DTLS-SRTP handshake.
1. The result of the DTLS-SRTP handshake is the master secret, which is then used to create so-called keying material.

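
The per-connection Diffie-Hellman exchange described above can be illustrated with toy numbers; real handshakes use large primes or elliptic curves, so this is NOT secure, only a demonstration that both sides reach the same secret without ever sending it:

```typescript
// Toy Diffie-Hellman with tiny public parameters.
const p = 23; // public prime modulus
const g = 5; // public generator

// Modular exponentiation by repeated squaring (values stay tiny here,
// so plain number arithmetic is safe).
function modpow(base: number, exp: number, mod: number): number {
  let result = 1;
  base %= mod;
  while (exp > 0) {
    if (exp % 2 === 1) result = (result * base) % mod;
    base = (base * base) % mod;
    exp = Math.floor(exp / 2);
  }
  return result;
}

const clientPriv = 6; // generated per connection, never transmitted
const serverPriv = 15;
const clientPub = modpow(g, clientPriv, p); // exchanged in the clear
const serverPub = modpow(g, serverPriv, p);

// Both sides derive the same shared secret from their own private key
// and the peer's public key.
const clientShared = modpow(serverPub, clientPriv, p);
const serverShared = modpow(clientPub, serverPriv, p);
console.log(clientShared === serverShared); // true
```
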
## Keying material

See:

* https://datatracker.ietf.org/doc/html/rfc5705#section-4
* https://datatracker.ietf.org/doc/html/rfc5764#section-4.2

Keying material is used to create SRTP encryption keys and is derived from the master secret established during the DTLS-SRTP handshake.

```
keying_material = PRF(master_secret, client_random + server_random + context_value_length + context_value, label)
```

* PRF is defined by TLS/DTLS
* context_value and context_value_length are optional and are not used in WebRTC
* label is used to allow a single master secret to be used for many different purposes.
  This is because the PRF gives the same output for the same input.
  Using exactly the same keying material in different contexts would be insecure.
  In WebRTC, this is the string "EXTRACTOR-dtls_srtp"
* the length of the keying material is configurable and depends on the SRTP profile
* keying material is divided into four parts as shown below:

```mermaid
flowchart TD
    K["KeyingMaterial"] --> CM["ClientMasterKey"]
    K --> SM["ServerMasterKey"]
    K --> CS["ClientMasterSalt"]
    K --> SS["ServerMasterSalt"]
```

These are then fed into the SRTP KDF (key derivation function), which is another PRF (dependent on the SRTP protection profile) that produces the actual encryption keys.
The client uses ClientMasterKey and ClientMasterSalt, while the server uses ServerMasterKey and ServerMasterSalt.
By client and server we mean DTLS roles, i.e. the client is the side that initiates the DTLS handshake.

### Protection profiles

Some of the protection profiles:

* AES128_CM_SHA1_80
* AES128_CM_SHA1_32
* AEAD_AES_128_GCM
* AEAD_AES_256_GCM

Meaning:

* AES128_CM - encryption algorithm (AES in counter mode) with a 128-bit key
* SHA1_80 - auth function creating an 80-bit message authentication code (MAC)
* AEAD_AES_128_GCM - AES in Galois/Counter Mode, which both encrypts and authenticates

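
For example, AES128_CM_SHA1_80 uses 16-byte keys and 14-byte salts, so DTLS exports 60 bytes of keying material, split into the four parts in the order given in RFC 5764, section 4.2 (client key, server key, client salt, server salt). A sketch of that split:

```typescript
// Split exported keying material into the four SRTP parts.
// Default lengths are for AES128_CM_SHA1_80 (16-byte keys, 14-byte salts).
function splitKeyingMaterial(material: Uint8Array, keyLen = 16, saltLen = 14) {
  const total = 2 * keyLen + 2 * saltLen;
  if (material.length !== total) throw new Error(`expected ${total} bytes`);
  let off = 0;
  const take = (n: number) => material.slice(off, (off += n));
  return {
    clientMasterKey: take(keyLen),
    serverMasterKey: take(keyLen),
    clientMasterSalt: take(saltLen),
    serverMasterSalt: take(saltLen),
  };
}

// Stand-in for the 60 bytes a DTLS stack would export for this profile.
const material = new Uint8Array(60).map((_, i) => i);
const parts = splitKeyingMaterial(material);
console.log(parts.serverMasterKey[0]); // 16 - server key starts right after the client key
```
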

guides/for_developers/fd_sdp.md

Whitespace-only changes.

mix.exs

Lines changed: 6 additions & 2 deletions

```diff
@@ -85,14 +85,17 @@ defmodule ExWebRTC.MixProject do
 
     deploying_guides = ["bare", "fly"]
 
+    for_developers_guides = ["fd_encryption", "fd_debugging"]
+
     [
       main: "readme",
       logo: "logo.svg",
       extras:
         ["README.md"] ++
           Enum.map(intro_guides, &"guides/introduction/#{&1}.md") ++
           Enum.map(advanced_guides, &"guides/advanced/#{&1}.md") ++
-          Enum.map(deploying_guides, &"guides/deploying/#{&1}.md"),
+          Enum.map(deploying_guides, &"guides/deploying/#{&1}.md") ++
+          Enum.map(for_developers_guides, &"guides/for_developers/#{&1}.md"),
       assets: %{"guides/assets" => "assets"},
       source_ref: "v#{@version}",
       formatters: ["html"],
@@ -101,7 +104,8 @@ defmodule ExWebRTC.MixProject do
       groups_for_extras: [
         Introduction: Path.wildcard("guides/introduction/*.md"),
         Advanced: Path.wildcard("guides/advanced/*.md"),
-        Deploying: Path.wildcard("guides/deploying/*.md")
+        Deploying: Path.wildcard("guides/deploying/*.md"),
+        "For Developers": Path.wildcard("guides/for_developers/*.md")
       ],
       groups_for_modules: [
         MEDIA: ~r"ExWebRTC\.Media\..*",
```