# WebRTC debugging

**It is also worth taking a look at [debugging](../advanced/debugging.md)**

In most cases, when **something** does not work, we try to find the problem using the following workflow:
1. Check whether the session has been negotiated successfully.
1. Check whether the connection (ICE and DTLS) has been established.
1. Check whether RTP packets are demuxed and whether frames are assembled and decoded.
1. Check QoE statistics: freezes, jitter, packet loss, bitrate, fps.

```mermaid
flowchart TD
    S["Types of problems in WebRTC"] --> Session["Session negotiation (number of tracks, codecs)"]
    S --> Connection["Connection establishment (ICE and DTLS)"]
    S --> Playout["Playout (demuxing, depacketization, decoding)"]
    S --> QoE["QoE (freezes, low quality, low fps)"]
```

## Session Negotiation

Here, we just validate that the SDP offer/answer looks the way it should.
In particular:
1. Check the number of audio and video mlines.
1. Check whether any mlines are rejected (either by port 0 in the `m=` line or by `a=inactive`).
In most cases, the port is set to 9 (a placeholder meaning that the actual address is negotiated by ICE) or, if ICE is already in progress or this is a subsequent negotiation, to the port currently used by the ICE agent. Port 0 appears when someone stops a transceiver via [`stop()`](https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpTransceiver/stop).
1. Check mline directions (`a=sendrecv`/`sendonly`/`recvonly`/`inactive`).
1. Check codecs, their profiles and payload types.
1. Check that the offer and the answer have the same number of mlines: the answer cannot add or remove mlines.
This means that if one side offers to only receive a single audio track,
all the other side can do is either confirm that it will send it or decline and say it won't send it.
If the other side wants anything more (e.g. to send a second track or to also receive audio), **additional negotiation** has to be performed.

The SDP offer/answer can be easily checked in `chrome://webrtc-internals` (Chromium-based browsers) or `about:webrtc` (Firefox).
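
Some of these checks are easy to automate when you have both SDPs at hand (e.g. copied from `chrome://webrtc-internals`). Below is a minimal TypeScript sketch. `parseMLines` and `compareOfferAnswer` are hypothetical helpers, not part of any library. The sketch assumes well-formed SDP and ignores session-level attributes for simplicity.

```typescript
// Hypothetical helpers for the SDP checks above; not part of any library.
// Session-level attributes (before the first m= line) are ignored for simplicity.

interface MLine {
  kind: string; // "audio", "video" or "application"
  port: number; // 0 means the mline was rejected or its transceiver stopped
  direction: string; // sendrecv | sendonly | recvonly | inactive
  mid?: string;
  codecs: string[]; // raw a=rtpmap values, e.g. "111 opus/48000/2"
}

function parseMLines(sdp: string): MLine[] {
  const mlines: MLine[] = [];
  // SDP uses CRLF line endings, but bare LF is tolerated here as well.
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt list>
      const [kind, port] = line.slice(2).split(" ");
      // sendrecv is the default when no direction attribute is present.
      mlines.push({ kind, port: Number(port), direction: "sendrecv", codecs: [] });
      continue;
    }
    const current = mlines[mlines.length - 1];
    if (current === undefined) continue;
    const dir = /^a=(sendrecv|sendonly|recvonly|inactive)$/.exec(line);
    if (dir) current.direction = dir[1];
    else if (line.startsWith("a=mid:")) current.mid = line.slice("a=mid:".length);
    else if (line.startsWith("a=rtpmap:")) current.codecs.push(line.slice("a=rtpmap:".length));
  }
  return mlines;
}

function compareOfferAnswer(offerSdp: string, answerSdp: string): string[] {
  const offer = parseMLines(offerSdp);
  const answer = parseMLines(answerSdp);
  const problems: string[] = [];
  // The answer must contain exactly the same number of mlines as the offer.
  if (offer.length !== answer.length) {
    problems.push(`mline count differs: offer=${offer.length}, answer=${answer.length}`);
  }
  answer.forEach((m, i) => {
    if (m.port === 0) problems.push(`mline ${i} (${m.kind}) rejected: port 0`);
    if (m.direction === "inactive") problems.push(`mline ${i} (${m.kind}) is inactive`);
  });
  return problems;
}
```

An empty result only means that these structural checks passed. Codec profiles and payload types still need to be compared by hand, e.g. by looking at the `codecs` field of each parsed mline.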

## Connection establishment

The WebRTC connection state (PeerConnection or PC state) is an aggregate of the ICE connection state and the DTLS state.
In particular, the PC is in the `connected` state when both ICE and DTLS are connected.

The whole flow looks like this:
1. ICE searches for a pair of local and remote addresses that can be used to send and receive data.
1. Once a valid pair is found, ICE changes its state to `connected` and the DTLS handshake starts.
1. Once the DTLS handshake finishes, DTLS changes its state to `connected`, and so does the whole PC.
1. In the meantime, ICE keeps checking other pairs of local and remote addresses in case there is a better path.
If there is, ICE seamlessly switches to it: the transmission is not stopped or interrupted.
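
To see where this flow stops, it helps to log state transitions with timestamps. Below is a minimal sketch, assuming a browser `RTCPeerConnection`. `watchConnectionStates` and `PeerConnectionLike` are hypothetical names. The interface declares only what the sketch uses, so a test double can stand in for a real PC.

```typescript
// Only the members used below; a real RTCPeerConnection satisfies this shape.
interface PeerConnectionLike {
  readonly iceConnectionState: string;
  readonly connectionState: string;
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical helper: logs every ICE and PC state change with the time
// elapsed since the watcher was attached.
function watchConnectionStates(
  pc: PeerConnectionLike,
  log: (msg: string) => void = console.log,
): void {
  const start = Date.now();
  const report = (what: string, state: string) =>
    log(`+${Date.now() - start}ms ${what}: ${state}`);
  pc.addEventListener("iceconnectionstatechange", () =>
    report("ice", pc.iceConnectionState));
  pc.addEventListener("connectionstatechange", () =>
    report("pc", pc.connectionState));
}
```

Call `watchConnectionStates(pc)` right after creating the PC. If ICE reaches `connected` but the PC stays in `connecting`, the DTLS handshake is the next suspect. Its state alone is exposed as `RTCDtlsTransport.state`, e.g. `pc.getSenders()[0].transport?.state`.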

More on ICE, its state changes, failures and restarts can be found in the section devoted to ICE.

In most cases, the DTLS handshake works correctly. Most problems are related to ICE, as it's a pretty complex protocol.

Debugging ICE:

1. Check the ICE candidate grid in `chrome://webrtc-internals` or `about:webrtc`.
1. Turn on debug logs in ex_ice or Chromium (via command line arguments, e.g. `--enable-logging=stderr --v=1`). Firefox exposes all ICE logs in `about:webrtc` -> Connection Log.
Logs of every implementation (ex_ice, Chromium, Firefox) are very verbose,
which lets you compare what's happening on both sides.
1. Try to filter out some of the local network interfaces and remove STUN/TURN servers to reduce the size of the ICE candidate grid, the amount of logs and the number of connectivity checks.
In ex_webrtc, this is possible via [configuration options](https://hexdocs.pm/ex_webrtc/0.14.0/ExWebRTC.PeerConnection.Configuration.html#t:options/0).
1. Use Wireshark.
Use filters on src/dst IP/port and on the UDP and STUN protocols.
This way you can analyze the whole STUN/ICE/TURN traffic between a single local and remote address.
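
Often the first question about the candidate grid is simply which pair ICE ended up using. Below is a minimal sketch that reads it from a `getStats()` report, assuming the standard stats fields (`selectedCandidatePairId`, `localCandidateId`, `remoteCandidateId`, `candidateType`, `protocol`, `address`, `port`, `dtlsState`). The Firefox fallback relies on a non-standard `selected` flag on candidate pairs, which is an assumption here. `selectedPath` is a hypothetical helper.

```typescript
// Minimal shape of an RTCStatsReport entry; real entries carry more fields.
interface Stat {
  id: string;
  type: string;
  timestamp: number;
  [field: string]: unknown;
}

interface SelectedPath {
  local: string; // e.g. "host udp 192.168.1.10:50000"
  remote: string;
  pairState: string;
  dtlsState?: string;
}

function describeCandidate(c: Stat | undefined): string {
  if (c === undefined) return "unknown candidate";
  return `${c.candidateType} ${c.protocol} ${c.address}:${c.port}`;
}

// Hypothetical helper: returns the candidate pair ICE is currently using, if any.
function selectedPath(report: ReadonlyMap<string, Stat>): SelectedPath | undefined {
  const stats = Array.from(report.values());
  const transport = stats.find((s) => s.type === "transport");
  // Spec/Chromium: the transport entry points at the selected pair.
  const pairId = transport?.selectedCandidatePairId;
  // Firefox fallback (assumption): the pair carries a non-standard `selected` flag.
  const pair =
    (typeof pairId === "string" ? report.get(pairId) : undefined) ??
    stats.find((s) => s.type === "candidate-pair" && s.selected === true);
  if (pair === undefined) return undefined; // no pair selected yet
  const dtlsState = transport?.dtlsState;
  return {
    local: describeCandidate(report.get(String(pair.localCandidateId))),
    remote: describeCandidate(report.get(String(pair.remoteCandidateId))),
    pairState: String(pair.state),
    dtlsState: typeof dtlsState === "string" ? dtlsState : undefined,
  };
}
```

Pass `await pc.getStats()` directly, since an `RTCStatsReport` is a read-only map. On the browser side, creating the PC with `new RTCPeerConnection({ iceServers: [] })` is a simple way to drop STUN/TURN candidates while reproducing an issue.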

Debugging DTLS:

DTLS problems are really rare.
When we hit them, we used Wireshark or turned on [debug logs in ex_dtls](https://hexdocs.pm/ex_dtls/0.17.0/readme.html#debugging).

## Playout

If both session negotiation and connection establishment went well and you can see that packets are flowing, but nothing is visible in the web browser, the problem might be in RTP packet demuxing, frame assembly or frame decoding on the client side.

1. We heavily rely on `chrome://webrtc-internals` here.
1. Check the counters: `packetsReceived`, `framesReceived`, `framesDecoded`, `framesDropped`.
1. E.g. if `packetsReceived` increases but `framesReceived` does not, there is a problem with assembling video frames from RTP packets. This can happen when:
   1. the web browser is not able to correctly demux incoming RTP streams, possibly because the sender uses an incorrect payload type in RTP packets (different from the one announced in the SDP) or does not include the MID in RTP headers.
   Keep in mind that the MID MAY be sent only at the beginning of the transmission to save bandwidth.
   This is enough to create a mapping between SSRC and MID on the receiver side.
   1. the marker bit in the RTP header is incorrectly set by the sender (although this depends on the codec, for video the marker bit is typically set when an RTP packet contains the end of a video frame)
   1. media is incorrectly packed into the RTP packet payload because of bugs in the RTP payloader
1. E.g. if `packetsReceived` and `framesReceived` increase but `framesDecoded` does not, it probably means errors in the decoding process.
In this case, `framesDropped` will probably also increase.
1. `framesDropped` may also increase when frames are assembled too late, i.e. their playout time has passed.
1. Check browser logs.
Some errors (e.g. decoder errors) might be logged.
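
The checks above are about how counters change over time, not their absolute values. Below is a minimal sketch that encodes the listed heuristics. It assumes two snapshots of the same video `inbound-rtp` stats entry taken a few seconds apart. `diagnosePlayout` and its messages are hypothetical.

```typescript
// Subset of RTCInboundRtpStreamStats (video) used by the checklist above.
interface InboundVideo {
  packetsReceived: number;
  framesReceived: number;
  framesDecoded: number;
  framesDropped: number;
}

// Hypothetical helper: counters are cumulative, so only their increase
// between two snapshots tells where playout gets stuck.
function diagnosePlayout(prev: InboundVideo, curr: InboundVideo): string {
  const packets = curr.packetsReceived - prev.packetsReceived;
  const received = curr.framesReceived - prev.framesReceived;
  const decoded = curr.framesDecoded - prev.framesDecoded;
  const dropped = curr.framesDropped - prev.framesDropped;

  if (packets === 0) return "no packets received: check session negotiation and connection";
  if (received === 0) return "packets but no frames: check demuxing (payload type, MID), marker bit, payloader";
  if (decoded === 0) return "frames but nothing decoded: look for decoder errors in browser logs";
  if (dropped > 0) return `${dropped} frames dropped: late frame assembly or decoding errors`;
  return "playout looks healthy";
}
```

Both snapshots should come from the `inbound-rtp` entry with `kind === "video"` and the same `id`, because `framesReceived` and `framesDropped` are only reported for video.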

## QoE

This is the hardest thing to debug,
mostly because it very often depends on many factors (network conditions, hardware, sender capabilities, mobile devices).
QoE problems are hard to reproduce and very often don't occur in a local/office environment.

1. We heavily rely on `chrome://webrtc-internals` here.
1. Check the counters: `nackCount`, `retransmittedPacketsSent`, `packetsLost`.
Retransmissions (RTX) are a must-have.
Without RTX, even 1% packet loss has a very big impact on QoE.
1. Check the incoming/outgoing bitrate and its stability.
1. Check `jitterBufferDelay/jitterBufferEmittedCount_in_ms`: this is the average time each video frame spends in the jitter buffer before being emitted for playout (`jitterBufferDelay` is a cumulative value in seconds).
1. The jitter buffer is adjusted dynamically: its delay grows when network jitter increases and shrinks when the network stabilizes, so this value changes over time.
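
These numbers are most telling per time window, because the raw counters are cumulative since the stream started. Below is a minimal sketch, assuming two snapshots of the same `inbound-rtp` entry. `qoeWindow` is a hypothetical helper. The loss formula (lost / (received + lost)) is one common convention, not a value `getStats()` reports directly.

```typescript
// Subset of RTCInboundRtpStreamStats used here; all fields are cumulative.
interface InboundSnapshot {
  packetsReceived: number;
  packetsLost: number;
  nackCount: number;
  jitterBufferDelay: number; // seconds, summed over all emitted frames/samples
  jitterBufferEmittedCount: number;
}

interface QoeWindow {
  lossPercent: number;
  nacksSent: number;
  avgJitterBufferDelayMs: number;
}

// Hypothetical helper: QoE numbers for the interval between two snapshots.
function qoeWindow(prev: InboundSnapshot, curr: InboundSnapshot): QoeWindow {
  const received = curr.packetsReceived - prev.packetsReceived;
  // packetsLost can go down when late or duplicate packets arrive, so clamp at 0.
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  const emitted = curr.jitterBufferEmittedCount - prev.jitterBufferEmittedCount;
  const delaySeconds = curr.jitterBufferDelay - prev.jitterBufferDelay;
  return {
    lossPercent: received + lost > 0 ? (100 * lost) / (received + lost) : 0,
    // On the receiving side, nackCount counts NACKs sent by this receiver.
    nacksSent: curr.nackCount - prev.nackCount,
    // Average jitter buffer delay per emitted frame/sample in this interval.
    avgJitterBufferDelayMs: emitted > 0 ? (1000 * delaySeconds) / emitted : 0,
  };
}
```

On the sending side, the matching counters (`nackCount` for NACKs received and `retransmittedPacketsSent`) live in the `outbound-rtp` entry.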

## Debugging in production

1. Dump WebRTC stats obtained via `getStats()` into a database for later analysis.
1. `getStats()` can still be called after the PC has failed or has been closed.
1. Continuously storing WebRTC stats as time series might be challenging.
We don't have a lot of experience doing it.
1. Come up with custom metrics that allow you to observe the scale of a given problem or monitor how something changes over time.
1. E.g. if you feel like you very often encounter ICE failures, count them and compare them to successful workflows or to the number of complete and successful SDP offer/answer exchanges.
This way you will see the scale of the problem and can observe how it changes over time, e.g. after introducing fixes or new features.
1. It's important to look at numbers instead of specific cases, as there will always be someone who needs to refresh the page, restart the connection, etc.
What matters is the ratio of such problems and how it changes over time.
1. E.g. this is a quote from Sean DuBois, who works on WebRTC at OpenAI:
   > We have metrics of how many people post an offer compared to how many people get to connected [state]. It’s never alarmed on a lot of users.

   Watch the full interview [here](https://www.youtube.com/watch?v=HVsvNGV_gg8) and read the blog post [here](https://webrtchacks.com/openai-webrtc-qa-with-sean-dubois/#h).
1. Collect user feedback (on a 1-3 or 1-5 scale, or via emoji) and observe how it changes.
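
Dumping stats into a database mostly means flattening the report into rows. Below is a minimal sketch, assuming you attach your own session identifier and keep only some stats types. `toRows`, `StatsRow` and this row layout are hypothetical, not a recommended schema.

```typescript
// Minimal shape of an RTCStatsReport entry; real entries carry more fields.
interface Stat {
  id: string;
  type: string;
  timestamp: number;
  [field: string]: unknown;
}

// One row per stats entry, e.g. for a JSON column or a time-series store.
interface StatsRow {
  sessionId: string;
  statId: string;
  type: string;
  timestamp: number;
  values: Record<string, number | string | boolean>;
}

// Hypothetical helper: keeps only the requested stats types and their scalar fields.
function toRows(
  sessionId: string,
  report: ReadonlyMap<string, Stat>,
  keepTypes: ReadonlySet<string>,
): StatsRow[] {
  const rows: StatsRow[] = [];
  report.forEach((stat) => {
    if (!keepTypes.has(stat.type)) return;
    const values: Record<string, number | string | boolean> = {};
    for (const [key, value] of Object.entries(stat)) {
      if (key === "id" || key === "type" || key === "timestamp") continue;
      // Nested objects/arrays are skipped to keep rows flat.
      if (typeof value === "number" || typeof value === "string" || typeof value === "boolean") {
        values[key] = value;
      }
    }
    rows.push({ sessionId, statId: stat.id, type: stat.type, timestamp: stat.timestamp, values });
  });
  return rows;
}
```

Call it periodically with `toRows(sessionId, await pc.getStats(), new Set(["inbound-rtp", "outbound-rtp", "candidate-pair"]))`. Call it once more when `connectionState` becomes `failed`, because `getStats()` still works at that point and the final counters are often the most interesting ones.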

## MOS

Initially, MOS (Mean Opinion Score) was obtained simply by asking people to rate the quality on a scale from 1 to 5 and then computing the average.
Today, there are algorithms that aim to estimate audio/video quality on the same scale using WebRTC stats: jitter, bitrate, packet loss, resolution, codecs, freezes, etc.
An example can be found here: https://github.com/livekit/rtcscore-go
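
Stats-based estimators typically compute some internal quality score and then map it onto the 1 to 5 MOS scale. As an illustration only (this is not the rtcscore-go algorithm), the ITU-T G.107 E-model for voice maps its rating factor R to MOS as shown below. Deriving R from delay, packet loss and codec is the hard part and is left out.

```typescript
// ITU-T G.107 (E-model) mapping from the transmission rating factor R to an
// estimated MOS for voice. Computing R itself is out of scope here.
function rToMos(r: number): number {
  if (r <= 0) return 1;
  if (r >= 100) return 4.5;
  return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6;
}
```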

## chrome://webrtc-internals

1. It is based on the [getStats()](https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/getStats) API.
1. `getStats()` does not return derivatives (e.g. bitrate or frames per second).
They depend on how often `getStats()` is called and have to be calculated by the user.
1. `chrome://webrtc-internals` can be dumped and then analyzed using https://fippo.github.io/webrtc-dump-importer/
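
A derivative such as bitrate or frames per second comes from two consecutive snapshots. Take the difference of a cumulative counter and divide it by the difference of the entries' `timestamp` fields, which are in milliseconds. Below is a minimal sketch; `Counter`, `ratePerSecond` and `bitrateBps` are hypothetical helpers.

```typescript
// A cumulative counter sampled at a given stats timestamp (milliseconds).
interface Counter {
  timestamp: number;
  value: number;
}

// Change of a cumulative counter per second between two getStats() calls.
function ratePerSecond(prev: Counter, curr: Counter): number {
  const dtSeconds = (curr.timestamp - prev.timestamp) / 1000;
  return dtSeconds > 0 ? (curr.value - prev.value) / dtSeconds : 0;
}

// Bitrate in bits per second from cumulative bytesReceived/bytesSent.
function bitrateBps(prev: Counter, curr: Counter): number {
  return 8 * ratePerSecond(prev, curr);
}
```

For example, for an `inbound-rtp` entry, pass `{ timestamp: s.timestamp, value: s.bytesReceived }` from two snapshots to get the incoming bitrate, and `framesDecoded` instead of `bytesReceived` to get the decoded fps. `bytesReceived` excludes RTP headers, which are counted separately in `headerBytesReceived`.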