WebSockets and WebRTC
WebSockets and WebRTC are standard protocols that are built into all modern web browsers and are commonly used in web, desktop, and mobile app development.
WebSockets are used to send and receive data continuously, at relatively low bitrates, between clients and servers. WebRTC is used to send audio and video data at high bitrates and is the foundation for most of today’s popular video calling apps, including Microsoft Teams, Google Meet, Facebook Messenger, Discord, and WhatsApp.
WebSockets and WebRTC are complementary. Most real-time audio and video applications use WebRTC for media transport and also use WebSockets to set up sessions and manage application state.
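To make this division of labor concrete, here is a minimal TypeScript sketch of the WebSocket side: one message channel that carries WebRTC session setup (SDP offers and answers, ICE candidates) alongside application state. WebRTC does not define a signaling format, so the message names and fields below, including the `participant-joined` event, are hypothetical. Only the candidate fields deliberately mirror the browser's `RTCIceCandidateInit` dictionary.

```typescript
// Hypothetical signaling messages. WebRTC leaves the signaling protocol to
// the application, so this JSON shape is an assumption, not a standard.
type SignalMessage =
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: string; sdpMid: string | null; sdpMLineIndex: number | null }
  | { type: "participant-joined"; participantId: string }; // application state

// Serialize a message before passing it to ws.send().
function encodeSignal(msg: SignalMessage): string {
  return JSON.stringify(msg);
}

// Parse the data from a WebSocket "message" event; return null if malformed.
function decodeSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, sdp, candidate, sdpMid, sdpMLineIndex, participantId } =
    data as Record<string, unknown>;
  switch (type) {
    case "offer":
      return typeof sdp === "string" ? { type: "offer", sdp } : null;
    case "answer":
      return typeof sdp === "string" ? { type: "answer", sdp } : null;
    case "candidate":
      if (typeof candidate !== "string") return null;
      return {
        type: "candidate",
        candidate,
        sdpMid: typeof sdpMid === "string" ? sdpMid : null,
        sdpMLineIndex: typeof sdpMLineIndex === "number" ? sdpMLineIndex : null,
      };
    case "participant-joined":
      return typeof participantId === "string"
        ? { type: "participant-joined", participantId }
        : null;
    default:
      return null;
  }
}
```

In a browser, the `sdp` string would come from `pc.createOffer()` or `pc.createAnswer()` on an `RTCPeerConnection`, and decoded messages would be handed to `pc.setRemoteDescription()` or `pc.addIceCandidate()`. The audio and video themselves never touch the WebSocket.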
Using WebSockets vs WebRTC for real-time video and audio
Because WebSockets are widely used and familiar, many developers writing their first real-time audio or video application try to use WebSockets to transport the media itself. This is generally not a good idea, for several reasons:
- The WebSocket protocol is a relatively heavyweight network layer built on top of TCP. TCP guarantees that every byte arrives, in order, so a single lost packet holds up everything sent after it until the retransmission arrives. These delivery guarantees get in the way of the selective resending and fast adjustments to changing network conditions that real-time media transport depends on. WebRTC media transport runs on top of UDP, which is simpler and much more effective for real-time media.
- Bandwidth management is built into WebRTC, and it is critical for reliable delivery of audio and video in real-world network conditions. A WebSocket client that falls behind sending or receiving data will fall farther and farther behind. WebRTC clients, on the other hand, can drop packets, adjust target bitrates gracefully, and catch back up.
- WebRTC includes media and network layer statistics that help debug application performance and are critical for observability when operating at scale.
- Echo cancellation is built into most WebRTC implementations. Echo cancellation is critical for any app that uses both a speaker and microphone simultaneously.
- WebSocket timeouts and the underlying TCP stack behaviors vary across platforms and were not designed for real-time media. This matters little for use cases where occasional latency blips of a few seconds are acceptable. But making real-time media work well in real-world network conditions requires a lot of time spent tuning and testing keepalive and reconnection logic. WebRTC handles all of this for you.
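The first point, about TCP's delivery guarantees, can be illustrated with a toy TypeScript model of head-of-line blocking: when delivery must be in order, one late packet holds back every packet behind it. The packet interval, network delay, and retransmission penalty are assumed numbers chosen for illustration, and the model ignores everything else TCP and UDP do.

```typescript
// Toy model of head-of-line blocking, not a real TCP or RTP stack.
// Assumed numbers: one packet every 20 ms, 50 ms one-way delay, and a
// resent packet that arrives 200 ms later than it should have.
const FRAME_MS = 20;
const ONE_WAY_MS = 50;
const RESEND_PENALTY_MS = 200;

// When each packet reaches the receiver, if packet `lostIndex` had to be resent.
function arrivalTimes(count: number, lostIndex: number): number[] {
  return Array.from({ length: count }, (_, i) =>
    i * FRAME_MS + ONE_WAY_MS + (i === lostIndex ? RESEND_PENALTY_MS : 0));
}

// When the application can read each packet.
// ordered = true models TCP (and therefore WebSockets): a packet is released
// only once it and every earlier packet have arrived.
// ordered = false models UDP: each packet is released as soon as it arrives.
function deliveryTimes(arrivals: number[], ordered: boolean): number[] {
  if (!ordered) return arrivals.slice();
  const released: number[] = [];
  let latest = 0;
  for (const t of arrivals) {
    latest = Math.max(latest, t);
    released.push(latest);
  }
  return released;
}

// How many packets reach the application later than on a loss-free network.
function latePackets(delivered: number[]): number {
  return delivered.filter((t, i) => t > i * FRAME_MS + ONE_WAY_MS).length;
}

const arrivals = arrivalTimes(10, 2);
console.log("ordered (TCP-like):", latePackets(deliveryTimes(arrivals, true)));    // 8
console.log("unordered (UDP-like):", latePackets(deliveryTimes(arrivals, false))); // 1
```

With ordered delivery, the one resent packet also delays the seven packets queued behind it. With unordered delivery only that packet is late, and a real-time receiver can conceal the gap and keep playing.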
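The bandwidth-management point can be sketched the same way. In this toy TypeScript model, the link's capacity drops from 1000 to 600 kbit/s mid-call. A reliable sender that keeps its bitrate and queues everything builds an ever-growing backlog, while an adaptive sender that lowers its target bitrate and drops stale media recovers after one bad second. The numbers, the one-second feedback delay, and the "match what the link carried" rule are all simplifying assumptions; WebRTC's actual congestion control is far more sophisticated.

```typescript
// Toy model only; the rates and the adaptation rule are assumptions.
// Units: kbit and kbit/s, one step per second.
function backlogSeconds(linkKbps: number[], mediaKbps: number, adaptive: boolean): number[] {
  const backlog: number[] = [];
  let queued = 0;         // kbit produced but not yet sent
  let target = mediaKbps; // current sending bitrate
  for (const capacity of linkKbps) {
    queued += target;                     // media produced this second
    queued -= Math.min(queued, capacity); // the link carries up to `capacity`
    backlog.push(queued / capacity);      // seconds of media still waiting
    if (adaptive) {
      // Feedback about this second arrives: lower the target to what the
      // link can carry and drop stale media instead of sending it late.
      target = Math.min(mediaKbps, capacity);
      queued = 0;
    }
  }
  return backlog;
}

// Link capacity drops from 1000 to 600 kbit/s after two seconds.
const link = [1000, 1000, 600, 600, 600, 600, 600, 600];
const fmt = (xs: number[]) => xs.map((s) => s.toFixed(2)).join(" ");
console.log("reliable, fixed rate: ", fmt(backlogSeconds(link, 1000, false)));
// 0.00 0.00 0.67 1.33 2.00 2.67 3.33 4.00
console.log("adaptive, drops stale:", fmt(backlogSeconds(link, 1000, true)));
// 0.00 0.00 0.67 0.00 0.00 0.00 0.00 0.00
```

The fixed-rate sender's delay grows by about two thirds of a second every second, which is the "farther and farther behind" behavior described above.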
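For the statistics point: in browsers, these numbers come from `RTCPeerConnection.getStats()`, which resolves to a report of typed entries defined by the W3C webrtc-stats specification. The TypeScript sketch below reads a handful of real fields (`packetsLost`, `packetsReceived`, and `jitter` on `"inbound-rtp"` entries, `currentRoundTripTime` on `"candidate-pair"` entries) and condenses them into a hypothetical `AudioHealth` summary. Choosing the nominated, succeeded candidate pair is a simplifying heuristic, not the only way to find the active pair.

```typescript
// The few webrtc-stats fields this sketch reads. Real reports contain many more.
interface StatLike {
  type: string;
  kind?: string;
  packetsReceived?: number;
  packetsLost?: number;
  jitter?: number;               // seconds
  state?: string;
  nominated?: boolean;
  currentRoundTripTime?: number; // seconds
}

// Hypothetical summary shape for dashboards or logs.
interface AudioHealth {
  packetLossPct: number | null;
  jitterMs: number | null;
  roundTripMs: number | null;
}

function summarizeAudioHealth(stats: Iterable<StatLike>): AudioHealth {
  let received = 0;
  let lost = 0;
  let jitter: number | null = null;
  let rtt: number | null = null;
  for (const s of stats) {
    if (s.type === "inbound-rtp" && s.kind === "audio") {
      received += s.packetsReceived ?? 0;
      lost += s.packetsLost ?? 0;
      // Keep the worst jitter across incoming audio streams.
      if (s.jitter !== undefined) jitter = Math.max(jitter ?? 0, s.jitter);
    } else if (
      s.type === "candidate-pair" && s.nominated === true &&
      s.state === "succeeded" && s.currentRoundTripTime !== undefined
    ) {
      rtt = s.currentRoundTripTime; // heuristic: treat this pair as the one in use
    }
  }
  const total = received + lost;
  return {
    packetLossPct: total > 0 ? (100 * lost) / total : null,
    jitterMs: jitter === null ? null : jitter * 1000,
    roundTripMs: rtt === null ? null : rtt * 1000,
  };
}
```

In a browser this would be called as `summarizeAudioHealth((await pc.getStats()).values())`. Because the packet counters are cumulative since the stream started, an app watching for current problems would typically diff two snapshots taken a few seconds apart.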
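For the echo cancellation point: in browsers, the echo canceller applies to microphone capture and is requested through `getUserMedia()` constraints. `echoCancellation`, `noiseSuppression`, and `autoGainControl` are standard `MediaTrackConstraints` properties; the `echoCancellationActive` helper is a hypothetical convenience for checking what the browser actually applied.

```typescript
// Constraints for navigator.mediaDevices.getUserMedia() that ask the browser
// to turn on its built-in echo canceller, plus noise suppression and
// automatic gain control (all standard MediaTrackConstraints properties).
const micConstraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
  video: false,
};

// Hypothetical helper. Plain constraint values are requests, not guarantees,
// so pass it track.getSettings() to see what was applied. Any reported value
// other than false or absent is treated as "on".
function echoCancellationActive(settings: { echoCancellation?: unknown }): boolean {
  return settings.echoCancellation !== undefined && settings.echoCancellation !== false;
}

// Browser usage (not runnable outside a browser):
// const stream = await navigator.mediaDevices.getUserMedia(micConstraints);
// const [track] = stream.getAudioTracks();
// if (!echoCancellationActive(track.getSettings())) console.warn("echo cancellation is off");
```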
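The last point refers to logic like the following TypeScript sketch, which an app sending media over WebSockets would have to write, tune, and test itself: exponential reconnect backoff with random jitter, plus an application-level heartbeat check. All constant values and function names are hypothetical, chosen for illustration rather than recommended settings.

```typescript
// Hypothetical tuning values; real apps tune these per platform and network.
const BASE_DELAY_MS = 250;
const MAX_DELAY_MS = 8000;
const HEARTBEAT_TIMEOUT_MS = 5000;

// Exponential backoff with "full jitter": wait a random time in
// [0, min(MAX_DELAY_MS, BASE_DELAY_MS * 2^attempt)) so that many clients
// that lost the server at the same moment do not all reconnect in lockstep.
function reconnectDelayMs(attempt: number, random: () => number = Math.random): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// TCP alone may not notice a dead connection for a long time, so the app
// records when it last heard from the server (for example, a reply to an
// application-level ping) and gives up after HEARTBEAT_TIMEOUT_MS of silence.
function connectionLooksDead(lastMessageAtMs: number, nowMs: number): boolean {
  return nowMs - lastMessageAtMs > HEARTBEAT_TIMEOUT_MS;
}
```

In a browser, a WebSocket `close` handler might call `setTimeout(connect, reconnectDelayMs(attempt++))`, and a periodic timer might close the socket once `connectionLooksDead(lastMessageAt, Date.now())` returns true. Each of these numbers is a trade-off between reacting quickly and reconnecting needlessly on a brief network blip.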
Getting Started with WebRTC
WebRTC is more complicated than WebSockets. The easiest way to get started is with a hosted WebRTC service such as Daily.
If you are interested in learning how WebRTC works in detail, or in contributing to the WebRTC ecosystem, there are also excellent open source media server frameworks, such as Mediasoup.
For an in-depth walkthrough of how to build voice-driven AI applications that are optimized for conversational latency and reliable performance, see our How To Talk To an LLM (With Your Voice) blog post.
Want to learn more about WebRTC? Visit our community, developer resources, or our blog.