This is part of our social gaming series, in which we walk through key features of our Daily-powered social game, Code of Daily: Modern Wordfare. This is a standalone post. You do not need to read the rest of the series to follow along. But if you’re curious about the greater context of our application, check out part one or the rest of the series.
Introduction
Daily’s Client SDK gives developers a great deal of flexibility when it comes to running their video calls. This enables seamless integration of Daily-powered calls with the UX and branding of the consuming application.
That flexibility also means developers end up doing a bit more handling of their users’ video, audio, and screen tracks in their own code. Specifically, they need to retrieve tracks from Daily and keep them updated on their application’s media DOM elements.
In this post, we’ll take a look at how Daily video and audio features were incorporated into our social game, Code of Daily: Modern Wordfare. Specifically, we’ll focus on handling players’ video and audio with the media tracks Daily provides.
Following along
Check out the demo repository to follow along directly on GitHub. I’ll link any relevant reference to specific code here. You can also follow the instructions to clone and run the game locally if you’d like to spin up the application on your own machine.
Media properties of a Daily call participant
Each presence-enabled call participant in Daily can be retrieved through the Daily call object’s participants()
instance method. The participant object contains references to that participant’s video and audio tracks.
Daily provides a few pieces of key data in relation to each track. You can find a list of all the track properties Daily provides in our documentation.
In Code of Daily: Modern Wordfare (CoD), we’ve used two main Daily-provided track properties to render players’ video and audio:
- state: whether the track is playable, blocked, loading, or in some other state
- persistentTrack: the media track itself
Using these two properties, we can implement all of the track juggling we need for the social game. We’ll go through how I’ve done that here. But first, let’s take a closer look at what a track actually is.
What is a track?
A Daily track is an instance of MediaStreamTrack
. A MediaStreamTrack
contains the audio or video data that a call participant is sending to other participants who are subscribed to their tracks. Such tracks can be attached to a MediaStream
, which is then assigned as the source of a media DOM element (like an HTMLVideoElement
or HTMLAudioElement
). Each MediaStream
instance can contain one or more MediaStreamTracks
. Each media DOM element can contain one source object, which is usually a MediaStream
.
An HTMLVideoElement
can be created as follows:
- In your HTML file, with a <video> tag
- Through JavaScript at runtime, with document.createElement("video")
An HTMLAudioElement
can be created as follows:
- In your HTML file, with an <audio> tag
- Through JavaScript at runtime, with document.createElement("audio")
- Through JavaScript at runtime, with the new Audio() constructor
In CoD we only concern ourselves with a participant’s video and audio tracks. But if your application calls for screen sharing, your tracks could also be screen video and system audio.
With Daily’s Client SDK, developers have direct control over which remote tracks a user is subscribed to. In the case of CoD, since our game sessions are intended for only a small number of players, each player subscribes to every other player’s video and audio automatically.
You might notice that when retrieving a track from a Daily participant, the track
object contains two fields that seem quite similar:
- persistentTrack (use this one!): contains a reference to a track in any state
- track (avoid): contains a playable track, if one exists
Why use persistentTrack
?
Both track
and persistentTrack
are instances of MediaStreamTrack
. However, track
is only set when there is a ready-to-be-played media track available. That means you will never find a track in a "blocked"
, "off"
, or other non-playable state in this field.
The persistentTrack
field contains a track regardless of its playability. This track may be in an "interrupted"
, "blocked"
, "loading"
, or any other non-playable state.
We recommend using persistentTrack
in your application because it maintains a track reference regardless of its playability, allowing you to minimize repeatedly resetting tracks on your media DOM elements. Redundantly swapping tracks and streams on your media elements is not recommended for the following reasons:
- Juggling playable and non-playable tracks on associated DOM elements exposes visual issues like black frames during track interrupts in some browsers.
- In some browsers (like Safari), swapping audio tracks on the media DOM element can cause audio interruptions and prevent autoplays, especially if the user has the application tab in the background.
We plan to eventually update the track
property to behave exactly like persistentTrack
, so you may as well future-proof your implementation by using persistentTrack
today.
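To make the difference concrete, here is a minimal sketch. The TrackInfo shape models only the three fields discussed in this post, and plain strings stand in for real MediaStreamTrack objects so the snippet can run outside a browser. Both are assumptions made for illustration, as is the stableTrack helper name.

```typescript
// Simplified stand-in for the per-track info on a Daily participant.
// Only the fields discussed in this post are modeled (an assumption).
type TrackInfo<T> = {
  state: string;
  track?: T; // set only while the track is playable
  persistentTrack?: T; // set regardless of playability
};

// Hypothetical helper: returns the reference an application should hold on to.
function stableTrack<T>(info: TrackInfo<T>): T | null {
  return info.persistentTrack ?? null;
}

// Hypothetical snapshots of one participant's video, before and during
// a network interruption. Strings stand in for MediaStreamTrack objects.
const playing: TrackInfo<string> = {
  state: "playable",
  track: "cam-1",
  persistentTrack: "cam-1",
};
const interrupted: TrackInfo<string> = {
  state: "interrupted",
  persistentTrack: "cam-1",
};

// The persistentTrack reference is unchanged across both snapshots,
// so nothing on the media element needs to be swapped.
const sameReference = stableTrack(playing) === stableTrack(interrupted);
```

Code that reads the playable-only track field would instead see the reference disappear during the interruption and reappear afterwards, which is what pushes an application toward resetting its media elements.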
How to retrieve Daily’s media tracks
There are many ways to utilize Daily’s media tracks and your application flow might call for something unique, but let’s go through the approach used in CoD.
First, I added handlers to the "track-started"
and "track-stopped"
events. These are emitted by Daily when the playability of a track has changed. This can be triggered by a participant toggling their media on or off, device permission changes, or even network hiccups. The handler definition happens in CoD’s Game
class, during the initial call setup.
this.call.registerTrackStartedHandler((p) => {
  const tracks = Call.getParticipantTracks(p.participant);
  try {
    updateMedia(p.participant.session_id, tracks);
  } catch (e) {
    console.warn(e);
  }
});
this.call.registerTrackStoppedHandler((p) => {
  const tracks = Call.getParticipantTracks(p.participant);
  try {
    updateMedia(p.participant.session_id, tracks);
  } catch (e) {
    console.warn(e);
  }
});
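Since both handlers do the same work, one option is to register a single function for both events. The sketch below assumes a call object that exposes an on(event, handler) method, as Daily's call object does. CallLike, TrackEvent, and onTrackChange are hypothetical names introduced here for illustration and are not part of CoD.

```typescript
// Minimal shape of the event payload used in this post (an assumed subset).
type TrackEvent = { participant: { session_id: string } };

// Anything with an on(event, handler) method, such as a Daily call object.
interface CallLike {
  on(event: string, handler: (e: TrackEvent) => void): unknown;
}

// Registers one handler for both track events and logs handler errors
// instead of letting them escape into the event emitter.
function onTrackChange(call: CallLike, handler: (e: TrackEvent) => void): void {
  const safe = (e: TrackEvent): void => {
    try {
      handler(e);
    } catch (err) {
      console.warn(err);
    }
  };
  call.on("track-started", safe);
  call.on("track-stopped", safe);
}
```

With Daily's own typings you would likely use its exported event types rather than the simplified TrackEvent above.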
When Daily emits a "track-started"
event, the code above retrieves all available participant tracks using the getParticipantTracks()
static method on our Call
class:
// getParticipantTracks() retrieves video and audio tracks
// for the given participant, if they are usable.
// playableState and loadingState are constants holding the
// "playable" and "loading" track state strings.
static getParticipantTracks(p: DailyParticipant): Tracks {
  const mediaTracks: Tracks = {
    videoTrack: null,
    audioTrack: null,
  };
  const tracks = p?.tracks;
  if (!tracks) return mediaTracks;

  const vt = tracks.video;
  const vs = vt?.state;
  // Optional chaining guards against a missing video entry
  if (vt?.persistentTrack && (vs === playableState || vs === loadingState)) {
    mediaTracks.videoTrack = vt.persistentTrack;
  }

  // Only get audio track if this is a remote participant
  if (!p.local) {
    const at = tracks.audio;
    const as = at?.state;
    if (at?.persistentTrack && (as === playableState || as === loadingState)) {
      mediaTracks.audioTrack = at.persistentTrack;
    }
  }
  return mediaTracks;
}
Above, we retrieve the participant's audio and video tracks from their participant object. If the track is in a "playable"
or "loading"
(i.e., expected to be playable shortly) state, we set it on our mediaTracks
object and return it to the caller (which is our track event-handling function).
If the tracks are not playable, we do not return them, and the local participant will experience the given participant as hidden or muted (we’ll go through how we do that below).
The Tracks
type that is returned above looks as follows, with fields for both kinds of track that we care about (video and audio):
export type Tracks = {
videoTrack: MediaStreamTrack | null;
audioTrack: MediaStreamTrack | null;
};
After retrieving the participant’s media tracks, our track event handler calls updateMedia()
, which is where we’ll actually update the relevant DOM element with the retrieved tracks. Let’s go through how that’s handled.
Handling media tracks
When assigning Daily media tracks to a media DOM element, we have to deal with a few scenarios:
- Brand new tracks being assigned on a media element that doesn’t yet have any. In CoD, this happens when a new participant first joins the game.
- The participant sends tracks that differ from those previously set, which means the tracks on our media DOM element need to be updated. For example, this could happen if a move to another SFU (Selective Forwarding Unit) is triggered mid-call.
- The participant mutes their video or audio and the media DOM element has to be updated to suit (such as hiding the video in favor of a stylized “video-off” background).
A reliable way of setting media tracks on DOM elements
To reliably juggle media tracks on DOM elements, I would propose implementing updateMedia()
as follows:
- If there is no existing MediaStream set as the media DOM element’s source object, construct a new stream and set the available tracks on it.
- Otherwise, check whether the provided video and audio track IDs (obtainable via the MediaStreamTrack’s id property) match those that already exist on the media DOM element’s MediaStream. If either ID does not match, replace just that track, not the entire stream.
- Additionally, if the newly provided tracks do not contain a valid video track, hide the associated video element to show whatever “cam-off” style you might want for that participant.
You can have a look at the implementation of this approach on GitHub. You will note the usage of the following methods on the MediaStream
class in the implementation:
- getTracks(): Obtaining all existing tracks on the video element’s source object MediaStream
- getAudioTracks(): Obtaining all audio tracks from the existing source object
- getVideoTracks(): Obtaining all video tracks from the existing source object
- removeTrack(): Removing an out-of-date MediaStreamTrack from the existing source object
- addTrack(): Adding a new MediaStreamTrack to the existing source object
This implementation results in a new MediaStream
being constructed only once, when first setting the tracks. From then on, you can replace just the relevant tracks as needed (those that have actually changed). If the new tracks we receive are identical to those that are already set, this results in a no-op. Special styling can be applied to indicate cam-off or mic-off states when the video and audio tracks are null, without removing the old tracks from the MediaStream
.
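The track-replacement logic described in this section can be sketched roughly as follows. This is not the code from the CoD repository. TrackLike, StreamLike, syncTrack, and updateStream are hypothetical names, and the interfaces cover only the MediaStream methods the sketch calls, so that it can also run against a test double outside a browser.

```typescript
// Minimal structural types. In a browser, MediaStreamTrack and MediaStream
// already provide these members.
interface TrackLike {
  id: string;
}
interface StreamLike<T extends TrackLike> {
  getVideoTracks(): T[];
  getAudioTracks(): T[];
  addTrack(track: T): void;
  removeTrack(track: T): void;
}

// Swaps `next` into the stream only if its ID differs from what is there.
// A null `next` leaves the old track in place, as described above.
// Returns true if the stream was modified.
function syncTrack<T extends TrackLike>(
  stream: StreamLike<T>,
  existing: T[],
  next: T | null
): boolean {
  if (next === null) return false;
  if (existing.length === 1 && existing[0]?.id === next.id) return false;
  for (const old of existing) stream.removeTrack(old);
  stream.addTrack(next);
  return true;
}

// Updates an existing stream in place and reports whether the video
// element should be shown or replaced by a cam-off style.
function updateStream<T extends TrackLike>(
  stream: StreamLike<T>,
  tracks: { videoTrack: T | null; audioTrack: T | null }
): { changed: boolean; showVideo: boolean } {
  const videoChanged = syncTrack(stream, stream.getVideoTracks(), tracks.videoTrack);
  const audioChanged = syncTrack(stream, stream.getAudioTracks(), tracks.audioTrack);
  return {
    changed: videoChanged || audioChanged,
    showVideo: tracks.videoTrack !== null,
  };
}
```

In the browser, you would create the stream once with new MediaStream(), assign it to the element's srcObject, and then call a function like updateStream() on every track event. Calling it again with the same tracks changes nothing on the stream.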
A tempting, but flawed, approach to setting media tracks on DOM elements
A developer might be tempted to implement our updateMedia()
method by simply re-creating a MediaStream
with the updated media tracks each time new tracks are retrieved from Daily. If no playable tracks are received, you might remove the media DOM element’s srcObject
completely since there is nothing to play. In fact, you can see an example of this approach (as well as the updates I made to implement the more robust approach outlined above) in an older version of Modern Wordfare.
The problem with this approach is primarily twofold:
- We open ourselves up to playback issues when resetting the media source on a DOM element: video flickering and audio hiccups, similar to the kind you might experience if using the playable-only track property we covered above.
- We perform needless construction and teardown of the media element’s source by repeatedly setting a new MediaStream as the video element’s source. For example, in WebKit, setting srcObject on a media element invokes various loading operations.
Consider separate video and audio DOM elements
In CoD, we went with adding all tracks to a single stream on a single HTMLVideoElement
for simplicity. This is because our social game currently subscribes to the tracks of every other participant automatically (Daily’s default behavior), and does no hiding or scrolling of certain participants’ tiles.
But there are cases for separating the video and audio into separate DOM elements.
Pagination of video tiles
You might have a use case where certain participants’ video gets hidden behind pagination or similar features. For example, this is the case in our own Daily Prebuilt, which supports large numbers of participants and therefore has to hide some of them on the screen.
In these cases, you can consider assigning a participant's video track to an HTMLVideoElement
and the audio track to its own HTMLAudioElement
, each with its own respective MediaStream
.
This way, even if a call participant’s tile is scrolled or paginated out of the local participant’s view, you can still opt to play their audio to the local user. They don’t have to be seen to be heard.
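A rough sketch of that split might look like the following. ElementLike and attachSeparately are hypothetical names, and the stream constructor is passed in as a makeStream parameter (an assumption made so the sketch does not depend on browser globals). In a browser, makeStream would typically be something like (tracks) => new MediaStream(tracks).

```typescript
// Minimal shape of a media element: only srcObject is needed here.
interface ElementLike<S> {
  srcObject: S | null;
}

// Gives the video and audio tracks their own stream on their own element.
// Only the first attachment is handled; later track changes would update
// each existing stream in place rather than replacing it.
function attachSeparately<T, S>(
  tracks: { videoTrack: T | null; audioTrack: T | null },
  videoEl: ElementLike<S>,
  audioEl: ElementLike<S>,
  makeStream: (tracks: T[]) => S
): void {
  if (tracks.videoTrack !== null && videoEl.srcObject === null) {
    videoEl.srcObject = makeStream([tracks.videoTrack]);
  }
  if (tracks.audioTrack !== null && audioEl.srcObject === null) {
    audioEl.srcObject = makeStream([tracks.audioTrack]);
  }
}
```

Because the audio element has its own stream, hiding or unmounting the video element does not affect what the local user hears.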
Browser autoplay considerations
Browsers may block autoplay for audio elements, and for video elements that have audio tracks assigned. Automatic playback of audio is commonly blocked if the user has not yet interacted with the page via a gesture (clicking or tapping, for example). You can read more about audio autoplay behavior on MDN.
For this reason, it can be worth splitting a participant’s media tracks across a dedicated HTMLVideoElement
and HTMLAudioElement
. Otherwise, if audio and video are both on a single video element and autoplay is blocked, the end user might see frozen video frames in addition to no audio until the conditions for playing the media are met.
You will likely want to hide the default playback controls associated with an HTMLAudioElement
. This can be done by constructing the element through JavaScript and not attaching it to the DOM, or by defining the <audio>
tag without the controls
attribute.
Wrapping up
In this post, we covered a lot of information about Daily’s media tracks. You now know:
- What Daily media tracks are
- The difference between Daily’s track and persistentTrack properties
- How to retrieve a participant’s tracks
- Recommendations for how to render the media tracks in the DOM, along with some common pitfalls to avoid
- When it might make sense to assign all tracks to a single HTMLVideoElement and when you might consider using a dedicated HTMLAudioElement for the audio track
Please reach out if you have any feedback or questions about working with Daily’s participant media tracks.