Chat moderation in Daily using TensorFlow.js

Creating friendly, welcoming spaces is a common consideration when building collaborative applications. End users can often manually moderate their sessions, but that’s largely a reactive process that can leave much to be desired. How can we as developers build applications that empower end users to better deal with this issue?

In this post, we’ll explore just one of many automated text chat moderation options in Daily-powered video applications by using TensorFlow.js. We’ll be extending the chat interface of Daily’s Vue demo app to show an example of such an implementation.

Note that the approach taken in this article is just one exploratory option for aided moderation to give developers some ideas for what is possible. There is no totally foolproof way to implement this feature: human interaction is hard!

Getting started

First, create a free Daily account and a room.

Next, clone the Vue starter app from GitHub and change your working directory to the root of the demo:

git clone https://github.com/daily-demos/vue-call-object.git
cd vue-call-object

Install the package dependencies and run the app:

npm install
npm run dev


Open the app in your browser at the server port shown in your terminal (e.g., http://localhost:5173/). The dev script watches for changes to your code and updates the browser whenever you save a file.

If you would like to see the finished version of this demo extension, you can go directly to the feat/chat-moderation-with-tensorflow-js branch on GitHub.

What is TensorFlow.js?

Machine learning is a field of computer science that gives computers the ability to learn from data without being explicitly programmed for each task. That ability paves the way for computers to “understand” written text and spoken words. Computers can run models (programs that identify patterns based on previous data) and thus detect things like potentially toxic speech.

TensorFlow.js is a JavaScript library developers can use to run pre-trained machine-learning models in the browser. The library has a variety of models for tasks such as object identification and language processing. One of these models is the text toxicity detection model.

As per the README, toxicity is a model that detects whether a piece of text contains threatening language, insults, obscenities, identity-based hate, or sexually explicit language. We’ll use this model in our code to perform client-side moderation of chat messages.
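
To get a feel for the model’s API before wiring it into the demo, here’s a minimal standalone sketch (the 0.9 threshold and the sample sentence are arbitrary choices for illustration):

import "@tensorflow/tfjs"; // registers a TensorFlow.js backend
import { load } from "@tensorflow-models/toxicity";

async function demo() {
  // 0.9 is the minimum prediction confidence; below it, a label's match is null
  const model = await load(0.9);

  // classify() accepts a string or an array of strings
  const predictions = await model.classify(["you suck"]);
  predictions.forEach(({ label, results }) => {
    console.log(label, results[0].match);
  });
}

demo();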

What are we building?

We are extending the chat feature found in the existing Vue demo’s ChatTile.vue component. We will add a function to examine a user's chat text input for offensive content when the "Send" button is clicked. If the input text is deemed problematic, the message won't be transmitted to other participants. We’ll also alert the user that we flagged their message and why.

The ChatTile component

The ChatTile.vue component is a single file component that contains the following:

  • a button for opening the chat screen
  • a div for listing out chat messages
  • a form consisting of an input field for the user to type a message and a submit button

The <script> section of this component contains a submitForm() function that handles the form submission. Our goal is to modify this function. If you want to learn more about implementing chat applications in Daily, Jess’s article on adding a custom chat widget with Daily’s sendAppMessage() is a great resource.
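
For reference, here’s roughly what the handler looks like before our changes (a simplified sketch; the demo’s actual code may differ slightly):

// Simplified sketch of the existing handler in ChatTile.vue
submitForm(e) {
    e.preventDefault();
    // sendMessage is passed down as a prop from CallTile.vue
    this.sendMessage(this.text);
    this.text = "";
},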

Installing the package and loading the model

To install TensorFlow.js in our project, run the following command in your terminal:

npm install @tensorflow/tfjs @tensorflow-models/toxicity

Next, import the toxicity model package in the ChatTile component:

import { load } from '@tensorflow-models/toxicity'

Above, we import the load() function from the toxicity detection model. When called, this function retrieves the pre-trained model so web applications can use it.

You’ll also want to add a null model property to the data object:

export default {
  name: "ChatTile",
  props: ["sendMessage", "messages"],
  data() {
    return {
      chatIsOpen: false,
      close,
      chat,
      send,
      text: "",
      model: null,
    };
  },

Then, retrieve the model within Vue's beforeCreate() hook in the ChatTile.vue component:

export default {
  name: "ChatTile",
  props: ["sendMessage", "messages"],
  data() { //... },
  beforeCreate() {
    // Minimum prediction confidence; below this, a label's match is null
    const threshold = 0.9;
    load(threshold)
      .then((model) => {
        this.model = Object.freeze(model);
      })
      .catch((e) => {
        console.error("failed to load toxicity model", e);
      });
  },
}

Above, we load the TensorFlow.js toxicity model when the component instance is initialized, then store the retrieved model in the model property within the data object. The threshold argument sets the minimum prediction confidence: for any label where the model’s confidence falls below it, the match value will be null instead of true or false.

You’ll also notice that we call the Object.freeze method on the retrieved model. This prevents Vue from making the model object reactive, which would add needless overhead for a large object we never mutate.
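
If your project is on Vue 3, marking the object with markRaw() is an alternative way to opt out of reactivity (shown here as a sketch; the demo sticks with Object.freeze()):

import { markRaw } from "vue";

// markRaw tells Vue to never convert this object into a reactive proxy
this.model = markRaw(model);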

Finally, you can save the file, run the application, and inspect the model contents by adding a temporary console.log(this.model) to the .then() callback and watching the browser console:

The TensorFlow.js toxicity model object, logged to the console

Ensuring the model is loaded before chat is available

The TensorFlow.js model load operation above is asynchronous, so we have no guarantee that the model will be loaded by the time the user joins a Daily video call and tries to chat. Therefore, we won’t activate the in-app chat button until model load is complete. To do so, update the button definition in ChatTile.vue to disable the button if the model is falsy:

<button class="chat" @click="handleChatClick" :disabled="!model">
  <template v-if="!model">
    <img class="icon" :src="close" alt="" />
  </template>
  <template v-else-if="chatIsOpen">
    <img class="icon" :src="close" alt="" />
  </template>
  <template v-else>
    <img class="icon" :src="chat" alt="" />
  </template>
</button>

The above results in the application chat button being disabled with an “X” icon until the model loads:

Chat functionality disabled with an "X" icon

Once the TensorFlow.js model has loaded, the chat button is enabled automatically:

Chat functionality enabled with a chat icon

Analyzing input text using the toxicity model

With the toxicity model loaded and stored, we can perform end-user input text analysis. We start by adding a function called analyzeInputText() to the methods object in the ChatTile component:

export default {
  name: "ChatTile",
  props: ["sendMessage", "messages"],
  data() {//...},
  beforeCreate() {//...},
  methods: {
    // Toggle chat's view (open/closed)
    //...

    // Send chat message using prop method from CallTile.vue
    submitForm(e) {//...},
    async analyzeInputText(text) {
      // Returns true if the model matched any toxicity label for this text
      const predictions = await this.model.classify(text);
      const containsToxicContent = predictions.some(
        (prediction) => prediction.results[0].match
      );
      return containsToxicContent;
    },
  },
}

Let’s take a closer look at what’s happening in the function above.

analyzeInputText() is an asynchronous function that takes a single text argument. Within this function, the provided text is passed into a call to the TensorFlow.js model’s classify() method. classify() returns a Promise, which we await, storing the resolved value in a variable called predictions.

The promise resolves with an array of prediction objects. Each prediction object contains prediction results for what TensorFlow calls a label.

TensorFlow.js predictions object logged to console

Labels are categories for the kind of toxic speech that TensorFlow can predict. A label can be one of the following:

  • "identity_attack"
  • "insult"
  • "obscene"
  • "severe_toxicity"
  • "sexual_explicit"
  • "threat"
  • "toxicity"

Each prediction object has a results array. The results array contains prediction results data for each string passed to the classify() method.

We pass a single string to classify(), so our results array has only one object. That object has two keys: match and probabilities. The match key tells us whether or not the model classified the text under a given category; it is true or false when the model is confident, and null when its confidence falls below the threshold we chose at load time.
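
Putting that together, the resolved predictions value for a single input string has roughly this shape (the probability values here are illustrative):

// Abridged example of a resolved predictions value
const examplePredictions = [
  { label: "insult", results: [{ probabilities: [0.04, 0.96], match: true }] },
  { label: "threat", results: [{ probabilities: [0.98, 0.02], match: false }] },
  // ...one object per remaining label
];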

The containsToxicContent variable we define above is a boolean representing whether any of the labels has a truthy match value. Finally, we return containsToxicContent.

Now that we know what analyzeInputText() does, the next step is to call it from the submitForm() handler function and retrieve the result:

export default {
  name: "ChatTile",
  props: ["sendMessage", "messages"],
  data() { //... },
  beforeCreate() {//...},
  methods: {
    // Send chat message using prop method from CallTile.vue
    async submitForm(e) {
      e.preventDefault();

      const hasToxicContent = await this.analyzeInputText(this.text);
      console.log(hasToxicContent);

      // ...logic to handle the result of the toxicity check (covered in the next section)
      this.text = "";
    },
    async analyzeInputText(text) {//...},
  },
}

Above, when the chat input form is submitted, we call analyzeInputText() with the message string the user is intending to send. Now, let’s take a look at how the result of this call is actually handled.

Displaying a warning if the message was flagged

We have the analysis result, but are not doing anything useful with it yet. We want to use the value of hasToxicContent to either display a warning or send the provided chat message to other call participants.

First, let’s add a displayWarning boolean to the data object:

data() {
    return {
      chatIsOpen: false,
      close,
      chat,
      send,
      text: "",
      model: null,
      displayWarning: false,
    }
}

Next, update the form submission handler to set the new displayWarning boolean to true when toxic content is detected. If hasToxicContent is false, we send the message as usual:

async submitForm(e) {
    e.preventDefault();

    const hasToxicContent = await this.analyzeInputText(this.text);
    if (hasToxicContent) {
      this.displayWarning = true;
    } else {
      this.sendMessage(this.text);
    }
    this.text = "";
},

Additionally, we need an element to hold our warning message. Create one in the chat-container element:

<template>
  <div :class="chatIsOpen ? 'chat-wrapper open' : 'chat-wrapper'">
    <!-- other elements -->

    <div class="chat-container">
      <!-- messages container -->

      <p v-if="displayWarning" class="warning-message">
        Oops! The message you're trying to send was flagged as potentially offensive.
        If you think this was flagged in error, please contact us!
      </p>

      <!-- form -->
    </div>
  </div>
</template>

Above, we’ve added a p tag with a v-if directive bound to displayWarning, so the warning only exists in the DOM while displayWarning is true.

Let’s also style the warning message so that it stands out:

<style scoped>
  .warning-message {
    color: red;
  }
</style>

Finally, we can save the file and observe the behavior in the browser.

User's Daily chat input being flagged by the TensorFlow.js toxicity model


You will notice that the warning message does not disappear if we send a non-toxic message. Let’s fix that by resetting the displayWarning data property when the form is submitted:

async submitForm(e) {
    e.preventDefault();
    // Reset any warning from a previously flagged message
    this.displayWarning = false;

    const hasToxicContent = await this.analyzeInputText(this.text);
    if (hasToxicContent) {
      this.displayWarning = true;
    } else {
      this.sendMessage(this.text);
    }
    this.text = "";
},

By resetting displayWarning before proceeding to the rest of the handler logic, we hide any warning message associated with previously flagged input.

Save the file and test in the browser again to see that the old error message gets removed as expected.

User's first Daily chat being flagged, and the warning being reset on subsequent message

A word of caution

Machine learning is not infallible. Models can fail to understand certain contexts and flag (or miss!) certain messages. Be sure to consider these issues and edge cases when architecting chat moderation for your application. This demo is not intended to be used in production; it is an experimental project providing an example of how to implement a feature like this with Daily.

Additionally, this article features a client-side implementation of chat moderation. There is always the risk of a knowledgeable malicious actor bypassing the checks we built on the client.
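
One way to harden this is to run the same check somewhere the user can’t tamper with it, such as a small Node service that messages pass through before being broadcast. Here’s a rough sketch under those assumptions (the Express server and /moderate route are hypothetical, not part of the demo):

// Hypothetical Node/Express endpoint running the same toxicity check server-side
require("@tensorflow/tfjs"); // or @tensorflow/tfjs-node for native performance
const { load } = require("@tensorflow-models/toxicity");
const express = require("express");

const app = express();
app.use(express.json());

// Start loading the model once, at boot
const modelPromise = load(0.9);

app.post("/moderate", async (req, res) => {
  const model = await modelPromise;
  const predictions = await model.classify([req.body.text]);
  const toxic = predictions.some((p) => p.results[0].match);
  res.json({ toxic });
});

app.listen(3000);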

Conclusion

In this article, we learned about TensorFlow.js, the toxicity detection model, and how it can provide us with one option to add chat moderation capabilities to our Daily apps.

We used a pre-trained model, which is a quick way to get started if you decide to go this route. Visit the TensorFlow.js documentation to learn more about the library, ways to extend what we’ve built so far, and other features.

Have you had to implement aided content moderation in your applications? We’d love to hear more about your experiences, and any thoughts you might have about our TensorFlow.js example above, over at our Discord community.
