Building a Voicemail Detection Agent with Pipecat and Daily

In this post, we'll walk through setting up a real-time voice agent that can make calls, detect voicemail systems, and handle live conversations using Pipecat and Daily's WebRTC infrastructure.

Intelligent voicemail detection is a critical requirement for voice agents that place calls over the telephone network. For example, an online lender can use it to streamline outbound verification calls, so that only live customers receive identity verification prompts while voicemail boxes receive tailored callback requests. Appointment scheduling services are another use case: the system can confirm bookings with live recipients and leave rescheduling instructions when it reaches voicemail.

What We're Building

Our voicemail detection bot demonstrates several key capabilities:

Automated outbound calling using Daily's dial-out feature
Intelligent voicemail system detection
Dynamic message handling for both voicemail and live conversations
Real-time voice processing and natural language understanding

Prerequisites

Before we begin, make sure you have:

A Daily account with API key
Google API key
Cartesia API key
Python 3.10 or newer
Basic familiarity with async Python programming

Getting Started

Setting Up the Project

First, clone the Pipecat repository to access the voicemail detection example:

git clone https://github.com/pipecat-ai/pipecat.git
cd pipecat/examples/phone-chatbot/daily-pstn-advanced-voicemail-detection

Configuring Your Environment

Create and activate a Python virtual environment:

python3 -m venv venv
source venv/bin/activate

Install the required dependencies (full requirements file coming soon):

pip install -r requirements.txt

Setting Up Environment Variables

Create a .env file in your project directory with your API credentials:

DAILY_API_KEY=your_daily_api_key
GOOGLE_API_KEY=your_google_api_key
CARTESIA_API_KEY=your_cartesia_api_key
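
Before starting the server, it can help to confirm that all three keys are actually visible to the process. The following is a minimal sketch, not part of the example repository: the helper name missing_env_vars is hypothetical, and it assumes the .env file has already been loaded into the environment (for instance by a dotenv loader, which is itself an assumption about how the example reads its configuration).

```python
import os

# The three keys listed in the .env file above
REQUIRED_KEYS = ("DAILY_API_KEY", "GOOGLE_API_KEY", "CARTESIA_API_KEY")


def missing_env_vars(env=None):
    """Return the required keys that are unset or empty in `env`.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling missing_env_vars() with no arguments returns an empty list when everything is configured, and otherwise the names of the keys that still need a value.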

Testing the Bot

Enabling Dial-out Capabilities

For more details on this step, refer to this guide.

For this demo, dial-out is limited to American and Canadian phone numbers. If you need international dial-out, please contact us at help@daily.co to discuss your requirements on a case-by-case basis.
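
Since the demo only dials American and Canadian numbers, a quick format check before placing a call can catch obvious mistakes. The sketch below is an assumption-laden illustration: it assumes numbers are written in the +1XXXXXXXXXX form used by the examples in this post, the function name is_nanp_number is hypothetical, and it checks the format only. It does not reproduce whatever validation Daily performs.

```python
import re

# "+1" followed by a 10-digit North American number.
# The area code and the exchange code both start with a digit from 2 to 9.
_NANP_PATTERN = re.compile(r"\+1[2-9]\d{2}[2-9]\d{6}")


def is_nanp_number(phone_number: str) -> bool:
    """Return True if `phone_number` looks like a US/Canada number in +1 form."""
    return _NANP_PATTERN.fullmatch(phone_number) is not None
```

A number that passes this check can still be unassigned or unreachable; the check only rules out numbers that are clearly outside the supported region or badly formatted.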

Steps to Enable Dial-out:

Enable dial-out for your Daily account:

Please fill in this form to request dial-out access for your Daily domain: https://forms.gle/Q5eaHLEosuKyzC7BA

Purchase a phone number (required for dialing out)

First, check available phone numbers in your region:

curl -H "Content-Type: application/json" \
     -H "Authorization: Bearer your-daily-api-key" \
     "https://api.daily.co/v1/list-available-numbers?region=CA"

This command returns a list of purchasable numbers for the specified region.
Next, purchase a number and move on to Step 3:

curl --request POST \
  --url 'https://api.daily.co/v1/buy-phone-number' \
  --header 'Authorization: Bearer your-daily-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
        "number": "+12097808812"
}'

Starting the server

Daily offers REST helpers for creating rooms and managing meeting tokens programmatically. You can find them here. In this example, we’ll use the code from server.py and the daily_helpers.py file within the utils folder to not only start the bot but also create rooms and meeting tokens with the required properties to enable dial-out.

Start the server by running the following in the terminal:

python server.py

Initiate a dial-out call:

Start the bot and specify a phone number to dial out to with the following cURL command:

curl -X POST "http://127.0.0.1:7860/start" \
  -H "Content-Type: application/json" \
  -d '{
    "dialout_settings": {
      "phone_number": "+12345678910"
    }
  }'
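
The same request can be sent from Python. This is a minimal sketch using only the standard library; the URL and JSON body mirror the cURL command above, while the function names build_start_request and start_call are hypothetical and the shape of the server's response is not assumed.

```python
import json
import urllib.request


def build_start_request(phone_number, base_url="http://127.0.0.1:7860"):
    """Build the POST /start request with the dial-out settings as JSON."""
    payload = {"dialout_settings": {"phone_number": phone_number}}
    return urllib.request.Request(
        f"{base_url}/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_call(phone_number):
    """Send the request to the locally running server and return (status, body)."""
    request = build_start_request(phone_number)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8")
```

With server.py running, start_call("+12345678910") would issue the same request as the cURL command. Splitting the request construction from the network call keeps the payload easy to inspect without a live server.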

Test the bot and voicemail detection:

The bot will now start up. Once it joins the call, it will begin dialing out.
You can test voicemail detection by pretending to be a voicemail machine.

Ensure the bot has the necessary permissions:

To initiate dial-out, the bot must have a token with either the is_owner or canAdmin property set.
The server.py script already handles this for you, but if you're building your own app, make sure to include this step.
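
If you are building your own app, the token request might look roughly like the sketch below. It only constructs the JSON body for Daily's meeting-tokens REST endpoint (POST https://api.daily.co/v1/meeting-tokens, authorized with your Daily API key). The helper name owner_token_payload and the one-hour default lifetime are assumptions for illustration; server.py may build its tokens differently.

```python
import time


def owner_token_payload(room_name, ttl_seconds=3600, now=None):
    """Build a meeting-token request body that grants owner privileges.

    `now` is a Unix timestamp and defaults to the current time; it is a
    parameter only so the expiry calculation can be tested.
    """
    now = int(time.time()) if now is None else int(now)
    return {
        "properties": {
            "room_name": room_name,   # restrict the token to this room
            "is_owner": True,         # required for the bot to start dial-out
            "exp": now + ttl_seconds, # expiry as a Unix timestamp
        }
    }
```

Giving the token an expiry and tying it to a single room limits the damage if it leaks, which matters more than usual for a token that can place phone calls.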

Understanding the Code

Let's examine the key components that make our voicemail detection bot work:

Call Termination

The bot needs to know when to end a call, especially after leaving a voicemail:

async def terminate_call(
    params: FunctionCallParams,
    call_flow_state: CallFlowState = None,
):
    """Function the bot can call to terminate the call."""
    if call_flow_state:
        # Set call terminated flag in the session manager
        call_flow_state.set_call_terminated()

    await params.llm.queue_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
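
The call_flow_state object is what lets the rest of the application know the call has ended. The example's real class is not shown in this post, so the following is only a guess at its minimal shape: the set_call_terminated method comes from the code above, while the field name call_terminated is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CallFlowState:
    """Hypothetical minimal state holder shared across the call's handlers."""

    call_terminated: bool = False

    def set_call_terminated(self) -> None:
        # Record that the bot has decided to end the call
        self.call_terminated = True
```

Other parts of the bot can then check state.call_terminated before acting, for example to avoid speaking after the call has been ended.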

AI Configuration

We use Gemini 2.0 Flash-Lite and Gemini 2.0 Flash for natural language processing and decision-making. The example uses Flash-Lite, a much cheaper and more lightweight model, to detect voicemail, and Flash for the conversation with a human. It also shows how to collect audio and send it directly to the LLM.

This is how we register the LLM and the tools:

# Function declarations the detection model is allowed to call.
# These must be defined before they are passed to the LLM service.
tools = [
    {
        "function_declarations": [
            {
                "name": "switch_to_voicemail_response",
                "description": "Call this function when you detect this is a voicemail system.",
            },
            {
                "name": "switch_to_human_conversation",
                "description": "Call this function when you detect this is a human.",
            },
            {
                "name": "terminate_call",
                "description": "Call this function to terminate the call.",
            },
        ]
    }
]

voicemail_detection_llm = GoogleLLMService(
    model="models/gemini-2.0-flash-lite",
    api_key=os.getenv("GOOGLE_API_KEY"),
    system_instruction=system_instruction,
    tools=tools,
)

# Map each declared function name to its handler
voicemail_detection_llm.register_function(
    "switch_to_voicemail_response",
    handlers.voicemail_response,
)
voicemail_detection_llm.register_function(
    "switch_to_human_conversation", handlers.human_conversation
)
# The lambda binds call_flow_state so terminate_call can update it
voicemail_detection_llm.register_function(
    "terminate_call", lambda params: terminate_call(params, call_flow_state)
)
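
The terminate_call registration wraps the handler in a lambda so that extra state can be passed along with the params argument. The toy sketch below illustrates that pattern in isolation. ToyFunctionRegistry is a hypothetical stand-in, not Pipecat's implementation, and plain dicts replace the real params and state objects.

```python
import asyncio


class ToyFunctionRegistry:
    """Hypothetical stand-in for an LLM service's function registry."""

    def __init__(self):
        self._handlers = {}

    def register_function(self, name, handler):
        self._handlers[name] = handler

    async def call(self, name, params):
        # A handler is either an async function or a lambda that returns
        # a coroutine; awaiting the result works in both cases.
        await self._handlers[name](params)


async def terminate_call(params, call_flow_state=None):
    if call_flow_state is not None:
        call_flow_state["terminated"] = True
    params["log"].append("terminate_call")


def demo():
    state = {"terminated": False}
    params = {"log": []}
    registry = ToyFunctionRegistry()
    # Bind `state` at registration time; the registry only supplies `params`
    registry.register_function("terminate_call", lambda p: terminate_call(p, state))
    asyncio.run(registry.call("terminate_call", params))
    return state, params
```

The registry only ever passes one argument, so closing over the state in a lambda is a simple way to give a handler access to shared objects without changing the calling convention.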

Bot Behavior Definition

The bot's intelligence comes from its carefully crafted system prompts.

First, a single prompt handles voicemail detection:

system_instruction = """You are Chatbot trying to determine if this is a voicemail system or a human.

        If you hear any of these phrases (or very similar ones):
        - "Please leave a message after the beep"
        - "No one is available to take your call"
        - "Record your message after the tone"
        - "You have reached voicemail for..."
        - "You have reached [phone number]"
        - "[phone number] is unavailable"
        - "The person you are trying to reach..."
        - "The number you have dialed..."
        - "Your call has been forwarded to an automated voice messaging system"

        Then call the function switch_to_voicemail_response.

        If it sounds like a human (saying hello, asking questions, etc.), call the function switch_to_human_conversation.

        DO NOT say anything until you've determined if this is a voicemail or human.
        
        If you are asked to terminate the call, **IMMEDIATELY** call the `terminate_call` function. **FAILURE TO CALL `terminate_call` IMMEDIATELY IS A MISTAKE.**"""
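
The example relies on the LLM to make this decision from audio. For offline testing of transcripts, a plain substring check against the same phrases can serve as a rough baseline to compare against. The sketch below is that swapped-in technique, not the example's detection method; the function name looks_like_voicemail and the shortened phrase list are assumptions.

```python
# Lowercased fragments of the phrases listed in the prompt above
VOICEMAIL_PHRASES = (
    "leave a message after the beep",
    "no one is available to take your call",
    "record your message after the tone",
    "you have reached voicemail",
    "the person you are trying to reach",
    "the number you have dialed",
    "automated voice messaging system",
)


def looks_like_voicemail(transcript: str) -> bool:
    """Return True if the transcript contains a known voicemail phrase."""
    # Lowercase and collapse whitespace so matching is not layout-sensitive
    text = " ".join(transcript.lower().split())
    return any(phrase in text for phrase in VOICEMAIL_PHRASES)
```

A check like this misses paraphrased greetings entirely, which is the main reason the example delegates the decision to a model.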

If the bot detects that it has reached voicemail, it calls the voicemail_response function, which passes the following message to result_callback:

async def voicemail_response(self, params: FunctionCallParams):
        """Function the bot can call to leave a voicemail message."""
        message = """You are Chatbot leaving a voicemail message. Say EXACTLY this message and then terminate the call:

                    'Hello, this is a message for Pipecat example user. This is Chatbot. Please call back on 123-456-7891. Thank you.'"""

        await params.result_callback(message)

If the bot detects that it is talking to a human, it closes the current pipeline task and moves on to the next one, which uses the more capable Gemini Flash model. Because this model is more expensive, we stop sending it audio and give it text to work with instead.
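
The two-stage handoff can be pictured as two coroutines run in sequence, with the second one skipped when the first reports voicemail. The sketch below is a toy model of that control flow only. All names are hypothetical, and the real example runs Pipecat pipeline tasks rather than these stand-in functions.

```python
import asyncio


async def run_detection_stage(state, detected):
    # Stand-in for the first pipeline task (cheaper model, audio input)
    state["stages"].append("detection")
    state["outcome"] = detected


async def run_human_stage(state):
    # Stand-in for the second pipeline task (more capable model, text input)
    state["stages"].append("human_conversation")


async def run_call(detected):
    """Run the stages in order and return the names of the stages that ran."""
    state = {"stages": [], "outcome": None}
    await run_detection_stage(state, detected)
    # Only start the second stage when a human was detected
    if state["outcome"] == "human":
        await run_human_stage(state)
    return state["stages"]
```

Keeping the stages separate means the more expensive model is only started for calls that actually reach a person.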

human_conversation_system_instruction = """You are Chatbot talking to a human. Be friendly and helpful.

        Start with: "Hello! I'm a friendly chatbot. How can I help you today?"

        Keep your responses brief and to the point. Listen to what the person says.

        When the person indicates they're done with the conversation by saying something like:
        - "Goodbye"
        - "That's all"
        - "I'm done"
        - "Thank you, that's all I needed"

        THEN say: "Thank you for chatting. Goodbye!" and call the terminate_call function."""

Conclusion

This example demonstrates how to build a sophisticated voicemail detection bot using Pipecat and Daily's WebRTC infrastructure. The bot showcases real-time audio processing, natural language understanding, and automated call handling capabilities.

For more information about the Pipecat framework and Daily’s WebRTC infrastructure, visit our documentation: https://docs.pipecat.ai/getting-started/overview. Join the Pipecat community on Discord or contact us.
