
In this post, we'll walk through setting up a real-time voice agent that can make calls, detect voicemail systems, and handle live conversations using Pipecat and Daily's WebRTC infrastructure.
Intelligent voicemail detection is a critical requirement for voice agents that use telephony. For example, an online lender can use it to streamline outbound verification calls, ensuring that only live customers receive identity verification prompts while voicemails receive tailored callback requests. Appointment scheduling services are another use case: the system can confirm bookings with live recipients while leaving rescheduling instructions on voicemail.
What We're Building
Our voicemail detection bot demonstrates several key capabilities:
- Automated outbound calling using Daily's dial-out feature
- Intelligent voicemail system detection
- Dynamic message handling for both voicemail and live conversations
- Real-time voice processing and natural language understanding
Prerequisites
Before we begin, make sure you have:
- A Daily account with API key
- Google API key
- Cartesia API key
- Python 3.10 or newer
- Basic familiarity with async Python programming
Getting Started
Setting Up the Project
First, clone the Pipecat repository to access the voicemail detection example:
git clone https://github.com/pipecat-ai/pipecat.git
cd pipecat/phone-chatbot/
Configuring Your Environment
- Create and activate a Python virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install the required dependencies (full requirements file coming soon):
pip install -r requirements.txt
Setting Up Environment Variables
Create a .env file in your project directory with your API credentials:
DAILY_API_KEY=your_daily_api_key
GOOGLE_API_KEY=your_google_api_key
CARTESIA_API_KEY=your_cartesia_api_key
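Before starting the bot, it can help to confirm that all three keys are actually set. The following is a minimal sketch using only the standard library. The names `REQUIRED_KEYS` and `missing_keys` are made up for this illustration and are not part of Pipecat, and the check assumes the variables from .env have already been loaded into the process environment.

```python
import os

# The three credentials listed in the .env file above.
REQUIRED_KEYS = ("DAILY_API_KEY", "GOOGLE_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required API keys are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching the real environment.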
Testing the Bot
Initial Testing with Daily Prebuilt
Daily offers REST helpers for creating rooms and managing meeting tokens programmatically. You can find them here. In this example, we'll use the code from bot_runner.py not only to start the bot but also to create rooms and meeting tokens with the properties required to enable dial-out.
Let's run the example without specifying a number to dial out to, which will allow us to test using Daily's Prebuilt interface:
- Open two terminal windows, ensuring both are in the correct folder and using the appropriate Python environment.
- In the first terminal, start the bot runner:
python bot_runner.py --host localhost
- In the second terminal, spin up the bot:
curl -X POST "http://127.0.0.1:7860/start" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "voicemail_detection": {
        "testInPrebuilt": true
      }
    }
  }'
- You should receive a response that looks like this:
{
  "status": "Bot started",
  "bot_type": "voicemail_detection",
  "room_url": "https://bdom.daily.co/WOImXSXvBNBlkwffhVRe"
}
- Take note of the room_url from the response, then paste it into your browser.
- Join the call through the pre-join screen.
- To test the bot, you can either:
  - Simulate a voicemail system (e.g., "Please leave a message after the beep...")
  - Act as a live caller
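The start request from the steps above can also be sent from Python instead of curl. This is a minimal sketch using only the standard library; the helper names `build_start_request` and `start_bot` are made up for this illustration, and the endpoint and payload are copied from the curl command.

```python
import json
import urllib.request

START_URL = "http://127.0.0.1:7860/start"  # same endpoint as the curl command


def build_start_request(test_in_prebuilt=True, url=START_URL):
    """Build (but do not send) the POST request that starts the bot."""
    payload = {"config": {"voicemail_detection": {"testInPrebuilt": test_in_prebuilt}}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_bot(test_in_prebuilt=True):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_start_request(test_in_prebuilt)) as response:
        return json.load(response)
```

With bot_runner.py running locally, `start_bot()["room_url"]` should give the room URL to open in the browser.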
Enabling Dial-out Capabilities
For more details on this step, refer to this guide.
For this demo, dial-out is limited to American and Canadian phone numbers. If you need international dial-out, please contact us at help@daily.co to discuss your requirements on a case-by-case basis.
Steps to Enable Dial-out:
Enable dial-out for your Daily account:
- Add a payment method in the Daily dashboard
- Contact help@daily.co to request dial-out activation
- Provide your company details, use case, as well as your Daily domain or registered email address
- Once you receive confirmation, proceed to Step 2
Purchase a phone number (required for dialing out)
- First, check available phone numbers in your region:
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-daily-api-key" \
  "https://api.daily.co/v1/list-available-numbers?region=CA"
- This command returns a list of purchasable numbers for the specified region
- Next, purchase a number and move on to Step 3:
curl --request POST \
  --url 'https://api.daily.co/v1/buy-phone-number' \
  --header 'Authorization: Bearer your-daily-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "number": "+12097808812"
  }'
Initiate a dial-out call:
- Add the phone number you want to dial to the previous cURL command:
curl -X POST "http://127.0.0.1:7860/start" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "dialout_settings": {
        "phoneNumber": "+12092428393"
      },
      "voicemail_detection": {
        "testInPrebuilt": false
      }
    }
  }'
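Because dial-out in this demo is limited to American and Canadian numbers, it may be worth validating the number before sending the request. The sketch below builds the same payload as the curl command after checking the number against a +1 pattern. The helper name `build_dialout_config` and the regular expression are assumptions made for this illustration, not part of the example code.

```python
import re

# +1, then a 10-digit North American number whose area code and
# exchange code both start with a digit from 2 to 9.
NANP_E164 = re.compile(r"\+1[2-9]\d{2}[2-9]\d{6}")


def build_dialout_config(phone_number):
    """Return the /start payload for a dial-out call, or raise ValueError."""
    if not NANP_E164.fullmatch(phone_number):
        raise ValueError(f"Not a +1 E.164 number: {phone_number!r}")
    return {
        "config": {
            "dialout_settings": {"phoneNumber": phone_number},
            "voicemail_detection": {"testInPrebuilt": False},
        }
    }
```

Rejecting a malformed number locally avoids starting a bot session that cannot place its call.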
Test the bot and voicemail detection:
- The bot will now start up. Once it joins the call, it will begin dialing out.
- You can test voicemail detection as done in previous steps.
Ensure the bot has the necessary permissions:
- To initiate dial-out, the bot must have a token with either the is_owner or canAdmin property set.
- The bot_runner.py script already handles this for you, but if you're building your own app, make sure to include this step.
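If you are building your own app, the owner token mentioned above has to be requested from Daily's REST API. The following sketch only builds the request and does not send it. It assumes the meeting-tokens endpoint and the `room_name`, `is_owner`, and `exp` properties as described in Daily's REST documentation, so check the current docs before relying on it; the helper name `build_owner_token_request` is made up for this illustration.

```python
import json
import time
import urllib.request

DAILY_TOKENS_URL = "https://api.daily.co/v1/meeting-tokens"


def build_owner_token_request(api_key, room_name, ttl_seconds=3600, now=None):
    """Build a request for an owner token that expires after `ttl_seconds`."""
    now = int(time.time()) if now is None else now
    body = {
        "properties": {
            "room_name": room_name,
            "is_owner": True,  # the property required for dial-out
            "exp": now + ttl_seconds,  # expiry as a Unix timestamp
        }
    }
    return urllib.request.Request(
        DAILY_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Giving the token a short expiry limits how long a leaked token could be used to place calls.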
Understanding the Code
Let's examine the key components that make our voicemail detection bot work:
Call Termination
The bot needs to know when to end a call, especially after leaving a voicemail:
async def terminate_call(
    function_name,
    tool_call_id,
    args,
    llm: LLMService,
    context,
    result_callback,
    session_manager=None,
):
    """Function the bot can call to terminate the call."""
    if session_manager:
        # Set call terminated flag in the session manager
        session_manager.call_flow_state.set_call_terminated()
    await llm.queue_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
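The function above only relies on the session manager exposing `call_flow_state.set_call_terminated()`. As a rough illustration of that contract, here is a hypothetical stand-in; the class names and the `call_terminated` attribute are assumptions, and the real session manager in the example may look different.

```python
from dataclasses import dataclass, field


@dataclass
class CallFlowState:
    """Hypothetical stand-in that tracks whether the call has ended."""

    call_terminated: bool = False

    def set_call_terminated(self):
        self.call_terminated = True


@dataclass
class SessionManager:
    """Hypothetical stand-in exposing the attribute terminate_call uses."""

    call_flow_state: CallFlowState = field(default_factory=CallFlowState)


session_manager = SessionManager()
session_manager.call_flow_state.set_call_terminated()
print(session_manager.call_flow_state.call_terminated)  # True
```

Keeping the flag on a shared object lets other parts of the bot check whether the call was ended on purpose.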
AI Configuration
We use Gemini 2.0 Flash-Lite and Gemini 2.0 Flash for natural language processing and decision-making. Flash-Lite, a lighter and much cheaper model, handles voicemail detection, while Flash handles the conversation with a human. The example also shows how to collect audio and send it directly to the LLM.
This is how we register the LLM and the tools:
import functools
import os

# Declare the tools first so they exist when the LLM service is created.
tools = [
    {
        "function_declarations": [
            {
                "name": "terminate_call",
                "description": "Call this function to terminate the call.",
            },
        ]
    }
]
voicemail_detection_llm = GoogleLLMService(
    model="models/gemini-2.0-flash-lite",
    api_key=os.getenv("GOOGLE_API_KEY"),
    system_instruction=system_instruction,
    tools=tools,
)
# Bind session_manager so the handler can record that the call was terminated.
voicemail_detection_llm.register_function(
    "terminate_call", functools.partial(terminate_call, session_manager=session_manager)
)
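The registration above uses `functools.partial` to bind `session_manager` ahead of time, so the handler can still be called with only the six standard arguments. The sketch below shows that mechanism in isolation with a stand-in handler. The six-argument calling convention is inferred from the handler signatures in this post and is not a statement about Pipecat's internals.

```python
import asyncio
import functools


async def handler(
    function_name,
    tool_call_id,
    args,
    llm,
    context,
    result_callback,
    session_manager=None,
):
    """Stand-in with the same parameter list as terminate_call."""
    await result_callback({"function": function_name, "session": session_manager})


async def demo():
    results = []

    async def result_callback(value):
        results.append(value)

    # Bind the extra keyword argument up front.
    bound = functools.partial(handler, session_manager="session-1")
    # The caller supplies only the six standard arguments.
    await bound("terminate_call", "call-0", {}, None, None, result_callback)
    return results


results = asyncio.run(demo())
print(results)
```

This keeps per-call state out of global variables while leaving the handler's calling convention unchanged.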
Bot Behavior Definition
The bot's intelligence comes from its carefully crafted system prompts.
First, one prompt detects whether a voicemail system or a human answered the call:
system_instruction = """You are Chatbot trying to determine if this is a voicemail system or a human.
If you hear any of these phrases (or very similar ones):
- "Please leave a message after the beep"
- "No one is available to take your call"
- "Record your message after the tone"
- "You have reached voicemail for..."
- "You have reached [phone number]"
- "[phone number] is unavailable"
- "The person you are trying to reach..."
- "The number you have dialed..."
- "Your call has been forwarded to an automated voice messaging system"
Then call the function switch_to_voicemail_response.
If it sounds like a human (saying hello, asking questions, etc.), call the function switch_to_human_conversation.
DO NOT say anything until you've determined if this is a voicemail or human.
If you are asked to terminate the call, **IMMEDIATELY** call the `terminate_call` function. **FAILURE TO CALL `terminate_call` IMMEDIATELY IS A MISTAKE.**"""
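The prompt refers to three functions by name, so each of them needs a declaration the model can see. A declaration list covering all three might look like the sketch below. Only the function names come from the prompt; the `declare` helper and the two new descriptions are assumptions made for illustration and are not taken from the example repository.

```python
def declare(name, description):
    """Build one parameterless function declaration."""
    return {"name": name, "description": description}


# Descriptions are illustrative; only the function names come from the prompt.
detection_tools = [
    {
        "function_declarations": [
            declare(
                "switch_to_voicemail_response",
                "Call this function when a voicemail system is detected.",
            ),
            declare(
                "switch_to_human_conversation",
                "Call this function when a human answers the call.",
            ),
            declare(
                "terminate_call",
                "Call this function to terminate the call.",
            ),
        ]
    }
]

declared_names = [d["name"] for d in detection_tools[0]["function_declarations"]]
print(declared_names)
```

Each declared name also needs a registered handler, in the same way `terminate_call` is registered.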
If the bot detects that it has reached voicemail, the voicemail_response function runs and passes the following message to result_callback:
async def voicemail_response(
    self,
    function_name,
    tool_call_id,
    args,
    llm: LLMService,
    context,
    result_callback,
):
    """Function the bot can call to leave a voicemail message."""
    message = """You are Chatbot leaving a voicemail message. Say EXACTLY this message and then terminate the call:
    'Hello, this is a message for Pipecat example user. This is Chatbot. Please call back on 123-456-7891. Thank you.'"""
    await result_callback(message)
If the bot detects that it is talking to a human, it closes the current pipeline task and moves on to the next one, which uses the more capable Gemini 2.0 Flash model. Because this model is more expensive, we stop sending it audio and give it text to work with instead.
human_conversation_system_instruction = """You are Chatbot talking to a human. Be friendly and helpful.
Start with: "Hello! I'm a friendly chatbot. How can I help you today?"
Keep your responses brief and to the point. Listen to what the person says.
When the person indicates they're done with the conversation by saying something like:
- "Goodbye"
- "That's all"
- "I'm done"
- "Thank you, that's all I needed"
THEN say: "Thank you for chatting. Goodbye!" and call the terminate_call function."""
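The two-stage flow described above (detect first, then start a second stage only for a human) can be sketched with plain asyncio. Everything here is a stand-in: the function names are hypothetical, and simple keyword matching replaces the Flash-Lite model purely so the sketch runs on its own. It is not how the example performs detection.

```python
import asyncio


async def run_voicemail_detection(first_utterance):
    """Stand-in for the first pipeline task; returns what answered the call."""
    # Keyword matching stands in for the LLM's decision in this sketch.
    markers = ("leave a message", "after the beep", "after the tone")
    text = first_utterance.lower()
    return "voicemail" if any(marker in text for marker in markers) else "human"


async def run_human_conversation():
    """Stand-in for the second pipeline task."""
    return "human conversation finished"


async def handle_call(first_utterance):
    """Run detection first; start the second stage only for a human."""
    outcome = await run_voicemail_detection(first_utterance)
    if outcome == "voicemail":
        return "voicemail left, call terminated"
    return await run_human_conversation()


voicemail_result = asyncio.run(handle_call("Please leave a message after the beep."))
human_result = asyncio.run(handle_call("Hello, who is this?"))
print(voicemail_result)
print(human_result)
```

Running the stages one after the other means the more expensive model is only started when a person is actually on the line.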
Conclusion
This example demonstrates how to build a sophisticated voicemail detection bot using Pipecat and Daily's WebRTC infrastructure. The bot showcases real-time audio processing, natural language understanding, and automated call handling capabilities.
For more information about the Pipecat framework and Daily's WebRTC infrastructure, visit our documentation at https://docs.pipecat.ai/getting-started/overview. You can also join the Pipecat community on Discord or contact us.