Daily Bots: Build Real-Time Voice, Vision, and Video AI Agents

Today we’re sharing Daily Bots, a hosted AI bots platform.

Developers can ship voice-to-voice AI with any LLM; build with Open Source SDKs; and run on Daily’s real-time global infrastructure:

  • Create AI agents that talk naturally.
  • Design voice-to-voice AI flexibly, with leading commercial and open models. We’ve partnered with Anthropic, Cartesia, Deepgram, and Together AI. You can also use any LLM that supports OpenAI-compatible APIs.
  • Build ultra low latency experiences for desktop, mobile, and telephone.
  • Use the leading Open Source tooling for voice-to-voice and multimodal video AI. Daily Bots implements the RTVI standard for real-time inference, and is built on the Pipecat server-side framework.
  • Launch quickly and scale on Daily’s global WebRTC infrastructure.

In this post, we’ll talk about why we built Daily Bots and what it does; how we're excited to work with our partners; and also some of the fun demos you can play with.

If you’d like to jump straight in, here are docs and demos — our playground demo with configurable LLMs; function calling demo and vision with Anthropic; and iOS and Android. Sign up here (with a $10 credit during launch week).

Why Daily Bots

At Daily, we’ve been building real-time audio and video infrastructure since 2016. Our customers have been developers building conversational experiences – it started with people talking to each other.

Now, with generative AI, the definition of conversational experiences has expanded. Today’s Large Language Models are very good at open-ended conversations. They can follow scripts and perform multi-step tasks. They can call out to external systems and APIs.

Voice-driven LLM workflows are starting to have a big impact in healthcare and education. LLMs are improving the customer support experience and enterprise workflows. Virtual characters will transform video games and entertainment. And this is just the start of the impact of AI.

Building experiences in which humans can have useful, natural, real-time conversations with AI models involves:

  • Choosing and writing code for the right generative AI models for your specific use case.
  • Orchestrating the human -> AI -> human conversation loop, incorporating prompting, state management, data flow between models, and calling out to external systems.
  • Standing up both audio/video infrastructure and AI/orchestration infrastructure – service discovery, routing, autoscaling, fault tolerance, observability.
  • Having good client SDKs for all the platforms you need to support.

Over the past year and a half, as we’ve been helping our customers stand up new AI-powered real-time features, we’ve put together a complete set of tools that check all the boxes above.

We’ve rolled these tools and best practices into two big Open Source projects: Pipecat for server-side AI orchestration and the RTVI open standard for real-time inference clients. These are truly vendor neutral efforts, with a growing community and contributors from a wide range of stakeholders.

Now we’re filling another gap in the voice-to-voice and real-time ecosystem with Daily Bots.

  • Daily Bots lets you run your RTVI/Pipecat AI agents end-to-end on Daily’s infrastructure.
  • Start a real-time AI session with a single call to /api/v1/bots/start. Launch fast. Scale without limits. If your needs evolve beyond Daily Bots, you can take your code to another platform or stand up your own infrastructure.

AI that talks naturally

Human conversation is complicated!

We interrupt each other. We know when someone finishes speaking and expects us to talk. We change topics and go off on tangents.

And, most of all, we almost always respond quickly. Long pauses make conversations feel so unnatural that most people will just opt out. It’s critical to have voice-to-voice response times faster than 1 second. (Faster than 800ms is better!)

Daily Bots implements best practices for all of the hard, low-level challenges that voice AI product teams face. With a few lines of code, developers can leverage:

  • A modular architecture that enables easy switching between different LLMs and voice models. Use state-of-the-art LLMs with large parameter counts where needed. Or use models optimized for conversational response times.
  • Multi-turn context management, with tool calling and vision input.
  • Voice-to-voice response times as low as 500ms.
  • Interruption handling with word-level context accuracy.
  • Phrase endpointing that combines voice activity detection, semantic cues, and noise-level averaging.
  • Echo cancellation and background noise reduction.
  • Metrics and observability down to the level of individual media streams from every session.

Flexibility to use the best models, and the best models for your use case

Daily Bots developers can use both commercial and open models. You can use our integrated LLMs, or "Bring Your Own (API) Key" (BYOK) for your preferred service.

We’ve directly integrated with Anthropic, Cartesia, Deepgram, and Together AI. 

  • Anthropic’s Claude 3.5 Sonnet is an excellent multi-turn conversational model. Daily Bots includes support for Sonnet’s vision input, tool calling, and the brand new context caching feature.
  • Cartesia’s Sonic voice model has raised the bar for voice quality at extremely low latencies. Cartesia offers a wide range of excellent voices, plus the ability to create your own voices.
  • Deepgram is a long-time Daily partner and the long-time leader in real-time speech to text accuracy and multi-language support.
  • Together AI delivers fast, high quality inference for all three sizes of Meta’s Llama 3.1 LLMs: 8B, 70B, and 405B.  

With all of these partners, we do consolidated billing. You get just one bill from Daily, with line items showing your usage of each model. Also, it’s likely that you will benefit from higher rate limits and lower pricing when you use our partners’ services through Daily Bots. See Daily Bots pricing here.

Of course, you can always BYOK for both our partners and other services.

We can support any LLM provider that offers OpenAI-compatible APIs. We work regularly with OpenAI, Groq, and Fireworks, for example.

If you need custom models, our partners offer fine-tuned models and inference services for enterprise customers.

Daily Bots infrastructure can also be deployed inside your Virtual Private Cloud. If you manage your own inference, co-locating orchestration compute with inference has latency, cost, and compliance benefits.

Build now, and for the future

Our goal with Daily Bots is to accelerate the development of real-time, multimodal AI.

With a few lines of code, configure bots that scale on demand, on Daily’s infrastructure, automatically keeping pace as your application’s usage increases.

Write clients for iOS, Android, and the Web using the RTVI Open Source SDKs and Daily Bots helper libraries.

Buy phone numbers from Daily and make your bots accessible via dial-in.

All of this runs on Daily’s Global Mesh Network. Our distributed points of presence deliver 13ms first-hop latency to 5 billion people on six continents. (A little more, on average, if you happen to be in Antarctica.)

It’s also worth noting that Daily Bots is only one of your options, if you’re building real-time AI agents on the Open Source toolkits we use at Daily.

Definitely go check out Vapi and Tavus, for example. They’ve developed specialized technology, and best practices, to support different applications of multimodal inference. Vapi has great voice APIs, with user-friendly dashboards and excellent telephony support. Tavus’s Conversational Video Interface powers AI apps that can speak, hear, and see naturally. We’re proud these innovative platforms also leverage Daily’s WebRTC infrastructure.

If you’re interested in real-time AI, you can leverage Tavus or Vapi; build on the Daily Bots Open Source cloud; or strike out on your own and stand up your own Pipecat-based infrastructure!

Demos, demos, demos & starting out

We’ve had a ton of fun building out Daily Bots.


AI is moving fast! Check out Vapi and Tavus. Join the Daily community on Discord. Let us know if you find Daily Bots, RTVI, and Pipecat useful. We’re excited to build the future with you.

Never miss a story

Get the latest direct to your inbox.