In this guide, we’ll use Pipecat – an open-source framework for building conversational and multimodal AI agents – to set up a real-time AI voice agent, and interact with it using an Android app running the Pipecat Client library.
The server and client will interact using the RTVI protocol, sending audio (and video) data over WebRTC. This ensures low latency and a stable session, even on a mobile data connection. RTVI support is built into the Pipecat framework.
Running the bot backend
To get started quickly, we can use one of the ready-made example projects on GitHub. First, clone the Pipecat repository:
git clone https://github.com/pipecat-ai/pipecat.git
We’re going to be using the simple-chatbot example server, located in the following directory:
cd pipecat/examples/simple-chatbot/server/
Inside that directory, set up a Python virtual environment and install the necessary dependencies:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
You’ll need API keys for Gemini (to run the LLM) and Daily (for the WebRTC transport).
- Gemini: get an API key using Google AI Studio
- Daily: create an account to get 10,000 free WebRTC minutes every month, then access your API key in the dashboard
Create a new file in the simple-chatbot/server directory called .env, containing your API keys:
DAILY_API_KEY=7df...
GEMINI_API_KEY=AIza...
BOT_IMPLEMENTATION=gemini
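The server reads these values from its environment at startup (the .env file is loaded for it, for example via python-dotenv). As a rough sketch of what that lookup looks like — the load_settings helper below is purely illustrative, not part of Pipecat's API:

```python
import os

def load_settings() -> dict:
    """Hypothetical sketch: read the server's config from the environment,
    failing fast if a required key is missing."""
    required = ["DAILY_API_KEY", "GEMINI_API_KEY"]
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {
        "daily_api_key": os.environ["DAILY_API_KEY"],
        "gemini_api_key": os.environ["GEMINI_API_KEY"],
        # Falls back to the Gemini bot when unset
        "bot_implementation": os.getenv("BOT_IMPLEMENTATION", "gemini"),
    }
```

If a key is missing or misspelled in your .env file, failing at startup like this is much easier to debug than an authentication error mid-session.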
Finally, run the server as follows:
python server.py
Make the backend accessible to your Android device
The server is now listening for RTVI connections on http://localhost:7860. You can also navigate to that URL in a web browser to try out the bot.
This URL is only accessible on your local machine. To allow your Android device to connect to it, you have a few options.
Option 1: adb reverse
With your device connected over ADB, run the following command:
adb reverse tcp:7860 tcp:7860
Now, an app connecting to localhost:7860 on the Android device will have the connection forwarded to the server on your PC.
Option 2: ngrok
If you use the third-party tool ngrok, the following command will make your server publicly available on the internet:
ngrok http --domain=<you>.ngrok.io 7860
(replacing <you> with your ngrok subdomain)
Option 3: configure local firewall
Alternatively, configure the firewall on your PC to allow remote connections to port 7860.
Make a note of your RTVI connection URL
Depending on which of the three options you chose above, make a note of the relevant RTVI connection URL:
- http://localhost:7860 if using adb reverse
- https://<you>.ngrok.io if using ngrok (replacing <you> as needed)
- http://192.168.1.100:7860 if configuring your firewall (replacing the IP with your LAN IP)
Creating an Android client
Create a new project inside Android Studio, with the type “Empty Activity”. This will create a basic Android app which uses Jetpack Compose for the UI.
Once the initial Gradle sync is complete, add the Pipecat client (with Daily WebRTC transport) to your app/build.gradle.kts file. We’ll also be using the Accompanist library to manage microphone permissions:
implementation("ai.pipecat:daily-transport:0.3.1")
implementation("com.google.accompanist:accompanist-permissions:0.35.1-alpha")
Re-sync the Gradle project using File > Sync Project with Gradle Files.
We’ll need to declare the internet and audio permissions in our AndroidManifest.xml. Put this inside the <manifest> tag body:
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
Also inside AndroidManifest.xml, if you didn’t choose the ngrok option above, we should declare that we’ll be connecting to the RTVI backend over cleartext HTTP. Note that in production you should ensure connections are encrypted with HTTPS, but for development we can add the following property to the <application> tag:
<application
android:usesCleartextTraffic="true"
...
>
Now, go to MainActivity.onCreate() and replace the code inside the Scaffold with the following:
val context = LocalContext.current

// Display a user-readable connection status message
var status by remember { mutableStateOf("Loading...") }

// Keeps track of whether we have the mic permission
val micPermissionState: PermissionState =
    rememberPermissionState(Manifest.permission.RECORD_AUDIO)

val recordAudioLauncher = rememberLauncherForActivityResult(
    contract = ActivityResultContracts.RequestMultiplePermissions(),
    onResult = { _ -> }
)

// If we have permission to access the mic, begin the RTVI connection
if (micPermissionState.status.isGranted) {
    LaunchedEffect(Unit) {
        status = "Connecting..."

        // Callbacks for RTVI events. Override other functions for events
        // like transcriptions, audio level changes, and more.
        val callbacks = object : RTVIEventCallbacks() {
            override fun onBackendError(message: String) {
                status = "Error: $message"
            }

            override fun onBotReady(
                version: String,
                config: List<ServiceConfig>
            ) {
                status = "Bot ready"
            }

            override fun onDisconnected() {
                status = "Disconnected"
            }
        }

        // Create the RTVI client with the default options, and
        // our connection URL
        val client = RTVIClient(
            transport = DailyTransport.Factory(context),
            callbacks = callbacks,
            options = RTVIClientOptions(
                params = RTVIClientParams(
                    baseUrl = "http://localhost:7860"
                )
            )
        )

        // Connect to the bot
        client.connect().await()
    }
} else {
    // If we don't have permission, request it from the user
    LaunchedEffect(Unit) {
        recordAudioLauncher.launch(arrayOf(Manifest.permission.RECORD_AUDIO))
        status = "Waiting for permission..."
    }
}

// Display the connection status
Text(
    modifier = Modifier.padding(innerPadding).padding(20.dp),
    text = status,
)
Replace the URL http://localhost:7860 above with your RTVI connection URL from the previous section, if you didn’t go with the adb reverse approach.
The code above creates and starts an RTVIClient, connecting to the bot backend, and displays the connection status on the screen. It also requests the RECORD_AUDIO permission from the user, which allows us to access the microphone.
Run the app
Run the app on your device. You should now be able to have a two-way voice conversation with the bot!
Next steps
Now that you have a basic chatbot, try customizing it with some of the following changes inside simple-chatbot/server/bot-gemini.py:
- Change the voice of the assistant by altering the constructor arguments in GeminiMultimodalLiveLLMService()
- Alter the system prompt by modifying the messages array
- Take a look at our function calling guide, which will allow the LLM to interact with external services and APIs
For more information, please take a look at our Pipecat documentation: https://docs.pipecat.ai/getting-started/overview