Build a voice agent for Android with Gemini Multimodal Live

In this guide, we’ll use Pipecat – an open-source framework for building conversational and multimodal AI agents – to set up a real-time AI voice agent, and interact with it using an Android app running the Pipecat Client library.

The server and client will communicate using the RTVI protocol, sending audio (and video) data over WebRTC. This ensures low latency and a stable session, even on a mobile data connection. RTVI support is built into the Pipecat framework.

Running the bot backend

To get started quickly, we can use one of the ready-made example projects on GitHub. First, clone the Pipecat repository:

git clone https://github.com/pipecat-ai/pipecat.git

We’re going to be using the simple-chatbot example server, located in the following directory:

cd pipecat/examples/simple-chatbot/server/

Inside that directory, set up a virtual Python environment, and install the necessary dependencies:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

You’ll need API keys for Gemini (to run the LLM) and Daily (for the WebRTC transport).

Create a new file in the simple-chatbot/server directory called .env, with your API keys:

DAILY_API_KEY=7df...
GEMINI_API_KEY=AIza...
BOT_IMPLEMENTATION=gemini

Finally, run the server as follows:

python server.py

Make the backend accessible to your Android device

The server is now listening for RTVI connections on http://localhost:7860. You can also navigate to that URL in a web browser to try out the bot.
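Before moving to the device, you can sanity-check the server from a terminal (assuming it's still running in the other window):

```shell
# Should return the demo web page served by the bot backend
curl -i http://localhost:7860/
```

A 200 response here confirms the backend is up; the Android client will connect to this same base URL.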

This URL is only accessible on your local machine. To allow your Android device to connect to it, you have a few options.

Option 1: adb reverse

With your device connected over ADB, run the following command:

adb reverse tcp:7860 tcp:7860

Now, an app connecting to localhost:7860 on the Android device will have the connection forwarded to the server on your PC.

Option 2: ngrok

If you use the third party tool ngrok, the following command will make your server available publicly on the internet:

ngrok http --domain=<you>.ngrok.io 7860

(replacing <you> with your ngrok subdomain)

Option 3: configure local firewall

Alternatively, configure the firewall on your PC to allow remote connections to port 7860.

Make a note of your RTVI connection URL

Depending on which of the three options you chose above, make a note of the relevant RTVI connection URL:

  • adb reverse: http://localhost:7860
  • ngrok: https://<you>.ngrok.io
  • Local firewall: http://<your PC's IP address>:7860

Creating an Android client

Create a new project inside Android Studio, with the type “Empty Activity”. This will create a basic Android app which uses Jetpack Compose for the UI.

Once the initial Gradle sync is complete, add the Pipecat client (with Daily WebRTC transport) to your app/build.gradle.kts file. We’ll also be using the Accompanist library to manage microphone permissions:

implementation("ai.pipecat:daily-transport:0.3.1")
implementation("com.google.accompanist:accompanist-permissions:0.35.1-alpha")

Re-sync the Gradle project using File > Sync Project with Gradle Files.

We’ll need to declare the internet and audio permissions in our AndroidManifest.xml. Put this inside the <manifest> tag body.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

Also inside AndroidManifest.xml, if you didn't choose the ngrok option above, you'll need to declare that the app may connect to the RTVI backend over cleartext HTTP. Note that in production, you should ensure that connections are encrypted with HTTPS. For development, we can add the following property to the <application> tag:

<application
        android:usesCleartextTraffic="true"
        ...
>
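If you'd rather not allow cleartext traffic app-wide, Android's network security configuration can scope it to just your development host instead. A sketch, assuming the server is reached as localhost via adb reverse:

```xml
<!-- res/xml/network_security_config.xml -->
<network-security-config>
    <domain-config cleartextTrafficPermitted="true">
        <domain includeSubdomains="false">localhost</domain>
    </domain-config>
</network-security-config>
```

Then reference it from the <application> tag with android:networkSecurityConfig="@xml/network_security_config" instead of android:usesCleartextTraffic.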

Now, go to MainActivity.onCreate() and replace the code inside the Scaffold with the following:

val context = LocalContext.current

// Display a user-readable connection status message
var status by remember { mutableStateOf("Loading...") }

// Keeps track of whether we have the mic permission
val micPermissionState: PermissionState =
    rememberPermissionState(Manifest.permission.RECORD_AUDIO)

val recordAudioLauncher = rememberLauncherForActivityResult(
    contract = ActivityResultContracts.RequestMultiplePermissions(),
    onResult = { permissionsMap -> /* result is observed via micPermissionState */ }
)

// If we have permission to access the mic, begin the RTVI connection
if (micPermissionState.status.isGranted) {
    LaunchedEffect(Unit) {

        status = "Connecting..."

        // Callbacks for RTVI events. Override other functions for events
        // like transcriptions, audio level changes, and more.
        val callbacks = object : RTVIEventCallbacks() {
            override fun onBackendError(message: String) {
                status = "Error: $message"
            }

            override fun onBotReady(
                version: String,
                config: List<ServiceConfig>
            ) {
                status = "Bot ready"
            }

            override fun onDisconnected() {
                status = "Disconnected"
            }
        }

        // Create the RTVI client with the default options, and
        // our connection URL
        val client = RTVIClient(
            transport = DailyTransport.Factory(context),
            callbacks = callbacks,
            options = RTVIClientOptions(
                params = RTVIClientParams(
                baseUrl = "http://localhost:7860"
                )
            )
        )

        // Connect to the bot
        client.connect().await()
    }
} else {
    // If we don't have permission, request it from the user
    LaunchedEffect(Unit) {
        recordAudioLauncher.launch(arrayOf(Manifest.permission.RECORD_AUDIO))
        status = "Waiting for permission..."
    }
}

// Display the connection status
Text(
    modifier = Modifier.padding(innerPadding).padding(20.dp),
    text = status,
)

Replace the URL http://localhost:7860 above with your RTVI connection URL from the previous section, if you didn’t go with the adb reverse approach.

The code above creates and starts an RTVIClient, connecting to the bot backend, and displays the connection status on the screen. It also requests the RECORD_AUDIO permission from the user, which allows us to access the microphone.
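The callbacks object can also surface more of the conversation than just connection state. As a sketch (the method and field names here are assumptions based on the RTVIEventCallbacks class above — check the Pipecat client API reference for the exact signatures), you might display live transcripts of the user's speech:

```kotlin
// Hypothetical extension of the callbacks above -- onUserTranscript and
// the Transcript fields are assumptions; verify against the Pipecat
// client API docs for your SDK version.
val callbacks = object : RTVIEventCallbacks() {
    override fun onBackendError(message: String) {
        status = "Error: $message"
    }

    // Show what the user said, as transcribed by the backend
    override fun onUserTranscript(data: Transcript) {
        if (data.final) {
            status = "You said: ${data.text}"
        }
    }
}
```

In a real app you'd also want to disconnect and release the client when the user leaves the screen, for example from an onDispose handler in a DisposableEffect.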

Run the app

Now run the app on your device. You should be able to have a two-way voice conversation with the bot!

Next steps

Now that you have a basic chat bot, try customizing it with some of the following changes inside simple-chatbot/server/bot-gemini.py:

  • Change the voice of the assistant by altering the constructor arguments in GeminiMultimodalLiveLLMService()
  • Alter the system prompt by modifying the messages array
  • Take a look at our function calling guide, which will allow the LLM to interact with external services and APIs
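For example, the voice change is a one-line tweak. A sketch, assuming the service accepts a voice_id argument (as described in the Pipecat docs for Gemini Multimodal Live — check the constructor signature in your installed version):

```python
# In simple-chatbot/server/bot-gemini.py -- voice_id is an assumption;
# Gemini Multimodal Live voices include "Puck", "Charon", "Kore",
# "Fenrir", and "Aoede".
llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GEMINI_API_KEY"),
    voice_id="Kore",
)
```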

For more information, please take a look at our Pipecat documentation: https://docs.pipecat.ai/getting-started/overview
