[AI] Implement RealtimeInputConfig and Manual Activity Signals for Live API#8080

Open
milaGGL wants to merge 5 commits into main from mila-realtime-input-config

milaGGL (Contributor) commented Apr 30, 2026

Add support for RealtimeInputConfig to configure voice activity detection and manual activity signals in the Live API.

github-actions Bot (Contributor) commented Apr 30, 2026

📝 PRs merging into main branch

Our main branch should always be in a releasable state. If you are working on a larger change, or if you don't want this change to see the light of day just yet, consider using a feature branch first, and only merge into the main branch when the code is complete and ready to be released.

milaGGL (Contributor Author) commented Apr 30, 2026

/gemini review

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces comprehensive support for realtime input configuration in the Live API, specifically adding LiveRealtimeInputConfig and LiveActivityDetection. These additions allow for fine-grained control over how the model detects user activity, handles interruptions, and defines turn coverage. The PR includes new public methods sendStartActivityRealtime and sendStopActivityRealtime for manual activity signaling, along with corresponding updates to the Java-friendly LiveSessionFutures and internal serialization logic. Comprehensive unit and instrumentation tests have been added to verify the new configuration options. Review feedback focuses on minor documentation refinements to ensure consistent KDoc linking syntax and improved readability in the LiveSession class.

Comment thread ai-logic/firebase-ai/src/main/kotlin/com/google/firebase/ai/type/LiveSession.kt Outdated
Comment thread ai-logic/firebase-ai/src/main/kotlin/com/google/firebase/ai/type/LiveSession.kt Outdated
@milaGGL milaGGL changed the title [AI] support realtime input config [AI] Implement RealtimeInputConfig and Manual Activity Signals for Live API Apr 30, 2026
*
* Only required when automatic activity detection is disabled via [LiveRealtimeInputConfig].
*/
public suspend fun sendStartActivityRealtime() {
Contributor Author

The function name is not decided yet; the same applies to sendStopActivityRealtime.

* starting and ending of the activity.
*/
@PublicPreviewAPI
public class LiveRealtimeInputConfig
Contributor Author

The class name is not decided yet; it could be changed to RealtimeInputConfig.

gotTurnComplete = true
}
// Stop collecting when there's a new handle AND turnComplete is true
!(gotTurnComplete && lastResumptionUpdate?.newHandle != null)
Contributor Author

This fixes an early return in takeWhile when it receives a lastResumptionUpdate with a null newHandle.
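To make the predicate in the hunk above concrete, here is a minimal sketch using a plain Kotlin Sequence rather than the SDK's actual stream. The Msg type and the collectUntilDone helper are hypothetical and exist only for illustration; only the stop condition itself comes from the diff.

```kotlin
// Hypothetical message type for illustration; not the SDK's actual class.
data class Msg(
    val turnComplete: Boolean = false,
    val isResumptionUpdate: Boolean = false,
    val newHandle: String? = null,
)

// Hypothetical helper: collects messages until turnComplete has been seen
// AND the latest resumption update carries a non-null handle.
fun collectUntilDone(messages: Sequence<Msg>): List<Msg> {
    var gotTurnComplete = false
    var lastHandle: String? = null
    return messages.takeWhile { msg ->
        if (msg.turnComplete) gotTurnComplete = true
        if (msg.isResumptionUpdate) lastHandle = msg.newHandle
        // A resumption update with a null handle must not end collection.
        !(gotTurnComplete && lastHandle != null)
    }.toList()
}
```

With an input of a null-handle resumption update, a turnComplete message, and then a resumption update carrying a handle, collection continues past the first two messages and stops at the third. Note that takeWhile excludes the element on which the predicate first returns false.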

Contributor

If the takeWhile bug starts breaking CI we can spin a different PR just with this fix.

Contributor Author

I have removed the test fix from this PR.

)

@Serializable
internal data class Internal(
Contributor Author

toInternal() and Internal are a bit long. Open to suggestions.

Contributor

You can make Sensitivity have its own Internal class and dedup here.

* Only required when automatic activity detection is disabled via [LiveRealtimeInputConfig].
*/
public suspend fun sendStartActivityRealtime() {
sendFrame(BidiGenerateContentRealtimeInputSetup(activityStart = true).toInternal())
milaGGL (Contributor Author) commented Apr 30, 2026

activity_start and activity_end are each expected to be an object with no fields: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/reference/rpc/google.cloud.aiplatform.v1beta1#activitystart

These two fields are set as booleans in BidiGenerateContentRealtimeInputSetup and serialized to {} later.

Open to code optimization here.

I did consider using a sealed interface instead, like this:

internal sealed interface BidiRealtimeInput {
  data class Text(val text: String) : BidiRealtimeInput
  data class Audio(val data: InlineData) : BidiRealtimeInput
  data class Video(val data: InlineData) : BidiRealtimeInput
  data class MediaStream(val chunks: List<InlineData>) : BidiRealtimeInput
  object ActivityStart : BidiRealtimeInput
  object ActivityEnd : BidiRealtimeInput
}

and serialization works like this:

internal class BidiGenerateContentRealtimeInputSetup(val input: BidiRealtimeInput) {
  @Serializable internal class ActivityStart
  @Serializable internal class ActivityEnd
  @Serializable
  internal class Internal(val realtimeInput: BidiGenerateContentRealtimeInput) {
    @Serializable
    internal data class BidiGenerateContentRealtimeInput(
      val mediaChunks: List<InlineData.Internal>? = null, ... 
    )
  }
  fun toInternal() = Internal(
    when (input) {
      is BidiRealtimeInput.Text -> Internal.BidiGenerateContentRealtimeInput(text = input.text)
       ...
      is BidiRealtimeInput.ActivityStart -> Internal.BidiGenerateContentRealtimeInput(activityStart = ActivityStart())
      ...
    }
  )
}

By doing so, we no longer use booleans for activityStart and activityEnd, but we are also limited to mutually exclusive input actions (only one input action, Text OR Audio OR a start signal, can be sent at a time).
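As a self-contained illustration of the trade-off described above, the sketch below encodes a reduced version of the sealed interface by hand. The RealtimeInput type and encode function are hypothetical stand-ins, the JSON field names are assumed from the snippet above, and the hand-rolled encoder replaces kotlinx.serialization only to keep the example dependency-free.

```kotlin
// Hypothetical, reduced stand-in for the sealed interface proposed above.
sealed interface RealtimeInput {
    data class Text(val text: String) : RealtimeInput
    object ActivityStart : RealtimeInput
    object ActivityEnd : RealtimeInput
}

// Hand-rolled encoder for illustration only; it does not escape special
// characters in text and is not how the SDK serializes messages.
fun encode(input: RealtimeInput): String {
    val field = when (input) {
        is RealtimeInput.Text -> "\"text\":\"${input.text}\""
        // Activity signals are encoded as objects with no fields.
        RealtimeInput.ActivityStart -> "\"activityStart\":{}"
        RealtimeInput.ActivityEnd -> "\"activityEnd\":{}"
    }
    return "{\"realtimeInput\":{$field}}"
}
```

Because each call takes exactly one RealtimeInput, a message can carry text or an activity signal but never both, which is the mutual exclusivity noted in the comment.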

Contributor

If encoding works, and given that this is internal, feel free to go that way!

@milaGGL milaGGL marked this pull request as ready for review May 1, 2026 14:39
) {

/** How sensitive the model interprets speech activity. */
public enum class Sensitivity {
Contributor

Public API shouldn't use enums.
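The review does not say what should replace the enum. One common Kotlin alternative is a class with a private constructor and a fixed set of companion-object constants, sketched below. This is an assumption about the intended direction, not the PR's final API; the HIGH and LOW constants and the wireName property are hypothetical.

```kotlin
// Hypothetical sketch of an enum-free public type; names are assumptions.
class Sensitivity private constructor(val wireName: String) {
    // Value-based equality so instances compare like enum constants would.
    override fun equals(other: Any?): Boolean =
        other is Sensitivity && other.wireName == wireName

    override fun hashCode(): Int = wireName.hashCode()

    override fun toString(): String = "Sensitivity($wireName)"

    companion object {
        // Callers can only use these predefined instances.
        val HIGH = Sensitivity("HIGH")
        val LOW = Sensitivity("LOW")
    }
}
```

Callers would write Sensitivity.HIGH much as they would with an enum, while the library stays free to add values later without breaking exhaustive when expressions in client code.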

) {
@Serializable
internal enum class StartSensitivity {
@SerialName("START_SENSITIVITY_UNSPECIFIED") UNSPECIFIED,
Contributor

Not 100% necessary, but if the LiveActivityDetection class is only ever created by devs, there's no need for an "unspecified" case.

) {

/** How a model handles user input activity. */
public enum class ActivityHandling {
Contributor

Public API should have no enums.

}

/** How the model considers which input is included in the user's turn. */
public enum class TurnCoverage {
Contributor

Same here.

2 participants