FLM-Audio is an audio-language sub-version of RoboEgo/FLM-Ego, an omnimodal model with native full duplexity. It simultaneously listens, speaks, and composes an internal monologue, delivering low-latency, duplex conversational responses in both English and Chinese. FLM-Audio is robust to noise and user interruptions, and it prioritizes responsiveness and naturalness.
- Language(s): Chinese, English
Motivation & Survey: Toward Embodied AGI: A Review of Embodied AI and the Road Ahead
FLM-Audio Research Paper: FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
Omnimodal System Card: RoboEgo System Card: An Omnimodal Model with Native Full Duplexity
Despite extensive data cleaning, FLM-Audio may still produce undesired content (e.g., biased or offensive language). Users should not disseminate unsafe outputs. Project authors are not responsible for misuse or harmful consequences.
# install server dependencies, then start the server on port 8990
pip install -r requirements-server.txt
python -m flmaudio.server --port 8990

# install GUI client dependencies, then start the Gradio client
pip install -r requirements-clientgui.txt
python -m flmaudio.client_gradio --url http://localhost:8990

# install CLI client dependencies, then start the command-line client
pip install -r requirements-clientcli.txt
python -m flmaudio.client --url http://localhost:8990

FLM-Audio is licensed under the Apache License 2.0, except for Python code under third_party/moshi, which is licensed under the MIT License.
This project is intended for research use only in compliance with applicable laws. For commercial use, please contact us.
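
As a practical note on the server and client commands above: a client started before the server is listening will fail to connect. The following is a minimal sketch, using only the Python standard library, of a check that waits for the server port to accept TCP connections. The helper name `wait_for_port` is hypothetical and not part of `flmaudio`. An open port also does not necessarily mean the model has finished loading, so treat this as a rough readiness check only.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not part of the flmaudio package.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) on failure
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("localhost", 8990)` before launching `flmaudio.client` would block until the port from the server command is open, or return `False` after 30 seconds. Adjust the port if you passed a different `--port` value.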