1. Project Overview
Project Summary
tkvoice is a voice interaction framework for UBTech's Walker TienKung (EI/VV) robots, combining speech recognition (ASR), large language models (LLM), and text-to-speech (TTS) to enable end-to-end offline speech conversation.
Walker TienKung (EI/VV) is equipped with iFLYTEK’s RK3588 AIUI Multimodal Development Kit. Its audio transmission protocol can be used as a reference.
Repository: https://github.com/UBTECH-Robot/tkvoice
The opentts_public branch integrates ASR, LLM, and TTS capabilities for end-to-end offline speech conversation. Currently, only Chinese is supported.
The content of this document is mainly based on the opentts_public branch, which is a fully offline speech solution. However, the overall workflow also generally applies to the allonline_public branch.
In short, this project is a service framework implementing an ASR → LLM → TTS pipeline. The opentts_public branch builds on this framework and connects to ASR, LLM, and TTS services deployed locally on TienKung. The allonline_public branch is built on the same framework but uses online Microsoft Speech Services for speech processing, while the LLM component can connect to any provider compatible with the OpenAI LLM API (it internally uses the OpenAI Python SDK).
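As a rough illustration of how the allonline_public branch's LLM hookup works, the sketch below sends a request to an OpenAI-compatible chat endpoint. This is not the project's actual code: the helper names, endpoint path handling, and system prompt are placeholders (stdlib `urllib` is used here instead of the OpenAI SDK so the example stays dependency-free).

```python
import json
import urllib.request

def build_chat_payload(user_text: str, model: str) -> dict:
    """Assemble a minimal OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful voice assistant."},
            {"role": "user", "content": user_text},
        ],
    }

def ask_llm(user_text: str, base_url: str, api_key: str, model: str) -> str:
    """POST one turn to any OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(user_text, model)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because only the base URL and model name change, any OpenAI-compatible provider (hosted or self-hosted) can be swapped in.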
To use the allonline_public branch instead, refer to its reference document.
System Architecture
Hardware Deployment
├── x86 Server (192.168.41.1) → ASR Service (Funasr)
├── Orin Board (192.168.41.2) → LLM Service (Ollama) + TTS
└── RK3588s Device → Audio Acquisition Device
Data Flow Process
RK3588s Audio Stream
↓
[tk_audio_publisher] Acquire Complete Sentence Audio
↓ (audio_sentence_frames topic)
[tk_asr_text_publisher] Speech Recognition
↓ (asr_sentence topic)
[tk_audio_process] LLM Understanding + TTS Synthesis
↓
AudioPlayer Playback Output
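The data flow above can be sketched as a plain function chain. The callables here are stand-ins for the actual nodes; in the real system each arrow is a ROS 2 topic and the stages run as separate nodes, so the names below are illustrative, not the project's API.

```python
from typing import Callable

def run_pipeline(
    sentence_audio: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """One conversational turn: sentence audio in, spoken reply out.

    Mirrors tk_asr_text_publisher -> tk_audio_process -> AudioPlayer.
    """
    text = asr(sentence_audio)   # speech recognition on the complete sentence
    reply = llm(text)            # language-model understanding / response
    play(tts(reply))             # synthesize the reply and play it back
    return reply
```

Decoupling the stages behind topics (rather than direct calls as shown here) is what lets each service run on different hardware in the deployment above.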
Core Technology Stack
| Module | Technology | Deployment Location | Port |
|---|---|---|---|
| ASR | Funasr | x86 Server | 10097 |
| LLM | Ollama (qwen2.5:1.5b) | Orin Board | - |
| TTS | Piper-TTS | Orin Board | - |
| Framework | ROS 2 (Python) | Orin Board | - |
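For reference, querying the Ollama server on the Orin board could look like the following. Ollama's HTTP API listens on port 11434 by default; the host address and helper names are assumptions for illustration, not taken from the project's llm_client.

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "qwen2.5:1.5b") -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, host: str = "http://192.168.41.2:11434") -> str:
    """Send one prompt to a running Ollama server and return its reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```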
Project Structure Analysis
Main Directory Overview
tkvoice/
├── src/
│ ├── audio_message/ # Custom ROS Message Definition
│ │ └── msg/
│ │ └── AudioFrame.msg # Audio Frame Message Format
│ └── audio_service/ # Core Service Package
│ ├── setup.py # Python Package Configuration (v0.2.26)
│ ├── package.xml # ROS2 Package Metadata
│ └── audio_service/ # Main Code Directory
│ ├── tk_audio_publisher.py # Audio Publisher
│ ├── tk_asr_text_publisher.py # ASR Text Publisher
│ ├── tk_audio_process.py # Audio Processing + LLM + TTS
│ ├── funasr_client.py # Funasr API Wrapper
│ ├── llm_client.py # Ollama API Wrapper
│ ├── piper_provider.py # TTS Voice Synthesis
│ ├── socket_audio_provider.py # Audio Socket Receiver
│ ├── socket_connector.py # Network Connection Manager
│ ├── log_config.py # Logging Configuration
│ └── utils.py # Utility Functions
├── res/ # Resources and Installation Scripts
│ ├── docker_funasr/ # Funasr Docker Scripts
│ ├── ollama/ # Ollama Installation Scripts
│ └── piper_voices/ # TTS Voice Models
└── build.sh / install.sh # Build and Installation Scripts
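The exact fields of AudioFrame.msg are defined in the repository; a plausible shape for a PCM audio frame message might be the following (every field name here is a guess for illustration; check src/audio_message/msg/AudioFrame.msg for the real definition):

```
# Illustrative only - not the actual AudioFrame.msg from the repository
int32 sample_rate    # e.g. 16000 Hz
int16 channels       # mono = 1
uint8[] data         # raw PCM bytes for this frame
```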
Core Code Modules (8 Main Modules)
| Module | Functionality |
|---|---|
| tk_audio_publisher | Acquires audio stream from RK3588s, publishes complete sentence audio |
| tk_asr_text_publisher | Audio recognition, communicates with Funasr service via WebSocket |
| tk_audio_process | Core processing node: LLM understanding + TTS synthesis + audio playback |
| funasr_client | Funasr WebSocket client wrapper |
| llm_client | Ollama API client wrapper |
| piper_provider | Piper TTS Python library caller |
| socket_audio_provider | Socket audio data receiver |
| socket_connector | Network connection manager |
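The WebSocket exchange handled by funasr_client presumably follows FunASR's published client protocol: an opening JSON config frame, then binary PCM chunks, then a closing `is_speaking: false` message, after which the server returns the recognition result as JSON. The sketch below builds those messages; the field values follow FunASR's example clients and may need adjusting to the deployed server version, and the helper names are illustrative.

```python
import json

def build_start_message(wav_name: str = "tkvoice", sample_rate: int = 16000) -> str:
    """Opening JSON frame of FunASR's websocket protocol (offline mode)."""
    return json.dumps({
        "mode": "offline",        # full-sentence recognition, not streaming 2-pass
        "wav_name": wav_name,
        "wav_format": "pcm",
        "audio_fs": sample_rate,  # sample rate of the raw PCM that follows
        "is_speaking": True,
    })

def chunk_pcm(pcm: bytes, chunk_bytes: int = 3200):
    """Split raw PCM audio into fixed-size chunks for sending as binary frames."""
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]

# Final frame telling the server the utterance is complete.
END_MESSAGE = json.dumps({"is_speaking": False})
```

A client would open ws://192.168.41.1:10097, send build_start_message(), stream each chunk as a binary frame, send END_MESSAGE, and then read the recognition result.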
Open Source Licenses
| Component | License |
|---|---|
| Project Core | Apache-2.0 |
| TTS Module (piper-tts) | GPL-3.0 |
Dependencies and Deployment
System Dependencies
- ROS 2 - Middleware Framework
- Python 3 - Programming Language
- Docker - Container runtime on the x86 server (runs the Funasr service)
Third-Party Services
- Funasr - Speech Recognition Model Service
- Ollama - Large Language Model Inference Framework
- Piper-TTS - Offline Speech Synthesis Library
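For a quick test outside the framework, Piper can also be driven by shelling out to its CLI, which reads text on stdin and writes a WAV file. Note the project itself calls the piper-tts Python library via piper_provider; the model path below is a placeholder and the helper names are illustrative.

```python
import subprocess

def build_piper_command(model_path: str, out_wav: str) -> list[str]:
    """Command line for the piper CLI; the text to speak is piped in via stdin."""
    return ["piper", "--model", model_path, "--output_file", out_wav]

def synthesize(text: str, model_path: str, out_wav: str) -> None:
    """Run piper once, writing the synthesized speech to out_wav."""
    subprocess.run(
        build_piper_command(model_path, out_wav),
        input=text.encode("utf-8"),
        check=True,  # raise if piper exits non-zero
    )
```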
Installation Steps
1. x86 Server: Install Docker + Funasr image
2. Orin Board: Install Ollama + Pull qwen2.5:1.5b model
3. Orin Board: Deploy audio_service ROS package