1. Project Overview
Project Summary
tkvoice is a voice interaction framework for UBTech's Walker TienKung (EI/VV) robots, combining speech recognition (ASR), large language models (LLM), and text-to-speech (TTS) to enable end-to-end offline speech conversation.
Walker TienKung (EI/VV) is equipped with iFLYTEK’s RK3588 AIUI Multimodal Development Kit. Its audio transmission protocol can be used as a reference.
Repository: https://github.com/UBTECH-Robot/tkvoice
The opentts_public branch integrates ASR, LLM, and TTS capabilities for end-to-end offline speech conversation. Currently, only Chinese is supported.
The content of this document is mainly based on the opentts_public branch, which is a fully offline speech solution. However, the overall workflow also generally applies to the allonline_public branch.
In short, this project is a service framework implementing an ASR → LLM → TTS pipeline. The opentts_public branch builds on this framework and connects to ASR, LLM, and TTS services deployed locally on TienKung. The allonline_public branch is built on the same framework but uses online Microsoft Speech Services for speech processing, while the LLM component can connect to any provider compatible with the OpenAI LLM API (it internally uses the OpenAI Python SDK).
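As a rough illustration of how the allonline_public branch's LLM hookup works, the sketch below sends a request to an OpenAI-compatible chat endpoint. This is not the project's actual code: the helper names, endpoint path handling, and system prompt are placeholders (stdlib `urllib` is used here instead of the OpenAI SDK so the example stays dependency-free).

```python
import json
import urllib.request

def build_chat_payload(user_text: str, model: str) -> dict:
    """Assemble a minimal OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful voice assistant."},
            {"role": "user", "content": user_text},
        ],
    }

def ask_llm(user_text: str, base_url: str, api_key: str, model: str) -> str:
    """POST one turn to any OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(user_text, model)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because only the base URL and model name change, any OpenAI-compatible provider (hosted or self-hosted) can be swapped in.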
To use the allonline_public branch instead, refer to its reference document.
System Architecture
Hardware Deployment
├── x86 Server (192.168.41.1) → ASR Service (Funasr)
├── Orin Board (192.168.41.2) → LLM Service (Ollama) + TTS
└── RK3588s Device → Audio Acquisition Device
Data Flow Process
RK3588s Audio Stream
↓
[tk_audio_publisher] Acquire Complete Sentence Audio
↓ (audio_sentence_frames topic)
[tk_asr_text_publisher] Speech Recognition
↓ (asr_sentence topic)
[tk_audio_process] LLM Understanding + TTS Synthesis
↓
AudioPlayer Playback Output
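The data flow above can be sketched as a plain function chain. The callables here are stand-ins for the actual nodes; in the real system each arrow is a ROS 2 topic and the stages run as separate nodes, so the names below are illustrative, not the project's API.

```python
from typing import Callable

def run_pipeline(
    sentence_audio: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """One conversational turn: sentence audio in, spoken reply out.

    Mirrors tk_asr_text_publisher -> tk_audio_process -> AudioPlayer.
    """
    text = asr(sentence_audio)   # speech recognition on the complete sentence
    reply = llm(text)            # language-model understanding / response
    play(tts(reply))             # synthesize the reply and play it back
    return reply
```

Decoupling the stages behind topics (rather than direct calls as shown here) is what lets each service run on different hardware in the deployment above.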
Core Technology Stack
| Module | Technology | Deployment Location | Port |
|---|---|---|---|
| ASR | Funasr | x86 Server | 10097 |
| LLM | Ollama (qwen2.5:1.5b) | Orin Board | - |
| TTS | Piper-TTS | Orin Board | - |
| Framework | ROS 2 (Python) | Orin Board | - |
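For reference, querying the Ollama server on the Orin board could look like the following. Ollama's HTTP API listens on port 11434 by default; the host address and helper names are assumptions for illustration, not taken from the project's llm_client.

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "qwen2.5:1.5b") -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, host: str = "http://192.168.41.2:11434") -> str:
    """Send one prompt to a running Ollama server and return its reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```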
Project Structure Analysis
Main Directory Overview
tkvoice/
├── src/
│ ├── audio_message/ # Custom ROS Message Definition
│ │ └── msg/
│ │ └── AudioFrame.msg # Audio Frame Message Format
│ └── audio_service/ # Core Service Package
│ ├── setup.py # Python Package Configuration (v0.2.26)
│ ├── package.xml # ROS2 Package Metadata
│ └── audio_service/ # Main Code Directory
│ ├── tk_audio_publisher.py # Audio Publisher
│ ├── tk_asr_text_publisher.py # ASR Text Publisher
│ ├── tk_audio_process.py # Audio Processing + LLM + TTS
│ ├── funasr_client.py # Funasr API Wrapper
│ ├── llm_client.py # Ollama API Wrapper
│ ├── piper_provider.py # TTS Voice Synthesis
│ ├── socket_audio_provider.py # Audio Socket Receiver
│ ├── socket_connector.py # Network Connection Manager
│ ├── log_config.py # Logging Configuration
│ └── utils.py # Utility Functions
├── res/ # Resources and Installation Scripts
│ ├── docker_funasr/ # Funasr Docker Scripts
│ ├── ollama/ # Ollama Installation Scripts
│ └── piper_voices/ # TTS Voice Models
└── build.sh / install.sh # Build and Installation Scripts
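The exact fields of AudioFrame.msg are defined in the repository; a plausible shape for a PCM audio frame message might be the following (every field name here is a guess for illustration; check src/audio_message/msg/AudioFrame.msg for the real definition):

```
# Illustrative only - not the actual AudioFrame.msg from the repository
int32 sample_rate    # e.g. 16000 Hz
int16 channels       # mono = 1
uint8[] data         # raw PCM bytes for this frame
```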
Core Code Modules (8 Main Modules)
| Module | Functionality |
|---|---|
| tk_audio_publisher | Acquires audio stream from RK3588s, publishes complete sentence audio |
| tk_asr_text_publisher | Audio recognition, communicates with Funasr service via WebSocket |
| tk_audio_process | Core processing node: LLM understanding + TTS synthesis + audio playback |
| funasr_client | Funasr WebSocket client wrapper |
| llm_client | Ollama API client wrapper |
| piper_provider | Piper TTS Python library caller |
| socket_audio_provider | Socket audio data receiver |
| socket_connector | Network connection manager |
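The WebSocket exchange handled by funasr_client presumably follows FunASR's published client protocol: an opening JSON config frame, then binary PCM chunks, then a closing `is_speaking: false` message, after which the server returns the recognition result as JSON. The sketch below builds those messages; the field values follow FunASR's example clients and may need adjusting to the deployed server version, and the helper names are illustrative.

```python
import json

def build_start_message(wav_name: str = "tkvoice", sample_rate: int = 16000) -> str:
    """Opening JSON frame of FunASR's websocket protocol (offline mode)."""
    return json.dumps({
        "mode": "offline",        # full-sentence recognition, not streaming 2-pass
        "wav_name": wav_name,
        "wav_format": "pcm",
        "audio_fs": sample_rate,  # sample rate of the raw PCM that follows
        "is_speaking": True,
    })

def chunk_pcm(pcm: bytes, chunk_bytes: int = 3200):
    """Split raw PCM audio into fixed-size chunks for sending as binary frames."""
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]

# Final frame telling the server the utterance is complete.
END_MESSAGE = json.dumps({"is_speaking": False})
```

A client would open ws://192.168.41.1:10097, send build_start_message(), stream each chunk as a binary frame, send END_MESSAGE, and then read the recognition result.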
Open Source Licenses
| Component | License |
|---|---|
| Project Core | Apache-2.0 |
| TTS Module (piper-tts) | GPL-3.0 |
Dependencies and Deployment
System Dependencies
- ROS 2 - Middleware Framework
- Python 3 - Programming Language
- Docker - Container runtime on the x86 server (runs the Funasr service)
Third-Party Services
- Funasr - Speech Recognition Model Service
- Ollama - Large Language Model Inference Framework
- Piper-TTS - Offline Speech Synthesis Library
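For a quick test outside the framework, Piper can also be driven by shelling out to its CLI, which reads text on stdin and writes a WAV file. Note the project itself calls the piper-tts Python library via piper_provider; the model path below is a placeholder and the helper names are illustrative.

```python
import subprocess

def build_piper_command(model_path: str, out_wav: str) -> list[str]:
    """Command line for the piper CLI; the text to speak is piped in via stdin."""
    return ["piper", "--model", model_path, "--output_file", out_wav]

def synthesize(text: str, model_path: str, out_wav: str) -> None:
    """Run piper once, writing the synthesized speech to out_wav."""
    subprocess.run(
        build_piper_command(model_path, out_wav),
        input=text.encode("utf-8"),
        check=True,  # raise if piper exits non-zero
    )
```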
Installation Steps
1. x86 Server: Install Docker + Funasr image
2. Orin Board: Install Ollama + Pull qwen2.5:1.5b model
3. Orin Board: Deploy audio_service ROS package