Version: V2.0.5.1

1. Project Overview


Project Summary

tkvoice is a voice interaction framework for UBTech's Walker TienKung (EI/VV) robots. It combines speech recognition (ASR), large language models (LLM), and text-to-speech (TTS) to enable end-to-end offline speech conversation.

info

Walker TienKung (EI/VV) is equipped with iFLYTEK’s RK3588 AIUI Multimodal Development Kit. Its audio transmission protocol can be used as a reference.

Repository: https://github.com/UBTECH-Robot/tkvoice

The opentts_public branch integrates speech recognition (ASR), large language models (LLM), and text-to-speech (TTS) for end-to-end offline speech conversation. Currently, only Chinese is supported.

warning

The content of this document is mainly based on the opentts_public branch, which is a fully offline speech solution. However, the overall workflow also generally applies to the allonline_public branch.

It can be understood this way: this project is a service framework following an ASR → LLM → TTS pipeline. The opentts_public branch is built on this framework and connects to locally deployed ASR, LLM, and TTS services on TienKung. In contrast, the allonline_public branch is also based on the same framework, but uses online Microsoft Speech Services for speech processing, while the LLM component can connect to any provider compatible with the OpenAI LLM API (since it internally uses the OpenAI Python SDK).
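Since both branches plug different providers into the same ASR → LLM → TTS framework, the idea can be sketched as follows. This is an illustrative Python sketch, not the actual tkvoice API: all class and method names are invented, and the provider bodies are stubs standing in for real Ollama or OpenAI-SDK calls.

```python
from abc import ABC, abstractmethod

# Hypothetical provider contract; the real tkvoice interfaces are not
# shown in this document.
class LLMProvider(ABC):
    @abstractmethod
    def chat(self, text: str) -> str: ...

class OfflineOllamaProvider(LLMProvider):
    """Stand-in for the opentts_public branch (locally deployed Ollama)."""
    def chat(self, text: str) -> str:
        # Real code would call the local Ollama HTTP API here.
        return f"[ollama] reply to: {text}"

class OnlineOpenAIProvider(LLMProvider):
    """Stand-in for the allonline_public branch (OpenAI-compatible API)."""
    def chat(self, text: str) -> str:
        # Real code would use the OpenAI Python SDK here.
        return f"[openai] reply to: {text}"

def run_pipeline(asr_text: str, llm: LLMProvider) -> str:
    # ASR output feeds the LLM; the reply would then be handed to TTS.
    return llm.chat(asr_text)
```

Swapping branches then amounts to swapping the provider object, with the pipeline code unchanged.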

If you want to use the allonline_public branch, refer to its reference document.


System Architecture

Hardware Deployment

├── x86 Server (192.168.41.1) → ASR Service (Funasr)
├── Orin Board (192.168.41.2) → LLM Service (Ollama) + TTS Service
└── RK3588s Device            → Audio Acquisition

Data Flow Process

RK3588s Audio Stream
↓
[tk_audio_publisher] Acquire Complete Sentence Audio
↓ (audio_sentence_frames topic)
[tk_asr_text_publisher] Speech Recognition
↓ (asr_sentence topic)
[tk_audio_process] LLM Understanding + TTS Synthesis
↓
AudioPlayer Playback Output
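The first stage above ("Acquire Complete Sentence Audio") implies some form of sentence endpointing on the incoming frame stream. The document does not describe the actual algorithm; below is a minimal, hypothetical energy-threshold sketch where the function names, threshold, and silence window are all made up for illustration.

```python
# Illustrative sentence-endpointing logic for tk_audio_publisher.
# The real node's algorithm is not shown in this document.

def frame_energy(frame):
    """Mean absolute amplitude of one PCM frame (list of int samples)."""
    return sum(abs(s) for s in frame) / len(frame)

def split_sentences(frames, threshold=500.0, max_silence=3):
    """Group frames into 'sentences', each ended by `max_silence`
    consecutive quiet frames (values are placeholder assumptions)."""
    sentences, current, silence = [], [], 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            current.append(frame)   # speech frame: extend the sentence
            silence = 0
        elif current:
            silence += 1            # quiet frame inside a sentence
            if silence >= max_silence:
                sentences.append(current)   # sentence complete
                current, silence = [], 0
    if current:
        sentences.append(current)   # flush a trailing sentence
    return sentences
```

Each completed sentence would then be published on the audio_sentence_frames topic for the ASR node.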

Core Technology Stack

| Module    | Technology            | Deployment Location | Port  |
|-----------|-----------------------|---------------------|-------|
| ASR       | Funasr                | x86 Server          | 10097 |
| LLM       | Ollama (qwen2.5:1.5b) | Orin Board          | -     |
| TTS       | Piper-TTS             | Orin Board          | -     |
| Framework | ROS 2 (Python)        | Orin Board          | -     |
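The ASR row means the client node talks to a Funasr server on port 10097 over WebSocket (as the module table below notes). A sketch of the session-control messages such a client might send follows; the field names match the public FunASR runtime WebSocket protocol, but their exact use in this project is an assumption.

```python
import json

# Sketch of the control messages a FunASR WebSocket client typically
# sends around the binary PCM audio. Field values are assumptions.

def build_open_message(wav_name="tkvoice", mode="offline"):
    """JSON sent when a recognition session opens."""
    return json.dumps({
        "mode": mode,            # "offline" = full-sentence recognition
        "wav_name": wav_name,
        "wav_format": "pcm",
        "is_speaking": True,     # binary PCM frames follow this message
        "chunk_size": [5, 10, 5],
    })

def build_close_message():
    """JSON sent after the last audio frame; the server then returns
    the final transcript."""
    return json.dumps({"is_speaking": False})
```

In the real node, these messages would bracket the audio frames sent over a WebSocket connection to ws://192.168.41.1:10097.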

Project Structure Analysis

Main Directory Overview

tkvoice/
├── src/
│   ├── audio_message/                    # Custom ROS message definitions
│   │   └── msg/
│   │       └── AudioFrame.msg            # Audio frame message format
│   └── audio_service/                    # Core service package
│       ├── setup.py                      # Python package configuration (v0.2.26)
│       ├── package.xml                   # ROS 2 package metadata
│       └── audio_service/                # Main code directory
│           ├── tk_audio_publisher.py     # Audio publisher
│           ├── tk_asr_text_publisher.py  # ASR text publisher
│           ├── tk_audio_process.py       # Audio processing + LLM + TTS
│           ├── funasr_client.py          # Funasr API wrapper
│           ├── llm_client.py             # Ollama API wrapper
│           ├── piper_provider.py         # TTS voice synthesis
│           ├── socket_audio_provider.py  # Audio socket receiver
│           ├── socket_connector.py       # Network connection manager
│           ├── log_config.py             # Logging configuration
│           └── utils.py                  # Utility functions
├── res/                                  # Resources and installation scripts
│   ├── docker_funasr/                    # Funasr Docker scripts
│   ├── ollama/                           # Ollama installation scripts
│   └── piper_voices/                     # TTS voice models
└── build.sh / install.sh                 # Build and installation scripts
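The document does not show the contents of AudioFrame.msg. A plausible minimal definition is sketched below; every field name and type here is an assumption for illustration, not the project's actual interface.

```
# Hypothetical AudioFrame.msg -- the actual fields are not shown in
# this document. A sentence of audio would be published as a sequence
# of such frames on the audio_sentence_frames topic.
int32 seq        # frame index within the current sentence
uint8[] data     # raw PCM bytes for this frame
bool is_last     # true on the final frame of a sentence
```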

Core Code Modules (8 Main Modules)

| Module                | Functionality                                                          |
|-----------------------|------------------------------------------------------------------------|
| tk_audio_publisher    | Acquires audio stream from RK3588s, publishes complete sentence audio  |
| tk_asr_text_publisher | Audio recognition, communicates with Funasr service via WebSocket      |
| tk_audio_process      | Core processing node: LLM understanding + TTS synthesis + audio playback |
| funasr_client         | Funasr WebSocket client wrapper                                        |
| llm_client            | Ollama API client wrapper                                              |
| piper_provider        | Piper TTS Python library caller                                        |
| socket_audio_provider | Socket audio data receiver                                             |
| socket_connector      | Network connection manager                                             |
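llm_client wraps the Ollama API. A minimal sketch of such a wrapper is shown below, assuming Ollama's standard /api/generate endpoint on its default port 11434 and the Orin board address from the hardware section; the host, endpoint choice, and helper names are assumptions, not the project's actual code.

```python
import json
import urllib.request

# Assumed location of the Ollama server on the Orin board.
OLLAMA_URL = "http://192.168.41.2:11434/api/generate"

def build_payload(prompt):
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": "qwen2.5:1.5b",  # model named in this document
        "prompt": prompt,
        "stream": False,          # one complete reply, simpler to hand to TTS
    }

def ask_llm(prompt):
    """Send the prompt to Ollama and return the reply text.
    Requires a running Ollama server at OLLAMA_URL."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The returned text would then be passed to piper_provider for synthesis.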

Open Source Licenses

| Component              | License    |
|------------------------|------------|
| Project Core           | Apache-2.0 |
| TTS Module (piper-tts) | GPL-3.0    |

Dependencies and Deployment

System Dependencies

  • ROS 2 - Middleware Framework
  • Python 3 - Programming Language
  • Docker - Container runtime on the x86 server

Third-Party Services

  • Funasr - Speech Recognition Model Service
  • Ollama - Large Language Model Inference Framework
  • Piper-TTS - Offline Speech Synthesis Library

Installation Steps

  1. x86 Server: Install Docker + Funasr image
  2. Orin Board: Install Ollama + Pull qwen2.5:1.5b model
  3. Orin Board: Deploy audio_service ROS package
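The three steps above might look roughly like the following shell session. This is a hedged sketch: the FunASR Docker image name and port mapping are placeholders (use the scripts in res/docker_funasr/), while the Ollama install command and model name come from Ollama's standard installer and this document.

```shell
# 1. x86 server: Docker + FunASR image (image name is a placeholder;
#    see res/docker_funasr/ for the actual scripts)
docker pull <funasr-runtime-image>
docker run -d -p 10097:10097 <funasr-runtime-image>

# 2. Orin board: install Ollama and pull the model named in this document
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:1.5b

# 3. Orin board: build and deploy the ROS 2 workspace
cd tkvoice && ./build.sh && ./install.sh
```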