Skip to main content

Version: V2.0.5.1

3.2.System Architecture

Hardware Topology

The project uses a multi-machine collaborative architecture, with the Orin board as the core orchestrator:

IP Address Configuration

Device	IP Address	Function	Port
RK3588s	10.42.0.127	Audio acquisition and playback	9080
Orin Board	192.168.41.2	ROS 2 System + Ollama LLM + Piper TTS	11434
x86 Server	192.168.41.1	Funasr ASR Service	10097

Network Communication Protocol Details

1. RK3588s ↔ Orin Board (Socket TCP)

RK3588s Microphone
    ↓
Audio Stream (16000 Hz, 16-bit PCM)
    ↓
Socket TCP Connection (10.42.0.127:9080)
    ↓ (Send raw audio)
tk_audio_publisher (Determine complete sentence by VAD field and receive audio)
    ↓
ROS audio_sentence_frames Topic
    ↓
tk_asr_text_publisher (Subscription)

The TCP data packets provided on port 9080 of the RK3588s board will be explained in detail below.

2. Orin Board ↔ x86 Server (WebSocket)

tk_asr_text_publisher (Orin Board)
    ↓
WebSocket Client Connection
    ↓
192.168.41.1:10097 (x86 Server Funasr)
    ↓
Send complete audio sentence
    ↓
Funasr Model Recognition
    ↓
Return recognition result
    ↓
Obtain final recognition text

3. Orin Board Internal (Local Ollama + Piper TTS)

tk_audio_process (Receive recognition text)
    ↓
Ollama LLM Client (HTTP)
    ↓
localhost:11434/api/chat (Local Ollama Service)
    ↓
qwen2.5:1.5b Model Streaming Generation of Answer执行 Ollama 安装
    ↓
Wait for streaming return of answer text suitable for single sentence playback
    ↓
Piper TTS (Local Call)
    ↓
ONNX Model Inference: Text→Speech Waveform
    ↓
AudioPlayer Playback
    ↓
Continue playback until Ollama LLM Client generation completes

Key Points Summary

Level	Processing Location	Technology	Description
Audio Acquisition	RK3588s	Microphone Array + Socket	Raw audio sent via Socket to Orin
Audio Reception & Publishing	Orin Board	tk_audio_publisher (ROS)	Receive raw audio, publish by complete sentences
Speech Recognition	Orin Sends, x86 Processes	WebSocket + Funasr Docker	Orin sends audio via WebSocket to x86 for recognition
Text Publishing	Orin Board	tk_asr_text_publisher (ROS)	Recognition results published to ROS topic
LLM Processing	Orin Board	Ollama HTTP API	Local Ollama call, generate answer (streaming)
TTS Synthesis	Orin Board	Piper ONNX + Pytorch	Local Piper call, generate speech waveform
Audio Playback	Orin Board	AudioPlayer (PyAudio)	Queue-based playback, played directly on Orin1.

Hardware Topology
IP Address Configuration
Network Communication Protocol Details
Key Points Summary