Skip to main content
Version: V2.0.5.1

3.2.System Architecture


Hardware Topology

The project uses a multi-machine collaborative architecture, with the Orin board as the core orchestrator:

IP Address Configuration

DeviceIP AddressFunctionPort
RK3588s10.42.0.127Audio acquisition and playback9080
Orin Board192.168.41.2ROS 2 System + Ollama LLM + Piper TTS11434
x86 Server192.168.41.1Funasr ASR Service10097

Network Communication Protocol Details

1. RK3588s ↔ Orin Board (Socket TCP)

RK3588s Microphone

Audio Stream (16000 Hz, 16-bit PCM)

Socket TCP Connection (10.42.0.127:9080)
↓ (Send raw audio)
tk_audio_publisher (Determine complete sentence by VAD field and receive audio)

ROS audio_sentence_frames Topic

tk_asr_text_publisher (Subscription)

The TCP data packets provided on port 9080 of the RK3588s board will be explained in detail below.

2. Orin Board ↔ x86 Server (WebSocket)

tk_asr_text_publisher (Orin Board)

WebSocket Client Connection

192.168.41.1:10097 (x86 Server Funasr)

Send complete audio sentence

Funasr Model Recognition

Return recognition result

Obtain final recognition text

3. Orin Board Internal (Local Ollama + Piper TTS)

tk_audio_process (Receive recognition text)

Ollama LLM Client (HTTP)

localhost:11434/api/chat (Local Ollama Service)

qwen2.5:1.5b Model Streaming Generation of Answer执行 Ollama 安装

Wait for streaming return of answer text suitable for single sentence playback

Piper TTS (Local Call)

ONNX Model Inference: Text→Speech Waveform

AudioPlayer Playback

Continue playback until Ollama LLM Client generation completes

Key Points Summary

LevelProcessing LocationTechnologyDescription
Audio AcquisitionRK3588sMicrophone Array + SocketRaw audio sent via Socket to Orin
Audio Reception & PublishingOrin Boardtk_audio_publisher (ROS)Receive raw audio, publish by complete sentences
Speech RecognitionOrin Sends, x86 ProcessesWebSocket + Funasr DockerOrin sends audio via WebSocket to x86 for recognition
Text PublishingOrin Boardtk_asr_text_publisher (ROS)Recognition results published to ROS topic
LLM ProcessingOrin BoardOllama HTTP APILocal Ollama call, generate answer (streaming)
TTS SynthesisOrin BoardPiper ONNX + PytorchLocal Piper call, generate speech waveform
Audio PlaybackOrin BoardAudioPlayer (PyAudio)Queue-based playback, played directly on Orin1.