3.2.System Architecture
Hardware Topology
The project uses a multi-machine collaborative architecture, with the Orin board as the core orchestrator:

IP Address Configuration
| Device | IP Address | Function | Port |
|---|---|---|---|
| RK3588s | 10.42.0.127 | Audio acquisition and playback | 9080 |
| Orin Board | 192.168.41.2 | ROS 2 System + Ollama LLM + Piper TTS | 11434 |
| x86 Server | 192.168.41.1 | Funasr ASR Service | 10097 |
Network Communication Protocol Details
1. RK3588s ↔ Orin Board (Socket TCP)
RK3588s Microphone
↓
Audio Stream (16000 Hz, 16-bit PCM)
↓
Socket TCP Connection (10.42.0.127:9080)
↓ (Send raw audio)
tk_audio_publisher (Determine complete sentence by VAD field and receive audio)
↓
ROS audio_sentence_frames Topic
↓
tk_asr_text_publisher (Subscription)
The TCP data packets provided on port 9080 of the RK3588s board will be explained in detail below.
2. Orin Board ↔ x86 Server (WebSocket)
tk_asr_text_publisher (Orin Board)
↓
WebSocket Client Connection
↓
192.168.41.1:10097 (x86 Server Funasr)
↓
Send complete audio sentence
↓
Funasr Model Recognition
↓
Return recognition result
↓
Obtain final recognition text
3. Orin Board Internal (Local Ollama + Piper TTS)
tk_audio_process (Receive recognition text)
↓
Ollama LLM Client (HTTP)
↓
localhost:11434/api/chat (Local Ollama Service)
↓
qwen2.5:1.5b Model Streaming Generation of Answer执行 Ollama 安装
↓
Wait for streaming return of answer text suitable for single sentence playback
↓
Piper TTS (Local Call)
↓
ONNX Model Inference: Text→Speech Waveform
↓
AudioPlayer Playback
↓
Continue playback until Ollama LLM Client generation completes
Key Points Summary
| Level | Processing Location | Technology | Description |
|---|---|---|---|
| Audio Acquisition | RK3588s | Microphone Array + Socket | Raw audio sent via Socket to Orin |
| Audio Reception & Publishing | Orin Board | tk_audio_publisher (ROS) | Receive raw audio, publish by complete sentences |
| Speech Recognition | Orin Sends, x86 Processes | WebSocket + Funasr Docker | Orin sends audio via WebSocket to x86 for recognition |
| Text Publishing | Orin Board | tk_asr_text_publisher (ROS) | Recognition results published to ROS topic |
| LLM Processing | Orin Board | Ollama HTTP API | Local Ollama call, generate answer (streaming) |
| TTS Synthesis | Orin Board | Piper ONNX + Pytorch | Local Piper call, generate speech waveform |
| Audio Playback | Orin Board | AudioPlayer (PyAudio) | Queue-based playback, played directly on Orin1. |