3.3. Core Concepts
1. ROS 2 Middleware
TKVoice is a ROS 2 project that uses the publish-subscribe pattern to pass data between its nodes.
ROS Analogy:
- Imagine multiple independent programs that need to communicate through "Topics"
- One program "publishes" data; other programs "subscribe" to it
- Like a newspaper: the publisher distributes issues, and readers subscribe to receive and read them
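The analogy above can be sketched in a few lines of plain Python. This is a toy in-process model of the publish-subscribe pattern, not the ROS 2 API: the `TopicBus` class and its methods are hypothetical names introduced only for illustration, and the topic names are borrowed from this project.

```python
from collections import defaultdict


class TopicBus:
    """Toy in-process stand-in for topic-based messaging (NOT the ROS 2 API)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives its own call with the message.
        for callback in self._subscribers[topic]:
            callback(message)


bus = TopicBus()
reader_a, reader_b = [], []

# Two independent "readers" subscribe to the same topic.
bus.subscribe("asr_sentence", reader_a.append)
bus.subscribe("asr_sentence", reader_b.append)

bus.publish("asr_sentence", "turn on the light")
bus.publish("audio_frames", b"\x00\x01")  # no subscribers, so nothing happens

print(reader_a, reader_b)
```

The publisher never refers to its readers directly. It only knows the topic name, which is what lets the nodes in a ROS 2 system stay independent of one another.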
ROS Topics in This Project:
| Topic Name | Data Type | Publisher | Subscriber | Meaning |
|---|---|---|---|---|
| audio_frames | AudioFrame | tk_audio_publisher | Internal | Single audio frame |
| audio_sentence_frames | AudioFrame | tk_audio_publisher | tk_asr_text_publisher | Complete sentence audio |
| asr_sentence | String | tk_asr_text_publisher | tk_audio_process | Recognition text |
2. Threading and Asynchronous Programming
TKVoice extensively uses multithreading to handle concurrent tasks:
import threading
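# Create a background thread (some_function is a placeholder for your own callable)
# daemon=True means the thread will not keep the process alive at exit
thread = threading.Thread(target=some_function, daemon=True)
thread.start()
# The thread now runs concurrently with the main thread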
Why Multithreading is Needed:
In speech interaction, multiple tasks must occur simultaneously:
- Receive audio data (I/O intensive)
- Send data to ASR service (network I/O intensive)
- Process LLM answer (compute intensive)
- Play speech (I/O intensive)
A single-threaded approach would block: audio reception would stall while an answer is being processed.
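The effect can be seen in a minimal sketch, assuming each task is a blocking I/O call. Here `time.sleep` stands in for the real work, and the task names and the `simulated_io_task` helper are hypothetical, not TKVoice code.

```python
import threading
import time


def simulated_io_task(name, seconds, results, lock):
    """Stand-in for a blocking I/O call (hypothetical helper)."""
    time.sleep(seconds)  # like real blocking I/O, sleeping lets other threads run
    with lock:
        results.append(name)


def run_concurrently(tasks):
    """Run each (name, seconds) task in its own thread; return names and elapsed time."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=simulated_io_task, args=(name, secs, results, lock))
        for name, secs in tasks
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all tasks to finish
    return results, time.monotonic() - start


tasks = [("receive_audio", 0.2), ("send_to_asr", 0.2), ("play_speech", 0.2)]
results, elapsed = run_concurrently(tasks)
print(sorted(results), f"{elapsed:.2f}s")
```

Run one after another, the three tasks would take about 0.6 seconds. With one thread per task the waits overlap, so the total stays close to the longest single wait.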
3. Queue Data Structure
from queue import Queue
q = Queue(maxsize=1) # Hold at most 1 element
# Put data (blocks while the queue is full)
q.put(data)
# Get data
data = q.get(timeout=2) # Wait up to 2 seconds; raises queue.Empty if no data arrives
Queue Benefits:
- Thread-safe data passing between threads
- Implement producer-consumer pattern
- Built-in locking, so callers do not need to synchronize access themselves
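A minimal producer-consumer sketch shows these benefits together. The frame strings, the `SENTINEL` end marker, and the `producer`/`consumer` functions are illustrative assumptions rather than TKVoice code.

```python
import threading
from queue import Queue, Empty

SENTINEL = None  # hypothetical marker telling the consumer to stop


def producer(q, items):
    for item in items:
        q.put(item)  # blocks while the queue is full (maxsize=1)
    q.put(SENTINEL)


def consumer(q, out):
    while True:
        try:
            item = q.get(timeout=2)  # raises Empty after 2 s without data
        except Empty:
            break
        if item is SENTINEL:
            break
        out.append(item.upper())  # stand-in for real processing


q = Queue(maxsize=1)
received = []
worker = threading.Thread(target=consumer, args=(q, received))
worker.start()
producer(q, ["frame-1", "frame-2", "frame-3"])
worker.join()
print(received)  # ['FRAME-1', 'FRAME-2', 'FRAME-3']
```

With `maxsize=1` the producer cannot run ahead of the consumer: each `put` waits until the previous item has been taken. That bounds memory use and keeps the data fresh.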
4. WebSocket Real-time Communication
WebSocket is a bidirectional (full-duplex) real-time communication protocol that runs over a single long-lived connection:
- Regular HTTP (request-response): Client → Server → get response → close connection → repeat for each request
- WebSocket (persistent connection): Client ↔ Server → keep connection open, send/receive at any time → streaming processing
Application in TKVoice:
- TKVoice sends audio chunks to FunASR over a WebSocket connection
- FunASR returns partial results while recognition is still in progress (streaming)
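The chunked sending described above can be sketched without a live connection. In this sketch the `send` argument is a placeholder for whatever WebSocket client call the project actually uses, and the 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono audio) is an assumption. This section does not specify the FunASR message format or the real chunk size.

```python
def iter_chunks(pcm, chunk_bytes):
    """Yield consecutive slices of at most chunk_bytes from a PCM byte string."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def stream_audio(pcm, send, chunk_bytes=3200):
    """Pass each chunk to send(); return how many chunks were sent.

    send is a hypothetical callable, e.g. a wrapper around a WebSocket send.
    """
    sent = 0
    for chunk in iter_chunks(pcm, chunk_bytes):
        send(chunk)
        sent += 1
    return sent


one_second = bytes(32000)  # 1 s of silence at 16 kHz, 16-bit, mono
outbox = []                # stands in for the network connection
count = stream_audio(one_second, outbox.append)
print(count, len(outbox[0]))  # 10 3200
```

Because the connection stays open, the server can start returning recognition results after the first chunks arrive instead of waiting for the whole utterance.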
5. Audio Processing Basics
- Sample Rate: 16000 Hz = 16000 samples per second
- Bit Depth: 16 bit = 2 bytes per sample
- Channels: 1 = Mono
Calculation: size of 1 second of audio = 16000 samples × 2 bytes × 1 channel = 32000 bytes ≈ 31.25 KiB
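The same arithmetic as a small helper, useful for sizing buffers or chunks. The function name `pcm_bytes` and its parameters are illustrative and not part of TKVoice.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, duration_ms=1000):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * duration_ms // 1000


size = pcm_bytes(16000, 16, 1)
print(size, size / 1024)                          # 32000 31.25
print(pcm_bytes(16000, 16, 1, duration_ms=100))   # 3200
```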