Version: V2.0.5.1

3.3. Core Concepts

1. ROS 2 Middleware

TKVoice is a ROS 2 project that uses the publish-subscribe pattern to pass data between its nodes.

ROS Analogy:

Imagine multiple independent programs that need to communicate with each other through "topics".
One program publishes data to a topic; other programs subscribe to that topic to receive it.
It works like a newspaper: the publisher distributes each issue, and the readers who subscribe receive and read it.
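
The analogy can be made concrete with a toy in-process message bus. This is only a sketch of the publish-subscribe idea in plain Python, not the ROS 2 API: the `ToyBus` class is hypothetical, and real ROS 2 nodes exchange typed messages across processes.

```python
from collections import defaultdict


class ToyBus:
    """Hypothetical in-process stand-in for a pub-sub middleware (not ROS 2)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)


bus = ToyBus()
received = []

# The "reader" subscribes to a topic; here it just stores what it receives
bus.subscribe("asr_sentence", received.append)

# The "publisher" does not know who is listening, only the topic name
bus.publish("asr_sentence", "hello world")

# A message on a topic with no subscribers is simply not delivered to anyone
bus.publish("audio_frames", b"\x00\x00")
```

The publisher and the subscriber only share a topic name, which is what lets the programs stay independent of each other.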

ROS Topics in This Project:

Topic Name	Data Type	Publisher	Subscriber	Meaning
`audio_frames`	AudioFrame	tk_audio_publisher	Internal	Single audio frame
`audio_sentence_frames`	AudioFrame	tk_audio_publisher	tk_asr_text_publisher	Complete sentence audio
`asr_sentence`	String	tk_asr_text_publisher	tk_audio_process	Recognition text

2. Threading and Asynchronous Programming

TKVoice extensively uses multithreading to handle concurrent tasks:

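import threading

def some_function():
    # Placeholder for the background work (for example, receiving audio)
    pass

# Create a background thread; daemon=True means it will not keep the process alive at exit
thread = threading.Thread(target=some_function, daemon=True)
thread.start()

# The thread now runs concurrently with the main thread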

Why Multithreading is Needed:

In speech interaction, multiple tasks must occur simultaneously:

Receive audio data (I/O intensive)
Send data to ASR service (network I/O intensive)
Process LLM answer (compute intensive)
Play speech (I/O intensive)

A single-threaded approach would block: audio reception would stall for as long as an answer is being processed.
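
The blocking problem can be illustrated with a small sketch. The functions `receive_audio` and `process_answer` below are hypothetical stand-ins (they only sleep and count), not TKVoice code; the point is that the receiver keeps running while the main thread is busy.

```python
import threading
import time


def receive_audio(frames, stop_event):
    """Hypothetical stand-in for audio reception: records a counter until stopped."""
    n = 0
    while not stop_event.is_set():
        frames.append(n)
        n += 1
        time.sleep(0.01)  # simulate waiting for the next audio frame


def process_answer():
    """Hypothetical stand-in for slow answer processing."""
    time.sleep(0.2)
    return "answer"


def run_concurrently():
    frames = []
    stop_event = threading.Event()
    receiver = threading.Thread(
        target=receive_audio, args=(frames, stop_event), daemon=True
    )
    receiver.start()

    # The main thread blocks here, but the receiver thread keeps collecting frames
    answer = process_answer()

    # Ask the receiver to stop and wait for it to finish
    stop_event.set()
    receiver.join()
    return answer, len(frames)
```

Calling `run_concurrently()` returns the answer together with the number of frames collected in the meantime; in a single-threaded version that count would stay at zero until processing finished.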

3. Queue Data Structure

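from queue import Queue, Empty

q = Queue(maxsize=1)  # Holds at most 1 element

# Put data (blocks while the queue is full)
data = b"\x00\x01"  # example payload
q.put(data)

# Get data
try:
    data = q.get(timeout=2)  # Wait up to 2 seconds for an item
except Empty:
    data = None  # No data arrived within the timeout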

Queue Benefits:

Thread-safe data passing between threads
Implement producer-consumer pattern
Automatic thread synchronization
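
A minimal producer-consumer sketch shows these benefits together. The function names and the `None` sentinel used to signal "no more data" are illustrative assumptions, not taken from TKVoice.

```python
import threading
from queue import Queue

SENTINEL = None  # assumed end-of-stream marker for this sketch


def producer(q, items):
    """Put each item on the queue, then signal that no more will follow."""
    for item in items:
        q.put(item)  # blocks while the queue is full
    q.put(SENTINEL)


def consumer(q, results):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        results.append(item.upper())


def run_pipeline(items):
    q = Queue(maxsize=1)  # same capacity as in the snippet above
    results = []
    t_prod = threading.Thread(target=producer, args=(q, items))
    t_cons = threading.Thread(target=consumer, args=(q, results))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results
```

No explicit lock is needed: `put` and `get` handle the synchronization, and with `maxsize=1` the producer can never run more than one item ahead of the consumer.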

4. WebSocket Real-time Communication

WebSocket is a bidirectional real-time communication protocol:

Regular HTTP (Request-Response):    WebSocket (Persistent Connection):
Client → Server                     Client ↔ Server
  ↓                                  ↓
Get Response → Close Connection     Keep Connection, Send/Receive Anytime
  ↓                                  ↓
Repeat                              Streaming Processing

Application in TKVoice:

When communicating with Funasr, TKVoice uses WebSocket to send audio in chunks
Funasr returns partial results while recognition is still in progress (streaming)
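
Sending audio in chunks implies slicing the PCM byte stream into fixed-size pieces first. The sketch below covers only that slicing step; the helper name `iter_audio_chunks` and the 60 ms chunk duration are assumptions for illustration, and the actual WebSocket send call is deliberately left out.

```python
def iter_audio_chunks(pcm, chunk_ms=60, sample_rate=16000, bytes_per_sample=2):
    """Yield consecutive slices of mono PCM bytes, each covering chunk_ms of audio.

    chunk_ms=60 is an assumed value for this sketch, not a documented setting.
    The last chunk may be shorter than the others.
    """
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence at 16 kHz, 16-bit, mono
one_second = bytes(32000)
chunks = list(iter_audio_chunks(one_second))

# Each chunk would then be passed to the WebSocket client's send method
# as a binary message, while results are read back on the same connection.
```

Because the connection stays open, the client does not have to wait for one chunk's result before sending the next.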

5. Audio Processing Basics

Sample Rate: 16000 Hz = 16000 samples per second
Bit Depth: 16 bit = 2 bytes per sample
Channels: 1 = Mono

Calculation: size of 1 second of audio = 16000 samples × 2 bytes × 1 channel = 32000 bytes ≈ 31.25 KiB
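
The same arithmetic can be written as a pair of small helpers. The function names are hypothetical; the defaults are the values listed above.

```python
def pcm_size_bytes(seconds, sample_rate=16000, bit_depth=16, channels=1):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


def pcm_duration_seconds(num_bytes, sample_rate=16000, bit_depth=16, channels=1):
    """Duration in seconds represented by a buffer of uncompressed PCM audio."""
    bytes_per_sample = bit_depth // 8
    return num_bytes / (sample_rate * bytes_per_sample * channels)


one_second_bytes = pcm_size_bytes(1)       # 32000
one_second_kib = one_second_bytes / 1024   # 31.25
```

The inverse helper is useful when only a buffer length is known, for example to estimate how much speech a received frame contains.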
