Version: V2.0.5.1

3.3. Core Concepts


1. ROS 2 Middleware

TKVoice is a ROS 2 project that uses the publish-subscribe pattern to pass data between its nodes.

ROS Analogy:

  • Imagine multiple independent programs that need to communicate through "Topics"
  • One program "publishes" data, other programs "subscribe" to data
  • Like newspapers: publishers distribute (publish), readers subscribe and read
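The newspaper analogy can be sketched in a few lines of plain Python. This is only a toy, in-process illustration of the publish-subscribe pattern and not the ROS 2 API: the `TopicBus` class and its methods are hypothetical names made up for this example, and only the topic names are taken from this project.

```python
from collections import defaultdict


class TopicBus:
    """Toy in-process message bus (hypothetical; shows the pattern only)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that is invoked for every message on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver `message` to every subscriber of `topic`."""
        for callback in self._subscribers[topic]:
            callback(message)


bus = TopicBus()
reader_a, reader_b = [], []

# Two independent "readers" subscribe to the same topic
bus.subscribe("asr_sentence", reader_a.append)
bus.subscribe("asr_sentence", reader_b.append)

# The publisher does not know who is listening
bus.publish("asr_sentence", "hello robot")
bus.publish("audio_frames", b"\x00\x01")  # no subscribers: silently dropped

print(reader_a, reader_b)
```

The publisher and the subscribers only share a topic name, which is what lets the programs stay independent of each other.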

ROS Topics in This Project:

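Topic Name            | Data Type  | Publisher             | Subscriber            | Meaning
----------------------|------------|-----------------------|-----------------------|------------------------
audio_frames          | AudioFrame | tk_audio_publisher    | Internal              | Single audio frame
audio_sentence_frames | AudioFrame | tk_audio_publisher    | tk_asr_text_publisher | Complete sentence audio
asr_sentence          | String     | tk_asr_text_publisher | tk_audio_process      | Recognition text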

2. Threading and Asynchronous Programming

TKVoice extensively uses multithreading to handle concurrent tasks:

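import threading
import time

def some_function():
    # Placeholder task; replace with the real background work
    time.sleep(0.1)

# Create a background thread (daemon=True: it does not keep the process alive at exit)
thread = threading.Thread(target=some_function, daemon=True)
thread.start()

# The thread now runs concurrently with the main thread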

Why Multithreading is Needed:

In speech interaction, multiple tasks must occur simultaneously:

  • Receive audio data (I/O intensive)
  • Send data to ASR service (network I/O intensive)
  • Process LLM answer (compute intensive)
  • Play speech (I/O intensive)

A single-threaded approach would block: while the answer is being processed, no audio could be received.
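A minimal sketch of the non-blocking behavior, assuming a receiver that produces one fake frame every 10 ms and an answer step that takes 0.2 s. Both `receive_audio` and `process_answer` are hypothetical stand-ins, not TKVoice functions.

```python
import threading
import time

received = []               # frames collected by the background thread
stop = threading.Event()    # tells the receiver when to finish


def receive_audio():
    """Hypothetical stand-in for audio capture: one fake frame every 10 ms."""
    frame_id = 0
    while not stop.is_set():
        received.append(frame_id)
        frame_id += 1
        time.sleep(0.01)


def process_answer():
    """Hypothetical stand-in for slow answer processing."""
    time.sleep(0.2)
    return "answer ready"


receiver = threading.Thread(target=receive_audio, daemon=True)
receiver.start()

result = process_answer()   # the main thread is busy here...
stop.set()                  # ...while the receiver kept collecting frames
receiver.join()

frames_during_processing = len(received)
print(result, frames_during_processing)
```

Because both stand-ins spend their time waiting rather than computing, the receiver keeps running while the main thread is inside `process_answer`.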

3. Queue Data Structure

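from queue import Queue, Empty

q = Queue(maxsize=1)  # Holds at most 1 element
data = b"example"     # Placeholder payload

# Put data (blocks while the queue is full)
q.put(data)

# Get data
try:
    data = q.get(timeout=2)  # Wait up to 2 seconds
except Empty:
    data = None  # No data arrived within the timeout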

Queue Benefits:

  • Thread-safe data passing between threads
  • Implement producer-consumer pattern
  • Automatic thread synchronization
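The producer-consumer pattern mentioned above can be sketched as follows. The chunk values and the `None` end-of-stream marker are assumptions of this example, not part of TKVoice.

```python
import threading
from queue import Queue, Empty

q = Queue(maxsize=1)   # producer blocks until the consumer has taken the item
SENTINEL = None        # end-of-stream marker (an assumption of this sketch)
results = []


def producer():
    for chunk in (b"a", b"b", b"c"):
        q.put(chunk)   # blocks while the queue is full
    q.put(SENTINEL)


def consumer():
    while True:
        try:
            item = q.get(timeout=2)   # raises Empty after 2 s without data
        except Empty:
            break
        if item is SENTINEL:
            break
        results.append(item)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

With `maxsize=1` the producer can never run more than one item ahead of the consumer, which keeps memory use bounded.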

4. WebSocket Real-time Communication

WebSocket is a bidirectional real-time communication protocol:

Regular HTTP (Request-Response):     WebSocket (Persistent Connection):
Client → Server                      Client ↔ Server
        ↓                                    ↓
Get Response → Close Connection      Keep Connection, Send/Receive Anytime
        ↓                                    ↓
Repeat                               Streaming Processing

Application in TKVoice:

  • TKVoice sends audio chunks to FunASR over a WebSocket connection
  • FunASR returns partial results while recognition is still in progress (streaming)
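The send-while-receiving flow can be simulated without a network. The sketch below replaces the WebSocket with two in-memory `asyncio` queues, so it shows only the streaming pattern and not the FunASR protocol. `fake_recognizer`, the 960-byte chunk size, and the `None` end-of-stream marker are all assumptions of this example.

```python
import asyncio


async def fake_recognizer(incoming, outgoing):
    """Hypothetical server: emits one partial result per received chunk."""
    count = 0
    while True:
        chunk = await incoming.get()
        if chunk is None:              # end of stream
            await outgoing.put(None)
            return
        count += 1
        await outgoing.put(f"partial result after chunk {count}")


async def send_chunks(audio, chunk_size, incoming):
    """Client side: push the audio in fixed-size chunks, then signal the end."""
    for start in range(0, len(audio), chunk_size):
        await incoming.put(audio[start:start + chunk_size])
    await incoming.put(None)


async def collect_results(outgoing):
    """Client side: read results as they arrive, independently of sending."""
    results = []
    while True:
        message = await outgoing.get()
        if message is None:
            return results
        results.append(message)


async def main():
    incoming, outgoing = asyncio.Queue(), asyncio.Queue()
    audio = bytes(3200)                # 100 ms of silence at 16 kHz, 16 bit, mono
    server = asyncio.create_task(fake_recognizer(incoming, outgoing))
    _, results = await asyncio.gather(
        send_chunks(audio, 960, incoming),
        collect_results(outgoing),
    )
    await server
    return results


results = asyncio.run(main())
print(results)
```

Sending and receiving run as separate tasks, which mirrors how a persistent connection lets results arrive before the last chunk has been sent.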

5. Audio Processing Basics

  • Sample Rate: 16000 Hz = 16000 samples per second
  • Bit Depth: 16 bit = 2 bytes per sample
  • Channels: 1 = Mono

Calculation: 1 second audio size = 16000 samples × 2 bytes × 1 channel = 32000 bytes ≈ 31 KB (31.25 KiB)
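The same arithmetic as a small helper, useful for sizing buffers or chunks. `pcm_size_bytes` is a hypothetical name for this example, and the 60 ms chunk length is only an illustrative value.

```python
def pcm_size_bytes(duration_ms, sample_rate=16000, bit_depth=16, channels=1):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    samples = sample_rate * duration_ms // 1000   # samples per channel
    bytes_per_sample = bit_depth // 8
    return samples * bytes_per_sample * channels


one_second = pcm_size_bytes(1000)
print(one_second, one_second / 1024)   # bytes and KiB for 1 second of audio

chunk_60ms = pcm_size_bytes(60)        # illustrative chunk length
print(chunk_60ms)
```

Durations are passed as integer milliseconds so the result stays exact and avoids floating-point rounding.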