Version: V2.0.5.1

RL-based Gait Training and Deployment Guide - V1.2.1


Description

Reference project:

  • https://github.com/UBTECH-Robot/TienKung-Lab

    This codebase is a reinforcement learning-based locomotion control system designed specifically for the full-size humanoid robot TienKung. It combines AMP-style rewards with periodic gait rewards, promoting natural, stable, and efficient walking and running behavior.

    Built on IsaacLab, the codebase supports Sim2Sim migration to MuJoCo and has a modular architecture that makes customization and extension straightforward. It also integrates ray-casting-based sensors to enhance perception, enabling precise interaction with the environment and obstacle avoidance. The framework has been validated on the real TienKung robot.

  • https://github.com/UBTECH-Robot/Deploy_Tienkung

    This codebase contains two directories: rl_control_new and x_humanoid_rl_sdk.

    rl_control_new is a ROS2-based reinforcement learning control library for humanoid robots, used to control the Tienkung series of humanoid robots. This library uses reinforcement learning algorithms to implement robot locomotion control and supports both simulation and real-world robot environments.

    The other directory, x_humanoid_rl_sdk, is the Tienkung humanoid robot reinforcement learning control SDK, which includes state machine implementations, robot interfaces, and control algorithms.

This guide is based on the reference projects above and fixes several of their compilation and configuration issues.

  • Compatible Models: Walker Tienkung (Lite). Walker Tienkung · Voice & Vision (Plus) and Walker Tienkung · Embodied Intelligence (Pro) will be added in Q1 2026.
  • Operating System: Ubuntu 22.04 (x86)
  • Recommended GPU: NVIDIA RTX 30 series or newer, with 16 GB+ VRAM
  • Minimum System Configuration: 512 GB disk, 16 GB RAM
  • Note: If git or other downloads are slow, consider using a proxy.

Software Package Structure

Download the two packages TienKung-Lab.zip and ros_lite_src.zip and place them in the ~/ directory.


~/ # Software packages
├── TienKung-Lab.zip # Gait training package
├── ros_lite_src.zip # Gait deployment package

Part I: Basic Software

1. CUDA

Install CUDA.

First install the latest NVIDIA driver. Then run nvidia-smi to see the highest CUDA version the driver supports, and download the matching CUDA Toolkit from https://developer.nvidia.com/cuda-toolkit-archive. This example uses CUDA 12.8.

cd ~/
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
# If a driver is already installed, deselect the bundled driver in the installer menu
sudo sh cuda_12.8.0_570.86.10_linux.run
# Set environment variables in ~/.bashrc
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# Apply changes immediately
source ~/.bashrc
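To confirm that the toolkit and driver line up after installation, the sketch below parses the version strings printed by nvidia-smi and nvcc --version. It then applies a rule of thumb: the toolkit should not be newer than the CUDA version the driver reports. The helper names (driver_cuda_version, toolkit_cuda_version, toolkit_fits_driver) are hypothetical, and GNU sort -V is assumed for version ordering.

```shell
# Sketch (hypothetical helpers): compare the installed CUDA Toolkit version with
# the highest CUDA version reported by the driver in nvidia-smi.

# Read nvidia-smi output on stdin and print the "CUDA Version" value, e.g. 12.8.
driver_cuda_version() {
  grep -o 'CUDA Version: [0-9][0-9.]*' | head -n 1 | sed 's/CUDA Version: //'
}

# Read `nvcc --version` output on stdin and print the release, e.g. 12.8.
toolkit_cuda_version() {
  grep -o 'release [0-9][0-9.]*' | head -n 1 | sed 's/release //'
}

# Succeed (exit status 0) when toolkit version $1 is not newer than driver version $2.
toolkit_fits_driver() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage after `source ~/.bashrc`:
#   drv=$(nvidia-smi | driver_cuda_version)
#   tk=$(nvcc --version | toolkit_cuda_version)
#   toolkit_fits_driver "$tk" "$drv" && echo "toolkit $tk OK for driver CUDA $drv"
```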

2. Conda

Install Miniconda, answering yes to every prompt to complete the installation.

cd ~/
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh

Note: After installation, check that the conda setup has been added to ~/.bashrc, then open a new terminal (or run source ~/.bashrc) so that the conda command is available.

cat ~/.bashrc
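Reading the whole file by eye is error-prone, so the hypothetical helper below only looks for the block that conda init writes into the rc file. That block starts with the marker line "# >>> conda initialize >>>". The usage comment assumes Miniconda's default install prefix, ~/miniconda3.

```shell
# Sketch (hypothetical helper): succeed if an rc file contains the block that
# `conda init` writes, which starts with "# >>> conda initialize >>>".
has_conda_init() {
  local rc_file="${1:-$HOME/.bashrc}"
  grep -q '^# >>> conda initialize >>>' "$rc_file" 2>/dev/null
}

# Usage (assumes Miniconda's default install prefix ~/miniconda3):
#   has_conda_init || ~/miniconda3/bin/conda init bash
#   source ~/.bashrc
```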

3. IsaacSim

Install IsaacSim 4.5. After downloading and extracting the software package, create the env_isaacsim environment, activate it, and then install.

Reference: https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html

mkdir ~/isaacsimdir
cd ~/isaacsimdir
wget https://download.isaacsim.omniverse.nvidia.com/isaac-sim-standalone-4.5.0-linux-x86_64.zip
unzip "isaac-sim-standalone-4.5.0-linux-x86_64.zip" -d ~/isaacsimdir
conda create -n env_isaacsim python=3.10
conda activate env_isaacsim
./python.sh -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
./post_install.sh
./isaac-sim.selector.sh
conda deactivate

4. IsaacLab

Install IsaacLab 2.1.0. After downloading and extracting the software package, create the env_isaaclab environment, activate it, and then install.

Reference: https://isaac-sim.github.io/IsaacLab/release/2.1.0/source/setup/installation/binaries_installation.html#installing-isaac-lab

You can follow the reference starting from its "Installing Isaac Lab" section.

sudo apt install git
mkdir ~/isaaclabdir
cd ~/isaaclabdir
wget https://github.com/isaac-sim/IsaacLab/archive/refs/heads/release/2.1.0.zip
unzip "2.1.0.zip" -d ~/isaaclabdir
cd IsaacLab-release-2.1.0
ln -s ~/isaacsimdir ~/isaaclabdir/IsaacLab-release-2.1.0/_isaac_sim
./isaaclab.sh --conda env_isaaclab
conda activate env_isaaclab
sudo apt install cmake build-essential
./isaaclab.sh --install
conda deactivate
warning
  • If you get a permission error when running ./isaaclab.sh --install, the directory that the _isaac_sim symlink points to (~/isaacsimdir) most likely has insufficient permissions for your user. Adjust the permissions of that directory and try again.

  • If you encounter a pip package conflict error, ignore it for now. After --install completes, try the subsequent steps to see if training can start normally.
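Before rerunning ./isaaclab.sh --install, the following check can be useful. It confirms that _isaac_sim is a symlink, that the link resolves to an extracted Isaac Sim directory containing python.sh, and that your user can read and execute that script. The helper check_isaac_sim_link is hypothetical; it only uses standard tools (test, readlink, id).

```shell
# Sketch (hypothetical helper): check that IsaacLab's _isaac_sim symlink
# resolves to an extracted Isaac Sim directory containing an executable python.sh.
check_isaac_sim_link() {
  local link="$1/_isaac_sim"
  if [ ! -L "$link" ]; then
    echo "missing symlink: $link" >&2
    return 1
  fi
  if [ ! -f "$link/python.sh" ]; then
    echo "no python.sh under $(readlink -f "$link")" >&2
    return 1
  fi
  if [ ! -r "$link/python.sh" ] || [ ! -x "$link/python.sh" ]; then
    echo "python.sh is not readable/executable by $(id -un); check permissions" >&2
    return 1
  fi
  echo "ok: $link -> $(readlink -f "$link")"
}

# Usage:
#   check_isaac_sim_link ~/isaaclabdir/IsaacLab-release-2.1.0
```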

warning

Special Note for RTX 50 Series GPUs: please use the latest PyTorch nightly version instead of the PyTorch 2.5.1 bundled with Isaac Sim. Note that cu128 in the URL must match your CUDA version.

~/isaaclabdir/IsaacLab-release-2.1.0/isaaclab.sh -p -m pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
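To decide whether this note applies to your machine, the sketch below checks the GPU name reported by nvidia-smi. It assumes that GeForce RTX 50 cards report names such as "NVIDIA GeForce RTX 5090". The pattern deliberately does not match the older workstation "RTX 5000 Ada" name. The helper is_rtx50 is hypothetical.

```shell
# Sketch (hypothetical helper): decide from the GPU name whether the RTX 50
# series note applies. Assumes names like "NVIDIA GeForce RTX 5090".
is_rtx50() {
  case "$1" in
    *"GeForce RTX 50"[0-9][0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   gpu=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)
#   if is_rtx50 "$gpu"; then
#     ~/isaaclabdir/IsaacLab-release-2.1.0/isaaclab.sh -p -m pip install --upgrade --pre torch \
#       --index-url https://download.pytorch.org/whl/nightly/cu128
#   fi
```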

Part II: Gait Training & Deployment

Gait training uses the open-source TienKung-Lab framework.

1. Extract Code

cd ~/
unzip TienKung-Lab.zip

2. Environment Installation

warning
  1. Install the PyTorch build that matches your CUDA version. For example, browse https://mirrors.nju.edu.cn/pytorch/whl/ to find the matching build and adjust the pip command below accordingly. Note that cu128 in the URL must match your CUDA version.

  2. All subsequent commands require activating the env_isaaclab environment with conda activate env_isaaclab to run properly.

conda activate env_isaaclab
cd ~/TienKung-Lab
pip install -e .
pip install -U --index-url https://mirrors.nju.edu.cn/pytorch/whl/cu128/ torchvision==0.24.0 torch==2.9.0
pip uninstall -y rsl-rl-lib || true
cd ~/TienKung-Lab/rsl_rl
pip install -e .
pip install onnxscript
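The cu128 part of the index URL is derived from the CUDA major and minor version. The sketch below shows that mapping with two hypothetical helpers and reuses the NJU mirror URL pattern from the pip command above. You still need to check on the mirror that a wheel actually exists for the resulting tag.

```shell
# Sketch (hypothetical helpers): derive the wheel tag (e.g. cu128) from a CUDA
# version and build the NJU mirror index URL used in the pip command above.
cuda_wheel_tag() {
  # Keep only major.minor ("12.8.0" -> "12.8"), then drop the dot.
  printf 'cu%s\n' "$(printf '%s' "$1" | cut -d . -f 1,2 | tr -d '.')"
}

torch_index_url() {
  printf 'https://mirrors.nju.edu.cn/pytorch/whl/%s/\n' "$(cuda_wheel_tag "$1")"
}

# Usage:
#   pip install -U --index-url "$(torch_index_url 12.8)" torchvision==0.24.0 torch==2.9.0
```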

3. Training

Train the policy using AMP expert data from tienkung/datasets/motion_amp_expert.

info

AMP (Adversarial Motion Priors) originated in academic research. The following paper abstract summarizes the motivation behind it:

Training a high-dimensional simulated agent with an under-specified reward function often leads the agent to learn physically infeasible strategies that are ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement learning practitioners often utilize complex reward functions that encourage physically plausible behaviors. However, a tedious labor-intensive tuning process is often required to create hand-designed rewards which might not easily generalize across platforms and tasks. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies. These natural strategies can also facilitate transfer to the real world.

info

So what exactly is the AMP expert data here? It is a set of reference motion trajectories that closely resemble human movement.

With ordinary reinforcement learning, you can only specify rewards such as:

  • Not falling: +1
  • Speed close to the target: +1
  • Low energy consumption: +1

The resulting behavior usually looks like this:

  • A strange, shaky, stiff gait
  • Movement like a "mechanical monster"

Yet numerically, such a policy is perfectly valid.

This happens because a hand-written reward function cannot fully describe natural human movement. With AMP expert data, the algorithm can judge during training how similar the robot's movement is to the reference motion data and assign a reward based on that similarity.
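For reference, the published AMP formulation turns the discriminator's judgment into a reward roughly as follows. This sketch shows the general technique only; the exact form, observation features, and weights used in TienKung-Lab are not stated in this guide, and the gradient penalty is omitted. A discriminator D scores state transitions (s, s'). It is trained toward 1 on transitions from the reference motion data d^M and toward −1 on transitions from the policy d^π. Its score becomes a style reward r^S, which is combined with the task reward r^G (for example, velocity tracking and the periodic gait terms) using weights w^G and w^S:

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{(s,s')\sim d^{\mathcal{M}}}\big[(D(s,s') - 1)^2\big]
              + \mathbb{E}_{(s,s')\sim d^{\pi}}\big[(D(s,s') + 1)^2\big] \\
r^{S}(s,s')   &= \max\big[0,\; 1 - 0.25\,(D(s,s') - 1)^2\big] \\
r_t           &= w^{G}\, r^{G}_t + w^{S}\, r^{S}_t
\end{aligned}
```

As a check: a transition the discriminator considers fully expert-like (D = 1) earns r^S = 1, while one it considers fully policy-like (D = −1) earns 1 − 0.25 · 4 = 0. The style reward therefore stays in [0, 1].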

Parameters:

  • --task: Task type (options: walk, run)
  • --headless: Train without the GUI
  • --logger: Logging backend (the command below uses tensorboard)
  • --num_envs: Number of robot instances in the scene
  • --max_iterations: Number of training iterations (default: 50000)

The current software package supports the Lite model; Plus and Pro models will be added in Q1 2026.

Start training

cd ~/TienKung-Lab

python legged_lab/scripts/train.py --task=walk --headless --logger=tensorboard --num_envs=2048
warning
  • With the default 50,000 training iterations, training takes approximately 18 hours on an RTX 4090.
  • The software package currently supports Walker Tienkung (Lite); Walker Tienkung · Voice & Vision (Plus) and Walker Tienkung · Embodied Intelligence (Pro) will be added in Q1 2026.
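If you change --max_iterations, you can scale the 18-hour figure above, which works out to roughly 1.3 s per iteration. The hypothetical helper below does that arithmetic. It assumes that the time per iteration stays constant and that the reference figure was measured with the command above (--num_envs=2048). A different GPU or a different --num_envs value will change the result.

```shell
# Sketch (hypothetical helper): scale this guide's reference figure (about
# 18 h for 50,000 iterations on one RTX 4090) to another iteration count.
# Assumes a constant time per iteration.
estimate_hours() {
  awk -v it="$1" 'BEGIN { printf "%.1f\n", it * 18 / 50000 }'
}

# Usage:
#   estimate_hours 10000   # prints 3.6
```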

Model location: ~/TienKung-Lab/logs/walk/xxxx/model_xxx.pt. Each .pt file is a model checkpoint, and xxxx is a folder named with the timestamp at which training started. For example:

  • ~/TienKung-Lab/logs/walk/202x-01-12_05-07-14/model_100.pt
  • ~/TienKung-Lab/logs/walk/202x-01-12_05-07-14/model_49900.pt
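To pick the newest run and checkpoint without scanning the folders by hand, the two hypothetical helpers below can be used. They assume run folders are named YYYY-MM-DD_HH-MM-SS, so a plain text sort orders them chronologically and skips folders such as exe or pretrained. Checkpoint numbers are sorted numerically, because a text sort would place model_9000.pt after model_10000.pt.

```shell
# Sketch (hypothetical helpers): find the newest run folder and its newest checkpoint.
latest_run() {
  ls "$1" 2>/dev/null | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}_' | sort | tail -n 1
}

latest_checkpoint() {
  ls "$1" 2>/dev/null | grep -E '^model_[0-9]+\.pt$' | sort -t _ -k 2,2n | tail -n 1
}

# Usage:
#   run=$(latest_run ~/TienKung-Lab/logs/walk)
#   latest_checkpoint ~/TienKung-Lab/logs/walk/"$run"
```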

Check status

Open a new terminal to check the training status:

conda activate env_isaaclab
cd ~/TienKung-Lab
tensorboard --logdir=logs/walk

4. Resume Training

cd ~/TienKung-Lab
mkdir -p ~/TienKung-Lab/logs/walk/pretrained
# Copy the .pt checkpoint to resume from (here model_8100.pt) into the pretrained folder
python legged_lab/scripts/train.py --task=walk --headless --logger=tensorboard --num_envs=2048 --resume=True --load_run=pretrained --checkpoint=model_8100.pt
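The copy step and the --checkpoint value must refer to the same file. The hypothetical helper below keeps them in sync: it copies a checkpoint into logs/walk/pretrained and prints its file name for use in the resume command. RUN in the usage comment is a placeholder for your timestamped run folder.

```shell
# Sketch (hypothetical helper): copy a checkpoint into logs/walk/pretrained
# and print its file name for --checkpoint.
prepare_resume() {
  local src_pt="$1" log_root="${2:-$HOME/TienKung-Lab/logs/walk}"
  if [ ! -f "$src_pt" ]; then
    echo "checkpoint not found: $src_pt" >&2
    return 1
  fi
  mkdir -p "$log_root/pretrained"
  cp "$src_pt" "$log_root/pretrained/"
  basename "$src_pt"
}

# Usage (RUN is a placeholder):
#   ckpt=$(prepare_resume ~/TienKung-Lab/logs/walk/RUN/model_8100.pt)
#   python legged_lab/scripts/train.py --task=walk --headless --logger=tensorboard \
#     --num_envs=2048 --resume=True --load_run=pretrained --checkpoint="$ckpt"
```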

5. Run in the Simulation Environment (Isaac Lab)

Load and run the trained model in the Isaac Lab simulation environment.

  • --task: Task type, either walk or run
  • --num_envs: Number of robots in the scene
  • --load_run: Run folder (inside the log directory, e.g. logs/walk/) that contains the .pt file; defaults to the most recent training run
  • --checkpoint: File name of the .pt file; defaults to the latest checkpoint of that run
cd ~/TienKung-Lab
mkdir -p ~/TienKung-Lab/logs/walk/exe
# Copy the .pt checkpoint into the exe folder, for example ~/TienKung-Lab/logs/walk/exe/model_49900.pt
python legged_lab/scripts/play.py --task=walk --num_envs=1 --load_run=exe --checkpoint=model_49900.pt

play.py also exports the model used in the current run in both .pt and .onnx formats and saves them to an exported/ subfolder of the run folder, for example:

  • ~/TienKung-Lab/logs/walk/exe/exported/policy.pt
  • ~/TienKung-Lab/logs/walk/exe/exported/policy.onnx
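The Sim2Sim and conversion steps that follow depend on these exports. The hypothetical helper below checks that both files exist and are non-empty. The file names come from this guide.

```shell
# Sketch (hypothetical helper): confirm that play.py left both exported
# formats in <run folder>/exported/.
check_exported() {
  local dir="$1/exported" missing=0 f
  for f in policy.pt policy.onnx; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_exported ~/TienKung-Lab/logs/walk/exe
```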

6. Sim2Sim (MuJoCo)

Load the trained model in MuJoCo to perform cross-simulation validation.

Running play.py in step 5 automatically exports the trained model to the exported/ subfolder of the run folder. Pass that file to sim2sim.py with the --policy argument.

Run MuJoCo simulation with the trained policy:

cd ~/TienKung-Lab
python legged_lab/scripts/sim2sim.py --task walk --duration 50 --policy ./logs/walk/exe/exported/policy.pt

7. Model Conversion

Convert the model with OpenVINO's ovc tool to obtain the .xml and .bin files (OpenVINO IR format).

cd ~/TienKung-Lab
pip install openvino
ovc logs/walk/exe/exported/policy.pt --output_model logs/walk/exe/exported/
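Deployment needs both halves of the IR pair, so it is worth checking the conversion output before copying anything. The hypothetical helper below verifies that the .bin file exists and is non-empty. It also checks that the .xml file contains a <net ...> element, which is assumed here to be the IR's root element.

```shell
# Sketch (hypothetical helper): check that ovc produced an OpenVINO IR pair
# (<name>.xml + <name>.bin). Assumes the IR XML has a <net ...> root element.
check_ir_pair() {
  local dir="$1" name="${2:-policy}"
  if [ ! -s "$dir/$name.bin" ]; then
    echo "missing $dir/$name.bin" >&2
    return 1
  fi
  if ! grep -q '<net ' "$dir/$name.xml" 2>/dev/null; then
    echo "missing or invalid $dir/$name.xml" >&2
    return 1
  fi
}

# Usage:
#   check_ir_pair logs/walk/exe/exported policy && echo "IR files ready"
```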

8. Deployment

Deploy the trained locomotion control model to the ros_lite project on the training machine.

Copy model files

Copy the .bin, .xml, and .pt files to the ~/ros_lite/install/rl_control_new/share/rl_control_new/config/policy/ directory.

cd ~/
unzip ros_lite_src.zip -d ros_lite
cd ~/TienKung-Lab/logs/walk/exe/exported/
cp policy.pt policy.bin policy.xml ~/ros_lite/install/rl_control_new/share/rl_control_new/config/policy/

Ensure you use the correct model file

Ensure that mlp.path in the configuration file ~/ros_lite/install/rl_control_new/share/rl_control_new/config/tg22_config.yaml points to the name of the model you just trained; no file extension is needed.

For example, if the trained model files include policy.pt, policy.bin, and policy.xml, then the value of mlp.path in the configuration file should be set to /config/policy/policy.

cd ~/ros_lite/install/rl_control_new/share/rl_control_new/config/
cat tg22_config.yaml
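Instead of reading the whole file, you can extract the value and check that the referenced files exist. The sketch below makes several assumptions. First, mlp.path means a path key nested under a top-level mlp: key. Second, the parser is a simplified awk one and only handles that layout. Third, the path is resolved relative to the rl_control_new share directory. The helpers mlp_path and check_mlp_files are hypothetical.

```shell
# Sketch (hypothetical helpers). Assumes tg22_config.yaml contains
#   mlp:
#     path: /config/policy/policy
# and that the path is relative to the rl_control_new share directory.
mlp_path() {
  awk '
    /^[^ \t#][^:]*:/ { in_mlp = ($1 == "mlp:") }   # top-level key: are we in mlp?
    in_mlp && $1 == "path:" { v = $2; gsub(/"/, "", v); print v; exit }
  ' "$1"
}

check_mlp_files() {
  local share_dir="$1" p ext
  p=$(mlp_path "$share_dir/config/tg22_config.yaml")
  if [ -z "$p" ]; then
    echo "mlp.path not found in $share_dir/config/tg22_config.yaml" >&2
    return 1
  fi
  for ext in xml bin; do
    if [ ! -s "$share_dir$p.$ext" ]; then
      echo "missing $share_dir$p.$ext" >&2
      return 1
    fi
  done
  echo "ok: $share_dir$p.{xml,bin}"
}

# Usage:
#   check_mlp_files ~/ros_lite/install/rl_control_new/share/rl_control_new
```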
info

The actual model files used for locomotion control are .bin and .xml.

warning

Here, the model files are copied directly into the build output (install) directory of the ros_lite project. This is suitable when the project will not be rebuilt on the real robot.

If you will rebuild the project on the real robot, copy the model files to ~/ros_lite/src/rl_control_new/config/policy instead, then build the project after transferring it to the robot.

9. Deployment on the Real Robot

Package the entire ~/ros_lite directory on the training machine, copy the archive to the x86 board of Walker Tienkung (192.168.41.1), and extract it in the /home/ubuntu/ directory. The trained gait model is now deployed to the real Tienkung robot.
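One way to do the packaging and transfer is sketched below using tar, scp, and ssh. The helper pack_ros_lite is hypothetical. The SSH user ubuntu is an assumption based on the /home/ubuntu target directory. The transfer commands are shown as comments because they need network access to the robot.

```shell
# Sketch (hypothetical helper): pack the ros_lite folder found under $1
# (default: $HOME) into the archive $2 (default: ros_lite.tar.gz).
pack_ros_lite() {
  local parent="${1:-$HOME}" out="${2:-ros_lite.tar.gz}"
  tar -czf "$out" -C "$parent" ros_lite
}

# Transfer and unpack (run on the training machine; SSH user "ubuntu" is assumed):
#   pack_ros_lite "$HOME" ros_lite.tar.gz
#   scp ros_lite.tar.gz ubuntu@192.168.41.1:/home/ubuntu/
#   ssh ubuntu@192.168.41.1 'tar -xzf /home/ubuntu/ros_lite.tar.gz -C /home/ubuntu'
```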

Preparing for the test

  1. On the x86 board at 192.168.41.1, make sure the robot's auto-start service is disabled:
    sudo systemctl status proc_manager.service
  2. If the service is not disabled, disable it manually:
    sudo systemctl disable proc_manager.service
    Then power-cycle the entire robot and wait for it to finish booting.
  3. Manually start body_control on the x86 board (192.168.41.1):
    tmux
    sudo su
    cd ros2ws
    source install/setup.bash
    ros2 launch body_control body.launch.py
  4. In another terminal, start the reinforcement learning locomotion controller with the newly trained model:
    tmux
    cd ~/ros_lite
    source install/setup.bash
    ros2 launch rl_control_new rl.launch.py
danger

Because this is a newly trained model, its behavior is not guaranteed. Take the following safety precautions:

  • Secure the robot to the transfer stand (gantry) with a rope. The rope must not be too slack, so that the robot cannot fall hard in an emergency.
  • A second person must stand by, ready to press the emergency stop button at any time.

Remote Control Button Instructions

Button | Function
A key + G key (G switched to the middle/zero position) | Switch to MLP (machine learning policy) control mode
D key | Switch to ZERO (zeroing) control mode
C key | Switch to STOP control mode
Left joystick | Move the robot forward/backward and left/right
Right joystick | Turn the robot left/right
danger

After the locomotion control service starts, pressing A makes the robot go directly from the homing (zeroing) state into walking. There is no standing state! Pay particular attention to this.

Control logic description

  1. The robot starts in STOP mode. After startup, press D to enter ZERO mode so that all joints return to the configured zero position.
  2. Press A + G (G switched to the middle/zero position) to enter MLP mode; the robot starts walking. Note: the robot does not enter a standing state.
  3. Press D to return to ZERO mode; the robot returns to its initial posture.
  4. Press C to enter STOP mode; the robot holds its current posture.
  5. Mode switching flow: STOP -> ZERO -> MLP -> STOP
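The switching rules above can be summarized as a small state machine. The sketch below is a hypothetical model of the documented behavior, not the rl_control_new implementation. Two rules in it are assumptions inferred from the STOP -> ZERO -> MLP -> STOP flow: MLP is reachable only from ZERO, and any other input leaves the mode unchanged.

```shell
# Sketch (hypothetical model, not the SDK's code): next mode for a given
# current mode ($1) and remote-control input ($2).
next_mode() {
  case "$1:$2" in
    *:C) echo STOP ;;         # C: stop and hold the current posture
    *:D) echo ZERO ;;         # D: return all joints to the zero position
    ZERO:A+G) echo MLP ;;     # A + G: start the walking policy (no standing phase)
    *) echo "$1" ;;           # any other input: keep the current mode (assumption)
  esac
}

# Usage: walk through the documented flow.
#   m=STOP; for k in D A+G C; do m=$(next_mode "$m" "$k"); echo "$k -> $m"; done
```

With the usage loop, the modes go from ZERO to MLP to STOP, matching step 5.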

Then, with all safety precautions in place, use the remote control step by step to zero the robot and make it walk.

Reference: https://github.com/Open-X-Humanoid/Deploy_Tienkung/tree/main/rl_control_new