# A Safe Reinforcement Learning Framework for Competitive Racing
Note: This project implements a hybrid learning architecture (Imitation Learning + PPO) integrated with Action Mapping and State Mapping safety mechanisms for high-speed autonomous driving.
Autonomous racing requires a delicate balance between aggressive speed and physical safety constraints. This project addresses the "cold start" problem in Reinforcement Learning by using a two-stage training process:
- Warm Start (Imitation Learning): The agent initially learns to clone expert driving behaviors to establish a baseline safe policy.
- Optimization (PPO): The policy is fine-tuned using Proximal Policy Optimization to surpass the expert's performance.
To ensure safety during this aggressive optimization, the system implements an Action Mapping mechanism to enforce traction limits and a State Mapping mechanism to handle dynamic overtaking scenarios.
*Agent utilizing State Mapping to identify feasible overtaking corridors while adhering to friction constraints.*
- Python 3.10 (Strict requirement).
- Windows: Download the Python 3.10 Installer (Scroll to "Files" -> "Windows installer (64-bit)").
- Mac/Linux: Use your package manager (e.g., `brew install python@3.10` or `sudo apt install python3.10`).
- Clone the repository:

  ```bash
  git clone https://github.com/tarasathwik/MetaDrive_Project.git
  cd MetaDrive_Project
  ```

- Create a Virtual Environment: We must explicitly use the Python 3.10 executable to create the environment.
  Windows:

  ```powershell
  # If you installed Python 3.10 correctly, the 'py' launcher handles versions:
  py -3.10 -m venv venv

  # Activate it:
  .\venv\Scripts\activate
  ```

  Mac/Linux:

  ```bash
  # Point to the specific 3.10 binary
  python3.10 -m venv venv

  # Activate it:
  source venv/bin/activate
  ```
- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Verify Installation: Check that the simulator launches:

  ```bash
  python scripts/play_game.py
  ```
This framework combines modern RL algorithms with rigorous safety constraints.
- Stage I: Imitation Learning (Behavioral Cloning): We minimize the divergence between the agent's policy and the expert's actions using the Behavioral Cloning loss (a common form is given after this list).
- Stage II: Proximal Policy Optimization (PPO): We optimize the policy using the clipped surrogate objective to ensure stable updates (the standard objective is given after this list).
- Action Mapping (Traction Control): To prevent skidding, we enforce the Friction Circle Constraint: the total force applied by the agent must strictly adhere to the physical limits of tire friction. We implement a projection function H_AM that maps unsafe actions back onto this safe boundary (a sketch of this projection follows the list).
- State Mapping (Dynamic Perception): For competitive overtaking, we transform raw track observations into a "Feasible Area" vector based on opponent positions. This allows the agent to perceive the track as a dynamic corridor, ignoring blocked lanes (a sketch follows the list).
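A common form of the Behavioral Cloning loss for Stage I is the mean-squared error between the policy's action and the expert's action over the demonstration dataset D; the exact loss used in this project may differ, but it illustrates the objective being minimized:

```math
\mathcal{L}_{\mathrm{BC}}(\theta) = \mathbb{E}_{(s,\, a^{E}) \sim \mathcal{D}} \left[ \left\lVert \pi_\theta(s) - a^{E} \right\rVert^{2} \right]
```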
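Stage II optimizes the clipped surrogate objective of PPO, whose standard form uses the probability ratio r_t(θ) between the current and previous policies, the advantage estimate Â_t, and a clipping parameter ε:

```math
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t \left[ \min\!\left( r_t(\theta)\, \hat{A}_t,\; \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\, \hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```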
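The Friction Circle Constraint behind Action Mapping bounds the combined longitudinal and lateral tire forces by the available friction:

```math
\sqrt{F_x^{2} + F_y^{2}} \;\le\; \mu F_z
```

Below is a minimal sketch of a projection in the spirit of H_AM, assuming the action is a 2-D (longitudinal, lateral) acceleration command and a point-mass vehicle model; the function and parameter names are illustrative, not the project's actual API:

```python
import numpy as np

def project_to_friction_circle(action: np.ndarray, mu: float = 1.0, g: float = 9.81) -> np.ndarray:
    """Map a possibly unsafe (a_lon, a_lat) command back onto the friction circle.

    If the requested acceleration exceeds mu * g in magnitude, the command is
    rescaled so it lies exactly on the safe boundary; otherwise it is returned unchanged.
    """
    a_max = mu * g                       # maximum total acceleration the tires can deliver
    magnitude = np.linalg.norm(action)   # combined longitudinal + lateral demand
    if magnitude <= a_max:
        return action
    return action * (a_max / magnitude)  # radial projection onto the circle of radius mu * g

# Example: an aggressive command of (8, 8) m/s^2 is scaled back onto the boundary
print(project_to_friction_circle(np.array([8.0, 8.0]), mu=1.0))
```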
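A minimal sketch of the State Mapping idea, assuming each lane is marked free or blocked by opponents inside a fixed look-ahead window; all names and the observation layout are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def feasible_area_vector(lane_centers, opponents, ego_s, lookahead=30.0, safety_gap=2.0):
    """Return a binary "Feasible Area" vector: 1.0 if a lane is open ahead, 0.0 if blocked.

    lane_centers: lateral offsets (m) of the track's lanes
    opponents:    (longitudinal position, lateral offset) of each other car
    ego_s:        longitudinal position (m) of the ego vehicle along the track
    """
    feasible = np.ones(len(lane_centers))
    for opp_s, opp_d in opponents:
        if not (0.0 < opp_s - ego_s < lookahead):      # ignore opponents outside the look-ahead window
            continue
        for i, lane_d in enumerate(lane_centers):
            if abs(opp_d - lane_d) < safety_gap:       # opponent occupies this lane
                feasible[i] = 0.0
    return feasible

# Example: three lanes at -3.5, 0, +3.5 m; one opponent 15 m ahead in the middle lane
print(feasible_area_vector([-3.5, 0.0, 3.5], [(115.0, 0.2)], ego_s=100.0))  # -> [1. 0. 1.]
```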
Run the manual control script to generate expert demonstrations for Imitation Learning:

```bash
python scripts/collect_data.py --episodes 10
```