Event-driven elevator group control as a Gymnasium environment, with classical baselines, RL agents, and a full evaluation/visualization pipeline. This repo accompanies the final report in RL_Project_final_report_William.pdf.
- Event-driven simulator with passenger spawn, door open, and door close events.
- Compact tensor observation for multi-car, multi-floor control.
- FIFO and LOOK baselines for classical dispatch comparison.
- Policy Gradient (with baseline), 1-step Actor-Critic, and PPO experiments.
- Rich visualization scripts and published figures under `visualization/`.
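As a rough illustration of what "event-driven" means here, the sketch below keeps spawn, open, and close events in a priority queue and jumps the clock straight to the next event instead of ticking in fixed steps. It is not the repo's simulator: the `Event` and `EventQueue` names, the payload fields, and the delays are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """Hypothetical event record; ordered by time, then insertion order."""
    time: float
    seq: int
    kind: str = field(compare=False)  # "spawn", "open", or "close"
    payload: dict = field(compare=False, default_factory=dict)


class EventQueue:
    """Minimal event queue: the clock advances to each popped event."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal times
        self.now = 0.0

    def schedule(self, delay, kind, **payload):
        event = Event(self.now + delay, next(self._counter), kind, payload)
        heapq.heappush(self._heap, event)

    def pop(self):
        event = heapq.heappop(self._heap)
        self.now = event.time  # jump directly to the event time
        return event

    def __len__(self):
        return len(self._heap)


queue = EventQueue()
queue.schedule(2.0, "open", car=0, floor=1)
queue.schedule(0.5, "spawn", origin=1, destination=2)
queue.schedule(5.0, "close", car=0, floor=1)

order = []
while queue:
    event = queue.pop()
    order.append((event.time, event.kind))
```

Events come out in time order regardless of the order in which they were scheduled, which is the property an agent-facing `step` would rely on when deciding at which moments a dispatch decision is needed.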
flowchart TD
A[Problem: Elevator Dispatch] --> B[MDP Formulation]
B --> C[Gymnasium Environment]
C --> D1[Baselines: FIFO, LOOK]
C --> D2[RL Agents: PG+Baseline, 1-step AC, PPO]
D1 --> E[Evaluation]
D2 --> E
E --> F[Visualization & Analysis]
F --> G[Final Report]
What I built:
- MDP formulation and event-driven simulator (Gymnasium environment).
- FIFO and LOOK baseline solvers for classical control.
- RL training and evaluation (PG+baseline, 1-step AC, PPO).
- Visualization tooling for rewards, timelines, and load analysis.
- Final report with analysis and failure modes (reward hacking, seed-fixed fairness).
- Observation: `N x M x 5` tensor with hall calls (up/down), car calls, car positions, and directions.
- Action: choose `(floor, car)`; `floor == N` means idle.
- Dynamics: event-driven (spawn, open, close) instead of fixed time steps.
- Reward: current code rewards boarding/alighting plus a completion bonus. The report explores alternative rewards (e.g., squared waiting time) and analyzes reward hacking behaviors.
- Baselines: FIFO, LOOK (`Solver/`)
- RL: Policy Gradient with baseline, 1-step Actor-Critic, PPO (`RL_Elevator.ipynb`)
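The observation and action conventions listed above can be made concrete with a small sketch. It assumes `N` is the number of floors (indexed `0..N-1`) and `M` the number of cars, and it flattens `(floor, car)` into a single integer, which is one common way to feed a discrete policy head. The helper names and the flattening scheme are hypothetical and may not match how the environment encodes actions.

```python
def observation_shape(num_floors, num_cars):
    """Shape of the N x M x 5 observation (assuming N floors, M cars)."""
    return (num_floors, num_cars, 5)


def encode_action(floor, car, num_floors, num_cars):
    """Map (floor, car) to a flat index; floor == num_floors means idle."""
    if not 0 <= floor <= num_floors:
        raise ValueError("floor must be in [0, num_floors]")
    if not 0 <= car < num_cars:
        raise ValueError("car must be in [0, num_cars)")
    return floor * num_cars + car


def decode_action(index, num_floors, num_cars):
    """Inverse of encode_action; also reports whether the action is idle."""
    if not 0 <= index < (num_floors + 1) * num_cars:
        raise ValueError("index out of range")
    floor, car = divmod(index, num_cars)
    return floor, car, floor == num_floors


# Example: 3 floors, 2 cars gives (3 + 1) * 2 = 8 flat actions.
idle_for_car_0 = encode_action(3, 0, num_floors=3, num_cars=2)
decoded = decode_action(idle_for_car_0, num_floors=3, num_cars=2)
```

Under this scheme each car has its own idle action, so the flat action space has `(N + 1) * M` entries.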
The report shows that baselines remain strong on average, while PPO and policy-gradient variants can outperform baselines in specific, fixed-seed scenarios. The plots below are generated by scripts in visualization/.
[Figures not preserved in this copy: two pairs of panels labeled "3 halls, 1 car" and "6 halls, 1 car", followed by one pair labeled "Event Timeline" and "Load Intensity".]
Scripts live in visualization/:
- `event_plot.py` compares event timelines across runs.
- `load_plot.py` colors OPEN events by car load.
- `rewards_boxplot.py` aggregates reward distributions.
- `filling_plot.py` plots losses with running mean/variance.
- `ppo_handle.py` converts PPO accumulated rewards back to per-step.
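For readers wondering what converting accumulated rewards back to per-step values involves, a minimal sketch follows. It assumes the accumulated series is a running sum that starts from zero within one episode, so the per-step reward is the difference between consecutive totals. The function name is hypothetical, and `ppo_handle.py` may read and handle its input differently.

```python
def per_step_rewards(accumulated):
    """Recover per-step rewards from a running total (assumed to start at 0)."""
    rewards = []
    previous = 0.0
    for total in accumulated:
        rewards.append(total - previous)  # reward earned on this step
        previous = total
    return rewards


example = per_step_rewards([1.0, 3.0, 3.0, 2.5])
```

If the log spans several episodes, the running total would need to be reset at each episode boundary before differencing.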
    python3 -m venv .venv
    .venv/bin/pip install -r requirements.txt
    .venv/bin/pip install -e .

    # Baselines
    .venv/bin/python -m Solver.FIFO
    .venv/bin/python -m Solver.LOOK

    # Visualizations (examples)
    .venv/bin/python visualization/event_plot.py visualization/FIFO_3_1.txt visualization/LOOK_3_1.txt
    .venv/bin/python visualization/load_plot.py visualization/PPO_render_6_1.txt

See the full analysis and figures in RL_Project_final_report_William.pdf.
- `Elevators/wrappers` contains generic Gymnasium wrappers from earlier experiments; they are not used by the elevator environment unless you explicitly wrap it.
- The environment ID is `Elevators/Elevators-v0`.
MIT. See LICENSE.





