Event‑driven elevator dispatch Gymnasium environment with FIFO/LOOK baselines, RL agents, and visualization/analysis from a full course project.

william-dan/rl-elevator

Elevator Dispatch RL (Gymnasium)

Event-driven elevator group control as a Gymnasium environment, with classical baselines, RL agents, and a full evaluation/visualization pipeline. This repo accompanies the final report in RL_Project_final_report_William.pdf.

Highlights

  • Event-driven simulator with passenger spawn, door open, and door close events.
  • Compact tensor observation for multi-car, multi-floor control.
  • FIFO and LOOK baselines for classical dispatch comparison.
  • Policy Gradient (with baseline), 1-step Actor-Critic, and PPO experiments.
  • Rich visualization scripts and published figures under visualization/.
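
As a rough picture of what "event-driven" means here, the sketch below keeps a time-ordered queue of spawn, open, and close events and jumps the clock straight to the next one. It is a minimal illustration, not the repo's simulator: the `Event` class, the `run_events` function, and the 2.0-unit door dwell time are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical event kinds; the names mirror this README, not the repo's code.
SPAWN, OPEN, CLOSE = "spawn", "open", "close"


@dataclass(order=True)
class Event:
    time: float                       # only the time takes part in ordering
    kind: str = field(compare=False)
    floor: int = field(compare=False)


def run_events(events, dwell=2.0):
    """Process events in time order, jumping the clock to each event."""
    queue = list(events)
    heapq.heapify(queue)
    clock = 0.0
    log = []
    while queue:
        event = heapq.heappop(queue)
        clock = event.time            # no fixed time step: skip idle time
        log.append((clock, event.kind, event.floor))
        if event.kind == OPEN:
            # Assumed rule: every door opening schedules a close after `dwell`.
            heapq.heappush(queue, Event(clock + dwell, CLOSE, event.floor))
    return log


log = run_events([Event(5.0, OPEN, 2), Event(1.0, SPAWN, 2)])
for entry in log:
    print(entry)
```

The agent would only be asked for a decision when an event fires, so long quiet periods cost no simulation steps.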

My Contributions (End-to-End Pipeline)

flowchart TD
  A[Problem: Elevator Dispatch] --> B[MDP Formulation]
  B --> C[Gymnasium Environment]
  C --> D1[Baselines: FIFO, LOOK]
  C --> D2[RL Agents: PG+Baseline, 1-step AC, PPO]
  D1 --> E[Evaluation]
  D2 --> E
  E --> F[Visualization & Analysis]
  F --> G[Final Report]

What I built:

  • MDP formulation and event-driven simulator (Gymnasium environment).
  • FIFO and LOOK baseline solvers for classical control.
  • RL training and evaluation (PG+baseline, 1-step AC, PPO).
  • Visualization tooling for rewards, timelines, and load analysis.
  • Final report with analysis and failure modes (reward hacking, seed-fixed fairness).

Environment Design

  • Observation: N x M x 5 tensor (N floors, M cars) whose five channels encode hall calls up, hall calls down, car calls, car positions, and car directions.
  • Action: choose (floor, car); floor == N means idle.
  • Dynamics: event-driven (spawn, open, close) instead of fixed time steps.
  • Reward: current code rewards boarding/alighting plus a completion bonus. The report explores alternative rewards (e.g., squared waiting time) and analyzes reward hacking behaviors.
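
The observation and action bullets above can be sketched as follows, reading N as the number of floors and M as the number of cars. This is an illustrative encoding only: the channel order, the `encode` and `decode_action` helpers, and the choice to mark direction only at a car's current floor are assumptions, and the repo's actual layout may differ. Nested lists stand in for the tensor to keep the example dependency-free.

```python
# Hypothetical channel order; the README names the five quantities, not their indices.
HALL_UP, HALL_DOWN, CAR_CALL, POSITION, DIRECTION = range(5)


def encode(n_floors, n_cars, hall_up, hall_down, car_calls, positions, directions):
    """Build an n_floors x n_cars x 5 nested list from simple state descriptions.

    hall_up / hall_down: sets of floors with a pending hall call.
    car_calls: one set of requested floors per car.
    positions: current floor of each car.
    directions: -1, 0, or +1 for each car.
    """
    obs = [[[0.0] * 5 for _ in range(n_cars)] for _ in range(n_floors)]
    for floor in range(n_floors):
        for car in range(n_cars):
            cell = obs[floor][car]
            # Hall calls belong to a floor, so they are repeated for every car.
            cell[HALL_UP] = float(floor in hall_up)
            cell[HALL_DOWN] = float(floor in hall_down)
            cell[CAR_CALL] = float(floor in car_calls[car])
            at_floor = positions[car] == floor
            cell[POSITION] = float(at_floor)
            cell[DIRECTION] = float(directions[car]) if at_floor else 0.0
    return obs


def decode_action(action, n_floors):
    """Return (floor, car), or None when floor == n_floors (the idle action)."""
    floor, car = action
    if floor == n_floors:
        return None
    return floor, car


obs = encode(3, 1, hall_up={0}, hall_down={2}, car_calls=[{1}],
             positions=[1], directions=[1])
print(obs[1][0])
print(decode_action((3, 0), n_floors=3))
```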

Baselines and RL Agents

  • Baselines: FIFO, LOOK (Solver/)
  • RL: Policy Gradient with baseline, 1-step Actor-Critic, PPO (RL_Elevator.ipynb)
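
For readers unfamiliar with the two baselines, here is a simplified single-car sketch of the ordering each one produces. It is not the code in Solver/: the function names are hypothetical, and it ignores hall-call direction, door timing, and multi-car assignment. FIFO serves requests in arrival order, while LOOK sweeps in the current direction as far as the furthest request and then reverses.

```python
def fifo_order(requests):
    """Serve requested floors strictly in arrival order."""
    return list(requests)


def look_order(current, direction, requests):
    """LOOK: sweep in `direction` up to the furthest request, then reverse."""
    pending = sorted(set(requests))
    if direction >= 0:
        first = [f for f in pending if f >= current]             # going up
        second = [f for f in reversed(pending) if f < current]   # then down
    else:
        first = [f for f in reversed(pending) if f <= current]   # going down
        second = [f for f in pending if f > current]             # then up
    return first + second


def travel(current, order):
    """Total floors travelled when visiting `order` starting from `current`."""
    total = 0
    for floor in order:
        total += abs(floor - current)
        current = floor
    return total


requests = [5, 1, 4, 0]
print(fifo_order(requests), travel(3, fifo_order(requests)))
print(look_order(3, +1, requests), travel(3, look_order(3, +1, requests)))
```

In this toy case the car starts at floor 3 heading up; FIFO travels 13 floors while LOOK travels 7, which illustrates why LOOK is a strong classical reference point.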

Results Snapshot

The report shows that baselines remain strong on average, while PPO and policy-gradient variants can outperform baselines in specific, fixed-seed scenarios. The plots below are generated by scripts in visualization/.

Rewards (3 halls, 1 car) vs (6 halls, 1 car)

Figures: Rewards 3-1 (3 halls, 1 car) and Rewards 6-1 (6 halls, 1 car).

Reward Distributions

Figures: Boxplot 3-1 (3 halls, 1 car) and Boxplot 6-1 (6 halls, 1 car).

Event Timeline and Load Insights

Figures: Event Timeline and Load Intensity.

Visualization Toolkit

Scripts live in visualization/:

  • event_plot.py compares event timelines across runs.
  • load_plot.py colors OPEN events by car load.
  • rewards_boxplot.py aggregates reward distributions.
  • filling_plot.py plots losses with running mean/variance.
  • ppo_handle.py converts PPO accumulated rewards back to per-step rewards.
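
Two of the transformations named above are easy to sketch in isolation. The code below is not taken from the scripts: it assumes "accumulated rewards" means a running total within an episode, and it reads "running mean/variance" as statistics over all values seen so far (computed with Welford's online algorithm) rather than over a sliding window. Both function names are hypothetical.

```python
def per_step_rewards(accumulated):
    """Difference a running total to recover each step's reward."""
    steps = []
    previous = 0.0
    for total in accumulated:
        steps.append(total - previous)
        previous = total
    return steps


def running_mean_variance(values):
    """Welford's algorithm: (mean, population variance) after each value."""
    mean, m2 = 0.0, 0.0
    stats = []
    for n, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # uses the updated mean, as Welford requires
        stats.append((mean, m2 / n))
    return stats


print(per_step_rewards([1.0, 3.0, 2.0, 6.0]))
print(running_mean_variance([2.0, 4.0]))
```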

Installation

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install -e .

Quickstart

# Baselines
.venv/bin/python -m Solver.FIFO
.venv/bin/python -m Solver.LOOK

# Visualizations (examples)
.venv/bin/python visualization/event_plot.py visualization/FIFO_3_1.txt visualization/LOOK_3_1.txt
.venv/bin/python visualization/load_plot.py visualization/PPO_render_6_1.txt

Report

See the full analysis and figures in RL_Project_final_report_William.pdf.

Notes

  • Elevators/wrappers contains generic Gymnasium wrappers from earlier experiments; they are not used by the elevator environment unless you explicitly wrap it.
  • The environment ID is Elevators/Elevators-v0.
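
A minimal sketch of using that ID with the standard Gymnasium API is shown below. It assumes that importing the `Elevators` package (after the editable install) registers the environment, which this README does not state explicitly; `make_elevator_env` and `run_random_episode` are hypothetical helper names, and the random policy is only a smoke test.

```python
def make_elevator_env():
    """Create the environment by its ID (imports are deferred to call time)."""
    import gymnasium as gym
    import Elevators  # noqa: F401  assumed: importing the package registers the ID

    return gym.make("Elevators/Elevators-v0")


def run_random_episode(env, max_steps=1000, seed=0):
    """Roll out one episode with uniformly sampled actions; return total reward."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

With the package installed, `run_random_episode(make_elevator_env())` would run one seeded episode; passing the same `seed` to each agent is one way to keep comparisons on identical passenger arrivals.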

License

MIT. See LICENSE.
