Elevator Dispatch RL (Gymnasium)

Event-driven elevator group control as a Gymnasium environment, with classical baselines, RL agents, and a full evaluation/visualization pipeline. This repo accompanies the final report in RL_Project_final_report_William.pdf.

Highlights

Event-driven simulator with passenger spawn, door open, and door close events.
Compact tensor observation for multi-car, multi-floor control.
FIFO and LOOK baselines for classical dispatch comparison.
Policy Gradient (with baseline), 1-step Actor-Critic, and PPO experiments.
Rich visualization scripts and published figures under visualization/.

My Contributions (End-to-End Pipeline)

flowchart TD
  A[Problem: Elevator Dispatch] --> B[MDP Formulation]
  B --> C[Gymnasium Environment]
  C --> D1[Baselines: FIFO, LOOK]
  C --> D2[RL Agents: PG+Baseline, 1-step AC, PPO]
  D1 --> E[Evaluation]
  D2 --> E
  E --> F[Visualization & Analysis]
  F --> G[Final Report]

What I built:

MDP formulation and event-driven simulator (Gymnasium environment).
FIFO and LOOK baseline solvers for classical control.
RL training and evaluation (PG+baseline, 1-step AC, PPO).
Visualization tooling for rewards, timelines, and load analysis.
Final report with analysis and failure modes (reward hacking, seed-fixed fairness).

Environment Design

Observation: N x M x 5 tensor (N floors, M cars) with five channels: hall calls (up/down), car calls, car positions, and directions.
Action: choose a (floor, car) pair; floor == N, one past the last floor index, means idle.
Dynamics: event-driven (spawn, open, close) instead of fixed time steps.
Reward: current code rewards boarding/alighting plus a completion bonus. The report explores alternative rewards (e.g., squared waiting time) and analyzes reward hacking behaviors.
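
To make the shapes above concrete, here is a minimal sketch of the observation layout and the idle action. The channel order, the helper names (empty_observation, encode_action, decode_action), and the flat integer encoding are illustrative assumptions, not the environment's actual API; only the N x M x 5 shape and the floor == N idle convention come from the description above.

```python
# Hypothetical channel order; the real environment may arrange these differently.
HALL_UP, HALL_DOWN, CAR_CALL, CAR_POS, CAR_DIR = range(5)


def empty_observation(n_floors, n_cars):
    """Return an all-zero n_floors x n_cars x 5 nested list."""
    return [[[0] * 5 for _ in range(n_cars)] for _ in range(n_floors)]


def encode_action(floor, car, n_floors):
    """Flatten (floor, car) into one integer; floor == n_floors means idle."""
    if not 0 <= floor <= n_floors:
        raise ValueError(f"floor must be in [0, {n_floors}], got {floor}")
    return car * (n_floors + 1) + floor


def decode_action(index, n_floors):
    """Invert encode_action, returning (floor, car)."""
    car, floor = divmod(index, n_floors + 1)
    return floor, car


# Example: 3 floors, 1 car, a down call at the top floor, car parked at floor 0.
obs = empty_observation(n_floors=3, n_cars=1)
obs[2][0][HALL_DOWN] = 1
obs[0][0][CAR_POS] = 1

idle = encode_action(floor=3, car=0, n_floors=3)  # floor == N selects "idle"
```

Reserving one extra floor index per car keeps "do nothing" inside the same action space as real dispatches, so a policy needs no separate idle head.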

Baselines and RL Agents

Baselines: FIFO, LOOK (Solver/)
RL: Policy Gradient with baseline, 1-step Actor-Critic, PPO (RL_Elevator.ipynb)
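
For readers unfamiliar with the two baselines, the sketch below shows the textbook idea behind each for a single car. It is not the code in Solver/; the function names and the simplification to a fixed list of requested floors are assumptions made for illustration.

```python
def fifo_order(requests):
    """Serve floors in arrival order, ignoring repeats of an already queued floor."""
    seen, order = set(), []
    for floor in requests:
        if floor not in seen:
            seen.add(floor)
            order.append(floor)
    return order


def look_order(start, going_up, requests):
    """Sweep in the current direction to the last request, then reverse once."""
    pending = sorted(set(requests))
    if going_up:
        first = [f for f in pending if f >= start]
        second = [f for f in pending if f < start][::-1]
    else:
        first = [f for f in pending if f <= start][::-1]
        second = [f for f in pending if f > start]
    return first + second


def travel_distance(start, order):
    """Total number of floors travelled when visiting `order` from `start`."""
    total, position = 0, start
    for floor in order:
        total += abs(floor - position)
        position = floor
    return total


requests = [5, 0, 3, 1, 4]
fifo = fifo_order(requests)                         # [5, 0, 3, 1, 4]
look = look_order(start=2, going_up=True, requests=requests)  # [3, 4, 5, 1, 0]
```

On this toy request list the car travels 16 floors under FIFO and 8 under LOOK, which illustrates why LOOK is usually the stronger classical reference.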

Results Snapshot

The report shows that baselines remain strong on average, while PPO and policy-gradient variants can outperform baselines in specific, fixed-seed scenarios. The plots below are generated by scripts in visualization/.

Rewards (3 halls, 1 car) vs (6 halls, 1 car)

[Figures: rewards for 3 halls, 1 car and for 6 halls, 1 car]

Reward Distributions

[Figures: reward distributions for 3 halls, 1 car and for 6 halls, 1 car]

Event Timeline and Load Insights

[Figures: event timeline and load intensity]

Visualization Toolkit

Scripts live in visualization/:

event_plot.py compares event timelines across runs.
load_plot.py colors OPEN events by car load.
rewards_boxplot.py aggregates reward distributions.
filling_plot.py plots losses with running mean/variance.
ppo_handle.py converts PPO accumulated rewards back to per-step rewards.
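
The conversion from accumulated to per-step rewards amounts to differencing consecutive totals. The sketch below assumes the log holds a running total that starts from zero within one episode; the function name is hypothetical and the actual input format handled by ppo_handle.py may differ.

```python
def per_step_rewards(accumulated):
    """Recover per-step rewards from a running total that starts at zero."""
    steps, previous = [], 0.0
    for total in accumulated:
        steps.append(total - previous)
        previous = total
    return steps


# A running total of 1.0, 3.0, 2.5 corresponds to steps of +1.0, +2.0, -0.5.
recovered = per_step_rewards([1.0, 3.0, 2.5])
```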

Installation

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install -e .

Quickstart

# Baselines
.venv/bin/python -m Solver.FIFO
.venv/bin/python -m Solver.LOOK

# Visualizations (examples)
.venv/bin/python visualization/event_plot.py visualization/FIFO_3_1.txt visualization/LOOK_3_1.txt
.venv/bin/python visualization/load_plot.py visualization/PPO_render_6_1.txt

Report

See the full analysis and figures in RL_Project_final_report_William.pdf.

Notes

Elevators/wrappers contains generic Gymnasium wrappers from earlier experiments; they are not used by the elevator environment unless you explicitly wrap it.
The environment ID is Elevators/Elevators-v0.
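
As a usage sketch for that ID, the function below rolls out a random policy with the standard Gymnasium API. It assumes that importing the Elevators package registers Elevators/Elevators-v0 and that the environment can be created without extra arguments; neither is confirmed here, so check the package before relying on it. The function name and its parameters are hypothetical.

```python
def run_random_episode(env_id="Elevators/Elevators-v0", seed=0, max_steps=200):
    """Roll out a random policy for one episode and return the summed reward."""
    # Imported lazily so this sketch can be loaded without the packages installed.
    import gymnasium as gym
    import Elevators  # noqa: F401  (assumption: importing registers the env ID)

    env = gym.make(env_id)
    env.action_space.seed(seed)  # make the sampled actions reproducible
    obs, info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total += float(reward)
        if terminated or truncated:
            break
    env.close()
    return total
```

Fixing the seed in both reset and the action space matters here because the report compares agents on fixed-seed scenarios.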

License

MIT. See LICENSE.
