Taxi

Implementation of the Taxi reinforcement learning problem and the Q-learning method.

TO DO

- Set up the environment (states, actions, rewards, rendering, reset, step)
- Implement the Q-learning method (including an epsilon-greedy policy)
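
As a rough starting point for the two TO DO items, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It is not this repository's implementation: `CorridorEnv` is a hypothetical stand-in (a 1-D corridor, not the Taxi grid) that only illustrates the `reset`/`step` interface, and the function names and hyperparameter values (`alpha`, `gamma`, `epsilon`, `episodes`, `max_steps`) are assumptions chosen for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical stand-in environment (NOT the Taxi grid).

    States are cells 0..n-1. The agent starts in cell 0 and the episode
    ends when it reaches cell n-1. Actions: 0 = left, 1 = right.
    """

    def __init__(self, n=5):
        self.n = n            # number of states
        self.n_actions = 2    # left, right
        self.state = 0

    def reset(self):
        """Put the agent back in the start cell and return that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        # Clamp so the agent cannot leave the corridor.
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        # Assumed reward scheme: -1 per step, +10 on reaching the goal.
        reward = 10.0 if done else -1.0
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    # Break ties between equally valued actions at random.
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    """Tabular Q-learning; returns the table as q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, env.n_actions, epsilon, rng)
            next_state, reward, done = env.step(action)
            # Terminal transitions have no bootstrapped future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    table = q_learning(CorridorEnv())
    for s, row in enumerate(table):
        print(s, [round(v, 2) for v in row])
```

On this toy corridor the learned greedy action in every non-terminal cell should be "right". For the actual Taxi problem, the same `q_learning` loop would presumably carry over once the environment exposes the number of states and actions plus `reset` and `step`; rendering is not needed for learning itself.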