Description
Investigate the several identified streams of possible HL (hierarchical learning) implementations.
See variants of the base policy(s) = a:
policy((s_t, s_t+1, s_t+2)) = a
maps a sequence of states to a single action, breaking the Markov property. An implementation of DQN with recurrent neural networks (DRQN) does exactly this, but we can use a more straightforward approach and test it in a regular tabular Q-learning setting
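A minimal sketch of that tabular approach: key the Q-table on a tuple of the last few observations instead of a single state. The environment interface (`reset`, `step`, `n_actions`) is hypothetical, not from any specific library.

```python
import random
from collections import defaultdict

def history_q_learning(env, history_len=3, episodes=500,
                       alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning where the 'state' is the tuple of the last
    `history_len` observations, i.e. policy((s_t-2, s_t-1, s_t)) = a.
    `env` is an assumed interface: reset() -> s, step(a) -> (s, r, done),
    and an integer attribute n_actions."""
    q = defaultdict(float)  # keys: (history_tuple, action)
    n_actions = env.n_actions
    for _ in range(episodes):
        s = env.reset()
        hist = (s,) * history_len  # pad the history with the initial state
        done = False
        while not done:
            # epsilon-greedy over the history-keyed Q-table
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q[(hist, x)])
            s2, r, done = env.step(a)
            hist2 = hist[1:] + (s2,)  # slide the history window
            target = r if done else r + gamma * max(
                q[(hist2, x)] for x in range(n_actions))
            q[(hist, a)] += alpha * (target - q[(hist, a)])
            hist = hist2
    return q
```

The only change from vanilla tabular Q-learning is the key: a sliding window of states rather than the current state, which is what "breaking the Markov property" amounts to in tabular form.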
policy((s_t, s_t+1, s_t+2)) = (a_t, a_t+1, a_t+2)
maps a sequence of states to a sequence of actions
not sure how to implement this, or how similar it is considered to the mapping of policy(s) to a new policy described below
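One possible tabular reading of this variant, as a sketch only: key a Q-table jointly on a state sequence and an action sequence, so the greedy choice is over all action tuples. All names here are hypothetical, and the joint action space grows as n_actions**seq_len, which is why this is only plausible for small tabular problems.

```python
import itertools
import random
from collections import defaultdict

def make_seq_policy(n_actions, seq_len=3, epsilon=0.1):
    """Returns a Q-table keyed by (state_sequence, action_sequence)
    and an epsilon-greedy policy realizing
    policy((s_t, s_t+1, s_t+2)) = (a_t, a_t+1, a_t+2)."""
    q = defaultdict(float)
    # enumerate every possible action sequence: n_actions ** seq_len tuples
    action_seqs = list(itertools.product(range(n_actions), repeat=seq_len))

    def policy(state_seq):
        if random.random() < epsilon:
            return random.choice(action_seqs)
        return max(action_seqs, key=lambda seq: q[(state_seq, seq)])

    return q, policy
```

Seen this way, the variant is the combination of the history-keyed idea above (sequence of states as input) with macro-actions (sequence of actions as output), which may bear on the similarity question raised here.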
see how similar this is to current macro-action implementations
policy(s) = (a_t, a_t+1, a_t+2)
maps one state to several actions
this is actually what happens in the action-repeat scenario, as in Pong or the Drone simulator, except that instead of blindly repeating one action, the policy already maps the state to a sequence of actions that need not all be equal to one another
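A sketch of that generalization of action repeat, under the same hypothetical env interface as above: treat each fixed-length action tuple as one "macro-action" in tabular Q-learning, discounting rewards step by step inside the macro (SMDP-style).

```python
import itertools
import random
from collections import defaultdict

def macro_q_learning(env, macro_len=3, episodes=300,
                     alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning where each 'action' is a macro, realizing
    policy(s) = (a_t, a_t+1, a_t+2). Plain action repeat is the special
    case where every macro is (a, a, a)."""
    macros = list(itertools.product(range(env.n_actions), repeat=macro_len))
    q = defaultdict(float)  # keys: (state, macro_tuple)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                m = random.choice(macros)
            else:
                m = max(macros, key=lambda mm: q[(s, mm)])
            # execute the whole macro, accumulating discounted reward
            ret, disc, s2 = 0.0, 1.0, s
            for a in m:
                s2, r, done = env.step(a)
                ret += disc * r
                disc *= gamma
                if done:
                    break
            target = ret if done else ret + disc * max(
                q[(s2, mm)] for mm in macros)
            q[(s, m)] += alpha * (target - q[(s, m)])
            s = s2
    return q
```

Restricting `macros` to the constant tuples recovers exactly the blind action-repeat behavior the text contrasts against.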
policy(s) = policy_t
policy((s_t, s_t+1, s_t+2)) = policy_t
this is the essential approach to hierarchical learning investigated in H-DQN
a new policy is triggered either by reaching a specific state (a termination/initiation state, like reaching the door out of one room and into another)
or by completing a specific sequence of states (for example, after getting a problem right 5 times in a row, or after solving problems A, B and C, moving to a different problem)
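The control flow described above can be sketched as a two-level loop: a meta-policy picks a sub-policy, the sub-policy acts until its termination test fires (a specific state, or a condition over the state history), then control returns to the meta level. All interfaces here are assumptions for illustration, not the actual H-DQN code.

```python
def run_hierarchy(env, meta_policy, options, max_steps=100):
    """Two-level control: meta_policy(s) names an option; each option is a
    (policy_fn, terminates_fn) pair, where terminates_fn(s) signals the
    initiation/termination state (e.g. the door between two rooms).
    Returns the trace of (option_name, action, next_state) steps."""
    s, done, steps = env.reset(), False, 0
    trace = []
    while not done and steps < max_steps:
        name = meta_policy(s)                 # meta level picks a sub-policy
        policy_fn, terminates_fn = options[name]
        while not done and steps < max_steps:
            a = policy_fn(s)                  # sub-policy acts step by step
            s, r, done = env.step(a)
            trace.append((name, a, s))
            steps += 1
            if terminates_fn(s):              # hand control back to the meta level
                break
    return trace
```

The "sequence of states" trigger (5 correct answers in a row, problems A, B and C solved) fits the same skeleton by letting `terminates_fn` check accumulated history instead of a single state.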