
hierarchical learning #18

@lucasosouza

Description


Investigate the several identified streams of possible HL (hierarchical learning) implementations, as variants of the base policy(s) = a:

policy((s_t, s_t+1, s_t+2)) = a
Maps a sequence of states to a single action, breaking the Markov property. An implementation of DQN with recurrent neural networks does exactly this, but a more straightforward approach can be used first, testing it in regular tabular Q-learning.
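A minimal tabular sketch of this variant: the Q-table is keyed on the last k observed states instead of the current state alone. The `env_step(s, a) -> (next_state, reward, done)` interface and all names here are illustrative assumptions, not an existing API.

```python
import random
from collections import defaultdict, deque

def history_q_learning(env_step, start_state, n_actions, k=3, episodes=200,
                       alpha=0.1, gamma=0.99, eps=0.2, seed=0):
    """Tabular Q-learning over state histories: policy((s_t-2, s_t-1, s_t)) -> a.
    env_step(s, a) must return (next_state, reward, done); illustrative only."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # keys: (history_tuple, action)
    for _ in range(episodes):
        hist = deque([start_state] * k, maxlen=k)  # pad history with start state
        s, done = start_state, False
        for _ in range(100):  # cap episode length
            key = tuple(hist)
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[(key, x)])
            s, r, done = env_step(s, a)
            hist.append(s)  # the history tuple, not s alone, is the "state"
            next_key = tuple(hist)
            best_next = max(Q[(next_key, x)] for x in range(n_actions))
            Q[(key, a)] += alpha * (r + gamma * (0.0 if done else best_next) - Q[(key, a)])
            if done:
                break
    return Q
```

The table grows as |S|^k, which is why this only makes sense as a small-scale tabular test before moving to recurrent function approximation.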

policy((s_t, s_t+1, s_t+2)) = (a_t, a_t+1, a_t+2)
Maps a sequence of states to a sequence of actions.
Not sure how to implement this, or how similar it is considered to the mapping from policy(s) to a new policy described below.
See how similar it is to current macro-action implementations.
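One possible sketch of the above, assuming a tabular setting: a lookup table from a k-tuple of recent states to a k-tuple of actions, executed open-loop. This makes the connection to macro-actions explicit: the state sequence triggers a fixed action sequence. The `env_step` signature and table layout are assumptions for illustration.

```python
from collections import deque

def run_sequence_policy(env_step, table, start_state, k=3, default=(0,),
                        max_steps=50):
    """Execute policy((s_t, s_t+1, s_t+2)) = (a_t, a_t+1, a_t+2) as a lookup
    table mapping state tuples to action tuples; illustrative sketch."""
    hist = deque([start_state] * k, maxlen=k)
    s, done, steps = start_state, False, 0
    while not done and steps < max_steps:
        macro = table.get(tuple(hist), default)  # state sequence -> action sequence
        for a in macro:                          # run the whole sequence open-loop
            s, r, done = env_step(s, a)
            hist.append(s)
            steps += 1
            if done:
                break
    return s, done
```

Learning the table entries (rather than hand-writing them, as here) would reduce to Q-learning over (history, action-sequence) pairs, which is essentially macro-action / SMDP Q-learning.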

policy(s) = (a_t, a_t+1, a_t+2)
Maps one state to several actions.
This is something actually performed in the action-repeat scenario, as in Pong or the drone simulator, except that instead of blindly repeating one action, the policy maps the state to a sequence of actions that are not necessarily equal to one another.
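The relationship to action repeat can be made concrete by enumerating the action-sequence spaces: with n primitive actions and horizon k there are n^k possible sequences, of which action repeat uses only the n constant ones. A small illustrative sketch:

```python
import itertools

def all_action_sequences(n_actions, k):
    """Every k-step action sequence a single state could map to: n^k tuples."""
    return list(itertools.product(range(n_actions), repeat=k))

def repeat_sequences(n_actions, k):
    """The action-repeat special case: only the constant tuples (a, a, ..., a)."""
    return [(a,) * k for a in range(n_actions)]
```

So policy(s) = (a_t, a_t+1, a_t+2) is a strict generalization of action repeat: the output space grows from n to n^k sequences, at the cost of a much larger choice set to learn over.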

policy(s) = policy_t
policy((s_t, s_t+1, s_t+2)) = policy_t
This is the essential approach to hierarchical learning investigated in H-DQN.
A new policy is triggered either by reaching a specific state (a termination/initiation state, such as reaching the door out of one room and into another),
or by completing a specific sequence of states (for example: after getting a problem right 5 times in a row, or after solving problems A, B, and C, move on to a different problem).
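A minimal two-level control loop in this spirit: a meta-policy picks a sub-policy (option) from the current state, the sub-policy acts until its termination condition fires (e.g. reaching the door), and control then returns to the meta level. The `Option` class, `env_step` signature, and loop limits are all illustrative assumptions, not the H-DQN implementation itself.

```python
class Option:
    """A sub-policy with its own termination test, in the options style."""
    def __init__(self, policy_fn, terminates):
        self.policy_fn = policy_fn      # s -> primitive action
        self.terminates = terminates    # s -> bool (termination/initiation state)

def run_hierarchy(env_step, meta_policy, options, s, max_meta=10, max_inner=100):
    """Meta-policy selects an option; the option runs until it terminates,
    then control returns to the meta level. Illustrative sketch only."""
    trajectory = [s]
    done = False
    for _ in range(max_meta):
        if done:
            break
        opt = options[meta_policy(s)]   # policy(s) = policy_t
        for _ in range(max_inner):
            s, r, done = env_step(s, opt.policy_fn(s))
            trajectory.append(s)
            if done or opt.terminates(s):
                break
    return trajectory, done
```

For example, in a two-room corridor with the door at state 2 and the goal at state 4, a "walk to the door" option followed by a "walk to the goal" option reproduces the room-to-room behavior described above. Triggering on a sequence of states (the 5-in-a-row case) would replace the `terminates(s)` test with a predicate over the recent trajectory.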
