Conversation
I think it could be possible to clone the parameters of the memory model at each call; it should not require more memory. Then, if we do backward, we obtain a grad for each of the copies, which we can stack. Of course this also works, and later on we can make it quite efficient with hooks.
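A minimal sketch of that clone-and-stack idea, with a toy `nn.Linear` standing in for the actual memory model (the module, shapes, and number of calls are placeholders, not the PR's code):

```python
import torch
from torch import nn
from torch.func import functional_call

# Toy stand-in for the memory model discussed in this thread.
memory_model = nn.Linear(4, 4)

x = torch.randn(4)
per_call_params = []  # one clone of the parameters per call
out = x
for _ in range(3):  # three calls through the same memory model
    # clone() keeps the copies in the autograd graph, so each copy gets
    # its own grad while the originals stay reachable if needed.
    params = {name: p.clone() for name, p in memory_model.named_parameters()}
    per_call_params.append(params)
    out = functional_call(memory_model, params, (out,))

loss = out.sum()

# One gradient per copy of each parameter, from a single backward pass.
flat_copies = [p for params in per_call_params for p in params.values()]
grads = torch.autograd.grad(loss, flat_copies)

# Stack the per-call gradients of the weight matrix: shape (num_calls, 4, 4).
weight_grads = torch.stack(grads[0::2])
print(weight_grads.shape)
```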
I guess there is no training at all in this code? (no `.grad = ...` anywhere)
I think the current method is almost maximally efficient, but maybe it's not expressive enough (for now, it can't really select paths of length 1, 2, 4, 8, etc. without also computing 3, 5, 6, 7, ...).
We could maybe do what you say with a detached view of the parameters (I think cloning duplicates memory and is differentiable, so the gradients would flow back to the original params).
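For reference, plain PyTorch behaves as described here (independent of this PR's code):

```python
import torch

p = torch.randn(3, requires_grad=True)

# clone(): allocates new memory and is differentiable, so the gradient
# flows back to the original parameter.
p.clone().sum().backward()
print(p.grad)  # tensor([1., 1., 1.])

# detach(): a view sharing p's memory, but cut out of the autograd graph.
d = p.detach()
print(d.data_ptr() == p.data_ptr())  # True: no extra memory
print((d * 2).sum().requires_grad)   # False: nothing would flow back to p
```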
Selecting only some paths is doable only with a residual RNN. But note that if you select only the path to the level-2 memory, then you don't train the interaction between level 1 and level 2, which is typically not what we want.
* Reset memories, memories_wrt and param_to_gradients
* Use transform to aggregate and accumulate into .grad (see the sketch after this list)
* Train head too
* Change some values
* At this point, it seems hard to train with Mean() and doable with UPGrad()
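A hedged sketch of the aggregate-and-accumulate step, calling an aggregator directly rather than the transform machinery used in the PR; the parameter, shapes, and stacked gradients below are placeholders, and the torchjd import assumes its documented aggregation interface (an aggregator maps a Jacobian-like matrix to one aggregated row):

```python
import torch
from torchjd.aggregation import Mean, UPGrad  # assumed torchjd aggregation API

# Placeholder: per-copy gradients of one parameter, stacked into a
# Jacobian-like matrix (one row per memory-model copy).
param = torch.randn(4, 4, requires_grad=True)
stacked_grads = torch.randn(3, 4, 4)           # 3 copies of this parameter's grad

jacobian = stacked_grads.flatten(start_dim=1)  # shape (3, 16)

aggregator = UPGrad()  # swap in Mean() to compare plain averaging
aggregated = aggregator(jacobian).view_as(param)

# Accumulate into .grad so a standard optimizer step can consume it.
if param.grad is None:
    param.grad = aggregated.detach().clone()
else:
    param.grad += aggregated.detach()
```

With this setup, "hard to train with Mean() and doable with UPGrad()" amounts to swapping the aggregator instance.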
@mattbuot
@PierreQuinton this is the code of what I explained on Signal. Super messy so far.