@jchelly
Collaborator

@jchelly jchelly commented Feb 15, 2024

This makes the seed for the random shuffle used in unbinding depend on the particle IDs, so that the random number sequence no longer depends on the (unpredictable) order in which threads process halos. It also adds a parameter that can be used to change the seed so that we can run multiple realizations. Particle IDs are XORed with the supplied parameter and then used to initialize the random number generator. I'm not sure if some other operation would make more sense.
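As a rough illustration of the seeding idea described above, here is a minimal sketch. The function name `ShuffleParticles`, the choice of the first particle's ID as the seed source, and the use of `std::mt19937_64` are all assumptions made for this example and are not taken from the PR's diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not from the PR): shuffle a halo's particle IDs with a
// generator whose seed depends only on the data and a user-supplied parameter,
// never on which thread happens to process the halo.
void ShuffleParticles(std::vector<std::uint64_t> &particle_ids,
                      std::uint64_t user_seed)
{
    if (particle_ids.empty())
        return;

    // Assumption: the first particle ID is XORed with the seed parameter.
    // Changing user_seed selects a different realization.
    std::mt19937_64 rng(particle_ids.front() ^ user_seed);

    // A local generator is passed explicitly, so no state is shared
    // between threads.
    std::shuffle(particle_ids.begin(), particle_ids.end(), rng);
}
```

With this approach, two calls on identical input with the same `user_seed` give the same permutation regardless of thread scheduling. Note that the exact permutation produced by `std::shuffle` can differ between standard library implementations, so output is only reproducible for a given toolchain.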

I think this change also fixes two problems with the original code:

  • The random number generator was never seeded, so we always got the same default seed on all MPI ranks.
  • Calling std::random_shuffle from multiple threads without specifying the random number generator may have used a default generator that is not thread-safe.

There might still be some non-deterministic behaviour due to rounding error in reduction operations depending on the order of execution.
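The rounding issue mentioned above comes from floating-point addition not being associative. The following small sketch (the function name `SumLeftToRight` is invented for this example) shows the same three values giving different sums depending on the order in which they are accumulated, which is what can happen when a parallel reduction combines partial results in a varying order.

```cpp
#include <vector>

// Hypothetical helper: accumulate the values strictly in the order given.
double SumLeftToRight(const std::vector<double> &values)
{
    double sum = 0.0;
    for (double v : values)
        sum += v;
    return sum;
}

// Same values, two different orders (assuming IEEE 754 double arithmetic
// without reordering optimizations such as -ffast-math):
//   SumLeftToRight({1e16, -1e16, 1.0})  gives 1.0
//   SumLeftToRight({1e16, 1.0, -1e16})  gives 0.0, because 1e16 + 1.0
//   rounds back to 1e16 before the subtraction.
```

One possible way to avoid this kind of difference is to perform reductions in a fixed order, at the cost of some parallelism.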

@jchelly
Collaborator Author

jchelly commented Feb 15, 2024

This still doesn't seem to be enough to make runs deterministic with >1 threads (tested by running on L1000N0900/DMO_FIDUCIAL twice).

@jchelly
Collaborator Author

jchelly commented Apr 29, 2024

It looks like TrackIds somehow get assigned in a non-deterministic way when we use multiple threads. In the small Colibre test I get the same halo properties in snapshot 000 but the TrackIds differ.

@jchelly
Collaborator Author

jchelly commented Apr 29, 2024

FeedCentrals() assigns subhalo indexes in a non-deterministic way using a critical section. It could be fixed by using an ordered section or by just removing the OpenMP directives.
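To illustrate the difference, here is a minimal sketch of index assignment with an ordered section. It is not the actual FeedCentrals() code: the function `AssignIndexesOrdered` and its input are hypothetical stand-ins. With a critical section, threads would enter the block in whatever order they arrive, so the assigned indexes could change between runs; an ordered section forces the block to execute in loop-iteration order.

```cpp
#include <vector>

// Hypothetical stand-in for index assignment: each halo i needs counts[i]
// new slots, and receives the index of its first slot from a shared counter.
std::vector<long> AssignIndexesOrdered(const std::vector<long> &counts)
{
    const long n = static_cast<long>(counts.size());
    std::vector<long> first_index(counts.size());
    long next_index = 0; // shared counter

    // 'ordered' on the loop plus an 'omp ordered' block makes the block run
    // in iteration order, so the result matches a serial loop.
    #pragma omp parallel for ordered schedule(static, 1)
    for (long i = 0; i < n; ++i)
    {
        // Any independent per-halo work could run in parallel here.

        #pragma omp ordered
        {
            first_index[i] = next_index;
            next_index += counts[i];
        }
    }
    return first_index;
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the loop simply runs serially with the same result. If the loop body contains little work outside the ordered block, removing the directives altogether is likely just as fast.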

@jchelly
Collaborator Author

jchelly commented Apr 29, 2024

With the ordered section fix, two Colibre test runs now generate identical output for snapshots 0-10 (of 15) but diverge after that.

@VictorForouhar
Collaborator

> It looks like TrackIds somehow get assigned in a non-deterministic way when we use multiple threads. In the small Colibre test I get the same halo properties in snapshot 000 but the TrackIds differ.

In #126 I make the TrackID values depend on the HostHaloId, so they will be reproducible (even in a stronger way than your solution here, which depends on the number of MPI ranks you have). I wonder if I can now use the TrackID as the RNG seed when we do the subsampling. Will test.
