Skip to content

Comments

Fix norm and add redistribute function to DistributedArray#184

Open
rohanbabbar04 wants to merge 9 commits intomainfrom
redistribute-norm
Open

Fix norm and add redistribute function to DistributedArray#184
rohanbabbar04 wants to merge 9 commits intomainfrom
redistribute-norm

Conversation

@rohanbabbar04
Copy link
Collaborator

Fixes #177

  • Fix norm with axis=None as default.
  • Add a new function named redistribute.
  • Add sendrecv in _mpi

@rohanbabbar04 rohanbabbar04 changed the title Fix norm and add redistribute Fix norm and add redistribute function to DistributedArray Feb 12, 2026
Use ncp instead of np

Add engine and fix recvbuf in sendrecv

Add nccl sendrecv

Return recvbuf in _sendrecv for nccl
@rohanbabbar04 rohanbabbar04 marked this pull request as ready for review February 13, 2026 12:41
@rohanbabbar04
Copy link
Collaborator Author

While fixing norm, I think it would be better to have a redistribute function which redistributes along a specified axis. This PR introduces a function using the MPI_SendRecv call which redistributes the global array along a new axis.

if axis != x.axis:
    raise NotImplementedError("Choose axis along the partition.")

The above was written as the norm calculated along an axis which is not the distributed axis would lead to issues, one fix which could remove it completely is to use redistribute in such cases.

x = x.redistribute(axis=axis)       # This would redistribute the DistributedArray x along the new axis and norm will work fine

@mrava87
Copy link
Contributor

mrava87 commented Feb 13, 2026

@rohanbabbar04 thanks!

I did check when I created the issue and apparently we never even tested the axis=-1 = flatten behavior, so I think it is safe to make a backward incompatible change here to really align with numpy.linalg.norm (as we stated in the docstring). Also, we currently support ND distributed arrays, but for anything interesting (operators/solvers) we really still use only 1D distributed arrays, so I think the likelihood of breaking someone's code is very low 😄

I had a quick look at the PR; before I review this in more details, please:

  • Add docstring to redistribute to explain what it is and add some comments in the method as it doesn't look so trivial
  • in _sendrecv could you also follow what we do for the other methods wrt nccl and create a utility method in _nccl. The method in DistributedMixIn should really NOT contain any actual implementation
  • Add more details to axis in norm, eg wrt to None
  • Still confused about the if axis != x.axis: condition and raise. To me if an 2d array is distributed over axis=0 and you ask for the norm over axis=1, it should be the easiest case as the norm will return a 1d array with the size of axis=0 and so each rank will just need to compute the norm of its local array with numpy, isn't it? I still think that the redistribute method you created could be very useful in many cases, but I am not sure its the efficient way to go about the norm of a non-distributed axis.

Minor fix, add docs for axis=None

Minor fix
@rohanbabbar04
Copy link
Collaborator Author

rohanbabbar04 commented Feb 15, 2026

Thanks @mrava87 for the review.

Here are the responses to your comments.

  1. Added some content to the function to explain what it does and its basic implementation.
  2. Done, I have created a new method called nccl_sendrecv.
  3. Done
  4. You are right, It could be done in a much easier way by applying np.linalg.norm(x.local_array...) which I have added in the function. I was actually focused on the redistribute that I forgot this completely 😅 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Norm with axis=-1

2 participants