HealDA Sensor Embedder #1397
Conversation
@NickGeneva @pzharrington Here are the sensor embedding modules.
Greptile Summary: Adds sensor embedding modules for HealDA, implementing multi-sensor observation tokenization and aggregation onto HEALPix grids.
4 files reviewed, 10 comments
```
Parameters
----------
sensor_configs : list[dict[str, Any]]
```
The list of dicts sensor_configs feels a bit messy. Previously, we encapsulated this in a dataclass. Open to any suggestions.
Not sure if @NickGeneva agrees, but my 2c is that the dataclass is not necessarily the worst thing to have if it is a constructor argument. I assume you refactored because this module subclasses physicsnemo.Module and thus the constructor args need to be JSON-serializable for the .from_checkpoint() functionality. If this is more of a helper module specific to HealDA, imo it is ok to have it just subclass torch.nn.Module, and then it could accept a dataclass if that's preferred. Not sure if that makes sense though, e.g. if the dataclass is intended to be a user-configurable thing that is passed to the top-level HealDA model (which should be a physicsnemo.Module)...
Yea, I removed the dataclass due to JSON-serialization issues, and it seemed custom dataclasses were not preferred. For now, I switched to having separate parameters, each being a list, validated for matching lengths, as suggested by @NickGeneva.
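A minimal sketch of what that resolution looks like; the class and parameter names here are illustrative, not the merged code:

```python
import torch


class MultiSensorEmbedder(torch.nn.Module):  # hypothetical name for illustration
    def __init__(
        self,
        sensor_names: list[str],
        nchannels: list[int],
        nplatforms: list[int],
    ):
        super().__init__()
        # All three lists describe the same sensors, so their lengths must match
        if not (len(sensor_names) == len(nchannels) == len(nplatforms)):
            raise ValueError(
                "sensor_names, nchannels, and nplatforms must have matching "
                f"lengths, got {len(sensor_names)}, {len(nchannels)}, {len(nplatforms)}"
            )
        self.sensor_names = sensor_names  # metadata only; not used by the model
```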
```python
    return out


def _offsets_to_batch_idx(offsets: torch.Tensor) -> torch.Tensor:
```
General comment for all Tensor inputs: can these all get updated to jaxtyping annotations? This will allow you to spec the dimensions.
Here's an example:
https://github.com/NVIDIA/physicsnemo/blob/main/physicsnemo/models/srrn/super_res_net.py#L297
Thanks, added jaxtyping annotations everywhere
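For anyone unfamiliar, a minimal sketch of the jaxtyping pattern being referenced; the dimension names are assumptions, not the actual annotations in the PR:

```python
import torch
from jaxtyping import Int


def _offsets_to_batch_idx(
    offsets: Int[torch.Tensor, "n_sensor batch time"],
) -> Int[torch.Tensor, "n_obs"]:
    """The string in each annotation documents the expected tensor shape."""
    ...
```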
```python
    local_platform: torch.Tensor,
    obs_type: torch.Tensor,
    offsets: torch.Tensor,
    expected_num_sensors: int,
```
Why is this needed here? Can we just assume offsets.shape[0] is the right dim? Bit confused about the check below with expected_num_sensors.
Sure, we can remove this check
```python
for sensor_idx in range(nsensors):
    end = offsets[sensor_idx, -1, -1].item()
    start = 0 if sensor_idx == 0 else offsets[sensor_idx - 1, -1, -1].item()
```
Confused about these two lines, we have:
offsets[sensor_idx, -1, -1].item()
offsets[sensor_idx - 1, -1, -1].item()
so different dimensions... also, why the conditional statement start = 0 if sensor_idx == 0?
offsets[sensor_idx, -1, -1] refers to the end of the current sensor.
offsets[sensor_idx - 1, -1, -1] refers to the end of the previous sensor at sensor_idx - 1. The alignment of the two lines makes it seem like we are indexing into an extra dimension. Can clean up the if statement and add some comments.
Simplified logic and added comments to make this clearer
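For context, a runnable toy sketch of the simplified slicing logic; flat_obs and the offset values are invented for illustration:

```python
import torch

# Toy data: 2 sensors, B = T = 1; offsets has shape (n_sensor, B, T)
flat_obs = torch.arange(7)              # hypothetical flattened observation tensor
offsets = torch.tensor([[[3]], [[7]]])  # per-sensor window ends in the flat tensor

nsensors = offsets.shape[0]  # implied by offsets itself; no separate check needed
start = 0
for sensor_idx in range(nsensors):
    # offsets[s, -1, -1] marks where sensor s's packed window ends, so each
    # sensor spans [end of previous sensor, its own end) in the flat tensor
    end = int(offsets[sensor_idx, -1, -1])
    sensor_obs = flat_obs[start:end]  # sensor 0 -> obs[0:3], sensor 1 -> obs[3:7]
    start = end
```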
```python
def __init__(
    self,
    *,
```
Why is the bare * (keyword-only marker) needed here?
No strong preference for using it. I included the keyword-only * since nchannel and nplatform are both int, so with keyword-only, it forces the user to be clear about what they are setting each to.
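Concretely (toy stand-in class, not the PR's module), the bare * makes calls self-documenting and rejects ambiguous positional ints:

```python
class SensorEmbedder:  # hypothetical stand-in for the real module
    def __init__(self, *, nchannel: int, nplatform: int):
        self.nchannel, self.nplatform = nchannel, nplatform


emb = SensorEmbedder(nchannel=7, nplatform=12)  # OK: intent is explicit
# SensorEmbedder(7, 12)  # would raise TypeError: arguments are keyword-only
```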
```python
) -> torch.Tensor:
    """Aggregate observations to spatial grid and project to output dimension."""
    # Convert observation pixels to aggregator grid resolution
    aggregation_pix = pix // int(4.0 ** (hpx_level - self.hpx_level))
```
Why is it hpx_level - self.hpx_level? Not following exactly what this line is doing.
Maybe these can have better names. self.hpx_level is the target emb level, right... what is hpx_level then? The representative hpx resolution of the sensor?
self.hpx_level is the grid level of the sensor/model, whereas hpx_level is the level corresponding to the incoming pix tensor. This accounts for the case where the loader calculated pix at a higher resolution. Since we use the HEALPix NEST format, converting from a higher to a lower resolution can be done by dividing by 4^(difference in level). But we don't need this added complexity and can assume pix is at the model level. Can simplify to not pass in hpx_level / remove this.
Removed
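For anyone following along, a small worked example of the NEST-ordering property described above; the levels and pixel values are invented for illustration:

```python
import torch

# In NEST ordering each pixel at level L has exactly 4 children at level L + 1,
# so a fine pixel index maps to its coarse ancestor by dividing by 4**(level gap).
pix_level = 8                     # level of the incoming pix tensor
model_level = 6                   # the sensor/model grid level (self.hpx_level)
pix = torch.tensor([0, 17, 255])  # pixel indices at pix_level
coarse_pix = pix // (4 ** (pix_level - model_level))  # -> tensor([0, 1, 15])
```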
```python
)


def _prod(shape):
```
does math.prod not work? You import the math module already
Using .numel() instead
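i.e., something like this (illustrative):

```python
import math

import torch

x = torch.zeros(2, 3, 4)
assert x.numel() == 24            # built-in element count on the tensor
assert math.prod(x.shape) == 24   # equivalent via the shape tuple
```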
```python
    offsets: torch.Tensor,
    hpx_level: int,
) -> torch.Tensor:
    if self.use_checkpoint:
```
What's this condition for?
What is use_checkpoint?
This is for gradient checkpointing. It could be useful for memory savings, although I haven't tested what the savings actually are. Open to removing.
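For readers unfamiliar with the pattern, a minimal self-contained sketch (toy module, not the PR's code):

```python
import torch
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(torch.nn.Module):
    def __init__(self, dim: int, use_checkpoint: bool = False):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
        self.use_checkpoint = use_checkpoint

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_checkpoint:
            # Activations are recomputed during the backward pass instead of
            # being stored, trading extra compute for lower peak memory.
            return checkpoint(self.proj, x, use_reentrant=False)
        return self.proj(x)
```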
```python
def __init__(
    self,
    sensor_configs: list[dict[str, Any]],
```
If the sensor config is like:
name: sensor name (bookkeeping, unused)
nchannel: number of sensor channels
nplatform: number of sensor platforms
Could this just be three separate parameters of list[str], list[int], and list[int] which get validated that the lengths are the same?
Yea, I think the separate lists make the most sense. The sensor names are not actually used by the model; I had initially thought to keep them around as metadata so it is easier to map each embedder network to what it actually is, but not sure if it is worth keeping.
```
    Flattened local platform ids of each observation with shape :math:`(N_{obs},)`.
obs_type : torch.Tensor
    Flattened observation type ids with shape :math:`(N_{obs},)`.
offsets : torch.Tensor
```
Not following how this tensor (S, B, T) is mapped to (N_{obs},). Why would this not just be of size (S,)?
Is the input of the forward actually (B, T, N_obs)?
All of our inputs are this packed/flattened nobs tensor, where it is packed across time, then batch, then sensor. The offsets then indicate where each "window" or sample in the full flat tensor ends. Basically, it is describing a (N_sensor, B, T, N_sensor_obs) tensor, but all flattened so that we don't need to use padding. The batch and time support is to make the module more general, although we only really use B=1 and T=1.
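A toy illustration of that packing, with invented numbers (2 sensors, B = 1, T = 2):

```python
import torch

# Observations are packed time-first, then batch, then sensor, so the flat
# tensor holds [s0/t0, s0/t1, s1/t0, s1/t1] windows back to back (B = 1 here).
obs = torch.randn(10)  # all observations, flattened
offsets = torch.tensor([
    [[2, 5]],   # sensor 0: windows end at flat index 2 (t0) and 5 (t1)
    [[8, 10]],  # sensor 1: windows end at flat index 8 (t0) and 10 (t1)
])              # shape (N_sensor, B, T); offsets[s, -1, -1] is sensor s's overall end
```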
```python
HEALPIXPAD_AVAILABLE = check_version_spec("earth2grid", "0.1.0", hard_fail=False)


if HEALPIXPAD_AVAILABLE:
    _healpix_mod = importlib.import_module("earth2grid.healpix")
    hpx_grid = _healpix_mod.Grid
    HEALPIX_PAD_XY = _healpix_mod.HEALPIX_PAD_XY
    HEALPIX_NEST = _healpix_mod.NEST
else:
    HEALPIX_PAD_XY = None
    HEALPIX_NEST = None

    def hpx_grid(*args, **kwargs):
        """Dummy symbol for missing earth2grid backend."""
        raise ImportError(
            "earth2grid is not installed, cannot use it as a backend for HEALPix padding.\n"
            "Install earth2grid from https://github.com/NVlabs/earth2grid.git to enable the accelerated path.\n"
            "pip install --no-build-isolation https://github.com/NVlabs/earth2grid/archive/main.tar.gz"
        )
```
Some recent updates made an effort to reduce this boilerplate. See, for example, this PR: #1390. The functionality is already merged; that one is just applying it. It may simplify your life here.
Thanks, updated to use the OptionalImport functionality.
4 files reviewed, no comments
PhysicsNeMo Pull Request
Description
Adds obs sensor embedding modules used in HealDA.