[Fix] Update dataset conversion for InternData-N1 VLN-PE v0.5 dataset format #288

Merged

kew6688 merged 3 commits into InternRobotics:dev from kew6688:dataset_update

Feb 6, 2026

+392 −7

docs/compatibility.md

@@ -0,0 +1,13 @@
+    # Compatibility
+    ## v0.3.1
+    ### InternData-N1 update to v0.5
+    The InternData-N1 VLN-PE trajectory training dataset has been upgraded from `v0.1` to `v0.5`. This update introduces minor structural changes in the dataset layout and updates the LeRobot-to-LMDB conversion logic to match the new `v0.5` data structure.
+    The training pipeline now uses the new key name:
+    - `instruction_text` → `task`
+    The updated conversion logic is **not compatible** with InternData-N1 `v0.1`.
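Because of the key rename noted above, code that reads episode metadata has to look up `task` instead of `instruction_text`. A minimal sketch of a guard that fails loudly on `v0.1`-style episodes follows. It assumes each episode is a plain dict; the helper name `get_instruction` is hypothetical and not part of InternNav.

```python
def get_instruction(episode: dict) -> str:
    """Return the instruction text from a v0.5-style episode dict.

    Hypothetical helper (not part of InternNav). Assumes each episode is a
    plain dict and that v0.5 stores the instruction under the 'task' key.
    """
    if "task" in episode:
        return episode["task"]
    if "instruction_text" in episode:
        # Old v0.1 key: the updated conversion logic does not support this layout.
        raise KeyError(
            "found 'instruction_text' but not 'task'; this looks like "
            "InternData-N1 v0.1, which the updated conversion logic does not support"
        )
    raise KeyError("episode has no 'task' key")
```

Raising instead of silently falling back to the old key keeps a stale `v0.1` export from being mixed into a `v0.5` training run unnoticed.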

internnav/dataset/cma_lerobot_dataset.py

            
                      Original file line number
                      Diff line number
                      Diff line change
                  
@@ -6,7 +6,7 @@
     from internnav.dataset.base import BaseDataset, ObservationsDict, _block_shuffle
     from internnav.model.utils.feature_extract import extract_instruction_tokens
-    from internnav.utils.lerobot_as_lmdb import LerobotAsLmdb
+    from internnav.utils.loader import LerobotAsLmdb
     class CMALerobotDataset(BaseDataset):
@@ -38,8 +38,9 @@ def __init__(
             self.camera_name = self.config.il.camera_name
             self.lerobot_as_lmdb = LerobotAsLmdb(self.lerobot_features_dir)
-            self.lmdb_keys = self.lerobot_as_lmdb.get_all_keys()
+            self.lmdb_keys = self.lerobot_as_lmdb.get_all_keys(allow_scan_list=['r2r']) # r2r / r2r_aliengo / r2r_flash
             self.length = len(self.lmdb_keys)
             print(f"total keys in traj_data: {len(self.lmdb_keys)}")
             # For CMA-CLIP
             self.use_clip_encoders = False

internnav/dataset/rdp_lerobot_dataset.py

            
                      Original file line number
                      Diff line number
                      Diff line change
                  
@@ -26,7 +26,7 @@
     from internnav.model.basemodel.LongCLIP.model import longclip
     from internnav.model.utils.feature_extract import extract_instruction_tokens
     from internnav.utils.geometry_utils import get_delta, normalize_data, to_local_coords
-    from internnav.utils.lerobot_as_lmdb import LerobotAsLmdb
+    from internnav.utils.loader import LerobotAsLmdb
     def _convert_image_to_rgb(image):
@@ -103,8 +103,9 @@ def __init__(
             self.to_pil = ToPILImage()
             self.image_processor = _transform(n_px=224)  # copy from clip-long
             self.lerobot_as_lmdb = LerobotAsLmdb(self.lerobot_features_dir)
-            self.lmdb_keys = self.lerobot_as_lmdb.get_all_keys()
+            self.lmdb_keys = self.lerobot_as_lmdb.get_all_keys(allow_scan_list=['r2r']) # r2r / r2r_aliengo / r2r_flash
             self.length = len(self.lmdb_keys)
             print(f"total keys in traj_data: {len(self.lmdb_keys)}")
             self.start = 0
             self.end = self.length
@@ -192,7 +193,7 @@ def _load_next(self):  # noqa: C901
                     episodes_in_json = data_to_load['episodes_in_json']
                     instructions = [
-                        episodes_in_json[ep_idx]['instruction_text'][: self.config.model.text_encoder.max_length]
+                        episodes_in_json[ep_idx]['task'][: self.config.model.text_encoder.max_length]
                         for ep_idx in range(len(episodes_in_json))
                     ]
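The list comprehension in this hunk slices each instruction before it is used. If `task` holds a plain string, the slice bounds the number of characters, not tokens. A self-contained sketch with made-up episode data follows; here `max_length` stands in for `self.config.model.text_encoder.max_length`, and the string-valued `task` field is an assumption.

```python
# Made-up episodes for illustration; real entries come from the LMDB loader.
episodes_in_json = [
    {"task": "Walk down the hallway and stop at the door."},
    {"task": "Turn left."},
]
max_length = 20  # stand-in for config.model.text_encoder.max_length

# Same shape as the hunk above: read 'task' and slice to max_length characters.
instructions = [
    episodes_in_json[ep_idx]["task"][:max_length]
    for ep_idx in range(len(episodes_in_json))
]
print(instructions)
```

Slicing past the end of a shorter string is safe in Python, so short instructions pass through unchanged.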
