Hi authors,
First of all, thank you for your incredible work on Sonata and for open-sourcing the pre-trained models and code. The results are very impressive, and I'm excited to explore the capabilities of your self-supervised representations.
I am particularly interested in using the Sonata model for outdoor semantic segmentation on the nuScenes dataset, as demonstrated in Table 8 of your paper.
I have carefully reviewed the `README.md` in this repository. It provides a clear guide for running inference on custom data, specifying the required dictionary format (`coord`, `color`, `normal`, etc.). The README also mentions that the pre-training process can be reproduced using the Pointcept codebase.
I've begun looking into the Pointcept repository to understand the data preparation pipeline for nuScenes, but I've found the conversion process from the raw dataset to the required input format to be quite involved.
To ensure I'm on the right track, I was hoping you could provide some clarification on the following points:
- Data Conversion Scripts: Could you please point me to the specific scripts or documentation within the Pointcept repository that are responsible for processing the raw nuScenes dataset (LiDAR sweeps and annotations) into the format consumed by the model?
- Feature Mapping: The required input format specifies `coord`, `color`, and `normal` fields, while raw nuScenes LiDAR points typically have `x, y, z, intensity`. Could you clarify how these are mapped?
  - Is the `color` information derived by projecting points onto camera images, or is `intensity` used as a proxy?
  - How are `normal` vectors generated for the outdoor scenes? Or are they optional for the outdoor pre-trained model?
- Reference Sample: Would it be possible to share a simple, standalone script that processes a single nuScenes sample into the final format? A working example would be immensely helpful for understanding the full pipeline.
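For context on where I currently am, here is a minimal sketch of how I'm assembling the input dictionary from a raw sweep. The intensity-as-color and zero-normal choices are guesses on my part (not something I found in the repo), which is exactly what I'd like to confirm:

```python
import numpy as np

def load_nuscenes_sweep(bin_path):
    """Load a raw nuScenes LIDAR_TOP sweep.

    nuScenes .pcd.bin files store float32 records of
    (x, y, z, intensity, ring_index), 5 values per point.
    """
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 5)

def build_sonata_input(points):
    """Map raw (N, 5) nuScenes points to the coord/color/normal dict.

    Assumptions (the parts I'm asking about):
    - `color`: intensity normalized to [0, 1] and tiled to 3 channels,
      used as a proxy instead of camera-image projection.
    - `normal`: zero placeholder; real normals would need local-plane
      estimation (e.g. PCA over k-nearest neighbors).
    """
    coord = points[:, :3]
    intensity = points[:, 3:4]
    color = np.repeat(intensity / max(intensity.max(), 1e-6), 3, axis=1)
    normal = np.zeros_like(coord)
    return {
        "coord": coord.astype(np.float32),
        "color": color.astype(np.float32),
        "normal": normal.astype(np.float32),
    }
```

If either assumption is wrong (e.g. if pre-training projected camera colors onto the points, or dropped normals entirely for outdoor data), a pointer to the relevant transform in Pointcept would clear this up.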
Any guidance or pointers you could offer would be greatly appreciated, both by me and by others in the community looking to work with outdoor datasets.
Thank you again for your time and for this fantastic contribution!