Rong Li*,
Yuhao Dong*,
Tianshuai Hu*,
Ao Liang*,
Youquan Liu*,
Dongyue Lu*,
Liang Pan,
Lingdong Kong†,
Junwei Liang‡,
Ziwei Liu‡
*Equal contribution  †Project lead  ‡Corresponding authors
- Cross-Platform: First 3D grounding dataset spanning vehicle, drone, and quadruped platforms
- Large-Scale: Extensive annotated samples across diverse real-world scenarios
- Multi-Modal: Synchronized RGB, LiDAR, and language annotations
- Challenging: Complex outdoor environments with varying object densities and viewpoints
- Reproducible: Unified evaluation protocols and baseline implementations
If you find our work helpful, please consider citing:

```bibtex
@inproceedings{li2025_3eed,
  title     = {{3EED}: Ground Everything Everywhere in {3D}},
  author    = {Rong Li and Yuhao Dong and Tianshuai Hu and Ao Liang and Youquan Liu and Dongyue Lu and Liang Pan and Lingdong Kong and Junwei Liang and Ziwei Liu},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  volume    = {38},
  year      = {2025}
}
```

For detailed dataset statistics and analysis, please refer to our paper.
- [2025.10] Dataset and code are now publicly available on HuggingFace and GitHub!
- [2025.09] 3EED has been accepted to the NeurIPS 2025 Datasets and Benchmarks Track!
- Highlights
- Statistics
- News
- Table of Contents
- Installation
- Pretrained Models
- Dataset
- Quick Start
- License
- Acknowledgements
We support both CUDA 11 and CUDA 12 environments. Choose the one that matches your system:
Option 1: CUDA 11.1 Environment
| Component | Version |
|---|---|
| CUDA | 11.1 |
| cuDNN | 8.0.5 |
| PyTorch | 1.9.1+cu111 |
| torchvision | 0.10.1+cu111 |
| Python | 3.10 / 3.11 |
Option 2: CUDA 12.4 Environment
| Component | Version |
|---|---|
| CUDA | 12.4 |
| cuDNN | 8.0.5 |
| PyTorch | 2.5.1+cu124 |
| torchvision | 0.20.1+cu124 |
| Python | 3.10 / 3.11 |
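Before building the custom ops below, it can help to confirm that PyTorch sees the GPU and matches the intended CUDA build. A minimal check (not part of the official setup):

```python
# quick sanity check for either environment (CUDA 11.1 or CUDA 12.4)
import torch

print("PyTorch version:", torch.__version__)        # expect 1.9.1+cu111 or 2.5.1+cu124
print("CUDA available: ", torch.cuda.is_available())
print("CUDA build:     ", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```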
Compile the custom point-cloud operators:

```bash
cd ops/teed_pointnet/pointnet2_batch
python setup.py develop
cd ../roiaware_pool3d
python setup.py develop
```

Download the RoBERTa-base checkpoint from HuggingFace and move it to `data/roberta_base`.
Download the 3EED dataset from HuggingFace:

Dataset Link: https://huggingface.co/datasets/RRRong/3EED
After extraction, organize your dataset as follows:

```
data/3eed/
├── drone/                  # Drone platform data
│   ├── scene-0001/
│   │   ├── 0000_0/
│   │   │   ├── image.jpg
│   │   │   ├── lidar.bin
│   │   │   └── meta_info.json
│   │   └── ...
│   └── ...
├── quad/                   # Quadruped platform data
│   ├── scene-0001/
│   └── ...
├── waymo/                  # Vehicle platform data
│   ├── scene-0001/
│   └── ...
├── roberta_base/           # Language model weights
└── splits/                 # Train/val split files
    ├── drone_train.txt
    ├── drone_val.txt
    ├── quad_train.txt
    ├── quad_val.txt
    ├── waymo_train.txt
    └── waymo_val.txt
```
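To sanity-check a downloaded frame, the files can be read directly. The sketch below is illustrative only: the per-point layout of `lidar.bin` (assumed here to be float32 x, y, z, intensity) and the exact `meta_info.json` fields should be verified against the dataset documentation.

```python
# inspect one frame from the drone split (paths follow the layout above)
import json

import numpy as np
from PIL import Image

frame_dir = "data/3eed/drone/scene-0001/0000_0"

image = Image.open(f"{frame_dir}/image.jpg")                      # synchronized RGB frame
points = np.fromfile(f"{frame_dir}/lidar.bin", dtype=np.float32)  # raw LiDAR buffer
points = points.reshape(-1, 4)                                    # assumed (x, y, z, intensity) layout

with open(f"{frame_dir}/meta_info.json") as f:
    meta = json.load(f)                                           # referring expression + metadata

print(image.size, points.shape, list(meta.keys()))
```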
Train the baseline model on different platform combinations:
```bash
# Train on all platforms (recommended for best performance)
bash scripts/train_3eed.sh

# Train on a single platform
bash scripts/train_waymo.sh   # Vehicle only
bash scripts/train_drone.sh   # Drone only
bash scripts/train_quad.sh    # Quadruped only
```

Output:
- Checkpoints: `logs/Train_<datasets>_Val_<datasets>/<timestamp>/`
- Training logs: `logs/Train_<datasets>_Val_<datasets>/<timestamp>/log.txt`
- TensorBoard logs: `logs/Train_<datasets>_Val_<datasets>/<timestamp>/tensorboard/`
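Besides launching the TensorBoard UI, the logged scalars can be read directly with TensorBoard's Python API. A small sketch (the log directory placeholder follows the output layout above):

```python
# read logged training scalars from a run directory
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

logdir = "logs/Train_<datasets>_Val_<datasets>/<timestamp>/tensorboard"  # replace with a real run
ea = EventAccumulator(logdir)
ea.Reload()

print("scalar tags:", ea.Tags()["scalars"])
first_tag = ea.Tags()["scalars"][0]            # e.g. a loss curve
for event in ea.Scalars(first_tag):
    print(event.step, event.value)
```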
Evaluate trained models on validation sets:
Quick Evaluation:

```bash
# Evaluate on all platforms
bash scripts/val_3eed.sh

# Evaluate on a single platform
bash scripts/val_waymo.sh   # Vehicle
bash scripts/val_drone.sh   # Drone
bash scripts/val_quad.sh    # Quadruped
```

Notes:
- Update `--checkpoint_path` in the script to point to your trained model
- Ensure the validation dataset is downloaded and properly structured

Output:
- Results saved to: `<checkpoint_dir>/evaluation/Val_<dataset>/<timestamp>/`
Visualize predictions with 3D bounding boxes overlaid on point clouds:
```bash
# Visualize prediction results
python utils/visualize_pred.py
```

Visualization Output:
- Ground Truth: green bounding box
- Prediction: red bounding box

Output Structure:

```
visualizations/
├── waymo/
│   ├── scene-0001_frame-0000/
│   │   ├── pointcloud.ply
│   │   ├── pred/gt_bbox.ply
│   │   └── info.txt
│   └── ...
├── drone/
└── quad/
```
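The exported `.ply` files can be opened in any point-cloud viewer. Below is a minimal Open3D sketch; the box file name and geometry type (mesh vs. line set) are assumptions based on the structure above and may need adjusting:

```python
# load and display one visualized frame together with its ground-truth box
import open3d as o3d

frame_dir = "visualizations/waymo/scene-0001_frame-0000"

cloud = o3d.io.read_point_cloud(f"{frame_dir}/pointcloud.ply")   # scene point cloud
gt_box = o3d.io.read_triangle_mesh(f"{frame_dir}/gt_bbox.ply")   # assumed ground-truth box file

o3d.visualization.draw_geometries([cloud, gt_box])
```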
Baseline models and predictions are available on HuggingFace.
This repository is released under the Apache 2.0 License (see LICENSE).
We sincerely thank the following projects and teams that made this work possible:
- BUTD-DETR - Bottom-Up Top-Down DETR for visual grounding
- WildRefer - Wild referring expression comprehension
- Waymo Open Dataset - Vehicle platform data
- M3ED - Drone and quadruped platform data
Awesome Projects:

- 3D and 4D World Modeling: A Survey [GitHub Repo] - [Project Page] - [Paper]
- WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World [GitHub Repo] - [Project Page] - [Paper]
- LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences [GitHub Repo] - [Project Page] - [Paper]
- Are VLMs Ready for Autonomous Driving? A Study from Reliability, Data & Metric Perspectives [GitHub Repo] - [Project Page] - [Paper]
- Perspective-Invariant 3D Object Detection [GitHub Repo] - [Project Page] - [Paper]
- DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes [GitHub Repo] - [Project Page] - [Paper]
❤️ by the 3EED Team







