Commit ae6c343

Update README of occupancy prediction (#123)

1 parent cce49ba commit ae6c343

1 file changed: autonomous_driving/occupancy_prediction/README.md
Lines changed: 7 additions & 219 deletions

<div id="top" align="center">

# CVPR 2023 3D Occupancy Prediction Challenge

**The world's First 3D Occupancy Benchmark for Scene Perception in Autonomous Driving.**

<a href="#devkit">
<img alt="devkit: v0.1.0" src="https://img.shields.io/badge/devkit-v0.1.0-blueviolet"/>
</a>
<a href="#license">
<img alt="License: Apache2.0" src="https://img.shields.io/badge/license-Apache%202.0-blue.svg"/>
</a>

<img src="./figs/occupanc_1.gif" width="696px">

</div>

## InternImage-based Baseline for the CVPR23 Occupancy Prediction Challenge

We improve our baseline with a more powerful image backbone, **InternImage**, which has demonstrated excellent performance across a series of leaderboards and benchmarks, such as *COCO* and *nuScenes*.

#### 1. Requirements (OpenMMLab packages)

```bash
python>=3.8
torch==1.12             # recommended
mmcv-full>=1.5.0
mmdet==2.24.0
mmsegmentation==0.24.0
timm
numpy==1.22
mmdet3d==0.18.1         # recommended
```
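
These are version constraints only; as a rough, unverified sketch, one possible way to install them with pip (assuming a CUDA 11.3 / PyTorch 1.12 environment; adjust the mmcv-full wheel index to your own CUDA and PyTorch versions):

```bash
# Illustrative install commands; the versions and the mmcv-full wheel URL are assumptions,
# not an officially tested configuration for this repo.
pip install torch==1.12.0 torchvision==0.13.0
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html
pip install mmdet==2.24.0 mmsegmentation==0.24.0 mmdet3d==0.18.1
pip install timm numpy==1.22
```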

### 2. Install DCNv3 for InternImage

```bash
cd projects/mmdet3d_plugin/bevformer/backbones/ops_dcnv3
bash make.sh   # requires torch>=1.10
```
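
If the build succeeds, the compiled extension should be importable. A minimal sanity check, assuming the build follows the upstream InternImage `ops_dcnv3` setup (which installs a CUDA extension importable as `DCNv3`; this name is an assumption, not confirmed by this README):

```bash
# Sanity check that the compiled CUDA extension can be loaded; the extension name "DCNv3"
# is assumed from the upstream InternImage ops_dcnv3 package.
python -c "import DCNv3; print('DCNv3 extension loaded')"
```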

### 3. Train with InternImage-Small

```bash
./tools/dist_train.sh projects/configs/bevformer/bevformer_intern-s_occ.py 8   # consumes less than 14 GB of GPU memory
```
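
For debugging on a single GPU, repositories that follow the BEVFormer/mmdetection convention usually also expose a plain `tools/train.py` entry point; a hypothetical invocation (the `--work-dir` value is illustrative) could look like:

```bash
# Hypothetical single-GPU debug run; assumes the standard mmdet-style tools/train.py entry
# point used by BEVFormer-based repos and an illustrative work directory.
python tools/train.py projects/configs/bevformer/bevformer_intern-s_occ.py --work-dir work_dirs/bevformer_intern-s_occ
```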

Note: InternImage provides abundant pre-trained model weights that can be used.

### 4. Performance compared to baseline

model name|weight| mIoU | others | barrier | bicycle | bus | car | construction_vehicle | motorcycle | pedestrian | traffic_cone | trailer | truck | driveable_surface | other_flat | sidewalk | terrain | manmade | vegetation |
----|:----------:| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :----------------------: | :---: | :------: | :------: |
bevformer_intern-s_occ|[Google Drive](https://drive.google.com/file/d/1LV9K8hrskKf51xY1wbqTKzK7WZmVXEV_/view?usp=sharing)| 25.11 | 6.93 | 35.57 | 10.40 | 35.97 | 41.23 | 13.72 | 20.30 | 21.10 | 18.34 | 19.18 | 28.64 | 49.82 | 30.74 | 31.00 | 27.44 | 19.29 | 17.29 |
bevformer_base_occ|[Google Drive](https://drive.google.com/file/d/1NyoiosafAmne1qiABeNOPXR-P-y0i7_I/view?usp=share_link)| 23.67 | 5.03 | 38.79 | 9.98 | 34.41 | 41.09 | 13.24 | 16.50 | 18.15 | 17.83 | 18.66 | 27.70 | 48.95 | 27.73 | 29.08 | 25.38 | 15.41 | 14.46 |
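
To evaluate a downloaded checkpoint, a hypothetical command following the usual BEVFormer `tools/dist_test.sh <config> <checkpoint> <num_gpus>` convention (the script, argument order, and checkpoint path are assumptions, not verified for this repo):

```bash
# Illustrative evaluation run; the dist_test.sh convention is borrowed from BEVFormer and the
# checkpoint location is an assumption.
./tools/dist_test.sh projects/configs/bevformer/bevformer_intern-s_occ.py ./ckpts/bevformer_intern-s_occ.pth 8
```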

## Table of Contents
- [CVPR 2023 Occupancy Prediction Challenge](#cvpr-2023-occupancy-prediction-challenge)
- [Introduction](#introduction)
- [Task Definition](#task-definition)
  - [Rules for Occupancy Challenge](#rules-for-occupancy-challenge)
- [Evaluation Metrics](#evaluation-metrics)
  - [mIoU](#miou)
  - [F Score](#f-score)
- [Data](#data)
  - [Basic Information](#basic-information)
  - [Download](#download)
  - [Hierarchy](#hierarchy)
  - [Known Issues](#known-issues)
- [Getting Started](#getting-started)
- [Timeline](#challenge-timeline)
- [Leaderboard](#leaderboard)
- [License](#license)

## Introduction

Understanding the 3D surroundings, including background stuff and foreground objects, is important for autonomous driving. In the traditional 3D object detection task, a foreground object is represented by a 3D bounding box. However, the geometrical shape of an object can be complex and cannot be captured by a simple 3D box, and the perception of the background is absent. The goal of this task is to predict the 3D occupancy of the scene. We provide a large-scale occupancy benchmark based on the nuScenes dataset. The benchmark is a voxelized representation of the 3D space, and the occupancy state and semantics of each voxel are jointly estimated. The complexity of this task lies in the dense prediction of 3D space given only surround-view images.

<p align="right">(<a href="#top">back to top</a>)</p>

## Task Definition

Given images from multiple cameras, the goal is to predict the current occupancy state and semantics of each voxel grid in the scene. The voxel state is predicted to be either free or occupied. If a voxel is occupied, its semantic class needs to be predicted as well. Besides, we also provide a binary observed/unobserved mask for each frame: an unobserved voxel is one that is invisible in the current camera observation, and it is ignored in the evaluation stage.

### Rules for Occupancy Challenge

* We allow using annotations provided in the nuScenes dataset; during inference, the input modality of the model should be camera only.
* Other public or private datasets are not allowed in the challenge in any form (except an ImageNet or MS-COCO pre-trained image backbone).
* No future frame is allowed during inference.
* To check compliance, we will ask participants to provide technical reports to the challenge committee, and award winners will be asked to give a public talk about their method.

<p align="right">(<a href="#top">back to top</a>)</p>

## Evaluation Metrics

Leaderboard ranking for this challenge is determined by the mean intersection-over-union (mIoU) over all classes.

### mIoU

Let $C$ be the number of classes.

$$
\mathrm{mIoU}=\frac{1}{C}\displaystyle \sum_{c=1}^{C}\frac{TP_c}{TP_c+FP_c+FN_c},
$$

where $TP_c$, $FP_c$, and $FN_c$ correspond to the number of true positive, false positive, and false negative predictions for class $c$.
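
As a purely hypothetical worked example: with $C=2$ classes, $TP_1=50$, $FP_1=10$, $FN_1=20$ and $TP_2=30$, $FP_2=30$, $FN_2=0$, the per-class IoUs are $50/80=0.625$ and $30/60=0.5$, so $\mathrm{mIoU}=(0.625+0.5)/2=0.5625$.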

### F-Score

We also measure the F-score as the harmonic mean of the completeness $P_c$ and the accuracy $P_a$:

$$
\text{F-score}=\left( \frac{P_a^{-1}+P_c^{-1}}{2} \right)^{-1},
$$

where $P_a$ is the percentage of predicted voxels that are within a distance threshold of the ground truth voxels, and $P_c$ is the percentage of ground truth voxels that are within a distance threshold of the predicted voxels.
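
As a purely hypothetical worked example: with $P_a=0.8$ and $P_c=0.6$, the F-score is $\left(\frac{0.8^{-1}+0.6^{-1}}{2}\right)^{-1}=\frac{2\cdot 0.8\cdot 0.6}{0.8+0.6}\approx 0.686$.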

<p align="right">(<a href="#top">back to top</a>)</p>

## Data

<div align="center">
<img src="./figs/mask.jpg">
</div>
<div align="center">
Figure 1. Semantic labels (left), visibility masks in the LiDAR (middle) and the camera (right) view. Grey voxels are unobserved in the LiDAR view and white voxels are observed in the accumulative LiDAR view but unobserved in the current camera view.
</div>

### Basic Information

<div align="center">

| Type | Info |
| :----: | :----: |
| mini | 404 |
| train | 28,130 |
| val | 6,019 |
| test | 6,006 |
| cameras | 6 |
| voxel size | 0.4m |
| range | [-40m, -40m, -1m, 40m, 40m, 5.4m] |
| volume size | [200, 200, 16] |
| #classes | 0 - 17 |

</div>
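
Note that the volume size follows from the range and the voxel size: $(40-(-40))/0.4 = 200$ voxels along $x$ and $y$, and $(5.4-(-1))/0.4 = 16$ voxels along $z$.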

- The dataset contains 18 classes. The definition of classes 0 to 16 is the same as in the [nuScenes-lidarseg](https://github.com/nutonomy/nuscenes-devkit/blob/fcc41628d41060b3c1a86928751e5a571d2fc2fa/python-sdk/nuscenes/eval/lidarseg/README.md) dataset. Label 17 represents voxels that are not occupied by anything and is named `free`. The voxel semantics for each sample frame are given as `[semantics]` in `labels.npz` (see the loading sketch after this list).

- <strong>How are the labels annotated?</strong> The ground truth occupancy labels are derived from accumulative LiDAR scans with human annotations.
  - If a voxel reflects a LiDAR point, it is assigned the same semantic label as that LiDAR point;
  - If a LiDAR beam passes through a voxel in the air, the voxel is set to `free`;
  - Otherwise, the voxel is set to unknown, or unobserved. This happens due to the sparsity of the LiDAR or because the voxel is occluded, e.g. by a wall. In the dataset, `[mask_lidar]` is a 0-1 binary mask, where 0's represent unobserved voxels. As shown in Fig. 1(b), grey voxels are unobserved. Due to the limitation of the visualization tool, we only show unobserved voxels at the same height as the ground.

- <strong>Camera visibility.</strong> Note that the installation positions of the LiDAR and cameras are different; therefore, some voxels observed in the LiDAR view are not seen by the cameras. Since we focus on a vision-centric task, we provide a binary voxel mask `[mask_camera]`, indicating whether the voxels are observed in the current camera view. As shown in Fig. 1(c), white voxels are observed in the accumulative LiDAR view but unobserved in the current camera view.

- Both the `[mask_lidar]` and `[mask_camera]` masks are optional for training, and participants do not need to predict them. Only `[mask_camera]` is used for evaluation; unobserved voxels are not involved when calculating the F-score and mIoU.
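
A minimal sketch for inspecting one frame's labels, assuming only the array names described above and the folder layout shown in the Hierarchy section below (the scene and frame names are placeholders):

```bash
# Hypothetical inspection of a single frame's labels.npz; the path components are placeholders.
python -c "
import numpy as np
d = np.load('Occpancy3D-nuScenes-V1.0/trainval/gts/scene-0001/FRAME_TOKEN/labels.npz')
print(d['semantics'].shape)   # expected (200, 200, 16), values in 0..17
print(d['mask_lidar'].shape, d['mask_camera'].shape)
"
```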

### Download

The files mentioned below can also be downloaded via <img src="https://user-images.githubusercontent.com/29263416/222076048-21501bac-71df-40fa-8671-2b5f8013d2cd.png" alt="OpenDataLab" width="18"/>[OpenDataLab](https://opendatalab.com/CVPR2023-3D-Occupancy/download). It is recommended to use the provided [command line interface](https://opendatalab.com/CVPR2023-3D-Occupancy/cli) for faster downloads.

| Subset | Google Drive <img src="https://ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png" alt="Google Drive" width="18"/> | Baidu Cloud <img src="https://nd-static.bdstatic.com/m-static/v20-main/favicon-main.ico" alt="Baidu Yun" width="18"/> | Size |
| :---: | :---: | :---: | :---: |
| mini | [data](https://drive.google.com/drive/folders/1ksWt4WLEqOxptpWH2ZN-t1pjugBhg3ME?usp=share_link) | [data](https://pan.baidu.com/s/1IvOoJONwzKBi32Ikjf8bSA?pwd=5uv6) | approx. 440M |
| trainval | [data](https://drive.google.com/drive/folders/1JObO75iTA2Ge5fa8D3BWC8R7yIG8VhrP?usp=share_link) | [data](https://pan.baidu.com/s/1_4yE0__UDIJS8JtBSB0Bpg?pwd=li5h) | approx. 32G |
| test | coming soon | coming soon | ~ |

* The mini and trainval data contain three parts -- `imgs`, `gts` and `annotations`. The `imgs` data have the same hierarchy as the image samples in the original nuScenes dataset.

### Hierarchy

The hierarchy of the folder `Occpancy3D-nuScenes-V1.0/` is described below:

```
└── Occpancy3D-nuScenes-V1.0
    |
    ├── mini
    |
    ├── trainval
    |   ├── imgs
    |   |   ├── CAM_BACK
    |   |   |   ├── n015-2018-07-18-11-07-57+0800__CAM_BACK__1531883530437525.jpg
    |   |   |   └── ...
    |   |   ├── CAM_BACK_LEFT
    |   |   |   ├── n015-2018-07-18-11-07-57+0800__CAM_BACK_LEFT__1531883530447423.jpg
    |   |   |   └── ...
    |   |   └── ...
    |   |
    |   ├── gts
    |   |   ├── [scene_name]
    |   |   |   ├── [frame_token]
    |   |   |   |   └── labels.npz
    |   |   |   └── ...
    |   |   └── ...
    |   |
    |   └── annotations.json
    |
    └── test
        ├── imgs
        └── annotations.json
```

- `imgs/` contains images captured by various cameras.
- `gts/` contains the ground truth of each sample. `[scene_name]` specifies a sequence of frames, and `[frame_token]` specifies a single frame in a sequence.
- `annotations.json` contains the meta information of the dataset.
- `labels.npz` contains `[semantics]`, `[mask_lidar]`, and `[mask_camera]` for each frame.

```
annotations {
    "train_split": ["scene-0001", ...], <list> -- training dataset split by scene_name
    "val_split": ["scene-0003", ...], <list> -- validation dataset split by scene_name
    "scene_infos": { <dict> -- meta infos of the scenes
        [scene_name]: { <str> -- name of the scene
            [frame_token]: { <str> -- samples in a scene, ordered by time
                "timestamp": <str> -- timestamp (or token), unique by sample
                "camera_sensor": { <dict> -- meta infos of the camera sensor
                    [cam_token]: { <str> -- token of the camera
                        "img_path": <str> -- corresponding image file path, *.jpg
                        "intrinsic": <float> [3, 3] -- intrinsic camera calibration
                        "extrinsic": { <dict> -- extrinsic parameters of the camera
                            "translation": <float> [3] -- coordinate system origin in meters
                            "rotation": <float> [4] -- coordinate system orientation as quaternion
                        }
                        "ego_pose": { <dict> -- vehicle pose of the camera
                            "translation": <float> [3] -- coordinate system origin in meters
                            "rotation": <float> [4] -- coordinate system orientation as quaternion
                        }
                    },
                    ...
                },
                "ego_pose": { <dict> -- vehicle pose
                    "translation": <float> [3] -- coordinate system origin in meters
                    "rotation": <float> [4] -- coordinate system orientation as quaternion
                },
                "gt_path": <str> -- corresponding 3D voxel gt path, *.npz
                "next": <str> -- frame_token of the next keyframe in the scene
                "prev": <str> -- frame_token of the previous keyframe in the scene
            }
        }
    }
}
```
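
A minimal sketch of reading this file, assuming only the field names shown in the schema above (the dataset path is illustrative and the released file may differ slightly):

```bash
# Hypothetical walk over annotations.json; field names are taken from the schema above.
python -c "
import json
anno = json.load(open('Occpancy3D-nuScenes-V1.0/trainval/annotations.json'))
print(len(anno['train_split']), 'training scenes,', len(anno['val_split']), 'validation scenes')
scene = anno['train_split'][0]
frames = anno['scene_infos'][scene]
frame_token = next(iter(frames))
print(scene, frame_token, frames[frame_token]['gt_path'])
"
```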

### Known Issues

- nuScenes ([issue #721](https://github.com/nutonomy/nuscenes-devkit/issues/721)) lacks translation along the z-axis, which makes it hard to recover accurate 6D localization and leads to misalignment of point clouds when accumulating them over whole scenes. Ground stratification occurs in several sequences.

<p align="right">(<a href="#top">back to top</a>)</p>

## Getting Started

We provide a baseline model based on [BEVFormer](https://github.com/fundamentalvision/BEVFormer).

Please refer to [getting_started](docs/getting_started.md) for details.

<p align="right">(<a href="#top">back to top</a>)</p>


## Challenge Timeline
