Technical specifications

Hereafter, we provide all details about the sensors present in our dataset: their type, capture rate, unit, reference frame and position. Details on how the data are stored in the dataset are given on the Data organization page.

This page is organized as follows:

Sensors positioning and Frames of reference definition

Figure 1: Sensor positions
Figure 2: Definition of the frames of reference

Figure 1 shows the sensor locations on the drone used to generate our dataset. Cameras are represented by pyramids, and the blue cube marks the location of the IMU and the GPS receiver.

Figure 2 illustrates the frames of reference used for all position-related data. The World frame has its origin at the starting point of the trajectory; it is oriented such that the initial drone yaw is zero, and its x and y axes are horizontal. The Body frame is rigidly attached to the drone, with its origin at the position of the IMU and GPS receiver. All frames use the North, East, Down (NED) axes convention.
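The ground-truth attitude is stored as a quaternion relating the Body frame to the World frame. The sketch below shows how a Body-frame vector could be expressed in the World frame. It assumes scalar-first (w, x, y, z) Hamilton quaternions that rotate Body-frame vectors into the World frame; neither the component order nor the rotation direction is stated on this page, so check both against the Data organization page before use. The function names are hypothetical.

```python
import math


def quat_to_rotation_matrix(q):
    """3x3 rotation matrix of quaternion q = (w, x, y, z).

    Assumption: scalar-first Hamilton convention, q rotating Body -> World.
    The quaternion is normalized first to absorb small numerical drift.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def body_to_world(q, v_body):
    """Express a Body-frame 3-vector in the World frame (hypothetical helper)."""
    r = quat_to_rotation_matrix(q)
    return [sum(r[i][j] * v_body[j] for j in range(3)) for i in range(3)]


def yaw_from_quat(q):
    """Yaw angle in radians (rotation about the Down axis, ZYX Euler order)."""
    w, x, y, z = q
    return math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
```

For example, with a quaternion describing a pure 90° yaw, the Body x axis (pointing forward) maps onto the World y axis, and `yaw_from_quat` returns π/2.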

Positioning data

For all position-related data, we use the North, East, Down (NED) axes convention. Distances are expressed in meters, rotations in quaternions, angles in radians, and time in seconds. The positioning information stored in the dataset is as follows:

Note that the GPS position is not given as standard longitude, latitude and altitude, but as a position in meters expressed in the World frame. This position is obtained by projecting the longitude/latitude/altitude fix relative to the first point of the trajectory.
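The exact projection used to build the dataset is not specified here. One common way to obtain such local coordinates, sketched below as an assumption, is to convert each fix to Earth-centered, Earth-fixed (ECEF) coordinates on the WGS-84 ellipsoid, express its offset from the first fix in a local NED frame, and then rotate the horizontal axes by the initial heading so that they match the World frame defined above. The function names, the radian inputs, ellipsoidal altitude, and the `yaw0` parameter (initial heading measured from geographic north) are all hypothetical.

```python
import math

# WGS-84 ellipsoid constants
WGS84_A = 6378137.0                    # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat, lon, alt):
    """Latitude/longitude [rad] and ellipsoidal altitude [m] -> ECEF [m]."""
    sin_lat, cos_lat = math.sin(lat), math.cos(lat)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat ** 2)  # prime vertical radius
    x = (n + alt) * cos_lat * math.cos(lon)
    y = (n + alt) * cos_lat * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt) * sin_lat
    return x, y, z


def geodetic_to_world(lat, lon, alt, lat0, lon0, alt0, yaw0=0.0):
    """Position of a fix relative to the first fix (lat0, lon0, alt0).

    The offset is expressed in a local NED frame, then its horizontal axes
    are rotated by yaw0 [rad] so that x points along the initial heading.
    """
    x, y, z = geodetic_to_ecef(lat, lon, alt)
    x0, y0, z0 = geodetic_to_ecef(lat0, lon0, alt0)
    dx, dy, dz = x - x0, y - y0, z - z0
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    # ECEF offset -> local North, East, Down at the reference point
    north = -sl * co * dx - sl * so * dy + cl * dz
    east = -so * dx + co * dy
    down = -cl * co * dx - cl * so * dy - sl * dz
    # Rotate about the Down axis so that the initial heading becomes x
    cy, sy = math.cos(yaw0), math.sin(yaw0)
    return cy * north + sy * east, -sy * north + cy * east, down
```

For short trajectories, a flat-earth approximation would give nearly identical results; the ECEF route is shown because it stays accurate over longer distances.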

Additionally, our dataset stores some information about the state of the sensors. The following sensor data are made available:

Visual data

Each trajectory record comes with eight video streams corresponding to the (1) left, (2) right and (3) down-looking RGB camera views and the (4) segmentation, (5) depth, (6) normals, (7) disparity and (8) occlusion maps seen by the left camera. Each video stream consists of a set of successively numbered pictures stored in a dedicated directory. The image formats and contents are as follows:

Camera intrinsic matrix

The cameras used to record visual data all share the same intrinsic matrix. For an image of height \(h\) and width \(w\), this matrix is given by: $$\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \quad \text{with} \quad f_x=c_x= w/2 \ \text{and} \ f_y=c_y= h/2 $$ This corresponds to a field of view of 90 degrees both horizontally and vertically, since \(2\arctan\big(w/(2f_x)\big) = 2\arctan\big(h/(2f_y)\big) = 2\arctan(1) = 90^\circ\).
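As an illustration, the sketch below builds \(\mathbf{K}\) for a given image size, checks the 90-degree field of view, and projects and back-projects points with the pinhole model. It assumes the usual pinhole camera coordinates (x right, y down, z along the optical axis) and, for back-projection, that a depth value is the z coordinate rather than the range along the viewing ray; this page specifies neither. The function names and the 1024×768 size in the usage note are illustrative only.

```python
import math


def intrinsic_matrix(w, h):
    """K with f_x = c_x = w/2 and f_y = c_y = h/2 (all in pixels)."""
    return [[w / 2.0, 0.0, w / 2.0],
            [0.0, h / 2.0, h / 2.0],
            [0.0, 0.0, 1.0]]


def field_of_view_deg(size_px, focal_px):
    """Full field of view along one image axis, in degrees."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


def project(k, point_cam):
    """Pixel (u, v) of a camera-frame point (X, Y, Z) with Z > 0."""
    x, y, z = point_cam
    return (k[0][0] * x / z + k[0][2], k[1][1] * y / z + k[1][2])


def backproject(k, u, v, depth):
    """Camera-frame point for pixel (u, v), assuming depth is the Z coordinate."""
    x = (u - k[0][2]) * depth / k[0][0]
    y = (v - k[1][2]) * depth / k[1][1]
    return (x, y, depth)
```

With `K = intrinsic_matrix(1024, 768)`, a point on the optical axis lands on the image center `(512.0, 384.0)`, and a point with X/Z = 1 lands on the right image border, consistent with a 90-degree horizontal field of view.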

Semantic segmentation classes

Id | Class content
3 | Dirt ground
4 | Ground vegetation
5 | Rocky ground
8 | Water plane
9 | Man-made construction
11 | Train track
12 | Road sign
13 | Other man-made objects
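The class table above can be used as a simple lookup, for instance to measure how much of a frame each class covers. The sketch assumes the segmentation map has already been decoded into a 2D grid of integer class ids (the file encoding is not described on this page). Ids absent from the table map to a hypothetical "unlisted" label, and the names `SEGMENTATION_CLASSES`, `class_name` and `class_fractions` are illustrative.

```python
from collections import Counter

# Class ids and names as listed in the table above.
SEGMENTATION_CLASSES = {
    3: "Dirt ground",
    4: "Ground vegetation",
    5: "Rocky ground",
    8: "Water plane",
    9: "Man-made construction",
    11: "Train track",
    12: "Road sign",
    13: "Other man-made objects",
}


def class_name(class_id):
    """Class name for an id; ids missing from the table become 'unlisted'."""
    return SEGMENTATION_CLASSES.get(class_id, "unlisted")


def class_fractions(seg_map):
    """Fraction of pixels per class for a map given as rows of integer ids."""
    counts = Counter(class_name(c) for row in seg_map for c in row)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

For a 2×2 map `[[3, 3], [8, 0]]`, half of the pixels are "Dirt ground", a quarter are "Water plane", and the remaining quarter falls into "unlisted".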

Sensors summary table

Data | Sampling freq. | Ref. frame | Unit | Misc.
Ground-truth position | 100 Hz | World | [m]
Ground-truth velocity | 100 Hz | World | [m/s]
Ground-truth acceleration | 100 Hz | World | [m/s²]
Ground-truth attitude | 100 Hz | World | [quaternion]
Ground-truth angular velocity | 100 Hz | Body | [rad/s]
IMU acceleration | 100 Hz | Body | [m/s²] | Initial bias estimate provided
IMU angular velocity | 100 Hz | Body | [rad/s] | Initial bias estimate provided
GPS position | 1 Hz | World | [m]
GPS velocity | 1 Hz | World | [m/s]
GPS signal information | 1 Hz | n/a | n/a | Includes number of visible satellites, GDOP, PDOP, HDOP and VDOP
Downward-looking RGB picture | 25 Hz | n/a | n/a | 90° FOV, captured by the downward-looking camera
Right stereo RGB picture | 25 Hz | n/a | n/a | 90° FOV, captured by the right camera
Left stereo RGB picture | 25 Hz | n/a | n/a | 90° FOV, captured by the left camera
Stereo disparity map | 25 Hz | n/a | [pixel] | 90° FOV, captured by the left camera
Stereo occlusion map | 25 Hz | n/a | n/a | 90° FOV, captured by the left camera
Depth map | 25 Hz | n/a | [m] | 90° FOV, captured by the left camera
Surface normal map | 25 Hz | Body | n/a | 90° FOV, captured by the left camera
Semantic segmentation map | 25 Hz | n/a | n/a | 90° FOV, captured by the left camera
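Because the streams run at 100 Hz, 25 Hz and 1 Hz, samples from different sensors often need to be paired. The sketch below does this with exact integer arithmetic, assuming that all streams start at the same instant and are evenly sampled; the table gives only the rates, so this alignment is an assumption. The stream keys and function names are hypothetical.

```python
from fractions import Fraction

# Sampling frequencies from the summary table, in Hz (keys are illustrative).
RATES_HZ = {"ground_truth": 100, "imu": 100, "gps": 1, "camera": 25}


def sample_time(stream, index):
    """Exact timestamp in seconds of sample `index`, assuming a common t = 0."""
    return Fraction(index, RATES_HZ[stream])


def latest_index_at_or_before(src_stream, src_index, dst_stream):
    """Index of the most recent dst_stream sample taken at or before
    sample src_index of src_stream (integer math avoids float rounding)."""
    return (src_index * RATES_HZ[dst_stream]) // RATES_HZ[src_stream]
```

Under this assumption, camera frame 10 coincides with ground-truth sample 40 (0.4 s), and camera frame 49 (1.96 s) is paired with GPS sample 1.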