During the review of our paper, and even during the development phase of the dataset, we received some questions which could not be answered in the paper due to lack of space. We nevertheless felt that they deserved answers, which is what we provide here.

How can I project points in world coordinates from depth maps?

Our depth maps encode, for each pixel, the Euclidean distance of the object to the focal point. We acknowledge that this is not the standard way of encoding depth maps, but, for depth estimation tasks, we found it interesting to provide maps giving the real distance of objects to the camera. This particular encoding means that the usual point projection formula needs to be adapted.

For the depth \(z\) encoded at pixel \((x_i, y_i)\) in a frame of size \((h,w)\), the 3D position of the corresponding point in the camera frame of reference \(c\) is given by: $$ (x_c,y_c,z_c) = r_{xy} \, (x_i-w/2,\; y_i-h/2,\; f)\textrm{, with } r_{xy} = \frac{z}{\sqrt{(x_i-w/2)^2+(y_i-h/2)^2+f^2}} $$ where \(f\) is the focal length of the camera ( \(f=h/2=w/2\) for Mid-Air).
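As an illustration, the formula above can be vectorized over a whole depth map with NumPy. This is a minimal sketch, not official Mid-Air code; the function name is ours, and the default focal length \(f = h/2\) follows the Mid-Air convention stated above.

```python
import numpy as np

def depth_to_camera_points(depth, f=None):
    """Project a depth map encoding Euclidean distances to the focal
    point (Mid-Air style) into 3D points in the camera frame.

    depth: (h, w) array of Euclidean distances.
    f: focal length in pixels; defaults to h / 2, as in Mid-Air.
    Returns an (h, w, 3) array of (x_c, y_c, z_c) points.
    """
    h, w = depth.shape
    if f is None:
        f = h / 2.0
    # Pixel coordinate grids, shifted so the origin is the image center.
    y_i, x_i = np.mgrid[0:h, 0:w].astype(np.float64)
    dx = x_i - w / 2.0
    dy = y_i - h / 2.0
    # Scale factor r_xy: stretches the ray direction (dx, dy, f) so that
    # the resulting point lies at Euclidean distance `depth` from the camera.
    r_xy = depth / np.sqrt(dx**2 + dy**2 + f**2)
    return np.stack((r_xy * dx, r_xy * dy, r_xy * f), axis=-1)
```

By construction, the norm of every projected point equals the encoded depth value, which is a convenient sanity check.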

Converting these coordinates to the drone body frame \(b\) can simply be obtained by the following transformation: $$ (x_b, y_b, z_b) = (z_c, y_c, x_c) $$

Finally, getting the coordinates of the point in the world frame of reference \(w\) simply consists in projecting it from the drone body frame by using the following relation: $$ (x_w, y_w, z_w) = \textbf R_d(x_b, y_b, z_b)+ \textbf p_d $$ where \(\textbf R_d\) and \(\textbf p_d\) are respectively the rotation matrix encoding the attitude of the drone and the position of the drone in the world frame of reference.
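The two transformations above (camera frame to body frame, then body frame to world frame) can be combined in a few lines. This is a sketch under the assumption that \(\textbf R_d\) is supplied as a 3×3 rotation matrix and \(\textbf p_d\) as a 3-vector; the function name is ours.

```python
import numpy as np

def camera_to_world(points_c, R_d, p_d):
    """Transform points from the camera frame to the world frame.

    points_c: (..., 3) points (x_c, y_c, z_c) in the camera frame.
    R_d: (3, 3) rotation matrix giving the drone attitude in the world frame.
    p_d: (3,) position of the drone in the world frame.
    """
    points_c = np.asarray(points_c, dtype=np.float64)
    # Camera frame -> drone body frame: (x_b, y_b, z_b) = (z_c, y_c, x_c).
    points_b = points_c[..., [2, 1, 0]]
    # Body frame -> world frame: rotate by the attitude, then translate.
    return points_b @ np.asarray(R_d).T + np.asarray(p_d)
```

Note that the matrix product is written `points_b @ R_d.T` so that it broadcasts over any batch of points, including a full (h, w, 3) array of projected pixels.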

Why doesn't the dataset feature {insert what you want here}? It's a pity because it limits potential applications.

Developing such a dataset is time-consuming and the possibilities are endless. We therefore had to make choices on the features to prioritize. We did so based on a series of constraints of our own, which may unfortunately differ from yours.

In general, we tried to be complementary to existing datasets in terms of the features we propose. We believe that the performance of good, robust algorithms for tasks such as ego-motion or depth estimation should be similar across different datasets. We therefore think that training simultaneously on several datasets to increase the diversity of the training content should be considered while waiting for an all-in-one, ultra-complete dataset... if it ever comes.

For example, we received questions about the lack of moving and man-made objects. Our dataset features only a few moving and/or man-made objects simply because there are not many of them in typical unstructured environments. There are, however, several other datasets featuring such objects (the KITTI dataset, for example). These datasets can be merged together (and with ours if you need some particular feature of Mid-Air) to combine all of their features.

Finally, we want to emphasize that we will maintain, and maybe continue to develop, our dataset. So, a feature that is missing now may be added later on.

Why didn't you simulate camera imperfections (lens distortion, chromatic aberrations, under/overexposure, ...)?

Most of these imperfections can easily be simulated in postprocessing on existing images. Since there are so many different camera models, each with its own imperfections, we didn't want to restrict everybody to a single imperfection model. Instead, we prefer to leave everyone free to implement the imperfection model which suits their use case.
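To give an idea of how simple such postprocessing can be, here is a minimal sketch of two illustrative effects: an exposure shift and a crude radial vignetting. The function names and the quadratic falloff model are our own choices, not part of Mid-Air; real lens models are considerably more involved.

```python
import numpy as np

def adjust_exposure(img, stops):
    """Simulate under/overexposure by scaling linear intensities.

    img: float array with values in [0, 1].
    stops: exposure shift in photographic stops
           (negative = underexposure, positive = overexposure).
    """
    return np.clip(img * (2.0 ** stops), 0.0, 1.0)

def add_vignetting(img, strength=0.5):
    """Darken image corners with a simple quadratic radial falloff,
    a crude stand-in for lens vignetting."""
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    # Normalized distance from the image center (1.0 at the corners).
    r = np.hypot(x - w / 2.0, y - h / 2.0) / np.hypot(w / 2.0, h / 2.0)
    falloff = 1.0 - strength * r**2
    if img.ndim == 3:                # broadcast over color channels
        falloff = falloff[..., None]
    return np.clip(img * falloff, 0.0, 1.0)
```

Effects like these can be applied on the fly during training as data augmentation, which avoids storing a degraded copy of the dataset.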

Why didn't you generate severe weather conditions (rain, snow, heavy wind, ...)?

There are two main reasons for this. The first is that drones are not supposed to fly in such severe weather, which strongly lessens the need for such additions in the dataset. The second is technical: with our current render pipeline, it is difficult to guarantee that rain- and snowfalls will be consistent between two consecutive frames.

Why is motion blur mentioned as a rendering limitation, although Unreal Engine 4 supports generating motion blur?

Unreal Engine 4 (UE4) does indeed support motion blur, but not for all renderings. We use the AirSim simulator as an API to UE4, and AirSim uses RenderTargets to simulate the different onboard cameras. Unfortunately, the motion blur feature is not supported by RenderTargets.

Why didn't you include optical flow? It's so trivial to get it in CG...

We agree that, in some software (such as Blender, for example), it is trivial to get optical flow. However, as with motion blur, optical flow is not available for Unreal Engine 4 RenderTargets (which serve as the simulated cameras).