Tanks and Temples Benchmark

We present both training data and testing data. The testing datasets are organized into two groups: intermediate and advanced. The intermediate group contains sculptures, large vehicles, and house-scale buildings with outside-looking-in camera trajectories. The advanced group contains large indoor scenes imaged from within and large outdoor scenes with complex geometric layouts and camera trajectories.

Quickstart

For each scene, we provide a high-resolution video. Your task is to reconstruct a 3D model from it. Once you have reconstructed the entire intermediate set or the entire advanced set, you can submit your results for evaluation and put your name on the leaderboard. For a quick start, we provide a uniformly distributed set of frames from each video. They can be used as input to off-the-shelf reconstruction systems such as COLMAP. (See our tutorial page for instructions on how to set up a working system.)
Advanced users may prefer to download the videos themselves, which are the raw 4K footage captured with a high-end camera. For the training datasets, ground-truth geometry is provided as well.

For your convenience, we provide a Python downloader that can be used to download some or all of the data. We highly recommend using it, since it automatically verifies MD5 checksums to make sure the downloaded files are complete.
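The integrity check mentioned above amounts to comparing a file's MD5 digest with a published value. The following is a minimal sketch of such a check for files fetched by other means. It is not the downloader's actual code: the function names `md5_of_file` and `is_complete`, and the idea of passing the expected digest as a hex string, are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte videos.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete(path, expected_md5):
    """Compare a file's digest with an expected hex digest (hypothetical input)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.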

Python Downloader

Usage:

> python download_t2_dataset.py [-h] [-s] [--modality MODALITY] [--group GROUP] [--unpack_off] [--calc_md5_off]

Example 1: download all videos for intermediate and advanced scenes
> python download_t2_dataset.py --modality video --group both

Example 2: download image sets for intermediate scenes (quick start setting)
> python download_t2_dataset.py --modality image --group intermediate

Example 3: show the status of downloaded data
> python download_t2_dataset.py -s

Intermediate

Scene	video	image set
Family
Francis
Horse
Lighthouse
M60
Panther
Playground
Train
download entire group

Advanced

Scene	video	image set
Auditorium
Ballroom
Courtroom
Museum
Palace
Temple
download entire group

Training Data

Scene	ground truth	video	image set
Barn
Caterpillar
Church
Courthouse
Ignatius
Meeting room
Truck

Results on Training Data

This table contains the COLMAP reconstruction results for the training data image sets provided in the table above. The image sets are sampled at 1 fps from videos recorded at 29.97 fps. To find the video frame F that corresponds to image I, compute F = int(I*29.97), starting with I=0. The reconstructions were made with an "out of the box" COLMAP configuration and can be downloaded as *.ply files together with the camera poses (stored in the *.log file format). The alignment text file contains the transformation matrix that aligns the COLMAP reconstruction to the corresponding ground-truth point cloud, and the *.json crop files contain the bounding box coordinates for each model. If you want to work on all training set models, you can speed up the download by getting the zip file that contains all the necessary files.
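The image-to-frame lookup above can be sketched in a few lines of Python. The function name `image_index_to_frame` is hypothetical; only the formula F = int(I*29.97) and the zero-based image index come from the description above.

```python
VIDEO_FPS = 29.97  # recording frame rate stated above


def image_index_to_frame(image_index, fps=VIDEO_FPS):
    """Map a zero-based image index I to the video frame F = int(I * fps)."""
    if image_index < 0:
        raise ValueError("image index must be non-negative")
    # int() truncates toward zero, matching the formula in the text.
    return int(image_index * fps)


# Frames for the first three images of a sequence.
first_frames = [image_index_to_frame(i) for i in range(3)]
```

Because the frame rate is not an integer, consecutive images are 29 or 30 video frames apart rather than a fixed number.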

Scene	Reconstruction	Camera Poses	Alignment	Cropfiles
Barn
Caterpillar
Church
Courthouse
Ignatius
Meeting room
Truck
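As an illustration of how the alignment matrix might be used, the sketch below applies a transformation to reconstructed points. It rests on assumptions that the description above does not confirm: that the alignment text file holds a 4x4 homogeneous matrix written as four whitespace-separated rows, and that the matrix acts on column vectors. The names `parse_matrix` and `apply_transform` are hypothetical.

```python
def parse_matrix(text):
    """Parse four whitespace-separated rows into a 4x4 list of floats.

    Assumed layout; check it against the actual alignment file.
    """
    rows = [[float(value) for value in line.split()]
            for line in text.strip().splitlines() if line.strip()]
    if len(rows) != 4 or any(len(row) != 4 for row in rows):
        raise ValueError("expected a 4x4 matrix")
    return rows


def apply_transform(matrix, points):
    """Apply a 4x4 homogeneous transform to an iterable of (x, y, z) points."""
    transformed = []
    for x, y, z in points:
        homogeneous = (x, y, z, 1.0)
        tx, ty, tz, tw = (
            sum(m * c for m, c in zip(row, homogeneous)) for row in matrix
        )
        # tw is 1 for rigid and similarity transforms; divide for generality.
        transformed.append((tx / tw, ty / tw, tz / tw))
    return transformed
```

In practice the points would come from the reconstruction *.ply file, and the transformed cloud would then be cropped with the bounding box from the *.json crop file before comparison with the ground truth.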

Individual Scans

This table contains individual scans from the training dataset. Each zip file contains a set of pre-aligned and cleaned *.ply files, and a *.txt file with the origins of the respective scanner positions. For example, if you merge all the scans of a scene into one file and apply voxel-grid downsampling to obtain a uniform point density in the overlapping areas, you end up with the ground-truth file. This data can be useful if you want to compute normals or meshes for the laser-scanned ground-truth files of the training dataset.

Scene	ground truth	individual scans
Barn
Caterpillar
Church
Courthouse
Ignatius
Meeting room
Truck
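The merge-and-downsample step described above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the tool used to produce the ground-truth files: it keeps one centroid per occupied voxel, which is one common variant of voxel-grid downsampling, and the voxel size is a free parameter that is not specified here. The names `merge_scans` and `voxel_downsample` are hypothetical.

```python
from collections import defaultdict
from math import floor


def merge_scans(scans):
    """Concatenate several pre-aligned scans (lists of (x, y, z)) into one list."""
    return [point for scan in scans for point in scan]


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    # Per voxel: running sums of x, y, z and a point count.
    cells = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        cell = cells[key]
        cell[0] += x
        cell[1] += y
        cell[2] += z
        cell[3] += 1
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in cells.values()]
```

Overlapping regions covered by several scans contribute many points to the same voxels, so reducing each voxel to a single point evens out the density. For real scans with millions of points, a point cloud library would be far faster than this pure-Python loop.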

Camera Calibration

We deliberately do not provide the exact camera intrinsics, in order to encourage methods to optimize them individually. Some methods, however, need initial parameters such as the focal length and the principal point in order to run. For these, we found that a pinhole camera model with the following parameters works well for both camera setups:

Principal point offset: x₀ = W/2, y₀ = H/2
Focal length: f_x = f_y = 0.7 * W

Here, W and H are the width and height of the frames in pixels.
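For methods that expect an initial intrinsic matrix, the parameters above can be assembled as in the following sketch. The function name `initial_intrinsics` is hypothetical, and the 1920x1080 frame size in the usage line is only an example value, not a statement about the dataset's resolution.

```python
def initial_intrinsics(width, height):
    """Build a 3x3 pinhole matrix from the suggested initial parameters.

    Focal length: f_x = f_y = 0.7 * W; principal point: (W/2, H/2).
    """
    focal = 0.7 * width
    cx = width / 2
    cy = height / 2
    return [
        [focal, 0.0, cx],
        [0.0, focal, cy],
        [0.0, 0.0, 1.0],
    ]


# Example with a hypothetical 1920x1080 frame.
K = initial_intrinsics(1920, 1080)
```

These values are meant only as a starting point; a reconstruction pipeline would normally refine them during optimization.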

License

This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission to use the data is granted provided that you agree to our license terms.