This repository contains TensorFlow 2 models and a small set of labeled transparent-object data from the KeyPose project. It includes sample programs for displaying the data, running the models to predict keypoints on the data, and training from the data.

The full dataset can be downloaded using the directions in the "Downloading data and models" section of this README. It contains stereo and depth image sequences (720p) of 15 small transparent objects in 6 categories (ball, bottle, cup, mug, heart, tree), against 10 different textured backgrounds, with 4 poses for each object. That gives a total of 600 sequences (15 objects x 10 textures x 4 poses) with approximately 48k stereo and depth images. The depth images are taken with both transparent and opaque objects in exactly the same position. All RGB and depth images are registered for pixel correspondence, the camera parameters and pose are given, and keypoints are labeled in each image and in 3D.

Setup and sample programs

To install required python libraries (running in directory above keypose/):

$ pip3 install -r keypose/requirements.txt

To look at images and ground-truth keypoints (running in directory above keypose/):

$ python3 -m keypose.show_keypoints keypose/data/bottle_0/texture_5_pose_0/ keypose/objects/bottle_0.obj

To predict keypoints from a model (running in directory above keypose/), first download the models:

$ keypose/download_models.sh

Then run the predict command:

$ python3 -m keypose.predict keypose/models/bottle_0_t5/ keypose/data/bottle_0/texture_5_pose_0/ keypose/objects/bottle_0.obj

Repository layout

top level contains the sample programs.
configs/ contains training configuration files.
data/ contains the transparent object data.
models/ contains the trained TensorFlow models.
objects/ contains simplified vertex CAD files for each object, for use in display.
tfrecords/ contains tfrecord structures for training, created from the transparent object data.

Data directory structure and files

The directory structure for the data divides into one directory for each object, with sub-directories for each texture/pose sequence. Each sequence has about 80 images, numbered sequentially with a prefix.

  bottle_0/
      texture_0_pose_0/
           000000_L.png        - Left image (reference image)
           000000_L.pbtxt      - Left image parameters
           000000_R.png        - Right image
           000000_R.pbtxt      - Right image parameters
           000000_border.png   - Border of the object in the left image (grayscale)
           000000_mask.png     - Mask of the object in the left image (grayscale)
           000000_Dt.exr       - Depth image for the transparent object
           000000_Do.exr       - Depth image for the opaque object
           ...
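
The layout above can be walked with a few lines of Python. The following is a minimal sketch and not part of the repository: the helper name `list_frames` is hypothetical, and it assumes every file in a sequence directory is named with a numeric frame prefix, an underscore, and one of the suffixes listed above.

```python
from collections import defaultdict
from pathlib import Path


def list_frames(sequence_dir):
    """Group the files of one texture/pose sequence by frame prefix.

    Returns a dict mapping a prefix such as '000000' to a dict that maps
    each suffix ('L.png', 'R.pbtxt', 'Dt.exr', ...) to its file path.
    """
    frames = defaultdict(dict)
    for path in sorted(Path(sequence_dir).iterdir()):
        prefix, sep, suffix = path.name.partition("_")
        # Skip anything that does not follow the <prefix>_<suffix> pattern.
        if not sep or not prefix.isdigit():
            continue
        frames[prefix][suffix] = path
    return dict(frames)
```

For example, `list_frames("keypose/data/bottle_0/texture_5_pose_0")` would return one entry per frame, which makes it easy to check that each frame has its left, right, mask, and depth files.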

Model naming conventions

In the models/ directory, there is a set of sub-directories containing TensorFlow KeyPose models trained for individual and category predictions.

  bottle_0_t5/          - Trained on bottle_0, leaving out texture_5_pose_* data
  bottle_1_t5/
  ...
  bottles_t5/           - Trained on all bottles, leaving out texture_5_pose_* data
  bottles_cups_t5/
  mugs_t5/

  mugs_m0/              - Trained on all mugs except mug_0

So, for example, you can use the predict.py program to run the mugs_m0 model against any of the sequences in data/mug_0/, to show how the model performs against a mug it has not seen before. Similarly, running the model bottle_0_t5 against any sequence in data/bottle_0/texture_5_pose_* will do predictions against the test set.
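
The naming convention can be decoded mechanically. The sketch below is illustrative only: `parse_model_name` is a hypothetical helper, and it assumes that every model directory name ends in one of the two suffix forms shown above, `_t<N>` for a held-out texture or `_m<N>` for a held-out mug.

```python
import re


def parse_model_name(model_name):
    """Split a model directory name into its training set and held-out data.

    Assumes the two suffix forms described in this README:
      _t<N>  texture N (all poses) was left out of training
      _m<N>  mug_<N> was left out of training
    """
    match = re.fullmatch(r"(.+)_([tm])(\d+)", model_name)
    if match is None:
        raise ValueError(f"unrecognized model name: {model_name!r}")
    trained_on, kind, index = match.groups()
    return {
        "trained_on": trained_on,
        "held_out": "texture" if kind == "t" else "mug",
        "index": int(index),
    }


print(parse_model_name("bottle_0_t5"))
print(parse_model_name("mugs_m0"))
```

With this information, a script can pick the matching test sequences, for instance data/bottle_0/texture_5_pose_* for bottle_0_t5, or data/mug_0/ for mugs_m0.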

Downloading data and models

Data and models are available publicly on Google Cloud Storage.

To download all the models, use:

$ keypose/download_models.sh

This will populate the models/ directory.

The image files are large, and divided up by object. To get any individual object, use:

$ wget https://storage.googleapis.com/keypose-transparent-object-dataset/<object>.zip

Then unzip the file in the keypose/ directory, which populates the appropriate data/ directories. These are the 15 available objects:

ball_0
bottle_0
bottle_1
bottle_2
cup_0
cup_1
mug_0
mug_1
mug_2
mug_3
mug_4
mug_5
mug_6
heart_0
tree_0
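
To fetch several objects without typing each URL, the pattern above can be scripted. This is a rough sketch using only the Python standard library: the helper names `object_url` and `download_object` are hypothetical, and the object list is copied from this README.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "https://storage.googleapis.com/keypose-transparent-object-dataset"

# The 15 objects listed in this README.
OBJECTS = (
    ["ball_0"]
    + [f"bottle_{i}" for i in range(3)]
    + [f"cup_{i}" for i in range(2)]
    + [f"mug_{i}" for i in range(7)]
    + ["heart_0", "tree_0"]
)


def object_url(name):
    """Return the download URL for one object archive."""
    if name not in OBJECTS:
        raise ValueError(f"unknown object: {name!r}")
    return f"{BASE_URL}/{name}.zip"


def download_object(name, dest_dir="."):
    """Download <name>.zip into dest_dir and return the local path."""
    local_path = os.path.join(dest_dir, f"{name}.zip")
    urlretrieve(object_url(name), local_path)
    return local_path
```

Calling `download_object("bottle_0", "keypose")` would place bottle_0.zip in the keypose/ directory, ready to be unzipped there. The archives are large, so downloading only the objects you need is usually preferable.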

Training a model

To train a model, the data must be put into tfrecord format, and then grouped according to train and test records. There are programs below that will do this. For a quick start, you can download a set of tfrecords for training a bottle_0 model that uses texture 5 as the test set.

To get the sample tfrecords (3.4 GB), use the following:

$ wget https://storage.googleapis.com/keypose-transparent-object-dataset/tfrecords.zip

Then unzip the file in the keypose/ directory, which populates the appropriate tfrecords/ directories.

To train, use this command (running in directory above keypose/):

$ python3 -m keypose.trainer configs/bottle_0_t5 /tmp/model

This will run the training using Estimators, and put checkpoints and saved models into /tmp/model. It helps to have a good GPU with lots of memory. If you run out of memory on the GPU, reduce the batch size in configs/default.yaml from 32 to 16 or 8. Periodically the trainer will write out models in the saved_model format.

Generating tfrecords from image directories

To put images and associated metadata into tfrecord format, suitable for training, use the following command (running in directory above keypose/):

$ python3 -m keypose.gen_tfrecords configs/tfset_bottle_0_t5

The configuration file `configs/tfset_bottle_0_t5.pbtxt` is a protobuf
that names the tfrecord output directory, as well as the directory
paths for inputs for train and val.  It also lets you resize the
original images -- for example, the sample tfrecords used in training
are resized to 756x424, with a camera focal length of 400 pixels.
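
Resizing an image changes the camera intrinsics in proportion to the
change in size. The sketch below illustrates that relationship for a
standard pinhole model; it is not the code used by `gen_tfrecords`.
The function name `scale_intrinsics` is hypothetical, and the example
assumes that 720p means 1280x720 pixels.

```python
def scale_intrinsics(fx, fy, cx, cy, src_size, dst_size):
    """Rescale pinhole intrinsics when an image is resized.

    src_size and dst_size are (width, height) in pixels. Focal lengths
    and the principal point scale with the image dimensions.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy


# Scale factors for a 1280x720 source resized to 756x424.
sx = 756 / 1280
sy = 424 / 720
print(f"horizontal scale {sx:.4f}, vertical scale {sy:.4f}")
```

The two scale factors differ slightly because 756x424 does not have
exactly the same aspect ratio as 1280x720.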

The output of the process is in `keypose/tfrecords/<name>/tfrecords`.
This directory is suitable for input to `trainer.py`.