This is Project 4 for UW CSE P576 Computer Vision.
Getting Started: Download the source ipynb for this notebook and use File->Upload Notebook in Colab.
This project: In this project you will build a model for stereo matching. First, you'll implement and test standard block matching as a Tensorflow graph. Next, you will add a convolutional net to the output of your block matcher, and train the network using images from the KITTI dataset.
What to turn in: Turn in a pdf of your completed ipynb notebook as well as the source .ipynb. Clearly describe any enhancements or experiments you tried in your ipynb notebook.
version 051618
Licensed under the Apache License, Version 2.0 (the "License");
This is not an official Google product.
#@title
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import tensorflow as tf
import PIL.Image as pil
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (16.0, 10.0)
tf.logging.set_verbosity(tf.logging.INFO)
def image_open(filename):
"""
Returns a numpy float image with values in the range (0,1)
"""
pil_im = pil.open(filename)
im_np = np.array(pil_im).astype(np.float32)
im_np /= 255.0
return im_np
def grey_to_rgb(img):
"""
Convert greyscale to rgb image
"""
if (len(img.shape)==2):
img = np.expand_dims(img, 2)
img3 = np.repeat(img, 3, 2)
return img3
def normalise_01(im):
"""
Normalise image to the range (0,1)
"""
mx = im.max()
mn = im.min()
den = mx-mn
small_val = 1e-9
if (den < small_val):
print('image normalise_01 -- divisor is very small')
den = small_val
return (im-mn)/den
#@title
# Download KITTI data
!wget -nc https://courses.cs.washington.edu/courses/csep576/18sp/projects/project4/kitti_stereo2012.zip && unzip -n kitti_stereo2012.zip
We'll start by reading in some images to perform block matching on. The code below displays a random left image, and the average of the left and right images. You should see that the pixel offset (disparity) is larger for nearby foreground objects than for distant ones.
# Visualise left,right and disparity images
num_data=194
im_id=np.random.randint(num_data)
left_im=tf.image.decode_image(tf.read_file('kitti_stereo2012/training/image_0/'+str(im_id).zfill(6)+'_10.png'))
right_im=tf.image.decode_image(tf.read_file('kitti_stereo2012/training/image_1/'+str(im_id).zfill(6)+'_10.png'))
disp_im=tf.image.decode_image(tf.read_file('kitti_stereo2012/training/disp_occ/'+str(im_id).zfill(6)+'_10.png'))
left_im=tf.to_float(left_im[:,:,0])/255.0
right_im=tf.to_float(right_im[:,:,0])/255.0
disp_im=tf.to_float(disp_im[:,:,0])
sess=tf.Session()
left_im_np,right_im_np=sess.run([left_im,right_im])
sess.close()
plt.subplot(2,1,1)
plt.imshow(grey_to_rgb(left_im_np))
plt.axis('off')
plt.subplot(2,1,2)
plt.imshow(grey_to_rgb(0.5*left_im_np+0.5*right_im_np))
plt.axis('off')
Complete the code block below to visualise the ground truth disparity data. What is the approximate disparity range? Hint: try using np.histogram with a set of images. Which regions of the image have valid disparity values? Try averaging over several images to visualise the spatial range of the depth data.
"""
********************************************************
*** TODO: write code to visualise disparity ground truth
********************************************************
"""
"""
********************************************************
"""
You will now write Tensorflow code to perform block matching. Edit the function compute_cost_volume below to compute a 3D tensor such that cost_volume[row,col,disparity] is the SSD between a patch of size block_match_size from the left image and a patch shifted by disparity pixels in the right image. Hint: try the Tensorflow functions tf.extract_image_patches and tf.manip.roll.
def compute_cost_volume(left_im, right_im,block_match_size,disparity_range):
# FORNOW: random cost volume
cost_volume=tf.random_normal([370,1226,disparity_range])
"""
**************************************************************
*** TODO: write code to compute the block matching cost volume
**************************************************************
Inputs: left_im=left image (H, W)
right_im=right image (H, W)
block_match_size=size of square for patch matching
disparity_range=range of disparities to compute
Outputs: cost_volume=output cost volume (H, W, disparity_range)
"""
"""
**************************************************************
"""
return cost_volume
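If you get stuck, the following is a rough sketch of the roll-and-aggregate approach the hints point at; it is one possible implementation, not the only (or most efficient) one. It shifts the right image with tf.manip.roll, then sums the squared differences over each block with a depthwise convolution of ones. Since tf.manip.roll wraps around, the leftmost disparity_range columns of the volume are invalid; note that the data provider below avoids sampling patches from that region.
def compute_cost_volume_sketch(left_im, right_im, block_match_size, disparity_range):
  # sketch only: SSD cost volume via shift-and-filter
  sq_diff_list=[]
  for d in range(disparity_range):
    # align right_im[x-d] with left_im[x]
    shifted=tf.manip.roll(right_im, shift=d, axis=1)
    sq_diff_list.append(tf.square(left_im-shifted))
  sq_diff=tf.stack(sq_diff_list, axis=2)     # (H, W, disparity_range)
  # sum the squared differences over each block, independently per disparity
  sq_diff=tf.expand_dims(sq_diff, 0)         # (1, H, W, disparity_range)
  ones=tf.ones([block_match_size, block_match_size, disparity_range, 1])
  cost=tf.nn.depthwise_conv2d(sq_diff, ones, strides=[1,1,1,1], padding='SAME')
  return cost[0]                             # (H, W, disparity_range)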
The code below calls compute_cost_volume and finds the min cost at each pixel to estimate disparity. Check that the output looks sensible. What is the effect of block_match_size and disparity_range on the results?
cost_vol=compute_cost_volume(left_im,right_im,block_match_size=3,disparity_range=80)
block_est=tf.argmin(cost_vol,axis=2)
block_est=tf.to_float(block_est)
sess=tf.Session()
cost_vol_np,block_est_np=sess.run([cost_vol,block_est])
sess.close()
plt.subplot(1,1,1)
plt.imshow(grey_to_rgb(normalise_01(block_est_np)))
plt.axis('off')
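One simple way to probe these questions (a sketch; the parameter values are illustrative) is to re-run the matcher with a larger block size and compare the two disparity maps:
# sketch: compare a larger block size against the result above
cost_vol5=compute_cost_volume(left_im,right_im,block_match_size=5,disparity_range=80)
block_est5=tf.to_float(tf.argmin(cost_vol5,axis=2))
sess=tf.Session()
block_est5_np=sess.run(block_est5)
sess.close()
plt.subplot(2,1,1)
plt.imshow(grey_to_rgb(normalise_01(block_est_np)))
plt.axis('off')
plt.subplot(2,1,2)
plt.imshow(grey_to_rgb(normalise_01(block_est5_np)))
plt.axis('off')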
The code below uses the compute_cost_volume function in a data provider that creates batches by extracting patches from the training data. In subsequent experiments, you can adjust the block matching settings via the parameters of the StereoDataProvider class.
#@title
class StereoDataProvider:
"""
Provide data for stereo images
"""
def __init__(self):
self.params={}
self.params['patch_size']=64
self.params['num_patches_per_image']=100
self.params['block_match_size']=3
self.params['disparity_range']=80
self.params['ignore_top_rows']=100
self.params['batch_size']=5
self.params['shuffle_buffer_size']=1000
self.params['prefetch_buffer_size']=2
self.params['num_epochs']=10
self.params['height']=370
self.params['width']=1226
self.input_filenames=[]
self.target_filenames=[]
def get_batch(self):
# get a batch of patch data
dataset=tf.data.Dataset.from_tensor_slices((self.input_filenames,self.target_filenames))
dataset=dataset.repeat(self.params['num_epochs'])
dataset=dataset.map(self.read_images)
dataset=dataset.map(self.preprocess_images)
dataset=dataset.map(self.get_patches)
dataset=dataset.apply(tf.contrib.data.unbatch())
dataset=dataset.shuffle(buffer_size=self.params['shuffle_buffer_size'])
dataset=dataset.batch(self.params['batch_size'])
dataset=dataset.prefetch(self.params['prefetch_buffer_size'])
iterator=dataset.make_one_shot_iterator()
dataset_element=iterator.get_next()
return dataset_element
def get_preprocessed_images(self):
# get preprocessed images for inference
dataset=tf.data.Dataset.from_tensor_slices((self.input_filenames,self.target_filenames))
dataset=dataset.map(self.read_images)
dataset=dataset.map(self.preprocess_images)
dataset=dataset.batch(1)
dataset=dataset.prefetch(1)
iterator=dataset.make_one_shot_iterator()
dataset_element=iterator.get_next()
return dataset_element
def read_images(self,input_filenames,target_filename):
left_im=tf.image.decode_image(tf.read_file(input_filenames[0]))
right_im=tf.image.decode_image(tf.read_file(input_filenames[1]))
disp_im=tf.image.decode_image(tf.read_file(target_filename))
height=self.params['height']
width=self.params['width']
left_im=left_im[0:height,0:width,0]
right_im=right_im[0:height,0:width,0]
disp_im=disp_im[0:height,0:width,0]
# set shape explicitly (ambiguous due to image read)
left_im.set_shape((height,width))
right_im.set_shape((height,width))
disp_im.set_shape((height,width))
input_data=tf.to_float(tf.stack((left_im,right_im),2))/255.0 # images in range 0-1
target_disp=tf.to_float(tf.expand_dims(disp_im,2)) # disparity in pixels
return (input_data,target_disp)
def preprocess_images(self,input_data,target_disp):
# preprocess images to help depth learning
block_match_size=self.params['block_match_size']
disparity_range=self.params['disparity_range']
left_im=input_data[:,:,0]
right_im=input_data[:,:,1]
cost_volume=compute_cost_volume(left_im,right_im,block_match_size,disparity_range)
input_data=cost_volume
#input_data=tf.concat((cost_volume,input_data),axis=2)
return(input_data,target_disp)
def get_patches(self,input_data,target_disp):
# Extract patches from the input and targets
num_patches=self.params['num_patches_per_image']
patch_size=self.params['patch_size']
joined=tf.concat((input_data,target_disp),2)
joined_bands=joined.get_shape()[2] # static shape, inferred from graph
patch_list=[]
# don't select patches in first disparity range pixels
# or top rows of pixels where there is no disparity data
joined=joined[self.params['ignore_top_rows']:-1,self.params['disparity_range']:-1,:]
for _ in range(num_patches):
patch = tf.random_crop(joined,[patch_size,patch_size,joined_bands])
patch_list.append(patch)
patches=tf.stack(patch_list)
input_patches=patches[:,:,:,0:-1]
target_patches=patches[:,:,:,-1]
target_patches=tf.expand_dims(target_patches,3)
return (input_patches,target_patches)
# The following function returns a stereo data provider for the KITTI dataset
def get_kitti_2012_dataset(image_ids, data_dir):
  # return kitti 2012 dataset corresponding to image_ids
  input_filenames=[]
  target_filenames=[]
  for i in image_ids:
left_filename=data_dir+'/image_0/'+str(i).zfill(6)+'_10.png'
right_filename=data_dir+'/image_1/'+str(i).zfill(6)+'_10.png'
disp_filename=data_dir+'/disp_occ/'+str(i).zfill(6)+'_10.png'
input_filenames.append((left_filename,right_filename))
target_filenames.append(disp_filename)
data_prov=StereoDataProvider()
data_prov.input_filenames=input_filenames
data_prov.target_filenames=target_filenames
return data_prov
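As a usage sketch (the values here are illustrative), adjusting the block matching settings is just a matter of overriding entries in params before building the input pipeline:
# sketch: build a provider for a few images and override its settings
demo_data=get_kitti_2012_dataset(np.arange(0,10),'kitti_stereo2012/training')
demo_data.params['block_match_size']=5
demo_data.params['disparity_range']=64
demo_data.params['batch_size']=8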
We will now launch a Tensorboard so that we can monitor the progress of training. The code below provides a function launch_tensorboard that allows us to access Tensorboard from the Colab VM (this is from https://github.com/mixuala/colab_utils). Run this code and check that you can launch and connect to a Tensorboard (there will be nothing there yet).
#@title
# Copyright 2018 Michael Lin. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains code for adding common services to non-persistent `colaboratory` VM sessions
When training on `colaboratory` VMs it is often useful to monitor the session via
`tensorboard`. This script helps you launch tensorboard on the `colaboratory` VM and
uses `ngrok` to create a secure introspective tunnel to access tensorboard via public URL.
see: https://stackoverflow.com/questions/47818822/can-i-use-tensorboard-with-google-colab
************************************
* A simple working script *
************************************
```
import os
import colab_utils.tboard
# set paths
ROOT = %pwd
LOG_DIR = os.path.join(ROOT, 'log')
# will install `ngrok`, if necessary
# will create `log_dir` if path does not exist
tboard.launch_tensorboard( bin_dir=ROOT, log_dir=LOG_DIR )
```
"""
import os
import requests
import shutil
import subprocess
__all__ = [
'install_ngrok',
'launch_tensorboard',
]
def __shell__(cmd, split=True):
# get_ipython().system_raw(cmd)
result = get_ipython().getoutput(cmd, split=split)
if result and not split:
result = result.strip('\n')
return result
# tested OK
def install_ngrok(bin_dir="/tmp"):
""" download and install ngrok on local vm instance
Args:
bin_dir: full path for the target directory for the `ngrok` binary
"""
TARGET_DIR = bin_dir
CWD = os.getcwd()
is_grok_avail = os.path.isfile(os.path.join(TARGET_DIR,'ngrok'))
if is_grok_avail:
print("ngrok installed")
else:
import platform
plat = platform.platform() # 'Linux-4.4.64+-x86_64-with-Ubuntu-17.10-artful'
if 'x86_64' in plat:
os.chdir('/tmp')
print("calling wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip ..." )
get_ipython().system_raw( "wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip" )
print("calling unzip ngrok-stable-linux-amd64.zip ...")
get_ipython().system_raw( "unzip ngrok-stable-linux-amd64.zip" )
os.rename("ngrok", "{}/ngrok".format(TARGET_DIR))
os.remove("ngrok-stable-linux-amd64.zip")
is_grok_avail = os.path.isfile(os.path.join(TARGET_DIR,'ngrok'))
os.chdir(TARGET_DIR)
if is_grok_avail:
print("ngrok installed. path={}".format(os.path.join(TARGET_DIR,'ngrok')))
else:
# ValueError: ERROR: ngrok not found, path=
raise ValueError( "ERROR: ngrok not found, path=".format(TARGET_DIR) )
else:
raise NotImplementedError( "ERROR, ngrok install not configured for this platform, platform={}".format(plat))
os.chdir(CWD)
return
# tested OK
def launch_tensorboard(bin_dir="/tmp", log_dir="/tmp", retval=False):
"""returns a public tensorboard url based on the ngrok package
checks if `ngrok` is available, and installs, if necessary, to `bin_dir`
launches tensorboard, if necessary
see: https://stackoverflow.com/questions/47818822/can-i-use-tensorboard-with-google-colab
Args:
bin_dir: full path for the target directory for the `ngrok` binary
log_dir: full path for the tensorflow `log_dir`
Return:
public url for tensorboard if retval==True
NOTE: the method will print a link to stdout (cell output) for the tensorflow URL.
But the link printed from the return value has an extra "%27" in the URL which causes an error
"""
install_ngrok(bin_dir)
if not tf.gfile.Exists(log_dir): tf.gfile.MakeDirs(log_dir)
# check status of tensorboard and ngrok
ps = __shell__("ps -ax")
is_tensorboard_running = len([f for f in ps if "tensorboard" in f ]) > 0
is_ngrok_running = len([f for f in ps if "ngrok" in f ]) > 0
print("status: tensorboard={}, ngrok={}".format(is_tensorboard_running, is_ngrok_running))
if not is_tensorboard_running:
get_ipython().system_raw(
'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
.format(log_dir)
)
is_tensorboard_running = True
if not is_ngrok_running:
# grok should be installed in /tmp/ngrok
get_ipython().system_raw('{}/ngrok http 6006 &'.format(bin_dir))
is_ngrok_running = True
# get tensorboard url
# BUG: getting connection refused for HTTPConnectionPool(host='localhost', port=4040)
# on first run, retry works
import time
time.sleep(3)
  resp = requests.get('http://localhost:4040/api/tunnels')
  tensorboard_url = resp.json()['tunnels'][0]['public_url'].strip()
  print("tensorboard url=", tensorboard_url)
  if retval:
    return tensorboard_url
# launch tensorboard
root_dir=%pwd
log_dir=root_dir+'/stereo_model1'
launch_tensorboard(bin_dir=root_dir,log_dir=log_dir)
The following are some useful shell commands to check tensorboard status and remove/download training checkpoints as needed.
#!ps ax | grep tensorboard
#!ps ax | grep ngrok
#!ls {log_dir}
#!rm -r {log_dir}/*
#from google.colab import files
#!zip -r log_dir.zip {log_dir}
#files.download('log_dir.zip')
#uploaded = files.upload()
#!unzip log_dir.zip
#!mv content/* .
#!rmdir content
Now define a Tensorflow model that takes the block matching cost volume as input and filters it with a convolutional network to estimate depth. Your model should output a pred tensor containing disparity estimates; it should have the same shape as target_disp.
def stereo_model1(features,labels,mode):
input_data=features
target_disp=labels
# FORNOW: predict single value for depth
pred = tf.zeros_like(tf.expand_dims(input_data[:,:,:,0],3))
pred = tf.contrib.layers.bias_add(pred)
# TODO: set border width to exclude edge pixels that are invalid due to convolution
border_width=3
"""
*********************************************
*** TODO: define a model for depth estimation
*********************************************
Inputs: input_data=input features (B, H, W, NF)
target_disp=disparity ground truth (B, H, W, 1)
Outputs: pred=disparity predictions (B, H, W, 1)
"""
"""
*********************************************
"""
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode=mode, predictions=pred)
# don't compute loss where there is no ground truth,
# or over invalid part of convolution
valid=tf.to_float(tf.greater(target_disp,0.01))
N=border_width
if (N>1):
paddings=[[0,0],[N,N],[N,N],[0,0]]
border=tf.pad(tf.ones_like(target_disp[:,N:-N,N:-N,:]),paddings)
valid=valid*border
# block matching loss
cost_volume=input_data
block_estimate=tf.argmin(cost_volume,axis=3)
block_estimate=tf.to_float(tf.expand_dims(block_estimate,3))
# loss
loss = tf.losses.mean_squared_error(target_disp,pred,weights=valid)
if mode == tf.estimator.ModeKeys.TRAIN:
#optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
tf.summary.image("pred",pred)
tf.summary.image("target",target_disp)
tf.summary.image("block",block_estimate)
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
# if mode == tf.estimator.ModeKeys.EVAL
mse=tf.metrics.mean_squared_error(target_disp,pred,weights=valid)
block=tf.metrics.mean_squared_error(target_disp,block_estimate,weights=valid)
eval_metric_ops = {"mse": mse, "block": block}
tf.summary.scalar("mse", mse[1])
tf.summary.scalar("block", block[1])
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
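If you want a starting point, here is a minimal sketch of one possible model body; the layer widths and depth are illustrative assumptions, not a recommended architecture. It would replace the FORNOW placeholder inside the TODO region of stereo_model1. Three 3x3 'valid' convolutions make 3 pixels on each border invalid, which matches border_width=3 above; the final padding restores pred to the (B, H, W, 1) shape of target_disp.
# sketch: a small convolutional filter over the cost volume (illustrative sizes)
net=tf.layers.conv2d(input_data,32,3,padding='valid',activation=tf.nn.relu)
net=tf.layers.conv2d(net,32,3,padding='valid',activation=tf.nn.relu)
pred=tf.layers.conv2d(net,1,3,padding='valid',activation=None)
# pad the 3 invalid border pixels back so pred matches target_disp's shape
pred=tf.pad(pred,[[0,0],[3,3],[3,3],[0,0]])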
Now, train your model using the tf.estimator.train_and_evaluate function. A good baseline for comparison is the performance of the block matching estimate, reported as block in Tensorboard. Experiment with your model configuration, inputs and parameter settings to try to get good stereo estimates. Record your findings in the notebook below. Note: if you change the model, you will need to erase previous checkpoints using the shell commands above. Also, if your model is large, it will probably run faster if you set the runtime to GPU via the Runtime->Change Runtime Type menu.
data_dir="kitti_stereo2012/training"
num_data=194
num_eval=15
num_train=num_data-num_eval
train_ids=np.arange(0,num_train)
eval_ids=np.arange(num_train,num_train+num_eval)
train_data=get_kitti_2012_dataset(train_ids, data_dir)
eval_data=get_kitti_2012_dataset(eval_ids, data_dir)
# create model
stereo_regressor = tf.estimator.Estimator(
model_fn=stereo_model1, model_dir=log_dir)
# train model
train_spec = tf.estimator.TrainSpec(input_fn=train_data.get_batch, max_steps=100000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_data.get_batch, steps=1000, start_delay_secs=0, throttle_secs=120)
tf.estimator.train_and_evaluate(stereo_regressor, train_spec, eval_spec)
The code below performs inference using the model on full-size images from the eval set.
pred=stereo_regressor.predict(input_fn=eval_data.get_preprocessed_images)
num_show=4
plt.rcParams['figure.figsize'] = (16.0, num_show*4.0)
for i in range(num_show):
  img_pred=next(pred)
  plt.subplot(num_show,1,i+1)
plt.imshow(grey_to_rgb(normalise_01(img_pred)))
plt.axis('off')