In this project I trained a Fully Convolutional Network (FCN) to classify each pixel of an image as ROAD or NOT ROAD.
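Assume the network outputs two scores per pixel, one for NOT ROAD and one for ROAD. The sketch below, in plain Python rather than TensorFlow, shows one way to turn those scores into a ROAD mask: apply a two-class softmax, then a 0.5 threshold. The function names and the threshold are illustrative assumptions, not this project's code.

```python
import math
from typing import List, Sequence, Tuple

Logits = Tuple[float, float]  # (not_road, road) scores for one pixel (assumed layout)


def p_road(not_road: float, road: float) -> float:
    """Two-class softmax probability of ROAD, computed in a numerically stable way."""
    d = road - not_road
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def road_mask(logits: Sequence[Sequence[Logits]], threshold: float = 0.5) -> List[List[bool]]:
    """Label a pixel ROAD (True) when its softmax probability exceeds the threshold."""
    return [[p_road(nr, r) > threshold for nr, r in row] for row in logits]


# A 1x3 "image": confident road, confident not-road, and a tie (0.5 is not > 0.5).
print(road_mask([[(0.0, 2.0), (3.0, -1.0), (1.0, 1.0)]]))  # [[True, False, False]]
```

With two classes, the softmax reduces to a sigmoid of the score difference. That is why the helper only looks at road - not_road.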
I used the KITTI road dataset, available at https://www.cvlibs.net/datasets/kitti/eval_road.php
The dataset consists of 289 training and 290 test images. It contains three categories of road scenes (number of training/test images in parentheses):
- uu - urban unmarked (98/100)
- um - urban marked (95/96)
- umm - urban multiple marked lanes (96/94)
- urban - combination of the three above
Ground truth has been generated by manual annotation of the images and is available for two different road terrain types:
- road - the road area, i.e., the composition of all lanes, and
- lane - the ego-lane, i.e., the lane the vehicle is currently driving on (only available for category "um").
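Each ground-truth image therefore combines a scene category with a terrain type. The hypothetical helper below maps a training image to its ground-truth file. It assumes the KITTI road naming convention, which this README does not spell out: images are named like um_000000.png and ground truth like um_road_000000.png or um_lane_000000.png. Check this against the downloaded data before relying on it.

```python
import re

# Assumed KITTI road naming convention (not spelled out in this README):
#   image:        <category>_<index>.png            e.g. um_000000.png
#   ground truth: <category>_<terrain>_<index>.png  e.g. um_road_000000.png
IMAGE_NAME = re.compile(r"(uu|umm|um)_(\d+)\.png")


def ground_truth_name(image_name: str, terrain: str = "road") -> str:
    """Return the ground-truth filename for a training image (hypothetical helper)."""
    if terrain not in ("road", "lane"):
        raise ValueError(f"unknown terrain type: {terrain!r}")
    match = IMAGE_NAME.fullmatch(image_name)
    if match is None:
        raise ValueError(f"unexpected image name: {image_name!r}")
    category, index = match.groups()
    if terrain == "lane" and category != "um":
        # Ego-lane ground truth is only available for the "um" category.
        raise ValueError(f"no lane ground truth for category {category!r}")
    return f"{category}_{terrain}_{index}.png"


print(ground_truth_name("umm_000042.png"))         # umm_road_000042.png
print(ground_truth_name("um_000007.png", "lane"))  # um_lane_000007.png
```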
Ground truth is provided for training images only.
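For training, each annotated RGB image has to become a per-pixel label. A minimal sketch, assuming NOT ROAD pixels are drawn in pure red (255, 0, 0) and road in magenta. Every non-red colour is treated as ROAD here, which is a simplification. The colour values are an assumption about the KITTI annotations rather than something stated in this README.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]

# Assumption: NOT ROAD pixels in the ground-truth images are pure red.
# Every other colour is treated as ROAD, a simplification.
NOT_ROAD_COLOR: Pixel = (255, 0, 0)


def to_one_hot(gt_image: Sequence[Sequence[Pixel]]) -> List[List[Tuple[int, int]]]:
    """Convert an RGB ground-truth image to per-pixel (not_road, road) one-hot labels."""
    return [
        [(1, 0) if tuple(pixel) == NOT_ROAD_COLOR else (0, 1) for pixel in row]
        for row in gt_image
    ]


# 1x3 example: not-road (red), road (magenta), not-road (red).
print(to_one_hot([[(255, 0, 0), (255, 0, 255), (255, 0, 0)]]))
# [[(1, 0), (0, 1), (1, 0)]]
```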
The paper by Jannik Fritsch et al. that introduced the KITTI road benchmark can be found at https://www.cvlibs.net/publications/Fritsch2013ITSC.pdf
The FCN is based on the paper "Fully Convolutional Networks for Semantic Segmentation" by Jonathan Long et al.: https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
Following Long et al., the network uses the original VGG16 network as its encoder and replaces the fully connected layers with 1x1 convolutions applied to the outputs of layers 7, 4 and 3, adding skip connections between them.
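The skip connections only work if the upsampled feature maps have the same height and width as the layers they are added to. The sketch below traces those sizes for an FCN-8s-style decoder. It is plain Python, not the project's TensorFlow code. It assumes an input size of 160x576, two classes, and transposed convolutions with "same" padding and strides 2, 2 and 8. None of these values are stated in this README.

```python
from typing import Dict, Tuple

Size = Tuple[int, int]  # (height, width)


def downsample(size: Size, factor: int) -> Size:
    """Spatial size after VGG16 pooling with a total stride of `factor`."""
    h, w = size
    if h % factor or w % factor:
        raise ValueError(f"{size} is not divisible by {factor}")
    return h // factor, w // factor


def upsample(size: Size, stride: int) -> Size:
    """A transposed convolution with 'same' padding multiplies H and W by the stride."""
    return size[0] * stride, size[1] * stride


def fcn8_shapes(input_size: Size, num_classes: int = 2) -> Dict[str, Tuple[int, int, int]]:
    """Trace (H, W, channels) through an FCN-8s-style decoder (assumed strides)."""
    layer3 = downsample(input_size, 8)    # after pool3
    layer4 = downsample(input_size, 16)   # after pool4
    layer7 = downsample(input_size, 32)   # after pool5 and the convolutionalized fc6/fc7

    up_a = upsample(layer7, 2)            # must match layer 4 for the skip addition
    up_b = upsample(up_a, 2)              # must match layer 3 for the skip addition
    if up_a != layer4 or up_b != layer3:
        raise ValueError("skip connection shapes do not match")
    output = upsample(up_b, 8)            # back to the input resolution

    # The 1x1 convolutions keep H and W and reduce the depth to num_classes.
    return {
        "layer7_1x1": (*layer7, num_classes),
        "layer4_1x1": (*layer4, num_classes),
        "layer3_1x1": (*layer3, num_classes),
        "up2_plus_layer4": (*up_a, num_classes),
        "up2_plus_layer3": (*up_b, num_classes),
        "up8_output": (*output, num_classes),
    }


for name, shape in fcn8_shapes((160, 576)).items():
    print(f"{name:16} {shape}")
```

The divisibility check is why FCN inputs are usually resized to a multiple of 32. An odd size would make the skip additions fail.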
After several trials, the following hyperparameters gave good results:
- keep_prob: 0.5
- learning_rate: 0.0005
- epochs: 30
- batch_size: 8
With these settings the loss decreased steadily, ending between 0.0200 and 0.0300 in the 30th epoch.
...
- loss 0.0242 (images: 8, labels: 8)
- loss 0.0289 (images: 8, labels: 8)
- loss 0.0181 (images: 1, labels: 1)
Running epoch 30/100
...
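The keep_prob hyperparameter listed above is the probability of keeping each activation in a dropout layer during training. The README does not say where it is applied, but presumably it feeds the dropout layers in the VGG16 encoder. The sketch below shows inverted dropout in plain Python. Kept values are scaled by 1/keep_prob, as TensorFlow 1.x's tf.nn.dropout does, so nothing needs rescaling at inference time. The function itself is illustrative.

```python
import random
from typing import List, Optional, Sequence


def dropout(values: Sequence[float], keep_prob: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: keep each value with probability keep_prob and scale
    the kept values by 1 / keep_prob so the expected activation is unchanged."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training or keep_prob == 1.0:
        return list(values)  # inference: no units are dropped
    rng = rng or random.Random()
    scale = 1.0 / keep_prob
    return [v * scale if rng.random() < keep_prob else 0.0 for v in values]


out = dropout([1.0] * 10, keep_prob=0.5, rng=random.Random(0))
print(out)  # each entry is either 0.0 or 2.0
```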
I trained the final model for 100 epochs in batches of 8 images. Training took 50 minutes on a GTX 1080 and reached a final loss of about 0.0100.
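With 289 training images and a batch size of 8, each epoch has 36 full batches plus a final batch with a single image. That matches the "images: 1, labels: 1" line in the log excerpt. The 50-minute, 100-epoch run works out to about 30 seconds per epoch. The helpers below are a hypothetical sketch of that batching and arithmetic. The per-epoch shuffling a training loop would normally do is omitted.

```python
import math
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last batch holds whatever is left over."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def epoch_summary(num_images: int, batch_size: int, epochs: int,
                  total_minutes: float) -> Tuple[int, int, float]:
    """Return (batches per epoch, size of the last batch, seconds per epoch)."""
    num_batches = math.ceil(num_images / batch_size)
    last_batch = num_images - (num_batches - 1) * batch_size
    return num_batches, last_batch, total_minutes * 60 / epochs


sizes = [len(b) for b in batches(range(289), 8)]
print(len(sizes), sizes[-1])           # 37 1
print(epoch_summary(289, 8, 100, 50))  # (37, 1, 30.0)
```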
Saving the final network produced the following TensorFlow model files:
Size | File |
---|---|
513M | model_01.pb |
513M | model_01.meta |
513M | model_01.ckpt.meta |
4.8K | model_01.ckpt.index |
1.6G | model_01.ckpt.data-00000-of-00001 |
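A parameter count roughly accounts for the file sizes above, under several assumptions. Suppose the saved graph holds VGG16's 13 convolutional layers plus fc6 and fc7 converted to convolutions, without the 1000-class fc8 layer, with weights stored as 32-bit floats. Then the encoder has about 134 million parameters, or about 512 MiB, close to the 513M files. The README does not name the optimizer. If it was Adam, the checkpoint would also store two moment estimates per weight, roughly tripling the size to about 1.5 GiB, close to the 1.6G data file. The decoder's 1x1 and transposed convolutions add comparatively few parameters and are ignored here.

```python
from typing import List, Tuple

# (kernel_h, kernel_w, in_channels, out_channels) for the VGG16 layers, with
# fc6/fc7 expressed as convolutions and the 1000-class fc8 layer omitted.
# Whether the saved graph matches this exactly is an assumption.
VGG16_ENCODER: List[Tuple[int, int, int, int]] = [
    (3, 3, 3, 64), (3, 3, 64, 64),                          # block 1
    (3, 3, 64, 128), (3, 3, 128, 128),                      # block 2
    (3, 3, 128, 256), (3, 3, 256, 256), (3, 3, 256, 256),   # block 3
    (3, 3, 256, 512), (3, 3, 512, 512), (3, 3, 512, 512),   # block 4
    (3, 3, 512, 512), (3, 3, 512, 512), (3, 3, 512, 512),   # block 5
    (7, 7, 512, 4096),                                      # fc6 as a 7x7 conv
    (1, 1, 4096, 4096),                                     # fc7 as a 1x1 conv
]


def num_params(layers: List[Tuple[int, int, int, int]]) -> int:
    """Weights plus biases for a stack of convolutions."""
    return sum(kh * kw * cin * cout + cout for kh, kw, cin, cout in layers)


def size_mib(params: int, copies: int = 1, bytes_per_param: int = 4) -> float:
    """Storage in MiB for `copies` float32 tensors of `params` values each."""
    return params * bytes_per_param * copies / 2**20


params = num_params(VGG16_ENCODER)
print(params)                         # 134260544
print(round(size_mib(params), 1))     # 512.2  (weights only)
print(round(size_mib(params, 3), 1))  # 1536.5 (weights plus two Adam moment slots, if Adam)
```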
A few examples from the best model run:
I ran the final model on some videos from my dashcam. The results are remarkably good on portions of the route whose characteristics are similar to the KITTI dataset.
Considering that none of these frames were used for training and that the footage is completely different from the training data, the predictions look good:
Complete videos:
When building the initial model, I did not set the kernel_initializer parameter of the layers, so they used the default initializer. This caused the model to produce segmentations with noisy borders, compared here with a truncated normal initializer:
kernel_initializer with default values | kernel_initializer with truncated normal values |
---|---|
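A truncated normal initializer draws weights from a normal distribution but discards and re-draws any value more than two standard deviations from the mean. TensorFlow's truncated normal ops behave this way. The sketch below is plain Python, and stddev=0.01 is an assumed value because the README does not give the one used. In TensorFlow 1.x the equivalent would be passing kernel_initializer=tf.truncated_normal_initializer(stddev=...) to layer constructors such as tf.layers.conv2d.

```python
import random
from typing import List, Optional


def truncated_normal(n: int, mean: float = 0.0, stddev: float = 0.01,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Draw n samples from N(mean, stddev^2), re-drawing any sample that falls
    more than two standard deviations from the mean."""
    if stddev <= 0:
        raise ValueError("stddev must be positive")
    rng = rng or random.Random()
    samples: List[float] = []
    while len(samples) < n:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            samples.append(x)
    return samples


weights = truncated_normal(5, stddev=0.01, rng=random.Random(42))
print(all(abs(w) <= 0.02 for w in weights))  # True
```

Bounding the initial weights keeps any single filter from starting with an extreme value. This is one plausible reason the borders came out cleaner, though the README does not analyse the cause.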