In [1]:
from features import *
from detection import *
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import glob
import csv
import pandas as pd
import sklearn.utils
import pickle
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
%matplotlib inline
%load_ext autoreload
%autoreload 2
Vehicle Detection Project
The goals / steps of this project are the following:
- Extract binned spatial, color histogram and HOG features from labeled car and non-car images.
- Train a classifier to distinguish car from non-car images.
- Search camera frames for vehicles with a sliding window technique.
- Build a heat map of the detections to merge duplicates and reject false positives.
- Run the pipeline on a video stream and draw bounding boxes around detected vehicles.
All the code for the feature extraction part of the car detection can be found in features.py.
Three main features were extracted:
The spatial binning features provide information about the spatial location of the pixels in the image, the color histogram features give insight into the color distribution of the image, and the HOG features capture the edges and shape of the cars. Combined, these three features help the classifier distinguish car from non-car images.
A sample car and non-car image, along with their three feature sets, are compared below.
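These extractors live in features.py, which is not reproduced in this notebook. The following is a minimal sketch of what the three functions plausibly look like, relying on the notebook's imports; the skimage-based HOG wrapper and the exact signatures are assumptions inferred from how the functions are called below:
def bin_spatial(img, size=(16, 16)):
    # Downsample the image and flatten it into a 1-D feature vector.
    return cv2.resize(img, size).ravel()

def color_hist(img, nbins=16, bins_range=(0, 256)):
    # Histogram each color channel separately, then concatenate.
    hists = [np.histogram(img[:, :, ch], bins=nbins, range=bins_range)[0]
             for ch in range(3)]
    return np.concatenate(hists)

def get_hog_features(channel, orient, pix_per_cell=8, cell_per_block=2,
                     vis=False, feature_vec=True):
    # Thin wrapper around skimage.feature.hog; returns (features, hog_image)
    # when vis=True, otherwise just the feature vector.
    # Note: older skimage versions spell the flag 'visualise'.
    from skimage.feature import hog
    return hog(channel, orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               visualize=vis, feature_vector=feature_vec)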
In [22]:
img = mpimg.imread('./vehicles/GTI_MiddleClose/image0190.png', format=np.uint8)
noncar_img = mpimg.imread('./non-vehicles/GTI/image119.png', format=np.uint8)
cvt_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
cvt_noncar_img = cv2.cvtColor(noncar_img, cv2.COLOR_RGB2HSV)
# HOG
orientation = 9
px_per_cell = 8
cell_per_block = 2
# Color histogram
nbins = 16
bins_range = (0, 256)
spatial_size = (16, 16)
bin_features = bin_spatial(cvt_img, size=spatial_size)
color_features = color_hist(cvt_img, nbins=nbins, bins_range=bins_range)
hog_features, hog_img = get_hog_features(cvt_img[:, :, 0], orientation, pix_per_cell=px_per_cell, cell_per_block=cell_per_block,
vis=True, feature_vec=True)
print("Binned color features shape", bin_features.shape)
print("Color histogram features shape", color_features.shape)
print("HOG features shape", hog_features.shape)
bin_noncar_features = bin_spatial(cvt_noncar_img, size=spatial_size)
color_noncar_features = color_hist(cvt_noncar_img, nbins=nbins, bins_range=bins_range)  # same bin count as the car image for a fair comparison
noncar_hog_features, noncar_hog_img = get_hog_features(cvt_noncar_img[:, :, 0], orientation, pix_per_cell=px_per_cell, cell_per_block=cell_per_block,
vis=True, feature_vec=True)
f, axarr = plt.subplots(2, 4, figsize=(24, 9))
axarr[0][0].imshow(img)
axarr[0][0].set_title("Car")
axarr[1][0].imshow(noncar_img)
axarr[1][0].set_title("Non car")
axarr[0][1].imshow(hog_img, cmap='gray')
axarr[0][1].set_title("Car HOG")
axarr[0][2].plot(bin_features)
axarr[0][2].set_title("Car binned color features")
axarr[0][3].plot(color_features)
axarr[0][3].set_title("Car color histogram")
axarr[1][1].imshow(noncar_hog_img, cmap='gray')
axarr[1][1].set_title("Non car HOG")
axarr[1][2].plot(bin_noncar_features)
axarr[1][2].set_title("Non car binned color features")
axarr[1][3].plot(color_noncar_features)
axarr[1][3].set_title("Non car color histogram")
Out[22]:
After experimenting with various combinations of values, the YCrCb color space was chosen, along with HOG parameters of orientations=9, pixels_per_cell=(8, 8) and cells_per_block=(2, 2).
Not only did this combination yield good classification results for cars, it also keeps the feature vector reasonably compact, at 2580 elements; higher dimensionality makes training the classifier much slower.
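As a quick sanity check, the 2580 figure follows directly from the parameters above, assuming the dataset's standard 64x64 training images:
spatial = 16 * 16 * 3              # binned color features: 768
hist = 16 * 3                      # 16-bin histogram per channel: 48
cells = 64 // 8                    # 8 cells per image side
blocks = cells - 2 + 1             # 7x7 block positions with cells_per_block=2
hog = blocks * blocks * 2 * 2 * 9  # 36 values per block with 9 orientations: 1764
print(spatial + hist + hog)        # 768 + 48 + 1764 = 2580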
In [ ]:
def write_csv(data_folder, dst_folder):
    train_files = list(glob.glob(data_folder + '/*/*.png'))
    labels = [path.split('/')[1] for path in train_files]
    imgtype = [path.split('/')[2] for path in train_files]
    image_names = [path.split('/')[-1] for path in train_files]
    # save csv
    dst_file = data_folder.split('/')[-1] + ".csv"
    csv_file = dst_folder + "/" + dst_file
    with open(csv_file, "w") as f:
        writer = csv.writer(f)
        writer.writerow(['image_path', 'image_name', 'label', 'image_type'])
        for index in range(len(train_files)):
            writer.writerow([train_files[index], image_names[index], labels[index], imgtype[index]])
    print("Saved:", csv_file)
In [ ]:
data_vehicles_folder = "./vehicles"
data_non_vehicles_folder = "./non-vehicles"
dst_folder = "./"
write_csv(data_vehicles_folder, dst_folder)
write_csv(data_non_vehicles_folder, dst_folder)
After writing the CSV files, the data is loaded using pandas. There is an approximately equal number of positive and negative samples (car vs. non-car images), and the dataset contains car images from different angles. The dataset images are explored further below.
In [4]:
vehicles = pd.read_csv('./vehicles.csv')
nonvehicles = pd.read_csv('./non-vehicles.csv')
print("Number of vehicle images:", vehicles.shape[0])
print("Nubmer of non-vehicle images:", nonvehicles.shape[0])
data = pd.concat((vehicles, nonvehicles), ignore_index=True)
#data = sklearn.utils.shuffle(data)
data.head()
Out[4]:
In [5]:
classes = data['label'].unique()
num_classes = len(classes)
types = vehicles['image_type'].unique()
num_types = len(types)
img_per_row = 5
plt.figure(figsize=(10, 4))
for i, img_type in enumerate(types):
    plt.subplot(num_classes, img_per_row, i + 1)
    imgpath = vehicles.loc[(vehicles['image_type'] == img_type) & (vehicles['label'] == 'vehicles')].sample(1)['image_path']
    img = mpimg.imread(imgpath.values[0])
    plt.imshow(img)
    plt.title("{}".format(img_type), fontsize=10)
    plt.axis('off')
for i in range(5, 10):
    plt.subplot(num_classes, img_per_row, i + 1)
    imgpath = data.loc[(data['label'] == 'non-vehicles')].sample(1)['image_path']
    img = mpimg.imread(imgpath.values[0])
    plt.imshow(img)
    plt.title("Non vehicle", fontsize=10)
    plt.axis('off')
In [6]:
color_space = 'YCrCb' # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 9 # HOG orientations
pix_per_cell = 8 # HOG pixels per cell
cell_per_block = 2 # HOG cells per block
hog_channel = 0 # Can be 0, 1, 2, or "ALL"
spatial_size = (16, 16) # Spatial binning dimensions
hist_bins = 16 # Number of histogram bins
spatial_feat = True # Spatial features on or off
hist_feat = True # Histogram features on or off
hog_feat = True # HOG features on or off
car_features = extract_features(vehicles['image_path'], color_space=color_space,
spatial_size=spatial_size, hist_bins=hist_bins,
orient=orient, pix_per_cell=pix_per_cell,
cell_per_block=cell_per_block,
hog_channel=hog_channel, spatial_feat=spatial_feat,
hist_feat=hist_feat, hog_feat=hog_feat)
notcar_features = extract_features(nonvehicles['image_path'], color_space=color_space,
spatial_size=spatial_size, hist_bins=hist_bins,
orient=orient, pix_per_cell=pix_per_cell,
cell_per_block=cell_per_block,
hog_channel=hog_channel, spatial_feat=spatial_feat,
hist_feat=hist_feat, hog_feat=hog_feat)
#labels = np.array(data['label'])
labels = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))
features = np.vstack((car_features, notcar_features)).astype(np.float64)
# Fit a per-column scaler
X_scaler = StandardScaler().fit(features)
# Apply the scaler to X
features_scaled = X_scaler.transform(features)
PCA is performed to reduce the dimensionality of the data, mitigating the high dimensionality that slows down training significantly. 25 components were kept, since the explained variance plateaus after 25 principal components, as seen below.
In [7]:
%%time
from sklearn.decomposition import PCA
n_components = 25
pca = PCA(n_components=n_components, whiten=True)
pca = pca.fit(features_scaled)
pca_features = pca.transform(features_scaled)
explained_variance = pca.explained_variance_ratio_
components = pca.components_
print("Total explained variance by {} principal components: {:.4f}".format(n_components, sum(explained_variance[:n_components])))
# plot pca
plt.xlabel('Dimension')
plt.ylabel('Explained Variance')
plt.title("Explained Variances of PCA")
_ = plt.plot(pca.explained_variance_ratio_)
In [8]:
X_train, X_test, y_train, y_test = train_test_split(pca_features, labels, test_size=0.20, random_state=42)
print("Training data:", X_train.shape)
print("Testing data:", X_test.shape)
In [9]:
data_dict = {}
data_dict['features_scaled'] = features_scaled
data_dict['labels'] = labels
data_dict['spatial_feat'] = spatial_feat
data_dict['hist_feat'] = hist_feat
data_dict['hog_feat'] = hog_feat
data_dict['orient'] = orient
data_dict['color_space'] = color_space
data_dict['pix_per_cell'] = pix_per_cell
data_dict['cell_per_block'] = cell_per_block
data_dict['hog_channel'] = hog_channel
data_dict['spatial_size'] = spatial_size
data_dict['hist_bins'] = hist_bins
data_dict['pca'] = pca
data_dict['scaler'] = X_scaler
save_file = './pca_data.pkl'
with open(save_file, 'wb') as f:
    pickle.dump(data_dict, f)
In [10]:
%%time
from sklearn.svm import SVC
clf_svc = SVC(kernel='rbf', class_weight='balanced', probability=True, C=10, gamma=0.1)
clf_svc.fit(X_train, y_train)
print(clf_svc)
In [11]:
import joblib
fn = './vehicle_vs_non_vehicle_model_svcrbf_pca.pkl'
joblib.dump(clf_svc, fn)
print('model saved')
In [12]:
%%time
acc = clf_svc.score(X_test, y_test)
print('Accuracy on test data is {}'.format(acc))
#predictions = clf_svc.predict(X_test)
We use a sliding window search to find cars in the camera image. Different regions are searched depending on whether a car would appear in the close, mid, or far range of the camera's perspective. The search space is visualized below in green bounding boxes, with the detected car in blue.
Cars between pixels 400 and 500 in the y direction appear smaller since they are further from the camera, so this area is searched with a smaller window size. In the close and mid ranges, the window size is increased, along with the x and y start positions, to maximize the range of the search.
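slide_window and search_windows come from detection.py, which is not shown in this notebook. Below is a minimal sketch of both, relying on the notebook's imports; the signatures, the 64x64 patch size, and the single_img_features helper are assumptions, simplified from the calls that follow:
def slide_window(img, x_start_stop=(None, None), y_start_stop=(None, None),
                 xy_window=(64, 64), xy_overlap=(0.5, 0.5)):
    # Enumerate (top-left, bottom-right) corners of overlapping windows.
    x0, x1 = x_start_stop[0] or 0, x_start_stop[1] or img.shape[1]
    y0, y1 = y_start_stop[0] or 0, y_start_stop[1] or img.shape[0]
    # Step size in pixels, derived from the overlap fraction.
    x_step = int(xy_window[0] * (1 - xy_overlap[0]))
    y_step = int(xy_window[1] * (1 - xy_overlap[1]))
    windows = []
    for y in range(y0, y1 - xy_window[1] + 1, y_step):
        for x in range(x0, x1 - xy_window[0] + 1, x_step):
            windows.append(((x, y), (x + xy_window[0], y + xy_window[1])))
    return windows

def search_windows(img, windows, clf, scaler, pca_feat=False, pca=None, **feat_params):
    # Classify each window patch; keep the windows predicted as cars.
    hot_windows = []
    for ((x0, y0), (x1, y1)) in windows:
        patch = cv2.resize(img[y0:y1, x0:x1], (64, 64))    # match training size
        feats = single_img_features(patch, **feat_params)  # assumed features.py helper
        X = scaler.transform(np.array(feats).reshape(1, -1))
        if pca_feat:
            X = pca.transform(X)
        if clf.predict(X)[0] == 1:
            hot_windows.append(((x0, y0), (x1, y1)))
    return hot_windows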
In [233]:
img = mpimg.imread('./test_images/test1.jpg', format=np.uint8)
color_space = 'YCrCb' # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 9 # HOG orientations
pix_per_cell = 8 # HOG pixels per cell
cell_per_block = 2 # HOG cells per block
hog_channel = 0 # Can be 0, 1, 2, or "ALL"
spatial_size = (16, 16) # Spatial binning dimensions
hist_bins = 16 # Number of histogram bins
spatial_feat = True # Spatial features on or off
hist_feat = True # Histogram features on or off
hog_feat = True # HOG features on or off
xy_overlap = (0.75, 0.75)
# Window size, y range and x range to search for each depth band.
search_dict = {
    'far': [[[50, 50]], [[400, 500]], [[400, 1000]]],
    'mid': [[[96, 96]], [[400, 500]], [[None, None]]],
    'close': [[[128, 128]], [[400, 550]], [[None, None]]],
}
windows = []
for item, param in search_dict.items():
    for window_size, y_start_stop, x_start_stop in zip(param[0], param[1], param[2]):
        windows_to_search = slide_window(img, x_start_stop=x_start_stop, y_start_stop=y_start_stop,
                                         xy_window=window_size, xy_overlap=xy_overlap)
        windows += windows_to_search
# Reuse the X_scaler already fitted on the training features above.
hot_windows = search_windows(img, windows, clf_svc, X_scaler, color_space=color_space,
spatial_size=spatial_size, hist_bins=hist_bins,
orient=orient, pix_per_cell=pix_per_cell,
cell_per_block=cell_per_block,
hog_channel=hog_channel, spatial_feat=spatial_feat,
hist_feat=hist_feat, hog_feat=hog_feat, pca_feat=True, pca=pca)
draw_image = img.copy()
draw_image = draw_boxes(draw_image, windows, color=(0, 255, 0), thick=6)
window_img = draw_boxes(draw_image, hot_windows, color=(0, 0, 255), thick=6)
plt.imshow(window_img)
Out[233]:
To merge multiple detections, a heat map accumulates the regions where a car was detected, and a threshold keeps only the regions that were detected multiple times. This both collapses overlapping detections into one and removes false positives.
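add_heat, apply_threshold, and draw_labeled_bboxes also live in detection.py. As a sketch of what they plausibly do, assuming the label function used below comes from scipy.ndimage:
def add_heat(heatmap, bbox_list):
    # Each positive window votes +1 over the pixels it covers.
    for ((x0, y0), (x1, y1)) in bbox_list:
        heatmap[y0:y1, x0:x1] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    # Keep only pixels covered by more than `threshold` detections.
    heatmap[heatmap <= threshold] = 0
    return heatmap

def draw_labeled_bboxes(img, labels):
    # labels = (label_array, num_cars), as returned by scipy.ndimage.label.
    for car_number in range(1, labels[1] + 1):
        ys, xs = np.where(labels[0] == car_number)
        cv2.rectangle(img, (int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max())), (0, 0, 255), 6)
    return img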
In [234]:
heat = np.zeros_like(img[:, :, 0]).astype(np.float64)
# Add heat to each box in box list
heat = add_heat(heat, hot_windows)
# Apply threshold to help remove false positives
heat = apply_threshold(heat, 1)
# Visualize the heatmap when displaying
heatmap = np.clip(heat, 0, 255)
# Find final boxes from heatmap using label function
labels = label(heatmap)
draw_img = draw_labeled_bboxes(np.copy(img), labels)
fig = plt.figure()
plt.subplot(121)
plt.imshow(draw_img)
plt.title('Car Positions')
plt.subplot(122)
plt.imshow(heatmap, cmap='hot')
plt.title('Heat Map')
fig.tight_layout()
In [241]:
img = mpimg.imread('./test_images/test1.jpg', format=np.uint8)
color_space = 'YCrCb' # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 9 # HOG orientations
pix_per_cell = 8 # HOG pixels per cell
cell_per_block = 2 # HOG cells per block
hog_channel = 0 # Can be 0, 1, 2, or "ALL"
spatial_size = (16, 16) # Spatial binning dimensions
hist_bins = 16 # Number of histogram bins
spatial_feat = True # Spatial features on or off
hist_feat = True # Histogram features on or off
hog_feat = True # HOG features on or off
heat_thresh = 1
car_detector = CarDetector(clf_svc, pca, X_scaler, color_space=color_space, orient=orient,
pix_per_cell=pix_per_cell, cell_per_block=cell_per_block, hog_channel=hog_channel,
spatial_size=spatial_size, hist_bins=hist_bins, spatial_feat=spatial_feat,
hist_feat=hist_feat, hog_feat=hog_feat, heat_thresh=heat_thresh)
# Run the pipeline on the test images
images = glob.glob('./test_images/*.jpg')
for fname in images:
    img = mpimg.imread(fname)
    output_image = car_detector.pipeline(img)
    mpimg.imsave('output_images/output_{}'.format(fname.split("/")[-1]), output_image)
f, axarr = plt.subplots(2, 3, figsize=(24, 9))
f.tight_layout()
output_images = glob.glob('./output_images/*.jpg')
out_imgs = [mpimg.imread(img) for img in output_images]
for i in range(2):
    for j in range(3):
        axarr[i, j].imshow(out_imgs[i * 3 + j])
        axarr[i, j].axis('off')
plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
In [235]:
from moviepy.editor import VideoFileClip
car_detector = CarDetector(clf_svc, pca, X_scaler)
project_output = 'project_output.mp4'
clip = VideoFileClip('./project_video.mp4')
outclip = clip.fl_image(car_detector.pipeline)
%time outclip.write_videofile(project_output, audio=False)
In [236]:
from IPython.display import HTML
HTML("""
<video width="960" height="540" controls>
<source src="{0}">
</video>
""".format("project_output.mp4"))
Out[236]:
For the detection of vehicles, three main features were used: spatially binned colors, color histograms, and histograms of oriented gradients (HOG). Combined, these features capture the colors, the spatial layout of pixels, and the edges of a car, providing strong evidence of whether a car is present. Together they form a feature vector of 2580 dimensions.
This led to one of the main challenges I faced: the curse of dimensionality. I first used all of these dimensions with a LinearSVC(), but it did not yield great results, so I turned to an SVM and tried different kernels. However, training took hours because of the dimensionality of the data and the number of samples (> 10,000). To resolve this, I used PCA and kept only the 25 components that explained the majority of the variance in the data. This significantly reduced both the training time and the dimensionality while still giving great results (over 99% accuracy on the test set).
After selecting the features and tuning the model, I searched the car's camera images for vehicles using a sliding window technique. The difficult part here is accounting for obstacles, partially obstructed cars, and the varying apparent sizes of cars. To deal with this, I searched regions where cars appear smaller with a smaller window size, and used larger windows where cars appear larger.
For future improvements, I need a method to separate two cars when they overlap or obstruct one another; as of now they count as a single detection with one large bounding box covering both. I also have issues fully detecting cars when they are obstructed. This could be addressed by training on a larger dataset and augmenting the data to account for environmental effects on the cars' appearance.