Object Detection

In this tutorial, you will learn:

  • the basic structure of Faster R-CNN;
  • how to perform inference with an MMDetection detector;
  • how to train a new detector on a customized dataset.

Let's start!

Install MMDetection


In [1]:
# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version


nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


In [2]:
# install dependencies (use cu111 because Colab has CUDA 11.1)
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# install mmcv-full so that we can use its CUDA operators
!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html

# Install mmdetection
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection

!pip install -e .


Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.9.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.9.0%2Bcu111-cp37-cp37m-linux_x86_64.whl (2041.3 MB)
     |████████████████████████████████| 2041.3 MB 7.2 kB/s 
Collecting torchvision==0.10.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torchvision-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl (23.2 MB)
     |████████████████████████████████| 23.2 MB 13.8 MB/s 
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.9.0+cu111) (3.10.0.2)
Requirement already satisfied: pillow>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.10.0+cu111) (7.1.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchvision==0.10.0+cu111) (1.19.5)
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 1.10.0+cu111
    Uninstalling torch-1.10.0+cu111:
      Successfully uninstalled torch-1.10.0+cu111
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.11.1+cu111
    Uninstalling torchvision-0.11.1+cu111:
      Successfully uninstalled torchvision-0.11.1+cu111
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.11.0 requires torch==1.10.0, but you have torch 1.9.0+cu111 which is incompatible.
torchaudio 0.10.0+cu111 requires torch==1.10.0, but you have torch 1.9.0+cu111 which is incompatible.
Successfully installed torch-1.9.0+cu111 torchvision-0.10.0+cu111
Looking in links: https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
Collecting mmcv-full
  Downloading https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/mmcv_full-1.4.4-cp37-cp37m-manylinux1_x86_64.whl (67.3 MB)
     |████████████████████████████████| 67.3 MB 1.3 MB/s 
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (21.3)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (1.19.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (3.13)
Collecting addict
  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)
Requirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (4.1.2.30)
Collecting yapf
  Downloading yapf-0.32.0-py2.py3-none-any.whl (190 kB)
     |████████████████████████████████| 190 kB 5.1 MB/s 
Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from mmcv-full) (7.1.2)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->mmcv-full) (3.0.7)
Installing collected packages: yapf, addict, mmcv-full
Successfully installed addict-2.4.0 mmcv-full-1.4.4 yapf-0.32.0
Cloning into 'mmdetection'...
remote: Enumerating objects: 22983, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 22983 (delta 4), reused 17 (delta 2), pack-reused 22958
Receiving objects: 100% (22983/22983), 25.79 MiB | 34.48 MiB/s, done.
Resolving deltas: 100% (16102/16102), done.
/content/mmdetection
Obtaining file:///content/mmdetection
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (3.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (1.19.5)
Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (2.0.4)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from mmdet==2.21.0) (1.15.0)
Collecting terminaltables
  Downloading terminaltables-3.1.10-py2.py3-none-any.whl (15 kB)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (1.3.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (3.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (0.11.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->mmdet==2.21.0) (2.8.2)
Installing collected packages: terminaltables, mmdet
  Running setup.py develop for mmdet
Successfully installed mmdet-2.21.0 terminaltables-3.1.10

In [3]:
from mmcv import collect_env
collect_env()


Out[3]:
{'CUDA available': True,
 'CUDA_HOME': '/usr/local/cuda',
 'GCC': 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0',
 'GPU 0': 'Tesla T4',
 'MMCV': '1.4.4',
 'MMCV CUDA Compiler': '11.1',
 'MMCV Compiler': 'GCC 7.3',
 'NVCC': 'Build cuda_11.1.TC455_06.29190527_0',
 'OpenCV': '4.1.2',
 'PyTorch': '1.9.0+cu111',
 'PyTorch compiling details': 'PyTorch built with:\n  - GCC 7.3\n  - C++ Version: 201402\n  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - NNPACK is enabled\n  - CPU capability usage: AVX2\n  - CUDA Runtime 11.1\n  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n  - CuDNN 8.0.5\n  - Magma 2.5.2\n  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n',
 'Python': '3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]',
 'TorchVision': '0.10.0+cu111',
 'sys.platform': 'linux'}

In [5]:
# Check PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMDetection installation
import mmdet
print(mmdet.__version__)

# Check mmcv installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())


1.9.0+cu111 True
2.21.0
11.1
GCC 7.3

Perform Inference with an MMDetection Detector

A two-stage detector

In this tutorial, we use Faster R-CNN, a simple two-stage detector, as an example.

The high-level architecture of Faster R-CNN is shown in the following picture. More details can be found in the paper.

Briefly, it uses a convolutional neural network (CNN) as its backbone to extract features from an image. Then, it uses a region proposal network (RPN) to predict proposals, i.e., potential objects. After that, it uses a feature extractor to crop features for the regions of interest (RoIs), and an RoI head performs classification and bounding box refinement.
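
To make this four-step flow concrete, here is a toy sketch in plain PyTorch. It is NOT MMDetection code: the modules and the hand-written proposal are illustrative stand-ins for the real backbone, RPN, RoI extractor, and RoI head.

import torch
from torch import nn
from torchvision.ops import roi_align

backbone = nn.Conv2d(3, 8, 3, padding=1)                # 1) a stand-in "CNN backbone"
img = torch.rand(1, 3, 64, 64)
feats = backbone(img)                                   #    extract features from the image
proposals = torch.tensor([[0., 0., 0., 32., 32.]])      # 2) the RPN would predict boxes like
                                                        #    this: (batch_idx, x1, y1, x2, y2)
roi_feats = roi_align(feats, proposals, output_size=7)  # 3) crop features for each RoI
roi_head = nn.Linear(8 * 7 * 7, 3 + 1 + 4)              # 4) classify (+ background) and
out = roi_head(roi_feats.flatten(1))                    #    regress box offsets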


In [6]:
# We download the pre-trained checkpoint for inference and finetuning.
!mkdir -p checkpoints
!wget -c https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth \
      -O checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth


--2022-02-08 11:29:13--  https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth
Resolving download.openmmlab.com (download.openmmlab.com)... 47.252.96.28
Connecting to download.openmmlab.com (download.openmmlab.com)|47.252.96.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 167291982 (160M) [application/octet-stream]
Saving to: ‘checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth’

checkpoints/faster_ 100%[===================>] 159.54M  7.92MB/s    in 22s     

2022-02-08 11:29:37 (7.28 MB/s) - ‘checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth’ saved [167291982/167291982]


In [7]:
import mmcv
from mmcv.runner import load_checkpoint

from mmdet.apis import inference_detector, show_result_pyplot
from mmdet.models import build_detector

# Choose a config to initialize the detector
config = 'configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco.py'
# Set up the checkpoint file to load
checkpoint = 'checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth'

# Set the device to be used for evaluation
device='cuda:0'

# Load the config
config = mmcv.Config.fromfile(config)
# Set pretrained to None since we load a trained checkpoint below
config.model.pretrained = None

# Initialize the detector
model = build_detector(config.model)

# Load checkpoint
checkpoint = load_checkpoint(model, checkpoint, map_location=device)

# Set the classes of the model for inference
model.CLASSES = checkpoint['meta']['CLASSES']

# We need to set the model's cfg for inference
model.cfg = config

# Move the model to the GPU
model.to(device)
# Set the model to evaluation mode
model.eval()


load checkpoint from local path: checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth
Out[7]:
FasterRCNN(
  (backbone): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer2): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (3): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer3): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (3): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (4): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (5): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer4): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
  )
  init_cfg={'type': 'Pretrained', 'checkpoint': 'open-mmlab://detectron2/resnet50_caffe'}
  (neck): FPN(
    (lateral_convs): ModuleList(
      (0): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (1): ConvModule(
        (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (2): ConvModule(
        (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (3): ConvModule(
        (conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (fpn_convs): ModuleList(
      (0): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (1): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (2): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (3): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
  )
  init_cfg={'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
  (rpn_head): RPNHead(
    (loss_cls): CrossEntropyLoss()
    (loss_bbox): L1Loss()
    (rpn_conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (rpn_cls): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
    (rpn_reg): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
  )
  init_cfg={'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01}
  (roi_head): StandardRoIHead(
    (bbox_roi_extractor): SingleRoIExtractor(
      (roi_layers): ModuleList(
        (0): RoIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, pool_mode=avg, aligned=True, use_torchvision=False)
        (1): RoIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, pool_mode=avg, aligned=True, use_torchvision=False)
        (2): RoIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, pool_mode=avg, aligned=True, use_torchvision=False)
        (3): RoIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, pool_mode=avg, aligned=True, use_torchvision=False)
      )
    )
    (bbox_head): Shared2FCBBoxHead(
      (loss_cls): CrossEntropyLoss()
      (loss_bbox): L1Loss()
      (fc_cls): Linear(in_features=1024, out_features=81, bias=True)
      (fc_reg): Linear(in_features=1024, out_features=320, bias=True)
      (shared_convs): ModuleList()
      (shared_fcs): ModuleList(
        (0): Linear(in_features=12544, out_features=1024, bias=True)
        (1): Linear(in_features=1024, out_features=1024, bias=True)
      )
      (cls_convs): ModuleList()
      (cls_fcs): ModuleList()
      (reg_convs): ModuleList()
      (reg_fcs): ModuleList()
      (relu): ReLU(inplace=True)
    )
    init_cfg=[{'type': 'Normal', 'std': 0.01, 'override': {'name': 'fc_cls'}}, {'type': 'Normal', 'std': 0.001, 'override': {'name': 'fc_reg'}}, {'type': 'Xavier', 'distribution': 'uniform', 'override': [{'name': 'shared_fcs'}, {'name': 'cls_fcs'}, {'name': 'reg_fcs'}]}]
  )
)

From the printed model, we can see that it consists of the components described earlier: a ResNet CNN backbone, an RPN head, and an RoI head. In addition, the model has a module, called the neck, directly after the CNN backbone. It is a feature pyramid network (FPN) that enhances the multi-scale features.
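
We can sanity-check this structure programmatically. A quick sketch, run after the cell above, that prints the class of each top-level component the printed model already shows:

for name in ['backbone', 'neck', 'rpn_head', 'roi_head']:
    print(name, '->', type(getattr(model, name)).__name__)
# Expected: ResNet, FPN, RPNHead, StandardRoIHead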

Run inference with the detector

Now that the model has been successfully created and loaded, let's see how good it is. We use the high-level API inference_detector implemented in MMDetection. This API was created to ease the inference process. The details of the code can be found here.


In [8]:
# Use the detector to do inference
img = 'demo/demo.jpg'
result = inference_detector(model, img)


/content/mmdetection/mmdet/datasets/utils.py:69: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
  'data pipeline in your config file.', UserWarning)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

In [9]:
# Let's plot the result
show_result_pyplot(model, img, result, score_thr=0.3)
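
For a two-stage detector like this one, result is a plain list with one (n, 5) array per class, where each row is [x1, y1, x2, y2, score]. A small sketch, run after the cells above, that counts confident detections per class:

for class_name, bboxes in zip(model.CLASSES, result):
    keep = bboxes[:, 4] >= 0.3           # the last column is the confidence score
    if keep.any():
        print(f'{class_name}: {int(keep.sum())} detections with score >= 0.3')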


Train a Detector on a Customized Dataset

To train a new detector, there are usually three things to do:

  1. Support a new dataset
  2. Modify the config
  3. Train a new detector

Support a new dataset

There are three ways to support a new dataset in MMDetection:

  1. Reorganize the dataset into COCO format.
  2. Reorganize the dataset into a middle format.
  3. Implement a new dataset.

We recommend the first two methods, as they are usually easier than the third one.

In this tutorial, we give an example of converting the data into the format of existing datasets, e.g., COCO, VOC, etc. Other methods and more advanced usages can be found in the docs.
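
For reference, the COCO format mentioned in the first method is a single JSON file with three top-level lists. A minimal, hypothetical example (all values made up) looks like this:

coco_style = dict(
    images=[dict(id=0, file_name='a.jpg', width=1280, height=720)],
    annotations=[dict(
        id=0, image_id=0, category_id=1,
        bbox=[100, 50, 40, 80],          # COCO boxes are [x, y, width, height]
        area=3200, iscrowd=0)],
    categories=[dict(id=1, name='Car')])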

First, let's download a tiny dataset derived from KITTI. We select the first 75 images and their annotations from the 3D object detection dataset (it contains the same images as the 2D object detection dataset, but with 3D annotations). We convert the original images from PNG to JPEG at 80% quality to reduce the size of the dataset.


In [10]:
# Download and decompress the data
!wget https://download.openmmlab.com/mmdetection/data/kitti_tiny.zip
!unzip kitti_tiny.zip > /dev/null


--2022-02-08 11:33:06--  https://download.openmmlab.com/mmdetection/data/kitti_tiny.zip
Resolving download.openmmlab.com (download.openmmlab.com)... 47.245.16.66
Connecting to download.openmmlab.com (download.openmmlab.com)|47.245.16.66|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6918271 (6.6M) [application/zip]
Saving to: ‘kitti_tiny.zip’

kitti_tiny.zip      100%[===================>]   6.60M  4.69MB/s    in 1.4s    

2022-02-08 11:33:09 (4.69 MB/s) - ‘kitti_tiny.zip’ saved [6918271/6918271]


In [11]:
# Check the directory structure of the tiny data

# Install tree first
!apt-get -q install tree
!tree kitti_tiny


Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (146 kB/s)
Selecting previously unselected package tree.
(Reading database ... 155113 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
kitti_tiny
├── training
│   ├── image_2
│   │   ├── 000000.jpeg
│   │   ├── 000001.jpeg
│   │   ├── 000002.jpeg
│   │   ├── 000003.jpeg
│   │   ├── 000004.jpeg
│   │   ├── 000005.jpeg
│   │   ├── 000006.jpeg
│   │   ├── 000007.jpeg
│   │   ├── 000008.jpeg
│   │   ├── 000009.jpeg
│   │   ├── 000010.jpeg
│   │   ├── 000011.jpeg
│   │   ├── 000012.jpeg
│   │   ├── 000013.jpeg
│   │   ├── 000014.jpeg
│   │   ├── 000015.jpeg
│   │   ├── 000016.jpeg
│   │   ├── 000017.jpeg
│   │   ├── 000018.jpeg
│   │   ├── 000019.jpeg
│   │   ├── 000020.jpeg
│   │   ├── 000021.jpeg
│   │   ├── 000022.jpeg
│   │   ├── 000023.jpeg
│   │   ├── 000024.jpeg
│   │   ├── 000025.jpeg
│   │   ├── 000026.jpeg
│   │   ├── 000027.jpeg
│   │   ├── 000028.jpeg
│   │   ├── 000029.jpeg
│   │   ├── 000030.jpeg
│   │   ├── 000031.jpeg
│   │   ├── 000032.jpeg
│   │   ├── 000033.jpeg
│   │   ├── 000034.jpeg
│   │   ├── 000035.jpeg
│   │   ├── 000036.jpeg
│   │   ├── 000037.jpeg
│   │   ├── 000038.jpeg
│   │   ├── 000039.jpeg
│   │   ├── 000040.jpeg
│   │   ├── 000041.jpeg
│   │   ├── 000042.jpeg
│   │   ├── 000043.jpeg
│   │   ├── 000044.jpeg
│   │   ├── 000045.jpeg
│   │   ├── 000046.jpeg
│   │   ├── 000047.jpeg
│   │   ├── 000048.jpeg
│   │   ├── 000049.jpeg
│   │   ├── 000050.jpeg
│   │   ├── 000051.jpeg
│   │   ├── 000052.jpeg
│   │   ├── 000053.jpeg
│   │   ├── 000054.jpeg
│   │   ├── 000055.jpeg
│   │   ├── 000056.jpeg
│   │   ├── 000057.jpeg
│   │   ├── 000058.jpeg
│   │   ├── 000059.jpeg
│   │   ├── 000060.jpeg
│   │   ├── 000061.jpeg
│   │   ├── 000062.jpeg
│   │   ├── 000063.jpeg
│   │   ├── 000064.jpeg
│   │   ├── 000065.jpeg
│   │   ├── 000066.jpeg
│   │   ├── 000067.jpeg
│   │   ├── 000068.jpeg
│   │   ├── 000069.jpeg
│   │   ├── 000070.jpeg
│   │   ├── 000071.jpeg
│   │   ├── 000072.jpeg
│   │   ├── 000073.jpeg
│   │   └── 000074.jpeg
│   └── label_2
│       ├── 000000.txt
│       ├── 000001.txt
│       ├── 000002.txt
│       ├── 000003.txt
│       ├── 000004.txt
│       ├── 000005.txt
│       ├── 000006.txt
│       ├── 000007.txt
│       ├── 000008.txt
│       ├── 000009.txt
│       ├── 000010.txt
│       ├── 000011.txt
│       ├── 000012.txt
│       ├── 000013.txt
│       ├── 000014.txt
│       ├── 000015.txt
│       ├── 000016.txt
│       ├── 000017.txt
│       ├── 000018.txt
│       ├── 000019.txt
│       ├── 000020.txt
│       ├── 000021.txt
│       ├── 000022.txt
│       ├── 000023.txt
│       ├── 000024.txt
│       ├── 000025.txt
│       ├── 000026.txt
│       ├── 000027.txt
│       ├── 000028.txt
│       ├── 000029.txt
│       ├── 000030.txt
│       ├── 000031.txt
│       ├── 000032.txt
│       ├── 000033.txt
│       ├── 000034.txt
│       ├── 000035.txt
│       ├── 000036.txt
│       ├── 000037.txt
│       ├── 000038.txt
│       ├── 000039.txt
│       ├── 000040.txt
│       ├── 000041.txt
│       ├── 000042.txt
│       ├── 000043.txt
│       ├── 000044.txt
│       ├── 000045.txt
│       ├── 000046.txt
│       ├── 000047.txt
│       ├── 000048.txt
│       ├── 000049.txt
│       ├── 000050.txt
│       ├── 000051.txt
│       ├── 000052.txt
│       ├── 000053.txt
│       ├── 000054.txt
│       ├── 000055.txt
│       ├── 000056.txt
│       ├── 000057.txt
│       ├── 000058.txt
│       ├── 000059.txt
│       ├── 000060.txt
│       ├── 000061.txt
│       ├── 000062.txt
│       ├── 000063.txt
│       ├── 000064.txt
│       ├── 000065.txt
│       ├── 000066.txt
│       ├── 000067.txt
│       ├── 000068.txt
│       ├── 000069.txt
│       ├── 000070.txt
│       ├── 000071.txt
│       ├── 000072.txt
│       ├── 000073.txt
│       └── 000074.txt
├── train.txt
└── val.txt

3 directories, 152 files

In [12]:
# Let's take a look at one of the dataset images
import mmcv
import matplotlib.pyplot as plt

img = mmcv.imread('kitti_tiny/training/image_2/000073.jpeg')
plt.figure(figsize=(15, 10))
plt.imshow(mmcv.bgr2rgb(img))
plt.show()


After downloading the data, we need to implement a function that converts the KITTI annotation format into the middle format. In this tutorial, we do the conversion in the load_annotations function of a newly implemented KittiTinyDataset.

Let's take a look at an annotation txt file.


In [13]:
# Check the label of a single image
!cat kitti_tiny/training/label_2/000000.txt


Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01

According to KITTI's documentation, the first column indicates the class of the object, and the 5th to 8th columns give the bounding box coordinates (x1, y1, x2, y2). We need to read the annotations of each image and convert them into the middle format that MMDetection can accept, as follows:

[
    {
        'filename': 'a.jpg',
        'width': 1280,
        'height': 720,
        'ann': {
            'bboxes': <np.ndarray> (n, 4) in (x1, y1, x2, y2) order,
            'labels': <np.ndarray> (n, ),
            'bboxes_ignore': <np.ndarray> (k, 4), (optional field)
            'labels_ignore': <np.ndarray> (k, ) (optional field)
        }
    },
    ...
]
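
As a quick check, the sample label line shown above can be parsed exactly as described (class name in column 1, box coordinates in columns 5-8):

line = ('Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 '
        '1.89 0.48 1.20 1.84 1.47 8.41 0.01')
fields = line.split(' ')
class_name = fields[0]                   # 'Pedestrian'
bbox = [float(v) for v in fields[4:8]]   # [712.4, 143.0, 810.73, 307.92] -> (x1, y1, x2, y2)
print(class_name, bbox)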

In [14]:
import copy
import os.path as osp

import mmcv
import numpy as np

from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset

@DATASETS.register_module()
class KittiTinyDataset(CustomDataset):

    CLASSES = ('Car', 'Pedestrian', 'Cyclist')

    def load_annotations(self, ann_file):
        cat2label = {k: i for i, k in enumerate(self.CLASSES)}
        # load image list from file
        image_list = mmcv.list_from_file(ann_file)
    
        data_infos = []
        # convert annotations to middle format
        for image_id in image_list:
            filename = f'{self.img_prefix}/{image_id}.jpeg'
            image = mmcv.imread(filename)
            height, width = image.shape[:2]
    
            data_info = dict(filename=f'{image_id}.jpeg', width=width, height=height)
    
            # load annotations
            label_prefix = self.img_prefix.replace('image_2', 'label_2')
            lines = mmcv.list_from_file(osp.join(label_prefix, f'{image_id}.txt'))
    
            content = [line.strip().split(' ') for line in lines]
            bbox_names = [x[0] for x in content]
            bboxes = [[float(info) for info in x[4:8]] for x in content]
    
            gt_bboxes = []
            gt_labels = []
            gt_bboxes_ignore = []
            gt_labels_ignore = []
    
            # filter 'DontCare'
            for bbox_name, bbox in zip(bbox_names, bboxes):
                if bbox_name in cat2label:
                    gt_labels.append(cat2label[bbox_name])
                    gt_bboxes.append(bbox)
                else:
                    gt_labels_ignore.append(-1)
                    gt_bboxes_ignore.append(bbox)

            data_anno = dict(
                bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
                # np.long works on the NumPy pinned above; prefer np.int64 on newer NumPy
                labels=np.array(gt_labels, dtype=np.long),
                bboxes_ignore=np.array(gt_bboxes_ignore,
                                       dtype=np.float32).reshape(-1, 4),
                labels_ignore=np.array(gt_labels_ignore, dtype=np.long))

            data_info.update(ann=data_anno)
            data_infos.append(data_info)

        return data_infos

Modify the config

In the next step, we need to modify the config for training. To accelerate the process, we finetune a detector from a pre-trained checkpoint.


In [15]:
from mmcv import Config
cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py')

Given a config that trains a Faster R-CNN on the COCO dataset, we need to modify some values to train it on the KITTI dataset instead. We modify the config of the datasets, the learning rate schedule, and the runtime settings.


In [17]:
from mmdet.apis import set_random_seed

# Modify dataset type and path
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'kitti_tiny/'

cfg.data.test.type = 'KittiTinyDataset'
cfg.data.test.data_root = 'kitti_tiny/'
cfg.data.test.ann_file = 'train.txt'
cfg.data.test.img_prefix = 'training/image_2'

cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'

cfg.data.val.type = 'KittiTinyDataset'
cfg.data.val.data_root = 'kitti_tiny/'
cfg.data.val.ann_file = 'val.txt'
cfg.data.val.img_prefix = 'training/image_2'

# Modify the number of classes in the model's box head
cfg.model.roi_head.bbox_head.num_classes = 3
# To finetune a model from a pre-trained detector, we need to
# use load_from to set the path of the checkpoint.
cfg.load_from = 'checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth'

# Set up the working dir to save files and logs.
cfg.work_dir = './tutorial_exps'

# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 10

# Change the evaluation metric since we use a customized dataset.
cfg.evaluation.metric = 'mAP'
# We can set the evaluation interval to reduce how often evaluation runs
cfg.evaluation.interval = 12
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 12

# Set the seed so the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.device = 'cuda'
cfg.gpu_ids = range(1)

# We can also use tensorboard to log the training process
cfg.log_config.hooks = [
    dict(type='TextLoggerHook'),
    dict(type='TensorboardLoggerHook')]


# Have a look at the final config used for training
print(f'Config:\n{cfg.pretty_text}')


Config:
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='open-mmlab://detectron2/resnet50_caffe')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=3,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
dataset_type = 'KittiTinyDataset'
data_root = 'kitti_tiny/'
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                   (1333, 768), (1333, 800)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[103.53, 116.28, 123.675],
        std=[1.0, 1.0, 1.0],
        to_rgb=False),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='KittiTinyDataset',
        ann_file='train.txt',
        img_prefix='training/image_2',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='Resize',
                img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                           (1333, 768), (1333, 800)],
                multiscale_mode='value',
                keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        data_root='kitti_tiny/'),
    val=dict(
        type='KittiTinyDataset',
        ann_file='val.txt',
        img_prefix='training/image_2',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[103.53, 116.28, 123.675],
                        std=[1.0, 1.0, 1.0],
                        to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root='kitti_tiny/'),
    test=dict(
        type='KittiTinyDataset',
        ann_file='train.txt',
        img_prefix='training/image_2',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[103.53, 116.28, 123.675],
                        std=[1.0, 1.0, 1.0],
                        to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root='kitti_tiny/'))
evaluation = dict(interval=12, metric='mAP')
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup=None,
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=12)
log_config = dict(
    interval=10,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth'
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
work_dir = './tutorial_exps'
seed = 0
gpu_ids = range(0, 1)

Train a new detector

Finally, let's initialize the dataset and detector, then train a new detector! We use the high-level API train_detector implemented in MMDetection. It is also used in our training scripts. For details of the implementation, please see here.


In [18]:
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector


# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(cfg.model)
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)


/content/mmdetection/mmdet/datasets/custom.py:180: UserWarning: CustomDataset does not support filtering empty gt images.
  'CustomDataset does not support filtering empty gt images.')
2022-02-08 11:38:22,273 - mmdet - INFO - load checkpoint from local path: checkpoints/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_20210526_095054-1f77628b.pth
2022-02-08 11:38:22,406 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([4, 1024]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([4]).
size mismatch for roi_head.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([12, 1024]).
size mismatch for roi_head.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([12]).
2022-02-08 11:38:22,410 - mmdet - INFO - Start running, host: root@503df4019aac, work_dir: /content/mmdetection/tutorial_exps
2022-02-08 11:38:22,412 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) NumClassCheckHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_val_epoch:
(NORMAL      ) NumClassCheckHook                  
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
after_run:
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
2022-02-08 11:38:22,414 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2022-02-08 11:38:22,417 - mmdet - INFO - Checkpoints will be saved to /content/mmdetection/tutorial_exps by HardDiskBackend.
2022-02-08 11:38:35,245 - mmdet - INFO - Epoch [1][10/25]	lr: 2.500e-03, eta: 0:03:51, time: 0.799, data_time: 0.231, memory: 2455, loss_rpn_cls: 0.0254, loss_rpn_bbox: 0.0173, loss_cls: 0.5374, acc: 81.6309, loss_bbox: 0.3946, loss: 0.9746
2022-02-08 11:38:38,778 - mmdet - INFO - Epoch [1][20/25]	lr: 2.500e-03, eta: 0:02:41, time: 0.353, data_time: 0.024, memory: 2455, loss_rpn_cls: 0.0158, loss_rpn_bbox: 0.0119, loss_cls: 0.1778, acc: 93.3789, loss_bbox: 0.3290, loss: 0.5344
2022-02-08 11:38:46,422 - mmdet - INFO - Epoch [2][10/25]	lr: 2.500e-03, eta: 0:02:10, time: 0.576, data_time: 0.230, memory: 2456, loss_rpn_cls: 0.0203, loss_rpn_bbox: 0.0139, loss_cls: 0.1573, acc: 94.4824, loss_bbox: 0.2689, loss: 0.4603
2022-02-08 11:38:50,015 - mmdet - INFO - Epoch [2][20/25]	lr: 2.500e-03, eta: 0:01:58, time: 0.360, data_time: 0.023, memory: 2456, loss_rpn_cls: 0.0127, loss_rpn_bbox: 0.0127, loss_cls: 0.1446, acc: 94.6777, loss_bbox: 0.2154, loss: 0.3854
2022-02-08 11:38:57,686 - mmdet - INFO - Epoch [3][10/25]	lr: 2.500e-03, eta: 0:01:46, time: 0.575, data_time: 0.226, memory: 2456, loss_rpn_cls: 0.0064, loss_rpn_bbox: 0.0104, loss_cls: 0.0943, acc: 96.5039, loss_bbox: 0.1586, loss: 0.2697
2022-02-08 11:39:01,390 - mmdet - INFO - Epoch [3][20/25]	lr: 2.500e-03, eta: 0:01:39, time: 0.370, data_time: 0.024, memory: 2456, loss_rpn_cls: 0.0072, loss_rpn_bbox: 0.0132, loss_cls: 0.1439, acc: 94.6191, loss_bbox: 0.2597, loss: 0.4242
2022-02-08 11:39:09,266 - mmdet - INFO - Epoch [4][10/25]	lr: 2.500e-03, eta: 0:01:31, time: 0.590, data_time: 0.228, memory: 2456, loss_rpn_cls: 0.0057, loss_rpn_bbox: 0.0134, loss_cls: 0.1181, acc: 95.4199, loss_bbox: 0.2243, loss: 0.3616
2022-02-08 11:39:13,065 - mmdet - INFO - Epoch [4][20/25]	lr: 2.500e-03, eta: 0:01:26, time: 0.379, data_time: 0.024, memory: 2456, loss_rpn_cls: 0.0050, loss_rpn_bbox: 0.0117, loss_cls: 0.1196, acc: 95.4004, loss_bbox: 0.2120, loss: 0.3484
2022-02-08 11:39:20,854 - mmdet - INFO - Epoch [5][10/25]	lr: 2.500e-03, eta: 0:01:19, time: 0.582, data_time: 0.228, memory: 2456, loss_rpn_cls: 0.0028, loss_rpn_bbox: 0.0091, loss_cls: 0.1021, acc: 96.1719, loss_bbox: 0.2075, loss: 0.3216
2022-02-08 11:39:24,557 - mmdet - INFO - Epoch [5][20/25]	lr: 2.500e-03, eta: 0:01:14, time: 0.369, data_time: 0.023, memory: 2456, loss_rpn_cls: 0.0030, loss_rpn_bbox: 0.0106, loss_cls: 0.0942, acc: 96.6309, loss_bbox: 0.1926, loss: 0.3003
2022-02-08 11:39:32,255 - mmdet - INFO - Epoch [6][10/25]	lr: 2.500e-03, eta: 0:01:07, time: 0.576, data_time: 0.226, memory: 2456, loss_rpn_cls: 0.0025, loss_rpn_bbox: 0.0081, loss_cls: 0.0787, acc: 97.2363, loss_bbox: 0.1827, loss: 0.2721
2022-02-08 11:39:35,900 - mmdet - INFO - Epoch [6][20/25]	lr: 2.500e-03, eta: 0:01:02, time: 0.364, data_time: 0.023, memory: 2456, loss_rpn_cls: 0.0035, loss_rpn_bbox: 0.0100, loss_cls: 0.0901, acc: 96.5332, loss_bbox: 0.1857, loss: 0.2893
2022-02-08 11:39:43,555 - mmdet - INFO - Epoch [7][10/25]	lr: 2.500e-03, eta: 0:00:56, time: 0.576, data_time: 0.228, memory: 2456, loss_rpn_cls: 0.0023, loss_rpn_bbox: 0.0093, loss_cls: 0.0877, acc: 96.7383, loss_bbox: 0.1736, loss: 0.2730
2022-02-08 11:39:47,186 - mmdet - INFO - Epoch [7][20/25]	lr: 2.500e-03, eta: 0:00:52, time: 0.362, data_time: 0.024, memory: 2456, loss_rpn_cls: 0.0040, loss_rpn_bbox: 0.0112, loss_cls: 0.0889, acc: 96.6699, loss_bbox: 0.1800, loss: 0.2840
2022-02-08 11:39:54,874 - mmdet - INFO - Epoch [8][10/25]	lr: 2.500e-03, eta: 0:00:46, time: 0.575, data_time: 0.227, memory: 2456, loss_rpn_cls: 0.0020, loss_rpn_bbox: 0.0094, loss_cls: 0.0748, acc: 97.0801, loss_bbox: 0.1381, loss: 0.2243
2022-02-08 11:39:58,511 - mmdet - INFO - Epoch [8][20/25]	lr: 2.500e-03, eta: 0:00:41, time: 0.364, data_time: 0.025, memory: 2456, loss_rpn_cls: 0.0031, loss_rpn_bbox: 0.0081, loss_cls: 0.0743, acc: 97.0801, loss_bbox: 0.1635, loss: 0.2489
2022-02-08 11:40:06,228 - mmdet - INFO - Epoch [9][10/25]	lr: 2.500e-04, eta: 0:00:35, time: 0.577, data_time: 0.227, memory: 2456, loss_rpn_cls: 0.0024, loss_rpn_bbox: 0.0085, loss_cls: 0.0649, acc: 97.5781, loss_bbox: 0.1307, loss: 0.2065
2022-02-08 11:40:09,873 - mmdet - INFO - Epoch [9][20/25]	lr: 2.500e-04, eta: 0:00:31, time: 0.365, data_time: 0.025, memory: 2456, loss_rpn_cls: 0.0010, loss_rpn_bbox: 0.0066, loss_cls: 0.0530, acc: 97.9199, loss_bbox: 0.1090, loss: 0.1695
2022-02-08 11:40:17,597 - mmdet - INFO - Epoch [10][10/25]	lr: 2.500e-04, eta: 0:00:25, time: 0.579, data_time: 0.227, memory: 2456, loss_rpn_cls: 0.0041, loss_rpn_bbox: 0.0084, loss_cls: 0.0676, acc: 97.3633, loss_bbox: 0.1367, loss: 0.2168
2022-02-08 11:40:21,269 - mmdet - INFO - Epoch [10][20/25]	lr: 2.500e-04, eta: 0:00:21, time: 0.367, data_time: 0.025, memory: 2456, loss_rpn_cls: 0.0008, loss_rpn_bbox: 0.0055, loss_cls: 0.0593, acc: 97.7246, loss_bbox: 0.1277, loss: 0.1934
2022-02-08 11:40:29,010 - mmdet - INFO - Epoch [11][10/25]	lr: 2.500e-04, eta: 0:00:15, time: 0.579, data_time: 0.228, memory: 2456, loss_rpn_cls: 0.0007, loss_rpn_bbox: 0.0072, loss_cls: 0.0618, acc: 97.5977, loss_bbox: 0.1196, loss: 0.1892
2022-02-08 11:40:32,714 - mmdet - INFO - Epoch [11][20/25]	lr: 2.500e-04, eta: 0:00:11, time: 0.370, data_time: 0.024, memory: 2456, loss_rpn_cls: 0.0011, loss_rpn_bbox: 0.0074, loss_cls: 0.0552, acc: 97.9297, loss_bbox: 0.1246, loss: 0.1883
2022-02-08 11:40:40,497 - mmdet - INFO - Epoch [12][10/25]	lr: 2.500e-05, eta: 0:00:05, time: 0.583, data_time: 0.227, memory: 2456, loss_rpn_cls: 0.0010, loss_rpn_bbox: 0.0060, loss_cls: 0.0563, acc: 97.7637, loss_bbox: 0.1237, loss: 0.1871
2022-02-08 11:40:44,191 - mmdet - INFO - Epoch [12][20/25]	lr: 2.500e-05, eta: 0:00:01, time: 0.369, data_time: 0.024, memory: 2456, loss_rpn_cls: 0.0013, loss_rpn_bbox: 0.0049, loss_cls: 0.0487, acc: 98.0273, loss_bbox: 0.0890, loss: 0.1439
2022-02-08 11:40:45,980 - mmdet - INFO - Saving checkpoint at 12 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 25/25, 9.7 task/s, elapsed: 3s, ETA:     0s
---------------iou_thr: 0.5---------------
2022-02-08 11:40:51,306 - mmdet - INFO - 
+------------+-----+------+--------+-------+
| class      | gts | dets | recall | ap    |
+------------+-----+------+--------+-------+
| Car        | 62  | 129  | 0.968  | 0.869 |
| Pedestrian | 13  | 38   | 0.846  | 0.752 |
| Cyclist    | 7   | 51   | 0.571  | 0.123 |
+------------+-----+------+--------+-------+
| mAP        |     |      |        | 0.581 |
+------------+-----+------+--------+-------+
2022-02-08 11:40:51,309 - mmdet - INFO - Epoch(val) [12][25]	AP50: 0.5810, mAP: 0.5813

Understand the log

From the log, we can get a basic understanding of the training process and see how well the detector is trained.

First, since the dataset we are using is small, we loaded a pre-trained Faster R-CNN model and fine-tuned it on the new data. The original Faster R-CNN was trained on the COCO dataset, which contains 80 classes, but the KITTI Tiny dataset only has 3 classes. Therefore, the last fully-connected layers of the pre-trained Faster R-CNN used for classification and regression have different weight shapes and are not loaded from the checkpoint, which is exactly what the size-mismatch warnings above report.
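
The exact shapes in those warnings are easy to verify: the Faster R-CNN bbox head predicts num_classes + 1 classification scores (the extra one is the background class) and, with class-specific regression, 4 box deltas per class. A quick sanity check in plain Python:

num_classes = 3          # Car, Pedestrian, Cyclist
print(num_classes + 1)   # -> 4, the fc_cls output size (vs. 80 + 1 = 81 for COCO)
print(num_classes * 4)   # -> 12, the fc_reg output size (vs. 80 * 4 = 320 for COCO)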

Second, after training, the detector is evaluated with the default VOC-style evaluation. The results show that the detector achieves 58.1 mAP on the validation set, not bad!
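
The reported mAP is just the unweighted mean of the per-class APs in the table above (plain arithmetic, independent of any MMDetection API). Note that the low Cyclist AP drags the mean down, which is unsurprising given it has only 7 ground-truth instances:

aps = {'Car': 0.869, 'Pedestrian': 0.752, 'Cyclist': 0.123}  # per-class APs from the table
print(sum(aps.values()) / len(aps))  # -> 0.5813..., the mAP reported in the log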

We can also check TensorBoard to see the training curves.


In [19]:
# load tensorboard in colab
%load_ext tensorboard

# see curves in tensorboard
%tensorboard --logdir ./tutorial_exps


From TensorBoard, we can observe the changes of the loss and the learning rate, and see that the losses of each branch gradually decrease as training goes on.

Test the Trained Detector

After fine-tuning the detector, let's visualize the prediction results!


In [20]:
from mmdet.apis import inference_detector, show_result_pyplot

img = mmcv.imread('kitti_tiny/training/image_2/000068.jpeg')

# Use the updated config so the test pipeline matches the fine-tuned model
model.cfg = cfg
result = inference_detector(model, img)
show_result_pyplot(model, img, result)


/content/mmdetection/mmdet/datasets/utils.py:69: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
  'data pipeline in your config file.', UserWarning)
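
If you would rather write the visualization to disk than display it inline, the detector's show_result method accepts an out_file argument. A minimal sketch, assuming the mmdet 2.x BaseDetector API (the output path is an arbitrary choice):

# Save the drawn detections to a file instead of plotting them.
model.show_result(img, result, score_thr=0.3, out_file='tutorial_exps/000068_result.jpg')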

What to Do Next?

So far, we have learned how to test and train a two-stage detector using MMDetection. To explore MMDetection further, you could try several other things, as shown below:

  • Try single-stage detectors, e.g., RetinaNet and SSD from the MMDetection model zoo; single-stage detectors are more commonly used than two-stage detectors in industry (see the inference sketch after this list).
  • Try anchor-free detectors, e.g., FCOS and RepPoints from the MMDetection model zoo; anchor-free detectors are a recent trend in the object detection community.
  • Try 3D object detection with MMDetection3D, another OpenMMLab project. In MMDetection3D, you can try not only the methods supported in MMDetection but also dedicated 3D object detectors.
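
As a starting point for the first suggestion above, here is a minimal sketch of running inference with RetinaNet. The config path ships with the MMDetection repository; the checkpoint filename is a placeholder for a file you would download from the model zoo first:

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

config = 'configs/retinanet/retinanet_r50_fpn_1x_coco.py'
# Placeholder: download a matching checkpoint from the model zoo and adjust the path.
checkpoint = 'checkpoints/retinanet_r50_fpn_1x_coco.pth'

model = init_detector(config, checkpoint, device='cuda:0')
img = 'kitti_tiny/training/image_2/000068.jpeg'
result = inference_detector(model, img)
show_result_pyplot(model, img, result)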