The Kinetics Human Action Video Dataset

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (CVPR 2017)

Contribution Points

  • New Large Dataset
  • New Video Model

Dataset

New dataset

  • 160,000 clips

    • 400 human action classes with at least 400 video clips per action, so at least 400 × 400 = 160,000 clips in total.
    • Each clip lasts around 10s and is taken from a different YouTube video.

Previous datasets

  • HMDB-51

  • UCF-101

    • 13,320 clips in total, organized as 101 actions × 25 groups × 4-7 videos per action in each group.
      • The videos from the same group may share some common features, such as similar background, similar viewpoint, etc.
      • 5 action types:
        • 1) Human-Object Interaction 2) Body-Motion Only 3) Human-Human Interaction 4) Playing Musical Instruments 5) Sports
    • Reference: K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
    • URL: http://crcv.ucf.edu/data/UCF101.php
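
Because videos from the same group may share background and viewpoint, a train/test split is usually safer when made per group, so that near-duplicate clips do not end up on both sides. The sketch below is only an illustration: it assumes clip file names follow the pattern `v_<Action>_g<group>_c<clip>.avi`, and the helper names `parse_group` and `split_by_group`, the example file list, and the choice of held-out group are all hypothetical.

```python
import re

# Assumed file-name pattern: v_<Action>_g<group>_c<clip>.avi
NAME_RE = re.compile(
    r"^v_(?P<action>[A-Za-z0-9]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$"
)


def parse_group(filename):
    """Return (action, group) parsed from a clip file name."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group("action"), int(match.group("group"))


def split_by_group(filenames, test_groups):
    """Send every clip of a held-out group to the test set, the rest to train."""
    train, test = [], []
    for name in filenames:
        _, group = parse_group(name)
        (test if group in test_groups else train).append(name)
    return train, test


# Hypothetical file list, for illustration only.
clips = [
    "v_Archery_g01_c01.avi",
    "v_Archery_g01_c02.avi",
    "v_Archery_g02_c01.avi",
    "v_Biking_g01_c01.avi",
    "v_Biking_g03_c01.avi",
]
train, test = split_by_group(clips, test_groups={1})
```

With this split, both clips of Archery group 1 land in the test set together, so no group contributes clips to both sides.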

Previous Models

Optical Flow Frames

  • The flow field is visualized with hue indicating the direction of motion and intensity indicating its magnitude.
  • Bi-directional optical flow: backward (past) and forward (future).
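
The hue/intensity visualization can be sketched with the standard library alone. This is a minimal illustration, not the paper's rendering code: it assumes the flow is given as a grid of `(dx, dy)` vectors, maps direction to HSV hue, and maps magnitude (normalized by the largest vector in the field) to HSV value. The function names `flow_to_rgb` and `flow_field_to_rgb` are hypothetical.

```python
import colorsys
import math


def flow_to_rgb(dx, dy, max_magnitude):
    """Map one flow vector to an RGB triple in [0, 1].

    Hue encodes direction; brightness (HSV value) encodes magnitude.
    """
    angle = math.atan2(dy, dx)                      # direction in [-pi, pi]
    hue = (angle % (2 * math.pi)) / (2 * math.pi)   # wrap onto the hue circle
    magnitude = math.hypot(dx, dy)
    # Assumption: normalize by the largest magnitude in the field, clipped to 1.
    value = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    return colorsys.hsv_to_rgb(hue, 1.0, value)


def flow_field_to_rgb(flow):
    """Convert a 2D grid of (dx, dy) vectors to a grid of RGB triples."""
    max_magnitude = max(math.hypot(dx, dy) for row in flow for dx, dy in row)
    return [[flow_to_rgb(dx, dy, max_magnitude) for dx, dy in row] for row in flow]


# Tiny hypothetical flow field: right, still, left, and a slower downward motion.
flow = [
    [(1.0, 0.0), (0.0, 0.0)],
    [(-1.0, 0.0), (0.0, 0.5)],
]
image = flow_field_to_rgb(flow)
```

Here pixels with no motion come out black, and opposite directions get opposite hues (red for rightward, cyan for leftward in this mapping).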

New Model

Bootstrapping 3D filters from 2D filters.
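
One way to read this bootstrapping step: a pretrained k × k filter is repeated N times along the time axis and divided by N, so that a video made of one frame repeated N times produces the same filter response as the single image did. The sketch below illustrates that idea on plain Python lists. It is a hedged illustration, not the authors' code; the function names (`inflate_filter`, `response_2d`, `response_3d`) and the example kernel and patch are hypothetical.

```python
def inflate_filter(kernel_2d, time_depth):
    """Inflate a k x k filter to time_depth x k x k.

    The 2D weights are repeated along time and divided by time_depth.
    """
    if time_depth < 1:
        raise ValueError("time_depth must be at least 1")
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(patch, kernel_2d):
    """Filter response on one image patch the same size as the kernel."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel_2d)
        for p, w in zip(patch_row, kernel_row)
    )


def response_3d(clip, kernel_3d):
    """Filter response on a stack of patches, one per time step."""
    return sum(response_2d(frame, k) for frame, k in zip(clip, kernel_3d))


# Hypothetical 3 x 3 edge filter and image patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 0.0], [5.0, 1.0, 0.0]]

kernel_3d = inflate_filter(kernel, time_depth=4)
static_clip = [patch] * 4  # the same frame repeated along time

image_response = response_2d(patch, kernel)
video_response = response_3d(static_clip, kernel_3d)
```

On the static clip the two responses agree, which is the property that lets 2D pretrained weights serve as an initialization for the 3D model.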