Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (CVPR 2017)

Contribution Points

- New Large Dataset
- New Video Model

## dataset

### New dataset
Kinetics (at least 160,000 clips)
- 400 human action classes with at least 400 video clips for each action, so at least 400 x 400 = 160,000 clips
- Each clip lasts around 10s and is taken from a different YouTube video.

HMDB-51
- 7,000 clips, 51 actions
- HMDB: a large video database for human motion recognition. In Proceedings of the International Conference on Computer Vision (ICCV), 2011.
- URL: http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/

UCF-101
- 13,320 clips in total: 101 actions x 25 groups x 4-7 clips of each action per group.
  - The videos from the same group may share some common features, such as similar background, similar viewpoint, etc.
  - 5 action types
    1. Human-Object Interaction
    2. Body-Motion Only
    3. Human-Human Interaction
    4. Playing Musical Instruments
    5. Sports
- UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
- URL: http://crcv.ucf.edu/data/UCF101.php
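
Because clips from the same UCF-101 group may share background and viewpoint, it can help to keep a whole group on one side of a train/test split. The sketch below illustrates that idea only; it is not an official split protocol. It assumes clip files are named like `v_ApplyEyeMakeup_g01_c01.avi` (action, group, clip), and the helper names `parse_clip_name` and `split_by_group` are hypothetical.

```python
import re

# Assumed filename pattern: v_<Action>_g<group>_c<clip>.avi
_CLIP_NAME = re.compile(r"^v_(?P<action>.+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$")


def parse_clip_name(filename):
    """Return (action, group, clip) parsed from an assumed UCF-101 style name."""
    match = _CLIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected clip name: {filename}")
    return match["action"], int(match["group"]), int(match["clip"])


def split_by_group(filenames, test_groups):
    """Send every clip of a held-out group to the test list, the rest to train."""
    train, test = [], []
    for name in filenames:
        _, group, _ = parse_clip_name(name)
        (test if group in test_groups else train).append(name)
    return train, test


if __name__ == "__main__":
    names = [
        "v_ApplyEyeMakeup_g01_c01.avi",
        "v_ApplyEyeMakeup_g01_c02.avi",
        "v_ApplyEyeMakeup_g08_c01.avi",
    ]
    train, test = split_by_group(names, test_groups={1})
    print("train:", train)
    print("test:", test)
```

Splitting on the group number rather than on individual clips means two clips from the same group never end up on opposite sides of the split.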

The optical flow field is visualized as a color image: hue encodes the direction of each flow vector and intensity encodes its magnitude.
Bi-directional optical flow: backward (toward the past frame) and forward (toward the future frame).
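
The hue-for-direction, intensity-for-magnitude encoding described above can be sketched with the standard library alone. This is a minimal illustration, not the paper's plotting code: it assumes the flow is given as a grid of `(dx, dy)` vectors, maps magnitude to the HSV value channel (so zero motion is black), and normalizes by the largest magnitude in the field. The function names are hypothetical.

```python
import colorsys
import math


def flow_to_rgb(dx, dy, max_magnitude):
    """Map one flow vector to an (r, g, b) tuple with components in [0, 1]."""
    magnitude = math.hypot(dx, dy)
    # Direction -> hue: wrap the atan2 angle into [0, 2*pi), then scale to [0, 1].
    hue = (math.atan2(dy, dx) % (2 * math.pi)) / (2 * math.pi)
    # Magnitude -> intensity (HSV value), clipped at 1 (assumed normalization).
    value = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    return colorsys.hsv_to_rgb(hue, 1.0, value)


def flow_field_to_image(flow):
    """Convert a 2D grid of (dx, dy) vectors into a grid of RGB tuples."""
    max_magnitude = max(math.hypot(dx, dy) for row in flow for dx, dy in row)
    return [[flow_to_rgb(dx, dy, max_magnitude) for dx, dy in row] for row in flow]


if __name__ == "__main__":
    # A tiny 2x2 field: rightward, upward, leftward (half speed), and no motion.
    field = [[(1.0, 0.0), (0.0, 1.0)], [(-0.5, 0.0), (0.0, 0.0)]]
    for row in flow_field_to_image(field):
        print(row)
```

Other visualizations map magnitude to saturation instead, which renders static regions white rather than black; which convention applies here is not stated in these notes.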