In [1]:
import IPython.display
IPython.display.YouTubeVideo('leVZjVahdKs')


Out[1]:

如何使用和开发微信聊天机器人的系列教程

A workshop to develop & use an intelligent and interactive chat-bot in WeChat

by: GU Zhan (Sam)

October 2018: Updated to support Python 3 on a local machine, e.g. iss-vm.

April 2017: Scan the QR code to add the trainer as a friend in WeChat.

第五课:视频识别和处理

Lesson 5: Video Recognition & Processing

  • 识别视频消息中的物体名字 (Label Detection: Detect entities within the video, such as "dog", "flower" or "car")
  • 识别视频的场景片段 (Shot Change Detection: Detect scene changes within the video)
  • 识别受限内容 (Explicit Content Detection: Detect adult content within a video)
  • 生成视频字幕 (Video Transcription BETA: Transcribes video content in English)

Using Google Cloud Platform's Machine Learning APIs

From the same API console, choose "Dashboard" on the left-hand menu and "Enable API".

Enable the following APIs for your project (search for them) if they are not already enabled:

  • Google Cloud Video Intelligence API
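
The Video Intelligence API can also be enabled from the command line; a minimal sketch, assuming the Google Cloud SDK is installed and a project is already configured:

!gcloud services enable videointelligence.googleapis.com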

Finally, because we are calling the APIs from Python (clients in many other languages are available), let's install the Python package (it's not installed by default on Datalab).


In [2]:
# Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); 
# !pip install --upgrade google-api-python-client

短片预览 / Video viewing


In [3]:
# 多媒体文件的二进制base64码转换 (Define media pre-processing functions)

# Import the base64 encoding library.
import base64, io, sys, IPython.display

# Python 2
if sys.version_info[0] < 3:
    import urllib2
# Python 3
else:
    import urllib.request

# Pass the media data to an encoding function.
def encode_media(media_file):
    with io.open(media_file, "rb") as media:
        media_content = media.read()
    # Python 2
    if sys.version_info[0] < 3:
        return base64.b64encode(media_content).decode('ascii')
    # Python 3
    else:
        return base64.b64encode(media_content).decode('utf-8')

In [4]:
video_file = 'reference/video_IPA.mp4'
# video_file = 'reference/SampleVideo_360x240_1mb.mp4'
# video_file = 'reference/SampleVideo_360x240_2mb.mp4'

In [5]:
IPython.display.HTML(data=
        '''<video alt="test" controls><source src="data:video/mp4;base64,{0}" type="video/mp4" /></video>'''
        .format(encode_media(video_file)))


Out[5]:

Install the client library for Video Intelligence / Processing


In [6]:
!pip install --upgrade google-cloud-videointelligence


Requirement already up-to-date: google-cloud-videointelligence in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (1.5.0)
Requirement already satisfied, skipping upgrade: google-api-core[grpc]<2.0.0dev,>=0.1.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-cloud-videointelligence) (1.5.0)
Requirement already satisfied, skipping upgrade: protobuf>=3.4.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (3.6.1)
Requirement already satisfied, skipping upgrade: pytz in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (2018.3)
Requirement already satisfied, skipping upgrade: google-auth<2.0.0dev,>=0.4.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (1.5.1)
Requirement already satisfied, skipping upgrade: googleapis-common-protos<2.0dev,>=1.5.3 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (1.5.3)
Requirement already satisfied, skipping upgrade: setuptools>=34.0.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (38.5.1)
Requirement already satisfied, skipping upgrade: requests<3.0.0dev,>=2.18.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (2.18.4)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (1.11.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.2; extra == "grpc" in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (1.10.0)
Requirement already satisfied, skipping upgrade: cachetools>=2.0.0 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-auth<2.0.0dev,>=0.4.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (2.1.0)
Requirement already satisfied, skipping upgrade: rsa>=3.1.4 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-auth<2.0.0dev,>=0.4.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (3.4.2)
Requirement already satisfied, skipping upgrade: pyasn1-modules>=0.2.1 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from google-auth<2.0.0dev,>=0.4.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (0.2.1)
Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<2.7,>=2.5 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (2.6)
Requirement already satisfied, skipping upgrade: urllib3<1.23,>=1.21.1 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (1.22)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (2018.8.13)
Requirement already satisfied, skipping upgrade: pyasn1>=0.1.3 in /home/iss-user/anaconda3/envs/iss-env-py3/lib/python3.6/site-packages (from rsa>=3.1.4->google-auth<2.0.0dev,>=0.4.0->google-api-core[grpc]<2.0.0dev,>=0.1.0->google-cloud-videointelligence) (0.4.2)


In [7]:
# Imports the Google Cloud client library
from google.cloud import videointelligence

In [8]:
# [Optional] Display location of service account API key if defined in GOOGLE_APPLICATION_CREDENTIALS
!echo $GOOGLE_APPLICATION_CREDENTIALS




In [9]:
##################################################################
# (1) Instantiates a client - using GOOGLE_APPLICATION_CREDENTIALS
# video_client = videointelligence.VideoIntelligenceServiceClient()

# 
# (2) Instantiates a client - using 'service account json' file
video_client = videointelligence.VideoIntelligenceServiceClient.from_service_account_json(
        "/media/sf_vm_shared_folder/000-cloud-api-key/mtech-ai-7b7e049cf5f6.json")
##################################################################
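
For route (1) above, the client library locates the service account key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch, assuming the same JSON key path as route (2):

# Set the environment variable from Python, then use the no-argument
# constructor; google-auth discovers the key file automatically.
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = \
    "/media/sf_vm_shared_folder/000-cloud-api-key/mtech-ai-7b7e049cf5f6.json"
video_client = videointelligence.VideoIntelligenceServiceClient()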

* 识别视频消息中的物体名字 (Label Detection: Detect entities within the video, such as "dog", "flower" or "car")

https://cloud.google.com/video-intelligence/docs/analyze-labels

didi_video_label_detection()


In [10]:
from google.cloud import videointelligence

def didi_video_label_detection(path):
    """Detect labels given a local file path. (Demo)"""
    """ Detects labels given a GCS path. (Exercise / Workshop Enhancement)"""

##################################################################
# (1) Instantiates a client - using GOOGLE_APPLICATION_CREDENTIALS
#     video_client = videointelligence.VideoIntelligenceServiceClient()

# 
# (2) Instantiates a client - using 'service account json' file
    video_client = videointelligence.VideoIntelligenceServiceClient.from_service_account_json(
        "/media/sf_vm_shared_folder/000-cloud-api-key/mtech-ai-7b7e049cf5f6.json")
##################################################################

    features = [videointelligence.enums.Feature.LABEL_DETECTION]

    with io.open(path, 'rb') as movie:
        input_content = movie.read()

    operation = video_client.annotate_video(
        features=features, input_content=input_content)
    print('\nProcessing video for label annotations:')

    result = operation.result(timeout=90)
    print('\nFinished processing.')

    # Process video/segment level label annotations
    segment_labels = result.annotation_results[0].segment_label_annotations
    for i, segment_label in enumerate(segment_labels):
        print('Video label description: {}'.format(
            segment_label.entity.description))
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))

        for i, segment in enumerate(segment_label.segments):
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
        print('\n')

    # Process shot level label annotations
    shot_labels = result.annotation_results[0].shot_label_annotations
    for i, shot_label in enumerate(shot_labels):
        print('Shot label description: {}'.format(
            shot_label.entity.description))
        for category_entity in shot_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))

        for i, shot in enumerate(shot_label.segments):
            start_time = (shot.segment.start_time_offset.seconds +
                          shot.segment.start_time_offset.nanos / 1e9)
            end_time = (shot.segment.end_time_offset.seconds +
                        shot.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = shot.confidence
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
        print('\n')

    # Process frame level label annotations
    frame_labels = result.annotation_results[0].frame_label_annotations
    for i, frame_label in enumerate(frame_labels):
        print('Frame label description: {}'.format(
            frame_label.entity.description))
        for category_entity in frame_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))

        # Each frame_label_annotation has many frames,
        # here we print information only about the first frame.
        frame = frame_label.frames[0]
        time_offset = frame.time_offset.seconds + frame.time_offset.nanos / 1e9
        print('\tFirst frame time offset: {}s'.format(time_offset))
        print('\tFirst frame confidence: {}'.format(frame.confidence))
        print('\n')
        
    return segment_labels, shot_labels, frame_labels
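
The seconds-plus-nanos arithmetic recurs in every loop above. As an optional sketch (didi_offset_seconds is a helper name introduced here, not part of the workshop code), it can be factored out:

# Optional helper sketch: convert a protobuf time offset (seconds + nanos)
# into float seconds, replacing the repeated inline arithmetic above.
def didi_offset_seconds(offset):
    return offset.seconds + offset.nanos / 1e9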

In [11]:
# video_file = 'reference/video_IPA.mp4'

In [12]:
didi_segment_labels, didi_shot_labels, didi_frame_labels = didi_video_label_detection(video_file)


Processing video for label annotations:

Finished processing.
Video label description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9247158169746399


Video label description: lego
	Label category description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9257180094718933


Video label description: robot
	Label category description: technology
	Label category description: machine
	Segment 0: 0.0s to 5.5s
	Confidence: 0.32479360699653625


Shot label description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9247158169746399


Shot label description: lego
	Label category description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9257180094718933


Shot label description: robot
	Label category description: technology
	Label category description: machine
	Segment 0: 0.0s to 5.5s
	Confidence: 0.32479360699653625



In [13]:
didi_segment_labels


Out[13]:
[entity {
  entity_id: "/m/0138tl"
  description: "toy"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
  confidence: 0.9247158169746399
}
, entity {
  entity_id: "/m/04ndr"
  description: "lego"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/0138tl"
  description: "toy"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
  confidence: 0.9257180094718933
}
, entity {
  entity_id: "/m/06fgw"
  description: "robot"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/07c1v"
  description: "technology"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/0dkw5"
  description: "machine"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
  confidence: 0.32479360699653625
}
]

In [14]:
didi_shot_labels


Out[14]:
[entity {
  entity_id: "/m/0138tl"
  description: "toy"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
  confidence: 0.9247158169746399
}
, entity {
  entity_id: "/m/04ndr"
  description: "lego"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/0138tl"
  description: "toy"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
  confidence: 0.9257180094718933
}
, entity {
  entity_id: "/m/06fgw"
  description: "robot"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/07c1v"
  description: "technology"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/0dkw5"
  description: "machine"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
  confidence: 0.32479360699653625
}
]

In [15]:
didi_frame_labels


Out[15]:
[]

* 识别视频的场景片段 (Shot Change Detection: Detect scene changes within the video)

https://cloud.google.com/video-intelligence/docs/shot_detection

didi_video_shot_detection()


In [16]:
from google.cloud import videointelligence

def didi_video_shot_detection(path):
    """ Detects camera shot changes given a local file path """

##################################################################
# (1) Instantiates a client - using GOOGLE_APPLICATION_CREDENTIALS
#     video_client = videointelligence.VideoIntelligenceServiceClient()

# 
# (2) Instantiates a client - using 'service account json' file
    video_client = videointelligence.VideoIntelligenceServiceClient.from_service_account_json(
        "/media/sf_vm_shared_folder/000-cloud-api-key/mtech-ai-7b7e049cf5f6.json")
##################################################################

    features = [videointelligence.enums.Feature.SHOT_CHANGE_DETECTION]
#     features = [videointelligence.enums.Feature.LABEL_DETECTION]

    with io.open(path, 'rb') as movie:
        input_content = movie.read()
    
#     operation = video_client.annotate_video(path, features=features)
    operation = video_client.annotate_video(features=features, input_content=input_content)
    print('\nProcessing video for shot change annotations:')

    result = operation.result(timeout=180)
    print('\nFinished processing.')

    for i, shot in enumerate(result.annotation_results[0].shot_annotations):
        start_time = (shot.start_time_offset.seconds +
                      shot.start_time_offset.nanos / 1e9)
        end_time = (shot.end_time_offset.seconds +
                    shot.end_time_offset.nanos / 1e9)
        print('\tShot {}: {} to {}'.format(i, start_time, end_time))
        
    return result

In [17]:
# video_file = 'reference/video_IPA.mp4'

In [18]:
didi_result = didi_video_shot_detection(video_file)


Processing video for shot change annotations:

Finished processing.
	Shot 0: 0.0 to 5.5

In [19]:
didi_result


Out[19]:
annotation_results {
  shot_annotations {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 500000000
    }
  }
}
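
The protobuf result above can be reduced to plain Python values for downstream use. An illustrative sketch (didi_shots is a name introduced here):

# Sketch: collect the shot boundaries as (start_seconds, end_seconds) tuples.
didi_shots = [
    (shot.start_time_offset.seconds + shot.start_time_offset.nanos / 1e9,
     shot.end_time_offset.seconds + shot.end_time_offset.nanos / 1e9)
    for shot in didi_result.annotation_results[0].shot_annotations]
# didi_shots -> [(0.0, 5.5)]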

* 识别受限内容 (Explicit Content Detection: Detect adult content within a video)

didi_video_safesearch_detection()


In [20]:
from google.cloud import videointelligence

def didi_video_safesearch_detection(path):
    """ Detects explicit content given a local file path. """

##################################################################
# (1) Instantiates a client - using GOOGLE_APPLICATION_CREDENTIALS
#     video_client = videointelligence.VideoIntelligenceServiceClient()

# 
# (2) Instantiates a client - using 'service account json' file
    video_client = videointelligence.VideoIntelligenceServiceClient.from_service_account_json(
        "/media/sf_vm_shared_folder/000-cloud-api-key/mtech-ai-7b7e049cf5f6.json")
##################################################################

    features = [videointelligence.enums.Feature.EXPLICIT_CONTENT_DETECTION]

    with io.open(path, 'rb') as movie:
        input_content = movie.read()
    
#     operation = video_client.annotate_video(path, features=features)
    operation = video_client.annotate_video(features=features, input_content=input_content)
    print('\nProcessing video for explicit content annotations:')

    result = operation.result(timeout=90)
    print('\nFinished processing.')

    likely_string = ("Unknown", "Very unlikely", "Unlikely", "Possible",
                     "Likely", "Very likely")

    # first result is retrieved because a single video was processed
    for frame in result.annotation_results[0].explicit_annotation.frames:
        frame_time = frame.time_offset.seconds + frame.time_offset.nanos / 1e9
        print('Time: {}s'.format(frame_time))
        print('\tpornography: {}'.format(
            likely_string[frame.pornography_likelihood]))
        
    return result

In [21]:
# video_file = 'reference/video_IPA.mp4'

In [22]:
didi_result = didi_video_safesearch_detection(video_file)


Processing video for explicit content annotations:

Finished processing.
Time: 0.070218s
	pornography: Very unlikely
Time: 1.262424s
	pornography: Very unlikely
Time: 2.265889s
	pornography: Very unlikely
Time: 3.2359999999999998s
	pornography: Very unlikely
Time: 4.288838s
	pornography: Very unlikely
Time: 5.358866s
	pornography: Very unlikely
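
Because pornography_likelihood is an enum ordered from Unknown (0) to Very likely (5), a chat-bot can gate on it before forwarding a video. An illustrative sketch (the POSSIBLE threshold is an assumed policy choice, not part of the workshop code):

# Sketch: flag frames rated POSSIBLE (3) or higher.
POSSIBLE = 3  # videointelligence.enums.Likelihood.POSSIBLE
flagged = [frame
           for frame in didi_result.annotation_results[0].explicit_annotation.frames
           if frame.pornography_likelihood >= POSSIBLE]
print('Frames flagged as possibly explicit:', len(flagged))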

[ Beta Features ] * 生成视频字幕 (Video Transcription BETA: Transcribes video content in English)

https://cloud.google.com/video-intelligence/docs/beta

Cloud Video Intelligence API includes the following beta features in version v1p1beta1:

Speech Transcription - the Video Intelligence API can transcribe speech to text from the audio in supported video files.


In [23]:
# Beta Features: videointelligence_v1p1beta1
from google.cloud import videointelligence_v1p1beta1 as videointelligence

def didi_video_speech_transcription(path):
    """Transcribe speech given a local file path."""

##################################################################
# (1) Instantiates a client - using GOOGLE_APPLICATION_CREDENTIALS
#     video_client = videointelligence.VideoIntelligenceServiceClient()

# 
# (2) Instantiates a client - using 'service account json' file
    video_client = videointelligence.VideoIntelligenceServiceClient.from_service_account_json(
        "/media/sf_vm_shared_folder/000-cloud-api-key/mtech-ai-7b7e049cf5f6.json")
##################################################################

    features = [videointelligence.enums.Feature.SPEECH_TRANSCRIPTION]

    with io.open(path, 'rb') as movie:
        input_content = movie.read()
    
    config = videointelligence.types.SpeechTranscriptionConfig(
        language_code='en-US',
        enable_automatic_punctuation=True)
    video_context = videointelligence.types.VideoContext(
        speech_transcription_config=config)

#     operation = video_client.annotate_video(
#         input_uri, 
#         features=features,
#         video_context=video_context)
    operation = video_client.annotate_video(
        features=features,
        input_content=input_content, 
        video_context=video_context)

    print('\nProcessing video for speech transcription.')

    result = operation.result(timeout=180)   
    
    # There is only one annotation_result since only
    # one video is processed.
    annotation_results = result.annotation_results[0]
    speech_transcription = annotation_results.speech_transcriptions[0]
    
    if not speech_transcription.alternatives:  # no transcript was produced for this video
        print('\nNOT FOUND: video for speech transcription.')
    else:
        alternative = speech_transcription.alternatives[0]
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}\n'.format(alternative.confidence))

        print('Word level information:')
        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time
            end_time = word_info.end_time
            print('\t{}s - {}s: {}'.format(
                start_time.seconds + start_time.nanos * 1e-9,
                end_time.seconds + end_time.nanos * 1e-9,
                word))

    return result

In [24]:
# video_file = 'reference/video_IPA.mp4'

In [25]:
didi_result = didi_video_speech_transcription(video_file)


Processing video for speech transcription.
Transcript: Hi everyone. It's great to meet you in intelligent process automation course.
Confidence: 0.8206615447998047

Word level information:
	0.0s - 0.30000000000000004s: Hi
	0.30000000000000004s - 0.8s: everyone.
	0.8s - 1.6s: It's
	1.6s - 1.7000000000000002s: great
	1.7000000000000002s - 2.1s: to
	2.1s - 2.2s: meet
	2.2s - 2.5s: you
	2.5s - 2.6s: in
	2.6s - 3.5s: intelligent
	3.5s - 4.1s: process
	4.1s - 4.7s: automation
	4.7s - 5.4s: course.

In [26]:
didi_result


Out[26]:
annotation_results {
  speech_transcriptions {
    alternatives {
      transcript: "Hi everyone. It\'s great to meet you in intelligent process automation course."
      confidence: 0.8206615447998047
      words {
        start_time {
        }
        end_time {
          nanos: 300000000
        }
        word: "Hi"
      }
      words {
        start_time {
          nanos: 300000000
        }
        end_time {
          nanos: 800000000
        }
        word: "everyone."
      }
      words {
        start_time {
          nanos: 800000000
        }
        end_time {
          seconds: 1
          nanos: 600000000
        }
        word: "It\'s"
      }
      words {
        start_time {
          seconds: 1
          nanos: 600000000
        }
        end_time {
          seconds: 1
          nanos: 700000000
        }
        word: "great"
      }
      words {
        start_time {
          seconds: 1
          nanos: 700000000
        }
        end_time {
          seconds: 2
          nanos: 100000000
        }
        word: "to"
      }
      words {
        start_time {
          seconds: 2
          nanos: 100000000
        }
        end_time {
          seconds: 2
          nanos: 200000000
        }
        word: "meet"
      }
      words {
        start_time {
          seconds: 2
          nanos: 200000000
        }
        end_time {
          seconds: 2
          nanos: 500000000
        }
        word: "you"
      }
      words {
        start_time {
          seconds: 2
          nanos: 500000000
        }
        end_time {
          seconds: 2
          nanos: 600000000
        }
        word: "in"
      }
      words {
        start_time {
          seconds: 2
          nanos: 600000000
        }
        end_time {
          seconds: 3
          nanos: 500000000
        }
        word: "intelligent"
      }
      words {
        start_time {
          seconds: 3
          nanos: 500000000
        }
        end_time {
          seconds: 4
          nanos: 100000000
        }
        word: "process"
      }
      words {
        start_time {
          seconds: 4
          nanos: 100000000
        }
        end_time {
          seconds: 4
          nanos: 700000000
        }
        word: "automation"
      }
      words {
        start_time {
          seconds: 4
          nanos: 700000000
        }
        end_time {
          seconds: 5
          nanos: 400000000
        }
        word: "course."
      }
    }
  }
}
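
The word-level timings above are exactly what subtitle formats need. A hedged sketch (to_srt is a name introduced here for illustration, assuming the didi_result structure shown in Out[26]) renders them as SRT captions:

def to_srt(result, words_per_line=6):
    """Render transcribed words as a minimal SRT subtitle string."""
    def fmt(offset):  # protobuf offset -> 'HH:MM:SS,mmm'
        total = offset.seconds + offset.nanos / 1e9
        hours, rem = divmod(int(total), 3600)
        minutes, secs = divmod(rem, 60)
        millis = int((total - int(total)) * 1000)
        return '{:02d}:{:02d}:{:02d},{:03d}'.format(hours, minutes, secs, millis)
    alternative = result.annotation_results[0].speech_transcriptions[0].alternatives[0]
    blocks = []
    for i in range(0, len(alternative.words), words_per_line):
        chunk = alternative.words[i:i + words_per_line]
        blocks.append('{}\n{} --> {}\n{}\n'.format(
            i // words_per_line + 1,
            fmt(chunk[0].start_time), fmt(chunk[-1].end_time),
            ' '.join(w.word for w in chunk)))
    return '\n'.join(blocks)

# print(to_srt(didi_result))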

Wrap the cloud APIs into functions for the conversational virtual assistant (VA):

Reuse the functions defined above.


In [35]:
def didi_video_processing(video_file):
    didi_video_reply  = u'[ Video 视频处理结果 ]\n\n'
    
    didi_video_reply += u'[ didi_video_label_detection 识别视频消息中的物体名字 ]\n\n' \
    + str(didi_video_label_detection(video_file)) + u'\n\n'
    
    didi_video_reply += u'[ didi_video_shot_detection 识别视频的场景片段 ]\n\n' \
    + str(didi_video_shot_detection(video_file)) + u'\n\n'
    
    didi_video_reply += u'[ didi_video_safesearch_detection 识别受限内容 ]\n\n' \
    + str(didi_video_safesearch_detection(video_file)) + u'\n\n'
    
    didi_video_reply += u'[ didi_video_speech_transcription 生成视频字幕 ]\n\n' \
    + str(didi_video_speech_transcription(video_file)) + u'\n\n'
    
    return didi_video_reply
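
One practical caveat: the concatenated protobuf dumps can become very long, and WeChat text replies are length-limited. A hedged sketch (didi_truncate and the 2000-character cap are assumptions introduced here, not an official WeChat limit):

# Sketch: cap the reply length before returning it to WeChat.
def didi_truncate(reply, max_chars=2000):  # assumed safe cap
    if len(reply) <= max_chars:
        return reply
    return reply[:max_chars] + u'\n...[truncated 截断]'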

In [37]:
# [Optional] Agile testing:
# parm_video_response = didi_video_processing(video_file)
# print(parm_video_response)


Processing video for label annotations:

Finished processing.
Video label description: lego
	Label category description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9257180094718933


Video label description: robot
	Label category description: technology
	Label category description: machine
	Segment 0: 0.0s to 5.5s
	Confidence: 0.32479360699653625


Video label description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9247158169746399


Shot label description: lego
	Label category description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9257180094718933


Shot label description: robot
	Label category description: technology
	Label category description: machine
	Segment 0: 0.0s to 5.5s
	Confidence: 0.32479360699653625


Shot label description: toy
	Segment 0: 0.0s to 5.5s
	Confidence: 0.9247158169746399



Processing video for shot change annotations:

Finished processing.
	Shot 0: 0.0 to 5.5

Processing video for explicit content annotations:

Finished processing.
Time: 0.070218s
	pornography: Very unlikely
Time: 1.262424s
	pornography: Very unlikely
Time: 2.265889s
	pornography: Very unlikely
Time: 3.2359999999999998s
	pornography: Very unlikely
Time: 4.288838s
	pornography: Very unlikely
Time: 5.358866s
	pornography: Very unlikely

Processing video for speech transcription.
Transcript: Hi everyone. It's great to meet you in intelligent process automation course.
Confidence: 0.8206615447998047

Word level information:
	0.0s - 0.30000000000000004s: Hi
	0.30000000000000004s - 0.8s: everyone.
	0.8s - 1.6s: It's
	1.6s - 1.7000000000000002s: great
	1.7000000000000002s - 2.1s: to
	2.1s - 2.2s: meet
	2.2s - 2.5s: you
	2.5s - 2.6s: in
	2.6s - 3.5s: intelligent
	3.5s - 4.1s: process
	4.1s - 4.7s: automation
	4.7s - 5.4s: course.
Out[37]:
'[ Video 视频处理结果 ]\n\n[ didi_video_label_detection 识别视频消息中的物体名字 ]\n\n([entity {\n  entity_id: "/m/04ndr"\n  description: "lego"\n  language_code: "en-US"\n}\ncategory_entities {\n  entity_id: "/m/0138tl"\n  description: "toy"\n  language_code: "en-US"\n}\nsegments {\n  segment {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n  confidence: 0.9257180094718933\n}\n, entity {\n  entity_id: "/m/06fgw"\n  description: "robot"\n  language_code: "en-US"\n}\ncategory_entities {\n  entity_id: "/m/07c1v"\n  description: "technology"\n  language_code: "en-US"\n}\ncategory_entities {\n  entity_id: "/m/0dkw5"\n  description: "machine"\n  language_code: "en-US"\n}\nsegments {\n  segment {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n  confidence: 0.32479360699653625\n}\n, entity {\n  entity_id: "/m/0138tl"\n  description: "toy"\n  language_code: "en-US"\n}\nsegments {\n  segment {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n  confidence: 0.9247158169746399\n}\n], [entity {\n  entity_id: "/m/04ndr"\n  description: "lego"\n  language_code: "en-US"\n}\ncategory_entities {\n  entity_id: "/m/0138tl"\n  description: "toy"\n  language_code: "en-US"\n}\nsegments {\n  segment {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n  confidence: 0.9257180094718933\n}\n, entity {\n  entity_id: "/m/06fgw"\n  description: "robot"\n  language_code: "en-US"\n}\ncategory_entities {\n  entity_id: "/m/07c1v"\n  description: "technology"\n  language_code: "en-US"\n}\ncategory_entities {\n  entity_id: "/m/0dkw5"\n  description: "machine"\n  language_code: "en-US"\n}\nsegments {\n  segment {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n  confidence: 0.32479360699653625\n}\n, entity {\n  entity_id: "/m/0138tl"\n  description: "toy"\n  language_code: "en-US"\n}\nsegments {\n  segment {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n  confidence: 0.9247158169746399\n}\n], [])\n\n[ didi_video_shot_detection 识别视频的场景片段 ]\n\nannotation_results {\n  shot_annotations {\n    start_time_offset {\n    }\n    end_time_offset {\n      seconds: 5\n      nanos: 500000000\n    }\n  }\n}\n\n\n[ didi_video_safesearch_detection 识别受限内容 ]\n\nannotation_results {\n  explicit_annotation {\n    frames {\n      time_offset {\n        nanos: 70218000\n      }\n      pornography_likelihood: VERY_UNLIKELY\n    }\n    frames {\n      time_offset {\n        seconds: 1\n        nanos: 262424000\n      }\n      pornography_likelihood: VERY_UNLIKELY\n    }\n    frames {\n      time_offset {\n        seconds: 2\n        nanos: 265889000\n      }\n      pornography_likelihood: VERY_UNLIKELY\n    }\n    frames {\n      time_offset {\n        seconds: 3\n        nanos: 236000000\n      }\n      pornography_likelihood: VERY_UNLIKELY\n    }\n    frames {\n      time_offset {\n        seconds: 4\n        nanos: 288838000\n      }\n      pornography_likelihood: VERY_UNLIKELY\n    }\n    frames {\n      time_offset {\n        seconds: 5\n        nanos: 358866000\n      }\n      pornography_likelihood: VERY_UNLIKELY\n    }\n  }\n}\n\n\n[ didi_video_speech_transcription 生成视频字幕 ]\n\nannotation_results {\n  speech_transcriptions {\n    alternatives {\n      transcript: "Hi everyone. 
It\\\'s great to meet you in intelligent process automation course."\n      confidence: 0.8206615447998047\n      words {\n        start_time {\n        }\n        end_time {\n          nanos: 300000000\n        }\n        word: "Hi"\n      }\n      words {\n        start_time {\n          nanos: 300000000\n        }\n        end_time {\n          nanos: 800000000\n        }\n        word: "everyone."\n      }\n      words {\n        start_time {\n          nanos: 800000000\n        }\n        end_time {\n          seconds: 1\n          nanos: 600000000\n        }\n        word: "It\\\'s"\n      }\n      words {\n        start_time {\n          seconds: 1\n          nanos: 600000000\n        }\n        end_time {\n          seconds: 1\n          nanos: 700000000\n        }\n        word: "great"\n      }\n      words {\n        start_time {\n          seconds: 1\n          nanos: 700000000\n        }\n        end_time {\n          seconds: 2\n          nanos: 100000000\n        }\n        word: "to"\n      }\n      words {\n        start_time {\n          seconds: 2\n          nanos: 100000000\n        }\n        end_time {\n          seconds: 2\n          nanos: 200000000\n        }\n        word: "meet"\n      }\n      words {\n        start_time {\n          seconds: 2\n          nanos: 200000000\n        }\n        end_time {\n          seconds: 2\n          nanos: 500000000\n        }\n        word: "you"\n      }\n      words {\n        start_time {\n          seconds: 2\n          nanos: 500000000\n        }\n        end_time {\n          seconds: 2\n          nanos: 600000000\n        }\n        word: "in"\n      }\n      words {\n        start_time {\n          seconds: 2\n          nanos: 600000000\n        }\n        end_time {\n          seconds: 3\n          nanos: 500000000\n        }\n        word: "intelligent"\n      }\n      words {\n        start_time {\n          seconds: 3\n          nanos: 500000000\n        }\n        end_time {\n          seconds: 4\n          nanos: 100000000\n        }\n        word: "process"\n      }\n      words {\n        start_time {\n          seconds: 4\n          nanos: 100000000\n        }\n        end_time {\n          seconds: 4\n          nanos: 700000000\n        }\n        word: "automation"\n      }\n      words {\n        start_time {\n          seconds: 4\n          nanos: 700000000\n        }\n        end_time {\n          seconds: 5\n          nanos: 400000000\n        }\n        word: "course."\n      }\n    }\n  }\n}\n\n\n'

Define a global variable for future 'video search' function enhancement


In [38]:
parm_video_response = {} # Define a global variable for future 'video search' function enhancement

Start interactive conversational virtual assistant (VA):

Import ItChat, etc. 导入需要用到的一些功能程序库:


In [39]:
import itchat
from itchat.content import *

Log in using QR code image / 用微信App扫QR码图片来自动登录


In [43]:
# itchat.auto_login(hotReload=True) # hotReload=True: 退出程序后暂存登陆状态 (cache the login state after exit, so restarting within a short time does not require rescanning the QR code)
itchat.auto_login(enableCmdQR=-2) # enableCmdQR=-2: 命令行显示QR图片 (render the QR code in the command line)


Getting uuid of QR code.
Downloading QR code.
                                                                              
[ QR code rendered here as ASCII art: scan it with the WeChat mobile app to log in ]
                                                                              
Please scan the QR code to log in.
Please press confirm on your phone.
Loading the contact, this may take a little while.
Login successfully as 白黑

In [44]:
# @itchat.msg_register([VIDEO], isGroupChat=True)
@itchat.msg_register([VIDEO])
def download_files(msg):
    msg.download(msg.fileName)
    print('\nDownloaded video file name is: %s' % msg['FileName'])
    
    ##############################################################################################################
    #                                          call video analysis APIs                                          #
    ##############################################################################################################
    global parm_video_response # save into global variable, which can be accessed by next WeChat keyword search
    
    # python 2 version WeChat Bot
    #  parm_video_response = KudosData_VIDEO_DETECTION(encode_media(msg['FileName']))
    
    # python 3 version WeChat Bot
    parm_video_response = didi_video_processing(msg['FileName'])

    
    ##############################################################################################################
    #                                          format video API results                                          #
    ##############################################################################################################
    
    # python 2 version WeChat Bot
    # video_analysis_reply = KudosData_video_generate_reply(parm_video_response)

    # python 3 version WeChat Bot
    video_analysis_reply = parm_video_response # Exercise / Workshop Enhancement: parse and format the result nicely.
    
    
    print ('')
    print(video_analysis_reply)
    return video_analysis_reply

In [45]:
itchat.run()


Start auto replying.
Downloaded video file name is: 181028-160905.mp4

Processing video for label annotations:

Finished processing.
Video label description: fisheye lens
	Label category description: photography
	Segment 0: 0.0s to 5.675675s
	Confidence: 0.919499397277832


Video label description: toy
	Segment 0: 0.0s to 5.675675s
	Confidence: 0.41455480456352234


Shot label description: fisheye lens
	Label category description: photography
	Segment 0: 0.0s to 5.675675s
	Confidence: 0.9222605228424072


Shot label description: toy
	Segment 0: 0.0s to 5.675675s
	Confidence: 0.34078821539878845



Processing video for shot change annotations:

Finished processing.
	Shot 0: 0.0 to 5.675675

Processing video for explicit content annotations:

Finished processing.
Time: 0.204446s
	pornography: Very unlikely
Time: 1.22106s
	pornography: Very unlikely
Time: 2.311796s
	pornography: Very unlikely
Time: 3.504806s
	pornography: Very unlikely
Time: 4.3297229999999995s
	pornography: Very unlikely
Time: 5.282879s
	pornography: Very unlikely

Processing video for speech transcription.

NOT FOUND: video for speech transcription.

[ Video 视频处理结果 ]

[ didi_video_label_detection ]

([entity {
  entity_id: "/m/03bdrd"
  description: "fisheye lens"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/05wkw"
  description: "photography"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 675675000
    }
  }
  confidence: 0.919499397277832
}
, entity {
  entity_id: "/m/0138tl"
  description: "toy"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 675675000
    }
  }
  confidence: 0.41455480456352234
}
], [entity {
  entity_id: "/m/03bdrd"
  description: "fisheye lens"
  language_code: "en-US"
}
category_entities {
  entity_id: "/m/05wkw"
  description: "photography"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 675675000
    }
  }
  confidence: 0.9222605228424072
}
, entity {
  entity_id: "/m/0138tl"
  description: "toy"
  language_code: "en-US"
}
segments {
  segment {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 675675000
    }
  }
  confidence: 0.34078821539878845
}
], [])

[ didi_video_shot_detection ]

annotation_results {
  shot_annotations {
    start_time_offset {
    }
    end_time_offset {
      seconds: 5
      nanos: 675675000
    }
  }
}


[ didi_video_safesearch_detection ]

annotation_results {
  explicit_annotation {
    frames {
      time_offset {
        nanos: 204446000
      }
      pornography_likelihood: VERY_UNLIKELY
    }
    frames {
      time_offset {
        seconds: 1
        nanos: 221060000
      }
      pornography_likelihood: VERY_UNLIKELY
    }
    frames {
      time_offset {
        seconds: 2
        nanos: 311796000
      }
      pornography_likelihood: VERY_UNLIKELY
    }
    frames {
      time_offset {
        seconds: 3
        nanos: 504806000
      }
      pornography_likelihood: VERY_UNLIKELY
    }
    frames {
      time_offset {
        seconds: 4
        nanos: 329723000
      }
      pornography_likelihood: VERY_UNLIKELY
    }
    frames {
      time_offset {
        seconds: 5
        nanos: 282879000
      }
      pornography_likelihood: VERY_UNLIKELY
    }
  }
}


[ didi_video_speech_transcription ]

annotation_results {
  speech_transcriptions {
  }
}



Bye~


In [47]:
# interrupt the kernel, then log out
itchat.logout() # 安全退出


Out[47]:
<ItchatReturnValue: {'BaseResponse': {'ErrMsg': '请求成功', 'Ret': 0, 'RawMsg': 'logout successfully.'}}>
LOG OUT!

Exercise / Workshop Enhancement:

[提问 1] 使用文字来搜索视频内容?需要怎么处理? [Question 1] Can we use text (keywords) as input to search video content? How?

[提问 2] 使用图片来搜索视频内容?需要怎么处理? [Question 2] Can we use an image as input to search video content? How?


In [ ]:
'''

# Private conversational mode / 单聊模式: 基于关键词进行视频搜索 (keyword-based video search)
@itchat.msg_register([TEXT])
def text_reply(msg):
    list_keywords = [x.strip() for x in msg['Text'].split(',')]
    # Call the video search function:
    search_responses = KudosData_search(list_keywords) # returns a list
    # Format the search results:
    search_reply = u'[ Video Search 视频搜索结果 ]' + '\n'
    if len(search_responses) == 0:
        search_reply += u'[ Nil 无结果 ]'
    else:
        for response in search_responses:
            search_reply += '\n' + str(response)
    print('')
    print(search_reply)
    return search_reply

'''

In [ ]:
'''

# Group conversational mode / 群聊模式: 基于关键词进行视频搜索 (keyword-based video search)
@itchat.msg_register([TEXT], isGroupChat=True)
def text_reply(msg):
    if msg['isAt']:
        list_keywords = [x.strip() for x in msg['Text'].split(',')]
        # Call the video search function:
        search_responses = KudosData_search(list_keywords) # returns a list
        # Format the search results:
        search_reply = u'[ Video Search 视频搜索结果 ]' + '\n'
        if len(search_responses) == 0:
            search_reply += u'[ Nil 无结果 ]'
        else:
            for response in search_responses:
                search_reply += '\n' + str(response)
        print('')
        print(search_reply)
        return search_reply

'''
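
For [Question 1], one possible direction (purely illustrative; didi_search is introduced here, while KudosData_search above is left as the exercise): match the incoming keywords against the label text that the video handler saved into the global parm_video_response.

# Sketch: naive keyword search over the last video-analysis reply.
def didi_search(list_keywords):
    text = str(parm_video_response).lower()
    return [kw for kw in list_keywords if kw.lower() in text]

# Example: didi_search(['lego', 'cat']) -> ['lego'] after the sample video above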

恭喜您!已经完成了: (Congratulations! You have completed:)

第五课:视频识别和处理

Lesson 5: Video Recognition & Processing

  • 识别视频消息中的物体名字 (Label Detection: Detect entities within the video, such as "dog", "flower" or "car")
  • 识别视频的场景片段 (Shot Change Detection: Detect scene changes within the video)
  • 识别受限内容 (Explicit Content Detection: Detect adult content within a video)
  • 生成视频字幕 (Video Transcription BETA: Transcribes video content in English)

下一课是: (Next lesson:)

第六课:交互式虚拟助手的智能应用

Lesson 6: Interactive Conversational Virtual Assistant Applications / Intelligent Process Automations

  • 虚拟员工: 贷款填表申请审批一条龙自动化流程 (Virtual Worker: When Chat-bot meets RPA-bot for mortgage loan application automation)
  • 虚拟员工: 文字指令交互(Conversational automation using text/message command)
  • 虚拟员工: 语音指令交互(Conversational automation using speech/voice command)
  • 虚拟员工: 多种语言交互(Conversational automation with multiple languages)