In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('FPgo-hI7OiE')


Out[1]:

如何使用和开发微信聊天机器人的系列教程

A workshop to develop & use an intelligent and interactive chat-bot in WeChat

http://www.KudosData.com

by: Sam.Gu@KudosData.com

May 2017 ========== Scan the QR code to become trainer's friend in WeChat ========>>

第二课:图像识别和处理

Lesson 2: Image Recognition & Processing

  • 识别图片消息中的物体名字 (Recognize objects in image)

      [1] 物体名 (General Object)
      [2] 地标名 (Landmark Object)
      [3] 商标名 (Logo Object)
  • 识别图片消息中的文字 (OCR: Extract text from image)

      包含简单文本翻译 (Call text translation API)
  • 识别人脸 (Recognize human face)

      基于人脸的表情来识别喜怒哀乐等情绪 (Identify sentiment and emotion from human face)
  • 不良内容识别 (Explicit Content Detection)

Using Google Cloud Platform's Machine Learning APIs

From the same API console, choose "Dashboard" on the left-hand menu and "Enable API".

Enable the following APIs for your project (search for them) if they are not already enabled:

  1. Google Translate API
  2. Google Cloud Vision API
  3. Google Natural Language API
  4. Google Cloud Speech API

Finally, because we are calling the APIs from Python (clients in many other languages are available), let's install the Python package (it's not installed by default on Datalab)


In [ ]:
# Copyright 2016 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License"); 
# !pip install --upgrade google-api-python-client

导入需要用到的一些功能程序库:


In [1]:
import io, os, subprocess, sys, time, datetime, requests, itchat
from itchat.content import *
from googleapiclient.discovery import build


Using Google Cloud Platform's Machine Learning APIs

First, visit API console, choose "Credentials" on the left-hand menu. Choose "Create Credentials" and generate an API key for your application. You should probably restrict it by IP address to prevent abuse, but for now, just leave that field blank and delete the API key after trying out this demo.

Copy-paste your API Key here:


In [2]:
# Here I read in my own API_KEY from a file, which is not shared in Github repository:
# with io.open('../../API_KEY.txt') as fp: 
#     for line in fp: APIKEY = line

# You need to un-comment below line and replace 'APIKEY' variable with your own GCP API key:
APIKEY='AIzaSyCvxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

In [3]:
# Below is for GCP Language Tranlation API
service = build('translate', 'v2', developerKey=APIKEY)

图片二进制base64码转换 (Define image pre-processing functions)


In [4]:
# Import the base64 encoding library.
import base64
# Pass the image data to an encoding function.
def encode_image(image_file):
    with open(image_file, "rb") as image_file:
        image_content = image_file.read()
    return base64.b64encode(image_content)

机器智能API接口控制参数 (Define control parameters for API)


In [5]:
# control parameter for Image API:
parm_image_maxResults = 10 # max objects or faces to be extracted from image analysis

# control parameter for Language Translation API:
parm_translation_origin_language = '' # original language in text: to be overwriten by TEXT_DETECTION
parm_translation_target_language = 'zh' # target language for translation: Chinese

* 识别图片消息中的物体名字 (Recognize objects in image)

[1] 物体名 (General Object)

In [6]:
# Running Vision API
# 'LABEL_DETECTION'
def KudosData_LABEL_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 物体识别 ]\n'
    # 'LABEL_DETECTION'
    if responses['responses'][0] != {}:
        for i in range(len(responses['responses'][0]['labelAnnotations'])):
            image_analysis_reply += responses['responses'][0]['labelAnnotations'][i]['description'] \
            + '\n( confidence ' +  str(responses['responses'][0]['labelAnnotations'][i]['score']) + ' )\n'
    else:
        image_analysis_reply += u'[ Nill 无结果 ]\n'
        
    return image_analysis_reply

* 识别图片消息中的物体名字 (Recognize objects in image)

[2] 地标名 (Landmark Object)

In [7]:
# Running Vision API
# 'LANDMARK_DETECTION'
def KudosData_LANDMARK_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 地标识别 ]\n'
    # 'LANDMARK_DETECTION'
    if responses['responses'][0] != {}:
        for i in range(len(responses['responses'][0]['landmarkAnnotations'])):
            image_analysis_reply += responses['responses'][0]['landmarkAnnotations'][i]['description'] \
            + '\n( confidence ' +  str(responses['responses'][0]['landmarkAnnotations'][i]['score']) + ' )\n'
    else:
        image_analysis_reply += u'[ Nill 无结果 ]\n'
        
    return image_analysis_reply

* 识别图片消息中的物体名字 (Recognize objects in image)

[3] 商标名 (Logo Object)

In [8]:
# Running Vision API
# 'LOGO_DETECTION'
def KudosData_LOGO_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 商标识别 ]\n'
    # 'LOGO_DETECTION'
    if responses['responses'][0] != {}:
        for i in range(len(responses['responses'][0]['logoAnnotations'])):
            image_analysis_reply += responses['responses'][0]['logoAnnotations'][i]['description'] \
            + '\n( confidence ' +  str(responses['responses'][0]['logoAnnotations'][i]['score']) + ' )\n'
    else:
        image_analysis_reply += u'[ Nill 无结果 ]\n'
        
    return image_analysis_reply

* 识别图片消息中的文字 (OCR: Extract text from image)


In [9]:
# Running Vision API
# 'TEXT_DETECTION'
def KudosData_TEXT_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 文字提取 ]\n'
    # 'TEXT_DETECTION'
    if responses['responses'][0] != {}:
        image_analysis_reply += u'----- Start Original Text -----\n'
        image_analysis_reply += u'( Original Language 原文: ' + responses['responses'][0]['textAnnotations'][0]['locale'] \
        + ' )\n'        
        image_analysis_reply += responses['responses'][0]['textAnnotations'][0]['description'] + '----- End Original Text -----\n'

        ##############################################################################################################
        #                                        translation of detected text                                        #
        ##############################################################################################################
        parm_translation_origin_language = responses['responses'][0]['textAnnotations'][0]['locale']
        # Call translation if parm_translation_origin_language is not parm_translation_target_language
        if parm_translation_origin_language != parm_translation_target_language:
            inputs=[responses['responses'][0]['textAnnotations'][0]['description']] # TEXT_DETECTION OCR results only
            outputs = service.translations().list(source=parm_translation_origin_language, 
                                                  target=parm_translation_target_language, q=inputs).execute()
            image_analysis_reply += u'\n----- Start Translation -----\n'
            image_analysis_reply += u'( Target Language 译文: ' + parm_translation_target_language + ' )\n'
            image_analysis_reply += outputs['translations'][0]['translatedText'] + '\n' + '----- End Translation -----\n'
            print('Compeleted: Translation    API ...')
        ##############################################################################################################
    else:
        image_analysis_reply += u'[ Nill 无结果 ]\n'
        
    return image_analysis_reply

* 识别人脸 (Recognize human face)

* 基于人脸的表情来识别喜怒哀乐等情绪 (Identify sentiment and emotion from human face)


In [10]:
# Running Vision API
# 'FACE_DETECTION'
def KudosData_FACE_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 面部表情 ]\n'
    # 'FACE_DETECTION'
    if responses['responses'][0] != {}:
        for i in range(len(responses['responses'][0]['faceAnnotations'])):
            image_analysis_reply += u'----- No.' + str(i+1) + ' Face -----\n'
            
            image_analysis_reply += u'>>> Joy 喜悦: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'joyLikelihood'] + '\n'
            
            image_analysis_reply += u'>>> Anger 愤怒: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'angerLikelihood'] + '\n'
            
            image_analysis_reply += u'>>> Sorrow 悲伤: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'sorrowLikelihood'] + '\n'
            
            image_analysis_reply += u'>>> Surprise 惊奇: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'surpriseLikelihood'] + '\n'
            
            image_analysis_reply += u'>>> Headwear 头饰: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'headwearLikelihood'] + '\n'
            
            image_analysis_reply += u'>>> Blurred 模糊: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'blurredLikelihood'] + '\n'
            
            image_analysis_reply += u'>>> UnderExposed 欠曝光: \n' \
            + responses['responses'][0]['faceAnnotations'][i][u'underExposedLikelihood'] + '\n'
    else:
        image_analysis_reply += u'[ Nill 无结果 ]\n'
            
    return image_analysis_reply

* 不良内容识别 (Explicit Content Detection)

Detect explicit content like adult content or violent content within an image.


In [11]:
# Running Vision API
# 'SAFE_SEARCH_DETECTION'
def KudosData_SAFE_SEARCH_DETECTION(image_base64, API_type, maxResults):
    vservice = build('vision', 'v1', developerKey=APIKEY)
    request = vservice.images().annotate(body={
        'requests': [{
                'image': {
#                     'source': {
#                         'gcs_image_uri': IMAGE
#                     }
                      "content": image_base64
                },
                'features': [{
                    'type': API_type,
                    'maxResults': maxResults,
                }]
            }],
        })
    responses = request.execute(num_retries=3)
    image_analysis_reply = u'\n[ ' + API_type + u' 不良内容 ]\n'
    # 'SAFE_SEARCH_DETECTION'
    if responses['responses'][0] != {}:
        image_analysis_reply += u'>>> Adult 成人: \n' + responses['responses'][0]['safeSearchAnnotation'][u'adult'] + '\n'
        image_analysis_reply += u'>>> Violence 暴力: \n' + responses['responses'][0]['safeSearchAnnotation'][u'violence'] + '\n'
        image_analysis_reply += u'>>> Spoof 欺骗: \n' + responses['responses'][0]['safeSearchAnnotation'][u'spoof'] + '\n'
        image_analysis_reply += u'>>> Medical 医疗: \n' + responses['responses'][0]['safeSearchAnnotation'][u'medical'] + '\n'
    else:
        image_analysis_reply += u'[ Nill 无结果 ]\n'
    return image_analysis_reply

用微信App扫QR码图片来自动登录


In [12]:
itchat.auto_login(hotReload=True) # hotReload=True: 退出程序后暂存登陆状态。即使程序关闭,一定时间内重新开启也可以不用重新扫码。
# itchat.auto_login(enableCmdQR=-2) # enableCmdQR=-2: 命令行显示QR图片


Getting uuid of QR code.
Downloading QR code.
Please scan the QR code to log in.
www.KudosData.com : QR.png
Please press confirm on your phone.
Loading the contact, this may take a little while.
Login successfully as 酷豆陪聊妹

In [13]:
# @itchat.msg_register([PICTURE], isGroupChat=True)
@itchat.msg_register([PICTURE])
def download_files(msg):
    parm_translation_origin_language = 'zh' # will be overwriten by TEXT_DETECTION
    msg.download(msg.fileName)
    print('\nDownloaded image file name is: %s' % msg['FileName'])
    image_base64 = encode_image(msg['FileName'])
    
    ##############################################################################################################
    #                                          call image analysis APIs                                          #
    ##############################################################################################################
    
    image_analysis_reply = u'[ Image Analysis 图像分析结果 ]\n'

    # 1. LABEL_DETECTION:
    image_analysis_reply += KudosData_LABEL_DETECTION(image_base64, 'LABEL_DETECTION', parm_image_maxResults)
    # 2. LANDMARK_DETECTION:
    image_analysis_reply += KudosData_LANDMARK_DETECTION(image_base64, 'LANDMARK_DETECTION', parm_image_maxResults)
    # 3. LOGO_DETECTION:
    image_analysis_reply += KudosData_LOGO_DETECTION(image_base64, 'LOGO_DETECTION', parm_image_maxResults)
    # 4. TEXT_DETECTION:
    image_analysis_reply += KudosData_TEXT_DETECTION(image_base64, 'TEXT_DETECTION', parm_image_maxResults)
    # 5. FACE_DETECTION:
    image_analysis_reply += KudosData_FACE_DETECTION(image_base64, 'FACE_DETECTION', parm_image_maxResults)
    # 6. SAFE_SEARCH_DETECTION:
    image_analysis_reply += KudosData_SAFE_SEARCH_DETECTION(image_base64, 'SAFE_SEARCH_DETECTION', parm_image_maxResults)

    print('Compeleted: Image Analysis API ...')
    
    return image_analysis_reply

In [14]:
itchat.run()


Start auto replying.
Downloaded image file name is: 170518-044004.png
Compeleted: Translation    API ...
Compeleted: Image Analysis API ...
Bye~

In [15]:
# interupt kernel, then logout
itchat.logout() # 安全退出


Out[15]:
<ItchatReturnValue: {'BaseResponse': {'Ret': 0, 'ErrMsg': u'\u8bf7\u6c42\u6210\u529f', 'RawMsg': 'logout successfully.'}}>
LOG OUT!

恭喜您!已经完成了:

第二课:图像识别和处理

Lesson 2: Image Recognition & Processing

  • 识别图片消息中的物体名字 (Recognize objects in image)

      [1] 物体名 (General Object)
      [2] 地标名 (Landmark Object)
      [3] 商标名 (Logo Object)
  • 识别图片消息中的文字 (OCR: Extract text from image)

      包含简单文本翻译 (Call text translation API)
  • 识别人脸 (Recognize human face)

      基于人脸的表情来识别喜怒哀乐等情绪 (Identify sentiment and emotion from human face)
  • 不良内容识别 (Explicit Content Detection)

下一课是:

第三课:自然语言处理:语音合成和识别

Lesson 3: Natural Language Processing 1

  • 消息文字转成语音 (Speech synthesis: text to voice)
  • 语音转换成消息文字 (Speech recognition: voice to text)
  • 消息文字的多语言互译 (Text based language translation)