Lesson 47:

Checking your Email Inbox

Python can be used to read emails.

The standard protocol for reading email is the 'Internet Message Access Protocol' (IMAP). IMAP is a fairly complex protocol, and is nearly as old as the internet itself. Python includes a module called imaplib, but it's not very straightforward.

Instead we will use the imapclient and pyzmail modules. Unfortunately, pyzmail is not available for Python 3.4, so we will be using the standard library's email module to reproduce its functionality.


In [1]:
import imapclient
import email

The first step in setting up email is again creating a connection object, but this time for the IMAP server.

imapclient.IMAPClient() takes a domain name and an optional parameter indicating whether to use SSL encryption (the predecessor of TLS, which was discussed in Lesson 45). The domain name again depends on your email provider; an example list is available on Mr. Sweigart's blog in 16-2.


In [2]:
conn = imapclient.IMAPClient('imap.gmail.com', ssl=True)
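
As a rough illustration of what the ssl flag implies, the sketch below uses only the standard library: imaplib exposes the conventional IMAP port numbers as constants, and ssl.create_default_context() builds a context that verifies the server certificate. The helper name default_imap_port is hypothetical, and nothing here opens a network connection.

```python
import imaplib
import ssl


def default_imap_port(use_ssl):
    """Return the conventional IMAP port for an encrypted or plain connection."""
    # imaplib defines both well-known ports as module constants.
    return imaplib.IMAP4_SSL_PORT if use_ssl else imaplib.IMAP4_PORT


# A default context verifies the server certificate and hostname.
context = ssl.create_default_context()

print(default_imap_port(True))   # port used for IMAP over SSL/TLS
print(default_imap_port(False))  # port used for unencrypted IMAP
print(context.check_hostname)
```

With ssl=True, a client would normally connect on the first of these ports unless a different one is given explicitly.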

We can now pass in our login parameters via the .login() method.


In [4]:
# Real values were used in testing, and removed for Github
# Due to the nature of Gmail's security, you may have to allow access from 'less secure apps' (like this script)
# The setting can be changed here: https://www.google.com/settings/u/2/security/lesssecureapps

conn.login('youremail@gmail.com','yourpassword')


Out[4]:
b'vvk.mnn@gmail.com authenticated (Success)'

We get a byte string with a response if we log in correctly.
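
Rather than typing a real password into the script, one option is to read the credentials from environment variables. This is a minimal sketch, assuming two hypothetical variable names, IMAP_USER and IMAP_PASSWORD; the helper load_credentials is also made up for illustration and is not part of imapclient.

```python
import os


def load_credentials(environ=os.environ):
    """Return (user, password) from the hypothetical IMAP_USER / IMAP_PASSWORD variables."""
    try:
        return environ['IMAP_USER'], environ['IMAP_PASSWORD']
    except KeyError as missing:
        raise RuntimeError('missing environment variable: %s' % missing.args[0])


# Demonstration with a stand-in mapping instead of the real environment.
user, password = load_credentials({'IMAP_USER': 'youremail@gmail.com',
                                   'IMAP_PASSWORD': 'yourpassword'})
print(user)
# With a real connection this would become: conn.login(user, password)
```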

Next we choose which email folder to access with the .select_folder() method. You will typically want the 'INBOX' folder. We also pass readonly=True to make sure we don't accidentally delete something. The .list_folders() method lists all available folders, and setting readonly to False allows messages to actually be edited.


In [31]:
conn.list_folders()


Out[31]:
[((b'\\HasNoChildren',), b'/', 'INBOX'),
 ((b'\\HasNoChildren',), b'/', 'Notes'),
 ((b'\\HasChildren', b'\\Noselect'), b'/', '[Gmail]'),
 ((b'\\All', b'\\HasNoChildren'), b'/', '[Gmail]/All Mail'),
 ((b'\\Drafts', b'\\HasNoChildren'), b'/', '[Gmail]/Drafts'),
 ((b'\\HasNoChildren', b'\\Important'), b'/', '[Gmail]/Important'),
 ((b'\\HasNoChildren', b'\\Sent'), b'/', '[Gmail]/Sent Mail'),
 ((b'\\HasNoChildren', b'\\Junk'), b'/', '[Gmail]/Spam'),
 ((b'\\Flagged', b'\\HasNoChildren'), b'/', '[Gmail]/Starred'),
 ((b'\\HasNoChildren', b'\\Trash'), b'/', '[Gmail]/Trash')]
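
Each entry in that listing is a (flags, delimiter, name) tuple, so the folder names can be pulled out with ordinary unpacking. The sketch below works on a few entries copied from the output above; the helper name selectable_folders is hypothetical, and it assumes a folder flagged \Noselect (such as '[Gmail]') cannot be passed to .select_folder().

```python
# A few entries copied from the .list_folders() output above.
folders = [
    ((b'\\HasNoChildren',), b'/', 'INBOX'),
    ((b'\\HasChildren', b'\\Noselect'), b'/', '[Gmail]'),
    ((b'\\HasNoChildren', b'\\Trash'), b'/', '[Gmail]/Trash'),
]


def selectable_folders(listing):
    """Return the names of folders that do not carry the Noselect flag."""
    return [name for flags, delimiter, name in listing
            if b'\\Noselect' not in flags]


names = selectable_folders(folders)
print(names)  # ['INBOX', '[Gmail]/Trash']
```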

In [5]:
conn.select_folder('INBOX', readonly=True)


Out[5]:
{b'EXISTS': 33841,
 b'FLAGS': (b'\\Answered',
  b'\\Flagged',
  b'\\Draft',
  b'\\Deleted',
  b'\\Seen',
  b'$Forwarded',
  b'$Junk',
  b'$NotJunk',
  b'$NotPhishing',
  b'$Phishing',
  b'Junk',
  b'JunkRecorded',
  b'NotJunk'),
 b'HIGHESTMODSEQ': 11092866,
 b'PERMANENTFLAGS': (),
 b'READ-ONLY': [b''],
 b'RECENT': 0,
 b'UIDNEXT': 88498,
 b'UIDVALIDITY': 2}

We get a dictionary keyed by byte strings if the folder has been selected correctly.
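
Because the keys are byte strings, individual values have to be looked up with a b prefix. A small sketch on a trimmed copy of the .select_folder() response shown above; it assumes the b'READ-ONLY' key is present only when the folder was opened with readonly=True.

```python
# A trimmed copy of the .select_folder() response shown above.
folder_info = {b'EXISTS': 33841, b'RECENT': 0,
               b'READ-ONLY': [b''], b'UIDNEXT': 88498}

# Keys are bytes, so the b prefix is required; a plain str key is not found.
message_count = folder_info[b'EXISTS']
is_read_only = b'READ-ONLY' in folder_info

print(message_count, is_read_only)  # 33841 True
```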

We can now search for specific emails. IMAP has its own search syntax with special operators (search keys), and a search returns a list of UIDs (unique IDs), one for each matching email. Many search keys are available, depending on the service, with more examples here in 16-3.


In [6]:
UIDs = conn.search('SINCE 26-May-2016')
print(UIDs)


[88130, 88131, 88132, 88133, 88134, 88135, 88136, 88137, 88138, 88139, 88140, 88141, 88142, 88143, 88144, 88145, 88146, 88147, 88148, 88149, 88150, 88151, 88152, 88153, 88154, 88155, 88156, 88157, 88158, 88159, 88160, 88161, 88162, 88163, 88164, 88165, 88166, 88167, 88168, 88169, 88170, 88171, 88172, 88173, 88174, 88175, 88176, 88177, 88178, 88179, 88180, 88181, 88182, 88183, 88184, 88185, 88186, 88187, 88188, 88189, 88190, 88191, 88192, 88193, 88194, 88195, 88196, 88197, 88198, 88199, 88200, 88201, 88202, 88203, 88204, 88205, 88206, 88207, 88208, 88209, 88210, 88211, 88212, 88213, 88214, 88215, 88216, 88217, 88218, 88219, 88220, 88221, 88222, 88223, 88224, 88225, 88226, 88227, 88228, 88229, 88230, 88231, 88232, 88233, 88234, 88235, 88236, 88237, 88238, 88239, 88240, 88241, 88242, 88243, 88244, 88245, 88246, 88247, 88248, 88249, 88250, 88251, 88252, 88253, 88254, 88255, 88256, 88257, 88258, 88259, 88260, 88261, 88262, 88263, 88264, 88265, 88266, 88267, 88268, 88269, 88270, 88271, 88273, 88274, 88275, 88276, 88277, 88278, 88279, 88280, 88281, 88282, 88283, 88284, 88285, 88286, 88287, 88288, 88289, 88290, 88291, 88292, 88293, 88294, 88295, 88296, 88297, 88298, 88299, 88300, 88301, 88302, 88303, 88304, 88305, 88306, 88307, 88308, 88309, 88310, 88311, 88312, 88313, 88314, 88315, 88316, 88317, 88318, 88319, 88320, 88321, 88322, 88323, 88324, 88325, 88326, 88327, 88328, 88329, 88330, 88331, 88332, 88333, 88334, 88335, 88336, 88337, 88338, 88339, 88340, 88341, 88342, 88343, 88344, 88345, 88346, 88347, 88348, 88349, 88350, 88351, 88352, 88353, 88354, 88355, 88356, 88357, 88358, 88359, 88360, 88361, 88362, 88363, 88364, 88365, 88366, 88367, 88368, 88369, 88370, 88371, 88372, 88373, 88374, 88375, 88376, 88377, 88378, 88379, 88380, 88381, 88382, 88383, 88384, 88385, 88386, 88387, 88388, 88389, 88390, 88391, 88392, 88393, 88394, 88395, 88396, 88397, 88398, 88399, 88400, 88401, 88402, 88403, 88404, 88405, 88406, 88407, 88408, 88409, 88410, 88411, 88412, 88413, 88414, 88415, 
88416, 88417, 88418, 88419, 88420, 88421, 88422, 88423, 88424, 88425, 88426, 88427, 88428, 88429, 88430, 88431, 88432, 88433, 88434, 88435, 88436, 88437, 88438, 88439, 88440, 88441, 88442, 88443, 88444, 88445, 88446, 88447, 88448, 88449, 88450, 88451, 88452, 88453, 88454, 88455, 88456, 88457, 88458, 88459, 88460, 88461, 88462, 88463, 88464, 88465, 88466, 88467, 88468, 88469, 88470, 88471, 88472, 88473, 88474, 88475, 88476, 88477, 88478, 88479, 88480, 88481, 88482, 88483, 88484, 88485, 88486, 88487, 88488, 88489, 88490, 88491, 88492, 88493, 88494, 88495, 88496, 88497]
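
The date in a search such as the one above must be written as DD-Mon-YYYY with an English month abbreviation. A minimal sketch of building that string from a datetime.date follows; the helper since_criteria and the MONTHS tuple are hypothetical names, and the month names are spelled out so the result does not depend on the system locale.

```python
import datetime

# English month abbreviations, independent of the system locale.
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def since_criteria(day):
    """Build a 'SINCE DD-Mon-YYYY' search string from a datetime.date."""
    return 'SINCE %02d-%s-%d' % (day.day, MONTHS[day.month - 1], day.year)


criteria = since_criteria(datetime.date(2016, 5, 26))
print(criteria)  # SINCE 26-May-2016
# With a live connection: UIDs = conn.search(criteria)
```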

We can use the .delete_messages() method to delete a list of UIDs (the folder must have been selected with readonly=False). We won't be running it here.


In [ ]:
conn.delete_messages([88130, 88131, 88132])
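
Deleting over IMAP is generally a two-step process: messages are first flagged as deleted, then removed by an expunge. The wrapper below is a cautious sketch, not a recommended implementation; the name delete_uids and the dry_run parameter are hypothetical, and it assumes a connection whose folder was selected with readonly=False and which offers the imapclient methods .delete_messages() and .expunge().

```python
def delete_uids(conn, uids, dry_run=True):
    """Delete the given UIDs, or only report them when dry_run is True."""
    uids = list(uids)
    if dry_run:
        # Nothing is sent to the server; useful for checking the list first.
        return uids
    conn.delete_messages(uids)  # marks the messages as deleted
    conn.expunge()              # asks the server to remove the marked messages
    return uids


# Dry run: no connection is needed, so None stands in for conn.
print(delete_uids(None, [88130, 88131, 88132]))
```

Defaulting to a dry run means a real deletion has to be requested explicitly with dry_run=False.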

In [7]:
# The following function is also available in this module to search Gmail for a line of text. 
# I found this faster than iterating over the set to find the Python test email from Lesson 46
conn.gmail_search('Subject: Python Test Email')


Out[7]:
[88177]

We now have to translate these UIDs into the actual emails, and we can do that using the .fetch() method.

It requires a list of UIDs to retrieve, as well as a list of the parts of the email you need; typically ['BODY[]','FLAGS'], which contains most of the information an average user might need.

Below is the email we sent as part of Lesson 46, which we will store in a variable.


In [8]:
rawMessage = conn.fetch([88177], ['BODY[]','FLAGS'])
print(rawMessage)


defaultdict(<class 'dict'>, {88177: {b'SEQ': 33522, b'FLAGS': (b'\\Seen',), b'BODY[]': b'Bcc: vvk.mnn@gmail.com\r\nReturn-Path: <vvk.mnn@gmail.com>\r\nReceived: from TLAVivekM.local ([216.46.12.2])\r\n        by smtp.gmail.com with ESMTPSA id o1sm3872631qte.36.2016.05.26.08.01.43\r\n        for <vvk.mnn@gmail.com>\r\n        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r\n        Thu, 26 May 2016 08:01:44 -0700 (PDT)\r\nMessage-ID: <57470fd8.813aed0a.38d14.3f58@mx.google.com>\r\nDate: Thu, 26 May 2016 08:01:44 -0700 (PDT)\r\nFrom: vvk.mnn@gmail.com\r\nSubject: Python Test Email \r\n\r\n Look,\r\n script text!\r\n'}})

In [9]:
type(rawMessage)


Out[9]:
collections.defaultdict

Because this value is returned as a collections.defaultdict, we must use a series of keys to parse it and explore its values. This process is explained more thoroughly on Mr. Sweigart's blog.

Below is a rough outline of the key approach: we index first by the UID, then by the b'BODY[]' key, and pass the resulting bytes to email.message_from_bytes().


In [10]:
message = email.message_from_bytes(rawMessage[88177][b'BODY[]'])
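
The same two-level lookup can be tried offline. The sketch below uses a shortened, hand-written stand-in for the dictionary returned by conn.fetch(); the names raw_message, body_bytes and parsed are chosen for illustration only.

```python
import email

# A shortened stand-in for the dictionary returned by conn.fetch().
raw_message = {
    88177: {
        b'FLAGS': (b'\\Seen',),
        b'BODY[]': (b'From: youremail@gmail.com\r\n'
                    b'Subject: Python Test Email\r\n'
                    b'\r\n'
                    b' Look,\r\n script text!\r\n'),
    }
}

# First key: the UID. Second key: the requested part, as a byte string.
body_bytes = raw_message[88177][b'BODY[]']
parsed = email.message_from_bytes(body_bytes)

print(parsed.get('Subject'))
```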

Once this value has been stored, we can use the .get() method to pull out various header values from the email.


In [29]:
print(message)
print(type(message))


Bcc: vvk.mnn@gmail.com
Return-Path: <vvk.mnn@gmail.com>
Received: from TLAVivekM.local ([216.46.12.2])
        by smtp.gmail.com with ESMTPSA id o1sm3872631qte.36.2016.05.26.08.01.43
        for <vvk.mnn@gmail.com>
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 26 May 2016 08:01:44 -0700 (PDT)
Message-ID: <57470fd8.813aed0a.38d14.3f58@mx.google.com>
Date: Thu, 26 May 2016 08:01:44 -0700 (PDT)
From: vvk.mnn@gmail.com
Subject: Python Test Email 

 Look,
 script text!

<class 'email.message.Message'>

We can now access different elements in this message object.


In [12]:
message.get('Subject')


Out[12]:
'Python Test Email '

In [16]:
message.get('from')


Out[16]:
'vvk.mnn@gmail.com'
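
Header values come back as plain strings, so helpers from email.utils can turn them into more useful objects. A small sketch on a hand-written sample message follows; the display name 'Your Name' is a placeholder, and the Date header is a simplified copy of the one above.

```python
import email
from email.utils import parseaddr, parsedate_to_datetime

sample = email.message_from_string(
    'From: Your Name <youremail@gmail.com>\n'
    'Date: Thu, 26 May 2016 08:01:44 -0700\n'
    'Subject: Python Test Email\n'
    '\n'
    'Look, script text!\n'
)

# Header lookups ignore case, so 'from' and 'From' return the same value.
display_name, address = parseaddr(sample.get('from'))
sent = parsedate_to_datetime(sample.get('Date'))

print(address)
print(sent.isoformat())
# A header that is absent returns None (or a supplied default) instead of raising.
print(sample.get('Reply-To', 'no reply-to header'))
```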

The value for an email's body is defined as its 'payload', and is accessible via the get_payload() method.


In [28]:
message.get_payload()


Out[28]:
' Look,\r\n script text!\r\n'
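
The test email is a single plain-text part, so get_payload() returns a string directly. Many real emails are multipart (for example, plain text plus HTML), in which case get_payload() returns a list of parts instead. The following is a hedged sketch of one way to handle both cases with the standard email module; the helper name plain_text_body is hypothetical, and falling back to UTF-8 when no charset is declared is an assumption.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def plain_text_body(msg):
    """Return the first text/plain part of a message as a str, or None."""
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            raw = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or 'utf-8'
            return raw.decode(charset, errors='replace')
    return None


# Build a small two-part message to exercise the helper offline.
demo = MIMEMultipart('alternative')
demo.attach(MIMEText('Look, script text!', 'plain'))
demo.attach(MIMEText('<p>Look, script text!</p>', 'html'))

print(demo.is_multipart())
print(plain_text_body(demo))
```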

We can end our session using the .logout() method on the connection object.


In [32]:
conn.logout()


Out[32]:
b'LOGOUT Requested'

Recap

  • The imapclient module allows us to read emails in Python via IMAP.
  • The first step is creating a connection object with the domain name of the email service's IMAP server, and whether to use SSL.
  • The next step is calling the .list_folders() and .select_folder() methods to decide which folder we will interact with.
  • We can search emails using the .search() method and IMAP search keys.
  • Once we have found an email we wish to use, we can use the .fetch() method to retrieve its raw contents, and pass them into another function for parsing, here email.message_from_bytes().
  • Once it has been stored in an email message object, a variety of methods can be used to get its values.
  • The body of a message is typically retrieved via the get_payload() method.
  • Once complete, a session can be ended with the .logout() method on the connection object.
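
The steps in this recap can be strung together roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the function name fetch_subjects is hypothetical, and conn is assumed to be an already logged-in imapclient connection object offering the .select_folder(), .search() and .fetch() methods used in this lesson.

```python
import email


def fetch_subjects(conn, folder, criteria):
    """Return {uid: subject} for messages in folder matching the search criteria."""
    conn.select_folder(folder, readonly=True)  # read-only: nothing can be altered
    uids = conn.search(criteria)
    subjects = {}
    if not uids:
        return subjects
    for uid, parts in conn.fetch(uids, ['BODY[]', 'FLAGS']).items():
        message = email.message_from_bytes(parts[b'BODY[]'])
        subjects[uid] = message.get('Subject')
    return subjects


# Intended use with a real connection (not executed here):
#   conn = imapclient.IMAPClient('imap.gmail.com', ssl=True)
#   conn.login('youremail@gmail.com', 'yourpassword')
#   try:
#       print(fetch_subjects(conn, 'INBOX', 'SINCE 26-May-2016'))
#   finally:
#       conn.logout()
```

Wrapping the work in try/finally, as in the commented usage, makes sure .logout() is called even if a search or fetch fails.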