Using Metadata

One of the nice things about OS X since the introduction of Spotlight is that we have command line tools for looking at metadata. The mdls command returns all the metadata for a file, and we can use it in various ways.

Let's have a look at a PNG file.

(If you want to follow along and run the commands, go to the Cell menu above and select All Output > Clear.)


In [1]:
mdData = !mdls images/dashboard_files_tab.png

In [2]:
mdData


Out[2]:
['kMDItemBitsPerSample           = 40',
 'kMDItemColorSpace              = "RGB"',
 'kMDItemContentCreationDate     = 2015-07-09 17:15:19 +0000',
 'kMDItemContentModificationDate = 2015-07-09 17:15:19 +0000',
 'kMDItemContentType             = "public.png"',
 'kMDItemContentTypeTree         = (',
 '    "public.png",',
 '    "public.image",',
 '    "public.data",',
 '    "public.item",',
 '    "public.content"',
 ')',
 'kMDItemDateAdded               = 2015-07-09 17:15:19 +0000',
 'kMDItemDisplayName             = "dashboard_files_tab.png"',
 'kMDItemFSContentChangeDate     = 2015-07-09 17:15:19 +0000',
 'kMDItemFSCreationDate          = 2015-07-09 17:15:19 +0000',
 'kMDItemFSCreatorCode           = ""',
 'kMDItemFSFinderFlags           = 0',
 'kMDItemFSHasCustomIcon         = (null)',
 'kMDItemFSInvisible             = 0',
 'kMDItemFSIsExtensionHidden     = 0',
 'kMDItemFSIsStationery          = (null)',
 'kMDItemFSLabel                 = 0',
 'kMDItemFSName                  = "dashboard_files_tab.png"',
 'kMDItemFSNodeCount             = (null)',
 'kMDItemFSOwnerGroupID          = 20',
 'kMDItemFSOwnerUserID           = 501',
 'kMDItemFSSize                  = 116878',
 'kMDItemFSTypeCode              = ""',
 'kMDItemHasAlphaChannel         = 1',
 'kMDItemKind                    = "Portable Network Graphics image"',
 'kMDItemLastUsedDate            = 2015-07-14 05:18:20 +0000',
 'kMDItemLogicalSize             = 116878',
 'kMDItemOrientation             = 0',
 'kMDItemPhysicalSize            = 118784',
 'kMDItemPixelCount              = 1638952',
 'kMDItemPixelHeight             = 1036',
 'kMDItemPixelWidth              = 1582',
 'kMDItemProfileName             = "Color LCD"',
 'kMDItemResolutionHeightDPI     = 144',
 'kMDItemResolutionWidthDPI      = 144',
 'kMDItemUseCount                = 1',
 'kMDItemUsedDates               = (',
 '    "2015-07-13 14:00:00 +0000"',
 ')']

We can also use straight Python to do the same thing, though it looks a little more complex.


In [10]:
import subprocess
p = subprocess.Popen(['mdls', 'images/dashboard_files_tab.png'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
mdString, err = p.communicate()
mdData = mdString.split('\n')

In [11]:
mdData


Out[11]:
['kMDItemBitsPerSample           = 40',
 'kMDItemColorSpace              = "RGB"',
 'kMDItemContentCreationDate     = 2015-07-09 17:15:19 +0000',
 'kMDItemContentModificationDate = 2015-07-09 17:15:19 +0000',
 'kMDItemContentType             = "public.png"',
 'kMDItemContentTypeTree         = (',
 '    "public.png",',
 '    "public.image",',
 '    "public.data",',
 '    "public.item",',
 '    "public.content"',
 ')',
 'kMDItemDateAdded               = 2015-07-09 17:15:19 +0000',
 'kMDItemDisplayName             = "dashboard_files_tab.png"',
 'kMDItemFSContentChangeDate     = 2015-07-09 17:15:19 +0000',
 'kMDItemFSCreationDate          = 2015-07-09 17:15:19 +0000',
 'kMDItemFSCreatorCode           = ""',
 'kMDItemFSFinderFlags           = 0',
 'kMDItemFSHasCustomIcon         = (null)',
 'kMDItemFSInvisible             = 0',
 'kMDItemFSIsExtensionHidden     = 0',
 'kMDItemFSIsStationery          = (null)',
 'kMDItemFSLabel                 = 0',
 'kMDItemFSName                  = "dashboard_files_tab.png"',
 'kMDItemFSNodeCount             = (null)',
 'kMDItemFSOwnerGroupID          = 20',
 'kMDItemFSOwnerUserID           = 501',
 'kMDItemFSSize                  = 116878',
 'kMDItemFSTypeCode              = ""',
 'kMDItemHasAlphaChannel         = 1',
 'kMDItemKind                    = "Portable Network Graphics image"',
 'kMDItemLastUsedDate            = 2015-07-14 05:18:20 +0000',
 'kMDItemLogicalSize             = 116878',
 'kMDItemOrientation             = 0',
 'kMDItemPhysicalSize            = 118784',
 'kMDItemPixelCount              = 1638952',
 'kMDItemPixelHeight             = 1036',
 'kMDItemPixelWidth              = 1582',
 'kMDItemProfileName             = "Color LCD"',
 'kMDItemResolutionHeightDPI     = 144',
 'kMDItemResolutionWidthDPI      = 144',
 'kMDItemUseCount                = 1',
 'kMDItemUsedDates               = (',
 '    "2015-07-13 14:00:00 +0000"',
 ')',
 '']
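
The cells in this notebook are Python 2. On Python 3, communicate() returns bytes rather than a string, so the split above would need adjusting. A minimal Python 3 sketch of the same step might look like the following; the helper name command_lines is made up for this illustration. Using splitlines() instead of split('\n') also avoids the empty string that appears at the end of the list above.

```python
import subprocess
import sys

def command_lines(args):
    """Run a command and return its standard output as a list of lines."""
    # capture_output collects stdout and stderr; text=True decodes them to str
    result = subprocess.run(args, capture_output=True, text=True)
    # splitlines() does not leave an empty string after the final newline
    return result.stdout.splitlines()

# On a Mac this would be:
#     mdData = command_lines(['mdls', 'images/dashboard_files_tab.png'])
# A stand-in command that runs anywhere, using the Python interpreter itself:
demo = command_lines([sys.executable, '-c', 'print("one"); print("two")'])
print(demo)
```

The stand-in command is only there so the helper can be tried on a machine without mdls.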

The first thing you will notice is that every attribute name starts with kMDItem, and what follows is what you might call the "real name". Since mdData is a list of strings, we can easily pick out a single entry.


In [12]:
match = [s for s in mdData if 'PixelHeight' in s]

In [13]:
match


Out[13]:
['kMDItemPixelHeight             = 1036']

This makes match a list of strings, in this case containing only one. Notice that match[0] happens to be valid Python code that assigns a value to the variable kMDItemPixelHeight, so we can just execute the string and end up with a usable variable.


In [14]:
exec match[0]

In [15]:
kMDItemPixelHeight


Out[15]:
1036
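
Executing the line works for this attribute, but only because it happens to be valid Python; an entry whose value is (null) or a date would fail. An alternative is to split the line on the equals sign and convert the value ourselves. Here is a small sketch in Python 3; the function name split_attribute is invented for this example, and it only handles single-line entries.

```python
def split_attribute(line):
    """Split one single-line mdls entry into (name, value)."""
    name, _, value = line.partition('=')
    name = name.strip()
    value = value.strip()
    # drop the common kMDItem prefix to leave the "real name"
    if name.startswith('kMDItem'):
        name = name[len('kMDItem'):]
    # strings are wrapped in double quotes in the mdls output
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return name, value[1:-1]
    # whole numbers become ints; anything else is left as text
    try:
        return name, int(value)
    except ValueError:
        return name, value

print(split_attribute('kMDItemPixelHeight             = 1036'))
print(split_attribute('kMDItemFSName                  = "dashboard_files_tab.png"'))
print(split_attribute('kMDItemFSHasCustomIcon         = (null)'))
```

The result is an ordinary tuple, so nothing from the mdls output is ever run as code.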

Now do the same for the width and name.


In [17]:
match = [s for s in mdData if 'PixelWidth' in s]
exec match[0]
match = [s for s in mdData if 'FSName' in s]
exec match[0]

And we can print a nice string with the name and size of the picture.


In [18]:
print kMDItemFSName + '\t' + str(kMDItemPixelHeight) + 'x' + str(kMDItemPixelWidth)


dashboard_files_tab.png	1036x1582
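
Extracting attributes one at a time gets repetitive. A possible next step is to collect the whole output into a dictionary. The Python 3 sketch below assumes the output always has the shape shown earlier: one "name = value" entry per line, with list values spread over indented lines between ( and ). The name mdls_to_dict is hypothetical, the sample is an abridged copy of the output above, and all values are kept as strings.

```python
def mdls_to_dict(lines):
    """Collect mdls output lines into a dict of attribute name -> value."""
    attributes = {}
    current = None                      # name of the list being collected, if any
    for line in lines:
        text = line.strip()
        if not text:
            continue                    # skip blank lines
        if current is not None:
            if text == ')':
                current = None          # end of the list
            else:
                attributes[current].append(text.rstrip(',').strip('"'))
            continue
        name, _, value = line.partition('=')
        name, value = name.strip(), value.strip()
        if value == '(':
            attributes[name] = []       # start of a multi-line list
            current = name
        else:
            attributes[name] = value.strip('"')
    return attributes

# abridged sample taken from the PNG output shown earlier
sample = [
    'kMDItemContentTypeTree         = (',
    '    "public.png",',
    '    "public.image"',
    ')',
    'kMDItemFSName                  = "dashboard_files_tab.png"',
    'kMDItemPixelHeight             = 1036',
    'kMDItemPixelWidth              = 1582',
]
info = mdls_to_dict(sample)
print(info['kMDItemFSName'] + '\t' + info['kMDItemPixelHeight'] + 'x' + info['kMDItemPixelWidth'])
```

With the dictionary in hand, any attribute can be looked up by name without another pass over the list.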

We could easily turn this into a function, which also gives us a place to add some error checking.


In [27]:
import subprocess

def printSize(fname):
    # run mdls
    p = subprocess.Popen(['mdls', fname], stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    mdString, err = p.communicate()
    if err:
        return "Could not open " + fname

    # the above returns the output as a single string with '\n'
    # between output lines. Let's make it a list of lines
    mdData = mdString.split('\n')

    # All the above is the equivalent of the IPython `mdData = !mdls fname`

    # get the height
    match = [s for s in mdData if 'PixelHeight' in s]
    if not match:
        return fname + "  Is Not Picture"
    exec match[0]
    # get the width
    match = [s for s in mdData if 'PixelWidth' in s]
    if not match:
        return fname + "  Is Not Picture"
    exec match[0]
    # get the name
    match = [s for s in mdData if 'FSName' in s]
    if not match:
        return fname + "  File Name Error"
    exec match[0]

    return kMDItemFSName + '  ' + str(kMDItemPixelHeight) + 'x' + str(kMDItemPixelWidth)

In [28]:
printSize('images/dashboard_running_tab.png')


Out[28]:
'dashboard_running_tab.png  1008x1572'

Now let's try some errors.


In [29]:
printSize('Index.ipynb')


Out[29]:
'Index.ipynb  Is Not Picture'

In [33]:
printSize('I_dont_exist.txt')


Out[33]:
'I_dont_exist.txt  Is Not Picture'
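
Notice that the missing file was reported as "Is Not Picture" rather than "Could not open", so the `if err:` test evidently did not fire for it. One way to make the distinction reliable is to check that the file exists before calling mdls at all. This is sketched below for Python 3. The names describe_image, run_mdls and find_value are invented for this sketch, and the run parameter exists only so the logic can be tried without mdls.

```python
import os
import subprocess

def run_mdls(fname):
    """Return the mdls output for fname as a list of lines (needs a Mac)."""
    result = subprocess.run(['mdls', fname], capture_output=True, text=True)
    return result.stdout.splitlines()

def find_value(lines, key):
    """Return the text after '=' on the first line whose name ends with key, or None."""
    for line in lines:
        name, _, value = line.partition('=')
        if name.strip().endswith(key):
            return value.strip().strip('"')
    return None

def describe_image(fname, run=run_mdls):
    # check for the file ourselves instead of relying on stderr
    if not os.path.exists(fname):
        return 'Could not open ' + fname
    lines = run(fname)
    height = find_value(lines, 'PixelHeight')
    width = find_value(lines, 'PixelWidth')
    if height is None or width is None:
        return fname + '  Is Not Picture'
    # fall back to the path's base name if mdls gave no FSName
    name = find_value(lines, 'FSName') or os.path.basename(fname)
    return name + '  ' + height + 'x' + width

print(describe_image('I_dont_exist.txt'))
```

Because the existence check comes first, mdls is never started for a missing file, and the function returns the "Could not open" message instead.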

Conclusion

So that's a quick look at mdls. Explore the data that it returns for various other files. For photos you have taken yourself there is a lot of other information. You should also have a look at other document types such as PDF files. Below are some examples from my computer.


In [36]:
book = !mdls photoshop1pdf.pdf

In [37]:
book


Out[37]:
['kMDItemAuthors                 = (',
 '    "Corrie Haffly"',
 ')',
 'kMDItemContentCreationDate     = 2006-09-07 00:27:54 +0000',
 'kMDItemContentModificationDate = 2006-09-07 00:27:54 +0000',
 'kMDItemContentType             = "com.adobe.pdf"',
 'kMDItemContentTypeTree         = (',
 '    "com.adobe.pdf",',
 '    "public.data",',
 '    "public.item",',
 '    "public.composite-content",',
 '    "public.content"',
 ')',
 'kMDItemCreator                 = "Adobe InDesign CS2 (4.0.3)"',
 'kMDItemDateAdded               = 2015-03-29 22:48:10 +0000',
 'kMDItemDisplayName             = "photoshop1pdf.pdf"',
 'kMDItemEncodingApplications    = (',
 '    "Adobe PDF Library 7.0"',
 ')',
 'kMDItemFSContentChangeDate     = 2006-09-07 00:27:54 +0000',
 'kMDItemFSCreationDate          = 2006-09-07 00:27:54 +0000',
 'kMDItemFSCreatorCode           = ""',
 'kMDItemFSFinderFlags           = 0',
 'kMDItemFSHasCustomIcon         = (null)',
 'kMDItemFSInvisible             = 0',
 'kMDItemFSIsExtensionHidden     = 0',
 'kMDItemFSIsStationery          = (null)',
 'kMDItemFSLabel                 = 0',
 'kMDItemFSName                  = "photoshop1pdf.pdf"',
 'kMDItemFSNodeCount             = (null)',
 'kMDItemFSOwnerGroupID          = 20',
 'kMDItemFSOwnerUserID           = 501',
 'kMDItemFSSize                  = 42000077',
 'kMDItemFSTypeCode              = ""',
 'kMDItemKind                    = "Portable Document Format (PDF)"',
 'kMDItemLastUsedDate            = 2015-07-14 05:07:49 +0000',
 'kMDItemLogicalSize             = 42000077',
 'kMDItemNumberOfPages           = 294',
 'kMDItemPageHeight              = 720',
 'kMDItemPageWidth               = 576',
 'kMDItemPhysicalSize            = 42000384',
 'kMDItemSecurityMethod          = "None"',
 'kMDItemTitle                   = "The Photoshop Anthology: 101 Web Design Tips, Tricks and Techniques"',
 'kMDItemUseCount                = 2',
 'kMDItemUsedDates               = (',
 '    "2015-07-13 14:00:00 +0000"',
 ')',
 'kMDItemVersion                 = "1.4"']

In [38]:
other = !mdls On_Basilisk_Station.mobi

In [39]:
other


Out[39]:
['kMDItemContentCreationDate     = 2008-06-19 22:17:54 +0000',
 'kMDItemContentModificationDate = 2008-06-19 22:17:54 +0000',
 'kMDItemContentType             = "dyn.ah62d4rv4ge80455cre"',
 'kMDItemContentTypeTree         = (',
 '    "public.data",',
 '    "public.item"',
 ')',
 'kMDItemDateAdded               = 2015-03-29 22:21:49 +0000',
 'kMDItemDisplayName             = "On_Basilisk_Station.mobi"',
 'kMDItemFSContentChangeDate     = 2008-06-19 22:17:54 +0000',
 'kMDItemFSCreationDate          = 2008-06-19 22:17:54 +0000',
 'kMDItemFSCreatorCode           = ""',
 'kMDItemFSFinderFlags           = 0',
 'kMDItemFSHasCustomIcon         = (null)',
 'kMDItemFSInvisible             = 0',
 'kMDItemFSIsExtensionHidden     = 0',
 'kMDItemFSIsStationery          = (null)',
 'kMDItemFSLabel                 = 0',
 'kMDItemFSName                  = "On_Basilisk_Station.mobi"',
 'kMDItemFSNodeCount             = (null)',
 'kMDItemFSOwnerGroupID          = 20',
 'kMDItemFSOwnerUserID           = 501',
 'kMDItemFSSize                  = 676360',
 'kMDItemFSTypeCode              = ""',
 'kMDItemKind                    = "Kindle Document"',
 'kMDItemLastUsedDate            = 2015-07-14 04:50:55 +0000',
 'kMDItemLogicalSize             = 676360',
 'kMDItemPhysicalSize            = 679936',
 'kMDItemUseCount                = 1',
 'kMDItemUsedDates               = (',
 '    "2015-07-13 14:00:00 +0000"',
 ')']

In [40]:
candid = !mdls GoingCandid.pdf

In [41]:
candid


Out[41]:
['kMDItemAuthors                 = (',
 '    "Thomas Leuthard"',
 ')',
 'kMDItemContentCreationDate     = 2011-08-01 23:30:41 +0000',
 'kMDItemContentModificationDate = 2011-08-01 23:30:41 +0000',
 'kMDItemContentType             = "com.adobe.pdf"',
 'kMDItemContentTypeTree         = (',
 '    "com.adobe.pdf",',
 '    "public.data",',
 '    "public.item",',
 '    "public.composite-content",',
 '    "public.content"',
 ')',
 'kMDItemCreator                 = "Microsoft\xc2\xae Word 2010"',
 'kMDItemDateAdded               = 2015-03-29 22:38:09 +0000',
 'kMDItemDescription             = "Street Photography"',
 'kMDItemDisplayName             = "GoingCandid.pdf"',
 'kMDItemEncodingApplications    = (',
 '    "Microsoft\\U00ae Word 2010"',
 ')',
 'kMDItemFSContentChangeDate     = 2011-08-01 23:30:41 +0000',
 'kMDItemFSCreationDate          = 2011-08-01 23:30:41 +0000',
 'kMDItemFSCreatorCode           = ""',
 'kMDItemFSFinderFlags           = 0',
 'kMDItemFSHasCustomIcon         = (null)',
 'kMDItemFSInvisible             = 0',
 'kMDItemFSIsExtensionHidden     = 0',
 'kMDItemFSIsStationery          = (null)',
 'kMDItemFSLabel                 = 0',
 'kMDItemFSName                  = "GoingCandid.pdf"',
 'kMDItemFSNodeCount             = (null)',
 'kMDItemFSOwnerGroupID          = 20',
 'kMDItemFSOwnerUserID           = 501',
 'kMDItemFSSize                  = 8501963',
 'kMDItemFSTypeCode              = ""',
 'kMDItemKeywords                = (',
 '    "85mm Street Photography"',
 ')',
 'kMDItemKind                    = "Portable Document Format (PDF)"',
 'kMDItemLastUsedDate            = 2015-06-29 04:11:35 +0000',
 'kMDItemLogicalSize             = 8501963',
 'kMDItemNumberOfPages           = 96',
 'kMDItemPageHeight              = 567',
 'kMDItemPageWidth               = 425.28',
 'kMDItemPhysicalSize            = 8503296',
 'kMDItemSecurityMethod          = "None"',
 'kMDItemTitle                   = "Going Candid..."',
 'kMDItemUseCount                = 2',
 'kMDItemUsedDates               = (',
 '    "2015-06-28 14:00:00 +0000"',
 ')',
 'kMDItemVersion                 = "1.6"']
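
When only a few attributes are of interest, such as the title and page count of the PDFs above, mdls can be asked for just those with its -name option, which avoids filtering the full list afterwards. Here is a small Python 3 sketch that builds such a command; the helper name mdls_command is made up for this example.

```python
def mdls_command(fname, names):
    """Build an mdls command line that asks only for the given attributes."""
    args = ['mdls']
    for name in names:
        args += ['-name', name]     # -name is repeated once per attribute
    args.append(fname)
    return args

cmd = mdls_command('GoingCandid.pdf', ['kMDItemTitle', 'kMDItemNumberOfPages'])
print(cmd)
# On a Mac, pass cmd to subprocess.run (or subprocess.Popen) to get the output.
```

Building the command as a list, rather than as one string, means file names containing spaces need no extra quoting.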