Lesson 42:

Reading Excel Spreadsheets

The openpyxl module allows you to manipulate Excel sheets within Python.

Excel files have the following terminology:

  • A collection of sheets is a workbook, and saved with a .xlsx extension.
  • A workbook contains multiple sheets, each of which is a single spreadsheet, or worksheet.
  • Each sheet has columns and rows, defined by letters and numbers respecitvely.
  • The intersection between a column and a row is a cell.

In [1]:
import openpyxl

We must first nagivate to the directory containing the spreadsheets, which for this notebook is the subdirectory 'files'.


In [8]:
# Import OS module to navigate directories
import os

# Change the directory to the excel file location, using relative and absolute paths as previously discussed.
os.chdir('files')

In [6]:
os.listdir()


Out[6]:
['.DS_Store',
 '112065.pdf',
 '26645.pdf',
 'alarm.wav',
 'allMyCats1.py',
 'allMyCats1.py.backup',
 'allMyCats2.py',
 'allMyCats2.py.backup',
 'AutomateSearch.png',
 'backupToZip.py',
 'backupToZip.py.backup',
 'bacon.txt',
 'birthdays.py',
 'birthdays.py.backup',
 'boxPrint.py',
 'boxPrint.py.backup',
 'buggyAddingProgram.py',
 'buggyAddingProgram.py.backup',
 'bulletPointAdder.py',
 'bulletPointAdder.py.backup',
 'calcProd.py',
 'calcProd.py.backup',
 'catlogo.png',
 'catnapping.py',
 'catnapping.py.backup',
 'census2010.py',
 'census2010.py.backup',
 'censuspopdata.xlsx',
 'characterCount.py',
 'characterCount.py.backup',
 'coinFlip.py',
 'coinFlip.py.backup',
 'combinedminutes.pdf',
 'combinePdfs.py',
 'combinePdfs.py.backup',
 'countdown.py',
 'countdown.py.backup',
 'CSSSelector.png',
 'demo.docx',
 'dictionary.txt',
 'dimensions.xlsx',
 'downloadXkcd.py',
 'downloadXkcd.py.backup',
 'duesRecords.xlsx',
 'encrypted.pdf',
 'encryptedminutes.pdf',
 'error_log.txt',
 'errorExample.py',
 'errorExample.py.backup',
 'example.csv',
 'example.html',
 'example.xlsx',
 'example.zip',
 'excelSpreadsheets.zip',
 'exitExample.py',
 'exitExample.py.backup',
 'factorialLog.py',
 'factorialLog.py.backup',
 'Factoriallog.txt',
 'fiveTimes.py',
 'fiveTimes.py.backup',
 'formFiller.py',
 'formFiller.py.backup',
 'freezeExample.xlsx',
 'getDocxText.py',
 'getDocxText.py.backup',
 'guessTheNumber.py',
 'guessTheNumber.py.backup',
 'guests.txt',
 'headings.docx',
 'hello.py',
 'hello.py.backup',
 'helloFunc.py',
 'helloFunc.py.backup',
 'helloFunc2.py',
 'helloFunc2.py.backup',
 'helloworld.docx',
 'helloworld.txt',
 'inventory.py',
 'inventory.py.backup',
 'isPhoneNumber.py',
 'isPhoneNumber.py.backup',
 'littleKid.py',
 'littleKid.py.backup',
 'lucky.py',
 'lucky.py.backup',
 'magic8Ball.py',
 'magic8Ball.py.backup',
 'magic8Ball2.py',
 'magic8Ball2.py.backup',
 'mapIt.py',
 'mapIt.py.backup',
 'mcb.pyw',
 'meetingminutes.pdf',
 'meetingminutes2.pdf',
 'merged.xlsx',
 'mouseNow.py',
 'mouseNow.py.backup',
 'mouseNow2.py',
 'mouseNow2.py.backup',
 'multidownloadXkcd.py',
 'multidownloadXkcd.py.backup',
 'multipleParagraphs.docx',
 'mycatdata',
 'myPets.py',
 'myPets.py.backup',
 'newbacon.txt',
 'newfiles',
 'newfolder',
 'passingReference.py',
 'passingReference.py.backup',
 'phoneAndEmail.py',
 'phoneAndEmail.py.backup',
 'picnicTable.py',
 'picnicTable.py.backup',
 'prettyCharacterCount.py',
 'prettyCharacterCount.py.backup',
 'printRandom.py',
 'printRandom.py.backup',
 'produceSales.xlsx',
 'pw.py',
 'pw.py.backup',
 'quickWeather.py',
 'quickWeather.py.backup',
 'randomQuizGenerator.py',
 'randomQuizGenerator.py.backup',
 'readCensusExcel.py',
 'readCensusExcel.py.backup',
 'readDocx.py',
 'readDocx.py.backup',
 'removeCsvHeader.py',
 'removeCsvHeader.py.backup',
 'removeCsvHeader.zip',
 'renameDates.py',
 'renameDates.py.backup',
 'resizeAndAddLogo.py',
 'resizeAndAddLogo.py.backup',
 'restyled.docx',
 'RomeoAnd Juliet.txt',
 'sameName.py',
 'sameName.py.backup',
 'sameName2.py',
 'sameName2.py.backup',
 'sameName3.py',
 'sameName3.py.backup',
 'sameName4.py',
 'sameName4.py.backup',
 'sampleChart.xlsx',
 'sendDuesReminders.py',
 'sendDuesReminders.py.backup',
 'stopwatch.py',
 'stopwatch.py.backup',
 'styled.xlsx',
 'styles.xlsx',
 'swordfish.py',
 'swordfish.py.backup',
 'textMyself.py',
 'textMyself.py.backup',
 'threadDemo.py',
 'threadDemo.py.backup',
 'ticTacToe.py',
 'ticTacToe.py.backup',
 'torrentStarter.py',
 'torrentStarter.py.backup',
 'twoPage.docx',
 'updatedProduceSales.xlsx',
 'updateProduce.py',
 'updateProduce.py.backup',
 'validateInput.py',
 'validateInput.py.backup',
 'vampire.py',
 'vampire.py.backup',
 'vampire2.py',
 'vampire2.py.backup',
 'watermark.pdf',
 'zeroDivide.py',
 'zeroDivide.py.backup',
 'zophie.png']

We must now open the workbook file.


In [12]:
workbook = openpyxl.load_workbook('example.xlsx')
type(workbook)


Out[12]:
openpyxl.workbook.workbook.Workbook

Once the workbook is loaded, we can interact with specific sheets by loading them via workbook methods.


In [13]:
sheet = workbook.get_sheet_by_name('Sheet1')
type(sheet)


Out[13]:
openpyxl.worksheet.worksheet.Worksheet

We can also use the .get_sheet_names() method to print all sheet names, in case we aren't sure.


In [14]:
workbook.get_sheet_names()


Out[14]:
['Sheet1', 'Sheet2', 'Sheet3']

We can now interact with specific cells by creating cell objects, referenced via a sheet method.


In [18]:
# Just references an object exists; requires an additional method to interact with
sheet['A1']


Out[18]:
<Cell Sheet1.A1>

The .value method returns the actual value in the cell.


In [19]:
cell = sheet['A1']
cell.value


Out[19]:
datetime.datetime(2015, 4, 5, 13, 34, 2)

This particular cell returns a datetime reference from Excel via Python's own datetime module. A string value is available by passing into the str() function:


In [22]:
print(str(cell.value))
print(str(sheet['A1'].value))


2015-04-05 13:34:02
2015-04-05 13:34:02

All cell values inherit their data types from Excel.


In [35]:
print("The value in cell %s is '%s' and is type %s." %('A1', sheet['A1'].value, type(sheet['A1'].value)))
print("The value in cell %s is '%s' and is type %s." %('B1', sheet['B1'].value, type(sheet['B1'].value)))
print("The value in cell %s is '%s' and is type %s." %('C1', sheet['C1'].value, type(sheet['C1'].value)))


The value in cell A1 is '2015-04-05 13:34:02' and is type <class 'datetime.datetime'>.
The value in cell B1 is 'Apples' and is type <class 'str'>.
The value in cell C1 is '73' and is type <class 'int'>.

You can also reference cells via rows and columns. Excel rows start at 1 and columns at A.


In [39]:
# B1 Cell
sheet.cell(row = 1, column = 2)


Out[39]:
<Cell Sheet1.B1>

This can be useful for iterative or looping operations.


In [41]:
for i in range(1,8):
    print(i, sheet.cell(row=i, column=2).value)


1 Apples
2 Cherries
3 Pears
4 Oranges
5 Apples
6 Bananas
7 Strawberries

Recap

  • The OpenPyXl module (openpyxl) handles Excel spreadsheet files (.xlsx).
  • openpyxl.load_workbook() opens a workbook file at a location and returns a workbook object.
  • The .get_sheet_names() method on a workbook prints all the sheet names inide a workbook.
  • The .get_sheet_name() method creates a worksheet object from a sheet in the workbook.
  • The index notation (sheet['A1']) returns cell objects.
  • Cell objects use the .value method which allows you to see the content of the cell.
  • The cell() method also returns a cell object from a sheet, from a defined row and column.