In [ ]:
We're going to use something called a with statement to open a file and read the contents. The open() function takes the path to the file you're opening and, optionally, the "mode" you're opening it in. If you leave out the mode, Python opens the file for reading.
To start with, we're going to use the 'r' mode to read the data. When we hand the open file to the csv module, we'll use its default delimiter -- comma -- and we don't need to specify a quote character.
Important: If you open a data file in 'w' (write) mode, anything that's already in the file will be erased.
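To see why that warning matters, here is a small sketch that writes a line to a scratch file and then reopens the same file in write mode without writing anything. The file name scratch.txt and the temporary directory are stand-ins chosen for illustration, not part of the lesson's data.

```python
import os
import tempfile

# hypothetical scratch file in a temporary directory, so no real data is at risk
path = os.path.join(tempfile.mkdtemp(), 'scratch.txt')

# write one line of text to the file
with open(path, 'w') as f:
    f.write('some existing data\n')

# reopen the same file in write mode and write nothing at all
with open(path, 'w') as f:
    pass

# read the file back: the earlier line is gone
with open(path, 'r') as f:
    contents = f.read()

print(repr(contents))
```

Opening in 'w' mode empties the file as soon as it is opened, even if the with block never writes to it.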
The file we're using -- MLB roster data from 2017 -- lives at data/mlb.csv.
Once we have the file open, we're going to use some functionality from the csv module to iterate over the lines of data and print each one.
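Specifically, we're going to use the csv.reader function, which returns a reader object that hands back the lines in the data file one at a time as you loop over it. Each line, in turn, is a list of the "cells" of data in that line, and every cell comes back as a string.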
Then we're going to loop over the lines of data and print each line. We can also use bracket notation to retrieve elements from inside each line of data.
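Here is one way those steps could look, as a rough sketch. So that it runs anywhere, it first writes a tiny stand-in file instead of reading data/mlb.csv. Only the NAME header comes from this lesson; the TEAM column, the 'MIN' team code and the three rows are invented for illustration and may not match the real file.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for data/mlb.csv.
# The TEAM column, the team codes and the rows are made up.
sample_path = os.path.join(tempfile.mkdtemp(), 'mlb-sample.csv')
with open(sample_path, 'w', newline='') as outfile:
    outfile.write('NAME,TEAM\nPlayer One,MIN\nPlayer Two,NYY\nPlayer Three,MIN\n')

twins = []

# open the sample file in read mode `as` mlb
with open(sample_path, 'r', newline='') as mlb:
    # create a reader object
    reader = csv.reader(mlb)

    # move past the header row, keeping it in case we want it later
    header = next(reader)

    # loop over the rows in the file
    for row in reader:
        # assign variables to each element in the row (shortcut!)
        name, team = row

        # bracket notation gets at the same values
        assert row[0] == name

        # print the row ~only~ if the player is on the (assumed) 'MIN' team
        if team == 'MIN':
            twins.append(row)
            print(row)
```

The newline='' argument to open() is what the csv module's documentation recommends when a file is going to be handed to a reader or writer.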
In [ ]:
# open the MLB data file `as` mlb
# create a reader object
# loop over the rows in the file
# assign variables to each element in the row (shortcut!)
# print the row, which is a list
In [ ]:
# open the MLB data file `as` mlb
# create a reader object
# move past the header row
# loop over the rows in the file
# assign variables to each element in the row (shortcut!)
# print the line of data ~only~ if the player is on the Twins
# print the row, which is a list
Sometimes it's more convenient to work with each row of a data file as a dictionary instead of a list. That way, you don't have to remember the position of each "column" of data -- you can just reference the column name. To do that, we'll use a csv.DictReader object instead of a csv.reader object. By default it treats the first row of the file as the column names, so there's no header row to skip. Otherwise the code is much the same.
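A minimal sketch of the dictionary approach follows. To keep it self-contained, it reads from an in-memory io.StringIO object standing in for the open file. The NAME header is the one this lesson mentions; the TEAM column and both rows are made up.

```python
import csv
import io

# in-memory stand-in for an open data file (rows and TEAM column are made up)
mlb = io.StringIO('NAME,TEAM\nPlayer One,MIN\nPlayer Two,NYY\n')

# create a reader object -- the first row supplies the dictionary keys
reader = csv.DictReader(mlb)

names = []

# loop over the rows in the file
for row in reader:
    # each row is a dictionary keyed by column header
    names.append(row['NAME'])
    print(row['NAME'])
```

After the loop, reader.fieldnames holds the list of column headers that were read from the first row.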
In [ ]:
# open the MLB data file `as` mlb
# create a reader object
# loop over the rows in the file
# print just the player's name (the column header is "NAME")
In [ ]:
# define the column names
# let's make a few rows of data to write
# open an output file in write mode
# create a writer object
# write the header row
# loop over the data and write to file
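As a rough illustration of the steps outlined in the cell above, here is a sketch using csv.writer. The column names, the rows and the output file name are all hypothetical, and the file is written to a temporary directory so nothing real gets overwritten.

```python
import csv
import os
import tempfile

# define the column names (hypothetical)
headers = ['name', 'position']

# a few made-up rows of data to write, each one a list
rows = [
    ['Player One', 'P'],
    ['Player Two', 'C'],
]

# hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# open an output file in write mode
with open(out_path, 'w', newline='') as outfile:
    # create a writer object
    writer = csv.writer(outfile)

    # write the header row
    writer.writerow(headers)

    # loop over the data and write to file
    for row in rows:
        writer.writerow(row)

# read the file back to check what was written
with open(out_path, 'r', newline='') as infile:
    written = list(csv.reader(infile))

print(written)
```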
In [ ]:
# define the column names
# let's make a few rows of data to write
# open an output file in write mode
# create a writer object -- pass the list of column names to the `fieldnames` keyword argument
# use the writeheader method to write the header row
# loop over the data and write to file
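The same idea with dictionaries might look like the sketch below, which uses csv.DictWriter and its writeheader method. As before, the column names, rows and output file name are invented for illustration.

```python
import csv
import os
import tempfile

# define the column names (hypothetical)
headers = ['name', 'position']

# a few made-up rows of data to write, each one a dictionary
rows = [
    {'name': 'Player One', 'position': 'P'},
    {'name': 'Player Two', 'position': 'C'},
]

# hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'output-dict.csv')

# open an output file in write mode
with open(out_path, 'w', newline='') as outfile:
    # create a writer object and tell it the column names
    writer = csv.DictWriter(outfile, fieldnames=headers)

    # use the writeheader method to write the header row
    writer.writeheader()

    # loop over the data and write to file
    for row in rows:
        writer.writerow(row)

# read the file back as dictionaries to check what was written
with open(out_path, 'r', newline='') as infile:
    written = [dict(r) for r in csv.DictReader(infile)]

print(written)
```

The keys in each dictionary need to match the names passed to fieldnames; by default, DictWriter raises a ValueError if a row contains a key it wasn't told about.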
Sometimes you want to open multiple files at the same time. One thing you might want to do: Open a file of raw data in read mode, clean each row in a loop and write out the clean data to a new file.
You can open multiple files in the same with block -- just separate your open() calls with a comma.
For this example, we're not going to do any cleaning -- we're just going to copy the contents of one file to another.
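A sketch of that copy operation is below. It builds a small stand-in input file first so it can run without data/mlb.csv. The TEAM column and the rows are made up, and both files live in a temporary directory.

```python
import csv
import os
import tempfile

# hypothetical working directory and file paths
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'mlb-sample.csv')
dst_path = os.path.join(workdir, 'mlb-copy.csv')

# stand-in for the MLB data file (TEAM column and rows are made up)
with open(src_path, 'w', newline='') as f:
    f.write('NAME,TEAM\nPlayer One,MIN\nPlayer Two,NYY\n')

# open the input file to read and the copy to write, in one with statement
with open(src_path, 'r', newline='') as mlb, open(dst_path, 'w', newline='') as mlb_copy:
    # create a reader object
    reader = csv.DictReader(mlb)

    # create a writer object, reusing the reader's `fieldnames` as the output headers
    writer = csv.DictWriter(mlb_copy, fieldnames=reader.fieldnames)

    # write header row
    writer.writeheader()

    # loop over the rows in the file
    for row in reader:
        # `row` is a dictionary-like object; type(row) will tell you exactly which kind
        writer.writerow(row)

# read the copy back to confirm it matches the input
with open(dst_path, 'r', newline='') as f:
    copied = list(csv.reader(f))

print(copied)
```

To turn the copy into a cleaning script, you would change the values in row inside the loop before passing it to writer.writerow().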
In [ ]:
# open the MLB data file `as` mlb
# also, open `mlb-copy.csv` to write to
# create a reader object
# create a writer object
# we're going to use the `fieldnames` attribute of the DictReader object
# as our output headers, as well
# b/c we're basically just making a copy
# write header row
# loop over the rows in the file
# what type of object is `row`?
# how would we find out?
# write row to output file