Python Data Files

File operations using Python and libraries.

CSV files in native Python

Python provides a native module to perform CSV file operations.

Official documentation: CSV module in Python 2.7

List to CSV



In [3]:

    
# For reading/writing CSV files
import csv
# For listing system file folders
from subprocess import check_output

# Use with open to ensure file is closed when block ends
# The wb flag opens file for writing
with open('data/fileops/vehicles.csv', 'wb') as csv_file:
    # Prepare csv writer
    wtr = csv.writer(csv_file, delimiter=',', quotechar='"',
                   quoting=csv.QUOTE_MINIMAL)
    # Write CSV header row
    wtr.writerow(['type', 'wheels', 'speed', 'weight', 'invented'])
    # Write CSV data rows
    wtr.writerow(['Scooter', 2, 150, 109.78, 1817])
    wtr.writerow(['Car', 4, 250, 1818.45, 1885]) 
    wtr.writerow(['Plane', 10, 850, 270000, 1903])

# Check file created
print(check_output(["ls", "data/fileops"]).decode("utf8"))









    



game-scores.csv
vehicles.csv

CSV to List



In [4]:

    
# The rb flag opens file for reading
with open('data/fileops/vehicles.csv', 'rb') as csv_file:
    rdr = csv.reader(csv_file, delimiter=',', quotechar='"')
    for row in rdr:
        print '\t'.join(row)









    



type	wheels	speed	weight	invented
Scooter	2	150	109.78	1817
Car	4	250	1818.45	1885
Plane	10	850	270000	1903

Dictionary to CSV



In [5]:

    
# Dictionary data structures can be used to represent rows
game1_scores = {'Game':'Quarter', 'Team A': 45, 'Team B': 90}
game2_scores = {'Game':'Semi', 'Team A': 80, 'Team B': 32}
game3_scores = {'Game':'Final', 'Team A': 70, 'Team B': 68}

headers = ['Game', 'Team A', 'Team B']

# Create CSV from dictionaries
with open('data/fileops/game-scores.csv', 'wb') as df:
    dict_wtr = csv.DictWriter(df, fieldnames=headers)
    dict_wtr.writeheader()
    dict_wtr.writerow(game1_scores)
    dict_wtr.writerow(game2_scores)
    dict_wtr.writerow(game3_scores)

print(check_output(["ls", "data/fileops"]).decode("utf8"))









    



game-scores.csv
vehicles.csv

CSV to Dictionary



In [6]:

    
# Read CSV into dictionary data structure
with open('data/fileops/game-scores.csv', 'rb') as df:
    dict_rdr = csv.DictReader(df)
    for row in dict_rdr:
        print('\t'.join([row['Game'], row['Team A'], row['Team B']]))
    print('\t'.join(row.keys()))









    



Quarter	45	90
Semi	80	32
Final	70	68
Game	Team A	Team B

Pandas for CSV file operations

Pandas goal is to become the most powerful and flexible open source data analysis / manipulation tool available in any language. Pandas includes file operations capabilities for CSV, among other formats.

CSV operations in Pandas are much faster than in native Python.

DataFrame to CSV



In [12]:

    
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
        'Name' : ['Josh', 'Eli', 'Ram', 'Bil'],
        'Sales' : [34.32, 12.1, 4.77, 31.63],
        'Region' : ['North', 'South', 'West', 'East'],
        'Product' : ['PC', 'Phone', 'SW', 'Cloud']})
df









    Out[12]:






  
    
      
      Name
      Product
      Region
      Sales
    
  
  
    
      0
      Josh
      PC
      North
      34.32
    
    
      1
      Eli
      Phone
      South
      12.10
    
    
      2
      Ram
      SW
      West
      4.77
    
    
      3
      Bil
      Cloud
      East
      31.63



In [31]:

    
# DataFrame to CSV
df.to_csv('data/fileops/sales.csv', index=False)

print(check_output(["ls", "data/fileops"]).decode("utf8"))









    



game-scores.csv
sales.csv
sales.xlsx
vehicles.csv

CSV to DataFrame



In [28]:

    
# CSV to DataFrame
df2 = pd.read_csv('data/fileops/sales.csv')

df2









    Out[28]:






  
    
      
      Name
      Product
      Region
      Sales
    
  
  
    
      0
      Josh
      PC
      North
      34.32
    
    
      1
      Eli
      Phone
      South
      12.10
    
    
      2
      Ram
      SW
      West
      4.77
    
    
      3
      Bil
      Cloud
      East
      31.63

DataFrame to Excel



In [30]:

    
# DataFrame to XLSX Excel file
df.to_excel('data/fileops/sales.xlsx', index=False)

print(check_output(["ls", "data/fileops"]).decode("utf8"))









    



game-scores.csv
sales.csv
sales.xlsx
vehicles.csv

Excel to DataFrame



In [29]:

    
# Excel to DataFrame
df3 = pd.read_excel('data/fileops/sales.xlsx')

df3









    Out[29]:






  
    
      
      Name
      Product
      Region
      Sales
    
  
  
    
      0
      Josh
      PC
      North
      34.32
    
    
      1
      Eli
      Phone
      South
      12.10
    
    
      2
      Ram
      SW
      West
      4.77
    
    
      3
      Bil
      Cloud
      East
      31.63

	Name	Product	Region	Sales
0	Josh	PC	North	34.32
1	Eli	Phone	South	12.10
2	Ram	SW	West	4.77
3	Bil	Cloud	East	31.63