First, import necessary libraries
In [1]:
import pandas as pd
We can issue shell commands too
In [2]:
!dir
Read in the excel file (from the github repo) and turn into a DataFrame
In [3]:
df = pd.read_excel("https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=True")
Let's look at the file
In [4]:
df.head()
Out[4]:
In [5]:
df.tail()
Out[5]:
In [6]:
df.describe()
Out[6]:
In [7]:
df.info()
We can easily perform math operations on the data
In [8]:
df.quantity.sum()
Out[8]:
In [9]:
df["unit price"].sum()
Out[9]:
Now we can show how to aggregate some data
In [10]:
df.groupby('name')["unit price"].sum()
Out[10]:
Do multiple groupbys
In [11]:
df.groupby(["name", "sku"])["quantity", "unit price"].mean().head()
Out[11]:
Make sure to set the date as a date column
In [12]:
df["date"] = pd.to_datetime(df['date'])
df.info()
Now we have a date time object
In [13]:
df.head()
Out[13]:
Let's see how much we sold by month by customer
In [14]:
df.set_index('date').groupby('name')["ext price"].resample("M").sum().head()
Out[14]:
What about every 3 Months?
In [15]:
df.set_index('date').groupby('name')["ext price"].resample("3M").sum().head()
Out[15]:
What about quarterly ending in Dec?
In [16]:
df.set_index('date').groupby('name')["ext price"].resample('Q-DEC').sum().head()
Out[16]:
Annual Number
In [17]:
df.set_index('date').groupby('name')["ext price"].resample('A-DEC').sum()
Out[17]:
DataFrame vs. Series
In [18]:
df.set_index('date').groupby('name')["ext price",].resample('A-DEC').sum()
Out[18]:
In [19]:
df.set_index('date').groupby('name')["ext price",].resample('A-DEC').sum().to_excel("test.xlsx")
In [ ]: