IT Maintenance Report-Data Extraction-Monthly


IT Maintenance Report

Want to pull data from BearBuy PO that have to do with IT Maintenance for UCSF

-Don't want pure hardware or software products

-Don't want accessories

-Only Maintenance Items for hardware and software

Challenges Remaining after Data Cleansing:

1. Some data doesn't have a start/end date

2. If maintenance is included with a product, the maint. unit dollar amount might not be extractable

Benefits with this approach:

1. Running log of all PO items instead of only monthly data dumps

2. Automatically extracts start and end dates if dates are in standard format

3. Continuosly will calculate from time run how long until everything expires (given provided End date)

4. Attempt to make data as accurate as possible i.e weed out non-IT maintenance

5. Able to get data for non-FAS

6. Active/Expired labels appended to data

Different from previous approach:

1. Currently not connecting manager of data with line items as was done previously

2. Question about start and end date (seems opposite on reports)

Code to Edit Monthly UCSF BearBuy PO Data Extractions

Current WIP:

a. Check for overlap in PO vs. Supplier supplied data


In [191]:
#This will be the code to extract all of the UCSF maintenance items

import pandas as pd
from pandas import *
import csv
import re

report= pd.read_csv('C:/Users/alexisperez/Documents/IT Maintenance Renewal/Code/IT Maintenance Report Updated.csv')
data= pd.read_csv('C:/Users/alexisperez/Documents/IT Maintenance Renewal/Code/Data/Aug15_Aug28.csv')
non_items=pd.read_csv('C:/Users/alexisperez/Documents/IT Maintenance Renewal/Code/Non-Items.csv')

In [192]:
##############################FILTERS DATA OUT AND CREATES START AND END DATES################################################

#Create two columns with the start and end dates from within the data
#Tried doing this after I filtered out, but it seemed to work a better when I did this first 
data['End Date'] = ''
data['Start Date']=''

#Getting all #/#/# - #/#/# format dates
pattern=re.compile('(?s)(\d{1,2}\/\d{1,2}\/\d{2,4}).*?(\d{1,2}\/\d{1,2}\/\d{2,4}).*')

first_list = []
second_list = []
for x in data['Product Description']:
  m = re.search(pattern,x)
  if m is None:
          first_list.append('')
          second_list.append('')
  else:
      first_list.append(m.group(1))
      second_list.append(m.group(2))
    
    
data['Start Date'] = Series(first_list)
data['End Date'] = Series(second_list)


#Filter out non IT suppliers
non_items['Words']=non_items['Words'].fillna('blank').astype('str')

data['Product Description']=data['Product Description'].str.lower()
data['Supplier Name']=data['Supplier Name'].str.lower()


#Filter the entire data set with what we want; Maintenance spend, however this can contain non-IT related maintenance spend
List= ('maintenance','maint','renew','support','service contract')
List= '|'.join(List)
 
maint_data= data[data['Product Description'].str.contains(List)]


#Now we want only IT-related maintenance

Non_IT= non_items['Suppliers']
Non_IT= '|'.join(Non_IT)

Items=non_items['Words']
Items= '|'.join(Items)

filtered= maint_data[-maint_data['Supplier Name'].str.contains(Non_IT)]
filtered=filtered[-filtered['Product Description'].str.contains(Items)]

#Only want certain columns
filtered = filtered.loc[:,['PO #','Creation Date','Supplier Number','Supplier Name','Product Description','Manufacturer','Commodity Code','Quantity','Extended Price','Home Department','Start Date','End Date']]

#Create some new columns
filtered['Length[Years]']=''
filtered['Hardware/Software'] = ''
filtered['Director']=''
filtered['Manager']=''

################################CALCULATES TIME LEFT AND LABELS EXPIRED/ACTIVE################################################

#Looking at today's date and determing when the products will expire with restpect to today

from datetime import datetime
from datetime import date
import datetime
today = datetime.date.today()


#Calculate difference from today and the end date
#Set astype to datetime in order to calculate difference in days and years
filtered['Start Date']=filtered['Start Date'].astype('datetime64[ns]')
filtered['End Date']=filtered['End Date'].astype('datetime64[ns]')


filtered['today']=today
filtered['today']=filtered['today'].astype('datetime64[ns]')

#Calculate how long until it expires
filtered['diff']=filtered['End Date']-filtered['today']
filtered['Time left[days]']=filtered['diff'].apply(lambda x: x / np.timedelta64(1,'D'))
filtered['Time left[Year]']=filtered['Time left[days]'].apply(lambda x: x/365).apply(lambda x: np.round(x,2))

#Labeling if a product is expired based on today's date and End Date 

filtered['Expired/Active'] = '' 

active_column = Series.copy(filtered['Time left[days]'])
active_column=active_column.astype(basestring)
active_column[ active_column > 0] = 'Active'
active_column[ active_column < 0] = 'Expired'
active_column[ active_column == 0] = ''
filtered['Expired/Active'] = active_column
filtered['Expired/Active']=filtered['Expired/Active'].fillna('')


filtered['Time left[days]']=filtered['Time left[days]'].astype('str').fillna('').str.replace('nan','')
filtered['Time left[Year]']=filtered['Time left[Year]'].astype('str').fillna('').str.replace('nan','')



#Get the Length of Maintenance
filtered['Length[Years]']  = filtered['End Date'] - filtered['Start Date']

#Tried creating get_year(x) func but encountered errors, just applyed several functions...
filtered['Length[Years]']=filtered['Length[Years]'].apply(lambda x: x / np.timedelta64(1,'D')).apply(lambda x: x/365)
filtered['Length[Years]']=filtered['Length[Years]'].apply(lambda x: np.round(x,2))
#Fillna('') gives nan so replace nan 
filtered['Length[Years]']=filtered['Length[Years]'].astype('str').fillna('').str.replace('nan','')

#Don't need these columns
filtered=filtered.drop(['diff','today'],1)

#Change dtype out of timedelta and into string...can't just convert to string first need to convert to object then string
#Want to get out of datetime so I can plot smoothly these dates
filtered['Start Date']=filtered['Start Date'].astype('O').astype('str')
filtered['Start Date']=filtered['Start Date'].str.replace('00:00:00','').str.replace('nan','')

filtered['End Date']=filtered['End Date'].astype('O').astype('str')
filtered['End Date']=filtered['End Date'].str.replace('00:00:00','').str.replace('nan','')


filtered['Time left[days]']=filtered['Time left[days]'].astype('str').fillna('').str.replace('nan','')
filtered['Time left[Year]']=filtered['Time left[Year]'].astype('str').fillna('').str.replace('nan','')

###############################DETERMINES HARDWARE AND SOFTWARE BASED OFF OLD REPORT################################################

#The following code is a brute force way to determine what is Hardware/Software based off past data
###Note Only determines line items where suppliers have one unique H/S instance i.e Supplier X only has Hardware Items###
#Instead of 'predicting' or using a for loop, there are multiple filters to break down the data to what we want
  # followed by simplfying the data to only single suppliers and their H/S instance
    #Then we merge the monthly report with the Simple Single H/S Instance Supplier DF
    
#Using past data only want Suppliers with either all hardware or all software for historical line items
#Pivot table based on unique items per supplier
report=report[-report['Hardware/Software'].isnull()]
pivot =pivot_table(report,rows=['Supplier Name'], values=['Hardware/Software'], aggfunc=lambda x: len(x.unique()))
pivot=pivot.reset_index()


#Tried to do a for loop, but just decided to look at filtered df of unique values == 1
pivot_1=pivot[pivot['Hardware/Software'] == 1]

#Tried to merge/add column of H/S values to H/S count and it went messy so I just filtered out the original df by suppliers with 1 H.S
report_only_1=report[report['Supplier Name'].isin(pivot_1['Supplier Name'])]

#For easy joining, take only the two columns necessary
report_only_1=report_only_1.loc[:,['Supplier Name','Hardware/Software']]

#Want only the suppliers that are in the Monthly report for simple merging
report_only_1_filt=report_only_1[report_only_1['Supplier Name'].isin(monthly['Supplier Name'])]


#In order to merge 'properly' Only want ONE isntance of each supplier name and associated H/S
#Because each of these suppliers only has one H/S (by what we filtered off of) then no need for multiple instances 
report_only_1_filt=report_only_1_filt.drop_duplicates('Supplier Name')

monthly=merge(monthly,report_only_1_filt, left_on='Supplier Name',right_on='Supplier Name', how='left')

monthly['Hardware/Software']=monthly['Hardware/Software_y']
monthly=monthly.drop('Hardware/Software_y',1)

In [193]:
filtered.to_csv('C:/Users/alexisperez/Documents/IT Maintenance Renewal/Code/Monthly Test Report.csv')

Double check the output, Edit, Save and Append monthly to continual report

Code to Update Log of Running Items in Current IT Maintenance Report


In [ ]:
#Looking at today's date and determing when the products will expire with restpect to today
from datetime import datetime
from datetime import date
import datetime
today = datetime.date.today()


#Calculate difference from today and the end date

report['Start Date']=report['Start Date'].astype('datetime64[ns]')
report['End Date']=report['End Date'].astype('datetime64[ns]')


report['today']=today
report['today']=report['today'].astype('datetime64[ns]')


report['diff']=report['End Date']-report['today']

report['Time left[days]']=report['diff'].apply(lambda x: x / np.timedelta64(1,'D'))
report['Time left[Year]']=report['Time left[days]'].apply(lambda x: x/365).apply(lambda x: np.round(x,2))


#Labeling if a product is expired based on today's date and End Date 

report['Expired/Active'] = '' 

active_column = Series.copy(report['Time left[days]'])
active_column=active_column.astype(basestring)
active_column[ active_column > 0] = 'Active'
active_column[ active_column < 0] = 'Expired'
active_column[ active_column == 0] = ''
report['Expired/Active'] = active_column

report['Expired/Active']=report['Expired/Active'].fillna('')




report['Time left[days]']=report['Time left[days]'].astype('str').fillna('').str.replace('nan','')
report['Time left[Year]']=report['Time left[Year]'].astype('str').fillna('').str.replace('nan','')


report=report.drop(['diff','today'],1)


report['Start Date']=report['Start Date'].astype('O').astype('str')
report['Start Date']=report['Start Date'].str.replace('00:00:00','').str.replace('nan','')

report['End Date']=report['End Date'].astype('O').astype('str')
report['End Date']=report['End Date'].str.replace('00:00:00','').str.replace('nan','')

#Labeling if a product is expired based on today's date and End Date

Now that current month and past data is updated according to 'Today' Append the past month onto the running log of IT Maintenance Data


In [197]:
monthly_updated=pd.read_csv('C:/Users/alexisperez/Documents/IT Maintenance Renewal/Code/Monthly Test Report.csv')
report=report.append(monthly_updated)
report.to_csv('C:/Users/alexisperez/Documents/IT Maintenance Renewal/Code/IT Maintenance Report Updated.csv',index=False)

Rules Based Determination for Hardware/Software?


In [279]:
#The following code is a brute force way to determine what is Hardware/Software based off past data
###Note Only determines line items where suppliers have one unique H/S instance i.e Supplier X only has Hardware Items###
#Instead of 'predicting' or using a for loop, there are multiple filters to break down the data to what we want
  # followed by simplfying the data to only single suppliers and their H/S instance
    #Then we merge the monthly report with the Simple Single H/S Instance Supplier DF
    
#Using past data only want Suppliers with either all hardware or all software for historical line items
#Pivot table based on unique items per supplier
report=report[-report['Hardware/Software'].isnull()]
pivot =pivot_table(report,rows=['Supplier Name'], values=['Hardware/Software'], aggfunc=lambda x: len(x.unique()))
pivot=pivot.reset_index()


#Tried to do a for loop, but just decided to look at filtered df of unique values == 1
pivot_1=pivot[pivot['Hardware/Software'] == 1]

#Tried to merge/add column of H/S values to H/S count and it went messy so I just filtered out the original df by suppliers with 1 H.S
report_only_1=report[report['Supplier Name'].isin(pivot_1['Supplier Name'])]

#For easy joining, take only the two columns necessary
report_only_1=report_only_1.loc[:,['Supplier Name','Hardware/Software']]

#Want only the suppliers that are in the Monthly report for simple merging
report_only_1_filt=report_only_1[report_only_1['Supplier Name'].isin(monthly['Supplier Name'])]


#In order to merge 'properly' Only want ONE isntance of each supplier name and associated H/S
#Because each of these suppliers only has one H/S (by what we filtered off of) then no need for multiple instances 
report_only_1_filt=report_only_1_filt.drop_duplicates('Supplier Name')

monthly=merge(monthly,report_only_1_filt, left_on='Supplier Name',right_on='Supplier Name', how='left')

monthly['Hardware/Software']=monthly['Hardware/Software_y']
monthly=monthly.drop('Hardware/Software_y',1)


Out[279]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 32 entries, 0 to 31
Data columns (total 23 columns):
Unnamed: 0             32  non-null values
PO #                   32  non-null values
Creation Date          32  non-null values
Supplier Number        32  non-null values
Supplier Name          32  non-null values
Product Description    32  non-null values
Manufacturer           3  non-null values
Commodity Code         32  non-null values
Quantity               32  non-null values
Extended Price         32  non-null values
Home Department        32  non-null values
Start Date             6  non-null values
End Date               9  non-null values
Length (Years)         6  non-null values
Hardware/Software_x    0  non-null values
H/S Indicator          0  non-null values
Director               0  non-null values
Manager                0  non-null values
Time left[days]        4  non-null values
Time left[Year]        4  non-null values
Expired/Active         8  non-null values
Length                 5  non-null values
Hardware/Software      21  non-null values
dtypes: float64(8), int64(5), object(10)