Relaunched Hedge Funds

This program is a replication of Stata Script Version 0.2


In [18]:
import pandas as pd
from datetime import timedelta

# ****************** Program Settings ******************
Folder 		= ""			 # Location of program scripts
Data 		= "temp/"		 # Location to which temporary files are generated
DataSource  = "data/"		 # Location of the original data files (ASCII)
Gap_Days = 60 				 # To be quantify as a gap, the holidays that a fund manager takes must be at least 60 days.
              				 # If you would like to allow overlapping funds, change this to a negative number.
# ******************************************************

Prepare Data


In [3]:
# Load manager details
data_manager = pd.read_stata(DataSource + 'PeopleDetails.dta')
data_manager = data_manager[data_manager.PersonTypeID==1]
variables_to_keep = "ProductReference PersonID First Last JobTitle Address1 Address2 Address3 CityName StateName Zip CountryName".split()
data_manager = data_manager[variables_to_keep]
data_manager.head()


Out[3]:
ProductReference PersonID First Last JobTitle Address1 Address2 Address3 CityName StateName Zip CountryName
1 5 2508 Robert Thompson Portfolio Manager 3351 W Bearss Ave Tampa Florida 33618 United States
4 8 139 Enrico Massignani Performance Contact Via Ugo Bassi 6 Milan None 20159 Italy
6 9 139 Enrico Massignani Performance Contact Via Ugo Bassi 6 Milan None 20159 Italy
9 10 139 Enrico Massignani Performance Contact Via Ugo Bassi 6 Milan None 20159 Italy
11 11 139 Enrico Massignani Performance Contact Via Ugo Bassi 6 Milan None 20159 Italy

In [4]:
# Extract Inception / PerformnaceEndData date
data_dates = pd.read_stata(DataSource + 'ProductDetails.dta')
variables_to_keep = "ProductReference InceptionDate PerformanceEndDate".split()
data_dates = data_dates[variables_to_keep]
data_dates.head()


Out[4]:
ProductReference InceptionDate PerformanceEndDate
0 86 1993-01-01 2013-09-30
1 87 1992-01-01 2008-09-30
2 90 1993-07-01 1997-12-31
3 91 1993-07-01 1997-12-31
4 92 1993-07-01 2000-12-31

Analysis

Merge managers with the start/end dates


In [5]:
# Inner join - non-matches are excluded
data_merged = data_manager.merge(data_dates, on='ProductReference', how='inner')
data_merged = data_merged.sort(columns=['PersonID', 
                                        'PerformanceEndDate', 
                                        'InceptionDate',
                                        'ProductReference'])
data_merged.head()


Out[5]:
ProductReference PersonID First Last JobTitle Address1 Address2 Address3 CityName StateName Zip CountryName InceptionDate PerformanceEndDate
1197 1763 4 Douglas Bry President 3609 South Wadsworth Blvd. Suite 250 Denver Colorado 80235 United States 1996-08-01 1998-03-31
312 637 4 Douglas Bry President 3609 South Wadsworth Blvd. Suite 250 Denver Colorado 80235 United States 1991-03-01 2011-03-31
1120 1679 4 Douglas Bry President 3609 South Wadsworth Blvd. Suite 250 Denver Colorado 80235 United States 1991-03-01 2011-03-31
1856 2508 9 Jerald Weintraub (none) 900 Montgomery Street San Francisco California 94133 United States 1992-05-08 2006-10-31
2314 3721 9 Jerald Weintraub (none) 900 Montgomery Street San Francisco California 94133 United States 1998-11-18 2007-05-31

Find First End Date for each PersonID. Variable first_end_date is defined as the performance ending date of the first fund of each person ID.


In [6]:
# Create a GroupBy object
grouped = data_merged[['PersonID', 'PerformanceEndDate']].groupby('PersonID', as_index=False, axis=0)  # these will aplit the DataFrame on its index (rows).
grouped.groups
print(grouped.get_group(4))
grouped.min().head()


      PersonID PerformanceEndDate
1197         4         1998-03-31
312          4         2011-03-31
1120         4         2011-03-31
Out[6]:
PersonID PerformanceEndDate
0 4 1998-03-31
1 9 2006-10-31
2 11 1997-04-30
3 15 1999-08-31
4 19 2002-03-31

In [7]:
# Find the smallest value in the end date for each PersonID
transformed = grouped.min()
transformed['first_end_date'] = transformed['PerformanceEndDate']
transformed = transformed.drop('PerformanceEndDate', axis=1)
transformed.head()


Out[7]:
PersonID first_end_date
0 4 1998-03-31
1 9 2006-10-31
2 11 1997-04-30
3 15 1999-08-31
4 19 2002-03-31

In [ ]:

Main Program (Takes a while)

Check the pair-wise duration gaps of each funds by Person ID


In [36]:
# Merge back to the main dataset
data_main = data_merged.merge(transformed, how='outer', 
                                left_on='PersonID', right_on='PersonID')

data_main = data_main.sort(columns=['PersonID', 
                                    'InceptionDate', 
                                    'PerformanceEndDate', 
                                    'ProductReference'])

################################################
# WARNING: DEBUG CODE -- NEEDS TO BE DISABLED
# data_main = data_main[:2000:]
# data_main = data_main[data_main.PersonID==799]
################################################

def find_gaps(person_panel):
    '''The function finds the number of relaunched hedge funds.
    A gap is defined as there exists a fund, of which the PerformanceEndDate
    is before the inception date of all funds that were incepted later than
    this fund.
    Returns the number of gaps; and all of the Fund ID which is proceeding 
    each gap.
    '''
    gaps_number = 0
    fund_ID_preceed_gap = []
    
    # Reset index from 0
    person_panel = person_panel.reset_index(drop=True)

    # print("Her data panel is as below:")
    # print(person_panel)
    
    for i, i_row in person_panel.iterrows():
        # Reset criteria status for a new i_row
        criteria_backward = True
        criteria_forward = True
        criteria_exist_after = False
        
        for j, j_row in person_panel.iterrows():
            if i==j:
                # Skip
                continue
                
            # print('Comparison now made for row: ', i, j)

            # Days to define the gap is NOT yet incorporated.

            # Criteria 1 - Looking backward: check if i_end>j_end for all j of which j_inc<i_inc
            # For all funds earlier than i
            # j: Inc-----End
            # i:      Inc------------End
            if j_row.InceptionDate<i_row.InceptionDate:
                criteria_backward *= check_backward(i_row, j_row)

            # Criteria 2 - Looking forward: check if i_end<j_inc for all j of which j_inc>i_inc
            # For all funds later than i
            # i: Inc---------End
            #                   |***GAP***|
            # j:                           Inc--------End
            if j_row.InceptionDate>=i_row.InceptionDate:
                criteria_forward *= check_forward(i_row, j_row)

            # Criteria 3 - There must be funds incepted after fund i
            if j_row.InceptionDate>=i_row.InceptionDate:
                criteria_exist_after = True
                
        # If Criteria 1,2,3 are all satisfied
        # Fund i is the fund proceeding a gap
        if criteria_backward==True and criteria_forward==True and criteria_exist_after==True:
            fund_ID_preceed_gap.append(i_row.ProductReference)
            gaps_number += 1
        
    return (gaps_number, fund_ID_preceed_gap)
    
def check_backward(i_row, j_row):
    """
    Compares the PerformanceEndDate for i_row and j_row.
    It returns True if i_end>j_end;
    It returns False otherwise.
    """
    if i_row.PerformanceEndDate >= j_row.PerformanceEndDate:
        return True
    else:
        return False
 
def check_forward(i_row, j_row):
    """
    Compares the PerformanceEndDate for i_row to the InceptionDate of j_row.
    Returns True if i_end<j_inc;
    Returns False otherwise.
    """
    global Gap_Days # Find the global variable Gap_Days
    gap_days = timedelta(days=Gap_Days)
    
    if i_row.PerformanceEndDate + gap_days <= j_row.InceptionDate:
        return True
    else:
        return False
    
grouped_by_person = data_main.groupby('PersonID', as_index=False, axis=0) 
grouped_by_person.groups

# A new dict to store number of gaps found for each person
gaps = {}
fund_IDs = {}

for person_id, person_panel in grouped_by_person:
    # print("The person's PersonID is :", person_id)
    gaps_number, fund_ID_preceed_gap = find_gaps(person_panel[['ProductReference',
                        'InceptionDate',
                        'PerformanceEndDate']])
    gaps[person_id] = gaps_number
    fund_IDs[person_id] = fund_ID_preceed_gap


Out[36]:
0
4 0
9 0
2387 0
11 0
15 0

In [50]:
# Transform two dict into DataFrame
data_gaps = pd.DataFrame.from_dict(gaps, orient='index')
data_gaps.columns = ['number_of_gaps']
data_gaps['PersonID'] = data_gaps.index
data_gaps = data_gaps.reset_index(drop=True)
data_gaps.head()
data_gap_fund_IDs = pd.DataFrame.from_dict(fund_IDs, orient='index')
data_gap_fund_IDs.columns = ['fund_IDs_proceeding_gap']
data_gap_fund_IDs['PersonID'] = data_gap_fund_IDs.index
data_gap_fund_IDs = data_gap_fund_IDs.reset_index(drop=True)
data_gap_fund_IDs.head()


Out[50]:
fund_IDs_proceeding_gap PersonID
0 NaN 4
1 NaN 9
2 NaN 2387
3 NaN 11
4 NaN 15

In [60]:
# Merge with the main dataset
data_output = data_main.merge(data_gaps, how='outer', 
                                left_on='PersonID', right_on='PersonID')
data_output = data_output.merge(data_gap_fund_IDs, how='outer', 
                                left_on='PersonID', right_on='PersonID')
data_output.head()


Out[60]:
ProductReference PersonID First Last JobTitle Address1 Address2 Address3 CityName StateName Zip CountryName InceptionDate PerformanceEndDate first_end_date number_of_gaps fund_IDs_proceeding_gap
0 637 4 Douglas Bry President 3609 South Wadsworth Blvd. Suite 250 Denver Colorado 80235 United States 1991-03-01 2011-03-31 1998-03-31 0 NaN
1 1679 4 Douglas Bry President 3609 South Wadsworth Blvd. Suite 250 Denver Colorado 80235 United States 1991-03-01 2011-03-31 1998-03-31 0 NaN
2 1763 4 Douglas Bry President 3609 South Wadsworth Blvd. Suite 250 Denver Colorado 80235 United States 1996-08-01 1998-03-31 1998-03-31 0 NaN
3 2508 9 Jerald Weintraub (none) 900 Montgomery Street San Francisco California 94133 United States 1992-05-08 2006-10-31 2006-10-31 0 NaN
4 3721 9 Jerald Weintraub (none) 900 Montgomery Street San Francisco California 94133 United States 1998-11-18 2007-05-31 2006-10-31 0 NaN

In [61]:
# Output datafiles
data_output.to_excel('data_output.xlsx')
data_output.to_stata('data_output.dta', convert_dates={13:'td', 14:'td', 15:'td'})