In this phase, we'll build the startups dataset and prepare it for the feature analysis that follows. New features will be created by combining information from the four CSV files we have available: acquisitions.csv, investments.csv, rounds.csv and companies.csv.
In [1]:
#All imports here
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from datetime import datetime
from dateutil import relativedelta
%matplotlib inline
In [2]:
#Let's start by importing our csv files into dataframes
df_companies = pd.read_csv('data/companies.csv')
df_acquisitions = pd.read_csv('data/acquisitions.csv')
df_investments = pd.read_csv('data/investments.csv')
df_rounds = pd.read_csv('data/rounds.csv')
We'll only use USA-based companies in the analysis, since companies from other countries have a large amount of missing data.
In [3]:
#Our final database will be stored in 'startups_USA'
startups_USA = df_companies[df_companies['country_code'] == 'USA']
startups_USA.head()
Out[3]:
Now that we have a first version of our dataset, we'll expand the category_list attribute into dummy variables for categories.
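To illustrate what this expansion does before running it on the real data, here is a minimal sketch on a made-up toy Series (reusing the pd alias imported above) of how pandas' str.get_dummies splits a pipe-separated column into one indicator column per category:
#Hypothetical example: three companies with pipe-separated category lists
toy = pd.Series(['Software|Analytics', 'Biotechnology', 'Software|Mobile'])
print(toy.str.get_dummies(sep='|'))
#   Analytics  Biotechnology  Mobile  Software
#0          1              0       0         1
#1          0              1       0         0
#2          0              0       1         1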
In [4]:
from operator import methodcaller

def split_categories(categories):
    #get a unique list of the category combinations
    splitted_categories = list(categories.astype('str').unique())
    #split each combination by '|'
    splitted_categories = map(methodcaller("split", "|"), splitted_categories)
    #flatten the list of sub categories
    splitted_categories = [item for sublist in splitted_categories for item in sublist]
    return splitted_categories

def explore_categories(categories, top_n_categories):
    cat = split_categories(categories)
    print('There are in total {} different categories'.format(len(cat)))
    prob = pd.Series(cat).value_counts()
    print(prob.head())
    #select the first <top_n_categories>
    mask = prob > prob[top_n_categories]
    head_prob = prob.loc[mask].sum()
    tail_prob = prob.loc[~mask].sum()
    total_sum = prob.sum()
    prob = prob.loc[mask]
    prob2 = pd.DataFrame({'top '+str(top_n_categories)+' categories': head_prob, 'others': tail_prob}, index=[0])
    fig, axs = plt.subplots(2, 1, figsize=(15, 6))
    prob.plot(kind='bar', ax=axs[0])
    prob2.plot(kind='bar', ax=axs[1])
    for bar in axs[1].patches:
        height = bar.get_height()
        axs[1].text(bar.get_x() + bar.get_width()/2., 0.50*height,
                    '%.2f' % (float(height)/float(total_sum)*100) + "%", ha='center', va='top')
    fig.tight_layout()
    plt.xticks(rotation=90)
    plt.show()

explore_categories(startups_USA['category_list'], top_n_categories=50)
Since there are too many categories to encode them all, we'll keep only the 50 most frequent ones.
The chart above shows that these 50 categories (out of 60813 in total) already cover 46% of the companies.
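For reference, that coverage figure is just the share of the category frequency table captured by its head. A quick standalone check (reusing split_categories from above; taking the top 50 rows only approximates the head/tail split of explore_categories when there are ties in the counts):
cat_counts = pd.Series(split_categories(startups_USA['category_list'])).value_counts()
print('Top 50 categories cover {:.2%}'.format(cat_counts.iloc[:50].sum() / float(cat_counts.sum())))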
In [5]:
def expand_top_categories_into_dummy_variables(df):
    cat = df['category_list'].astype('str')
    #Count how many times each category appears across all companies
    cat_count = cat.str.split('|').apply(lambda x: pd.Series(x).value_counts()).sum()
    #Get a dummy dataset for categories
    dummies = cat.str.get_dummies(sep='|')
    #Select the 50 most frequent categories
    top50categories = list(cat_count.sort_values(ascending=False).index[:50])
    #Create a dataframe with the top 50 categories to be concatenated later to the complete dataframe
    categories_df = dummies[top50categories]
    categories_df = categories_df.add_prefix('Category_')
    return pd.concat([df, categories_df], axis=1, ignore_index=False)

startups_USA = expand_top_categories_into_dummy_variables(startups_USA)
In [6]:
startups_USA.head()
Out[6]:
So we have now added 50 more category columns to our dataset.
In [7]:
startups_USA['funding_rounds'].hist(bins=range(1,10))
plt.title("Histogram of the number of funding rounds")
plt.ylabel('Number of companies')
plt.xlabel('Number of funding rounds')
Out[7]:
In [8]:
#funding_total_usd
#funding_rounds
plt.subplot()
startups_USA[startups_USA['funding_total_usd'] != '-'] \
    .set_index('name')['funding_total_usd'] \
    .astype(float) \
    .sort_values(ascending=False)[:30] \
    .plot(kind='barh', figsize=(5,7))
plt.gca().invert_yaxis()
plt.title('Companies with highest total funding')
plt.ylabel('Companies')
plt.xlabel('Total amount of funding (USD)')
Out[8]:
Here, we'll extract from the rounds.csv file the number of rounds and the total amount invested for each type of investment.
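The core of this step is a two-level groupby over company and funding type. As a minimal sketch on a made-up toy frame (column names matching rounds.csv; the actual cell below uses an older dict-based agg signature, but the aggregation is equivalent):
toy_rounds = pd.DataFrame({
    'company_permalink': ['/c/a', '/c/a', '/c/b'],
    'funding_round_type': ['seed', 'venture', 'seed'],
    'raised_amount_usd': [1e5, 2e6, 5e5],
})
print(toy_rounds.groupby(['company_permalink', 'funding_round_type'])['raised_amount_usd'].agg(['sum', 'count']))
#                                            sum  count
#company_permalink funding_round_type
#/c/a              seed                 100000.0      1
#                  venture             2000000.0      1
#/c/b              seed                 500000.0      1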
In [9]:
# Investment types
df_rounds['funding_round_type'].value_counts()
Out[9]:
In [10]:
import warnings
warnings.filterwarnings('ignore')

#Add two new features (number of rounds and total amount raised) for a given funding type
def add_dummy_for_funding_type(df, aggr_rounds, funding_type):
    funding_df = aggr_rounds.iloc[aggr_rounds.index.get_level_values('funding_round_type') == funding_type].reset_index()
    funding_df.columns = funding_df.columns.droplevel()
    funding_df.columns = ['company_permalink', funding_type, funding_type+'_funding_total_usd', funding_type+'_funding_rounds']
    funding_df = funding_df.drop(funding_type, axis=1)
    new_df = pd.merge(df, funding_df, on='company_permalink', how='left')
    new_df = new_df.fillna(0)
    return new_df

def expand_investment_rounds(df, df_rounds):
    #Prepare an aggregated rounds dataframe grouped by company and funding type
    rounds_agg = df_rounds.groupby(['company_permalink', 'funding_round_type'])['raised_amount_usd'].agg({'amount': [pd.Series.sum, pd.Series.count]})
    #Get the available unique funding types
    funding_types = list(rounds_agg.index.levels[1])
    #Prepare the dataframe where the dummy features for each funding type will be added (number of rounds and total sum for each type)
    rounds_df = df[['permalink']]
    rounds_df = rounds_df.rename(columns={'permalink': 'company_permalink'})
    #For each funding type, add two more columns to rounds_df
    for funding_type in funding_types:
        rounds_df = add_dummy_for_funding_type(rounds_df, rounds_agg, funding_type)
    #remove the company_permalink variable, since it's already available in the companies dataframe
    rounds_df = rounds_df.drop('company_permalink', axis=1)
    #set rounds_df to have the same index as the other dataframes
    rounds_df.index = df.index
    return pd.concat([df, rounds_df], axis=1, ignore_index=False)

startups_USA = expand_investment_rounds(startups_USA, df_rounds)
In [11]:
startups_USA.head()
Out[11]:
We'll set the company id (permalink attribute) as the index for the dataset. This simple change will make it easier to attach new features to the dataset.
In [12]:
startups_USA = startups_USA.set_index('permalink')
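With permalink as the index, later feature frames can simply be attached with DataFrame.join, which aligns on the index. A minimal sketch of the pattern used below (toy frames with hypothetical values):
left = pd.DataFrame({'funding_rounds': [2, 1]}, index=['/company/a', '/company/b'])
feat = pd.DataFrame({'number_of_acquisitions': [3]}, index=['/company/a'])
print(left.join(feat).fillna(0))
#            funding_rounds  number_of_acquisitions
#/company/a               2                     3.0
#/company/b               1                     0.0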
Here, we'll extract the number of acquisitions made by each company in our dataset.
In [13]:
import warnings
warnings.filterwarnings('ignore')

def extract_feature_number_of_acquisitions(df, df_acquisitions):
    number_of_acquisitions = df_acquisitions.groupby(['acquirer_permalink'])['acquirer_permalink'].agg({'amount': [pd.Series.count]}).reset_index()
    number_of_acquisitions.columns = number_of_acquisitions.columns.droplevel()
    number_of_acquisitions.columns = ['permalink', 'number_of_acquisitions']
    number_of_acquisitions = number_of_acquisitions.set_index('permalink')
    number_of_acquisitions = number_of_acquisitions.fillna(0)
    new_df = df.join(number_of_acquisitions)
    new_df['number_of_acquisitions'] = new_df['number_of_acquisitions'].fillna(0)
    return new_df

startups_USA = extract_feature_number_of_acquisitions(startups_USA, df_acquisitions)
Here, we'll extract the number of investments made by each company in our dataset.
Note: this is not the number of times someone invested in the startup; it is the number of times each startup has invested in another company.
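The distinction between total and unique investments maps directly to count versus nunique in the groupby below. A toy sketch (made-up investor data):
toy_inv = pd.DataFrame({
    'investor_permalink': ['/i/x', '/i/x', '/i/x'],
    'company_permalink': ['/c/a', '/c/a', '/c/b'],
})
g = toy_inv.groupby('investor_permalink')['company_permalink']
print(g.count())    #3 investments in total
print(g.nunique())  #2 distinct companies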
In [14]:
import warnings
warnings.filterwarnings('ignore')

def extract_feature_number_of_investments(df, df_investments):
    number_of_investments = df_investments.groupby(['investor_permalink'])['investor_permalink'].agg({'amount': [pd.Series.count]}).reset_index()
    number_of_investments.columns = number_of_investments.columns.droplevel()
    number_of_investments.columns = ['permalink', 'number_of_investments']
    number_of_investments = number_of_investments.set_index('permalink')
    number_of_unique_investments = df_investments.groupby(['investor_permalink'])['company_permalink'].agg({'amount': [pd.Series.nunique]}).reset_index()
    number_of_unique_investments.columns = number_of_unique_investments.columns.droplevel()
    number_of_unique_investments.columns = ['permalink', 'number_of_unique_investments']
    number_of_unique_investments = number_of_unique_investments.set_index('permalink')
    new_df = df.join(number_of_investments)
    new_df['number_of_investments'] = new_df['number_of_investments'].fillna(0)
    new_df = new_df.join(number_of_unique_investments)
    new_df['number_of_unique_investments'] = new_df['number_of_unique_investments'].fillna(0)
    return new_df

startups_USA = extract_feature_number_of_investments(startups_USA, df_investments)
Here we'll extract two more features: the average number of investors per funding round, and the average amount invested per round.
In [15]:
import warnings
warnings.filterwarnings('ignore')

def extract_feature_avg_investors_per_round(df, investments):
    number_of_investors_per_round = investments.groupby(['company_permalink', 'funding_round_permalink'])['investor_permalink'].agg({'investor_permalink': [pd.Series.count]}).reset_index()
    number_of_investors_per_round.columns = number_of_investors_per_round.columns.droplevel(0)
    number_of_investors_per_round.columns = ['company_permalink', 'funding_round_permalink', 'count']
    number_of_investors_per_round = number_of_investors_per_round.groupby(['company_permalink']).agg({'count': [pd.Series.mean]}).reset_index()
    number_of_investors_per_round.columns = number_of_investors_per_round.columns.droplevel(0)
    number_of_investors_per_round.columns = ['company_permalink', 'number_of_investors_per_round']
    number_of_investors_per_round = number_of_investors_per_round.set_index('company_permalink')
    new_df = df.join(number_of_investors_per_round)
    new_df['number_of_investors_per_round'] = new_df['number_of_investors_per_round'].fillna(-1)
    return new_df

def extract_feature_avg_amount_invested_per_round(df, investments):
    investmentsdf = investments.copy()
    investmentsdf['raised_amount_usd'] = investmentsdf['raised_amount_usd'].astype(float)
    avg_amount_invested_per_round = investmentsdf.groupby(['company_permalink', 'funding_round_permalink'])['raised_amount_usd'].agg({'raised_amount_usd': [pd.Series.mean]}).reset_index()
    avg_amount_invested_per_round.columns = avg_amount_invested_per_round.columns.droplevel(0)
    avg_amount_invested_per_round.columns = ['company_permalink', 'funding_round_permalink', 'mean']
    avg_amount_invested_per_round = avg_amount_invested_per_round.groupby(['company_permalink']).agg({'mean': [pd.Series.mean]}).reset_index()
    avg_amount_invested_per_round.columns = avg_amount_invested_per_round.columns.droplevel(0)
    avg_amount_invested_per_round.columns = ['company_permalink', 'avg_amount_invested_per_round']
    avg_amount_invested_per_round = avg_amount_invested_per_round.set_index('company_permalink')
    new_df = df.join(avg_amount_invested_per_round)
    new_df['avg_amount_invested_per_round'] = new_df['avg_amount_invested_per_round'].fillna(-1)
    return new_df

startups_USA = extract_feature_avg_investors_per_round(startups_USA, df_investments)
startups_USA = extract_feature_avg_amount_invested_per_round(startups_USA, df_investments)
In [16]:
startups_USA.head()
Out[16]:
In [17]:
#drop features that won't be used in the analysis
startups_USA = startups_USA.drop(['name', 'homepage_url', 'category_list', 'region', 'city', 'country_code'], axis=1)
#move status to the end of the dataframe
cols = list(startups_USA)
cols.append(cols.pop(cols.index('status')))
startups_USA = startups_USA[cols]
In [18]:
def normalize_numeric_features(df, columns_to_scale=None):
    min_max_scaler = preprocessing.MinMaxScaler()
    startups_normalized = df.copy()
    #Convert '-' to zeros in funding_total_usd
    startups_normalized['funding_total_usd'] = startups_normalized['funding_total_usd'].replace('-', 0)
    #scale numeric features to the [0, 1] range
    startups_normalized[columns_to_scale] = min_max_scaler.fit_transform(startups_normalized[columns_to_scale])
    return startups_normalized

columns_to_scale = list(startups_USA.filter(regex=(".*(funding_rounds|funding_total_usd)|(number_of|avg_).*")).columns)
startups_USA = normalize_numeric_features(startups_USA, columns_to_scale)
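For reference, MinMaxScaler rescales each column to [0, 1] via (x - min) / (max - min); a quick toy check (using the preprocessing and np imports from the top of the notebook):
scaler = preprocessing.MinMaxScaler()
print(scaler.fit_transform(np.array([[1.0], [3.0], [5.0]])))
#[[0. ]
# [0.5]
# [1. ]]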
In [19]:
def date_to_age_in_months(date):
    if date != date or date == 0:  #NaN check (NaN != NaN)
        return 0
    date1 = datetime.strptime(date, '%Y-%m-%d')
    date2 = datetime.strptime('2017-01-01', '%Y-%m-%d')  #measure age up to 01/01/2017
    delta = relativedelta.relativedelta(date2, date1)
    return delta.years * 12 + delta.months

def normalize_date_variables(df):
    date_vars = ['founded_at', 'first_funding_at', 'last_funding_at']
    for var in date_vars:
        df[var] = df[var].map(date_to_age_in_months)
    df = normalize_numeric_features(df, date_vars)
    return df

startups_USA = normalize_date_variables(startups_USA)
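As a quick sanity check of date_to_age_in_months (the reference date is fixed at 2017-01-01), one would expect:
print(date_to_age_in_months('2015-01-01'))  #24 months
print(date_to_age_in_months(np.nan))        #0 (NaN guard)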
In [20]:
def explore_states(states, top_n_states):
    print('There are in total {} different states'.format(len(states.unique())))
    prob = pd.Series(states).value_counts()
    print(prob.head())
    #select the first <top_n_states>
    mask = prob > prob[top_n_states]
    head_prob = prob.loc[mask].sum()
    tail_prob = prob.loc[~mask].sum()
    total_sum = prob.sum()
    prob = prob.loc[mask]
    prob2 = pd.DataFrame({'top '+str(top_n_states)+' states': head_prob, 'others': tail_prob}, index=[0])
    fig, axs = plt.subplots(2, 1, figsize=(15, 6))
    prob.plot(kind='bar', ax=axs[0])
    prob2.plot(kind='bar', ax=axs[1])
    for bar in axs[1].patches:
        height = bar.get_height()
        axs[1].text(bar.get_x() + bar.get_width()/2., 0.50*height,
                    '%.2f' % (float(height)/float(total_sum)*100) + "%", ha='center', va='top')
    fig.tight_layout()
    plt.xticks(rotation=90)
    plt.show()

explore_states(startups_USA['state_code'], top_n_states=15)
As we did for the categories variable, in order to decrease the number of features in our dataset, we'll keep only the 15 most frequent states (which already cover 82% of our companies).
In [21]:
def expand_top_states_into_dummy_variables(df):
    states = df['state_code'].astype('str')
    #Get a dummy dataset for states
    dummies = pd.get_dummies(states)
    #select the most frequent states
    top15states = list(states.value_counts().sort_values(ascending=False).index[:15])
    #Create a dataframe with the top 15 states to be concatenated later to the complete dataframe
    states_df = dummies[top15states]
    states_df = states_df.add_prefix('State_')
    new_df = pd.concat([df, states_df], axis=1, ignore_index=False)
    new_df = new_df.drop(['state_code'], axis=1)
    return new_df

startups_USA = expand_top_states_into_dummy_variables(startups_USA)
In [22]:
#move status to the end of the dataframe again
cols = list(startups_USA)
cols.append(cols.pop(cols.index('status')))
startups_USA = startups_USA[cols]
In [23]:
startups_USA.to_csv('data/startups_pre_processed.csv')
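When loading this file back later, note that the permalink index has to be restored explicitly, e.g.:
startups_USA = pd.read_csv('data/startups_pre_processed.csv', index_col='permalink')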
In [24]:
startups_USA.head()
Out[24]: