Crunchbase: Crunching Data

It is possible for pandas to directly read the data from xls .


In [2]:
import pandas as pd
import numpy as np

We are going to load the Company profile data


In [3]:
xl = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Companies', index_col='permalink')

In [4]:
xl.info()


<class 'pandas.core.frame.DataFrame'>
Index: 48636 entries, /organization/waywire to /organization/x
Data columns (total 17 columns):
name                 48635 non-null object
homepage_url         45286 non-null object
category_list        44040 non-null object
market               44034 non-null object
funding_total_usd    48636 non-null int64
status               47193 non-null object
country_code         43480 non-null object
state_code           29679 non-null object
region               43480 non-null object
city                 42628 non-null object
funding_rounds       48636 non-null int64
founded_at           37976 non-null object
founded_month        37904 non-null object
founded_quarter      37904 non-null object
founded_year         37904 non-null float64
first_funding_at     48636 non-null object
last_funding_at      48636 non-null object
dtypes: float64(1), int64(2), object(14)
memory usage: 6.7+ MB

In [7]:
xl=xl[xl.founded_year >= 2000]

In [8]:
xl.info()


<class 'pandas.core.frame.DataFrame'>
Index: 34275 entries, /organization/waywire to /organization/zzzzapp-com
Data columns (total 17 columns):
name                 34274 non-null object
homepage_url         32505 non-null object
category_list        32168 non-null object
market               32164 non-null object
funding_total_usd    34275 non-null int64
status               33414 non-null object
country_code         31431 non-null object
state_code           21621 non-null object
region               31431 non-null object
city                 31030 non-null object
funding_rounds       34275 non-null int64
founded_at           34275 non-null object
founded_month        34275 non-null object
founded_quarter      34275 non-null object
founded_year         34275 non-null float64
first_funding_at     34275 non-null object
last_funding_at      34275 non-null object
dtypes: float64(1), int64(2), object(14)
memory usage: 4.7+ MB

What is the most common market addressed by startups?


In [9]:
len(xl.market.value_counts())


Out[9]:
718

That is way too many. So lets look at the top 20.


In [10]:
xl.market.value_counts().head(20)


Out[10]:
Software               3165
Biotechnology          2041
Mobile                 1485
E-Commerce             1338
Curated Web            1273
Enterprise Software     932
Games                   859
Advertising             837
Health Care             747
Social Media            711
Hardware + Software     701
Clean Technology        672
Education               608
Finance                 601
Health and Wellness     544
Analytics               493
Manufacturing           405
Security                355
Hospitality             325
Real Estate             307
dtype: int64

Software seems to be a catch all phrase, when they couldn't place it in a specific category or when there are just way too many products produced by the company. The surprise is 'Biotechnology'. Who knew they were so many?

Ok, now let's create a graph!


In [11]:
from matplotlib import pyplot as plt
%matplotlib inline

In [12]:
from mpltools import style

style.use('ggplot')

In [13]:
xl.market.value_counts().head(30).plot(kind='bar', figsize=(15,8))


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3bd28ed0>

How are startups distributed among countries?


In [14]:
xl.country_code.value_counts().head(15)


Out[14]:
USA    20716
GBR     1797
CAN      954
DEU      681
FRA      607
IND      601
CHN      563
ISR      538
ESP      413
RUS      251
SGP      239
IRL      233
AUS      230
NLD      223
BRA      220
dtype: int64

In [15]:
xl.country_code.value_counts().head(15).plot(kind='bar', figsize=(15,8))


Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3bd28150>

What is the year on year trend for startups?


In [16]:
xl.founded_year.value_counts(sort=False).tail(25).plot(kind='bar', figsize=(15,8))


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3b9fac50>

What is the success rate of well-funded startups?


In [17]:
xl.status.value_counts()


Out[17]:
operating    29161
acquired      2327
closed        1926
dtype: int64

In [18]:
xl.status.value_counts()* 100/len(xl)


Out[18]:
operating    85.079504
acquired      6.789205
closed        5.619256
dtype: float64

In [19]:
year_gp = xl.groupby('founded_year')

In [20]:
type(year_gp)


Out[20]:
pandas.core.groupby.DataFrameGroupBy

In [21]:
for year, group in year_gp:
    print year
    print group.info()
    break


2000.0
<class 'pandas.core.frame.DataFrame'>
Index: 883 entries, /organization/1010data to /organization/zoomingo
Data columns (total 17 columns):
name                 883 non-null object
homepage_url         799 non-null object
category_list        840 non-null object
market               840 non-null object
funding_total_usd    883 non-null int64
status               872 non-null object
country_code         847 non-null object
state_code           613 non-null object
region               847 non-null object
city                 836 non-null object
funding_rounds       883 non-null int64
founded_at           883 non-null object
founded_month        883 non-null object
founded_quarter      883 non-null object
founded_year         883 non-null float64
first_funding_at     883 non-null object
last_funding_at      883 non-null object
dtypes: float64(1), int64(2), object(14)
memory usage: 124.2+ KB
None

In [22]:
year_gp = xl[xl.founded_year>=2000].groupby('founded_year')

In [23]:
for year, group in year_gp:
    print year, len(group)


2000.0 883
2001.0 738
2002.0 788
2003.0 963
2004.0 1148
2005.0 1412
2006.0 1807
2007.0 2311
2008.0 2328
2009.0 2956
2010.0 3744
2011.0 4855
2012.0 5115
2013.0 3935
2014.0 1291
2015.0 1

We see that we get a single record for the year 2015. A startup getting funding before it is founded? Lets see what it is.


In [24]:
xl[xl.founded_year==2015]


Out[24]:
name homepage_url category_list market funding_total_usd status country_code state_code region city funding_rounds founded_at founded_month founded_quarter founded_year first_funding_at last_funding_at
permalink
/organization/merchme MerchMe NaN |Marketplaces|Technology|Web Development|E-Com... Web Development 0 operating USA IL Chicago Chicago 1 2015-01-15 00:00:00 2015-01 2015-Q1 2015 2014-09-17 00:00:00 2014-09-17 00:00:00

The company has had funding this year but the founded_year is set wrong. May be it is just bad data. Let us try to ignore this. So we need to add another criteria saying founded_year should be less than 2015. but the above simple filtering syntax will not be enough. We need to create a mask.


In [25]:
mask = (xl.founded_year>=2000) & (xl.founded_year<2015)

In [26]:
year_gp = xl[(xl.founded_year>=2000) & (xl.founded_year<2015)].groupby('founded_year')

In [27]:
xl.status.value_counts()


Out[27]:
operating    29161
acquired      2327
closed        1926
dtype: int64

In [28]:
type(xl.status.value_counts())


Out[28]:
pandas.core.series.Series

In [29]:
xl.status.value_counts().operating2


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-755ea729b897> in <module>()
----> 1 xl.status.value_counts().operating2

/home/sharmi/installs/virtualenv/crunchbase/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name)
   1934                 return self[name]
   1935             raise AttributeError("'%s' object has no attribute '%s'" %
-> 1936                                  (type(self).__name__, name))
   1937 
   1938     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'operating2'

In [14]:
for year, group in year_gp:
    print year, len(group)
    print group.status.value_counts() *100/len(group)


2005.0 1412
operating    76.416431
acquired     14.235127
closed        7.365439
dtype: float64
2006.0 1807
operating    75.539568
acquired     13.060321
closed        9.241837
dtype: float64
2007.0 2311
operating    76.893120
acquired     10.731285
closed       10.514929
dtype: float64
2008.0 2328
operating    79.982818
closed        9.579038
acquired      8.290378
dtype: float64
2009.0 2956
operating    81.292287
closed        8.694181
acquired      7.408660
dtype: float64
2010.0 3744
operating    84.375000
closed        7.959402
acquired      5.368590
dtype: float64
2011.0 4855
operating    87.785788
closed        5.293512
acquired      3.872297
dtype: float64
2012.0 5115
operating    92.101662
closed        2.287390
acquired      1.955034
dtype: float64
2013.0 3935
operating    96.213469
closed        0.686150
acquired      0.584498
dtype: float64
2014.0 1291
operating    99.070488
closed        0.077459
dtype: float64
2015.0 1
operating    100
dtype: float64

In [30]:
index, closed, operating, acquired, unknown = [],[],[],[],[]
for year, group in year_gp:
    index.append(year)
    status = group.status.value_counts() *100/len(group)
    operating.append(status.operating)
    try:
        acquired.append(status.acquired)
    except AttributeError:
        acquired.append(0)
    try:
        closed.append(status.closed )
    except AttributeError:
        closed.append(0)
    unknown.append(100 - operating[-1] -acquired[-1] -closed[-1])

In [31]:
closed, operating, acquired, unknown = np.array(closed), np.array(operating), np.array(acquired), np.array(unknown)

In [111]:
width = 0.35
plt.figure(figsize=(15,5))
p1 = plt.bar(index, operating, width, color='g')
p2 = plt.bar(index, acquired, width, bottom=operating, color='b')
p3 = plt.bar(index, closed, width, bottom=acquired+operating, color='r')
p4 = plt.bar(index, unknown, width, bottom=closed+acquired+operating, color='y')
plt.ylabel('Success %')
plt.title('Percentage of Startup Success by Year')
#plt.xticks([int(i) for i in index])
#plt.yticks(xrange(0,101,10))

plt.legend( (p1[0], p2[0], p3[0], p4[0]), ('Operating', 'Acquired','Closed', 'Unknown'), bbox_to_anchor=(1.1, 1.05) )


Out[111]:
<matplotlib.legend.Legend at 0x7f2ae0b39ad0>

We can see that in 2000, almost 20% of the companies are acquired. We can also see that between 5-10% of the companies are closed, peaking around 2006-2008.

What is the typical duration from date of founding to getting your first funding?

We are fortunate that the data has both founding date and first funding date. But there are a few unspecified values. So let us select only the records that have the founding date and funding date.


In [32]:
xl.info()


<class 'pandas.core.frame.DataFrame'>
Index: 34275 entries, /organization/waywire to /organization/zzzzapp-com
Data columns (total 17 columns):
name                 34274 non-null object
homepage_url         32505 non-null object
category_list        32168 non-null object
market               32164 non-null object
funding_total_usd    34275 non-null int64
status               33414 non-null object
country_code         31431 non-null object
state_code           21621 non-null object
region               31431 non-null object
city                 31030 non-null object
funding_rounds       34275 non-null int64
founded_at           34275 non-null object
founded_month        34275 non-null object
founded_quarter      34275 non-null object
founded_year         34275 non-null float64
first_funding_at     34275 non-null object
last_funding_at      34275 non-null object
dtypes: float64(1), int64(2), object(14)
memory usage: 4.7+ MB

In [37]:
import datetime
first_funding_at = pd.to_datetime(xl.first_funding_at, dayfirst=True, errors='ignore')
print len(first_funding_at)
first_funding_at = first_funding_at[[isinstance(value, datetime.datetime) for index, value in first_funding_at.iteritems() ]]
print len(first_funding_at)


34275
34272

In [38]:
founded_at = pd.to_datetime(xl.founded_at, dayfirst=True)
founded_at = founded_at[[isinstance(value, datetime.datetime) for index, value in founded_at.iteritems()]]

In [39]:
daysdiff = first_funding_at - founded_at
daysdiff = daysdiff[pd.notnull(daysdiff)]
dayslist=[value.days for index, value in daysdiff.iteritems()]

In [40]:
plt.figure(figsize=(15,8))
graph = plt.hist(dayslist, bins=range(0, 6000, 365))



In [169]:
plt.figure(figsize=(15,8))
graph = plt.hist(dayslist, bins=[0, 30, 60, 90, 180, 365, 365+180, 2*365, 2*365 + 180, 3*365])



In [41]:
daysdiff[daysdiff<30]


Out[41]:
permalink
/organization/-qounter               -187 days
/organization/1                       -12 days
/organization/10-minutes-with           0 days
/organization/1000memories           -181 days
/organization/10bestthings              0 days
/organization/140-proof              -194 days
/organization/1calendar              -290 days
/organization/1daylater              -117 days
/organization/1daymakeover              0 days
/organization/1eq                    -250 days
/organization/1st-choice-lawn-care    -10 days
/organization/27-perry                -69 days
/organization/2code-online           -274 days
/organization/2crisk                  -91 days
/organization/2vancouver             -238 days
...
/organization/zoosk              -245 days
/organization/zopa                -59 days
/organization/zopim              -122 days
/organization/zova                -31 days
/organization/zqgame            -1186 days
/organization/zsoup                 0 days
/organization/zubie                 0 days
/organization/zuki                  0 days
/organization/zulily              -15 days
/organization/zupcat                0 days
/organization/zuzuche               0 days
/organization/zyken-nightcove       0 days
/organization/zykis                 0 days
/organization/zynga              -181 days
/organization/zzzzapp-com        -194 days
Length: 4965, dtype: timedelta64[ns]

In [42]:
daysdiff[(daysdiff<30)&(daysdiff>0)]


Out[42]:
Series([], dtype: timedelta64[ns])

More than 5000 companies have acquired funds before starting. Zynga is one of them.

Now let us read in the investments data.


In [118]:
rounds = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Rounds', index_col='funding_round_permalink')

In [44]:
rounds.info()


<class 'pandas.core.frame.DataFrame'>
Index: 82308 entries, /funding-round/cc409188fa2b63482bd9008f682c2efa to /funding-round/5c5ebcc9d85c2a5f810e319c249a36e7
Data columns (total 15 columns):
company_permalink        82308 non-null object
company_name             82305 non-null object
company_category_list    76985 non-null object
company_market           76979 non-null object
company_country_code     75889 non-null object
company_state_code       55483 non-null object
company_region           75889 non-null object
company_city             74798 non-null object
funding_round_type       82308 non-null object
funding_round_code       22496 non-null object
funded_at                82308 non-null object
funded_month             82300 non-null object
funded_quarter           82300 non-null object
funded_year              82300 non-null float64
raised_amount_usd        69883 non-null float64
dtypes: float64(2), object(13)
memory usage: 10.0+ MB

In [45]:
fund_type = rounds.funding_round_type.value_counts()

In [46]:
fund_type


Out[46]:
venture                 41191
seed                    20613
debt_financing           5593
angel                    4381
undisclosed              3727
equity_crowdfunding      2115
private_equity           1801
grant                    1454
convertible_note          691
post_ipo_equity           381
product_crowdfunding      229
post_ipo_debt              79
secondary_market           53
dtype: int64

In [49]:
fund_type.plot(kind='pie', figsize=(10,10))


Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb4183b090>

In [50]:
recent_funding = rounds[rounds.funded_year >= 2000]
funding_sum = recent_funding[['funded_quarter','raised_amount_usd']].groupby('funded_quarter').sum()

In [51]:
funding_sum.plot(kind='bar', figsize=(15, 8))


Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb39f90510>

We can see that over all the funding keeps increasing, and there is more interest in Q1, peaking at Q2 and dipping in Q3 and Q4. So what is the story about 2006-Q3?


In [52]:
q3_2006 = recent_funding[recent_funding.funded_quarter=='2006-Q3'].sort("raised_amount_usd", ascending=False)
q3_2006[['company_name', 'company_market', 'raised_amount_usd']].head()


Out[52]:
company_name company_market raised_amount_usd
funding_round_permalink
/funding-round/949ee7711c5604b77dfa2a660eb0ad89 Cardinal Health Hospitals 78795064652
/funding-round/c7f31e1a300f9e4aa8d8ba8cf93ce6ec Cortina Systems Hardware 132000000
/funding-round/fd84230c623d8a57e63217d5f1f69b1d Lumenis Health Care 120000000
/funding-round/ec2e66080a7fee8bef30b8cb30f48470 United Information Technology Co. Enterprise Software 101000000
/funding-round/4ef70909558ae7d85b9ac196d87d2c54 Solexa Biotechnology 75000000

In [53]:
venture_funding = rounds[(rounds.funded_year >= 2000) & (rounds.funding_round_type == 'venture')]
venture_sum = venture_funding[['funded_quarter','raised_amount_usd']].groupby('funded_quarter').sum()
venture_sum.plot(kind='bar', figsize=(15, 8))


Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3a149910>

In [54]:
seed_funding = rounds[(rounds.funded_year >= 2000) & ((rounds.funding_round_type == 'seed') | (rounds.funding_round_type=='angel'))]
seed_sum = seed_funding[['funded_quarter','raised_amount_usd']].groupby('funded_quarter').sum()
seed_sum.plot(kind='bar', figsize=(15, 8))


Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feb37b10090>

Which investor has invested the most money?

For this we need the next sheet of data.


In [55]:
investments = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Investments', index_col='funding_round_permalink')

To see the behaviours of investment firms, let us group the data by investment name.


In [56]:
investments.info()


<class 'pandas.core.frame.DataFrame'>
Index: 112590 entries, /funding-round/3b01561dd8c054727c9ddc0705a73f4c to /funding-round/e732b70e579f61fd973820ea9e348d7d
Data columns (total 23 columns):
company_permalink         112590 non-null object
company_name              112587 non-null object
company_category_list     108648 non-null object
company_market            108647 non-null object
company_country_code      105300 non-null object
company_state_code        77851 non-null object
company_region            105300 non-null object
company_city              103940 non-null object
investor_permalink        112528 non-null object
investor_name             112528 non-null object
investor_category_list    28906 non-null object
investor_market           28854 non-null object
investor_country_code     85082 non-null object
investor_state_code       61418 non-null object
investor_region           85082 non-null object
investor_city             84574 non-null object
funding_round_type        112590 non-null object
funding_round_code        53847 non-null object
funded_at                 112590 non-null datetime64[ns]
funded_month              112590 non-null object
funded_quarter            112590 non-null object
funded_year               112590 non-null int64
raised_amount_usd         99563 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(1), object(20)
memory usage: 20.6+ MB

In [57]:
invest_gp = investments[['investor_permalink', 'raised_amount_usd']]\
.groupby('investor_permalink').sum()

In [60]:
type(invest_gp)


Out[60]:
pandas.core.frame.DataFrame

In [62]:
invest_gp.sort('raised_amount_usd', ascending=False).head()


Out[62]:
raised_amount_usd
investor_permalink
/organization/kleiner-perkins-caufield-byers 16460723024
/organization/new-enterprise-associates 13187618119
/organization/sequoia-capital 13019932802
/organization/accel-partners 12206334596
/organization/intel-capital 10195565636

Who is the biggest investor in India?


In [70]:
investments[investments.investor_country_code == 'IND']


Out[70]:
company_permalink company_name company_category_list company_market company_country_code company_state_code company_region company_city investor_permalink investor_name ... investor_state_code investor_region investor_city funding_round_type funding_round_code funded_at funded_month funded_quarter funded_year raised_amount_usd
funding_round_permalink
/funding-round/bbdedaea55b5c2f0a4814a91c7d30ace /organization/kartrocket KartRocket |SaaS|E-Commerce| SaaS IND NaN New Delhi New Delhi /organization/5ideas-in 5ideas.in ... NaN New Delhi Gurgaon seed NaN 2013-07-23 2013-07 2013-Q3 2013 NaN
/funding-round/0c34e9aeda49a68b54b048c125c80365 /organization/vyome-biosciences Vyome Biosciences |Life Sciences|Biotechnology| Life Sciences IND NaN Delhi Delhi /organization/aarin-capital Aarin Capital ... NaN Bangalore Bangalore venture A 2012-09-12 2012-09 2012-Q3 2012 3300000
/funding-round/54202ff7c6901052c657d61584b1f9f6 /organization/invictus-oncology Invictus Oncology |Biotechnology| Biotechnology IND NaN Delhi Delhi /organization/aarin-capital Aarin Capital ... NaN Bangalore Bangalore venture A 2013-05-28 2013-05 2013-Q2 2013 1880000
/funding-round/514241eccd4edab7903b6c7d1f5f42ec /organization/tabtor Tabtor |Education| Education USA NJ Newark Kendall Park /organization/aarin-capital Aarin Capital ... NaN Bangalore Bangalore seed NaN 2013-06-12 2013-06 2013-Q2 2013 1000000
/funding-round/e56ecdeebd8bd8970e41942e28797c63 /organization/theramyt-novobiologics Theramyt Novobiologics |Biotechnology| Biotechnology NaN NaN NaN NaN /organization/aarin-capital Aarin Capital ... NaN Bangalore Bangalore venture A 2013-10-11 2013-10 2013-Q4 2013 4500000
/funding-round/339647785b60e71ab176256b8a062774 /organization/vyome-biosciences Vyome Biosciences |Life Sciences|Biotechnology| Life Sciences IND NaN Delhi Delhi /organization/aarin-capital Aarin Capital ... NaN Bangalore Bangalore venture B 2014-08-12 2014-08 2014-Q3 2014 8000000
/funding-round/ba08e700e66fa5e12384e60e8cbcc7d8 /organization/edcast-inc EdCast Inc. |All Students|Education|Portals| Education USA CA SF Bay Area Mountain View /organization/aarin-capital Aarin Capital ... NaN Bangalore Bangalore venture A 2014-09-22 2014-09 2014-Q3 2014 6000000
/funding-round/3925da8eee16114d6570c3a0985d8ad7 /organization/milk-mantra Milk Mantra |Health and Wellness| Health and Wellness IND NaN Bhubaneswar Bhubaneswar /organization/aavishkaar-venture-management-se... Aavishkaar Venture Management services ... NaN Mumbai Mumbai undisclosed NaN 2011-02-02 2011-02 2011-Q1 2011 NaN
/funding-round/c7087296aaa1e6ee33e9d174022ad444 /organization/electronic-payment-and-services Electronic Payment and Services (EPS) |Hardware + Software| Hardware + Software IND NaN Mumbai Mumbai /organization/aavishkaar-venture-management-se... Aavishkaar Venture Management services ... NaN Mumbai Mumbai venture B 2013-08-26 2013-08 2013-Q3 2013 6000000
/funding-round/cabd3c8428576ef3018e1c91812a732e /organization/electronic-payment-and-services Electronic Payment and Services (EPS) |Hardware + Software| Hardware + Software IND NaN Mumbai Mumbai /organization/aavishkaar-venture-management-se... Aavishkaar Venture Management services ... NaN Mumbai Mumbai venture NaN 2013-12-17 2013-12 2013-Q4 2013 5000000
/funding-round/2107d9f9c8568d80d7972b6992c9c14c /organization/mela-artisans Mela Artisans |Fashion| Fashion USA FL Palm Beaches Boca Raton /organization/aavishkaar-venture-management-se... Aavishkaar Venture Management services ... NaN Mumbai Mumbai venture NaN 2014-03-03 2014-03 2014-Q1 2014 3000000
/funding-round/924be083c917d8ac047241c53fdf651b /organization/milk-mantra Milk Mantra |Health and Wellness| Health and Wellness IND NaN Bhubaneswar Bhubaneswar /organization/aavishkaar-venture-management-se... Aavishkaar Venture Management services ... NaN Mumbai Mumbai venture C 2014-06-30 2014-06 2014-Q2 2014 13080000
/funding-round/3fb22f83e95fb1cabfca2cbbcdb4da48 /organization/esolar eSolar |Green|Clean Technology| Clean Technology USA CA Los Angeles Pasadena /organization/acme-group ACME Group ... NaN Mumbai Mumbai venture F 2009-01-01 2009-01 2009-Q1 2009 40000000
/funding-round/9a466a1e2602795b46eabc65b12277e0 /organization/amiigo Amiigo |Exercise|Tracking|Hardware|Technology|Fitness... Health and Wellness USA UT Salt Lake City Salt Lake City /organization/alpha-investments Alpha Investments ... NaN Mumbai Mumbai seed NaN 2013-01-16 2013-01 2013-Q1 2013 NaN
/funding-round/e4183920092065834cd3ec1e498ac49d /organization/unamia unamia |E-Commerce| E-Commerce IND NaN Bangalore Bangalore /organization/angelprime AngelPrime ... NaN Bangalore Bengaluru seed NaN 2012-10-17 2012-10 2012-Q4 2012 1200000
/funding-round/0155ffde94f5c477acd0e52ce5f87fca /organization/hackerearth HackerEarth |Software| Software IND NaN Bangalore Bangalore /organization/angelprime AngelPrime ... NaN Bangalore Bengaluru seed NaN 2014-02-24 2014-02 2014-Q1 2014 500000
/funding-round/3a4629496f195ec5035271602d485234 /organization/synup Synup |Software| Software USA DE Wilmington, Delaware Wilmington /organization/angelprime AngelPrime ... NaN Bangalore Bengaluru seed NaN 2014-08-11 2014-08 2014-Q3 2014 NaN
/funding-round/f3ca0e0715575437846695885f21349e /organization/erc-eye-care ERC Eye Care |Biotechnology| Biotechnology IND NaN IND - Other Jorhat /organization/ankur-capital Ankur Capital ... NaN Mumbai Mumbai undisclosed NaN 2013-12-18 2013-12 2013-Q4 2013 NaN
/funding-round/2e76287d0606a929e926312e68d6c61e /organization/swiftshift Swift Shift |Software| Software GBR NaN London London /organization/ankur-capital Ankur Capital ... NaN Mumbai Mumbai venture NaN 2014-05-01 2014-05 2014-Q2 2014 0
/funding-round/b1b5dbefec49800b9791f8974990e41c /organization/mydentist MyDentist |Health Care| Health Care IND NaN Mumbai Mumbai /organization/asian-healthcare-fund Asian Healthcare Fund ... NaN New Delhi New Delhi venture NaN 2013-01-01 2013-01 2013-Q1 2013 10000000
/funding-round/f744530cce618681192f6d9e334d0ef5 /organization/healthspring HealthSpring |Hospitals|Biotechnology| Hospitals USA TN Nashville Franklin /organization/asian-healthcare-fund Asian Healthcare Fund ... NaN New Delhi New Delhi venture B 2013-06-19 2013-06 2013-Q2 2013 3700000
/funding-round/ec9d6802389ed053dffb07ac9ee8acfd /organization/forus-health Forus Health |Health Care| Health Care IND NaN Bangalore Bangalore /organization/asian-healthcare-fund Asian Healthcare Fund ... NaN New Delhi New Delhi venture B 2014-01-09 2014-01 2014-Q1 2014 8400000
/funding-round/734f972917a468cbaa0fced4c60310fe /organization/omni-hospitals Omni Hospitals |Biotechnology| Biotechnology IND NaN Hyderabad Hyderabad /organization/ask-pravi ASK Pravi ... NaN Mumbai Mumbai venture A 2013-08-07 2013-08 2013-Q3 2013 9800000
/funding-round/635e75b6014396e40d84117b3c0687ff /organization/thinklink ThinkLink |Consulting| Consulting IND NaN Haryana Haryana /organization/aspada Aspada ... NaN Bangalore Bangalore venture NaN 2014-01-07 2014-01 2014-Q1 2014 1600000
/funding-round/0f9f1cd76ce19e552f771abe59c32483 /organization/neogrowth Neogrowth |Finance| Finance IND NaN Mumbai Mumbai /organization/aspada Aspada ... NaN Bangalore Bangalore venture NaN 2014-03-26 2014-03 2014-Q1 2014 1620000
/funding-round/2afd408dad65e43e9b1af3a852f8ea54 /organization/capital-float Capital Float |Financial Services| Financial Services IND NaN Bangalore Bangalore /organization/aspada Aspada ... NaN Bangalore Bangalore seed NaN 2014-06-25 2014-06 2014-Q2 2014 2000000
/funding-round/f328070e282b09cdf034cc9b941e065e /organization/xamcheck Xamcheck |Education| Education IND NaN Secunderabad Secunderabad /organization/aspada Aspada ... NaN Bangalore Bangalore venture NaN 2014-10-30 2014-10 2014-Q4 2014 1800000
/funding-round/c0b21533794fc35ff56a26e922470838 /organization/dunenetworks DuneNetworks |Semiconductors| Semiconductors USA CA SF Bay Area Sunnyvale /organization/aurum-ventures Aurum Ventures ... NaN Mumbai Mumbai venture B 2008-01-14 2008-01 2008-Q1 2008 12000000
/funding-round/5424f8d2aa768a737b010ac1024308a0 /organization/n-trig N-Trig |Hardware + Software| Hardware + Software ISR NaN Tel Aviv Kfar Saba /organization/aurum-ventures Aurum Ventures ... NaN Mumbai Mumbai venture C 2008-02-20 2008-02 2008-Q1 2008 28000000
/funding-round/8ece8f4668dd0bdc869a6eb0a197df84 /organization/n-trig N-Trig |Hardware + Software| Hardware + Software ISR NaN Tel Aviv Kfar Saba /organization/aurum-ventures Aurum Ventures ... NaN Mumbai Mumbai venture D 2009-01-12 2009-01 2009-Q1 2009 24000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
/funding-round/fd88bc8d12635d46e25106f6b2c1cbdf /organization/si2-microsystems Si2 Microsystems |Semiconductors| Semiconductors USA CA SF Bay Area San Jose /organization/ventureast VenturEast ... NaN Chennai Chennai venture NaN 2009-02-04 2009-02 2009-Q1 2009 12700000
/funding-round/0391607b40e8d8d1f3e4cf36e06fc6db /organization/desicrew-solutions DesiCrew Solutions |Education| Education IND NaN Chennai Chennai /organization/ventureast VenturEast ... NaN Chennai Chennai venture B 2012-01-01 2012-01 2012-Q1 2012 1200000
/funding-round/441925078ed4b78a3fe72d98744bd19c /organization/seclore Seclore |Software| Software IND NaN Mumbai Mumbai /organization/ventureast VenturEast ... NaN Chennai Chennai venture NaN 2013-04-24 2013-04 2013-Q2 2013 6000000
/funding-round/c03336b81d71755274c48d7a3719a612 /organization/protea-medical Protea Medical |Health Care| Health Care USA AZ Phoenix Chandler /organization/ventureast VenturEast ... NaN Chennai Chennai venture NaN 2013-12-01 2013-12 2013-Q4 2013 8000000
/funding-round/c608fef63d3a5f68227b386c0507a293 /organization/portea-medical Portea Medical |Healthcare Services|Medical|Health Care| Health Care IND NaN Bangalore Bangalore /organization/ventureast VenturEast ... NaN Chennai Chennai private_equity NaN 2013-12-02 2013-12 2013-Q4 2013 8000000
/funding-round/7e61b0c22631b580112180a94c497566 /organization/pinnacle-engines Pinnacle Engines |Clean Technology| Clean Technology USA CA SF Bay Area San Carlos /organization/ventureast VenturEast ... NaN Chennai Chennai venture C 2014-05-08 2014-05 2014-Q2 2014 NaN
/funding-round/ce3db57eb6e70ab6cd089c09082fbbe7 /organization/polygenta-technologies Polygenta Technologies |Textiles| Textiles IND NaN Mumbai Mumbai /organization/ventureast VenturEast ... NaN Chennai Chennai venture NaN 2014-07-21 2014-07 2014-Q3 2014 3600000
/funding-round/365b84dae9d6e2264f425f91ff9b9454 /organization/seclore Seclore |Software| Software IND NaN Mumbai Mumbai /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2009-06-08 2009-06 2009-Q2 2009 NaN
/funding-round/61a294d317c2f74cdabc4f8b96527799 /organization/icrederity iCrederity |Security|Identity Management|Nonprofits| Nonprofits USA NY New York City New York /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2010-04-01 2010-04 2010-Q2 2010 NaN
/funding-round/064b2a75f8f0259805d14d5d151af925 /organization/inopen InOpen |Education| Education IND NaN Mumbai Mumbai /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2011-08-01 2011-08 2011-Q3 2011 NaN
/funding-round/aeb62fcde22725715d03097e2ea7ff52 /organization/smartrx SmartRx |Health Care Information Technology|Software| Software IND NaN Bangalore Bangalore /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2013-04-01 2013-04 2013-Q2 2013 NaN
/funding-round/fe12e7a0fb551e0e756b10dabd509649 /organization/little-eye-labs Little Eye Labs |Testing|Android|Mobile| Testing IND NaN Bangalore Bangalore /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2013-06-01 2013-06 2013-Q2 2013 NaN
/funding-round/c21c550c4c8ad8f7660a1d9de79ce41b /organization/stylecraze-beauty-care-pvt-ltd StyleCraze Beauty Care Pvt Ltd |Health and Wellness|Beauty| Health and Wellness IND NaN Hyderabad Hyderabad /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2013-08-03 2013-08 2013-Q3 2013 NaN
/funding-round/a9f44c174f683aaa7e14421c0c10f117 /organization/onebreath OneBreath |Consulting| Consulting IND NaN Bangalore Bangalore /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad venture NaN 2014-01-28 2014-01 2014-Q1 2014 3000000
/funding-round/563f4fa3e428b5b4738a0d0b39ab085e /organization/mediangels MediAngels |Hospitality| Hospitality IND NaN Mumbai Mumbai /organization/ventureast-tenet-fund Ventureast Tenet Fund ... NaN Hyderabad Hyderabad seed NaN 2014-10-10 2014-10 2014-Q4 2014 1000000
/funding-round/c392231129fde459a31f9b7ee0767735 /organization/interview-master Interview Master |Recruiting|Human Resources|Curated Web| Human Resources IND NaN Bangalore Bangalore /organization/venturenursery VentureNursery ... NaN Mumbai Mumbai seed NaN 2013-06-14 2013-06 2013-Q2 2013 NaN
/funding-round/f923a3256cbf209805ad900a85876234 /organization/klip-in Klip.in |E-Commerce| E-Commerce USA CA SF Bay Area Palo Alto /organization/venturenursery VentureNursery ... NaN Mumbai Mumbai seed NaN 2013-09-06 2013-09 2013-Q3 2013 NaN
/funding-round/6953bc5b4d7302797e060c96fc1b1f1b /organization/hotelsaroundyou HotelsAroundYou |Hotels| Hotels IND NaN Mumbai Mumbai /organization/venturenursery VentureNursery ... NaN Mumbai Mumbai seed NaN 2014-10-27 2014-10 2014-Q4 2014 NaN
/funding-round/1fbdac141bb229304baced175b700a1d /organization/india-orders India Orders |E-Commerce| E-Commerce NaN NaN NaN NaN /organization/vijay-group Vijay Group ... NaN Thane Thane undisclosed NaN 2013-07-09 2013-07 2013-Q3 2013 NaN
/funding-round/e3c18bc577de12897186ea2668ea3842 /organization/mobileveda MobileVeda |Mobile| Mobile IND NaN Vellore Vellore /organization/vit-technology-business-incubator VIT Technology Business Incubator ... NaN Vellore Vellore seed NaN 2007-04-01 2007-04 2007-Q2 2007 9000
/funding-round/76a232e1ac21fd9912dc6ad7c04b5970 /organization/abaltat Tunepresto |Video Editing|Web Development|Music| Music IRL NaN Galway Galway /organization/wdc WDC ... NaN Kolkata Kolkata venture A 2007-01-01 2007-01 2007-Q1 2007 263400
/funding-round/f9c8271bb4d88eab399485796ac49305 /organization/nooked nooked |Shopping|Web Development|Web Tools|E-Commerce| Web Development IRL NaN Sligo Sligo /organization/wdc WDC ... NaN Kolkata Kolkata seed NaN 2008-08-01 2008-08 2008-Q3 2008 1557400
/funding-round/04f480ddf19f264fbfb2a99343eb3d9c /organization/crude-area Crude Area |Product Design|Art|Design| Product Design IND NaN Mumbai Mumbai /organization/webchutney Webchutney ... NaN New Delhi New Delhi angel NaN 2013-07-29 2013-07 2013-Q3 2013 NaN
/funding-round/7cb5c9070cdcb535482b209796698fc2 /organization/opera-solutions Opera Solutions |Business Intelligence|Predictive Analytics|An... Predictive Analytics USA NJ Newark Jersey City /organization/wipro Wipro Technologies ... NaN Bangalore Bangalore private_equity NaN 2013-05-09 2013-05 2013-Q2 2013 30000000
/funding-round/463de5ee1fc9c25cf45e88d2754b203a /organization/opera-solutions Opera Solutions |Business Intelligence|Predictive Analytics|An... Predictive Analytics USA NJ Newark Jersey City /organization/wipro Wipro Technologies ... NaN Bangalore Bangalore venture G 2014-10-23 2014-10 2014-Q4 2014 8199999
/funding-round/64e0d2bafe167a51be69ac678fa78752 /organization/axeda Axeda |Enterprise Software|M2M|Software| Enterprise Software USA MA New Bedford Foxboro /organization/wipro-2 Wipro ... NaN Bangalore Bangalore private_equity NaN 2013-08-20 2013-08 2013-Q3 2013 12000000
/funding-round/7c47dd82bc76607ea8cf60cb6a428f38 /organization/uniphore Uniphore |Software| Software IND NaN IND - Other Taramani /organization/yournest-angel-fund YourNest Angel Fund ... NaN New Delhi Gurgaon angel NaN 2014-03-13 2014-03 2014-Q1 2014 NaN
/funding-round/5abb4d1afe518ab7ff453d362f647580 /organization/antuit Antuit |Analytics| Analytics SGP NaN Singapore Singapore /organization/zodius-capital Zodius Capital ... NaN Mumbai Mumbai venture NaN 2013-08-13 2013-08 2013-Q3 2013 3859800
/funding-round/79d0dc9121121b1147932b1401c6215a /organization/culture-machine Culture Machine |Digital Media|Entertainment|Media| Entertainment IND NaN Mumbai Mumbai /organization/zodius-capital Zodius Capital ... NaN Mumbai Mumbai venture A 2014-09-01 2014-09 2014-Q3 2014 3500000
/funding-round/bc6ec5767c789dec7d704a5a9148db80 /organization/bigbasket-com Bigbasket.com |Online Shopping|Groceries|E-Commerce| E-Commerce IND NaN Bangalore Bangalore /organization/zodius-capital Zodius Capital ... NaN Mumbai Mumbai venture B 2014-09-12 2014-09 2014-Q3 2014 32799999

462 rows × 23 columns


In [76]:
invest_ind = investments[investments.company_country_code=='IND']\
            [['investor_permalink', 'raised_amount_usd']]\
            .groupby('investor_permalink').sum()

In [78]:
len(invest_ind)


Out[78]:
628

In [79]:
invest_us.sort('raised_amount_usd', ascending=False).head(10)


Out[79]:
raised_amount_usd
investor_permalink
/organization/tiger-global 2204850000
/organization/naspers 1560000000
/organization/accel-partners 1540325000
/organization/morgan-stanley 1309500000
/organization/sofina 1243000000
/organization/dst-global 1210000000
/organization/digital-sky-technologies-fo 1210000000
/organization/government-of-singapore-investment-corporation-gic 1000000000
/organization/sequoia-capital 628550000
/organization/softbank-internet-and-media-(simi) 627000000

In [92]:
cross_country_investing = investments[(investments.company_country_code != investments.investor_country_code) & pd.notnull(investments.company_country_code) & pd.notnull(investments.investor_country_code)]
cross_country_investing = cross_country_investing[['investor_country_code', 'company_country_code', 'company_permalink']]
cross_country_investing.info()


<class 'pandas.core.frame.DataFrame'>
Index: 16376 entries, /funding-round/33fbea0c06971aac9297ff2f9e56512f to /funding-round/655045befcf3c5b2805a0fbd73300e78
Data columns (total 3 columns):
investor_country_code    16376 non-null object
company_country_code     16376 non-null object
company_permalink        16376 non-null object
dtypes: object(3)
memory usage: 511.8+ KB

In [94]:
invest_count = cross_country_investing.groupby(['investor_country_code','company_country_code']).count()

In [98]:
invest_count.ix['IND']


Out[98]:
company_permalink
company_country_code
ARE 1
AUS 1
BEL 3
CAN 1
CYM 1
GBR 3
IRL 2
ISR 7
SGP 4
UGA 1
USA 97

In [99]:
invest_count.ix['USA']


Out[99]:
company_permalink
company_country_code
ARE 14
ARG 40
AUS 65
AUT 21
AZE 1
BEL 22
BGD 14
BGR 4
BHR 1
BMU 3
BRA 113
BRN 1
CAN 634
CHE 61
CHL 6
CHN 803
COL 3
CRI 1
CYM 13
CYP 8
CZE 6
DEU 213
DNK 23
ECU 1
EGY 3
ESP 92
EST 13
FIN 32
FRA 127
GBR 904
... ...
MAR 1
MEX 22
MKD 1
MOZ 1
MYS 28
NGA 10
NLD 69
NOR 18
NZL 16
PAK 2
PAN 1
PER 2
PHL 8
POL 1
PRT 5
ROU 1
RUS 73
SGP 131
SLV 1
SRB 3
SWE 78
THA 12
TUR 28
TWN 11
TZA 7
UGA 1
UKR 6
URY 1
VNM 3
ZAF 5

77 rows × 1 columns


In [101]:
invest_count.ix[('USA','IND')]


Out[101]:
company_permalink    583
Name: (USA, IND), dtype: int64

Which market gets the most investment dollars?


In [111]:
investments[['raised_amount_usd','company_market']]\
           .groupby('company_market')\
           .sum()\
           .sort('raised_amount_usd',ascending=False)


Out[111]:
raised_amount_usd
company_market
Biotechnology 1.475935e+11
Software 8.722225e+10
Clean Technology 6.958990e+10
Health Care 5.994843e+10
E-Commerce 4.768432e+10
Enterprise Software 4.241772e+10
Mobile 4.203615e+10
Internet 3.739474e+10
Advertising 3.469002e+10
Semiconductors 3.383968e+10
Finance 2.505558e+10
Curated Web 2.448018e+10
Hardware + Software 2.445069e+10
Technology 2.209097e+10
Web Hosting 2.034072e+10
Games 1.906473e+10
Analytics 1.835564e+10
Transportation 1.678087e+10
Security 1.633595e+10
Health and Wellness 1.603438e+10
Online Shopping 1.408079e+10
Manufacturing 1.222042e+10
Travel 1.087952e+10
Education 9.417769e+09
Social Media 8.986376e+09
Fashion 8.501845e+09
Video 7.549672e+09
Search 7.470412e+09
Automotive 7.043365e+09
SaaS 6.975657e+09
... ...
Carbon 5.184200e+04
Email Newsletters 5.000000e+04
Video Processing 4.500000e+04
Comics 4.000000e+04
Unmanned Air Systems 4.000000e+04
Ventures for Good 4.000000e+04
Intellectual Asset Management 3.999400e+04
Veterinary 3.462300e+04
CAD 2.800000e+04
Soccer 2.500000e+04
Home Owners 2.200000e+04
Self Development 2.000000e+04
Families 2.000000e+04
Direct Sales 1.800000e+04
Advanced Materials NaN
BPO Services NaN
Consumer Lending NaN
Debt Collecting NaN
Estimation and Quoting NaN
Farmers Market NaN
Green Building NaN
Independent Music Labels NaN
Lasers NaN
Lotteries NaN
Physical Security NaN
Social News NaN
Software Compliance NaN
Synthetic Biology NaN
Technical Continuing Education NaN
Weird Hardware NaN

700 rows × 1 columns

Which market gets the most investment dollars in India?


In [113]:
investments[investments.company_country_code == 'IND']\
    [['raised_amount_usd','company_market']]\
    .groupby('company_market')\
    .sum()\
    .sort('raised_amount_usd',ascending=False).head(20)


Out[113]:
raised_amount_usd
company_market
Online Shopping 11305500000
E-Commerce 4018555280
Clean Technology 1772017000
Curated Web 1474882534
Mobile 1258024000
Advertising 897400000
Transportation 724460000
Software 692046500
Finance 656182000
Biotechnology 459790000
Home Automation 364000000
Real Estate 343600000
Hardware 302600000
Education 282740786
Security 217040000
Jewelry 213000000
Analytics 211550000
Health Care 206420000
Hardware + Software 194150000
Games 188040000

In [112]:
investments[(investments.company_country_code == 'IND') | (investments.funded_year >=2013)]\
    [['raised_amount_usd','company_market']]\
    .groupby('company_market')\
    .sum()\
    .sort('raised_amount_usd',ascending=False).head(20)


Out[112]:
raised_amount_usd
company_market
Biotechnology 45784700319
Software 25600534587
E-Commerce 22865083457
Enterprise Software 16630107468
Clean Technology 16468227505
Health Care 16233216253
Finance 15152577885
Transportation 15053838568
Curated Web 13318647294
Online Shopping 12961270701
Mobile 10001123275
Health and Wellness 9470712226
Analytics 7491131425
Education 6489969851
Advertising 6148014461
Hardware + Software 6102775439
Oil and Gas 6059186366
Travel 5244774574
Security 5056600726
Technology 4848314582

In [114]:
investments.ix[1]


Out[114]:
company_permalink                  /organization/advercar
company_name                                     AdverCar
company_category_list                  |Cars|Advertising|
company_market                                Advertising
company_country_code                                  USA
company_state_code                                     CA
company_region                                SF Bay Area
company_city                                San Francisco
investor_permalink        /organization/1-800-flowers-com
investor_name                           1-800-FLOWERS.COM
investor_category_list                       |E-Commerce|
investor_market                                E-Commerce
investor_country_code                                 USA
investor_state_code                                    NY
investor_region                             New York City
investor_city                                    New York
funding_round_type                                   seed
funding_round_code                                    NaN
funded_at                             2012-10-30 00:00:00
funded_month                                      2012-10
funded_quarter                                    2012-Q4
funded_year                                          2012
raised_amount_usd                                 2000000
Name: /funding-round/0347630bdf4b26a66ebc20666d3b2069, dtype: object

Who are the biggest acquirers?

We have to load yet another excel sheet to answer this question.


In [117]:
acquisitions = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Acquisitions', index_col='company_permalink')

In [119]:
acquisitions.info()


<class 'pandas.core.frame.DataFrame'>
Index: 12851 entries, /organization/waywire to /organization/zytex-group
Data columns (total 21 columns):
company_name              12851 non-null object
company_category_list     9243 non-null object
company_market            9242 non-null object
company_country_code      9599 non-null object
company_state_code        7373 non-null object
company_region            9600 non-null object
company_city              9401 non-null object
acquirer_permalink        12851 non-null object
acquirer_name             12851 non-null object
acquirer_category_list    11160 non-null object
acquirer_market           11160 non-null object
acquirer_country_code     11995 non-null object
acquirer_state_code       9612 non-null object
acquirer_region           11995 non-null object
acquirer_city             11878 non-null object
acquired_at               12851 non-null datetime64[ns]
acquired_month            12850 non-null object
acquired_quarter          12850 non-null object
acquired_year             12850 non-null float64
price_amount              3642 non-null float64
price_currency_code       12847 non-null object
dtypes: datetime64[ns](1), float64(2), object(18)
memory usage: 2.2+ MB

In [120]:
acquisitions = acquisitions[acquisitions.acquired_year>=2000]

In [127]:
acquisitions[['acquirer_permalink','company_name']]\
             .groupby("acquirer_permalink")\
             .count()\
             .sort('company_name',ascending=False)


Out[127]:
company_name
acquirer_permalink
/organization/google 164
/organization/cisco 123
/organization/ibm 121
/organization/microsoft 114
/organization/yahoo 101
/organization/oracle 90
/organization/emc 60
/organization/hewlett-packard 60
/organization/facebook 49
/organization/aol 48
/organization/apple 48
/organization/intel 47
/organization/ebay 45
/organization/amazon 41
/organization/twitter 38
/organization/groupon 33
/organization/nokia 30
/organization/salesforce 30
/organization/adobe-systems 28
/organization/dell 28
/organization/zayo-group 27
/organization/electronicarts 26
/organization/zynga 25
/organization/autodesk 25
/organization/iac 24
/organization/blackberry 24
/organization/qualcomm 23
/organization/thoma-bravo 22
/organization/intuit 20
/organization/homeaway 19
... ...
/organization/hansa-medical 1
/organization/hanon-mckendry 1
/organization/hanlon-investment-management 1
/organization/hangzhou-tigermed-technology-co-ltd 1
/organization/handelsbanken 1
/organization/hanco 1
/organization/halo-healthcare 1
/organization/halma 1
/organization/harrison-gypsum 1
/organization/harvest-power 1
/organization/healthgate-data-corp 1
/organization/harvestmark 1
/organization/healthcare-solutions 1
/organization/healthcare-of-today 1
/organization/healthagen 1
/organization/health-insurance-innovations 1
/organization/health-grades 1
/organization/health-club-media-network 1
/organization/health-care-reit 1
/organization/headstrong 1
/organization/hdfc-bank 1
/organization/hcp-company 1
/organization/hcl 1
/organization/hca 1
/organization/haymarket 1
/organization/hawker-beechcraft 1
/organization/havok 1
/organization/havas-discovery 1
/organization/hasbro-inc 1
/organization/zyraz-technology 1

6485 rows × 1 columns


In [128]:
investments.info()


<class 'pandas.core.frame.DataFrame'>
Index: 112590 entries, /funding-round/3b01561dd8c054727c9ddc0705a73f4c to /funding-round/e732b70e579f61fd973820ea9e348d7d
Data columns (total 23 columns):
company_permalink         112590 non-null object
company_name              112587 non-null object
company_category_list     108648 non-null object
company_market            108647 non-null object
company_country_code      105300 non-null object
company_state_code        77851 non-null object
company_region            105300 non-null object
company_city              103940 non-null object
investor_permalink        112528 non-null object
investor_name             112528 non-null object
investor_category_list    28906 non-null object
investor_market           28854 non-null object
investor_country_code     85082 non-null object
investor_state_code       61418 non-null object
investor_region           85082 non-null object
investor_city             84574 non-null object
funding_round_type        112590 non-null object
funding_round_code        53847 non-null object
funded_at                 112590 non-null datetime64[ns]
funded_month              112590 non-null object
funded_quarter            112590 non-null object
funded_year               112590 non-null int64
raised_amount_usd         99563 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(1), object(20)
memory usage: 20.6+ MB

In [132]:
yearly_invest = investments[investments.funded_year>=2000][['funded_year','company_market']].groupby('funded_year')

In [142]:
index = []
from collections import defaultdict
plotpoints = defaultdict(dict)
for year, group in yearly_invest:
    index.append(year)
    rank = 1
    for key, value in  group.company_market.value_counts().head(10).iteritems():
        plotpoints[key][year] = rank
        rank = rank + 1

In [144]:
with plt.style.context('fivethirtyeight'):
    plt.figure(figsize=(15,8))
    plt.gca().invert_yaxis()
    for market,points in plotpoints.iteritems():
        values =[points.get(year, 11) for year in index]
        plt.plot(index, values)
    #plt.plot(x, np.sin(x) + x + np.random.randn(50))
    #plt.plot(x, np.sin(x) + 0.5 * x + np.random.randn(50))
    #plt.plot(x, np.sin(x) + 2 * x + np.random.randn(50))



In [ ]: