Summary Analysis of the 2017 GitHub Open Source Survey

By R. Stuart Geiger (@staeiou), Berkeley Institute for Data Science

Overview

This notebook analyzes the 2017 Open Source Survey, conducted by staff at GitHub, Inc. and other collaborators (see https://opensourcesurvey.org/2017 and https://github.com/github/open-source-survey). The survey was run in 2017, asking over 50 questions on a variety of topics. The survey's designers explain the motivation, design, and distribution of the survey:

In collaboration with researchers from academia, industry, and the community, GitHub designed a survey to gather high quality and novel data on open source software development practices and communities. We collected responses from 5,500 randomly sampled respondents sourced from over 3,800 open source repositories on GitHub.com, and over 500 responses from a non-random sample of communities that work on other platforms. The results are an open data set about the attitudes, experiences, and backgrounds of those who use, build, and maintain open source software."

Purpose and goal

The GitHub survey team presented analyses of some questions when releasing the survey, but there were many more questions asked that are relevant to researchers and community members. This report is an exploratory analysis of all questions asked in the survey, providing a basic summary of the responses to each question. This report presents and plots summary statistics -- mostly frequency counts, proportions, then a frequency or proportion bar graph -- of all questions asked in the survey. Most questions are presented individually, with panel questions grouped together as appropriate. There are no correlations, regressions, or descriptive breakouts between subgroups. Likert-style questions (e.g. Strongly agree <-> strongly disagree) have not been recoded to numerical, scalar values. There are no discussions or interpretations of results. This is left for future work.

The purpose of this notebook is to facilitate future research on this dataset by giving an overview of the kinds of questions asked in the survey, as well as serve as the basis for a PDF report, published on SocArXiv and OSF at https://osf.io/preprints/socarxiv/qps53/. The notebook is public on GitHub at https://github.com/staeiou/github-survey-analysis and others are encouraged to extend it as they see fit.



In [1]:

    
!pip install pandas seaborn









    



Requirement already satisfied: pandas in /home/staeiou/conda/lib/python3.5/site-packages
Requirement already satisfied: seaborn in /home/staeiou/conda/lib/python3.5/site-packages
Requirement already satisfied: python-dateutil>=2 in /home/staeiou/conda/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: pytz>=2011k in /home/staeiou/conda/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: numpy>=1.7.0 in /home/staeiou/conda/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: six>=1.5 in /home/staeiou/conda/lib/python3.5/site-packages (from python-dateutil>=2->pandas)



In [2]:

    
import pandas as pd
import matplotlib, matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

%matplotlib inline
pd.options.display.float_format = '{:.2f}%'.format # add % to all floats, all floats here are percentages



In [3]:

    
## For making pretty tables when nbconverting to latex

pd.set_option('display.notebook_repr_html', True)

def _repr_latex_(self):
    return "\centering{%s}" % self.to_latex()

pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame

Download and unzip data



In [4]:

    
!unzip -o data_for_public_release.zip









    



Archive:  data_for_public_release.zip
   creating: data_for_public_release/
  inflating: data_for_public_release/negative_incidents.csv  
  inflating: __MACOSX/data_for_public_release/._negative_incidents.csv  
  inflating: data_for_public_release/notes.txt  
  inflating: data_for_public_release/questionnaire.txt  
  inflating: __MACOSX/data_for_public_release/._questionnaire.txt  
  inflating: data_for_public_release/README.txt  
  inflating: __MACOSX/data_for_public_release/._README.txt  
  inflating: data_for_public_release/survey_data.csv  
  inflating: __MACOSX/data_for_public_release/._survey_data.csv



In [5]:

    
!ls data_for_public_release/









    



negative_incidents.csv	questionnaire.txt  survey_data.csv
notes.txt		README.txt

Data processing

Main dataset

Load main dataset into pandas



In [6]:

    
pd.options.display.max_rows = 500



In [7]:

    
survey_df = pd.read_csv("data_for_public_release/survey_data.csv")



In [8]:

    
print("survey_data.csv length:", len(survey_df))









    



survey_data.csv length: 6029



In [9]:

    
survey_complete_df = survey_df.query("STATUS == 'Complete'")
print("survey_data.csv completed responses:", len(survey_complete_df))









    



survey_data.csv completed responses: 3746

Explore the main dataset with some sample responses



In [10]:

    
survey_complete_df[0:3].transpose()









    Out[10]:







  
    
      
      3
      4
      6
    
  
  
    
      RESPONSE.ID
      48
      49
      51
    
    
      DATE.SUBMITTED
      3/21/17 15:42
      3/21/17 15:38
      3/21/17 15:41
    
    
      STATUS
      Complete
      Complete
      Complete
    
    
      PARTICIPATION.TYPE.FOLLOW
      1
      1
      1
    
    
      PARTICIPATION.TYPE.USE.APPLICATIONS
      1
      1
      1
    
    
      PARTICIPATION.TYPE.USE.DEPENDENCIES
      1
      1
      1
    
    
      PARTICIPATION.TYPE.CONTRIBUTE
      1
      1
      0
    
    
      PARTICIPATION.TYPE.OTHER
      0
      0
      0
    
    
      CONTRIBUTOR.TYPE.CONTRIBUTE.CODE
      Frequently
      Occasionally
      NaN
    
    
      CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS
      Rarely
      Rarely
      NaN
    
    
      CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE
      Frequently
      Rarely
      NaN
    
    
      CONTRIBUTOR.TYPE.FILE.BUGS
      Frequently
      Frequently
      NaN
    
    
      CONTRIBUTOR.TYPE.FEATURE.REQUESTS
      Frequently
      Frequently
      NaN
    
    
      CONTRIBUTOR.TYPE.COMMUNITY.ADMIN
      Never
      Occasionally
      NaN
    
    
      EMPLOYMENT.STATUS
      Employed full time
      Full time student
      Employed full time
    
    
      PROFESSIONAL.SOFTWARE
      Frequently
      NaN
      Frequently
    
    
      FUTURE.CONTRIBUTION.INTEREST
      Very interested
      Very interested
      Very interested
    
    
      FUTURE.CONTRIBUTION.LIKELIHOOD
      Very likely
      Very likely
      Somewhat unlikely
    
    
      OSS.USER.PRIORITIES.LICENSE
      Very important to have
      Very important to have
      Very important to have
    
    
      OSS.USER.PRIORITIES.CODE.OF.CONDUCT
      Somewhat important not to have
      Somewhat important to have
      Not important either way
    
    
      OSS.USER.PRIORITIES.CONTRIBUTING.GUIDE
      Somewhat important to have
      Very important to have
      Somewhat important to have
    
    
      OSS.USER.PRIORITIES.CLA
      Not important either way
      Very important to have
      Don't know what this is
    
    
      OSS.USER.PRIORITIES.ACTIVE.DEVELOPMENT
      Somewhat important to have
      Very important to have
      Very important to have
    
    
      OSS.USER.PRIORITIES.RESPONSIVE.MAINTAINERS
      Somewhat important to have
      Very important to have
      Very important to have
    
    
      OSS.USER.PRIORITIES.WELCOMING.COMMUNITY
      Very important to have
      Very important to have
      Somewhat important to have
    
    
      OSS.USER.PRIORITIES.WIDESPREAD.USE
      Somewhat important to have
      Not important either way
      Somewhat important to have
    
    
      OSS.CONTRIBUTOR.PRIORITIES.LICENSE
      Not important either way
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.CODE.OF.CONDUCT
      Somewhat important not to have
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.CONTRIBUTING.GUIDE
      Not important either way
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.CLA
      Not important either way
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.ACTIVE.DEVELOPMENT
      Somewhat important to have
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.RESPONSIVE.MAINTAINERS
      Somewhat important to have
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.WELCOMING.COMMUNITY
      Somewhat important to have
      NaN
      NaN
    
    
      OSS.CONTRIBUTOR.PRIORITIES.WIDESPREAD.USE
      Somewhat important to have
      NaN
      NaN
    
    
      SEEK.OPEN.SOURCE
      Sometimes
      Always
      Always
    
    
      OSS.UX
      Generally easier to use
      About the same
      Generally easier to use
    
    
      OSS.SECURITY
      Generally more secure
      Generally more secure
      About the same
    
    
      OSS.STABILITY
      About the same
      Generally less stable
      About the same
    
    
      INTERNAL.EFFICACY
      Strongly agree
      Strongly agree
      Strongly agree
    
    
      EXTERNAL.EFFICACY
      Strongly agree
      Strongly agree
      Neither agree nor disagree
    
    
      OSS.IDENTIFICATION
      Neither agree nor disagree
      Strongly agree
      Neither agree nor disagree
    
    
      USER.VALUES.STABILITY
      Moderately important
      Extremely important
      Extremely important
    
    
      USER.VALUES.INNOVATION
      Not at all important
      Very important
      Moderately important
    
    
      USER.VALUES.REPLICABILITY
      Very important
      Very important
      Moderately important
    
    
      USER.VALUES.COMPATIBILITY
      Very important
      Very important
      Extremely important
    
    
      USER.VALUES.SECURITY
      Very important
      Very important
      Extremely important
    
    
      USER.VALUES.COST
      Very important
      Not at all important
      Very important
    
    
      USER.VALUES.TRANSPARENCY
      Very important
      Extremely important
      Extremely important
    
    
      USER.VALUES.USER.EXPERIENCE
      Extremely important
      Moderately important
      Very important
    
    
      USER.VALUES.CUSTOMIZABILITY
      Extremely important
      Very important
      Extremely important
    
    
      USER.VALUES.SUPPORT
      Slightly important
      Moderately important
      Not at all important
    
    
      USER.VALUES.TRUSTED.PRODUCER
      Very important
      Slightly important
      Moderately important
    
    
      TRANSPARENCY.PRIVACY.BELIEFS
      People should be able to contribute code witho...
      People should be able to contribute code witho...
      People should be able to contribute code witho...
    
    
      INFO.AVAILABILITY
      A lot of information about me
      A lot of information about me
      A little information about me
    
    
      INFO.JOB
      Yes
      No
      No
    
    
      TRANSPARENCY.PRIVACY.PRACTICES.GENERAL
      I include my real name.
      I include my real name.
      I don't publish this kind of content online.
    
    
      TRANSPARENCY.PRIVACY.PRACTICES.OSS
      I include my real name.
      I include my real name.
      NaN
    
    
      RECEIVED.HELP
      Yes
      Yes
      Yes
    
    
      FIND.HELPER
      Other - Please describe
      I asked for help in a public forum (e.g. in a ...
      I asked a specific person for help.
    
    
      HELPER.PRIOR.RELATIONSHIP
      We knew each other well.
      Total strangers, I didn't know of them previou...
      We knew each other well.
    
    
      RECEIVED.HELP.TYPE
      Writing code or otherwise implementing ideas.
      Installing or using an application.
      Installing or using an application.
    
    
      PROVIDED.HELP
      Yes
      Yes
      Yes
    
    
      FIND.HELPEE
      I reached out to them to offer unsolicited help.
      They asked for help in a public forum (e.g. in...
      They asked me directly for help.
    
    
      HELPEE.PRIOR.RELATIONSHIP
      Total strangers, I didn't know of them previou...
      Total strangers, I didn't know of them previou...
      We knew each other well.
    
    
      PROVIDED.HELP.TYPE
      Writing code or otherwise implementing ideas.
      Installing or using an application.
      Installing or using an application.
    
    
      DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE
      Yes
      Yes
      Yes
    
    
      DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION
      Yes
      No
      No
    
    
      DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE
      Yes
      Yes
      Yes
    
    
      DISCOURAGING.BEHAVIOR.BAD.DOCS
      Yes
      Yes
      Yes
    
    
      DISCOURAGING.BEHAVIOR.CONFLICT
      Yes
      Yes
      No
    
    
      DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE
      No
      No
      No
    
    
      OSS.AS.JOB
      Yes, directly-  some or all of my work duties ...
      NaN
      NaN
    
    
      OSS.AT.WORK
      Frequently
      NaN
      Frequently
    
    
      OSS.IP.POLICY
      I am free to contribute without asking for per...
      NaN
      I'm not sure.
    
    
      EMPLOYER.POLICY.APPLICATIONS
      Use of open source applications is acceptable ...
      NaN
      Use of open source applications is encouraged.
    
    
      EMPLOYER.POLICY.DEPENDENCIES
      Use of open source dependencies is acceptable ...
      NaN
      Use of open source dependencies is encouraged.
    
    
      OSS.HIRING
      Very important
      NaN
      NaN
    
    
      IMMIGRATION
      No, I live in the country where I was born.
      No, I live in the country where I was born.
      Yes, and I intend to stay permanently.
    
    
      MINORITY.HOMECOUNTRY
      NaN
      NaN
      No
    
    
      MINORITY.CURRENT.COUNTRY
      No
      No
      No
    
    
      GENDER
      Man
      Man
      Man
    
    
      TRANSGENDER.IDENTITY
      No
      No
      No
    
    
      SEXUAL.ORIENTATION
      No
      Yes
      No
    
    
      WRITTEN.ENGLISH
      Very well
      Very well
      Very well
    
    
      AGE
      35 to 44 years
      17 or younger
      35 to 44 years
    
    
      FORMAL.EDUCATION
      Bachelor's degree
      Secondary (high) school graduate or equivalent
      Vocational/trade program or apprenticeship
    
    
      PARENTS.FORMAL.EDUCATION
      Bachelor's degree
      Master's degree
      Bachelor's degree
    
    
      AGE.AT.FIRST.COMPUTER.INTERNET
      13 - 17 years old
      Younger than 13 years old
      13 - 17 years old
    
    
      LOCATION.OF.FIRST.COMPUTER.INTERNET
      At home (belonging to me or a family member)
      At home (belonging to me or a family member)
      At home (belonging to me or a family member)
    
    
      PARTICIPATION.TYPE.ANY.REPONSE
      1
      1
      1
    
    
      POPULATION
      github
      github
      github
    
    
      OFF.SITE.ID
      NaN
      NaN
      NaN
    
    
      TRANSLATED
      0
      0
      0

Create lists of variables for bulk analysis



In [11]:

    
participation_type_vars = ['PARTICIPATION.TYPE.FOLLOW',
       'PARTICIPATION.TYPE.USE.APPLICATIONS',
       'PARTICIPATION.TYPE.USE.DEPENDENCIES', 'PARTICIPATION.TYPE.CONTRIBUTE',
       'PARTICIPATION.TYPE.OTHER']

contrib_type_vars = ['CONTRIBUTOR.TYPE.CONTRIBUTE.CODE',
       'CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS',
       'CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE', 'CONTRIBUTOR.TYPE.FILE.BUGS',
       'CONTRIBUTOR.TYPE.FEATURE.REQUESTS', 'CONTRIBUTOR.TYPE.COMMUNITY.ADMIN']

contrib_other_vars = ['EMPLOYMENT.STATUS', 'PROFESSIONAL.SOFTWARE',
       'FUTURE.CONTRIBUTION.INTEREST', 'FUTURE.CONTRIBUTION.LIKELIHOOD']

contrib_ident_vars = participation_type_vars + contrib_type_vars + contrib_other_vars



In [12]:

    
user_pri_vars = ['OSS.USER.PRIORITIES.LICENSE', 'OSS.USER.PRIORITIES.CODE.OF.CONDUCT',
       'OSS.USER.PRIORITIES.CONTRIBUTING.GUIDE', 'OSS.USER.PRIORITIES.CLA',
       'OSS.USER.PRIORITIES.ACTIVE.DEVELOPMENT',
       'OSS.USER.PRIORITIES.RESPONSIVE.MAINTAINERS',
       'OSS.USER.PRIORITIES.WELCOMING.COMMUNITY',
       'OSS.USER.PRIORITIES.WIDESPREAD.USE']

contrib_pri_vars = ['OSS.CONTRIBUTOR.PRIORITIES.LICENSE',
       'OSS.CONTRIBUTOR.PRIORITIES.CODE.OF.CONDUCT',
       'OSS.CONTRIBUTOR.PRIORITIES.CONTRIBUTING.GUIDE',
       'OSS.CONTRIBUTOR.PRIORITIES.CLA',
       'OSS.CONTRIBUTOR.PRIORITIES.ACTIVE.DEVELOPMENT',
       'OSS.CONTRIBUTOR.PRIORITIES.RESPONSIVE.MAINTAINERS',
       'OSS.CONTRIBUTOR.PRIORITIES.WELCOMING.COMMUNITY',
       'OSS.CONTRIBUTOR.PRIORITIES.WIDESPREAD.USE']

oss_values_vars = [ 'SEEK.OPEN.SOURCE',
       'OSS.UX', 'OSS.SECURITY', 'OSS.STABILITY', 'INTERNAL.EFFICACY',
       'EXTERNAL.EFFICACY', 'OSS.IDENTIFICATION']

user_values_vars = ['USER.VALUES.STABILITY',
       'USER.VALUES.INNOVATION', 'USER.VALUES.REPLICABILITY',
       'USER.VALUES.COMPATIBILITY', 'USER.VALUES.SECURITY', 'USER.VALUES.COST',
       'USER.VALUES.TRANSPARENCY', 'USER.VALUES.USER.EXPERIENCE',
       'USER.VALUES.CUSTOMIZABILITY', 'USER.VALUES.SUPPORT',
       'USER.VALUES.TRUSTED.PRODUCER']

values_pri_vars = user_pri_vars + contrib_pri_vars + user_values_vars + oss_values_vars



In [13]:

    
privacy_transp_vars = ['TRANSPARENCY.PRIVACY.BELIEFS',
       'INFO.AVAILABILITY', 'INFO.JOB',
       'TRANSPARENCY.PRIVACY.PRACTICES.GENERAL',
       'TRANSPARENCY.PRIVACY.PRACTICES.OSS']



In [14]:

    
help_vars = ['RECEIVED.HELP', 'FIND.HELPER',
       'HELPER.PRIOR.RELATIONSHIP', 'RECEIVED.HELP.TYPE', 'PROVIDED.HELP',
       'FIND.HELPEE', 'HELPEE.PRIOR.RELATIONSHIP', 'PROVIDED.HELP.TYPE']



In [15]:

    
paid_work_vars = ['OSS.AS.JOB',
       'OSS.AT.WORK', 'OSS.IP.POLICY', 'EMPLOYER.POLICY.APPLICATIONS',
       'EMPLOYER.POLICY.DEPENDENCIES', 'OSS.HIRING']



In [16]:

    
discouraging_vars = ['DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE',
       'DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION',
       'DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE',
       'DISCOURAGING.BEHAVIOR.BAD.DOCS', 'DISCOURAGING.BEHAVIOR.CONFLICT',
       'DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE']



In [17]:

    
demographic_vars = ['IMMIGRATION',
       'MINORITY.HOMECOUNTRY', 'MINORITY.CURRENT.COUNTRY', 'GENDER',
       'TRANSGENDER.IDENTITY', 'SEXUAL.ORIENTATION', 'WRITTEN.ENGLISH', 'AGE',
       'FORMAL.EDUCATION', 'PARENTS.FORMAL.EDUCATION',
       'AGE.AT.FIRST.COMPUTER.INTERNET', 'LOCATION.OF.FIRST.COMPUTER.INTERNET',
       'PARTICIPATION.TYPE.ANY.REPONSE', 'POPULATION', 'OFF.SITE.ID',
       'TRANSLATED']



In [18]:

    
survey_vars = [contrib_ident_vars, values_pri_vars, privacy_transp_vars, \
               help_vars, paid_work_vars, discouraging_vars, demographic_vars]

Negative incidents

Load into pandas



In [19]:

    
neg_df = pd.read_csv("data_for_public_release/negative_incidents.csv")



In [20]:

    
print("negative_incidents.csv length:", len(survey_df))









    



negative_incidents.csv length: 6029

Explore the negative dataset with some sample responses



In [21]:

    
neg_df[0:3].transpose()









    Out[21]:







  
    
      
      0
      1
      2
    
  
  
    
      NEGATIVE.WITNESS.RUDENESS
      1
      1
      0
    
    
      NEGATIVE.WITNESS.NAME.CALLING
      1
      0
      0
    
    
      NEGATIVE.WITNESS.THREATS
      0
      0
      0
    
    
      NEGATIVE.WITNESS.IMPERSONATION
      0
      0
      1
    
    
      NEGATIVE.WITNESS.SUSTAINED.HARASSMENT
      0
      0
      0
    
    
      NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT
      0
      0
      0
    
    
      NEGATIVE.WITNESS.STALKING
      0
      0
      0
    
    
      NEGATIVE.WITNESS.SEXUAL.ADVANCES
      0
      0
      0
    
    
      NEGATIVE.WITNESS.STEREOTYPING
      0
      0
      0
    
    
      NEGATIVE.WITNESS.DOXXING
      0
      0
      1
    
    
      NEGATIVE.WITNESS.OTHER
      0
      0
      0
    
    
      NEGATIVE.WITNESS.NONE.OF.THE.ABOVE
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.RUDENESS
      0
      1
      0
    
    
      NEGATIVE.EXPERIENCE.NAME.CALLING
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.THREATS
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.IMPERSONATION
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.STALKING
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.STEREOTYPING
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.DOXXING
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.OTHER
      0
      0
      0
    
    
      NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE
      1
      0
      1
    
    
      NEGATIVE.RESPONSE.ASKED.USER.TO.STOP
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.BLOCKED.USER
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.OTHER
      0
      0
      0
    
    
      NEGATIVE.RESPONSE.IGNORED
      0
      1
      0
    
    
      RESPONSE.EFFECTIVENESS.ASKED.USER.TO.STOP
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.SOLICITED.COMMUNITY.SUPPORT
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.BLOCKED.USER
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.REPORTED.TO.MAINTAINERS
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.REPORTED.TO.HOST.OR.ISP
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.CONSULTED.LEGAL.COUNSEL
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.CONTACTED.LAW.ENFORCEMENT
      NaN
      NaN
      NaN
    
    
      RESPONSE.EFFECTIVENESS.OTHER
      NaN
      NaN
      NaN
    
    
      NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING
      0
      0
      1
    
    
      NEGATIVE.CONSEQUENCES.PSEUDONYM
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.USERNAME
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.SUGGEST.COC
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION
      0
      1
      0
    
    
      NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.OTHER
      0
      0
      0
    
    
      NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE
      1
      0
      0
    
    
      NEGATIVE.WITNESS.ANY.RESPONSE
      1
      1
      1
    
    
      NEGATIVE.EXPERIENCE.ANY.RESPONSE
      1
      1
      1
    
    
      NEGATIVE.RESPONSE.ANY.RESPONSE
      0
      1
      0
    
    
      NEGATIVE.CONSEQUENCES.ANY.RESPONSE
      1
      1
      1
    
    
      POPULATION
      github
      github
      github

Create lists of variables for bulk analysis



In [22]:

    
neg_witness_vars = ['NEGATIVE.WITNESS.RUDENESS', 'NEGATIVE.WITNESS.NAME.CALLING',
       'NEGATIVE.WITNESS.THREATS', 'NEGATIVE.WITNESS.IMPERSONATION',
       'NEGATIVE.WITNESS.SUSTAINED.HARASSMENT',
       'NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT',
       'NEGATIVE.WITNESS.STALKING', 'NEGATIVE.WITNESS.SEXUAL.ADVANCES',
       'NEGATIVE.WITNESS.STEREOTYPING', 'NEGATIVE.WITNESS.DOXXING',
       'NEGATIVE.WITNESS.OTHER', 'NEGATIVE.WITNESS.NONE.OF.THE.ABOVE', 'NEGATIVE.WITNESS.ANY.RESPONSE']



In [23]:

    
neg_exp_vars = ['NEGATIVE.EXPERIENCE.RUDENESS', 'NEGATIVE.EXPERIENCE.NAME.CALLING',
       'NEGATIVE.EXPERIENCE.THREATS', 'NEGATIVE.EXPERIENCE.IMPERSONATION',
       'NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT',
       'NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT',
       'NEGATIVE.EXPERIENCE.STALKING', 'NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES',
       'NEGATIVE.EXPERIENCE.STEREOTYPING', 'NEGATIVE.EXPERIENCE.DOXXING',
       'NEGATIVE.EXPERIENCE.OTHER', 'NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE', 'NEGATIVE.EXPERIENCE.ANY.RESPONSE']



In [24]:

    
neg_resp_vars = ['NEGATIVE.RESPONSE.ASKED.USER.TO.STOP',
       'NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT',
       'NEGATIVE.RESPONSE.BLOCKED.USER',
       'NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS',
       'NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP',
       'NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL',
       'NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT',
       'NEGATIVE.RESPONSE.OTHER', 'NEGATIVE.RESPONSE.IGNORED', 'NEGATIVE.RESPONSE.ANY.RESPONSE']



In [25]:

    
neg_effect_vars = ['RESPONSE.EFFECTIVENESS.ASKED.USER.TO.STOP',
       'RESPONSE.EFFECTIVENESS.SOLICITED.COMMUNITY.SUPPORT',
       'RESPONSE.EFFECTIVENESS.BLOCKED.USER',
       'RESPONSE.EFFECTIVENESS.REPORTED.TO.MAINTAINERS',
       'RESPONSE.EFFECTIVENESS.REPORTED.TO.HOST.OR.ISP',
       'RESPONSE.EFFECTIVENESS.CONSULTED.LEGAL.COUNSEL',
       'RESPONSE.EFFECTIVENESS.CONTACTED.LAW.ENFORCEMENT',
       'RESPONSE.EFFECTIVENESS.OTHER']



In [26]:

    
neg_conseq_vars = ['NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING',
       'NEGATIVE.CONSEQUENCES.PSEUDONYM',
       'NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE',
       'NEGATIVE.CONSEQUENCES.CHANGE.USERNAME',
       'NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE',
       'NEGATIVE.CONSEQUENCES.SUGGEST.COC',
       'NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION',
       'NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION',
       'NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES', 'NEGATIVE.CONSEQUENCES.OTHER',
       'NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE', 'NEGATIVE.CONSEQUENCES.ANY.RESPONSE']



In [27]:

    
neg_anyresp_vars = ['NEGATIVE.WITNESS.ANY.RESPONSE', 'NEGATIVE.EXPERIENCE.ANY.RESPONSE',
       'NEGATIVE.RESPONSE.ANY.RESPONSE', 'NEGATIVE.CONSEQUENCES.ANY.RESPONSE']

Analysis



In [28]:

    
sns.set(font_scale=1.5)

Contributor identity

People participate in open source in different ways. Which of the following activities do you engage in?

Choose all that apply.



In [29]:

    
participation_type_resp= survey_df[participation_type_vars].apply(pd.Series.value_counts).transpose()
participation_type_resp.columns = ["No", "Yes"]
participation_type_resp









    Out[29]:







  
    
      
      No
      Yes
    
  
  
    
      PARTICIPATION.TYPE.FOLLOW
      1287
      4742
    
    
      PARTICIPATION.TYPE.USE.APPLICATIONS
      454
      5575
    
    
      PARTICIPATION.TYPE.USE.DEPENDENCIES
      946
      5083
    
    
      PARTICIPATION.TYPE.CONTRIBUTE
      1722
      4307
    
    
      PARTICIPATION.TYPE.OTHER
      5742
      287



In [ ]:



In [30]:

    
participation_type_prop = survey_df[participation_type_vars].mean() * 100
participation_type_prop = participation_type_prop.sort_values()
pd.DataFrame(participation_type_prop, columns=["percent"])









    Out[30]:







  
    
      
      percent
    
  
  
    
      PARTICIPATION.TYPE.OTHER
      4.76%
    
    
      PARTICIPATION.TYPE.CONTRIBUTE
      71.44%
    
    
      PARTICIPATION.TYPE.FOLLOW
      78.65%
    
    
      PARTICIPATION.TYPE.USE.DEPENDENCIES
      84.31%
    
    
      PARTICIPATION.TYPE.USE.APPLICATIONS
      92.47%



In [31]:

    
ax = participation_type_prop.plot(kind='barh')

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[19:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
plt.xlim(0,100)
ax.set_yticklabels(labels)

ax.set_xlabel("Percent of respondents")
t = plt.title("% of people who participate in the following activities:")

Contributon type: How often do you engage in each of the following activities?



In [32]:

    
contrib_type_responses = survey_df[contrib_type_vars].apply(pd.Series.value_counts).transpose()

#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
contrib_type_responses = contrib_type_responses[["Never", "Rarely", "Occasionally", "Frequently"]]
contrib_type_responses = contrib_type_responses[["Frequently", "Occasionally", "Rarely", "Never"]]
contrib_type_responses = contrib_type_responses.sort_values(by='Frequently')
contrib_type_responses









    Out[32]:







  
    
      
      Frequently
      Occasionally
      Rarely
      Never
    
  
  
    
      CONTRIBUTOR.TYPE.COMMUNITY.ADMIN
      287
      417
      867
      2412
    
    
      CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS
      460
      1214
      1665
      661
    
    
      CONTRIBUTOR.TYPE.FEATURE.REQUESTS
      573
      1625
      1346
      451
    
    
      CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE
      996
      944
      974
      1090
    
    
      CONTRIBUTOR.TYPE.FILE.BUGS
      1067
      2073
      768
      106
    
    
      CONTRIBUTOR.TYPE.CONTRIBUTE.CODE
      1160
      1383
      1301
      189



In [33]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
contrib_type_responses.plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[17:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)


plt.title("How often do you engage in each of the following activities?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

Employment status

EMPLOYMENT.STATUS



In [34]:

    
prop_df = pd.DataFrame((survey_df['EMPLOYMENT.STATUS'].value_counts()))
prop_df.columns=["count"]
prop_df









    Out[34]:







  
    
      
      count
    
  
  
    
      Employed full time
      3615
    
    
      Full time student
      1048
    
    
      Employed part time
      349
    
    
      Temporarily not working
      314
    
    
      Other - please describe
      184
    
    
      Retired or permanently not working (e.g. due to disability)
      90



In [35]:

    
prop_df = pd.DataFrame((survey_df['EMPLOYMENT.STATUS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[35]:







  
    
      
      percent
    
  
  
    
      Employed full time
      64.55%
    
    
      Full time student
      18.71%
    
    
      Employed part time
      6.23%
    
    
      Temporarily not working
      5.61%
    
    
      Other - please describe
      3.29%
    
    
      Retired or permanently not working (e.g. due to disability)
      1.61%



In [36]:

    
ax = pd.DataFrame(survey_df['EMPLOYMENT.STATUS'].value_counts()).plot(kind='barh')
plt.suptitle("Employment status")
t = ax.set_xlabel("Count of responses")

In your main job, how often do you write or otherwise directly contribute to producing software?

PROFESSIONAL.SOFTWARE



In [37]:

    
prop_df = pd.DataFrame((survey_df['PROFESSIONAL.SOFTWARE'].value_counts()))
prop_df.columns=["count"]
prop_df









    Out[37]:







  
    
      
      count
    
  
  
    
      Frequently
      2747
    
    
      Occasionally
      542
    
    
      Rarely
      339
    
    
      Never
      279



In [38]:

    
prop_df = pd.DataFrame((survey_df['PROFESSIONAL.SOFTWARE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[38]:







  
    
      
      percent
    
  
  
    
      Frequently
      70.31%
    
    
      Occasionally
      13.87%
    
    
      Rarely
      8.68%
    
    
      Never
      7.14%



In [39]:

    
ax = pd.DataFrame(survey_df['PROFESSIONAL.SOFTWARE'].value_counts()).plot(kind='barh')
plt.title("In your main job, how often do you write or\notherwise directly contribute to producing software?")
t = ax.set_xlabel("Count of responses")

How interested are you in contributing to open source projects in the future?

FUTURE.CONTRIBUTION.INTEREST



In [40]:

    
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.INTEREST'].value_counts()))
prop_df.columns=["count"]
prop_df









    Out[40]:







  
    
      
      count
    
  
  
    
      Very interested
      3929
    
    
      Somewhat interested
      1430
    
    
      Not too interested
      125
    
    
      Not at all interested
      24



In [41]:

    
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.INTEREST'].value_counts(normalize=True).round(4).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[41]:







  
    
      
      percent
    
  
  
    
      Very interested
      71.33%
    
    
      Somewhat interested
      25.96%
    
    
      Not too interested
      2.27%
    
    
      Not at all interested
      0.44%



In [42]:

    
ax = pd.DataFrame(survey_df['FUTURE.CONTRIBUTION.INTEREST'].value_counts()).plot(kind='barh')
plt.title("How interested are you in contributing\nto open source projects in the future?")
t = ax.set_xlabel("Count of responses")

How likely are you to contribute to open source projects in the future?



In [43]:

    
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.LIKELIHOOD'].value_counts()))
prop_df.columns=["count"]
prop_df









    Out[43]:







  
    
      
      count
    
  
  
    
      Very likely
      3271
    
    
      Somewhat likely
      1719
    
    
      Somewhat unlikely
      440
    
    
      Very unlikely
      81



In [44]:

    
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.LIKELIHOOD'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[44]:







  
    
      
      percent
    
  
  
    
      Very likely
      59.35%
    
    
      Somewhat likely
      31.19%
    
    
      Somewhat unlikely
      7.98%
    
    
      Very unlikely
      1.47%



In [45]:

    
ax = pd.DataFrame(survey_df['FUTURE.CONTRIBUTION.LIKELIHOOD'].value_counts()).plot(kind='barh')
plt.title("How likely are you to contribute to\nopen source projects in the future?")
t = ax.set_xlabel("Count of responses")

Priorities and values

When thinking about whether to use open source software, how important are the following things?

OSS.USER.PRIORITIES.*



In [46]:

    
user_pri_responses = survey_df[user_pri_vars].apply(pd.Series.value_counts).transpose()

#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
user_pri_responses = user_pri_responses[["Very important to have",
                                             "Somewhat important to have",
                                             "Not important either way",
                                             "Somewhat important not to have",
                                             "Very important not to have",
                                             "Don't know what this is"]]
user_pri_responses = user_pri_responses.sort_values(by="Very important to have")



In [47]:

    
idx = []
for i in user_pri_responses.index:
    idx.append(i[20:])
idx = pd.Series(idx)    
user_pri_responses.set_index(idx)









    Out[47]:







  
    
      
      Very important to have
      Somewhat important to have
      Not important either way
      Somewhat important not to have
      Very important not to have
      Don't know what this is
    
  
  
    
      CLA
      490
      1024
      2282
      336
      157
      488
    
    
      CODE.OF.CONDUCT
      848
      1461
      1993
      166
      120
      209
    
    
      WIDESPREAD.USE
      984
      2067
      1576
      114
      47
      28
    
    
      CONTRIBUTING.GUIDE
      1212
      1866
      1516
      95
      62
      62
    
    
      WELCOMING.COMMUNITY
      2062
      1822
      812
      67
      33
      18
    
    
      RESPONSIVE.MAINTAINERS
      2575
      1850
      302
      31
      35
      20
    
    
      ACTIVE.DEVELOPMENT
      2768
      1722
      267
      30
      31
      16
    
    
      LICENSE
      3125
      1160
      435
      31
      33
      47



In [48]:

    
user_pri_responses_prop = survey_df[user_pri_vars].apply(pd.Series.value_counts, normalize=True).round(4).transpose()

#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
user_pri_responses_prop = user_pri_responses_prop[["Very important to have",
                                             "Somewhat important to have",
                                             "Not important either way",
                                             "Somewhat important not to have",
                                             "Very important not to have",
                                             "Don't know what this is"]]
user_pri_responses_prop = user_pri_responses_prop.sort_values(by="Very important to have")
user_pri_responses_prop = user_pri_responses_prop * 100



In [49]:

    
idx = []
for i in user_pri_responses_prop.index:
    idx.append(i[20:])
idx = pd.Series(idx)    
user_pri_responses_prop.set_index(idx)









    Out[49]:







  
    
      
      Very important to have
      Somewhat important to have
      Not important either way
      Somewhat important not to have
      Very important not to have
      Don't know what this is
    
  
  
    
      CLA
      10.26%
      21.44%
      47.77%
      7.03%
      3.29%
      10.22%
    
    
      CODE.OF.CONDUCT
      17.68%
      30.46%
      41.55%
      3.46%
      2.50%
      4.36%
    
    
      WIDESPREAD.USE
      20.43%
      42.92%
      32.72%
      2.37%
      0.98%
      0.58%
    
    
      CONTRIBUTING.GUIDE
      25.18%
      38.77%
      31.50%
      1.97%
      1.29%
      1.29%
    
    
      WELCOMING.COMMUNITY
      42.83%
      37.85%
      16.87%
      1.39%
      0.69%
      0.37%
    
    
      RESPONSIVE.MAINTAINERS
      53.50%
      38.44%
      6.27%
      0.64%
      0.73%
      0.42%
    
    
      ACTIVE.DEVELOPMENT
      57.26%
      35.62%
      5.52%
      0.62%
      0.64%
      0.33%
    
    
      LICENSE
      64.69%
      24.01%
      9.00%
      0.64%
      0.68%
      0.97%



In [50]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.coolwarm
colors = ["xkcd:darkblue", "xkcd:lightblue", "xkcd:beige", "xkcd:salmon", "xkcd:crimson", "xkcd:green"]
user_pri_responses.plot.barh(stacked=True, ax=ax, figsize=[12,8], color=colors)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[20:].replace(".", " ") # cut off "OSS.USER.PRIORITIES."
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)

plt.title("When thinking about whether to *use* open source software,\n how important are the following things?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.1), ncol=2, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

When thinking about whether to contribute to an open source project, how important are the following things?

OSS.CONTRIBUTOR.PRIORITIES.*



In [51]:

    
contrib_pri_responses = survey_df[contrib_pri_vars].apply(pd.Series.value_counts).transpose()

#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
contrib_pri_responses = contrib_pri_responses[["Very important to have",
                                             "Somewhat important to have",
                                             "Not important either way",
                                             "Somewhat important not to have",
                                             "Very important not to have",
                                             "Don't know what this is"]]

contrib_pri_responses = contrib_pri_responses.sort_values(by="Very important to have")



In [52]:

    
idx = []
for i in contrib_pri_responses.index:
    idx.append(i[27:])
idx = pd.Series(idx)    
contrib_pri_responses.set_index(idx)









    Out[52]:







  
    
      
      Very important to have
      Somewhat important to have
      Not important either way
      Somewhat important not to have
      Very important not to have
      Don't know what this is
    
  
  
    
      WIDESPREAD.USE
      387
      1016
      1666
      70
      30
      12
    
    
      CLA
      419
      712
      1266
      327
      166
      280
    
    
      CODE.OF.CONDUCT
      655
      1145
      1085
      119
      84
      96
    
    
      CONTRIBUTING.GUIDE
      1198
      1396
      500
      41
      18
      24
    
    
      ACTIVE.DEVELOPMENT
      1368
      1333
      448
      21
      18
      5
    
    
      WELCOMING.COMMUNITY
      1533
      1199
      411
      21
      15
      7
    
    
      RESPONSIVE.MAINTAINERS
      1994
      1022
      138
      7
      16
      7
    
    
      LICENSE
      2199
      610
      337
      16
      15
      18



In [ ]:



In [53]:

    
contrib_pri_responses_prop = survey_df[contrib_pri_vars].apply(pd.Series.value_counts, normalize=True).round(4).transpose()

#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
contrib_pri_responses_prop = contrib_pri_responses_prop[["Very important to have",
                                             "Somewhat important to have",
                                             "Not important either way",
                                             "Somewhat important not to have",
                                             "Very important not to have",
                                             "Don't know what this is"]]
contrib_pri_responses_prop = contrib_pri_responses_prop.sort_values(by="Very important to have")
contrib_pri_responses_prop = contrib_pri_responses_prop * 100



In [54]:

    
idx = []
for i in contrib_pri_responses_prop.index:
    idx.append(i[27:])
idx = pd.Series(idx)    
contrib_pri_responses_prop.set_index(idx)









    Out[54]:







  
    
      
      Very important to have
      Somewhat important to have
      Not important either way
      Somewhat important not to have
      Very important not to have
      Don't know what this is
    
  
  
    
      WIDESPREAD.USE
      12.17%
      31.94%
      52.37%
      2.20%
      0.94%
      0.38%
    
    
      CLA
      13.22%
      22.46%
      39.94%
      10.32%
      5.24%
      8.83%
    
    
      CODE.OF.CONDUCT
      20.57%
      35.96%
      34.08%
      3.74%
      2.64%
      3.02%
    
    
      CONTRIBUTING.GUIDE
      37.71%
      43.94%
      15.74%
      1.29%
      0.57%
      0.76%
    
    
      ACTIVE.DEVELOPMENT
      42.84%
      41.75%
      14.03%
      0.66%
      0.56%
      0.16%
    
    
      WELCOMING.COMMUNITY
      48.12%
      37.63%
      12.90%
      0.66%
      0.47%
      0.22%
    
    
      RESPONSIVE.MAINTAINERS
      62.63%
      32.10%
      4.33%
      0.22%
      0.50%
      0.22%
    
    
      LICENSE
      68.83%
      19.09%
      10.55%
      0.50%
      0.47%
      0.56%



In [55]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.coolwarm
colors = ["xkcd:darkblue", "xkcd:lightblue", "xkcd:beige", "xkcd:salmon", "xkcd:crimson", "xkcd:green"]
contrib_pri_responses.plot.barh(stacked=True, ax=ax, figsize=[12,8], color=colors)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[27:].replace(".", " ") # cut off "OSS.USER.PRIORITIES."
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)

plt.title("When thinking about whether to *contribute* to an open source project,\nhow important are the following things?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.1), ncol=2, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

How often do you try to find open source options over other kinds of software?

SEEK.OPEN.SOURCE



In [56]:

    
count_df = pd.DataFrame(data=survey_df['SEEK.OPEN.SOURCE'].value_counts())
count_df.columns = ["count"]
count_df



In [57]:

    
prop_df = pd.DataFrame((survey_df['SEEK.OPEN.SOURCE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[57]:







  
    
      
      percent
    
  
  
    
      Always
      73.38%
    
    
      Sometimes
      23.93%
    
    
      Rarely
      2.15%
    
    
      Never
      0.54%



In [58]:

    
ax = pd.DataFrame(survey_df['SEEK.OPEN.SOURCE'].value_counts()).plot(kind='barh')
plt.title("How often do you try to find open\nsource options over other kinds of software?")
t = ax.set_xlabel("Count of responses")

Open source software usability

OSS.UX: Do you believe that open source software is generally easier to use than closed source (proprietary) software, harder to use, or about the same?



In [59]:

    
count_df = pd.DataFrame(data=survey_df['OSS.UX'].value_counts())
count_df.columns = ["count"]
count_df









    Out[59]:







  
    
      
      count
    
  
  
    
      About the same
      2027
    
    
      Generally easier to use
      1597
    
    
      Generally harder to use
      897



In [60]:

    
prop_df = pd.DataFrame((survey_df['OSS.UX'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[60]:







  
    
      
      percent
    
  
  
    
      About the same
      44.84%
    
    
      Generally easier to use
      35.32%
    
    
      Generally harder to use
      19.84%



In [61]:

    
ax = pd.DataFrame(survey_df['OSS.UX'].value_counts()).plot(kind='barh')
plt.title("Do you believe that open source software is generally\neasier to use than closed source (proprietary)\nsoftware, harder to use, or about the same?")
t = ax.set_xlabel("Count of responses")

Open source software security

OSS.SECURITY: Do you believe that open source software is generally more secure than closed source (proprietary) software, less secure, or about the same?



In [62]:

    
count_df = pd.DataFrame(data=survey_df['OSS.SECURITY'].value_counts())
count_df.columns = ["count"]
count_df









    Out[62]:







  
    
      
      count
    
  
  
    
      Generally more secure
      2688
    
    
      About the same
      1537
    
    
      Generally less secure
      295



In [63]:

    
prop_df = pd.DataFrame((survey_df['OSS.SECURITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[63]:







  
    
      
      percent
    
  
  
    
      Generally more secure
      59.47%
    
    
      About the same
      34.00%
    
    
      Generally less secure
      6.53%



In [64]:

    
ax = pd.DataFrame(survey_df['OSS.SECURITY'].value_counts()).plot(kind='barh')
plt.title("Do you believe that open source software is\ngenerally more secure than closed source (proprietary)\nsoftware, less secure, or about the same?")
t = ax.set_xlabel("Count of responses")

Open source software stability

OSS.STABILITY: Do you believe that open source software is generally more stable than closed source (proprietary) software, less stable, or about the same?



In [65]:

    
count_df = pd.DataFrame(data=survey_df['OSS.STABILITY'].value_counts())
count_df.columns = ["count"]
count_df









    Out[65]:







  
    
      
      count
    
  
  
    
      About the same
      2240
    
    
      Generally more stable
      1399
    
    
      Generally less stable
      877



In [66]:

    
prop_df = pd.DataFrame((survey_df['OSS.STABILITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[66]:







  
    
      
      percent
    
  
  
    
      About the same
      49.60%
    
    
      Generally more stable
      30.98%
    
    
      Generally less stable
      19.42%



In [67]:

    
pd.DataFrame(survey_df['OSS.STABILITY'].value_counts()).plot(kind='barh')
plt.title("Do you believe that open source software is\ngenerally more stable than closed source\n(proprietary), less stable, or about the same?")
t = ax.set_xlabel("Count of responses")



In [ ]:

Identification with open source

How much do you agree or disagree with the following statements:

EXTERNAL.EFFICACY: The open source community values contributions from people like me.
INTERNAL.EFFICACY: I have the skills and understanding necessary to make meaningful contributions to open source projects.
OSS.IDENTIFICATION: I consider myself to be a member of the open source (and/or the Free/Libre software) community.



In [68]:

    
oss_id_vars = ["INTERNAL.EFFICACY", "EXTERNAL.EFFICACY", "OSS.IDENTIFICATION"]



In [69]:

    
oss_id_responses = survey_df[oss_id_vars].apply(pd.Series.value_counts).transpose()

#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
oss_id_responses = oss_id_responses[["Strongly agree",
                                     "Somewhat agree",
                                     "Neither agree nor disagree",
                                     "Somewhat disagree",
                                     "Strongly disagree"]]
oss_id_responses = oss_id_responses.sort_values(by="Strongly agree")
oss_id_responses









    Out[69]:







  
    
      
      Strongly agree
      Somewhat agree
      Neither agree nor disagree
      Somewhat disagree
      Strongly disagree
    
  
  
    
      EXTERNAL.EFFICACY
      1518
      1610
      1116
      150
      58
    
    
      OSS.IDENTIFICATION
      1579
      1513
      863
      351
      150
    
    
      INTERNAL.EFFICACY
      2052
      1685
      418
      240
      62



In [70]:

    
oss_id_responses_prop = survey_df[oss_id_vars].apply(pd.Series.value_counts, normalize=True).round(4) * 100 
oss_id_responses_prop.transpose()









    Out[70]:







  
    
      
      Neither agree nor disagree
      Somewhat agree
      Somewhat disagree
      Strongly agree
      Strongly disagree
    
  
  
    
      INTERNAL.EFFICACY
      9.38%
      37.81%
      5.38%
      46.04%
      1.39%
    
    
      EXTERNAL.EFFICACY
      25.07%
      36.16%
      3.37%
      34.10%
      1.30%
    
    
      OSS.IDENTIFICATION
      19.37%
      33.95%
      7.88%
      35.44%
      3.37%



In [71]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.coolwarm
colors = ["xkcd:darkblue", "xkcd:lightblue", "xkcd:beige", "xkcd:salmon", "xkcd:crimson"]
oss_id_responses.plot.barh(stacked=True, ax=ax, figsize=[12,5], cmap=matplotlib.cm.coolwarm, edgecolor='black', linewidth=1)

#print(str(ax.get_yticklabels()))

ax.set_yticklabels(["The open source community values\ncontributions from people like me.",
                    "I consider myself to be a member\nof the open source (and/or the\nFree/Libre software) community.",
                    "I have the skills and understanding\nnecessary to make meaningful\ncontributions to open source projects."])


plt.title("How much do you agree or disagree with the following statements:")

plt.xlabel("Number of responses")


legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.25), ncol=2, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

Transparency vs privacy

Attribution

TRANSPARENCY.PRIVACY.BELIEFS: Which of the following statements is closest to your beliefs about attribution in software development?

Records of authorship should be required so that end users know who created the source code they are working with.
People should be able to contribute code without attribution, if they wish to remain anonymous.



In [72]:

    
counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.BELIEFS'].value_counts())
counts_df.columns=["count"]
counts_df









    Out[72]:







  
    
      
      count
    
  
  
    
      People should be able to contribute code without attribution, if they wish to remain anonymous.
      2454
    
    
      Records of authorship should be required so that end users know who created the source code they are working with.
      1594



In [73]:

    
prop_df = pd.DataFrame((survey_df['TRANSPARENCY.PRIVACY.BELIEFS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[73]:







  
    
      
      percent
    
  
  
    
      People should be able to contribute code without attribution, if they wish to remain anonymous.
      60.62%
    
    
      Records of authorship should be required so that end users know who created the source code they are working with.
      39.38%



In [74]:

    
ax = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.BELIEFS'].value_counts()).plot(kind='barh', figsize=[10,6])
plt.title("Which of the following statements is closest to your\nbeliefs about attribution in software development?")
ax.set_yticklabels(["People should be able to contribute\ncode without attribution, if\nthey wish to remain anonymous.",
                    "Records of authorship should be\nrequired so that end users know\nwho created the source code they are working with."])
t = ax.set_xlabel("Count of responses")

In general, how much information about you is publicly available online?

INFO.AVAILABILITY



In [75]:

    
count_df = pd.DataFrame(survey_df['INFO.AVAILABILITY'].value_counts())
count_df.columns=["count"]
count_df









    Out[75]:







  
    
      
      count
    
  
  
    
      Some information about me
      1776
    
    
      A little information about me
      1133
    
    
      A lot of information about me
      1011
    
    
      No information at all about me
      140



In [76]:

    
prop_df = pd.DataFrame((survey_df['INFO.AVAILABILITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[76]:







  
    
      
      percent
    
  
  
    
      Some information about me
      43.74%
    
    
      A little information about me
      27.91%
    
    
      A lot of information about me
      24.90%
    
    
      No information at all about me
      3.45%



In [77]:

    
ax = pd.DataFrame(survey_df['INFO.AVAILABILITY'].value_counts()).plot(kind='barh')
plt.title("In general, how much information about\nyou is publicly available online?")
t = ax.set_xlabel("Count of responses")

Do you feel that you need to make information available about yourself online for professional reasons?

INFO.JOB



In [78]:

    
count_df = pd.DataFrame(survey_df['INFO.JOB'].value_counts())
count_df.columns = ["count"]
count_df



In [79]:

    
prop_df = pd.DataFrame((survey_df['INFO.JOB'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df



In [80]:

    
ax = pd.DataFrame(survey_df['INFO.JOB'].value_counts()).plot(kind='barh', figsize=[10,6])
plt.title("Do you feel that you need to make information available\nabout yourself online for professional reasons?")
t = ax.set_xlabel("Count of responses")

General privacy practices

TRANSPARENCY.PRIVACY.PRACTICES.GENERAL

"Which of the following best describes your practices around publishing content online, such as posts on social media (e.g. Facebook, Instagram, Twitter, etc.), blogs, and other platforms (not including contributions to open source projects)?" (single choice)



In [81]:

    
counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.GENERAL'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[81]:







  
    
      
      count
    
  
  
    
      I include my real name.
      1718
    
    
      I usually use a consistent pseudonym that is easily linked to my real name online.
      1141
    
    
      I don't publish this kind of content online.
      517
    
    
      I usually use a consistent pseudonym that is not linked anywhere with my real name online.
      363
    
    
      I take precautions to use different pseudonymns on different platforms.
      270



In [82]:

    
prop_df = pd.DataFrame((survey_df['TRANSPARENCY.PRIVACY.PRACTICES.GENERAL'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[82]:







  
    
      
      percent
    
  
  
    
      I include my real name.
      42.85%
    
    
      I usually use a consistent pseudonym that is easily linked to my real name online.
      28.46%
    
    
      I don't publish this kind of content online.
      12.90%
    
    
      I usually use a consistent pseudonym that is not linked anywhere with my real name online.
      9.05%
    
    
      I take precautions to use different pseudonymns on different platforms.
      6.73%



In [83]:

    
plot_counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.GENERAL'].value_counts())
idx = ['I include my real name.',
       'I usually use a consistent pseudonym that\nis easily linked to my real name online.',
       'I don\'t publish this kind of content online.',
       'I usually use a consistent pseudonym that\nis not linked anywhere with my real name online.',
       'I take precautions to use different\npseudonymns on different platforms.']
plot_counts_df.index = idx



In [84]:

    
ax = plot_counts_df.plot(kind='barh', figsize=[12,6])
plt.title("Which of the following best describes your\npractices around publishing content online [...] \nnot including contributions to open source projects?")
t = ax.set_xlabel("Count of responses")

OSS privacy practices

"Which of the following best describes your practices when making open source contributions?"



In [85]:

    
counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.OSS'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[85]:







  
    
      
      count
    
  
  
    
      I include my real name.
      1845
    
    
      I usually contribute using a consistent pseudonym that is easily linked to my real name online.
      766
    
    
      I usually contribute using a consistent pseudonym that is not linked anywhere with my real name online.
      273
    
    
      I take precautions to use different usernames  in different projects.
      42



In [86]:

    
prop_df = pd.DataFrame((survey_df['TRANSPARENCY.PRIVACY.PRACTICES.OSS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[86]:







  
    
      
      percent
    
  
  
    
      I include my real name.
      63.06%
    
    
      I usually contribute using a consistent pseudonym that is easily linked to my real name online.
      26.18%
    
    
      I usually contribute using a consistent pseudonym that is not linked anywhere with my real name online.
      9.33%
    
    
      I take precautions to use different usernames  in different projects.
      1.44%



In [87]:

    
plot_counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.OSS'].value_counts())
idx = ['I include my real name.',
       'I usually use a consistent pseudonym that\nis easily linked to my real name online.',
       'I usually use a consistent pseudonym that\nis not linked anywhere with my real name online.',
       'I take precautions to use different\npseudonymns on different platforms.']
plot_counts_df.index = idx



In [88]:

    
ax = plot_counts_df.plot(kind='barh', figsize=[12,6])
plt.title("Which of the following best describes your\npractices when making open source contributions?")
t = ax.set_xlabel("Count of responses")

Mentorship / Help

RECEIVED.HELP



In [89]:

    
counts_df = pd.DataFrame(survey_df['RECEIVED.HELP'].value_counts())
counts_df.columns = ["count"]
counts_df



In [90]:

    
prop_df = pd.DataFrame((survey_df['RECEIVED.HELP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df



In [91]:

    
ax = pd.DataFrame(survey_df['RECEIVED.HELP'].value_counts()).plot(kind='barh')
plt.title("Have you ever received any kind of help from other people\nrelated to using or contributing to an open source project?")
t = ax.set_xlabel("Count of responses")

Thinking of the most recent case where someone helped you, how did you find someone to help you?



In [92]:

    
counts_df = pd.DataFrame(survey_df['FIND.HELPER'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[92]:







  
    
      
      count
    
  
  
    
      I asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and someone responded.
      2057
    
    
      I asked a specific person for help.
      403
    
    
      Someone offered me unsolicited help.
      272
    
    
      Other - Please describe
      64



In [93]:

    
prop_df = pd.DataFrame((survey_df['FIND.HELPER'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[93]:







  
    
      
      percent
    
  
  
    
      I asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and someone responded.
      73.57%
    
    
      I asked a specific person for help.
      14.41%
    
    
      Someone offered me unsolicited help.
      9.73%
    
    
      Other - Please describe
      2.29%



In [94]:

    
ax = pd.DataFrame(survey_df['FIND.HELPER'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
plt.title("How did you find someone to help you?")
t = ax.set_yticklabels(['I asked for help in a public forum\n(e.g. in a GitHub Issue, project mailing list, etc.)\nand someone responded.',
 'I asked a specific person for help.',
 'Someone offered me unsolicited help.',
 'Other - Please describe'])

Which best describes your prior relationship with the person who helped you?

HELPER.PRIOR.RELATIONSHIP



In [95]:

    
counts_df = pd.DataFrame(survey_df['HELPER.PRIOR.RELATIONSHIP'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[95]:







  
    
      
      count
    
  
  
    
      Total strangers, I didn't know of them previously.
      1565
    
    
      I knew of them through their contributions to projects, but didn't know them personally.
      809
    
    
      We knew each other a little.
      211
    
    
      We knew each other well.
      208



In [96]:

    
prop_df = pd.DataFrame((survey_df['HELPER.PRIOR.RELATIONSHIP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[96]:







  
    
      
      percent
    
  
  
    
      Total strangers, I didn't know of them previously.
      56.03%
    
    
      I knew of them through their contributions to projects, but didn't know them personally.
      28.97%
    
    
      We knew each other a little.
      7.55%
    
    
      We knew each other well.
      7.45%



In [97]:

    
ax = pd.DataFrame(survey_df['HELPER.PRIOR.RELATIONSHIP'].value_counts()).plot(kind='barh')
plt.title("Which best describes your prior\nrelationship with the person who helped you?")
t = ax.set_xlabel("Count of responses")

What kind of problem did they help you with?

RECEIVED.HELP.TYPE



In [98]:

    
counts_df = pd.DataFrame(survey_df['RECEIVED.HELP.TYPE'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[98]:







  
    
      
      count
    
  
  
    
      Writing code or otherwise implementing ideas.
      1633
    
    
      Installing or using an application.
      820
    
    
      Understanding community norms (e.g. how to submit a contribution, how to communicate effectively).
      181
    
    
      Other (please describe)
      142
    
    
      Introductions to other people
      13



In [99]:

    
prop_df = pd.DataFrame((survey_df['RECEIVED.HELP.TYPE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[99]:







  
    
      
      percent
    
  
  
    
      Writing code or otherwise implementing ideas.
      58.55%
    
    
      Installing or using an application.
      29.40%
    
    
      Understanding community norms (e.g. how to submit a contribution, how to communicate effectively).
      6.49%
    
    
      Other (please describe)
      5.09%
    
    
      Introductions to other people
      0.47%



In [100]:

    
ax = pd.DataFrame(survey_df['RECEIVED.HELP.TYPE'].value_counts()).plot(kind='barh')
plt.title("What kind of problem did they help you with?")
t = ax.set_xlabel("Count of responses")

t = ax.set_yticklabels(['Writing code or otherwise implementing ideas.',
       'Installing or using an application.',
       'Understanding community norms (e.g. how to submit\na contribution, how to communicate effectively).',
       'Other (please describe)', 'Introductions to other people'])

Have you ever provided help for another person on an open source project?

PROVIDED.HELP



In [101]:

    
counts_df = pd.DataFrame(survey_df['PROVIDED.HELP'].value_counts())
counts_df.columns = ["count"]
counts_df



In [102]:

    
prop_df = pd.DataFrame((survey_df['PROVIDED.HELP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df



In [103]:

    
ax = pd.DataFrame(survey_df['PROVIDED.HELP'].value_counts()).plot(kind='barh')
plt.title("Have you ever provided help for another\nperson on an open source project?")
t = ax.set_xlabel("Count of responses")



In [ ]:

Thinking of the most recent case where you helped someone, how did you come to help this person?

FIND.HELPEE



In [104]:

    
counts_df = pd.DataFrame(survey_df['FIND.HELPEE'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[104]:







  
    
      
      count
    
  
  
    
      They asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and I responded.
      1839
    
    
      They asked me directly for help.
      566
    
    
      I reached out to them to offer unsolicited help.
      405
    
    
      Other (please describe)
      28



In [105]:

    
prop_df = pd.DataFrame((survey_df['FIND.HELPEE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[105]:







  
    
      
      percent
    
  
  
    
      They asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and I responded.
      64.80%
    
    
      They asked me directly for help.
      19.94%
    
    
      I reached out to them to offer unsolicited help.
      14.27%
    
    
      Other (please describe)
      0.99%



In [106]:

    
ax = pd.DataFrame(survey_df['FIND.HELPEE'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How did you come to help this person?")
t = ax.set_yticklabels(['They asked for help in a public forum\n(e.g. in a GitHub Issue, project mailing list, etc.)\n and I responded.',
       'They asked me directly for help.',
       'I reached out to them to offer unsolicited help.',
       'Other (please describe)'])

Which best describes your prior relationship with the person you helped?

HELPEE.PRIOR.RELATIONSHIP



In [107]:

    
counts_df = pd.DataFrame(survey_df['HELPEE.PRIOR.RELATIONSHIP'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[107]:







  
    
      
      count
    
  
  
    
      Total strangers, I didn't know of them previously.
      1984
    
    
      We knew each other well.
      292
    
    
      I knew of them through their contributions to projects, but didn't know them personally.
      288
    
    
      We knew each other  a little.
      275



In [108]:

    
prop_df = pd.DataFrame((survey_df['HELPEE.PRIOR.RELATIONSHIP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[108]:







  
    
      
      percent
    
  
  
    
      Total strangers, I didn't know of them previously.
      69.88%
    
    
      We knew each other well.
      10.29%
    
    
      I knew of them through their contributions to projects, but didn't know them personally.
      10.14%
    
    
      We knew each other  a little.
      9.69%



In [109]:

    
ax = pd.DataFrame(survey_df['HELPEE.PRIOR.RELATIONSHIP'].value_counts()).plot(kind='barh')
plt.title("Which best describes your prior\nrelationship with the person you helped?")
t = ax.set_xlabel("Count of responses")

What kind of problem did you help them with?

PROVIDED.HELP.TYPE



In [110]:

    
counts_df = pd.DataFrame(survey_df['PROVIDED.HELP.TYPE'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[110]:







  
    
      
      count
    
  
  
    
      Writing code or otherwise implementing ideas.
      1602
    
    
      Installing or using an application.
      1028
    
    
      Other (please describe)
      101
    
    
      Understanding community norms (e.g. how to submit a contribution, how to communicate effectively).
      99
    
    
      Introductions to other people.
      8



In [111]:

    
prop_df = pd.DataFrame((survey_df['PROVIDED.HELP.TYPE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[111]:







  
    
      
      percent
    
  
  
    
      Writing code or otherwise implementing ideas.
      56.45%
    
    
      Installing or using an application.
      36.22%
    
    
      Other (please describe)
      3.56%
    
    
      Understanding community norms (e.g. how to submit a contribution, how to communicate effectively).
      3.49%
    
    
      Introductions to other people.
      0.28%



In [112]:

    
ax = pd.DataFrame(survey_df['PROVIDED.HELP.TYPE'].value_counts()).plot(kind='barh')
plt.title("What kind of problem did you help them with?")
t = ax.set_xlabel("Count of responses")
t = ax.set_yticklabels(['Writing code or otherwise implementing ideas.',
       'Installing or using an application.', 'Other (please describe)',
       'Understanding community norms (e.g. how to\nsubmit a contribution, how to communicate effectively).',
       'Introductions to other people.'])

Open Source Software in Paid Work

Do you contribute to open source as part of your professional work?

OSS.AS.JOB: Do you contribute to open source as part of your professional work? In other words, are you paid for any of your time spent on open source contributions?

Yes, indirectly- I contribute to open source in carrying out my work duties, but I am not required or expected to do so.
No.
Yes, directly- some or all of my work duties include contributing to open source projects.



In [113]:

    
counts_df = pd.DataFrame(survey_df['OSS.AS.JOB'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[113]:







  
    
      
      count
    
  
  
    
      Yes, indirectly- I contribute to open source in carrying out my work duties, but I am not required or expected to do so.
      896
    
    
      No.
      687
    
    
      Yes, directly-  some or all of my work duties include contributing to open source projects.
      464



In [114]:

    
prop_df = pd.DataFrame((survey_df['OSS.AS.JOB'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[114]:







  
    
      
      percent
    
  
  
    
      Yes, indirectly- I contribute to open source in carrying out my work duties, but I am not required or expected to do so.
      43.77%
    
    
      No.
      33.56%
    
    
      Yes, directly-  some or all of my work duties include contributing to open source projects.
      22.67%



In [115]:

    
oss_as_job_df = pd.DataFrame(survey_df['OSS.AS.JOB'].value_counts())
oss_as_job_df.index = ["Yes, indirectly", "No", "Yes, directly"]
ax = oss_as_job_df.plot(kind='barh')
plt.title("Do you contribute to open source\nas part of your professional work?")
t = ax.set_xlabel("Count of responses")

How often do you use open source software in your professional work?

OSS.AT.WORK



In [116]:

    
counts_df = pd.DataFrame(survey_df['OSS.AT.WORK'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[116]:







  
    
      
      count
    
  
  
    
      Frequently
      2191
    
    
      Sometimes
      300
    
    
      Rarely
      110
    
    
      Never
      65



In [117]:

    
prop_df = pd.DataFrame((survey_df['OSS.AT.WORK'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[117]:







  
    
      
      percent
    
  
  
    
      Frequently
      82.18%
    
    
      Sometimes
      11.25%
    
    
      Rarely
      4.13%
    
    
      Never
      2.44%



In [118]:

    
ax = pd.DataFrame(survey_df['OSS.AT.WORK'].value_counts()).plot(kind='barh')
plt.title("How often do you use open source\nsoftware in your professional work?")
t = ax.set_xlabel("Count of responses")

How does your employer's intellectual property agreement/policy affect your free-time contributions to open source unrelated to your work?

OSS.IP.POLICY



In [119]:

    
counts_df = pd.DataFrame(survey_df['OSS.IP.POLICY'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[119]:







  
    
      
      count
    
  
  
    
      I am free to contribute without asking for permission.
      1178
    
    
      My employer doesn't have a clear policy on this.
      695
    
    
      I am permitted to contribute to open source, but need to ask for permission.
      287
    
    
      I'm not sure.
      238
    
    
      Not applicable
      180
    
    
      I am not permitted to contribute to open source at all.
      63



In [120]:

    
prop_df = pd.DataFrame((survey_df['OSS.IP.POLICY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[120]:







  
    
      
      percent
    
  
  
    
      I am free to contribute without asking for permission.
      44.60%
    
    
      My employer doesn't have a clear policy on this.
      26.32%
    
    
      I am permitted to contribute to open source, but need to ask for permission.
      10.87%
    
    
      I'm not sure.
      9.01%
    
    
      Not applicable
      6.82%
    
    
      I am not permitted to contribute to open source at all.
      2.39%



In [121]:

    
ax = pd.DataFrame(survey_df['OSS.IP.POLICY'].value_counts()).plot(kind='barh')
plt.title("How does your employer's intellectual property\nagreement/policy affect your free-time contributions\nto open source unrelated to your work?")
t = ax.set_xlabel("Count of responses")

Which is closest to your employer’s policy on using open source software applications?



In [122]:

    
counts_df = pd.DataFrame(survey_df['EMPLOYER.POLICY.APPLICATIONS'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[122]:







  
    
      
      count
    
  
  
    
      Use of open source applications is encouraged.
      1174
    
    
      Use of open source applications is acceptable if it is the most appropriate tool.
      916
    
    
      My employer doesn't have a clear policy on this.
      338
    
    
      Not applicable
      88
    
    
      I'm not sure.
      83
    
    
      Use of open source applications is rarely, if ever, permitted.
      42



In [123]:

    
prop_df = pd.DataFrame((survey_df['EMPLOYER.POLICY.APPLICATIONS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[123]:







  
    
      
      percent
    
  
  
    
      Use of open source applications is encouraged.
      44.45%
    
    
      Use of open source applications is acceptable if it is the most appropriate tool.
      34.68%
    
    
      My employer doesn't have a clear policy on this.
      12.80%
    
    
      Not applicable
      3.33%
    
    
      I'm not sure.
      3.14%
    
    
      Use of open source applications is rarely, if ever, permitted.
      1.59%



In [124]:

    
ax = pd.DataFrame(survey_df['EMPLOYER.POLICY.APPLICATIONS'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Which is closest to your employer’s policy\non using open source software applications?")

How important do you think your involvement in open source was to getting your current job?

OSS.HIRING



In [125]:

    
counts_df = pd.DataFrame(survey_df['OSS.HIRING'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[125]:







  
    
      
      count
    
  
  
    
      Very important
      618
    
    
      Somewhat important
      448
    
    
      Not at all important
      361
    
    
      Not too important
      352
    
    
      Not applicable-I hadn't made any contributions when I got this job.
      254



In [126]:

    
prop_df = pd.DataFrame((survey_df['OSS.HIRING'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[126]:







  
    
      
      percent
    
  
  
    
      Very important
      30.40%
    
    
      Somewhat important
      22.04%
    
    
      Not at all important
      17.76%
    
    
      Not too important
      17.31%
    
    
      Not applicable-I hadn't made any contributions when I got this job.
      12.49%



In [127]:

    
ax = pd.DataFrame(survey_df['OSS.HIRING'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How important do you think your involvement\nin open source was to getting your current job?")

Demographics

Do you currently live in a country other than the one in which you were born?

IMMIGRATION



In [128]:

    
counts_df = pd.DataFrame(survey_df['IMMIGRATION'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[128]:







  
    
      
      count
    
  
  
    
      No, I live in the country where I was born.
      2764
    
    
      Yes, and I intend to stay permanently.
      513
    
    
      Yes, and I am not sure about my future plans.
      292
    
    
      Yes, and I intend to stay temporarily.
      165



In [129]:

    
prop_df = pd.DataFrame((survey_df['IMMIGRATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[129]:







  
    
      
      percent
    
  
  
    
      No, I live in the country where I was born.
      74.02%
    
    
      Yes, and I intend to stay permanently.
      13.74%
    
    
      Yes, and I am not sure about my future plans.
      7.82%
    
    
      Yes, and I intend to stay temporarily.
      4.42%



In [130]:

    
ax = pd.DataFrame(survey_df['IMMIGRATION'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Do you currently live in a country other\nthan the one in which you were born?")

Thinking of where you were born, are you a member of an ethnicity or nationality that is a considered a minority in that country?

MINORITY.HOMECOUNTRY



In [131]:

    
counts_df = pd.DataFrame(survey_df['MINORITY.HOMECOUNTRY'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[131]:







  
    
      
      count
    
  
  
    
      No
      754
    
    
      Yes
      124
    
    
      Not sure
      45
    
    
      Prefer not to say
      34



In [132]:

    
prop_df = pd.DataFrame((survey_df['MINORITY.HOMECOUNTRY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[132]:







  
    
      
      percent
    
  
  
    
      No
      78.79%
    
    
      Yes
      12.96%
    
    
      Not sure
      4.70%
    
    
      Prefer not to say
      3.55%



In [133]:

    
ax = pd.DataFrame(survey_df['MINORITY.HOMECOUNTRY'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Thinking of where you were born, are\nyou a member of an ethnicity or nationality that\nis a considered a minority in that country?")



In [ ]:

Thinking of where you currently live, are you a member of an ethnicity or nationality that is a considered a minority in that country?

MINORITY.CURRENT.COUNTRY



In [134]:

    
counts_df = pd.DataFrame(survey_df['MINORITY.CURRENT.COUNTRY'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[134]:







  
    
      
      count
    
  
  
    
      No
      2837
    
    
      Yes
      546
    
    
      Not sure
      193
    
    
      Prefer not to say
      156



In [135]:

    
prop_df = pd.DataFrame((survey_df['MINORITY.CURRENT.COUNTRY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[135]:







  
    
      
      percent
    
  
  
    
      No
      76.02%
    
    
      Yes
      14.63%
    
    
      Not sure
      5.17%
    
    
      Prefer not to say
      4.18%



In [136]:

    
ax = pd.DataFrame(survey_df['MINORITY.CURRENT.COUNTRY'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Thinking of where you currently live, are you\na member of an ethnicity or nationality that is a\nconsidered a minority in that country?")

What is your gender?

GENDER



In [137]:

    
counts_df = pd.DataFrame(survey_df['GENDER'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[137]:







  
    
      
      count
    
  
  
    
      Man
      3387
    
    
      Prefer not to say
      173
    
    
      Woman
      125
    
    
      Non-binary  or Other
      39



In [138]:

    
prop_df = pd.DataFrame((survey_df['GENDER'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[138]:







  
    
      
      percent
    
  
  
    
      Man
      90.95%
    
    
      Prefer not to say
      4.65%
    
    
      Woman
      3.36%
    
    
      Non-binary  or Other
      1.05%



In [139]:

    
ax = pd.DataFrame(survey_df['GENDER'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is your gender?")

Do you identify as transgender?

TRANSGENDER.IDENTITY



In [140]:

    
counts_df = pd.DataFrame(survey_df['TRANSGENDER.IDENTITY'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[140]:







  
    
      
      count
    
  
  
    
      No
      3494
    
    
      Prefer not to say
      158
    
    
      Yes
      33
    
    
      Not sure
      30



In [141]:

    
prop_df = pd.DataFrame((survey_df['TRANSGENDER.IDENTITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[141]:







  
    
      
      percent
    
  
  
    
      No
      94.05%
    
    
      Prefer not to say
      4.25%
    
    
      Yes
      0.89%
    
    
      Not sure
      0.81%



In [142]:

    
ax = pd.DataFrame(survey_df['TRANSGENDER.IDENTITY'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Do you identify as transgender?")



In [ ]:

Do you identify as gay, lesbian, or bisexual, asexual, or any other minority sexual orientation?

SEXUAL.ORIENTATION



In [143]:

    
counts_df = pd.DataFrame(survey_df['SEXUAL.ORIENTATION'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[143]:







  
    
      
      count
    
  
  
    
      No
      3187
    
    
      Yes
      246
    
    
      Prefer not to say
      201
    
    
      Not sure
      85



In [144]:

    
prop_df = pd.DataFrame((survey_df['SEXUAL.ORIENTATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[144]:







  
    
      
      percent
    
  
  
    
      No
      85.70%
    
    
      Yes
      6.61%
    
    
      Prefer not to say
      5.40%
    
    
      Not sure
      2.29%



In [145]:

    
ax = pd.DataFrame(survey_df['SEXUAL.ORIENTATION'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Do you identify as gay, lesbian, or bisexual,\nasexual, or any other minority sexual orientation?")

How well can you read and write in English?

WRITTEN.ENGLISH



In [146]:

    
counts_df = pd.DataFrame(survey_df['WRITTEN.ENGLISH'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[146]:







  
    
      
      count
    
  
  
    
      Very well
      2865
    
    
      Moderately well
      742
    
    
      Not very well
      108
    
    
      Not at all
      6



In [147]:

    
prop_df = pd.DataFrame((survey_df['WRITTEN.ENGLISH'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[147]:







  
    
      
      percent
    
  
  
    
      Very well
      77.00%
    
    
      Moderately well
      19.94%
    
    
      Not very well
      2.90%
    
    
      Not at all
      0.16%



In [148]:

    
ax = pd.DataFrame(survey_df['WRITTEN.ENGLISH'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How well can you read and write in English?")



In [ ]:

What is your age?

AGE



In [149]:

    
counts_df = pd.DataFrame(survey_df['AGE'].value_counts().sort_index())
counts_df.columns = ["count"]
counts_df









    Out[149]:







  
    
      
      count
    
  
  
    
      17 or younger
      139
    
    
      18 to 24 years
      871
    
    
      25 to 34 years
      1400
    
    
      35 to 44 years
      772
    
    
      45 to 54 years
      267
    
    
      55 to 64 years
      93
    
    
      65 years or older
      36



In [150]:

    
prop_df = pd.DataFrame((survey_df['AGE'].value_counts(normalize=True).round(4)*100).sort_index())
prop_df.columns=["percent"]
prop_df









    Out[150]:







  
    
      
      percent
    
  
  
    
      17 or younger
      3.88%
    
    
      18 to 24 years
      24.34%
    
    
      25 to 34 years
      39.13%
    
    
      35 to 44 years
      21.58%
    
    
      45 to 54 years
      7.46%
    
    
      55 to 64 years
      2.60%
    
    
      65 years or older
      1.01%



In [151]:

    
ax = pd.DataFrame(survey_df['AGE'].value_counts().sort_index()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is your age?")



In [ ]:

What is highest level of formal education that you have completed?

FORMAL.EDUCATION



In [152]:

    
counts_df = pd.DataFrame(survey_df['FORMAL.EDUCATION'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[152]:







  
    
      
      count
    
  
  
    
      Bachelor's degree
      1321
    
    
      Master's degree
      852
    
    
      Some college, no degree
      640
    
    
      Secondary (high) school graduate or equivalent
      375
    
    
      Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)
      256
    
    
      Vocational/trade program or apprenticeship
      127
    
    
      Less than secondary (high) school
      126



In [153]:

    
prop_df = pd.DataFrame((survey_df['FORMAL.EDUCATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[153]:







  
    
      
      percent
    
  
  
    
      Bachelor's degree
      35.73%
    
    
      Master's degree
      23.05%
    
    
      Some college, no degree
      17.31%
    
    
      Secondary (high) school graduate or equivalent
      10.14%
    
    
      Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)
      6.92%
    
    
      Vocational/trade program or apprenticeship
      3.44%
    
    
      Less than secondary (high) school
      3.41%



In [154]:

    
order = ["Less than secondary (high) school",
         "Secondary (high) school graduate or equivalent",
         "Vocational/trade program or apprenticeship",
         "Some college, no degree",
         "Bachelor's degree",
         "Master's degree",
         "Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)"]

edu_counts_df = survey_df['FORMAL.EDUCATION'].value_counts()[order]

ax = edu_counts_df.plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is highest level of formal education that you have completed?")



In [ ]:

What is the highest level of formal education that either of your parents completed?

PARENTS.FORMAL.EDUCATION



In [155]:

    
counts_df = pd.DataFrame(survey_df['PARENTS.FORMAL.EDUCATION'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[155]:







  
    
      
      count
    
  
  
    
      Bachelor's degree
      961
    
    
      Master's degree
      871
    
    
      Secondary (high) school graduate or equivalent
      566
    
    
      Some college, no degree
      388
    
    
      Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)
      387
    
    
      Vocational/trade program or apprenticeship
      257
    
    
      Less than secondary (high) school
      243



In [156]:

    
prop_df = pd.DataFrame((survey_df['PARENTS.FORMAL.EDUCATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[156]:







  
    
      
      percent
    
  
  
    
      Bachelor's degree
      26.16%
    
    
      Master's degree
      23.71%
    
    
      Secondary (high) school graduate or equivalent
      15.41%
    
    
      Some college, no degree
      10.56%
    
    
      Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)
      10.54%
    
    
      Vocational/trade program or apprenticeship
      7.00%
    
    
      Less than secondary (high) school
      6.62%



In [157]:

    
order = ["Less than secondary (high) school",
         "Secondary (high) school graduate or equivalent",
         "Vocational/trade program or apprenticeship",
         "Some college, no degree",
         "Bachelor's degree",
         "Master's degree",
         "Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)"]

edu_counts_df = survey_df['PARENTS.FORMAL.EDUCATION'].value_counts()[order]

ax = edu_counts_df.plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is highest level of formal education that either of your parents completed?")



In [ ]:



In [ ]:

How old were you when you first had regular access to a computer with an internet connection?

AGE.AT.FIRST.COMPUTER.INTERNET



In [158]:

    
counts_df = pd.DataFrame(survey_df['AGE.AT.FIRST.COMPUTER.INTERNET'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[158]:







  
    
      
      count
    
  
  
    
      Younger than 13 years old
      1478
    
    
      13 - 17 years old
      1313
    
    
      18 - 24 years old
      695
    
    
      25 - 45 years old
      202
    
    
      Older than 45 years old
      23



In [159]:

    
prop_df = pd.DataFrame((survey_df['AGE.AT.FIRST.COMPUTER.INTERNET'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[159]:







  
    
      
      percent
    
  
  
    
      Younger than 13 years old
      39.83%
    
    
      13 - 17 years old
      35.38%
    
    
      18 - 24 years old
      18.73%
    
    
      25 - 45 years old
      5.44%
    
    
      Older than 45 years old
      0.62%



In [160]:

    
ax = pd.DataFrame(survey_df['AGE.AT.FIRST.COMPUTER.INTERNET'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How old were you when you first had regular\naccess to a computer with an internet connection?")



In [ ]:

Where did you first have regular access to a computer with internet connection?

LOCATION.OF.FIRST.COMPUTER.INTERNET



In [161]:

    
counts_df = pd.DataFrame(survey_df['LOCATION.OF.FIRST.COMPUTER.INTERNET'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[161]:







  
    
      
      count
    
  
  
    
      At home (belonging to me or a family member)
      2520
    
    
      In a classroom, computer lab, or library at school
      746
    
    
      At an internet cafe or similar space
      182
    
    
      Other (please describe)
      106
    
    
      At a public library or community center
      87
    
    
      At work (recoded from open ended)
      70



In [162]:

    
prop_df = pd.DataFrame((survey_df['LOCATION.OF.FIRST.COMPUTER.INTERNET'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[162]:







  
    
      
      percent
    
  
  
    
      At home (belonging to me or a family member)
      67.91%
    
    
      In a classroom, computer lab, or library at school
      20.10%
    
    
      At an internet cafe or similar space
      4.90%
    
    
      Other (please describe)
      2.86%
    
    
      At a public library or community center
      2.34%
    
    
      At work (recoded from open ended)
      1.89%



In [163]:

    
ax = pd.DataFrame(survey_df['LOCATION.OF.FIRST.COMPUTER.INTERNET'].value_counts()).plot(kind='barh', figsize=[8.5,6])
ax.set_xlabel("Count of responses")
t = plt.title("Where did you first have regular access to a\ncomputer with internet connection?")

Where was the respondent surveyed from?

POPLATION



In [164]:

    
counts_df = pd.DataFrame(survey_df['POPULATION'].value_counts())
counts_df.columns = ["count"]
counts_df









    Out[164]:







  
    
      
      count
    
  
  
    
      github
      5495
    
    
      off site community
      534



In [165]:

    
prop_df = pd.DataFrame((survey_df['POPULATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df









    Out[165]:







  
    
      
      percent
    
  
  
    
      github
      91.14%
    
    
      off site community
      8.86%



In [166]:

    
ax = pd.DataFrame(survey_df['POPULATION'].value_counts()).plot(kind='barh', figsize=[8.5,6])
ax.set_xlabel("Count of responses")
t = plt.title("Where was the respondent surveyed from?")

Harassment / Inclusiveness of OSS

Have you ever observed any of the following in the context of an open source project?

DISCOURAGING.BEHAVIOR.*



In [167]:

    
discouraging_responses = survey_df[discouraging_vars].apply(pd.Series.value_counts).transpose()[["Yes", "No"]]
discouraging_responses









    Out[167]:







  
    
      
      Yes
      No
    
  
  
    
      DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE
      3017
      792
    
    
      DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION
      1210
      2580
    
    
      DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE
      2195
      1598
    
    
      DISCOURAGING.BEHAVIOR.BAD.DOCS
      3559
      263
    
    
      DISCOURAGING.BEHAVIOR.CONFLICT
      1830
      1966
    
    
      DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE
      649
      3158



In [168]:

    
discouraging_percent = pd.DataFrame(discouraging_responses["Yes"] / discouraging_responses.sum(axis=1) * 100, columns=["percent_yes"]).sort_values(by="percent_yes")
discouraging_percent.round(2)









    Out[168]:







  
    
      
      percent_yes
    
  
  
    
      DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE
      17.05%
    
    
      DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION
      31.93%
    
    
      DISCOURAGING.BEHAVIOR.CONFLICT
      48.21%
    
    
      DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE
      57.87%
    
    
      DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE
      79.21%
    
    
      DISCOURAGING.BEHAVIOR.BAD.DOCS
      93.12%



In [169]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
discouraging_responses.sort_values(by="No").plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[22:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)


plt.title("Have you ever observed any of the following in the context of an open source project?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')



In [ ]:

Have you ever witnessed any of the following behaviors directed at another person in the context of an open source project? (not including something directed at you)

NEGATIVE.WITNESS.*



In [170]:

    
neg_witness_responses = neg_df[neg_witness_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_witness_responses.columns = ["Yes", "Blank"]
neg_witness_responses









    Out[170]:







  
    
      
      Yes
      Blank
    
  
  
    
      NEGATIVE.WITNESS.RUDENESS
      1753
      4276
    
    
      NEGATIVE.WITNESS.NAME.CALLING
      789
      5240
    
    
      NEGATIVE.WITNESS.THREATS
      162
      5867
    
    
      NEGATIVE.WITNESS.IMPERSONATION
      177
      5852
    
    
      NEGATIVE.WITNESS.SUSTAINED.HARASSMENT
      237
      5792
    
    
      NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT
      175
      5854
    
    
      NEGATIVE.WITNESS.STALKING
      108
      5921
    
    
      NEGATIVE.WITNESS.SEXUAL.ADVANCES
      136
      5893
    
    
      NEGATIVE.WITNESS.STEREOTYPING
      423
      5606
    
    
      NEGATIVE.WITNESS.DOXXING
      151
      5878
    
    
      NEGATIVE.WITNESS.OTHER
      78
      5951
    
    
      NEGATIVE.WITNESS.NONE.OF.THE.ABOVE
      1721
      4308
    
    
      NEGATIVE.WITNESS.ANY.RESPONSE
      3664
      2365

Only 3,664 respondents clicked any boxes in this question, meaning 2,365 did not click the "none of the above" or an option (or even get to this question). We have to adjust the no responses accordingly.



In [171]:

    
neg_witness_responses_adj = neg_witness_responses
neg_witness_responses_adj["Blank"] = neg_witness_responses_adj["Blank"] - 2365
neg_witness_responses_adj_df = pd.DataFrame(neg_witness_responses_adj["Yes"] / (neg_witness_responses_adj["Yes"] + neg_witness_responses_adj["Blank"]) * 100, columns=["percent_yes"])



In [172]:

    
neg_witness_responses_adj.columns = ["Yes", "No"]
neg_witness_responses_adj[:-1]









    Out[172]:







  
    
      
      Yes
      No
    
  
  
    
      NEGATIVE.WITNESS.RUDENESS
      1753
      1911
    
    
      NEGATIVE.WITNESS.NAME.CALLING
      789
      2875
    
    
      NEGATIVE.WITNESS.THREATS
      162
      3502
    
    
      NEGATIVE.WITNESS.IMPERSONATION
      177
      3487
    
    
      NEGATIVE.WITNESS.SUSTAINED.HARASSMENT
      237
      3427
    
    
      NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT
      175
      3489
    
    
      NEGATIVE.WITNESS.STALKING
      108
      3556
    
    
      NEGATIVE.WITNESS.SEXUAL.ADVANCES
      136
      3528
    
    
      NEGATIVE.WITNESS.STEREOTYPING
      423
      3241
    
    
      NEGATIVE.WITNESS.DOXXING
      151
      3513
    
    
      NEGATIVE.WITNESS.OTHER
      78
      3586
    
    
      NEGATIVE.WITNESS.NONE.OF.THE.ABOVE
      1721
      1943



In [173]:

    
neg_witness_responses_adj_df.sort_values(by="percent_yes").round(2)









    Out[173]:







  
    
      
      percent_yes
    
  
  
    
      NEGATIVE.WITNESS.OTHER
      2.13%
    
    
      NEGATIVE.WITNESS.STALKING
      2.95%
    
    
      NEGATIVE.WITNESS.SEXUAL.ADVANCES
      3.71%
    
    
      NEGATIVE.WITNESS.DOXXING
      4.12%
    
    
      NEGATIVE.WITNESS.THREATS
      4.42%
    
    
      NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT
      4.78%
    
    
      NEGATIVE.WITNESS.IMPERSONATION
      4.83%
    
    
      NEGATIVE.WITNESS.SUSTAINED.HARASSMENT
      6.47%
    
    
      NEGATIVE.WITNESS.STEREOTYPING
      11.54%
    
    
      NEGATIVE.WITNESS.NAME.CALLING
      21.53%
    
    
      NEGATIVE.WITNESS.NONE.OF.THE.ABOVE
      46.97%
    
    
      NEGATIVE.WITNESS.RUDENESS
      47.84%
    
    
      NEGATIVE.WITNESS.ANY.RESPONSE
      100.00%



In [174]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_witness_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[17:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)


plt.title("Have you ever witnessed any of the following behaviors\ndirected at another person in the context of an open source\nproject? (not including something directed at you)")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

Have you ever experienced any of the following behaviors directed at you in the context of an open source project?



In [175]:

    
neg_exp_responses = neg_df[neg_exp_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_exp_responses.columns = ["Yes", "Blank"]
neg_exp_responses









    Out[175]:







  
    
      
      Yes
      Blank
    
  
  
    
      NEGATIVE.EXPERIENCE.RUDENESS
      646
      5383
    
    
      NEGATIVE.EXPERIENCE.NAME.CALLING
      192
      5837
    
    
      NEGATIVE.EXPERIENCE.THREATS
      43
      5986
    
    
      NEGATIVE.EXPERIENCE.IMPERSONATION
      45
      5984
    
    
      NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT
      55
      5974
    
    
      NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT
      42
      5987
    
    
      NEGATIVE.EXPERIENCE.STALKING
      35
      5994
    
    
      NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES
      25
      6004
    
    
      NEGATIVE.EXPERIENCE.STEREOTYPING
      114
      5915
    
    
      NEGATIVE.EXPERIENCE.DOXXING
      23
      6006
    
    
      NEGATIVE.EXPERIENCE.OTHER
      39
      5990
    
    
      NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE
      2900
      3129
    
    
      NEGATIVE.EXPERIENCE.ANY.RESPONSE
      3638
      2391

Only 3,638 respondents clicked any boxes in this question, meaning 2,391 did not click the "none of the above" (or even get to this question). We have to adjust the no responses accordingly.



In [176]:

    
neg_exp_responses_adj = neg_exp_responses
neg_exp_responses_adj["Blank"] = neg_exp_responses_adj["Blank"] - 2391
neg_exp_responses_adj_df = pd.DataFrame(neg_exp_responses["Yes"] / (neg_exp_responses_adj["Yes"] + neg_exp_responses_adj["Blank"]) * 100, columns=["percent_yes"])



In [177]:

    
neg_exp_responses_adj.columns = ["Yes", "No"]
neg_exp_responses_adj[:-1]









    Out[177]:







  
    
      
      Yes
      No
    
  
  
    
      NEGATIVE.EXPERIENCE.RUDENESS
      646
      2992
    
    
      NEGATIVE.EXPERIENCE.NAME.CALLING
      192
      3446
    
    
      NEGATIVE.EXPERIENCE.THREATS
      43
      3595
    
    
      NEGATIVE.EXPERIENCE.IMPERSONATION
      45
      3593
    
    
      NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT
      55
      3583
    
    
      NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT
      42
      3596
    
    
      NEGATIVE.EXPERIENCE.STALKING
      35
      3603
    
    
      NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES
      25
      3613
    
    
      NEGATIVE.EXPERIENCE.STEREOTYPING
      114
      3524
    
    
      NEGATIVE.EXPERIENCE.DOXXING
      23
      3615
    
    
      NEGATIVE.EXPERIENCE.OTHER
      39
      3599
    
    
      NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE
      2900
      738



In [178]:

    
neg_exp_responses_adj_df.sort_values(by="percent_yes").round(2)









    Out[178]:







  
    
      
      percent_yes
    
  
  
    
      NEGATIVE.EXPERIENCE.DOXXING
      0.63%
    
    
      NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES
      0.69%
    
    
      NEGATIVE.EXPERIENCE.STALKING
      0.96%
    
    
      NEGATIVE.EXPERIENCE.OTHER
      1.07%
    
    
      NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT
      1.15%
    
    
      NEGATIVE.EXPERIENCE.THREATS
      1.18%
    
    
      NEGATIVE.EXPERIENCE.IMPERSONATION
      1.24%
    
    
      NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT
      1.51%
    
    
      NEGATIVE.EXPERIENCE.STEREOTYPING
      3.13%
    
    
      NEGATIVE.EXPERIENCE.NAME.CALLING
      5.28%
    
    
      NEGATIVE.EXPERIENCE.RUDENESS
      17.76%
    
    
      NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE
      79.71%
    
    
      NEGATIVE.EXPERIENCE.ANY.RESPONSE
      100.00%



In [179]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_exp_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[20:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)


plt.title("Have you ever experienced any of the following behaviors\ndirected at you in the context of an open source project?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

Thinking of the last time you experienced harassment, how did you respond?

NEGATIVE.RESPONSE.*



In [180]:

    
neg_resp_responses = neg_df[neg_resp_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_resp_responses.columns = ["Yes", "Blank"]
neg_resp_responses









    Out[180]:







  
    
      
      Yes
      Blank
    
  
  
    
      NEGATIVE.RESPONSE.ASKED.USER.TO.STOP
      194
      5835
    
    
      NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT
      112
      5917
    
    
      NEGATIVE.RESPONSE.BLOCKED.USER
      170
      5859
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS
      95
      5934
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP
      20
      6009
    
    
      NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL
      8
      6021
    
    
      NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT
      9
      6020
    
    
      NEGATIVE.RESPONSE.OTHER
      71
      5958
    
    
      NEGATIVE.RESPONSE.IGNORED
      350
      5679
    
    
      NEGATIVE.RESPONSE.ANY.RESPONSE
      719
      5310

Only 719 respondents clicked any boxes in this question, meaning 5,310 did not click on "I did not react / ignored the incident" or any response (or even get to this question). We have to adjust the no responses accordingly.



In [181]:

    
neg_resp_responses_adj = neg_resp_responses
neg_resp_responses_adj["Blank"] = neg_resp_responses_adj["Blank"] - 5310
neg_resp_responses_adj_df = pd.DataFrame(neg_resp_responses_adj["Yes"] / (neg_resp_responses_adj["Yes"] + neg_resp_responses_adj["Blank"]) * 100, columns=["percent_yes"])



In [182]:

    
neg_resp_responses_adj.columns = ["Yes", "No"]
neg_resp_responses_adj[:-1]









    Out[182]:







  
    
      
      Yes
      No
    
  
  
    
      NEGATIVE.RESPONSE.ASKED.USER.TO.STOP
      194
      525
    
    
      NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT
      112
      607
    
    
      NEGATIVE.RESPONSE.BLOCKED.USER
      170
      549
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS
      95
      624
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP
      20
      699
    
    
      NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL
      8
      711
    
    
      NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT
      9
      710
    
    
      NEGATIVE.RESPONSE.OTHER
      71
      648
    
    
      NEGATIVE.RESPONSE.IGNORED
      350
      369



In [183]:

    
neg_resp_responses_adj_df.sort_values(by="percent_yes").round(2)









    Out[183]:







  
    
      
      percent_yes
    
  
  
    
      NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL
      1.11%
    
    
      NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT
      1.25%
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP
      2.78%
    
    
      NEGATIVE.RESPONSE.OTHER
      9.87%
    
    
      NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS
      13.21%
    
    
      NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT
      15.58%
    
    
      NEGATIVE.RESPONSE.BLOCKED.USER
      23.64%
    
    
      NEGATIVE.RESPONSE.ASKED.USER.TO.STOP
      26.98%
    
    
      NEGATIVE.RESPONSE.IGNORED
      48.68%
    
    
      NEGATIVE.RESPONSE.ANY.RESPONSE
      100.00%



In [184]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_resp_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[18:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)


plt.title("Thinking of the last time you experienced\nharassment, how did you respond?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')



In [ ]:

How effective were the following responses?

RESPONSE.EFFECTIVENESS.*



In [185]:

    
neg_effect_responses = neg_df[neg_effect_vars].apply(pd.Series.value_counts).transpose()
neg_effect_responses = neg_effect_responses.replace(np.nan, 0).sort_values(by="Mostly effective")
neg_effect_responses = neg_effect_responses[["Not at all effective", "A little effective", "Somewhat effective", "Mostly effective", "Completely effective"]]



In [186]:

    
idx = []
for i in neg_effect_responses.index:
    idx.append(i[23:].replace(".", " "))
neg_effect_responses.index = idx
neg_effect_responses.astype(int)









    Out[186]:







  
    
      
      Not at all effective
      A little effective
      Somewhat effective
      Mostly effective
      Completely effective
    
  
  
    
      CONTACTED LAW ENFORCEMENT
      4
      0
      2
      0
      3
    
    
      CONSULTED LEGAL COUNSEL
      1
      1
      3
      2
      1
    
    
      REPORTED TO HOST OR ISP
      6
      4
      6
      3
      1
    
    
      OTHER
      4
      0
      4
      10
      11
    
    
      REPORTED TO MAINTAINERS
      10
      11
      31
      30
      13
    
    
      SOLICITED COMMUNITY SUPPORT
      6
      22
      38
      32
      14
    
    
      ASKED USER TO STOP
      48
      51
      50
      33
      11
    
    
      BLOCKED USER
      6
      20
      28
      56
      58



In [187]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues
neg_effect_responses.plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

plt.title("How effective were the following responses?\n(counting number of responses)")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')



In [188]:

    
neg_effect_responses_prop = neg_df[neg_effect_vars].apply(pd.Series.value_counts, normalize=True).round(4).transpose()
neg_effect_responses_prop = neg_effect_responses_prop.replace(np.nan, 0).sort_values(by="Completely effective")
neg_effect_responses_prop = neg_effect_responses_prop[["Not at all effective", "A little effective", "Somewhat effective", "Mostly effective", "Completely effective"]]
neg_effect_responses_prop = neg_effect_responses_prop * 100



In [189]:

    
idx = []
for i in neg_effect_responses_prop.index:
    idx.append(i[23:].replace(".", " "))
neg_effect_responses_prop.index = idx
neg_effect_responses_prop









    Out[189]:







  
    
      
      Not at all effective
      A little effective
      Somewhat effective
      Mostly effective
      Completely effective
    
  
  
    
      REPORTED TO HOST OR ISP
      30.00%
      20.00%
      30.00%
      15.00%
      5.00%
    
    
      ASKED USER TO STOP
      24.87%
      26.42%
      25.91%
      17.10%
      5.70%
    
    
      SOLICITED COMMUNITY SUPPORT
      5.36%
      19.64%
      33.93%
      28.57%
      12.50%
    
    
      CONSULTED LEGAL COUNSEL
      12.50%
      12.50%
      37.50%
      25.00%
      12.50%
    
    
      REPORTED TO MAINTAINERS
      10.53%
      11.58%
      32.63%
      31.58%
      13.68%
    
    
      CONTACTED LAW ENFORCEMENT
      44.44%
      0.00%
      22.22%
      0.00%
      33.33%
    
    
      BLOCKED USER
      3.57%
      11.90%
      16.67%
      33.33%
      34.52%
    
    
      OTHER
      13.79%
      0.00%
      13.79%
      34.48%
      37.93%



In [190]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues
neg_effect_responses_prop.plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

plt.title("How effective were the following responses?\n(proportion of responses)")

plt.xlabel("Proportion of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')

As a result of experiencing or witnessing harassment, which, if any, of the following have you done?

NEGATIVE.CONSEQUENCES.*



In [ ]:



In [191]:

    
neg_conseq_responses = neg_df[neg_conseq_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_conseq_responses.columns = ["Yes", "Blank"]
neg_conseq_responses









    Out[191]:







  
    
      
      Yes
      Blank
    
  
  
    
      NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING
      390
      5639
    
    
      NEGATIVE.CONSEQUENCES.PSEUDONYM
      50
      5979
    
    
      NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE
      166
      5863
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.USERNAME
      48
      5981
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE
      79
      5950
    
    
      NEGATIVE.CONSEQUENCES.SUGGEST.COC
      116
      5913
    
    
      NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION
      301
      5728
    
    
      NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION
      248
      5781
    
    
      NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES
      85
      5944
    
    
      NEGATIVE.CONSEQUENCES.OTHER
      90
      5939
    
    
      NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE
      1094
      4935
    
    
      NEGATIVE.CONSEQUENCES.ANY.RESPONSE
      1953
      4076

Only 1,953 respondents clicked any boxes in this question, meaning 4,076 did not click a response or the "none of the above" option (or even get to the question). We have to adjust the no responses accordingly.



In [192]:

    
neg_conseq_responses_adj = neg_conseq_responses
neg_conseq_responses_adj["Blank"] = neg_conseq_responses_adj["Blank"] - 4076
neg_conseq_responses_adj_df = pd.DataFrame(neg_conseq_responses_adj["Yes"] / (neg_conseq_responses_adj["Yes"] + neg_conseq_responses_adj["Blank"]) * 100, columns=["percent_yes"])



In [193]:

    
neg_conseq_responses_adj.columns = ['Yes', 'No']
neg_conseq_responses_adj[:-1]









    Out[193]:







  
    
      
      Yes
      No
    
  
  
    
      NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING
      390
      1563
    
    
      NEGATIVE.CONSEQUENCES.PSEUDONYM
      50
      1903
    
    
      NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE
      166
      1787
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.USERNAME
      48
      1905
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE
      79
      1874
    
    
      NEGATIVE.CONSEQUENCES.SUGGEST.COC
      116
      1837
    
    
      NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION
      301
      1652
    
    
      NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION
      248
      1705
    
    
      NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES
      85
      1868
    
    
      NEGATIVE.CONSEQUENCES.OTHER
      90
      1863
    
    
      NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE
      1094
      859



In [194]:

    
neg_conseq_responses_adj_df.sort_values(by="percent_yes").round(2)









    Out[194]:







  
    
      
      percent_yes
    
  
  
    
      NEGATIVE.CONSEQUENCES.CHANGE.USERNAME
      2.46%
    
    
      NEGATIVE.CONSEQUENCES.PSEUDONYM
      2.56%
    
    
      NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE
      4.05%
    
    
      NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES
      4.35%
    
    
      NEGATIVE.CONSEQUENCES.OTHER
      4.61%
    
    
      NEGATIVE.CONSEQUENCES.SUGGEST.COC
      5.94%
    
    
      NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE
      8.50%
    
    
      NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION
      12.70%
    
    
      NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION
      15.41%
    
    
      NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING
      19.97%
    
    
      NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE
      56.02%
    
    
      NEGATIVE.CONSEQUENCES.ANY.RESPONSE
      100.00%



In [195]:

    
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_conseq_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)

labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[22:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE"
        
    labels.append(title_text)
    
ax.set_yticklabels(labels)


plt.title("As a result of experiencing or witnessing\nharassment, which, if any, of the following have you done?")

plt.xlabel("Number of responses")



legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')



In [ ]:

	3	4	6
RESPONSE.ID	48	49	51
DATE.SUBMITTED	3/21/17 15:42	3/21/17 15:38	3/21/17 15:41
STATUS	Complete	Complete	Complete
PARTICIPATION.TYPE.FOLLOW	1	1	1
PARTICIPATION.TYPE.USE.APPLICATIONS	1	1	1
PARTICIPATION.TYPE.USE.DEPENDENCIES	1	1	1
PARTICIPATION.TYPE.CONTRIBUTE	1	1	0
PARTICIPATION.TYPE.OTHER	0	0	0
CONTRIBUTOR.TYPE.CONTRIBUTE.CODE	Frequently	Occasionally	NaN
CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS	Rarely	Rarely	NaN
CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE	Frequently	Rarely	NaN
CONTRIBUTOR.TYPE.FILE.BUGS	Frequently	Frequently	NaN
CONTRIBUTOR.TYPE.FEATURE.REQUESTS	Frequently	Frequently	NaN
CONTRIBUTOR.TYPE.COMMUNITY.ADMIN	Never	Occasionally	NaN
EMPLOYMENT.STATUS	Employed full time	Full time student	Employed full time
PROFESSIONAL.SOFTWARE	Frequently	NaN	Frequently
FUTURE.CONTRIBUTION.INTEREST	Very interested	Very interested	Very interested
FUTURE.CONTRIBUTION.LIKELIHOOD	Very likely	Very likely	Somewhat unlikely
OSS.USER.PRIORITIES.LICENSE	Very important to have	Very important to have	Very important to have
OSS.USER.PRIORITIES.CODE.OF.CONDUCT	Somewhat important not to have	Somewhat important to have	Not important either way
OSS.USER.PRIORITIES.CONTRIBUTING.GUIDE	Somewhat important to have	Very important to have	Somewhat important to have
OSS.USER.PRIORITIES.CLA	Not important either way	Very important to have	Don't know what this is
OSS.USER.PRIORITIES.ACTIVE.DEVELOPMENT	Somewhat important to have	Very important to have	Very important to have
OSS.USER.PRIORITIES.RESPONSIVE.MAINTAINERS	Somewhat important to have	Very important to have	Very important to have
OSS.USER.PRIORITIES.WELCOMING.COMMUNITY	Very important to have	Very important to have	Somewhat important to have
OSS.USER.PRIORITIES.WIDESPREAD.USE	Somewhat important to have	Not important either way	Somewhat important to have
OSS.CONTRIBUTOR.PRIORITIES.LICENSE	Not important either way	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.CODE.OF.CONDUCT	Somewhat important not to have	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.CONTRIBUTING.GUIDE	Not important either way	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.CLA	Not important either way	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.ACTIVE.DEVELOPMENT	Somewhat important to have	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.RESPONSIVE.MAINTAINERS	Somewhat important to have	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.WELCOMING.COMMUNITY	Somewhat important to have	NaN	NaN
OSS.CONTRIBUTOR.PRIORITIES.WIDESPREAD.USE	Somewhat important to have	NaN	NaN
SEEK.OPEN.SOURCE	Sometimes	Always	Always
OSS.UX	Generally easier to use	About the same	Generally easier to use
OSS.SECURITY	Generally more secure	Generally more secure	About the same
OSS.STABILITY	About the same	Generally less stable	About the same
INTERNAL.EFFICACY	Strongly agree	Strongly agree	Strongly agree
EXTERNAL.EFFICACY	Strongly agree	Strongly agree	Neither agree nor disagree
OSS.IDENTIFICATION	Neither agree nor disagree	Strongly agree	Neither agree nor disagree
USER.VALUES.STABILITY	Moderately important	Extremely important	Extremely important
USER.VALUES.INNOVATION	Not at all important	Very important	Moderately important
USER.VALUES.REPLICABILITY	Very important	Very important	Moderately important
USER.VALUES.COMPATIBILITY	Very important	Very important	Extremely important
USER.VALUES.SECURITY	Very important	Very important	Extremely important
USER.VALUES.COST	Very important	Not at all important	Very important
USER.VALUES.TRANSPARENCY	Very important	Extremely important	Extremely important
USER.VALUES.USER.EXPERIENCE	Extremely important	Moderately important	Very important
USER.VALUES.CUSTOMIZABILITY	Extremely important	Very important	Extremely important
USER.VALUES.SUPPORT	Slightly important	Moderately important	Not at all important
USER.VALUES.TRUSTED.PRODUCER	Very important	Slightly important	Moderately important
TRANSPARENCY.PRIVACY.BELIEFS	People should be able to contribute code witho...	People should be able to contribute code witho...	People should be able to contribute code witho...
INFO.AVAILABILITY	A lot of information about me	A lot of information about me	A little information about me
INFO.JOB	Yes	No	No
TRANSPARENCY.PRIVACY.PRACTICES.GENERAL	I include my real name.	I include my real name.	I don't publish this kind of content online.
TRANSPARENCY.PRIVACY.PRACTICES.OSS	I include my real name.	I include my real name.	NaN
RECEIVED.HELP	Yes	Yes	Yes
FIND.HELPER	Other - Please describe	I asked for help in a public forum (e.g. in a ...	I asked a specific person for help.
HELPER.PRIOR.RELATIONSHIP	We knew each other well.	Total strangers, I didn't know of them previou...	We knew each other well.
RECEIVED.HELP.TYPE	Writing code or otherwise implementing ideas.	Installing or using an application.	Installing or using an application.
PROVIDED.HELP	Yes	Yes	Yes
FIND.HELPEE	I reached out to them to offer unsolicited help.	They asked for help in a public forum (e.g. in...	They asked me directly for help.
HELPEE.PRIOR.RELATIONSHIP	Total strangers, I didn't know of them previou...	Total strangers, I didn't know of them previou...	We knew each other well.
PROVIDED.HELP.TYPE	Writing code or otherwise implementing ideas.	Installing or using an application.	Installing or using an application.
DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE	Yes	Yes	Yes
DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION	Yes	No	No
DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE	Yes	Yes	Yes
DISCOURAGING.BEHAVIOR.BAD.DOCS	Yes	Yes	Yes
DISCOURAGING.BEHAVIOR.CONFLICT	Yes	Yes	No
DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE	No	No	No
OSS.AS.JOB	Yes, directly- some or all of my work duties ...	NaN	NaN
OSS.AT.WORK	Frequently	NaN	Frequently
OSS.IP.POLICY	I am free to contribute without asking for per...	NaN	I'm not sure.
EMPLOYER.POLICY.APPLICATIONS	Use of open source applications is acceptable ...	NaN	Use of open source applications is encouraged.
EMPLOYER.POLICY.DEPENDENCIES	Use of open source dependencies is acceptable ...	NaN	Use of open source dependencies is encouraged.
OSS.HIRING	Very important	NaN	NaN
IMMIGRATION	No, I live in the country where I was born.	No, I live in the country where I was born.	Yes, and I intend to stay permanently.
MINORITY.HOMECOUNTRY	NaN	NaN	No
MINORITY.CURRENT.COUNTRY	No	No	No
GENDER	Man	Man	Man
TRANSGENDER.IDENTITY	No	No	No
SEXUAL.ORIENTATION	No	Yes	No
WRITTEN.ENGLISH	Very well	Very well	Very well
AGE	35 to 44 years	17 or younger	35 to 44 years
FORMAL.EDUCATION	Bachelor's degree	Secondary (high) school graduate or equivalent	Vocational/trade program or apprenticeship
PARENTS.FORMAL.EDUCATION	Bachelor's degree	Master's degree	Bachelor's degree
AGE.AT.FIRST.COMPUTER.INTERNET	13 - 17 years old	Younger than 13 years old	13 - 17 years old
LOCATION.OF.FIRST.COMPUTER.INTERNET	At home (belonging to me or a family member)	At home (belonging to me or a family member)	At home (belonging to me or a family member)
PARTICIPATION.TYPE.ANY.REPONSE	1	1	1
POPULATION	github	github	github
OFF.SITE.ID	NaN	NaN	NaN
TRANSLATED	0	0	0

	0	1	2
NEGATIVE.WITNESS.RUDENESS	1	1	0
NEGATIVE.WITNESS.NAME.CALLING	1	0	0
NEGATIVE.WITNESS.THREATS	0	0	0
NEGATIVE.WITNESS.IMPERSONATION	0	0	1
NEGATIVE.WITNESS.SUSTAINED.HARASSMENT	0	0	0
NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT	0	0	0
NEGATIVE.WITNESS.STALKING	0	0	0
NEGATIVE.WITNESS.SEXUAL.ADVANCES	0	0	0
NEGATIVE.WITNESS.STEREOTYPING	0	0	0
NEGATIVE.WITNESS.DOXXING	0	0	1
NEGATIVE.WITNESS.OTHER	0	0	0
NEGATIVE.WITNESS.NONE.OF.THE.ABOVE	0	0	0
NEGATIVE.EXPERIENCE.RUDENESS	0	1	0
NEGATIVE.EXPERIENCE.NAME.CALLING	0	0	0
NEGATIVE.EXPERIENCE.THREATS	0	0	0
NEGATIVE.EXPERIENCE.IMPERSONATION	0	0	0
NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT	0	0	0
NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT	0	0	0
NEGATIVE.EXPERIENCE.STALKING	0	0	0
NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES	0	0	0
NEGATIVE.EXPERIENCE.STEREOTYPING	0	0	0
NEGATIVE.EXPERIENCE.DOXXING	0	0	0
NEGATIVE.EXPERIENCE.OTHER	0	0	0
NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE	1	0	1
NEGATIVE.RESPONSE.ASKED.USER.TO.STOP	0	0	0
NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT	0	0	0
NEGATIVE.RESPONSE.BLOCKED.USER	0	0	0
NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS	0	0	0
NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP	0	0	0
NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL	0	0	0
NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT	0	0	0
NEGATIVE.RESPONSE.OTHER	0	0	0
NEGATIVE.RESPONSE.IGNORED	0	1	0
RESPONSE.EFFECTIVENESS.ASKED.USER.TO.STOP	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.SOLICITED.COMMUNITY.SUPPORT	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.BLOCKED.USER	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.REPORTED.TO.MAINTAINERS	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.REPORTED.TO.HOST.OR.ISP	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.CONSULTED.LEGAL.COUNSEL	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.CONTACTED.LAW.ENFORCEMENT	NaN	NaN	NaN
RESPONSE.EFFECTIVENESS.OTHER	NaN	NaN	NaN
NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING	0	0	1
NEGATIVE.CONSEQUENCES.PSEUDONYM	0	0	0
NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE	0	0	0
NEGATIVE.CONSEQUENCES.CHANGE.USERNAME	0	0	0
NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE	0	0	0
NEGATIVE.CONSEQUENCES.SUGGEST.COC	0	0	0
NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION	0	0	0
NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION	0	1	0
NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES	0	0	0
NEGATIVE.CONSEQUENCES.OTHER	0	0	0
NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE	1	0	0
NEGATIVE.WITNESS.ANY.RESPONSE	1	1	1
NEGATIVE.EXPERIENCE.ANY.RESPONSE	1	1	1
NEGATIVE.RESPONSE.ANY.RESPONSE	0	1	0
NEGATIVE.CONSEQUENCES.ANY.RESPONSE	1	1	1
POPULATION	github	github	github

	No	Yes
PARTICIPATION.TYPE.FOLLOW	1287	4742
PARTICIPATION.TYPE.USE.APPLICATIONS	454	5575
PARTICIPATION.TYPE.USE.DEPENDENCIES	946	5083
PARTICIPATION.TYPE.CONTRIBUTE	1722	4307
PARTICIPATION.TYPE.OTHER	5742	287

	percent
PARTICIPATION.TYPE.OTHER	4.76%
PARTICIPATION.TYPE.CONTRIBUTE	71.44%
PARTICIPATION.TYPE.FOLLOW	78.65%
PARTICIPATION.TYPE.USE.DEPENDENCIES	84.31%
PARTICIPATION.TYPE.USE.APPLICATIONS	92.47%

	Frequently	Occasionally	Rarely	Never
CONTRIBUTOR.TYPE.COMMUNITY.ADMIN	287	417	867	2412
CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS	460	1214	1665	661
CONTRIBUTOR.TYPE.FEATURE.REQUESTS	573	1625	1346	451
CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE	996	944	974	1090
CONTRIBUTOR.TYPE.FILE.BUGS	1067	2073	768	106
CONTRIBUTOR.TYPE.CONTRIBUTE.CODE	1160	1383	1301	189

	count
Employed full time	3615
Full time student	1048
Employed part time	349
Temporarily not working	314
Other - please describe	184
Retired or permanently not working (e.g. due to disability)	90

	percent
Employed full time	64.55%
Full time student	18.71%
Employed part time	6.23%
Temporarily not working	5.61%
Other - please describe	3.29%
Retired or permanently not working (e.g. due to disability)	1.61%

	count
Very interested	3929
Somewhat interested	1430
Not too interested	125
Not at all interested	24

	percent
Very interested	71.33%
Somewhat interested	25.96%
Not too interested	2.27%
Not at all interested	0.44%

	count
Very likely	3271
Somewhat likely	1719
Somewhat unlikely	440
Very unlikely	81

	percent
Very likely	59.35%
Somewhat likely	31.19%
Somewhat unlikely	7.98%
Very unlikely	1.47%

	Very important to have	Somewhat important to have	Not important either way	Somewhat important not to have	Very important not to have	Don't know what this is
CLA	490	1024	2282	336	157	488
CODE.OF.CONDUCT	848	1461	1993	166	120	209
WIDESPREAD.USE	984	2067	1576	114	47	28
CONTRIBUTING.GUIDE	1212	1866	1516	95	62	62
WELCOMING.COMMUNITY	2062	1822	812	67	33	18
RESPONSIVE.MAINTAINERS	2575	1850	302	31	35	20
ACTIVE.DEVELOPMENT	2768	1722	267	30	31	16
LICENSE	3125	1160	435	31	33	47

	Very important to have	Somewhat important to have	Not important either way	Somewhat important not to have	Very important not to have	Don't know what this is
CLA	10.26%	21.44%	47.77%	7.03%	3.29%	10.22%
CODE.OF.CONDUCT	17.68%	30.46%	41.55%	3.46%	2.50%	4.36%
WIDESPREAD.USE	20.43%	42.92%	32.72%	2.37%	0.98%	0.58%
CONTRIBUTING.GUIDE	25.18%	38.77%	31.50%	1.97%	1.29%	1.29%
WELCOMING.COMMUNITY	42.83%	37.85%	16.87%	1.39%	0.69%	0.37%
RESPONSIVE.MAINTAINERS	53.50%	38.44%	6.27%	0.64%	0.73%	0.42%
ACTIVE.DEVELOPMENT	57.26%	35.62%	5.52%	0.62%	0.64%	0.33%
LICENSE	64.69%	24.01%	9.00%	0.64%	0.68%	0.97%

	Very important to have	Somewhat important to have	Not important either way	Somewhat important not to have	Very important not to have	Don't know what this is
WIDESPREAD.USE	387	1016	1666	70	30	12
CLA	419	712	1266	327	166	280
CODE.OF.CONDUCT	655	1145	1085	119	84	96
CONTRIBUTING.GUIDE	1198	1396	500	41	18	24
ACTIVE.DEVELOPMENT	1368	1333	448	21	18	5
WELCOMING.COMMUNITY	1533	1199	411	21	15	7
RESPONSIVE.MAINTAINERS	1994	1022	138	7	16	7
LICENSE	2199	610	337	16	15	18

	Very important to have	Somewhat important to have	Not important either way	Somewhat important not to have	Very important not to have	Don't know what this is
WIDESPREAD.USE	12.17%	31.94%	52.37%	2.20%	0.94%	0.38%
CLA	13.22%	22.46%	39.94%	10.32%	5.24%	8.83%
CODE.OF.CONDUCT	20.57%	35.96%	34.08%	3.74%	2.64%	3.02%
CONTRIBUTING.GUIDE	37.71%	43.94%	15.74%	1.29%	0.57%	0.76%
ACTIVE.DEVELOPMENT	42.84%	41.75%	14.03%	0.66%	0.56%	0.16%
WELCOMING.COMMUNITY	48.12%	37.63%	12.90%	0.66%	0.47%	0.22%
RESPONSIVE.MAINTAINERS	62.63%	32.10%	4.33%	0.22%	0.50%	0.22%
LICENSE	68.83%	19.09%	10.55%	0.50%	0.47%	0.56%

	count
About the same	2027
Generally easier to use	1597
Generally harder to use	897

	percent
About the same	44.84%
Generally easier to use	35.32%
Generally harder to use	19.84%

	count
Generally more secure	2688
About the same	1537
Generally less secure	295

	percent
Generally more secure	59.47%
About the same	34.00%
Generally less secure	6.53%

	count
About the same	2240
Generally more stable	1399
Generally less stable	877

Summary Analysis of the 2017 GitHub Open Source Survey

Overview

Purpose and goal

Download and unzip data

Data processing

Main dataset

Load main dataset into pandas

Explore the main dataset with some sample responses

Create lists of variables for bulk analysis

Negative incidents

Load into pandas

Explore the negative dataset with some sample responses

Create lists of variables for bulk analysis

Analysis

Contributor identity

People participate in open source in different ways. Which of the following activities do you engage in?

Contributon type: How often do you engage in each of the following activities?

Employment status

In your main job, how often do you write or otherwise directly contribute to producing software?

How interested are you in contributing to open source projects in the future?

How likely are you to contribute to open source projects in the future?

Priorities and values

When thinking about whether to use open source software, how important are the following things?

When thinking about whether to contribute to an open source project, how important are the following things?

How often do you try to find open source options over other kinds of software?

Open source software usability

Open source software security

Open source software stability

Identification with open source

Transparency vs privacy

Attribution

In general, how much information about you is publicly available online?

Do you feel that you need to make information available about yourself online for professional reasons?

General privacy practices

OSS privacy practices

Mentorship / Help

Have you ever received any kind of help from other people related to using or contributing to an open source project?

Thinking of the most recent case where someone helped you, how did you find someone to help you?

Which best describes your prior relationship with the person who helped you?

What kind of problem did they help you with?

Have you ever provided help for another person on an open source project?

Thinking of the most recent case where you helped someone, how did you come to help this person?

Which best describes your prior relationship with the person you helped?

What kind of problem did you help them with?

Open Source Software in Paid Work

Do you contribute to open source as part of your professional work?

How often do you use open source software in your professional work?

How does your employer's intellectual property agreement/policy affect your free-time contributions to open source unrelated to your work?

Which is closest to your employer’s policy on using open source software applications?

How important do you think your involvement in open source was to getting your current job?

Demographics

Do you currently live in a country other than the one in which you were born?

Thinking of where you were born, are you a member of an ethnicity or nationality that is a considered a minority in that country?

Thinking of where you currently live, are you a member of an ethnicity or nationality that is a considered a minority in that country?

What is your gender?

Do you identify as transgender?

Do you identify as gay, lesbian, or bisexual, asexual, or any other minority sexual orientation?

How well can you read and write in English?

What is your age?

What is highest level of formal education that you have completed?

What is the highest level of formal education that either of your parents completed?

How old were you when you first had regular access to a computer with an internet connection?

Where did you first have regular access to a computer with internet connection?

Where was the respondent surveyed from?

Harassment / Inclusiveness of OSS

Have you ever observed any of the following in the context of an open source project?

Have you ever witnessed any of the following behaviors directed at another person in the context of an open source project? (not including something directed at you)

Have you ever experienced any of the following behaviors directed at you in the context of an open source project?

Thinking of the last time you experienced harassment, how did you respond?

How effective were the following responses?

As a result of experiencing or witnessing harassment, which, if any, of the following have you done?

	percent
About the same	49.60%
Generally more stable	30.98%
Generally less stable	19.42%

	Strongly agree	Somewhat agree	Neither agree nor disagree	Somewhat disagree	Strongly disagree
EXTERNAL.EFFICACY	1518	1610	1116	150	58
OSS.IDENTIFICATION	1579	1513	863	351	150
INTERNAL.EFFICACY	2052	1685	418	240	62

	Neither agree nor disagree	Somewhat agree	Somewhat disagree	Strongly agree	Strongly disagree
INTERNAL.EFFICACY	9.38%	37.81%	5.38%	46.04%	1.39%
EXTERNAL.EFFICACY	25.07%	36.16%	3.37%	34.10%	1.30%
OSS.IDENTIFICATION	19.37%	33.95%	7.88%	35.44%	3.37%

	count
People should be able to contribute code without attribution, if they wish to remain anonymous.	2454
Records of authorship should be required so that end users know who created the source code they are working with.	1594

	percent
People should be able to contribute code without attribution, if they wish to remain anonymous.	60.62%
Records of authorship should be required so that end users know who created the source code they are working with.	39.38%

	count
Some information about me	1776
A little information about me	1133
A lot of information about me	1011
No information at all about me	140

	percent
Some information about me	43.74%
A little information about me	27.91%
A lot of information about me	24.90%
No information at all about me	3.45%

	count
I include my real name.	1718
I usually use a consistent pseudonym that is easily linked to my real name online.	1141
I don't publish this kind of content online.	517
I usually use a consistent pseudonym that is not linked anywhere with my real name online.	363
I take precautions to use different pseudonymns on different platforms.	270

	percent
I include my real name.	42.85%
I usually use a consistent pseudonym that is easily linked to my real name online.	28.46%
I don't publish this kind of content online.	12.90%
I usually use a consistent pseudonym that is not linked anywhere with my real name online.	9.05%
I take precautions to use different pseudonymns on different platforms.	6.73%

	count
I include my real name.	1845
I usually contribute using a consistent pseudonym that is easily linked to my real name online.	766
I usually contribute using a consistent pseudonym that is not linked anywhere with my real name online.	273
I take precautions to use different usernames in different projects.	42