Introduction

These example notebooks introduce probabilistic programming. We first look at applications: how generative models can be applied to populations in BayesDB. We then progress to creating new generative models in Venture that better fit our intuitions about those populations. BayesDB and Venture are developed by the MIT Probabilistic Computing Group.

We invite you to watch a presentation (slides) on the subject of this tutorial by Vikash Mansinghka.

Before we get started...

About you...

Signing up with your name and email helps build a community of support and helps improve your user experience. When you sign up, we collect information including the commands you tried, how long they took, what errors they resulted in, any additional data that you import, etc. If you provide your email, we will invite you to a low-traffic announcements list. Please include the name and email you use below in any reports of bugs or surprises. Send those reports to bayesdb@mit.edu or via GitHub.

If security is a primary concern, then you should do a security audit (and share the results with us) before using the software. As this is alpha software, results may not be reliable. DO NOT USE THIS SOFTWARE FOR HIPAA-COVERED, PERSONALLY IDENTIFIABLE, OR SIMILARLY SENSITIVE DATA!

Please fill in your name and email, then use shift-return (or the play button above) to run the cell.


In [4]:
name = ""
email = ""


with open('bayesdb-session-capture-opt.txt', 'w') as optfile:
    optfile.write('%s <%s>\n' % (name, email))

# To opt out, use optfile.write('False') instead.
# Even if you opt out of sending details, we still count how often users opt out.
# You can opt in or opt out on a per-population basis using the session_capture_name option to Population.
# You must choose one or the other: opt in or opt out.

Background

For those unfamiliar with the software, languages, or concepts we will use in this tutorial, we recommend:

You do not need extensive knowledge of any of these to read our examples, so feel free to skip ahead. But if one of these technologies is unfamiliar, some initial study will help you experiment confidently and complete the suggested exercises.

BayesDB

BayesDB allows you to query your data as other SQL database systems do. It also allows you to query the implications of your data. We explore these capabilities using information about satellites orbiting our planet.

  • Querying and Plotting the Satellites Data without doing any probabilistic analysis. This is a good place to start to get used to the language, before learning to explore the implications of the data.
  • Satellites Exploration — a bit of the above, plus a short exploration of the results of probabilistic analysis.
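
To make the "query your data as other SQL database systems do" half of that description concrete, here is a minimal sketch of an ordinary SQL query. It uses Python's standard sqlite3 module rather than BayesDB itself, and the table name, column names, and rows are hypothetical stand-ins, not the real satellites data.

```python
import sqlite3

# Hypothetical rows for illustration only; these are NOT taken from the
# satellites data used in the notebooks.
rows = [
    ("Sat-A", "LEO", 95.0),
    ("Sat-B", "GEO", 1436.0),
    ("Sat-C", "LEO", 101.5),
]

# An in-memory SQLite database stands in for a real database file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE satellites (name TEXT, class_of_orbit TEXT, period_minutes REAL)"
)
conn.executemany("INSERT INTO satellites VALUES (?, ?, ?)", rows)

# A plain SQL query: count satellites and average their period per orbit class.
query = """
    SELECT class_of_orbit, COUNT(*), AVG(period_minutes)
    FROM satellites
    GROUP BY class_of_orbit
    ORDER BY class_of_orbit
"""
summary = conn.execute(query).fetchall()
for orbit, count, mean_period in summary:
    print(orbit, count, mean_period)

conn.close()
```

Queries like this only report what is already in the table. Querying the implications of the data is the part that needs a probabilistic model, which is what the notebooks above go on to explore.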

TODO: The same in smaller chunks, with those chunks expanded, promised here.

Working with your own data

Because a default BayesDB model is unlikely to model your data plausibly, and because we do not yet have the tools to be confident that any model has captured the relationships in a population well, BayesDB is not ready for use for higher levels of analysis.

As you work with your data, do not attempt to use BayesDB for:

  • inferential analysis: drawing conclusions about a larger population from which the data you analyze are a sample,
  • predictive analysis: using the population you have to make predictions outside of that population,
  • causal analysis: understanding how interventions in one variable will affect other variables, or
  • mechanistic analysis: understanding causal and structural relationships between variables.

For somewhat temporary technical reasons, BayesDB is not ready to handle very large populations, except by sub-sampling them (violating the caveat against inferential analysis!).
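
As a rough illustration of what sub-sampling means here, the sketch below draws a uniform random subset of rows without replacement using Python's standard random module. The helper name `subsample`, the fixed seed, and the stand-in population are assumptions made for this example, not part of BayesDB.

```python
import random


def subsample(rows, k, seed=0):
    """Return k rows drawn uniformly at random, without replacement.

    A fixed seed makes the draw reproducible. If k is at least the
    population size, the whole population is returned unchanged.
    """
    rng = random.Random(seed)
    if k >= len(rows):
        return list(rows)
    return rng.sample(rows, k)


# Stand-in for a large population: row identifiers 0 .. 99999.
population = list(range(100000))
sample = subsample(population, 1000)
print(len(sample))
```

Any conclusion drawn from `sample` and then applied to the full population is an inferential step, which is exactly the kind of analysis the list above warns against.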

While the group's focus is on better model types and inference strategies, we still intend to grow past some of these limitations. If these interest you, please work with us towards those goals.

With those caveats, we explore a "new" dataset using BayesDB:

TODO: the same in smaller chunks, with those chunks expanded, is promised here.

To work with your own data, please contact the group to have a conversation about the population you want to explore, about appropriate types of analysis, and to learn how to unlock analysis. We lock this feature because users have frequently misunderstood the limitations of our software, drawing unwarranted inferences. The concepts are easy to misuse, the software is in an early alpha version, and working with our team will help keep egg off your face, or worse.

Venture

Venture is a prototype general-purpose probabilistic computing platform. In Venture, one can create novel probabilistic models and inference strategies that allow efficient learning for those models. Venture is programmed primarily in VentureScript, but also supports applications written in other probabilistic or traditional programming languages. In this tutorial we will explore a mix of the VentureScript language and the Python API to Venture.
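
For readers new to the idea of pairing a model with an inference strategy, here is a minimal sketch in plain Python. It is not VentureScript and does not use the Venture API. The model is a coin whose unknown weight has a uniform prior, and the inference strategy is likelihood-weighted importance sampling. The function name, the observed counts, and the sample size are all assumptions chosen for illustration.

```python
import random


def posterior_mean_weight(heads, tails, num_samples=20000, seed=0):
    """Estimate the posterior mean of a coin's weight.

    Model: weight ~ Uniform(0, 1); each flip lands heads with
    probability `weight`.
    Inference: draw weights from the prior and weight each draw by the
    likelihood of the observed flips (importance sampling).
    """
    rng = random.Random(seed)
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(num_samples):
        w = rng.random()  # one draw from the prior
        likelihood = (w ** heads) * ((1.0 - w) ** tails)
        total_weight += likelihood
        weighted_sum += likelihood * w
    return weighted_sum / total_weight


# Hypothetical observation: 8 heads and 2 tails.
estimate = posterior_mean_weight(heads=8, tails=2)
print(estimate)
```

For this particular model the exact posterior is Beta(9, 3), whose mean is 0.75, so the estimate should land close to that value. Models that lack such a closed form are where a platform for writing custom inference strategies becomes useful.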

TODO: Tutorial examples promised here.

Notes

As I work in these notebooks, where is my work saved? Execute the following cell to find out:


In [5]:
import os
os.getcwd()


Out[5]:
'/Users/probcomp/GoogleDrive/ProbComp/bdbcontrib/examples'


Copyright (c) 2010-2016, MIT Probabilistic Computing Project

Licensed under Apache 2.0 (edit cell for details).