In [62]:
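import html2text
import urllib2  # Python 2 only
from bs4 import BeautifulSoup

def getPage(url):
    """Fetch a page and return its main content cell converted to Markdown."""
    res = urllib2.urlopen(url)
    html = res.read()
    res.close()
    soup = BeautifulSoup(html)
    # The content is taken from a top-aligned, 600-wide cell of the first table.
    tt = soup.find('table')
    content = tt.find('td', {"valign" : "top", "width" : "600"})
    return html2text.html2text(content.decode_contents())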

In [63]:
index = getPage('http://www.greenteapress.com/thinkstats/html/index.html')

In [64]:
pages = 'http://www.greenteapress.com/thinkstats/html/thinkstats%s.html'

In [65]:
chpts = ['%03d' % (x+1) for x in range(11)]
#allchpts = [getPage(pages % x) for x in chpts]
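
The `'%03d'` format zero-pads each chapter number to three digits, which is the form the `thinkstats%s.html` template expects. A minimal sketch of the same construction, building the full URL list up front without fetching anything, might look like this. The names `chapter_ids` and `chapter_urls` are illustrative and do not appear in the notebook.

```python
# Only `pages` comes from the notebook; the other names are illustrative.
pages = 'http://www.greenteapress.com/thinkstats/html/thinkstats%s.html'

# range(1, 12) yields the chapter numbers 1 through 11.
chapter_ids = ['%03d' % n for n in range(1, 12)]
chapter_urls = [pages % cid for cid in chapter_ids]

print(chapter_urls[0])
print(chapter_urls[-1])
```

Building the list first makes it easy to inspect the URLs before any request is sent.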

In [66]:
allchpts = []
for x in chpts:
    print pages % x
    allchpts.append(getPage(pages % x))


http://www.greenteapress.com/thinkstats/html/thinkstats001.html
http://www.greenteapress.com/thinkstats/html/thinkstats002.html
http://www.greenteapress.com/thinkstats/html/thinkstats003.html
http://www.greenteapress.com/thinkstats/html/thinkstats004.html
http://www.greenteapress.com/thinkstats/html/thinkstats005.html
http://www.greenteapress.com/thinkstats/html/thinkstats006.html
http://www.greenteapress.com/thinkstats/html/thinkstats007.html
http://www.greenteapress.com/thinkstats/html/thinkstats008.html
http://www.greenteapress.com/thinkstats/html/thinkstats009.html
http://www.greenteapress.com/thinkstats/html/thinkstats010.html
http://www.greenteapress.com/thinkstats/html/thinkstats011.html
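
An exception from any single request ends the loop and leaves `allchpts` only partially filled. One possible variation, sketched here, records failures and keeps going, with a short pause between requests. The helper `fetch_all`, its `delay` parameter, and the stand-in `fake_fetch` are hypothetical; in the notebook, `getPage` would be passed as `fetch`.

```python
import time

def fetch_all(urls, fetch, delay=1.0):
    """Call fetch(url) for each URL and return (results, failures).

    results maps url -> fetched text; failures maps url -> error message.
    """
    results, failures = {}, {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # brief pause between consecutive requests
        try:
            results[url] = fetch(url)
        except Exception as exc:  # broad on purpose: continue past any error
            failures[url] = str(exc)
    return results, failures

# Offline stand-in for getPage, so the sketch runs without network access.
def fake_fetch(url):
    if url.endswith('002.html'):
        raise IOError('simulated failure')
    return 'text of ' + url

demo_urls = ['thinkstats001.html', 'thinkstats002.html', 'thinkstats003.html']
ok, bad = fetch_all(demo_urls, fake_fetch, delay=0)
print(sorted(ok))
print(bad)
```

Passing the fetch function as an argument keeps the helper testable offline and leaves the choice of retry or logging policy to the caller.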

In [67]:
# Write the index and each chapter to scratch/ as UTF-8 Markdown files.
def writeFile(item, name):
    with open("scratch/"+name, 'w') as fwrite:
        fwrite.write(item.encode('utf-8'))

writeFile(index, 'index.md')
for i,x in enumerate(allchpts):
    writeFile(x, "%03d.md" % (i+1))
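
`writeFile` assumes that the `scratch/` directory already exists; `open` raises an error otherwise. A sketch of a variant that creates the directory first and lets `io.open` handle the UTF-8 encoding is shown here. The name `write_text`, its `directory` parameter, and the temporary directory used in the demonstration are illustrative assumptions, not part of the notebook.

```python
import io
import os
import tempfile

def write_text(text, name, directory="scratch"):
    """Write unicode text to directory/name as UTF-8, creating directory if needed."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, name)
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return path

# Demonstration in a throwaway directory rather than the real scratch/.
demo_dir = os.path.join(tempfile.mkdtemp(), "scratch")
demo_path = write_text(u"If you don\u2019t recognize this phrase",
                       "%03d.md" % 1, demo_dir)
print(os.path.basename(demo_path))
```

Because `io.open` takes the encoding directly, the caller passes the unicode text as is and no explicit `.encode('utf-8')` call is needed.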


g that is about half the size of a Hyracotherium (see `http://wikipedia.org/wiki/Hyracotherium`). 


ta. 
3

    If you don’t recognize this phrase, see `http://wikipedia.org/wiki/Twenty_Questions`. 


lly significant:**
     A result, like a difference between groups, that is relevant in practice. 


ue that is not in the sample. As far as I’m concerned, the median is the 50th percentile. Period. 


: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2008. 


 Probabilities In Employee Drug-Testing,” at `http://piercelaw.edu/risk/vol2/winter/gleason.htm`. 


her, the mean of a distribution is its center of mass, and the variance is its moment of inertia. 


ect the null hypothesis if it is false. 

* * *

1

    Also known as a “Significance criterion.” 


ata. 

* * *

1

    See `http://wikipedia.org/wiki/Exponential_distribution#Maximum_likelihood`. 


ess et al., _Numerical Recipes in C_, Chapter 15 at `http://www.nrbook.com/a/bookcpdf/c15-1.pdf`. 


9.8](thinkstats010.html#@default1076)   
  

  * Zipf’s law, [4.2](thinkstats005.html#@default295)