In [4]:
import os, sys
Get the current directory
In [10]:
currentdir = os.getcwd()
Then append the directory containing the Physique package/library (it's just a folder) to sys.path with sys.path.append. The absolute path where I placed it happened to be "/Users/ernestyeung/Documents/TeslaModelSP85D"; substitute whatever absolute path you find on your system (look in Finder or your file manager).
In [11]:
currentdir # I'm on a different computer now
Out[11]:
In [5]:
sys.path.append('/home/topolo/PropD/Propulsion/')
In [6]:
import Physique
Programming note: __init__.py in the main directory uses os.path.dirname(__file__), where __file__ (literally that; it's not a placeholder name) is the string holding the pathname of the "file from which the module was loaded, if it was loaded from a file" (cf. stackoverflow "Python __file__ attribute absolute or relative?"), i.e. "When a module is loaded in Python, __file__ is set to its name. You can then use that with other functions to find the directory that the file is located in." (cf. stackoverflow "what does the __file__ wildcard mean/do?")
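As a hedged illustration only (this is not the actual contents of Physique's __init__.py, and _pkg_dir and _rawdata_dir are hypothetical names), an __init__.py that needs to know its own directory might do something like this:
import os

# __file__ is the pathname this module was loaded from; dirname strips
# the filename, leaving the directory the package lives in
_pkg_dir = os.path.dirname(os.path.abspath(__file__))

# data files shipped alongside the package can then be located relative to it
_rawdata_dir = os.path.join(_pkg_dir, 'rawdata')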
In [21]:
from Physique import FundConst
print Physique.FundConst.columns
Physique.FundConst
Out[21]:
Find a Fundamental Constant you are interested in using the usual pandas methods
In [41]:
g_0pd = FundConst[ FundConst["Quantity"].str.contains("gravity") ]
# standard acceleration of gravity as a pandas DataFrame
g_0pd
Out[41]:
In [32]:
# access the values you're interested in
print g_0pd.Quantity
print g_0pd.Value.get_values()[0]
print g_0pd.Unit.get_values()[0]
In [43]:
# you can also grab just the 1 entry from this DataFrame using the .loc indexer
FundConst[FundConst["Quantity"].str.contains("Boltzmann")].loc[49,:]
Out[43]:
In [44]:
g_0pd.loc[303,:]
Out[44]:
This is the pandas DataFrame containing all the NIST Official Conversions to SI.
In [11]:
convDF = Physique.conv
In [12]:
convDF.columns
Out[12]:
From the list of columns, search for the quantity you desire by trying out different search terms: e.g. I'm reading Huzel and Huang's Modern Engineering for Design of Liquid-Propellant Rocket Engines and I want to know how to convert from pound-force (lbf) to newtons (N).
We can try to look up the U.S. or Imperial units from the Toconvertfrom column.
In [13]:
convDF[convDF['Toconvertfrom'].str.contains("pound-force ")]
Out[13]:
Or we can look up the SI unit we want to convert to.
In [14]:
convDF[convDF['to'].str.contains("newton ")]
Out[14]:
Look at what you want and see the index; it happens to be 340 in this example.
In [16]:
lbf2N = convDF.loc[340,:]; lbf2N
Out[16]:
Then the attributes can be accessed by the column names.
In [19]:
print lbf2N.Toconvertfrom, lbf2N.to, lbf2N.Multiplyby
So, for example, the reusable SSME delivers a vacuum thrust of 470000 lbf, or
In [21]:
print 470000*lbf2N.Multiplyby, lbf2N.to
To obtain the conversion for pressure in psia, we search for "psi":
In [22]:
convDF[convDF['Toconvertfrom'].str.match("psi")]
Out[22]:
So for a chamber pressure of 3028 psia for the SSME,
In [23]:
psi2Pa = convDF.loc[372,:]
In [24]:
print 3028*psi2Pa.Multiplyby, psi2Pa.to
Also, get the conversion for atmospheres (atm):
In [26]:
convDF[convDF['Toconvertfrom'].str.match("atm")]
Out[26]:
In [27]:
atm2Pa = convDF.loc[15,:]
In [29]:
print 3028*psi2Pa.Multiplyby/atm2Pa.Multiplyby, atm2Pa.Toconvertfrom
Take a look at the file scrape_BS.py in this Physique folder. BS stands for BeautifulSoup, the Python module that's extensively used here. Start with the class called scraped_BS, which uses the Python module requests to fetch the html from a url address and put it into a BeautifulSoup object.
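If you just want the gist before opening the file, here is a minimal sketch of such a class (an illustration under my assumptions, not the actual code of scraped_BS; only the .soup attribute is taken from how it's used below):
import requests
from bs4 import BeautifulSoup

class scraped_BS_sketch(object):
    """Sketch only: fetch a url's html and hold it as a BeautifulSoup object."""
    def __init__(self, url):
        self.url = url
        response = requests.get(url)  # grab the raw html over http
        self.soup = BeautifulSoup(response.text, "html.parser")  # parse it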
In [11]:
JPL_SSD_URL = "http://ssd.jpl.nasa.gov/" # JPL NASA Solar System Dynamics webpage
In [12]:
jpl_ssd_BS = Physique.scrape_BS.scraped_BS(JPL_SSD_URL)
Take a look at it with the usual BeautifulSoup methods.
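For instance (the parsed page is exposed through the .soup attribute, as the cells below show):
print jpl_ssd_BS.soup.title             # the page's <title> tag
print jpl_ssd_BS.soup.prettify()[:500]  # the first 500 characters, indented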
Now, as the Udacity Data Wrangling instructor, Shannon Bradshaw, taught, we're going to need to use Inspect Element (Firefox) or Develop -> Web Inspector (Mac OS X Safari) in your web browser to see what the relevant html is.
Now in this particular case (webpage formats are all different; assume the worst), there are no distinguishing classes for the tables (they're just tables nested in tables); cf. the stackoverflow.com question "BeautifulSoup scraping nested tables": I'm using the solution from that stackoverflow answer.
In [35]:
# for table in jpl_ssd_BS.soup.find_all("table"):
#     for subtable in table.find_all("table"):
#         print subtable.find("table") # uncomment this and run it to see the whole darn thing
Let's just focus on the Physical Data subpage for today. This is the way to find a specific tag (in this case img) with a specific attribute (in this case alt="PHYSICAL DATA"); then the .parent attribute gets its parent tag, and the href index in the square brackets [] gets the web address we desire.
In [46]:
jpl_ssd_BS.soup.find('img',{"alt":"PHYSICAL DATA"}).parent['href']
Out[46]:
In [48]:
JPL_SSD_PHYS_DATA_URL = JPL_SSD_URL + jpl_ssd_BS.soup.find('img',{"alt":"PHYSICAL DATA"}).parent['href'][1:]
JPL_SSD_PHYS_DATA_URL
Out[48]:
In [49]:
jpl_ssd_phys_data_BS = Physique.scrape_BS.scraped_BS(JPL_SSD_PHYS_DATA_URL)
At this point, I wish there were a rational and civilized way to scrape all the relevant quantitative data from here, for all the links (using Scrapy?), but I need help with that endeavor. Otherwise, I manually look at the webpage itself, manually use Inspect Element to find what I want, and then use BeautifulSoup accordingly.
In [81]:
jpl_ssd_phys_data_BS.soup.find('h2',text="Planets").find_next('a')
Out[81]:
In [82]:
JPL_SSD_PLANET_PHYS_PAR_URL = JPL_SSD_URL + jpl_ssd_phys_data_BS.soup.find('h2',text="Planets").find_next('a')['href']
jpl_ssd_planet_phys_par_BS = Physique.scrape_BS.scraped_BS(JPL_SSD_PLANET_PHYS_PAR_URL)
In [104]:
jpl_ssd_planet_phys_parTBL = jpl_ssd_planet_phys_par_BS.soup.find("div", {"class":"page_title"}).find_next("table")
Time to scrape the actual html code for the table we desire: jpl_ssd_planet_phys_parTBL. Take a look at the function make_conv_lst in scrape_BS.py, in particular its first for loop. That's the procedure we'll take (and I confirmed on stackoverflow that this is what's done in practice). But wait: the data values are themselves tables. So again, there is no rhyme or reason to how html tables for data are laid out, in general, on any website. So I'll get the headers first (which makes sense) and do an ugly hack for the data values (also notice the recursive=False option).
In [197]:
data = []
for row in jpl_ssd_planet_phys_parTBL.find_all('tr', recursive=False):
    cols = row.find_all('td', recursive=False)
    cols = [ele.text if ele.text != u'\xa0' else u'' for ele in cols]
    data.append(cols)
hdrs = data[:2] # get the headers first
In [198]:
jpl_ssd_planet_phys_parTBL.find_all('tr')[2].find_all('td')[18].text # peek at one raw cell to see the \xa0 and \xb1 clutter we'll clean off
data = [[row[0].replace(u'\xa0',''),]+row[1:] for row in data[2:]] # remove the non-breaking space \xa0 from each planet's name
data = [[row[0],]+[ele.replace('\n','') for ele in row[1:]] for row in data] # remove the '\n' strings
data = [[row[0],]+[ele.split(u'\xb1')[0] for ele in row[1:]] for row in data] # drop the \xb1 (plus-minus) uncertainties; keep just the values
data = [[row[0],]+[ele.split(u'\xa0')[0] for ele in row[1:]] for row in data] # drop anything after a stray \xa0, again keeping just the values
I'll add the units back in as part of the data (I don't know a sane and civilized way of attaching units to each of the column names of a pandas DataFrame as extra information).
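One alternative I could sketch (not what's done below; the names= labels are my own) is a two-level column MultiIndex, pairing each column name with its unit, assuming hdrs[0] (names) and hdrs[1] (units) line up one-to-one with the data columns:
import pandas as pd

# sketch: units become a second header level instead of a data row
cols = pd.MultiIndex.from_arrays([hdrs[0], hdrs[1]], names=['quantity', 'unit'])
data_with_units = pd.DataFrame(data, columns=cols)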
In [199]:
data = [hdrs[1],] + data
In [202]:
import pandas as pd
data = pd.DataFrame( data )
data.columns = hdrs[0]
In [203]:
data
Out[203]:
Time to save our work as a "pickle'd" pandas DataFrame.
In [204]:
data.to_pickle('./rawdata/JPL_NASA_SSD_Planet_Phys_Par_values.pkl') # values only
And to access this later in Python, do the following, using pandas' .read_pickle:
In [207]:
PlanetParDF = pd.read_pickle('./rawdata/JPL_NASA_SSD_Planet_Phys_Par_values.pkl')
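As a quick sanity check (assuming, as in the scraping above, that the first column holds the planet names), you could pull out a single planet's row:
print PlanetParDF[ PlanetParDF[PlanetParDF.columns[0]] == 'Earth' ]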