For this lab we will walk you through scraping stock data from the NASDAQ website: http://www.nasdaq.com/
When you're done, you'll be tasked with creating a program which, when given a NASDAQ symbol, will retrieve the name of the stock, its price, and its percent change.
While we work through the example, we will use Amazon.com's stock symbol: amzn
For a list of NASDAQ stocks to try with the completed program, see here: http://www.cnbc.com/nasdaq-100/
Here's our plan for a given stock symbol, for example amzn:
We will write each step as its own function below.
In [ ]:
import requests
from bs4 import BeautifulSoup
import time
In [ ]:
symbol = "amzn"
url = 'http://www.nasdaq.com/symbol/' + symbol
response = requests.get(url)
if response.ok:
    print(response.text)
else:
    print("Error retrieving " + url)
It looks like it works, so we should refactor it into a function and then test again. We pass in the symbol as input and return what we would print as output.
In [ ]:
def get_html_from_nasdaq(symbol):
    url = 'http://www.nasdaq.com/symbol/' + symbol
    response = requests.get(url)
    if response.ok:
        return response.text
    else:
        return "Error retrieving " + url
Let's try it out. It should return the page's HTML:
In [ ]:
html = get_html_from_nasdaq('amzn')
html
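One thing to keep in mind: when the request fails, get_html_from_nasdaq returns an error message rather than HTML, so the caller gets a string either way. Here is a minimal sketch of one way to tell the two apart, assuming the message always starts with "Error retrieving " as in the function above. The name is_error_result is a hypothetical helper, not part of the lab.

```python
ERROR_PREFIX = "Error retrieving "  # same prefix get_html_from_nasdaq uses on failure

def is_error_result(text):
    """Hypothetical helper: True if text is the error message, not page HTML."""
    return text.startswith(ERROR_PREFIX)

# Stand-in values so the sketch runs without a network connection
failed = "Error retrieving http://www.nasdaq.com/symbol/amzn"
fetched = "<html><body>...</body></html>"

print(is_error_result(failed))   # True
print(is_error_result(fetched))  # False
```

A check like this could run before any attempt to parse the result.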
Next we want to take html and extract the meaningful bits. This is the part that requires time and patience. You'll need to use a browser's developer tools to find the important CSS selectors so you can retrieve the data.
For simplicity's sake, we've done this for you. Feel free to open http://www.nasdaq.com/symbol/amzn in your browser's developer tools and locate each of these three selectors:
In [ ]:
soup = BeautifulSoup(html, "lxml")
name = soup.select("div#qwidget_pageheader h1")[0].text
price = soup.select("div#qwidget_lastsale")[0].text
change = soup.select("div#qwidget_percent")[0].text
print(name,price,change)
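To see how these selectors behave without depending on the live page, here is a small sketch run against a made-up HTML fragment. The fragment is an assumption that only mimics the ids used above; the real page's markup is more complex and may differ. It uses Python's built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: only the ids mirror the selectors used in this lab
sample_html = """
<div id="qwidget_pageheader"><h1>Example Corp. Common Stock</h1></div>
<div id="qwidget_lastsale">$10.00</div>
<div id="qwidget_percent">1.50%</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# "div#qwidget_pageheader h1": an h1 anywhere inside the div whose id is qwidget_pageheader
sample_name = sample_soup.select("div#qwidget_pageheader h1")[0].text
# "div#qwidget_lastsale": the div whose id is qwidget_lastsale
sample_price = sample_soup.select("div#qwidget_lastsale")[0].text
sample_change = sample_soup.select("div#qwidget_percent")[0].text

# select() returns a list; an empty list means nothing matched the selector
missing = sample_soup.select("div#no_such_id")

print(sample_name, sample_price, sample_change)
print(len(missing))
```

Because select() returns a list, indexing with [0] on a selector that matches nothing raises an IndexError, which is worth remembering if the page layout ever changes.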
We can't easily return three values from a function (actually you can in Python, but we don't like to teach you that), so instead we will create a dictionary of these values first:
In [ ]:
soup = BeautifulSoup(html, "lxml")
name = soup.select("div#qwidget_pageheader h1")[0].text
price = soup.select("div#qwidget_lastsale")[0].text
change = soup.select("div#qwidget_percent")[0].text
stock = {
    'Name': name,
    'Price': price,
    'Change': change
}
print(stock)
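If dictionaries are new to you, the sketch below shows how to read a value back out by its key and how to loop over every key and value. The values are the sample ones from this lab, typed in by hand rather than scraped, and sample_stock is just an illustrative name.

```python
# Sample values typed in by hand (not scraped), using the same keys as stock
sample_stock = {
    "Name": "Amazon.com, Inc. Common Stock Quote & Summary Data",
    "Price": "$852.53",
    "Change": "0.24%",
}

# Look up a single value by its key
print(sample_stock["Price"])

# Visit every key/value pair in the order they were added
lines = []
for key, value in sample_stock.items():
    lines.append(key + ": " + value)
print("\n".join(lines))
```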
Once again, it's time to refactor this into a function. We take html as input (it's the one thing we require to make this code work) and return stock as output (since it is what we print).
In [ ]:
def extract_stock_data(html):
    soup = BeautifulSoup(html, "lxml")
    name = soup.select("div#qwidget_pageheader h1")[0].text
    price = soup.select("div#qwidget_lastsale")[0].text
    change = soup.select("div#qwidget_percent")[0].text
    stock = {
        'Name': name,
        'Price': price,
        'Change': change
    }
    return stock
And we should test out our new function:
In [ ]:
stock = extract_stock_data(html)
print(stock)
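Note that the values in stock are all strings. If you wanted to do arithmetic with them, you would first need to convert them to numbers. Here is a minimal sketch, assuming the text looks like "$852.53" and "0.24%". Both parse_price and parse_percent are hypothetical helpers, and real scraped text may contain other characters they do not handle.

```python
def parse_price(text):
    """Hypothetical helper: turn text like '$852.53' into the float 852.53."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)

def parse_percent(text):
    """Hypothetical helper: turn text like '0.24%' into the float 0.24."""
    cleaned = text.strip().replace("%", "")
    return float(cleaned)

print(parse_price("$852.53"))
print(parse_percent(" 0.24% "))
```

The lab itself only needs to print the strings, so this conversion is optional.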
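Now you need to put it all together. Write a program that asks for a NASDAQ stock symbol, then retrieves and prints the name of the stock, its price, and its percent change.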
The program should work like this:
Enter a stock symbol on the NASDAQ Exchange: amzn
Name: Amazon.com, Inc. Common Stock Quote & Summary Data
Price: $852.53
Change: 0.24%
In [ ]:
# todo write code here:
Please answer the following questions. This should be a personal narrative, in your own voice. Answer the questions by double clicking on the question and placing your answer next to the Answer: prompt.
Answer:
Answer:
Answer:
1 ==> I can do this on my own and explain how to do it.
2 ==> I can do this on my own without any help.
3 ==> I can do this with help or guidance from others. If you choose this level please list those who helped you.
4 ==> I don't understand this at all yet and need extra help. If you choose this please try to articulate that which you do not understand.
Answer: