This notebook provides code for scraping exchange rates from rate.am. The rates are served inside an HTML table, so the pandas.read_html() function is probably the most user-friendly way of extracting information from rate.am. However, since one may also want to scrape similar websites with interactive, JavaScript-driven components, we first use Selenium to perform some actions and grab the page source, and only then use pandas for the actual scraping and manipulation.
Selenium functions and methods will be additionally posted in a separate document.
Key points:
In [1]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
In [2]:
browser = webdriver.Chrome()
In [3]:
url = "http://rate.am/en/armenian-dram-exchange-rates/banks/cash"
In [11]:
browser.get(url)  # get() blocks until the page is fully loaded
browser.find_element(By.XPATH, "//label[contains(text(),'Non-cash')]").click()
#browser.current_url
page = browser.page_source
browser.close()
In [15]:
all_tables = pd.read_html(page)
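As a quick, self-contained illustration of what read_html returns (the toy page below is invented; it only mimics how read_html sees page_source): the function parses every &lt;table&gt; in the document and gives back a list of DataFrames, which is why we index into all_tables afterwards.

```python
import io
import pandas as pd

# A toy page with two tables, standing in for browser.page_source
html = """
<html><body>
  <table>
    <tr><th>Bank</th><th>USD buy</th><th>USD sell</th></tr>
    <tr><td>Bank A</td><td>386</td><td>390</td></tr>
    <tr><td>Bank B</td><td>385</td><td>391</td></tr>
  </table>
  <table>
    <tr><th>Currency</th><th>Code</th></tr>
    <tr><td>US Dollar</td><td>USD</td></tr>
  </table>
</body></html>
"""

tables = pd.read_html(io.StringIO(html))  # one DataFrame per <table>
print(len(tables))      # 2
print(tables[0].shape)  # (2, 3)
```

Because several tables are usually present (menus, layout tables, the rates themselves), some trial and error like all_tables[2] below is normally needed to find the right one.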
In [24]:
all_tables[2]
Out[24]:
In [52]:
cols = list(range(5, 13)) + [1]  # rate columns, plus column 1 at the end
all_tables[2].iloc[2:19,cols]
Out[52]:
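The slice above still carries read_html's integer column labels, and the cells arrive as strings. A minimal cleaning sketch follows; the column names and the toy data here are invented assumptions, not rate.am's actual layout, so adapt the rename mapping to whatever the real table shows.

```python
import pandas as pd

# Toy frame standing in for a slice like all_tables[2].iloc[2:19, cols]
raw = pd.DataFrame({
    1: ["Bank A", "Bank B"],   # assumed bank-name column
    5: ["386", "385"],         # assumed USD buy rates (as strings)
    6: ["390", "391"],         # assumed USD sell rates (as strings)
})

rates = raw.rename(columns={1: "bank", 5: "usd_buy", 6: "usd_sell"})
# Convert string cells to numbers before doing any arithmetic
rates[["usd_buy", "usd_sell"]] = rates[["usd_buy", "usd_sell"]].apply(
    pd.to_numeric, errors="coerce")
rates = rates.set_index("bank")
print(rates["usd_sell"] - rates["usd_buy"])  # buy/sell spread per bank
```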
Starting from here we introduce several Selenium tricks for manipulating the page (such as sending the Page Down key to the browser).
In [56]:
browser = webdriver.Chrome()
browser.get(url)
In [62]:
button = browser.find_element(By.TAG_NAME, "html")  # element to receive keyboard input
In [67]:
button.send_keys(Keys.PAGE_DOWN)
old = ""
new = " "
while new != old:  # stop once scrolling no longer changes the page source
    old = browser.page_source
    button.send_keys(Keys.END)
    new = browser.page_source
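The loop above is an instance of a general "act until the page stops changing" pattern, useful for infinite-scroll pages. A browser-free sketch of the same idea (the FakePage class is invented purely so the pattern is runnable here; in the notebook, scroll() corresponds to button.send_keys(Keys.END) and source to browser.page_source):

```python
class FakePage:
    """Stands in for a page that loads more content on each scroll, then stops."""
    def __init__(self, batches):
        self._remaining = batches
        self._loaded = 1
        self.source = "item-1"

    def scroll(self):
        # Each scroll appends one more item until the batches run out
        if self._remaining:
            self._remaining -= 1
            self._loaded += 1
            self.source += f" item-{self._loaded}"

def exhaust(page):
    """Keep scrolling until the page source reaches a fixed point."""
    old, new = None, page.source
    while new != old:
        old = new
        page.scroll()
        new = page.source
    return new

page = FakePage(batches=3)
final = exhaust(page)
print(final)  # item-1 item-2 item-3 item-4
```

In a real browser a short time.sleep() between scrolls is often needed, since new content may arrive asynchronously after the keypress.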
In [68]:
browser.get("https://www.bloomberg.com/")
In [78]:
browser.implicitly_wait(30)  # poll up to 30 seconds for elements to appear
browser.find_element(By.PARTIAL_LINK_TEXT, "S&P")
Out[78]:
In [79]:
#explicit-wait alternative: WebDriverWait(browser, 30).until(EC.presence_of_element_located(...))
In [ ]: