Scraping Rate.am

This notebook provides code for scraping exchange rates from rate.am. The rates are published in an HTML table, so the pandas.read_html() function is probably the most user-friendly way of extracting information from rate.am. However, since one may also want to extract information from similar websites with interactive components driven by JavaScript, we first use Selenium to perform some actions and get the page source, and only then use pandas for scraping and manipulation.

Selenium functions and methods will be covered in more detail in a separate document.

Key points:

  • browser.page_source - provides the HTML source of the page loaded by Selenium,
  • browser.current_url - provides the URL of the page Selenium has navigated to (may be different from the base URL, as the programmer may ask Selenium to click buttons or follow links),
  • find_element_by_xpath() - Selenium method for finding HTML elements using an XPath expression (deprecated in Selenium 4 in favor of find_element(By.XPATH, ...)),
  • send_keys(Keys.PAGE_DOWN) - tells Selenium to "press" the Page Down key on the keyboard,
  • browser.implicitly_wait(30) - tells Selenium to wait up to 30 seconds for an element to appear before a find_element call fails.

In [1]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

In [2]:
browser = webdriver.Chrome()

In [3]:
url = "http://rate.am/en/armenian-dram-exchange-rates/banks/cash"

In [11]:
browser.get(url)  # returns once the page has finished loading
# switch the table from cash to non-cash rates
browser.find_element_by_xpath("//label[contains(text(),'Non-cash')]").click()
#browser.current_url
page = browser.page_source  # HTML of the page after the click
browser.close()

In [15]:
all_tables = pd.read_html(page)

In [24]:
all_tables[2]


Out[24]:
0 1 2 3 4 5 6 7 8 9 10 11 12
0 NaN Bank Branches Date 1 USD \t1 EUR \t1 RUR \t1 GBP \t1 GEL \t1 CHF ... 1 USD \t1 EUR \t1 RUR \t1 GBP \t1 GEL \t1 CHF ... 1 USD \t1 EUR \t1 RUR \t1 GBP \t1 GEL \t1 CHF ... 1 USD \t1 EUR \t1 RUR \t1 GBP \t1 GEL \t1 CHF ... NaN NaN NaN NaN NaN
1 Buy Sell Buy Sell Buy Sell Buy Sell NaN NaN NaN NaN NaN
2 1. Unibank NaN 42 19 Jul, 19:11 478.50 482 554 564.00 7.50 7.80 620.0 640.0
3 2. VTB Bank (Armenia) NaN 69 19 Jul, 19:11 479.50 482 555 560.00 7.55 7.66 621.0 632.0
4 3. Evocabank NaN 12 19 Jul, 19:01 479.50 482 554 561.00 7.54 7.64 620.0 629.0
5 4. Inecobank NaN 23 19 Jul, 19:01 479.50 482 551.50 561.50 7.45 7.68 618.0 629.0
6 5. ID Bank NaN 14 19 Jul, 19:01 479 482 553 562.00 7.52 7.68 621.0 636.0
7 6. Byblos Bank Armenia NaN 5 19 Jul, 19:01 479 482.50 556 564.00 7.52 7.75 623.0 634.0
8 7. ArmSwissBank NaN 1 19 Jul, 19:01 479.50 481.50 555 560.00 7.53 7.73 622.0 628.0
9 8. Ardshinbank NaN 53 19 Jul, 19:01 479.50 482 554 562.00 7.47 7.72 622.0 637.0
10 9. ARARATBANK NaN 48 19 Jul, 19:01 479.50 482 549 567.00 7.49 7.70 611.0 640.0
11 10. ACBA-Credit Agricole.. NaN 59 19 Jul, 19:01 479 482 553 561.00 7.52 7.67 612.0 633.0
12 11. Mellat Bank NaN 1 19 Jul, 19:01 478 482 555 563.00 7.40 7.90 NaN NaN
13 12. ARMECONOMBANK NaN 51 19 Jul, 19:01 479 482 553 561.00 7.52 7.66 613.0 632.0
14 13. ArmBusinessBank NaN 55 19 Jul, 19:01 479.50 482 554 562.00 7.46 7.72 620.0 631.0
15 14. Converse Bank NaN 33 19 Jul, 19:01 478.50 482 552 560.00 7.49 7.69 619.0 628.0
16 15. Ameriabank NaN 11 19 Jul, 19:01 479 482 554 562.00 7.46 7.71 619.5 629.5
17 16. Artsakhbank NaN 25 19 Jul, 19:01 479.75 482 555 562.00 7.51 7.70 620.0 632.0
18 17. HSBC Bank Armenia NaN 7 19 Jul, 19:01 479 482 553.50 560.50 7.45 7.68 619.5 627.5
19 Choose the nearest branch for you and save you... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 Minimum 478 481.50 549 560 7.40 7.64 611 627.50 NaN NaN NaN NaN
21 Maximum 479.75 482.50 556 567 7.55 7.90 623 640.00 NaN NaN NaN NaN
22 Average 479.13 482 553.59 561.94 7.49 7.71 618.81 632.38 NaN NaN NaN NaN
23 Fluctuation (Jul 18) +0.22 +0.20 -0.75 -1.01 -0.04 -0.04 -2.25 -2.40 NaN NaN NaN NaN

In [52]:
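# columns 5-12 hold the rates, column 1 holds the bank name
cols = list(range(5, 13)) + [1]
# rows 2-18 are the banks; the rows before and after are headers and summaries
all_tables[2].iloc[2:19, cols]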


Out[52]:
         5       6       7      8     9    10     11     12  1
 2  478.50     482     554  564.0  7.50  7.80  620.0  640.0  Unibank
 3  479.50     482     555  560.0  7.55  7.66  621.0  632.0  VTB Bank (Armenia)
 4  479.50     482     554  561.0  7.54  7.64  620.0  629.0  Evocabank
 5  479.50     482  551.50  561.5  7.45  7.68  618.0  629.0  Inecobank
 6     479     482     553  562.0  7.52  7.68  621.0  636.0  ID Bank
 7     479  482.50     556  564.0  7.52  7.75  623.0  634.0  Byblos Bank Armenia
 8  479.50  481.50     555  560.0  7.53  7.73  622.0  628.0  ArmSwissBank
 9  479.50     482     554  562.0  7.47  7.72  622.0  637.0  Ardshinbank
10  479.50     482     549  567.0  7.49  7.70  611.0  640.0  ARARATBANK
11     479     482     553  561.0  7.52  7.67  612.0  633.0  ACBA-Credit Agricole..
12     478     482     555  563.0  7.40  7.90    NaN    NaN  Mellat Bank
13     479     482     553  561.0  7.52  7.66  613.0  632.0  ARMECONOMBANK
14  479.50     482     554  562.0  7.46  7.72  620.0  631.0  ArmBusinessBank
15  478.50     482     552  560.0  7.49  7.69  619.0  628.0  Converse Bank
16     479     482     554  562.0  7.46  7.71  619.5  629.5  Ameriabank
17  479.75     482     555  562.0  7.51  7.70  620.0  632.0  Artsakhbank
18     479     482  553.50  560.5  7.45  7.68  619.5  627.5  HSBC Bank Armenia
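
The selection above still has numeric column labels, and several columns hold the rates as text (note the mixed "479" and "479.50" formatting). A possible next step, sketched here on three rows copied from the output, is to name the columns and convert everything to numbers. The column names (USD_buy, USD_sell, ...) are an assumption inferred from the two header rows of the full table and the magnitude of the values; they are not provided by rate.am or pandas.

```python
import pandas as pd

# Three rows copied from the output above; several rate columns arrive as strings.
raw = pd.DataFrame({
    5: ["478.50", "479.50", "478"],
    6: ["482", "482", "482"],
    7: ["554", "555", "555"],
    8: [564.0, 560.0, 563.0],
    9: ["7.50", "7.55", "7.40"],
    10: ["7.80", "7.66", "7.90"],
    11: [620.0, 621.0, None],
    12: [640.0, 632.0, None],
    1: ["Unibank", "VTB Bank (Armenia)", "Mellat Bank"],
})

# Assumed meaning of columns 5-12: currency (USD, EUR, RUR, GBP) x (buy, sell).
names = [f"{cur}_{side}"
         for cur in ("USD", "EUR", "RUR", "GBP")
         for side in ("buy", "sell")]
labels = {**dict(zip(range(5, 13), names)), 1: "bank"}

rates = raw.rename(columns=labels).set_index("bank")
# Convert text to numbers; anything unparsable becomes NaN.
rates = rates.apply(pd.to_numeric, errors="coerce").astype(float)

# Bank offering the highest USD buy rate among the three sample rows.
best_usd_buy = rates["USD_buy"].idxmax()
print(best_usd_buy)
```

With numeric columns, the usual pandas tools (sorting, idxmax(), describe()) can be applied directly to the rates.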

From here on we introduce several Selenium tricks for manipulating the page (such as pressing the Page Down key on the keyboard).


In [56]:
browser = webdriver.Chrome()
browser.get(url)

In [62]:
button = browser.find_element_by_tag_name("html")

In [67]:
button.send_keys(Keys.PAGE_DOWN)
# keep pressing End until the page source stops changing
old = ""
new = " "
while new != old:
    old = browser.page_source
    button.send_keys(Keys.END)
    new = browser.page_source
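
The loop above keeps pressing End until the page source no longer changes, which is a common way to load content that appears only on scrolling. Below is a minimal sketch of the same idea, decoupled from the browser so that it can be tried without Selenium. The names scroll_until_stable and FakePage, and the max_rounds and pause parameters, are hypothetical and introduced only for illustration; the round limit is added so the loop cannot run forever on a page that never stops growing.

```python
import time
from typing import Callable


def scroll_until_stable(get_source: Callable[[], str],
                        scroll: Callable[[], None],
                        max_rounds: int = 50,
                        pause: float = 0.0) -> int:
    """Call scroll() until get_source() stops changing; return rounds used."""
    previous = get_source()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give lazily loaded content time to arrive
        current = get_source()
        if current == previous:
            return rounds  # nothing new appeared after this scroll
        previous = current
    return max_rounds  # gave up: the page was still changing


class FakePage:
    """Stand-in for a browser: each scroll adds a row until 3 rows exist."""

    def __init__(self) -> None:
        self.rows = 0

    def source(self) -> str:
        return "<tr/>" * self.rows

    def scroll(self) -> None:
        self.rows = min(self.rows + 1, 3)


page = FakePage()
rounds = scroll_until_stable(page.source, page.scroll)
print(rounds, page.rows)
```

With the objects from this notebook, the call would look like scroll_until_stable(lambda: browser.page_source, lambda: button.send_keys(Keys.END), pause=1.0), where the one-second pause is a guess that should be tuned to the site.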

In [68]:
browser.get("https://www.bloomberg.com/")

In [78]:
browser.implicitly_wait(30)
browser.find_element_by_partial_link_text("S&P")


Out[78]:
<selenium.webdriver.remote.webelement.WebElement (session="619f97679d6f606a07c47f557cd5f89a", element="0.2922334204183461-1")>

In [79]:
#EC.presence_of_element_located()
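
The commented-out line in the last cell points to explicit waits, where Selenium repeatedly checks one specific condition (for example, that an element is present) until it holds or a timeout expires. The sketch below illustrates only the polling idea behind such a wait using the standard library; it is not Selenium's implementation, and wait_until, element_present, timeout and poll are hypothetical names chosen for this example.

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Poll condition() until it is truthy; return its value or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Toy condition: the "element" shows up on the third check.
calls = {"n": 0}


def element_present():
    calls["n"] += 1
    return "element" if calls["n"] >= 3 else None


found = wait_until(element_present, timeout=2.0, poll=0.01)
print(found, calls["n"])
```

In Selenium itself the corresponding call is WebDriverWait(browser, 30).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "S&P"))), with WebDriverWait imported from selenium.webdriver.support.ui, EC from selenium.webdriver.support.expected_conditions, and By from selenium.webdriver.common.by. Unlike implicitly_wait(), which applies to every element lookup, an explicit wait targets a single condition.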
