This Spark notebook connects to BigInsights on Cloud using BigSQL.
It runs successfully on standalone spark-1.6.1-bin-hadoop2.6
and outputs a DataFrame like this:
[Row(F1=77.0, F2=-16.200000762939453, F3=7.81678581237793), Row(F1=77.0, F2=-16.200000762939453, F3=7.528648376464844), Row(F1=77.0, F2=-16.200000762939453, F3=7.240304946899414), Row(F1=77.0, F2=-16.200000762939453, F3=6.9515509605407715), Row(F1=77.0, F2=-16.200000762939453, F3=6.6621809005737305), Row(F1=77.0, F2=-16.200000762939453, F3=8.371989250183105), Row(F1=77.0, F2=-16.200000762939453, F3=10.080772399902344), Row(F1=77.0, F2=-16.200000762939453, F3=11.788325309753418), Row(F1=77.0, F2=-16.200000762939453, F3=13.494444847106934), Row(F1=77.0, F2=-16.200000762939453, F3=15.198928833007812)]
The notebook environment is:
Notebook server: 3.2.0-8b0eef4 | Python 2.7.11 |Anaconda 2.3.0 (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)]
In [11]:
cluster = '10451' # E.g. 10000
username = 'biadmin' # E.g. biadmin
password = '' # Please request password from chris.snow@uk.ibm.com
table = 'biadmin.rowapplyout' # BigSQL table to query
In [12]:
import os
cwd = os.getcwd()
cls_host = 'ehaasp-{0}-mastermanager.bi.services.bluemix.net'.format(cluster)
sql_host = 'ehaasp-{0}-master-2.bi.services.bluemix.net'.format(cluster)
Get the cluster certificate
In [13]:
!openssl s_client -showcerts -connect {cls_host}:9443 < /dev/null | openssl x509 -outform PEM > certificate
# uncomment this for debugging
#!cat certificate
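If the openssl connection fails, the pipeline above can still leave an empty or invalid `certificate` file behind, and the problem only surfaces later at the keytool step. A minimal sketch of a sanity check follows, using only the Python standard library. The helper name `looks_like_pem_certificate` is hypothetical and not part of the original notebook, and the check only confirms the PEM framing and base64 body, not that the certificate is the right one.

```python
import binascii
import ssl


def looks_like_pem_certificate(text):
    """Return True if text is one PEM certificate block with a non-empty base64 body."""
    try:
        # Raises ValueError if the BEGIN/END CERTIFICATE markers are missing.
        der = ssl.PEM_cert_to_DER_cert(text.strip())
    except (ValueError, binascii.Error):
        return False
    return len(der) > 0
```

In the notebook this could be called as `looks_like_pem_certificate(open('certificate').read())` before importing the file into the truststore.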
Add the cluster certificate to a truststore
In [14]:
!rm -f truststore.jks
!keytool -import -trustcacerts -alias biginsights -file certificate -keystore truststore.jks -storepass mypassword -noprompt
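Outside a notebook the `!` shell magic is not available, so the same import can be driven from plain Python. The sketch below reuses the keytool flags from the cell above. The function names `keytool_import_args` and `import_certificate` are hypothetical, and it assumes `keytool` is on the PATH.

```python
import subprocess


def keytool_import_args(cert_path, keystore_path, storepass, alias='biginsights'):
    """Build the keytool argument list that imports a certificate into a truststore."""
    return [
        'keytool', '-import', '-trustcacerts',
        '-alias', alias,
        '-file', cert_path,
        '-keystore', keystore_path,
        '-storepass', storepass,
        '-noprompt',
    ]


def import_certificate(cert_path, keystore_path, storepass):
    """Run keytool; raises subprocess.CalledProcessError if the import fails."""
    subprocess.check_call(keytool_import_args(cert_path, keystore_path, storepass))
```

Passing the arguments as a list avoids shell quoting problems, and `check_call` makes a failed import raise an error instead of passing silently.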
Now attempt to connect to BigInsights on Cloud
In [15]:
# Test the BigSQL JDBC connection over SSL, trusting the certificate in truststore.jks
url = 'jdbc:db2://{0}:51000/bigsql:user={1};password={2};sslConnection=true;sslTrustStoreLocation={3}/truststore.jks;sslTrustStorePassword=mypassword;'.format(sql_host, username, password, cwd)
df = sqlContext.read.format('jdbc').options(url=url, driver='com.ibm.db2.jcc.DB2Driver', dbtable=table).load()
print(df.take(10))
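The JDBC URL packs several `key=value;` properties into one long string, which is easy to mistype and risky to print because it contains the password. The sketch below builds the URL from named parts and masks passwords for logging. The helper names `build_bigsql_url` and `mask_passwords` are hypothetical, and it assumes the driver reads the truststore password from the `sslTrustStorePassword` property.

```python
import re


def build_bigsql_url(host, user, password, truststore, truststore_password,
                     port=51000, database='bigsql'):
    """Assemble a DB2 JDBC URL with SSL properties as key=value; pairs."""
    props = [
        ('user', user),
        ('password', password),
        ('sslConnection', 'true'),
        ('sslTrustStoreLocation', truststore),
        # Assumption: truststore password property name used by the IBM driver.
        ('sslTrustStorePassword', truststore_password),
    ]
    prop_string = ''.join('{0}={1};'.format(key, value) for key, value in props)
    return 'jdbc:db2://{0}:{1}/{2}:{3}'.format(host, port, database, prop_string)


def mask_passwords(url):
    """Replace the value of every property whose name ends in 'password'."""
    return re.sub(r'(password=)[^;]*', r'\1****', url, flags=re.IGNORECASE)
```

In the notebook, `build_bigsql_url(sql_host, username, password, cwd + '/truststore.jks', 'mypassword')` would produce the URL to pass to `sqlContext.read`, and `mask_passwords(url)` gives a version that is safer to print while debugging.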