Install pandas, sqlalchemy, and pg8000 using pip or any other package manager. The latter is the driver used to connect to the database.
In [1]:
import pandas as pd
from sqlalchemy import create_engine
After importing the required packages, first create the engine to connect to the DB. The approach I generally use is to build the connection string from the username and password. The code is wrapped in a function; you just need to fill in the username, password, and database name.
This makes it easy to create different engines to connect to several databases.
In [2]:
def connection(user, passwd, dbname, echo_i=False):
    str1 = ('postgresql+pg8000://' + user + ':' + passwd + '@switch-db2.erg.berkeley.edu:5432/'
            + dbname + '?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory')
    engine = create_engine(str1, echo=echo_i)
    return engine
In [3]:
user = 'jdlara'
passw = 'Amadeus-2010'
dbname = 'apl_cec'
engine= connection(user,passw,dbname)
Afterwards, use pandas to import the data from Excel files or any other text file format. Make sure that the data is in good shape before trying to push it into the server. In this example I use prior knowledge of the structure of the tabs in the Excel file to iterate over each tab and match the name of the table with the tab name.
Before doing this I already checked that the data is properly organized.
In [4]:
excel_file = '~/Box Sync/EPIC-Biomass/PGEFeedersFinal.xlsx'
In [ ]:
tab_name = ['substation_banks','substations','feeders_limits_data','feeder_minimpacts']
schema_for_upload = 'PGE'
for name in tab_name:
    pd_data = pd.read_excel(excel_file, sheetname=name, encoding='UTF-8')  # newer pandas versions use sheet_name=
    pd_data.to_sql(name, engine, schema=schema_for_upload, if_exists='replace', chunksize=100)
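The upload loop above needs the Postgres server and the Excel file; a minimal sketch of the same `to_sql` round trip can be checked locally with an in-memory SQLite engine instead (an assumption here, chosen only because SQLite needs no server; it has no schema support, so the `schema=` argument is omitted, and the toy column names are hypothetical):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the Postgres engine.
engine = create_engine('sqlite://')

# Toy frame standing in for one Excel tab.
pd_data = pd.DataFrame({'feeder_no': [1, 2],
                        'lat': [37.87, 37.88],
                        'lon': [-122.27, -122.26]})
pd_data.to_sql('feeders', engine, if_exists='replace', chunksize=100)

# Read it back to confirm the round trip; note the DataFrame's
# index travels along as an extra column.
back = pd.read_sql_table('feeders', engine)
print(sorted(back.columns))
```

The extra `index` column visible in the read-back is exactly the artifact discussed further down, where dropping it or excluding it at upload time becomes relevant.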
Once the data is uploaded, it is possible to run SQL commands to properly create geom columns in the tables, as follows. The objective is to run an SQL query like this:
set search_path = SCHEMA, public;
alter table TABLE drop column if exists geom;
SELECT AddGeometryColumn ('SCHEMA','TABLE','geom',4326,'POINT',2);
UPDATE TABLE set geom = ST_SetSRID(st_makepoint(TABLE.lon, TABLE.lat), 4326)::geometry;
In [9]:
def create_geom(table, schema, engine):
    k = engine.connect()
    query = ('set search_path = "' + schema + '", public;')
    print(query)
    k.execute(query)
    query = ('alter table ' + table + ' drop column if exists geom;')
    print(query)
    k.execute(query)
    query = ("SELECT AddGeometryColumn ('" + schema + "','" + table + "','geom',4326,'POINT',2);")
    print(query)
    k.execute(query)
    query = ('UPDATE ' + table + ' set geom = ST_SetSRID(st_makepoint(' + table + '.lon, ' +
             table + '.lat), 4326)::geometry;')
    print(query)
    k.execute(query)
    return 'geom column added with SRID 4326'
In [10]:
table = 'feeders'
schema = 'PGE'
create_geom(table,schema,engine)
Out[10]:
The function created the geom column; the next step is to define a function to create the primary key in the db. Remember that the index from the data frame is included as a column in the db; sometimes an index is not really needed and might need to be dropped.
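If the DataFrame index is not needed in the table at all, the simplest fix is to exclude it at upload time with `index=False` rather than dropping the column afterwards. A minimal sketch, again using an in-memory SQLite engine and hypothetical column names so it runs without the Postgres server:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')
pd_data = pd.DataFrame({'feeder_no': [1, 2], 'lat': [37.87, 37.88]})

# index=False keeps the DataFrame index out of the table entirely,
# so there is nothing to drop later.
pd_data.to_sql('feeders', engine, if_exists='replace', index=False)

back = pd.read_sql_table('feeders', engine)
print(list(back.columns))
```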
In [19]:
col = 'feeder_no'
k = engine.connect()
query = ('set search_path = "' + schema + '", public;')
k.execute(query)
query = ('alter table ' + table + ' ADD CONSTRAINT ' + table + '_pk PRIMARY KEY (' + col + ')')
print(query)
k.execute(query)
Out[19]:
In [ ]:
ALTER TABLE table_name
ADD CONSTRAINT [ constraint_name ]
PRIMARY KEY (index_col1, index_col2, ... index_col_n)
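The inline cell above can be wrapped into a reusable function in the same style as `create_geom`. This is a sketch; the names `pk_query` and `add_primary_key` are my own, and the statement-building step is kept separate from execution so it can be inspected without a live db:

```python
def pk_query(table, col):
    # Build the ALTER TABLE statement used above, without executing it.
    return ('alter table ' + table + ' ADD CONSTRAINT ' + table +
            '_pk PRIMARY KEY (' + col + ')')

def add_primary_key(table, schema, col, engine):
    # Same pattern as create_geom: set the search_path, then run the DDL.
    k = engine.connect()
    k.execute('set search_path = "' + schema + '", public;')
    query = pk_query(table, col)
    print(query)
    k.execute(query)
    return 'primary key added on ' + col

print(pk_query('feeders', 'feeder_no'))
```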