In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sn
import mpld3
import pandas as pd
import numpy as np
import imp
from sqlalchemy import create_engine
sn.set_context('notebook')

Sweden: missing Ca data

Heleen has discovered that Ca data are missing from RESA2 for 25 Swedish sites in the TOC trends analysis - see e-mail received 07/02/2017 at 16.12. The relevant background data is gathered together in this spreadsheet:

C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015\Swedish_Ca_Data\Missing_Data_25_Swedish_Sites.xlsx

The 25 sites (with their RESA2 site codes) are listed in the resa2_stns_missing_data sheet. Salar has extracted the data for these sites from the SMHI database and I have copied this to the salar_data sheet and simplified the column headings (see e-mail from Heleen received 09/02/2017 at 10.35 for the original). The sheet from_resa_10-02-2017 shows all the Mg and Ca data from the Swedish sites currently associated with the ICPW_TOCTRENDS_2015_SE project (as of 10/02/2017, before any errors were corrected). This should make it possible to identify which water samples are already in the database.

1. Read data



In [2]:

    
# Read data
in_xls = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015'
          r'\Swedish_Ca_Data\Missing_Data_25_Swedish_Sites.xlsx')

smhi_df = pd.read_excel(in_xls, sheetname='salar_data')
resa_df = pd.read_excel(in_xls, sheetname='from_resa_10-02-2017')

# Get just the cols of interest
smhi_df = smhi_df[['stn_code', 'stn_name', 'date', 'depth_m', 'ca_mekv/l', 'mg_mekv/l']]
resa_df = resa_df[['stn_code', 'stn_name', 'date', 'depth1', 'depth2', 'ca_uekv/l', 'mg_uekv/l']]



In [3]:

    
smhi_df.head(10)



In [4]:

    
resa_df.head(10)









    Out[4]:






  
    
      
      stn_code
      stn_name
      date
      depth1
      depth2
      ca_uekv/l
      mg_uekv/l
    
  
  
    
      0
      36561
      Abiskojaure
      1988-06-21
      0
      0
      182.991
      44.987
    
    
      1
      36561
      Abiskojaure
      1988-08-10
      0
      0
      158.992
      36.989
    
    
      2
      36561
      Abiskojaure
      1988-09-12
      0
      0
      184.991
      42.988
    
    
      3
      36561
      Abiskojaure
      1989-03-14
      0
      0
      342.983
      78.977
    
    
      4
      36561
      Abiskojaure
      1989-06-14
      0
      0
      248.988
      61.982
    
    
      5
      36561
      Abiskojaure
      1989-08-23
      0
      0
      164.992
      37.989
    
    
      6
      36561
      Abiskojaure
      1989-10-11
      0
      0
      202.990
      48.986
    
    
      7
      36561
      Abiskojaure
      1990-04-17
      0
      0
      323.984
      86.975
    
    
      8
      36561
      Abiskojaure
      1990-08-14
      0
      0
      138.993
      33.990
    
    
      9
      36561
      Abiskojaure
      1990-09-11
      0
      0
      153.992
      37.989

2. Join data to identify common water samples

The recent data from Salar includes more water samples than are currently in RESA2. In addition, it looks as though the sample depths in RESA2 might be incorrect. As a first step, let's see if we can match water samples based on sample date and station codes.



In [5]:

    
# Join datasets
df = pd.merge(resa_df, smhi_df, how='inner',
              left_on=['stn_code', 'date'],
              right_on=['stn_code', 'date'])
df.head()









    Out[5]:






  
    
      
      stn_code
      stn_name_x
      date
      depth1
      depth2
      ca_uekv/l
      mg_uekv/l
      stn_name_y
      depth_m
      ca_mekv/l
      mg_mekv/l
    
  
  
    
      0
      36575
      Örvattnet
      1988-03-18
      0
      0
      NaN
      32.990
      Örvattnet
      2.0
      0.059
      0.033
    
    
      1
      36575
      Örvattnet
      1988-08-24
      0
      0
      NaN
      30.991
      Örvattnet
      2.0
      0.052
      0.031
    
    
      2
      36575
      Örvattnet
      1988-11-02
      0
      0
      NaN
      30.991
      Örvattnet
      1.5
      0.054
      0.031
    
    
      3
      36575
      Örvattnet
      1989-02-22
      0
      0
      NaN
      30.991
      Örvattnet
      2.0
      0.067
      0.031
    
    
      4
      36575
      Örvattnet
      1989-08-21
      0
      0
      NaN
      29.991
      Örvattnet
      2.0
      0.055
      0.030

Next. check how much of the data downloaded by Salar is already in the database.



In [6]:

    
print len(smhi_df)
print len(df)

So, the recent data downloaded by Salar includes 2412 records for these 25 stations, of which 1910 are already in RESA2. Let's check if the values for Mg are comparable.



In [7]:

    
# Convert Mg to ueq/l and round to nearest whole number
df['mg_old_ueq/l'] = (df['mg_uekv/l'] + 0.5).astype(int)
df['mg_new_ueq/l'] = (df['mg_mekv/l']*1000 + 0.5).astype(int)

print 'All Mg values match:', (df['mg_old_ueq/l'] == df['mg_new_ueq/l']).all()

df.head()









    



All Mg values match: True






    Out[7]:






  
    
      
      stn_code
      stn_name_x
      date
      depth1
      depth2
      ca_uekv/l
      mg_uekv/l
      stn_name_y
      depth_m
      ca_mekv/l
      mg_mekv/l
      mg_old_ueq/l
      mg_new_ueq/l
    
  
  
    
      0
      36575
      Örvattnet
      1988-03-18
      0
      0
      NaN
      32.990
      Örvattnet
      2.0
      0.059
      0.033
      33
      33
    
    
      1
      36575
      Örvattnet
      1988-08-24
      0
      0
      NaN
      30.991
      Örvattnet
      2.0
      0.052
      0.031
      31
      31
    
    
      2
      36575
      Örvattnet
      1988-11-02
      0
      0
      NaN
      30.991
      Örvattnet
      1.5
      0.054
      0.031
      31
      31
    
    
      3
      36575
      Örvattnet
      1989-02-22
      0
      0
      NaN
      30.991
      Örvattnet
      2.0
      0.067
      0.031
      31
      31
    
    
      4
      36575
      Örvattnet
      1989-08-21
      0
      0
      NaN
      29.991
      Örvattnet
      2.0
      0.055
      0.030
      30
      30

Excellent - the values in the RESA2 database for Mg match the values in the data Salar has recently downloaded.

However, the sample depths do not match: all the samples in RESA2 are assumed to have come from the surface (depth1=0 and depth2=0), whereas Salar's data indicates they actually come from a variety of depths between 0.5 and 2 m. In the trend analysis so far, we've focused on samples taken within the upper 0.5 m of the water column. How much does this matter? If it is important, we'll need to correct the sample depths for all the Swedish sites, which will involve Salar downloading some more data followed by a bit of additonal work from me.

3. Add Ca data

Heleen has replied to my e-mail to say not to worry about the water depths - just add the new Ca data to the existing water sample IDs (see e-mail received 10/02/2017 at 16.19).

First, connect to the database.



In [8]:

    
# Connect to db
r2_func_path = r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template\useful_resa2_code.py'
resa2 = imp.load_source('useful_resa2_code', r2_func_path)

engine, conn = resa2.connect_to_resa2()

Next, get the sample IDs associated with these 25 sites and join them to the data above.



In [9]:

    
# Get all water samples for these sites
stns = tuple(df['stn_code'].unique())

sql = ('SELECT station_id, sample_date, water_sample_id '
       'FROM resa2.water_samples '
       'WHERE station_id IN %s' % str(stns))
samp_df = pd.read_sql_query(sql, engine)

# Join
df = pd.merge(df, samp_df, how='left',
              left_on=['date', 'stn_code'],
              right_on=['sample_date', 'station_id'])

df.head()









    Out[9]:






  
    
      
      stn_code
      stn_name_x
      date
      depth1
      depth2
      ca_uekv/l
      mg_uekv/l
      stn_name_y
      depth_m
      ca_mekv/l
      mg_mekv/l
      mg_old_ueq/l
      mg_new_ueq/l
      station_id
      sample_date
      water_sample_id
    
  
  
    
      0
      36575
      Örvattnet
      1988-03-18
      0
      0
      NaN
      32.990
      Örvattnet
      2.0
      0.059
      0.033
      33
      33
      36575
      1988-03-18
      594956
    
    
      1
      36575
      Örvattnet
      1988-08-24
      0
      0
      NaN
      30.991
      Örvattnet
      2.0
      0.052
      0.031
      31
      31
      36575
      1988-08-24
      594957
    
    
      2
      36575
      Örvattnet
      1988-11-02
      0
      0
      NaN
      30.991
      Örvattnet
      1.5
      0.054
      0.031
      31
      31
      36575
      1988-11-02
      594958
    
    
      3
      36575
      Örvattnet
      1989-02-22
      0
      0
      NaN
      30.991
      Örvattnet
      2.0
      0.067
      0.031
      31
      31
      36575
      1989-02-22
      594959
    
    
      4
      36575
      Örvattnet
      1989-08-21
      0
      0
      NaN
      29.991
      Örvattnet
      2.0
      0.055
      0.030
      30
      30
      36575
      1989-08-21
      594960



In [10]:

    
# Extract columns of interest
df = df[['water_sample_id', 'ca_mekv/l']]

# Rename columns
df.columns = ['sample_id', 'value']

# Add new columns
df['method_id'] = 10551
df['approved'] = 'YES'
df['entered_by'] = 'JES'
df['entered_date'] = pd.datetime(2017, 2, 10)

# Reorder columns
df = df[['sample_id', 'method_id', 'value', 'approved', 'entered_by', 'entered_date']]

df.head(10)









    Out[10]:






  
    
      
      sample_id
      method_id
      value
      approved
      entered_by
      entered_date
    
  
  
    
      0
      594956
      10551
      0.059
      YES
      JES
      2017-02-10
    
    
      1
      594957
      10551
      0.052
      YES
      JES
      2017-02-10
    
    
      2
      594958
      10551
      0.054
      YES
      JES
      2017-02-10
    
    
      3
      594959
      10551
      0.067
      YES
      JES
      2017-02-10
    
    
      4
      594960
      10551
      0.055
      YES
      JES
      2017-02-10
    
    
      5
      594961
      10551
      0.058
      YES
      JES
      2017-02-10
    
    
      6
      594962
      10551
      0.047
      YES
      JES
      2017-02-10
    
    
      7
      594963
      10551
      0.060
      YES
      JES
      2017-02-10
    
    
      8
      594964
      10551
      0.059
      YES
      JES
      2017-02-10
    
    
      9
      594965
      10551
      0.060
      YES
      JES
      2017-02-10



In [13]:

    
# Write to database
df.to_sql(name='WATER_CHEMISTRY_VALUES2',
          schema='RESA2',
          con=engine,
          index=False,
          if_exists='append')









    



C:\Data\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\lib\site-packages\pandas\io\sql.py:1165: UserWarning: The provided table name 'WATER_CHEMISTRY_VALUES2' is not found exactly as such in the database after writing the table, possibly due to case sensitivity issues. Consider using lower case table names.
  warnings.warn(msg, UserWarning)

	stn_code	stn_name	date	depth_m	ca_mekv/l	mg_mekv/l
0	36658	Mossjön	1987-02-25	2.0	0.082	0.074
1	36658	Mossjön	1987-08-11	2.0	0.066	0.050
2	36658	Mossjön	1987-10-20	2.0	0.061	0.055
3	36658	Mossjön	1988-05-03	2.0	0.055	0.043
4	36658	Mossjön	1988-08-09	2.0	0.055	0.048
5	36658	Mossjön	1988-10-18	2.0	0.061	0.051
6	36658	Mossjön	1989-08-09	2.0	0.072	0.058
7	36658	Mossjön	1989-10-30	0.5	0.073	0.058
8	36658	Mossjön	1990-02-20	2.0	0.059	0.063
9	36658	Mossjön	1990-08-09	2.0	0.079	0.066

	stn_code	stn_name	date	ca_uekv/l	mg_uekv/l
0	36561	Abiskojaure	1988-06-21	182.991	44.987
1	36561	Abiskojaure	1988-08-10	158.992	36.989
2	36561	Abiskojaure	1988-09-12	184.991	42.988
3	36561	Abiskojaure	1989-03-14	342.983	78.977
4	36561	Abiskojaure	1989-06-14	248.988	61.982
5	36561	Abiskojaure	1989-08-23	164.992	37.989
6	36561	Abiskojaure	1989-10-11	202.990	48.986
7	36561	Abiskojaure	1990-04-17	323.984	86.975
8	36561	Abiskojaure	1990-08-14	138.993	33.990
9	36561	Abiskojaure	1990-09-11	153.992	37.989

	stn_code	stn_name_x	date	ca_uekv/l	mg_uekv/l	stn_name_y	depth_m	ca_mekv/l	mg_mekv/l
0	36575	Örvattnet	1988-03-18	NaN	32.990	Örvattnet	2.0	0.059	0.033
1	36575	Örvattnet	1988-08-24	NaN	30.991	Örvattnet	2.0	0.052	0.031
2	36575	Örvattnet	1988-11-02	NaN	30.991	Örvattnet	1.5	0.054	0.031
3	36575	Örvattnet	1989-02-22	NaN	30.991	Örvattnet	2.0	0.067	0.031
4	36575	Örvattnet	1989-08-21	NaN	29.991	Örvattnet	2.0	0.055	0.030

	sample_id	method_id	value	approved	entered_by	entered_date
0	594956	10551	0.059	YES	JES	2017-02-10
1	594957	10551	0.052	YES	JES	2017-02-10
2	594958	10551	0.054	YES	JES	2017-02-10
3	594959	10551	0.067	YES	JES	2017-02-10
4	594960	10551	0.055	YES	JES	2017-02-10
5	594961	10551	0.058	YES	JES	2017-02-10
6	594962	10551	0.047	YES	JES	2017-02-10
7	594963	10551	0.060	YES	JES	2017-02-10
8	594964	10551	0.059	YES	JES	2017-02-10
9	594965	10551	0.060	YES	JES	2017-02-10