Loading data into SciDB

1) Check prerequisites

Python

SciDB-Py requires Python 2.6-2.7 or 3.3



In [1]:

    
import sys
sys.version_info









    Out[1]:





sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)

NumPy

Tested with version 1.9 (this environment has 1.13.1).



In [2]:

    
import numpy as np
np.__version__









    Out[2]:





'1.13.1'

Requests

Tested with version 2.7 (this environment has 2.18.4). Required for using the Shim interface to SciDB.



In [3]:

    
import requests
requests.__version__









    Out[3]:





'2.18.4'

Pandas (optional)

Tested with version 0.15 (this environment has 0.20.3). Required only for importing/exporting SciDB arrays as Pandas DataFrame objects.



In [4]:

    
import pandas as pd
pd.__version__









    Out[4]:





'0.20.3'

SciPy (optional)

Tested with versions 0.10-0.12 (this environment has 0.19.0). Required only for importing/exporting SciDB arrays as SciPy sparse matrices.



In [5]:

    
import scipy
scipy.__version__









    Out[5]:





'0.19.0'
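The manual checks above could be gathered into one helper. The following is a minimal sketch, not part of SciDB-Py: the names `parse_version`, `meets_minimum`, `check_prerequisites` and `MINIMUMS` are hypothetical, and treating the "tested with" versions as minimums is an assumption. Versions are compared as integer tuples because plain string comparison would rank '1.13.1' below '1.9'.

```python
import importlib


def parse_version(text):
    """Turn '1.13.1' into (1, 13, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


# Assumption: the "tested with" versions are used here as lower bounds.
MINIMUMS = {"numpy": "1.9", "requests": "2.7", "pandas": "0.15", "scipy": "0.10"}


def check_prerequisites(minimums=MINIMUMS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        version = getattr(module, "__version__", "")
        if meets_minimum(version, minimum):
            report[name] = "ok"
        else:
            report[name] = "older than " + minimum
    return report


print(check_prerequisites())
```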

2) Import scidb-py

pip install git+http://github.com/paradigm4/scidb-py.git



In [6]:

    
import scidbpy
scidbpy.__version__









    Out[6]:





'16.9.1'



In [7]:

    
from scidbpy import connect

Connect to the database server



In [8]:

    
sdb = connect('http://localhost:8080')

3) Read the file that lists each of the waveforms



In [9]:

    
import urllib.request  # in Python 2 this functionality lives in urllib2
target_url = "https://physionet.org/physiobank/database/mimic3wdb/matched/RECORDS-waveforms"
data = urllib.request.urlopen(target_url)  # file-like object; it can be read like a file



In [10]:

    
lines = data.readlines()  # each entry is a bytes object
line = str(lines[2])      # str() keeps the b'...' wrapper and the escaped newline
line









    Out[10]:





"b'p00/p000033/p000033-2116-12-24-12-35\\n'"

Strip the special characters



In [11]:

    
# Remove the b' prefix, the closing quote and the literal \n left over from str()
line = line.replace('b\'','').replace('\'','').replace('\\n','')
splited = line.split("/")
splited









    Out[11]:





['p00', 'p000033', 'p000033-2116-12-24-12-35']



In [12]:

    
# carpeta = folder, subCarpeta = subfolder, onda = waveform record name
carpeta, subCarpeta, onda = line.split("/")
carpeta = carpeta + "/" + subCarpeta
onda









    Out[12]:





'p000033-2116-12-24-12-35'
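The cells above convert the bytes line with `str()` and then strip the `b'`, quote and `\n` residue by hand. As an alternative sketch, the bytes can be decoded directly, which avoids the residue altogether. The helper name `parse_record_line` is hypothetical, and ASCII encoding of the RECORDS file is an assumption.

```python
def parse_record_line(raw):
    """Split one bytes line of the RECORDS file into (folder, record name)."""
    # Assumption: the file is plain ASCII; strip() drops the trailing newline.
    text = raw.decode("ascii").strip()
    # Everything before the last "/" is the folder, the rest is the record.
    folder, record = text.rsplit("/", 1)
    return folder, record


folder, record = parse_record_line(b"p00/p000033/p000033-2116-12-24-12-35\n")
print(folder)  # p00/p000033
print(record)  # p000033-2116-12-24-12-35
```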

4) Import WFDB to connect to PhysioNet



In [13]:

    
import wfdb



In [14]:

    
# Override the values parsed above with a specific record for this example
carpeta = "p05/p050140"
onda = "p050140-2188-07-26-05-51"



In [15]:

    
# Read the record directly from PhysioNet, starting at sample 10000
sig, fields = wfdb.srdsamp(onda,pbdir='mimic3wdb/matched/'+carpeta, sampfrom=10000)



In [16]:

    
print(sig)
print("signame: " + str(fields['signame']))
print("units: " + str(fields['units']))
print("fs: " + str(fields['fs']))
print("comments: " + str(fields['comments']))
print("fields: " + str(fields))









    



[[ 0.4921875   0.33858268         nan ...,         nan         nan
          nan]
 [ 0.484375    0.33858268         nan ...,         nan         nan
          nan]
 [ 0.4765625   0.33070866         nan ...,         nan         nan
          nan]
 ..., 
 [        nan         nan         nan ...,         nan         nan
   0.36470588]
 [        nan         nan         nan ...,         nan         nan
   0.36862745]
 [        nan         nan         nan ...,         nan         nan
   0.38039216]]
signame: ['aVR', 'II', 'I', 'III', 'ABP', 'CVP', 'PLETH']
units: ['mV', 'mV', 'mV', 'mV', 'mmHg', 'mmHg', 'NU']
fs: 125
comments: ['Location: micu']
fields: {'signame': ['aVR', 'II', 'I', 'III', 'ABP', 'CVP', 'PLETH'], 'comments': ['Location: micu'], 'units': ['mV', 'mV', 'mV', 'mV', 'mmHg', 'mmHg', 'NU'], 'fs': 125}

Find the position of the lead II signal



In [17]:

    
signalII = None
try:
    signalII = fields['signame'].index("II")
except ValueError:
    print("List does not contain value")
if signalII is not None:
    print("List contain value")









    



List contain value
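The lookup above can be wrapped in a small function. This is only a sketch: `find_signal` is a hypothetical name, and `fields_example` copies the signal names printed earlier. The result is compared with `is not None` rather than tested for truthiness, because a signal at position 0 (here 'aVR') would otherwise be treated as missing.

```python
def find_signal(fields, name):
    """Return the column index of a signal name, or None if it is absent."""
    try:
        return fields["signame"].index(name)
    except ValueError:
        return None


# Signal names copied from the output shown earlier in this notebook.
fields_example = {"signame": ["aVR", "II", "I", "III", "ABP", "CVP", "PLETH"]}

print(find_signal(fields_example, "II"))   # 1
print(find_signal(fields_example, "aVR"))  # 0, a valid index even though it is falsy
print(find_signal(fields_example, "V"))    # None
```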

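Normalize the signal and remove its null (NaN) values. Note that the normalization call is commented out in the cell below, so only the NaN removal actually runs.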



In [18]:

    
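#array = wfdb.processing.normalize(x=sig[:, signalII], lb=-2, ub=2)
array = sig[:, signalII]          # column that holds lead II
array = array[~np.isnan(array)]   # drop the NaN samples
arrayNun = np.trim_zeros(array)   # also drop leading/trailing zeros; used later as an emptiness check
array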









    Out[18]:





array([ 0.33858268,  0.33858268,  0.33070866, ..., -0.38582677,
       -0.44094488, -0.48031496])
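To make the cleaning step easier to follow, here is a small sketch on made-up samples. The `rescale` helper is hypothetical: it is a plain min-max rescaling offered as a stand-in for the commented-out normalization call, not the wfdb implementation.

```python
import numpy as np

# Made-up samples standing in for one channel of `sig`.
samples = np.array([np.nan, 0.0, 0.5, np.nan, -0.25, 0.0])

clean = samples[~np.isnan(samples)]  # boolean mask keeps only non-NaN values
trimmed = np.trim_zeros(clean)       # removes zeros at both ends only


def rescale(values, lb=-2.0, ub=2.0):
    """Hypothetical min-max rescaling of values into [lb, ub]."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant signal has no range; map it to the middle of the interval.
        return np.full_like(values, (lb + ub) / 2.0)
    return (values - lo) * (ub - lb) / (hi - lo) + lb


print(clean.tolist())    # [0.0, 0.5, -0.25, 0.0]
print(trimmed.tolist())  # [0.5, -0.25]
print(rescale(np.array([0.0, 5.0, 10.0])).tolist())  # [-2.0, 0.0, 2.0]
```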

Replace the hyphens "-" with underscores "_", because SciDB has trouble with hyphens in array names.

If the array is not empty after cleaning, upload it to SciDB.



In [19]:

    
ondaName = onda.replace("-", "_")
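If other record names contain further problematic characters, a more general cleanup may help. The sketch below assumes that SciDB array names should contain only letters, digits and underscores; that rule is an assumption, and `to_array_name` is a hypothetical helper.

```python
import re


def to_array_name(record):
    """Replace every character that is not a letter, digit or underscore."""
    # Assumption: only [0-9A-Za-z_] is safe in a SciDB array name.
    return re.sub(r"[^0-9A-Za-z_]", "_", record)


print(to_array_name("p050140-2188-07-26-05-51"))  # p050140_2188_07_26_05_51
```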



In [20]:

    
if arrayNun.size>0 :
    sdb.input(upload_data=array).store(ondaName,gc=False)
#    sdb.iquery("store(input(<x:int64>[i], '{fn}', 0, '{fmt}'), "+ondaName+")", upload_data=array)

Check the list of arrays in SciDB



In [21]:

    
dir(sdb.arrays)









    Out[21]:





['p050140_2188_07_26_05_51']
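The notebook loads a single record. To process every line of the RECORDS file, the same steps could be wrapped in a loop. The following is a sketch under stated assumptions: `load_records` is hypothetical, and the reading and storing steps are passed in as callables (`read_signal`, `store`) so the example runs without PhysioNet or SciDB. In practice they would wrap the wfdb and `sdb.input(...).store(...)` calls shown earlier. The signal values are made up.

```python
import numpy as np


def load_records(lines, read_signal, store):
    """Apply the per-record steps to every bytes line of a RECORDS file.

    read_signal(folder, record) -> 1-D sequence of floats (hypothetical callable)
    store(name, values)         -> persists the array     (hypothetical callable)
    Returns the list of array names that were stored.
    """
    stored = []
    for raw in lines:
        path = raw.decode("ascii").strip()
        if not path:
            continue  # skip blank lines
        folder, record = path.rsplit("/", 1)
        values = np.asarray(read_signal(folder, record), dtype=float)
        values = values[~np.isnan(values)]   # drop NaN samples
        if np.trim_zeros(values).size == 0:  # nothing useful left
            continue
        name = record.replace("-", "_")      # hyphens are replaced, as above
        store(name, values)
        stored.append(name)
    return stored


# Stand-ins for PhysioNet and SciDB, using made-up signal values.
fake_signals = {
    "p000033-2116-12-24-12-35": [np.nan, np.nan],
    "p050140-2188-07-26-05-51": [np.nan, 0.33, 0.34],
}
uploaded = {}
names = load_records(
    [b"p00/p000033/p000033-2116-12-24-12-35\n",
     b"p05/p050140/p050140-2188-07-26-05-51\n"],
    read_signal=lambda folder, record: fake_signals[record],
    store=lambda name, values: uploaded.update({name: values}),
)
print(names)  # ['p050140_2188_07_26_05_51']
```

Passing the two callables in keeps the loop independent of any particular wfdb or scidbpy version, which also makes it easy to dry-run before touching the database.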