Loading data into SciDB

1) Check prerequisites

Python

SciDB-Py requires Python 2.6-2.7 or 3.3


In [1]:
import sys
sys.version_info


Out[1]:
sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)

NumPy

Tested with version 1.9 (this notebook uses 1.13.1).


In [2]:
import numpy as np
np.__version__


Out[2]:
'1.13.1'

Requests

Tested with version 2.7 (this notebook uses 2.18.4). Required for using the Shim interface to SciDB.


In [3]:
import requests
requests.__version__


Out[3]:
'2.18.4'

Pandas (optional)

Tested with version 0.15 (this notebook uses 0.20.3). Required only for importing/exporting SciDB arrays as Pandas DataFrame objects.


In [4]:
import pandas as pd
pd.__version__


Out[4]:
'0.20.3'

SciPy (optional)

Tested with versions 0.10-0.12 (this notebook uses 0.19.0). Required only for importing/exporting SciDB arrays as SciPy sparse matrices.


In [5]:
import scipy
scipy.__version__


Out[5]:
'0.19.0'
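The individual version checks above can be folded into one loop. The following is a minimal sketch, assuming the "tested with" numbers listed in this section are treated as minimum versions; `TESTED_WITH`, `version_tuple`, `meets_minimum`, and `check_prerequisites` are hypothetical helpers and not part of SciDB-Py.

```python
import importlib

# Assumption: the "tested with" versions from this section act as minimums.
TESTED_WITH = {"numpy": "1.9", "requests": "2.7", "pandas": "0.15", "scipy": "0.10"}


def version_tuple(version):
    """Convert '1.13.1' to (1, 13, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, not as text."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_prerequisites(requirements=TESTED_WITH):
    """Map each package to True/False, or None when it cannot be imported."""
    results = {}
    for name, minimum in requirements.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        installed = getattr(module, "__version__", "0")
        results[name] = meets_minimum(installed, minimum)
    return results
```

Comparing tuples instead of strings matters here: as text, '1.13.1' sorts before '1.9', while as numbers 13 is greater than 9.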

2) Import scidb-py

pip install git+http://github.com/paradigm4/scidb-py.git


In [6]:
import scidbpy
scidbpy.__version__


Out[6]:
'16.9.1'

In [7]:
from scidbpy import connect

Connect to the database server


In [8]:
sdb = connect('http://localhost:8080')
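`connect` talks to SciDB through the Shim HTTP service, so it can help to confirm that Shim is answering before going further. This is a rough sketch that assumes Shim exposes a `/version` endpoint under the same base URL; `shim_version_url` and `shim_is_reachable` are hypothetical helpers, not SciDB-Py functions.

```python
import urllib.error
import urllib.request


def shim_version_url(base_url):
    """Build the URL of the (assumed) Shim version endpoint."""
    return base_url.rstrip("/") + "/version"


def shim_is_reachable(base_url, timeout=5):
    """Return True if the version endpoint answers with HTTP 200."""
    try:
        url = shim_version_url(base_url)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `shim_is_reachable('http://localhost:8080')` could be called before `connect` to fail early with a clearer message.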

3) Read the file that lists each waveform


In [9]:
import urllib.request  # in Python 2 this module is urllib2
target_url = "https://physionet.org/physiobank/database/mimic3wdb/matched/RECORDS-waveforms"
data = urllib.request.urlopen(target_url)  # file-like object; read it like a file

In [10]:
lines = data.readlines()
line = str(lines[2])  # str() on a bytes object keeps the b'...' wrapper
line


Out[10]:
"b'p00/p000033/p000033-2116-12-24-12-35\\n'"

Strip the special characters left over from the bytes representation


In [11]:
line = line.replace('b\'','').replace('\'','').replace('\\n','')
splited = line.split("/")
splited


Out[11]:
['p00', 'p000033', 'p000033-2116-12-24-12-35']

In [12]:
carpeta,subCarpeta,onda = line.split("/")
carpeta = carpeta+"/"+subCarpeta
onda


Out[12]:
'p000033-2116-12-24-12-35'
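The chain of `replace` calls is needed because `str()` was applied to a bytes object. As an alternative sketch, the raw line can be decoded first, which leaves nothing to strip except the newline; `parse_record_line` is a hypothetical helper name, and UTF-8 is an assumed encoding for the records file.

```python
def parse_record_line(raw_line):
    """Split one RECORDS-waveforms entry into (folder, waveform name)."""
    # Assumption: the file is plain ASCII/UTF-8 text.
    text = raw_line.decode("utf-8").strip()
    carpeta, sub_carpeta, onda = text.split("/")
    return carpeta + "/" + sub_carpeta, onda


folder, waveform = parse_record_line(b"p00/p000033/p000033-2116-12-24-12-35\n")
print(folder)    # p00/p000033
print(waveform)  # p000033-2116-12-24-12-35
```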

4) Import WFDB to connect to PhysioNet


In [13]:
import wfdb

In [14]:
carpeta = "p05/p050140"
onda = "p050140-2188-07-26-05-51"

In [15]:
sig, fields = wfdb.srdsamp(onda,pbdir='mimic3wdb/matched/'+carpeta, sampfrom=10000)

In [16]:
print(sig)
print("signame: " + str(fields['signame']))
print("units: " + str(fields['units']))
print("fs: " + str(fields['fs']))
print("comments: " + str(fields['comments']))
print("fields: " + str(fields))


[[ 0.4921875   0.33858268         nan ...,         nan         nan
          nan]
 [ 0.484375    0.33858268         nan ...,         nan         nan
          nan]
 [ 0.4765625   0.33070866         nan ...,         nan         nan
          nan]
 ..., 
 [        nan         nan         nan ...,         nan         nan
   0.36470588]
 [        nan         nan         nan ...,         nan         nan
   0.36862745]
 [        nan         nan         nan ...,         nan         nan
   0.38039216]]
signame: ['aVR', 'II', 'I', 'III', 'ABP', 'CVP', 'PLETH']
units: ['mV', 'mV', 'mV', 'mV', 'mmHg', 'mmHg', 'NU']
fs: 125
comments: ['Location: micu']
fields: {'signame': ['aVR', 'II', 'I', 'III', 'ABP', 'CVP', 'PLETH'], 'comments': ['Location: micu'], 'units': ['mV', 'mV', 'mV', 'mV', 'mmHg', 'mmHg', 'NU'], 'fs': 125}

Find the index of the signal named II


In [17]:
signalII = None
try:
    signalII = fields['signame'].index("II")
except ValueError:
    print("List does not contain value")
if signalII is not None:
    print("List contain value")


List contain value
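The lookup can be wrapped in a small function so it is reusable across records. This is a minimal sketch; `find_signal` is a hypothetical helper, and the signal list is copied from the `signame` field printed above.

```python
def find_signal(signames, name):
    """Return the column index of `name`, or None if the record lacks it."""
    try:
        return signames.index(name)
    except ValueError:
        return None


signames = ['aVR', 'II', 'I', 'III', 'ABP', 'CVP', 'PLETH']
print(find_signal(signames, "II"))  # 1
print(find_signal(signames, "V"))   # None
```

The result should be compared with `is not None` rather than tested for truthiness, because index 0 is a valid position but evaluates as false.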

Remove the null (NaN) values from the signal; the normalization step is left commented out


In [18]:
#array = wfdb.processing.normalize(x=sig[:, signalII], lb=-2, ub=2)
array = sig[:, signalII]
array = array[~np.isnan(sig[:, signalII])]
arrayNun = np.trim_zeros(array)
array


Out[18]:
array([ 0.33858268,  0.33858268,  0.33070866, ..., -0.38582677,
       -0.44094488, -0.48031496])
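A toy example may make the two filtering steps easier to follow. It uses made-up sample values, not data from the record: boolean masking with `np.isnan` drops NaN entries anywhere in the array, while `np.trim_zeros` only strips zeros from the two ends.

```python
import numpy as np

# Made-up samples for illustration only.
signal = np.array([0.0, np.nan, 0.5, np.nan, -0.25, 0.0])

clean = signal[~np.isnan(signal)]  # keep every entry that is not NaN
trimmed = np.trim_zeros(clean)     # drop leading and trailing zeros only

print(clean.tolist())    # [0.0, 0.5, -0.25, 0.0]
print(trimmed.tolist())  # [0.5, -0.25]
```

In the notebook, the trimmed copy `arrayNun` is only used to test whether any non-zero samples remain; the array that is uploaded is the NaN-free `array`, zeros included.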

Replace the hyphens "-" with underscores "_", because for some reason SciDB has problems with these characters. If the array without null values is not empty, upload it to SciDB.


In [19]:
ondaName = onda.replace("-", "_")
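If other record names turn out to contain more than hyphens, the replacement can be generalized. The sketch below assumes that letters, digits, and underscores are safe in a SciDB array name, which this notebook only demonstrates for the one name it stores; `to_scidb_name` is a hypothetical helper.

```python
import re


def to_scidb_name(record_name):
    """Replace every character other than letters, digits, and '_' with '_'."""
    # Assumption: only [0-9A-Za-z_] is safe in a SciDB array name.
    return re.sub(r"[^0-9A-Za-z_]", "_", record_name)


print(to_scidb_name("p050140-2188-07-26-05-51"))  # p050140_2188_07_26_05_51
```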

In [20]:
if arrayNun.size > 0:
    sdb.input(upload_data=array).store(ondaName,gc=False)
#    sdb.iquery("store(input(<x:int64>[i], '{fn}', 0, '{fmt}'), "+ondaName+")", upload_data=array)

Check the list of arrays in SciDB


In [21]:
dir(sdb.arrays)


Out[21]:
['p050140_2188_07_26_05_51']