Product SVD in Python

In this notebook, the reader will find code to load single- or multi-band GeoTiff files from HDFS. Each GeoTiff is read as a byte array and then kept in memory using MemoryFile from the rasterio Python package. Subsequently, a statistical analysis is performed on each pair of datasets. In particular, the Python module productsvd is used to determine the SVD of the product of the two phenology datasets.

Initialization

This section initializes the notebook.

Dependencies

Here, all necessary libraries are imported.


In [75]:
#Add all dependencies to the Python module search path (sys.path)
import sys
sys.path.append("/usr/lib/spark/python")
sys.path.append("/usr/lib/spark/python/lib/py4j-0.10.4-src.zip")
sys.path.append("/usr/lib/python3/dist-packages")
sys.path.append("/data/local/jupyterhub/modules/python")

#Define environment variables
import os
os.environ["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"
os.environ["PYSPARK_PYTHON"] = "python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "ipython"

import subprocess

#Load PySpark to connect to a Spark cluster
from pyspark import SparkConf, SparkContext
from hdfs import InsecureClient
from tempfile import TemporaryFile

#from osgeo import gdal
#To read GeoTiffs as a ByteArray
from io import BytesIO
from rasterio.io import MemoryFile

import numpy as np
import pandas
import datetime
import matplotlib.pyplot as plt
import rasterio
from rasterio import plot
from os import listdir
from os.path import isfile, join
from numpy import exp, log
from numpy.random import standard_normal
import scipy.linalg
from productsvd import qrproductsvd
from sklearn.utils.extmath import randomized_svd

Configuration

This configuration determines whether functions print logs during execution (debugMode) and the maximum number of modes that are plotted (maxModes).


In [76]:
debugMode = True
maxModes = 26

Connect to Spark

Here, the Spark context is created. It is used later to read files from HDFS.


In [77]:
appName = "plot_GeoTiff"
masterURL = "spark://pheno0.phenovari-utwente.surf-hosted.nl:7077"

#Stop any existing context before a new one is created
try:
    sc.stop()
except NameError:
    print("A new Spark Context will be created.")

sc = SparkContext(conf = SparkConf().setAppName(appName).setMaster(masterURL))
conf = sc.getConf()

Functions

This section defines various functions used in the analysis.

Support functions

These helper functions provide timestamped logging, a progress bar, and an HDFS client for the other functions.


In [78]:
def dprint(msg):
    if (debugMode):
        print(str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")) + " | " + msg)

In [79]:
def progressBar(message, value, endvalue, bar_length = 20):
    if (debugMode):
        percent = float(value) / endvalue
        arrow = '-' * int(round(percent * bar_length)-1) + '>'
        spaces = ' ' * (bar_length - len(arrow))
        sys.stdout.write("\r" 
                         + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")) 
                         + " | " 
                         + message 
                         + ": [{0}] {1}%".format(arrow + spaces, int(round(percent * 100)))
                        )
        if value == endvalue:
            sys.stdout.write("\n")
        sys.stdout.flush()

In [80]:
def get_hdfs_client():
    return InsecureClient("emma0.emma.nlesc.nl:50070", user="pheno",
         root="/")

Read functions

These functions read the datasets and the mask from HDFS.


In [81]:
def getDataSet(directoryPath, bandNum):
    dprint("Running getDataSet(directoryPath)")
    
    files = sc.binaryFiles(directoryPath + "/*.tif")
    fileList = files.keys().collect()
    dprint("Number of files: " + str(len(fileList)))
    dataSet = []
    plotShapes = []
    flattenedShapes = []
    for i, f in enumerate(sorted(fileList)):
        print(f)
        #progressBar("Reading files", i + 1, len(fileList))
        data = files.lookup(f)
        dataByteArray = bytearray(data[0])
        memfile = MemoryFile(dataByteArray)
        dataset = memfile.open()
        relevantBand = np.array(dataset.read()[bandNum])
        memfile.close()
        plotShapes.append(relevantBand.shape)
        flattenedDataSet = relevantBand.flatten()
        flattenedShapes.append(flattenedDataSet.shape)
        dataSet.append(flattenedDataSet)
    dataSet = np.array(dataSet).T
    dprint("dataSet.shape: " + str(dataSet.shape))
    
    dprint("Ending getDataSet(directoryPath)")
    return dataSet
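
The shape handling in getDataSet can be illustrated on toy arrays. The following is a minimal sketch, assuming three hypothetical 2 x 3 rasters (one per year) in place of the GeoTiff bands; the names `rasters` and `stacked` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Three hypothetical 2x3 rasters, one per year (stand-ins for GeoTiff bands).
rasters = [np.full((2, 3), year, dtype=np.float64) for year in (1989, 1990, 1991)]

# Flatten each raster to a vector of pixels, stack the vectors as rows,
# then transpose so that rows are pixels and columns are years.
stacked = np.array([r.flatten() for r in rasters]).T

print(stacked.shape)  # (6, 3): 6 pixels, 3 years
```

Applied to an 840 x 1684 grid, the same steps give 840 * 1684 = 1414560 rows, which matches the dataSet.shape reported in the logs of the analyses.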

In [82]:
def getMask(filePath):
    dprint("Running getMask(filePath)")
    
    mask_data = sc.binaryFiles(filePath).take(1)
    mask_byteArray = bytearray(mask_data[0][1])
    mask_memfile = MemoryFile(mask_byteArray)
    mask_dataset = mask_memfile.open()
    maskTransform = mask_dataset.transform
    mask_data = np.array(mask_dataset.read()[0])
    mask_memfile.close()
    dprint("mask_data.shape: " + str(mask_data.shape))
    
    dprint("Ending getMask(filePath)")
    return mask_data, maskTransform

Utility functions

These functions analyse and manipulate data.


In [83]:
def filterDataSet(dataSet, maskData):
    dprint("Running filterDataSet(dataSet, maskIndex)")
    
    maskIndex = np.nonzero(np.nan_to_num(maskData.flatten()))[0]
    dataSetFiltered = dataSet[maskIndex]
    dprint("dataSetFiltered.shape: " + str(dataSetFiltered.shape))
    
    dprint("Ending filterDataSet(dataSet, maskIndex)")
    return dataSetFiltered
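
The effect of the mask can be checked on a small example. This is a sketch under the assumption that non-zero mask values mark pixels to keep and that NaN marks no-data; the arrays `mask` and `data` are made up for illustration.

```python
import numpy as np

# Hypothetical 2x3 mask: non-zero marks pixels to keep, NaN marks no-data.
mask = np.array([[1.0, 0.0, np.nan],
                 [0.0, 1.0, 1.0]])

# Toy data set with one row per pixel (6 pixels) and two time steps.
data = np.arange(12, dtype=np.float64).reshape(6, 2)

# nan_to_num turns NaN into 0, so NaN pixels are dropped together with zeros.
mask_index = np.nonzero(np.nan_to_num(mask.flatten()))[0]
filtered = data[mask_index]

print(mask_index)      # [0 4 5]
print(filtered.shape)  # (3, 2)
```

Only the rows of the retained pixels survive, so the filtered matrix has as many rows as the mask has non-zero entries.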

In [84]:
def validateNorms(dataSet1, dataSet2, U, s, V):
    dprint("Running validateNorms(dataSet1, dataSet2, U, s, V)")
    
    length = len(s)
    norms = []
    for i in range(length):
        progressBar("Validating norms", i + 1, length)
        u = dataSet1 @ (dataSet2.T @ V.T[i]) / s[i]
        v = dataSet2 @ (dataSet1.T @ U.T[i]) / s[i]
        norms.append(scipy.linalg.norm(U.T[i] - u))
        norms.append(scipy.linalg.norm(V.T[i] - v))
    dprint("Largest norm difference: " + str(max(norms)))
    
    dprint("Ending validateNorms(dataSet1, dataSet2, U, s, V)")
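
The check in validateNorms relies on two identities: if A Bᵀ = U diag(s) Vᵀ, then A Bᵀ vᵢ = sᵢ uᵢ and B Aᵀ uᵢ = sᵢ vᵢ for every mode with a non-zero singular value. The sketch below verifies this on small random matrices. It forms the product explicitly and uses numpy.linalg.svd, which is an assumption made for the toy size and not the method used by productsvd.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))  # hypothetical data set 1: 8 pixels, 3 years
B = rng.standard_normal((8, 3))  # hypothetical data set 2: 8 pixels, 3 years

# For this toy size the 8x8 product can be formed explicitly.
U, s, Vt = np.linalg.svd(A @ B.T, full_matrices=False)
V = Vt.T

# The product has rank at most 3, so only the first 3 modes are checked.
k = 3
errors = []
for i in range(k):
    u = A @ (B.T @ V[:, i]) / s[i]  # should reproduce U[:, i]
    v = B @ (A.T @ U[:, i]) / s[i]  # should reproduce V[:, i]
    errors.append(np.linalg.norm(U[:, i] - u))
    errors.append(np.linalg.norm(V[:, i] - v))

print(max(errors))  # expected to be near machine precision
```

Grouping the products as A @ (B.T @ v) keeps every intermediate result a vector, which is why the check stays cheap even when the number of pixels is large.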

Write functions

These functions write data and plots.


In [85]:
def writeCSVs(resultDirectory, U, s, V):
    dprint("Running writeCSV(resultDirectory, U, s, V)")
    
    for i, vectorData in enumerate([U, s, V]):
        progressBar("Writing CSV", i + 1, 3)
        fileName = ["U", "s", "V"][i] + ".csv"
        inFile = "/tmp/" + fileName
        outFile = resultDirectory + fileName
        #decompositionFile = open(inFile, "w")
        #vectorData.T.tofile(decompositionFile, sep = ",")
        #decompositionFile.close()
        #np.savetxt(inFile, vectorData.T, fmt='%.12f', delimiter=',')
        np.savetxt(inFile, vectorData.T, delimiter=',')
        #Upload to HDFS
        subprocess.run(['hadoop', 'dfs', '-copyFromLocal', '-f', inFile, outFile])
        #Remove from /tmp/
        subprocess.run(['rm', '-fr', inFile])
    
    dprint("Ending writeCSV(resultDirectory, U, s, V)")
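
The CSV layout produced by np.savetxt on a transposed matrix can be seen in a small round trip. This sketch assumes a hypothetical 3 x 2 matrix `U` and writes to a temporary directory instead of /tmp/ and HDFS.

```python
import os
import tempfile
import numpy as np

# Hypothetical matrix with 3 pixels (rows) and 2 modes (columns).
U = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "U.csv")
    # Transposing first means that each CSV row holds one mode.
    np.savetxt(path, U.T, delimiter=",")
    loaded = np.loadtxt(path, delimiter=",")

print(loaded.shape)  # (2, 3): one row per mode
```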

In [86]:
def plotSingularValues(resultDirectory, s):
    dprint("Running plotSingularValues(resultDirectory, s)")
    
    fileName = "s.pdf"
    inFile = "/tmp/" + fileName
    outFile = resultDirectory + fileName
    x = range(len(s))
    total = s.T @ s
    cumulativeValue = 0
    valueList = []
    cumulativeList = []
    for i in x:
        value = np.square(s[i]) / total
        valueList.append(value)
        cumulativeValue = cumulativeValue + value
        cumulativeList.append(cumulativeValue)
    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()
    ax1.plot(x, valueList, "g^")
    ax2.plot(x, cumulativeList, "ro")
    ax1.set_xlabel("Singular values")
    ax1.set_ylabel("Variance explained", color = "g")
    ax2.set_ylabel("Cumulative variance explained", color = "r")
    plt.savefig(inFile)
    plt.clf()
    #Upload to HDFS
    subprocess.run(['hadoop', 'dfs', '-copyFromLocal', '-f', inFile, outFile])  
    #Remove from /tmp/
    subprocess.run(['rm', '-fr', inFile])
    
    dprint("Ending plotSingularValues(resultDirectory, s)")
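
The quantity that plotSingularValues labels "Variance explained" is the squared singular value divided by the sum of all squared singular values. The loop can be written in vectorised form, as in this sketch; the singular values used here are illustrative and not taken from the analyses.

```python
import numpy as np

# Illustrative singular values (not taken from the analyses).
s = np.array([3.0, 2.0, 1.0])

# Fraction per mode: squared singular value over the sum of squares.
fractions = s**2 / np.sum(s**2)

# Running total of the fractions, which ends at 1.
cumulative = np.cumsum(fractions)

print(fractions)
print(cumulative)
```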

In [87]:
def writeModes(resultDirectory, U, s, V):
    dprint("Running writeModes(resultDirectory, U, s, V)")
    
    for i in range(len(s)):
        progressBar("Writing modes", i + 1, len(s))
        fileName = "Mode" + str(i + 1).zfill(2) + ".txt"
        inFile = "/tmp/" + fileName
        outFile = resultDirectory + fileName
        decompositionFile = open(inFile, "w")
        U.T[i].tofile(decompositionFile, sep = ",")
        decompositionFile.close()
        decompositionFile = open(inFile, "a")
        decompositionFile.write("\n")
        s[i].tofile(decompositionFile, sep = ",")
        decompositionFile.write("\n")
        V.T[i].tofile(decompositionFile, sep = ",")
        decompositionFile.close()
        #Upload to HDFS
        subprocess.run(['hadoop', 'dfs', '-copyFromLocal', '-f', inFile, outFile])  
        #Remove from /tmp/
        subprocess.run(['rm', '-fr', inFile])
    
    dprint("Ending writeModes(resultDirectory, U, s, V)")

In [88]:
def plotModes(resultDirectory, U, s, V, maskData, maskTransform):
    dprint("Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)")
    
    plotTemplate = np.full(maskData.shape[0] * maskData.shape[1], np.nan, dtype=np.float64)
    maskIndex = np.nonzero(np.nan_to_num(maskData.flatten()))[0]
        
    for i in range(min(maxModes, len(s))):
        progressBar("Plotting modes", i + 1, min(maxModes, len(s)))
        for vectorData, vectorName in zip([U, V], ["U", "V"]):
            data = np.copy(plotTemplate)
            np.put(data, maskIndex, vectorData.T[i])
            data = np.reshape(data, maskData.shape)
                        
            fileName = "Mode" + vectorName + str(i + 1).zfill(2) + ".tif"
            inFile = "/tmp/" + fileName
            outFile = resultDirectory + fileName
            rasterioPlot = rasterio.open(inFile, "w", driver = "GTiff", width = data.shape[1], height = data.shape[0], count = 1, dtype = data.dtype, crs = "EPSG:4326", transform = maskTransform) #, compress="deflate")
            rasterioPlot.write(data, 1)
            rasterioPlot.close()
            #Upload to HDFS
            subprocess.run(['hadoop', 'dfs', '-copyFromLocal', '-f', inFile, outFile])  
            #Remove from /tmp/
            subprocess.run(['rm', '-fr', inFile])
    
    dprint("Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)")
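
plotModes maps each singular vector, which only has entries for masked pixels, back onto the full raster grid. The following sketch shows that step in isolation, assuming a hypothetical 2 x 3 mask and a made-up mode vector.

```python
import numpy as np

# Hypothetical mask: non-zero marks retained pixels, NaN marks no-data.
mask = np.array([[1.0, 0.0, np.nan],
                 [0.0, 1.0, 1.0]])
mask_index = np.nonzero(np.nan_to_num(mask.flatten()))[0]

# One hypothetical mode: one value per retained pixel.
mode = np.array([0.5, -0.2, 0.7])

# Start from an all-NaN flat grid, write the mode values at the masked
# positions, then restore the 2-D raster shape.
grid = np.full(mask.size, np.nan, dtype=np.float64)
np.put(grid, mask_index, mode)
grid = grid.reshape(mask.shape)

print(grid)
```

Pixels outside the mask stay NaN, so they remain distinguishable from valid values in the resulting raster.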

Analysis function

These functions compute the SVD of the product and combine all the necessary steps for the analysis.


In [89]:
import scipy.linalg
import numpy as np
from numpy import linalg as LA
from sklearn.decomposition import PCA
def qrproductsvdRG(A, B):
        QA, RA = scipy.linalg.qr(A, mode = "economic")
        dprint("QA.shape: " + str(QA.shape))
        dprint("RA.shape: " + str(RA.shape))
        
        QB, RB = scipy.linalg.qr(B, mode = "economic")
        dprint("QB.shape: " + str(QB.shape))
        dprint("RB.shape: " + str(RB.shape))
        
        #C = RA @ RB.T
        C = A @ B.T
        dprint("C.shape: " + str(C.shape))
        #UC, s, VCt = scipy.linalg.svd(C, full_matrices = False)
        U, s, Vt = scipy.linalg.svd(C, full_matrices = True)
        #U = QA @ UC
        #Vt = VCt @ QB.T
        return U, s, Vt
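
The commented-out lines in the function above point to a QR-based route that avoids forming the n x n product A Bᵀ. The source of the productsvd module is not shown in this notebook, so the following is only a sketch of that idea under the assumption that A and B are tall matrices with the same number of rows. The function name `qr_product_svd` is hypothetical, and numpy.linalg is used in place of scipy.linalg.

```python
import numpy as np

def qr_product_svd(A, B):
    """SVD of A @ B.T for tall A (n x p) and B (n x q) via QR factors."""
    QA, RA = np.linalg.qr(A)  # reduced QR: QA is n x p, RA is p x p
    QB, RB = np.linalg.qr(B)  # reduced QR: QB is n x q, RB is q x q
    # A @ B.T = QA @ (RA @ RB.T) @ QB.T, so only a small p x q SVD is needed.
    UC, s, VCt = np.linalg.svd(RA @ RB.T, full_matrices=False)
    U = QA @ UC      # n x k left singular vectors
    Vt = VCt @ QB.T  # k x n right singular vectors (transposed)
    return U, s, Vt

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
B = rng.standard_normal((50, 4))
U, s, Vt = qr_product_svd(A, B)

# Compare with the singular values of the explicitly formed product.
s_direct = np.linalg.svd(A @ B.T, compute_uv=False)[:4]
print(np.allclose(s, s_direct))
```

With this approach U and V each have one column per time step, which is consistent with the shapes (23926, 26) reported in the log of analysis 0.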

In [90]:
def runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory):
    dprint("Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)")

    dataSet1 = getDataSet(dataDirectory1, bandNum1)
    dataSet2 = getDataSet(dataDirectory2, bandNum2)
    
    if (dataSet2.shape[1] == 26 and dataSet1.shape[1] != 26): # Hack to align time-dimension of SOS with Bloom and Leaf
        dataSet1 = dataSet1[:, 9:35]
    
    maskData, maskTransform = getMask(maskFile)
    
    dataSetFiltered1 = filterDataSet(dataSet1, maskData)
    dataSetFiltered2 = filterDataSet(dataSet2, maskData)
    
    U, s, Vt = qrproductsvd(dataSetFiltered1, dataSetFiltered2)
    V = Vt.T
    dprint("U.shape: " + str(U.shape))
    dprint("s.shape: " + str(s.shape))
    dprint("V.shape: " + str(V.shape))
    dprint("Singular values of product: ")
    dprint(str(s))
    
    validateNorms(dataSetFiltered1, dataSetFiltered2, U, s, V)
    
    plotSingularValues(resultDirectory, s)
    #writeModes(resultDirectory, U, s, V)
    plotModes(resultDirectory, U, s, V, maskData, maskTransform)
    writeCSVs(resultDirectory, U, s, V)
    
    dprint("Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)")
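
The slice 9:35 in runAnalysis selects 26 of the 37 yearly columns. The sketch below shows which years that corresponds to, assuming the longer series covers 1980 to 2016 with one file per year; the coverage of the 26-column dataset is an assumption and is not confirmed by the notebook.

```python
# Assumed year range of the longer series (37 files, 1980 to 2016).
years = list(range(1980, 2017))

# The slice used in runAnalysis keeps columns 9 to 34 inclusive.
aligned = years[9:35]

print(len(aligned), aligned[0], aligned[-1])  # 26 1989 2014
```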

Analyses

In this section, the various analyses are initiated. Each analysis uses a different pair of datasets.

Analysis 0


In [57]:
dprint("-------------------------------")
dprint("Running analysis 0")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/BloomGridmet/"
bandNum1 = 3
dataDirectory2 = "hdfs:///user/hadoop/spring-index/LeafGridmet/"
bandNum2 = 3
maskFile = "hdfs:///user/hadoop/usa_state_masks/california_4km.tif"
resultDirectory = "hdfs:///user/emma/svd/BloomGridmetLeafGridmetCali/"

#Create Result dir
subprocess.run(['hadoop', 'dfs', '-mkdir', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 0")
dprint("-------------------------------")


2017-12-21 11:36:45 | -------------------------------
2017-12-21 11:36:45 | Running analysis 0
2017-12-21 11:36:45 | -------------------------------
2017-12-21 11:36:47 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 11:36:47 | Running getDataSet(directoryPath)
2017-12-21 11:36:51 | Number of files: 37
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1989.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1990.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1991.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1992.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1993.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1994.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1995.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1996.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1997.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1998.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1999.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2000.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2001.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2002.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2003.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2004.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2005.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2006.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2007.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2008.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2009.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2010.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2011.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2012.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2013.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2014.tif
2017-12-21 11:38:35 | dataSet.shape: (1414560, 26)
2017-12-21 11:38:35 | Ending getDataSet(directoryPath)
2017-12-21 11:38:35 | Running getDataSet(directoryPath)
2017-12-21 11:38:39 | Number of files: 37
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1989.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1990.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1991.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1992.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1993.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1994.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1995.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1996.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1997.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1998.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1999.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2000.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2001.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2002.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2003.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2004.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2005.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2006.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2007.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2008.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2009.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2010.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2011.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2012.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2013.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2014.tif
2017-12-21 11:40:23 | dataSet.shape: (1414560, 26)
2017-12-21 11:40:23 | Ending getDataSet(directoryPath)
2017-12-21 11:40:23 | Running getMask(filePath)
2017-12-21 11:40:23 | mask_data.shape: (840, 1684)
2017-12-21 11:40:23 | Ending getMask(filePath)
2017-12-21 11:40:23 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 11:40:23 | dataSetFiltered.shape: (23926, 26)
2017-12-21 11:40:23 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 11:40:23 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 11:40:23 | dataSetFiltered.shape: (23926, 26)
2017-12-21 11:40:23 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 11:40:23 | U.shape: (23926, 26)
2017-12-21 11:40:23 | s.shape: (26,)
2017-12-21 11:40:23 | V.shape: (23926, 26)
2017-12-21 11:40:23 | Singular values of product: 
2017-12-21 11:40:23 | [  3.42122154e+09   1.18518663e+07   3.76387558e+06   2.18833940e+06
   1.84970873e+06   1.23104283e+06   9.87060894e+05   8.45422467e+05
   7.19598867e+05   6.36951611e+05   5.07628090e+05   4.53077579e+05
   4.06319809e+05   3.49696262e+05   3.36429837e+05   3.13949669e+05
   2.81085459e+05   2.44623309e+05   2.40696722e+05   2.29616231e+05
   2.14230881e+05   1.74395196e+05   1.60563107e+05   1.48403577e+05
   1.40796915e+05   1.25024597e+05]
2017-12-21 11:40:23 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 11:40:23 | Validating norms: [------------------->] 100%
2017-12-21 11:40:23 | Largest norm difference: 2.5083465696902787e-12
2017-12-21 11:40:23 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 11:40:23 | Running plotSingularValues(resultDirectory, s)
2017-12-21 11:40:25 | Ending plotSingularValues(resultDirectory, s)
2017-12-21 11:40:25 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 11:42:15 | Plotting modes: [------------------->] 100%
2017-12-21 11:42:20 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 11:42:20 | Running writeCSV(resultDirectory, U, s, V)
2017-12-21 11:42:25 | Writing CSV: [------------------->] 100%
2017-12-21 11:42:28 | Ending writeCSV(resultDirectory, U, s, V)
2017-12-21 11:42:28 | Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 11:42:28 | -------------------------------
2017-12-21 11:42:28 | Ending analysis 0
2017-12-21 11:42:28 | -------------------------------

Analysis 1

This analysis focusses on Bloom and Leaf data from the USA from 1980 to 2016 at a 4 km spatial resolution.


In [58]:
dprint("-------------------------------")
dprint("Running analysis 1")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/BloomGridmet/"
bandNum1 = 3
dataDirectory2 = "hdfs:///user/hadoop/spring-index/LeafGridmet/"
bandNum2 = 3
maskFile = "hdfs:///user/hadoop/usa_mask_gridmet.tif"
resultDirectory = "hdfs:///user/emma/svd/BloomGridmetLeafGridmet/"

#Create Result dir
subprocess.run(['hadoop', 'dfs', '-mkdir', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 1")
dprint("-------------------------------")


2017-12-21 12:03:54 | -------------------------------
2017-12-21 12:03:54 | Running analysis 1
2017-12-21 12:03:54 | -------------------------------
2017-12-21 12:03:56 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 12:03:56 | Running getDataSet(directoryPath)
2017-12-21 12:04:03 | Number of files: 37
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1989.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1990.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1991.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1992.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1993.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1994.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1995.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1996.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1997.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1998.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1999.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2000.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2001.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2002.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2003.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2004.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2005.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2006.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2007.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2008.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2009.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2010.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2011.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2012.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2013.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2014.tif
2017-12-21 12:05:47 | dataSet.shape: (1414560, 26)
2017-12-21 12:05:47 | Ending getDataSet(directoryPath)
2017-12-21 12:05:47 | Running getDataSet(directoryPath)
2017-12-21 12:05:50 | Number of files: 37
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1989.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1990.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1991.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1992.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1993.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1994.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1995.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1996.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1997.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1998.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1999.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2000.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2001.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2002.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2003.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2004.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2005.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2006.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2007.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2008.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2009.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2010.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2011.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2012.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2013.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2014.tif
2017-12-21 12:07:35 | dataSet.shape: (1414560, 26)
2017-12-21 12:07:35 | Ending getDataSet(directoryPath)
2017-12-21 12:07:35 | Running getMask(filePath)
2017-12-21 12:07:35 | mask_data.shape: (840, 1684)
2017-12-21 12:07:35 | Ending getMask(filePath)
2017-12-21 12:07:35 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 12:07:35 | dataSetFiltered.shape: (483850, 26)
2017-12-21 12:07:35 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 12:07:35 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 12:07:35 | dataSetFiltered.shape: (483850, 26)
2017-12-21 12:07:35 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 12:07:37 | U.shape: (483850, 26)
2017-12-21 12:07:37 | s.shape: (26,)
2017-12-21 12:07:37 | V.shape: (483850, 26)
2017-12-21 12:07:37 | Singular values of product: 
2017-12-21 12:07:37 | [  1.20393198e+11   1.65824193e+08   1.13588710e+08   8.93202326e+07
   6.23531013e+07   5.03768472e+07   3.68186474e+07   3.03093586e+07
   2.26468683e+07   1.97819321e+07   1.72451327e+07   1.39304915e+07
   1.37143603e+07   1.25302369e+07   1.15682303e+07   1.10030463e+07
   9.60962237e+06   8.89842021e+06   8.36738853e+06   7.77980353e+06
   7.05756099e+06   6.83398216e+06   5.92806850e+06   5.31220309e+06
   4.95018414e+06   4.42031973e+06]
2017-12-21 12:07:37 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 12:07:39 | Validating norms: [------------------->] 100%
2017-12-21 12:07:39 | Largest norm difference: 3.8107363293205226e-11
2017-12-21 12:07:39 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 12:07:39 | Running plotSingularValues(resultDirectory, s)
2017-12-21 12:07:41 | Ending plotSingularValues(resultDirectory, s)
2017-12-21 12:07:41 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 12:09:29 | Plotting modes: [------------------->] 100%
2017-12-21 12:09:33 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 12:09:33 | Running writeCSV(resultDirectory, U, s, V)
2017-12-21 12:09:53 | Writing CSV: [------------------->] 100%
2017-12-21 12:10:10 | Ending writeCSV(resultDirectory, U, s, V)
2017-12-21 12:10:10 | Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 12:10:10 | -------------------------------
2017-12-21 12:10:10 | Ending analysis 1
2017-12-21 12:10:10 | -------------------------------

Analysis 2

This analysis focusses on Bloom and SOS data from the USA from 1980 to 2016 at a 4 km spatial resolution.


In [91]:
dprint("-------------------------------")
dprint("Running analysis 2")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/BloomGridmet/"
bandNum1 = 3
dataDirectory2 = "hdfs:///user/hadoop/avhrr/SOST4Km/"
bandNum2 = 0
maskFile = "hdfs:///user/hadoop/usa_mask_gridmet.tif"
resultDirectory = "hdfs:///user/emma/svd/BloomGridmetSOST4Km/"

#Create Result dir
subprocess.run(['hadoop', 'dfs', '-mkdir', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 2")
dprint("-------------------------------")


2017-12-21 18:53:22 | -------------------------------
2017-12-21 18:53:22 | Running analysis 2
2017-12-21 18:53:22 | -------------------------------
2017-12-21 18:53:24 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 18:53:24 | Running getDataSet(directoryPath)
2017-12-21 18:53:28 | Number of files: 37
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1980.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1981.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1982.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1983.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1984.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1985.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1986.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1987.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1988.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1989.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1990.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1991.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1992.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1993.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1994.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1995.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1996.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1997.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1998.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/1999.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2000.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2001.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2002.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2003.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2004.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2005.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2006.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2007.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2008.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2009.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2010.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2011.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2012.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2013.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2014.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2015.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/BloomGridmet/2016.tif
2017-12-21 18:55:57 | dataSet.shape: (1414560, 37)
2017-12-21 18:55:57 | Ending getDataSet(directoryPath)
2017-12-21 18:55:57 | Running getDataSet(directoryPath)
2017-12-21 18:55:58 | Number of files: 26
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1989v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1990v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1991v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1992v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1993v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1994v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1995v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1996v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1997v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1998v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1999v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2000v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2001v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2002v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2003v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2004v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2005v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2006v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2007v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2008v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2009v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2010v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2011v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2012v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2013v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2014v4_SIx.tif
2017-12-21 18:56:08 | dataSet.shape: (1414560, 26)
2017-12-21 18:56:08 | Ending getDataSet(directoryPath)
2017-12-21 18:56:08 | Running getMask(filePath)
2017-12-21 18:56:09 | mask_data.shape: (840, 1684)
2017-12-21 18:56:09 | Ending getMask(filePath)
2017-12-21 18:56:09 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 18:56:09 | dataSetFiltered.shape: (483850, 26)
2017-12-21 18:56:09 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 18:56:09 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 18:56:09 | dataSetFiltered.shape: (483850, 26)
2017-12-21 18:56:09 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 18:56:10 | U.shape: (483850, 26)
2017-12-21 18:56:10 | s.shape: (26,)
2017-12-21 18:56:10 | V.shape: (483850, 26)
2017-12-21 18:56:10 | Singular values of product: 
2017-12-21 18:56:10 | [  3.71029878e+11   2.55966641e+09   1.73851881e+09   1.43051532e+09
   1.19270739e+09   1.08619555e+09   9.63744419e+08   8.90094998e+08
   7.20049871e+08   5.62796152e+08   5.44605589e+08   5.27994494e+08
   4.97561741e+08   4.63842640e+08   4.07629855e+08   3.89378712e+08
   3.63498220e+08   3.49329156e+08   3.33183724e+08   3.13569361e+08
   3.06873159e+08   2.81878392e+08   2.77879538e+08   2.44100042e+08
   2.28775110e+08   1.42070662e+08]
2017-12-21 18:56:10 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 18:56:13 | Validating norms: [------------------->] 100%
2017-12-21 18:56:14 | Largest norm difference: 5.6253930444886045e-12
2017-12-21 18:56:14 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 18:56:14 | Running plotSingularValues(resultDirectory, s)
2017-12-21 18:56:16 | Ending plotSingularValues(resultDirectory, s)
2017-12-21 18:56:16 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 18:58:03 | Plotting modes: [------------------->] 100%
2017-12-21 18:58:07 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 18:58:07 | Running writeCSV(resultDirectory, U, s, V)
2017-12-21 18:58:27 | Writing CSV: [------------------->] 100%
2017-12-21 18:58:45 | Ending writeCSV(resultDirectory, U, s, V)
2017-12-21 18:58:45 | Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 18:58:45 | -------------------------------
2017-12-21 18:58:45 | Ending analysis 2
2017-12-21 18:58:45 | -------------------------------
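
The shapes in the log above (U and V with one row per unmasked pixel, 26 singular values) are what a QR-based product SVD would return. The following is a minimal sketch of that general technique in plain NumPy, not the code of the productsvd module: the function name `qr_product_svd`, the random test matrices and the norm check are assumptions made for illustration, and the notebook's own validateNorms may test something different.

```python
import numpy as np

def qr_product_svd(a, b):
    """Thin SVD of a @ b.T without forming the (pixels x pixels) product."""
    qa, ra = np.linalg.qr(a)  # a = qa @ ra, with qa (n, k) and ra (k, k)
    qb, rb = np.linalg.qr(b)  # b = qb @ rb
    # a @ b.T = qa @ (ra @ rb.T) @ qb.T, so only a small (k x k) SVD is needed
    ur, s, vrt = np.linalg.svd(ra @ rb.T)
    return qa @ ur, s, qb @ vrt.T

# Hypothetical stand-ins for the two filtered datasets (pixels x years)
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 26))
b = rng.standard_normal((500, 26))
u, s, v = qr_product_svd(a, b)

# For each mode i, (a @ b.T) @ v_i = s_i * u_i, so its norm should equal s_i
diffs = [abs(np.linalg.norm(a @ (b.T @ v[:, i])) - s[i]) for i in range(s.size)]
largest_norm_difference = max(diffs)
print(u.shape, s.shape, v.shape)
print("Largest norm difference:", largest_norm_difference)
```

The point of the factorisation is that the only dense SVD is on a 26 x 26 matrix, which is why the decomposition step in the log takes about a second even with 483850 pixels.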

Analysis 3

This analysis focusses on Leaf and SOS data for the USA at a 4 km spatial resolution. The Leaf files span 1980 to 2016 and the SOS files span 1989 to 2014.
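
Since the two file series differ in length (37 spring-index files against 26 SOST files in the logs), the yearly files have to be matched before the product is formed. How the notebook does this is not visible in this section. The sketch below shows one possible approach, matching on the year embedded in each file name; `year_of` and the shortened example paths are hypothetical.

```python
import re

def year_of(path):
    """Extract the four-digit year from a file name such as
    '1980.tif' or 'av_SOST1989v4_SIx.tif'."""
    name = path.rsplit("/", 1)[-1]
    match = re.search(r"(19|20)\d{2}", name)
    if match is None:
        raise ValueError("no year found in " + name)
    return int(match.group(0))

# Hypothetical listings that mimic the naming scheme seen in the logs
files1 = ["LeafGridmet/%d.tif" % y for y in range(1980, 2017)]
files2 = ["SOST4Km/av_SOST%dv4_SIx.tif" % y for y in range(1989, 2015)]

by_year1 = {year_of(f): f for f in files1}
by_year2 = {year_of(f): f for f in files2}

# Keep only the years present in both series, in chronological order
common_years = sorted(set(by_year1) & set(by_year2))
pairs = [(by_year1[y], by_year2[y]) for y in common_years]

print(len(files1), len(files2), len(pairs))  # 37 26 26
```

Matching on the year rather than on list position keeps column i of both matrices tied to the same year even if a file is missing from one series.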


In [ ]:
dprint("-------------------------------")
dprint("Running analysis 3")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/LeafGridmet/"
bandNum1 = 3
dataDirectory2 = "hdfs:///user/hadoop/avhrr/SOST4Km/"
bandNum2 = 0
maskFile = "hdfs:///user/hadoop/usa_mask_gridmet.tif"
resultDirectory = "hdfs:///user/emma/svd/LeafGridmetSOST4Km/"

#Create the result directory ('hadoop dfs' is deprecated; -p avoids an error if it already exists)
subprocess.run(['hadoop', 'fs', '-mkdir', '-p', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 3")
dprint("-------------------------------")


2017-12-21 19:05:18 | -------------------------------
2017-12-21 19:05:18 | Running analysis 3
2017-12-21 19:05:18 | -------------------------------
2017-12-21 19:05:20 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-21 19:05:20 | Running getDataSet(directoryPath)
2017-12-21 19:05:25 | Number of files: 37
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1980.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1981.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1982.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1983.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1984.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1985.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1986.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1987.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1988.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1989.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1990.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1991.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1992.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1993.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1994.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1995.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1996.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1997.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1998.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/1999.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2000.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2001.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2002.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2003.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2004.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2005.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2006.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2007.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2008.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2009.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2010.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2011.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2012.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2013.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2014.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2015.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/spring-index/LeafGridmet/2016.tif
2017-12-21 19:07:53 | dataSet.shape: (1414560, 37)
2017-12-21 19:07:53 | Ending getDataSet(directoryPath)
2017-12-21 19:07:53 | Running getDataSet(directoryPath)
2017-12-21 19:07:54 | Number of files: 26
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1989v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1990v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1991v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1992v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1993v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1994v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1995v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1996v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1997v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1998v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST1999v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2000v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2001v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2002v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2003v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2004v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2005v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2006v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2007v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2008v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2009v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2010v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2011v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2012v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2013v4_SIx.tif
hdfs://pheno0.phenovari-utwente.surf-hosted.nl:9000/user/hadoop/avhrr/SOST4Km/av_SOST2014v4_SIx.tif
2017-12-21 19:08:05 | dataSet.shape: (1414560, 26)
2017-12-21 19:08:05 | Ending getDataSet(directoryPath)
2017-12-21 19:08:05 | Running getMask(filePath)
2017-12-21 19:08:05 | mask_data.shape: (840, 1684)
2017-12-21 19:08:05 | Ending getMask(filePath)
2017-12-21 19:08:05 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 19:08:05 | dataSetFiltered.shape: (483850, 26)
2017-12-21 19:08:05 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 19:08:05 | Running filterDataSet(dataSet, maskIndex)
2017-12-21 19:08:05 | dataSetFiltered.shape: (483850, 26)
2017-12-21 19:08:05 | Ending filterDataSet(dataSet, maskIndex)
2017-12-21 19:08:07 | U.shape: (483850, 26)
2017-12-21 19:08:07 | s.shape: (26,)
2017-12-21 19:08:07 | V.shape: (483850, 26)
2017-12-21 19:08:07 | Singular values of product: 
2017-12-21 19:08:07 | [  2.92903087e+11   3.29125161e+09   2.20368493e+09   1.78851818e+09
   1.67358708e+09   1.31650849e+09   1.21909218e+09   1.12936563e+09
   9.46032899e+08   8.83305418e+08   8.32881069e+08   7.92554957e+08
   7.03687201e+08   6.60285889e+08   6.53820841e+08   5.95896251e+08
   5.89816185e+08   5.24593845e+08   4.86683317e+08   4.55295264e+08
   4.45259650e+08   4.24979321e+08   3.93028695e+08   3.81641254e+08
   3.59734370e+08   2.15160585e+08]
2017-12-21 19:08:07 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 19:08:11 | Validating norms: [------------------->] 100%
2017-12-21 19:08:11 | Largest norm difference: 2.7432772389746554e-12
2017-12-21 19:08:11 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-21 19:08:11 | Running plotSingularValues(resultDirectory, s)
2017-12-21 19:08:13 | Ending plotSingularValues(resultDirectory, s)
2017-12-21 19:08:13 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 19:10:11 | Plotting modes: [------------------->] 100%
2017-12-21 19:10:16 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-21 19:10:16 | Running writeCSV(resultDirectory, U, s, V)
2017-12-21 19:10:16 | Writing CSV: [------>             ] 33%

Analysis 4

This analysis focusses on BloomFinalLowPR and SOSTLowPR data for the USA from 1989 to 2014 at a 1 km resolution.


In [35]:
dprint("-------------------------------")
dprint("Running analysis 4")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/BloomFinalLowPR/"
bandNum1 = 0
dataDirectory2 = "hdfs:///user/hadoop/avhrr/SOSTLowPR/"
bandNum2 = 0
maskFile = "hdfs:///user/hadoop/spring-index/BloomFinalLowPR/1989.tif"
resultDirectory = "hdfs:///user/emma/svd/BloomFinalLowPRSOSTLowPR/"

#Create the result directory ('hadoop dfs' is deprecated; -p avoids an error if it already exists)
subprocess.run(['hadoop', 'fs', '-mkdir', '-p', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 4")
dprint("-------------------------------")


2017-12-19 13:56:34 | -------------------------------
2017-12-19 13:56:34 | Running analysis 4
2017-12-19 13:56:34 | -------------------------------
2017-12-19 13:56:36 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-19 13:56:36 | Running getDataSet(directoryPath)
2017-12-19 13:56:37 | Number of files: 26
2017-12-19 13:57:03 | Reading files: [------------------->] 100%
2017-12-19 13:57:05 | dataSet.shape: (30870, 26)
2017-12-19 13:57:05 | Ending getDataSet(directoryPath)
2017-12-19 13:57:05 | Running getDataSet(directoryPath)
2017-12-19 13:57:06 | Number of files: 26
2017-12-19 13:57:34 | Reading files: [------------------->] 100%
2017-12-19 13:57:35 | dataSet.shape: (30870, 26)
2017-12-19 13:57:35 | Ending getDataSet(directoryPath)
2017-12-19 13:57:35 | Running getMask(filePath)
2017-12-19 13:57:35 | mask_data.shape: (210, 147)
2017-12-19 13:57:35 | Ending getMask(filePath)
2017-12-19 13:57:35 | Running filterDataSet(dataSet, maskIndex)
2017-12-19 13:57:35 | dataSetFiltered.shape: (30870, 26)
2017-12-19 13:57:35 | Ending filterDataSet(dataSet, maskIndex)
2017-12-19 13:57:35 | Running filterDataSet(dataSet, maskIndex)
2017-12-19 13:57:35 | dataSetFiltered.shape: (30870, 26)
2017-12-19 13:57:35 | Ending filterDataSet(dataSet, maskIndex)
2017-12-19 13:57:36 | U.shape: (30870, 26)
2017-12-19 13:57:36 | s.shape: (26,)
2017-12-19 13:57:36 | V.shape: (30870, 26)
2017-12-19 13:57:36 | Singular values of product: 
2017-12-19 13:57:36 | [  1.20587459e+10   1.61809893e+07   7.14959958e+06   5.43663867e+06
   3.36146096e+06   2.87319546e+06   2.60868335e+06   1.66709042e+06
   1.56316663e+06   1.50092982e+06   1.38065996e+06   1.09581415e+06
   9.92082603e+05   8.73109084e+05   8.03291265e+05   6.94849214e+05
   6.48117639e+05   5.47592331e+05   5.42842103e+05   4.75190088e+05
   4.69942723e+05   4.61514941e+05   3.77371034e+05   2.92247653e+05
   2.75634196e+05   1.84255706e+05]
2017-12-19 13:57:36 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-19 13:57:36 | Validating norms: [------------------->] 100%
2017-12-19 13:57:36 | Largest norm difference: 7.402137083765577e-12
2017-12-19 13:57:36 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-19 13:57:36 | Running plotSingularValues(resultDirectory, s)
2017-12-19 13:57:38 | Ending plotSingularValues(resultDirectory, s)
2017-12-19 13:57:38 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-19 13:59:32 | Plotting modes: [------------------->] 100%
2017-12-19 13:59:36 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-19 13:59:36 | Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-19 13:59:36 | -------------------------------
2017-12-19 13:59:36 | Ending analysis 4
2017-12-19 13:59:36 | -------------------------------
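
In this run the mask has shape (210, 147), and 210 x 147 = 30870 matches the row count of the dataset both before and after filterDataSet, so no pixels are dropped. In analyses 2 and 3 the same step reduces 840 x 1684 = 1414560 pixels to 483850. The sketch below illustrates the flatten-and-filter idea on a toy grid. It is an assumption about how such a step can be written, not the notebook's filterDataSet, and the convention that nonzero mask values mark pixels to keep is hypothetical.

```python
import numpy as np

rows, cols, years = 4, 5, 3
rng = np.random.default_rng(1)
stack = rng.random((years, rows, cols))  # one toy raster per year

# Flatten every raster into one column: shape (rows * cols, years)
data_set = stack.reshape(years, rows * cols).T

# Hypothetical mask: nonzero marks pixels to keep, here all but the first row
mask = np.ones((rows, cols), dtype=np.uint8)
mask[0, :] = 0
mask_index = np.flatnonzero(mask.ravel())

# Keep only the rows (pixels) selected by the mask
data_set_filtered = data_set[mask_index, :]
print(data_set.shape, data_set_filtered.shape)  # (20, 3) (15, 3)
```

Keeping `mask_index` around is what allows the rows of U and V to be written back to their grid positions when the modes are plotted.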

Analysis 5

This analysis focusses on LeafFinalLowPR and SOSTLowPR data for the USA from 1989 to 2014 at a 1 km resolution.


In [36]:
dprint("-------------------------------")
dprint("Running analysis 5")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/LeafFinalLowPR/"
bandNum1 = 0
dataDirectory2 = "hdfs:///user/hadoop/avhrr/SOSTLowPR/"
bandNum2 = 0
maskFile = "hdfs:///user/hadoop/spring-index/LeafFinalLowPR/1989.tif"
resultDirectory = "hdfs:///user/emma/svd/LeafFinalLowPRSOSTLowPR/"

#Create the result directory ('hadoop dfs' is deprecated; -p avoids an error if it already exists)
subprocess.run(['hadoop', 'fs', '-mkdir', '-p', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 5")
dprint("-------------------------------")


2017-12-19 13:59:36 | -------------------------------
2017-12-19 13:59:36 | Running analysis 5
2017-12-19 13:59:36 | -------------------------------
2017-12-19 13:59:38 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-19 13:59:38 | Running getDataSet(directoryPath)
2017-12-19 13:59:44 | Number of files: 26
2017-12-19 14:00:12 | Reading files: [------------------->] 100%
2017-12-19 14:00:13 | dataSet.shape: (30870, 26)
2017-12-19 14:00:13 | Ending getDataSet(directoryPath)
2017-12-19 14:00:13 | Running getDataSet(directoryPath)
2017-12-19 14:00:14 | Number of files: 26
2017-12-19 14:00:41 | Reading files: [------------------->] 100%
2017-12-19 14:00:42 | dataSet.shape: (30870, 26)
2017-12-19 14:00:42 | Ending getDataSet(directoryPath)
2017-12-19 14:00:42 | Running getMask(filePath)
2017-12-19 14:00:42 | mask_data.shape: (210, 147)
2017-12-19 14:00:42 | Ending getMask(filePath)
2017-12-19 14:00:42 | Running filterDataSet(dataSet, maskIndex)
2017-12-19 14:00:42 | dataSetFiltered.shape: (30870, 26)
2017-12-19 14:00:42 | Ending filterDataSet(dataSet, maskIndex)
2017-12-19 14:00:42 | Running filterDataSet(dataSet, maskIndex)
2017-12-19 14:00:42 | dataSetFiltered.shape: (30870, 26)
2017-12-19 14:00:42 | Ending filterDataSet(dataSet, maskIndex)
2017-12-19 14:00:42 | U.shape: (30870, 26)
2017-12-19 14:00:42 | s.shape: (26,)
2017-12-19 14:00:42 | V.shape: (30870, 26)
2017-12-19 14:00:42 | Singular values of product: 
2017-12-19 14:00:42 | [  7.41470894e+09   2.63031158e+07   9.02425892e+06   6.62606689e+06
   5.24978714e+06   3.97618140e+06   3.46456570e+06   2.96742328e+06
   2.63031152e+06   2.37920646e+06   1.87791908e+06   1.80924544e+06
   1.68349978e+06   1.51099198e+06   1.41970179e+06   1.38772949e+06
   1.26231120e+06   1.14775291e+06   9.76859389e+05   9.25336223e+05
   8.23103195e+05   7.01515433e+05   6.58872327e+05   4.73111161e+05
   3.32454318e+05   1.93274133e+05]
2017-12-19 14:00:42 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-19 14:00:42 | Validating norms: [------------------->] 100%
2017-12-19 14:00:42 | Largest norm difference: 5.53283439094686e-12
2017-12-19 14:00:42 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-19 14:00:42 | Running plotSingularValues(resultDirectory, s)
2017-12-19 14:00:44 | Ending plotSingularValues(resultDirectory, s)
2017-12-19 14:00:44 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-19 14:02:37 | Plotting modes: [------------------->] 100%
2017-12-19 14:02:41 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-19 14:02:41 | Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-19 14:02:41 | -------------------------------
2017-12-19 14:02:41 | Ending analysis 5
2017-12-19 14:02:41 | -------------------------------

Analysis 6

This analysis focusses on BloomFinalLowPR and LeafFinalLowPR data for the USA from 1989 to 2014 at a 1 km resolution.


In [39]:
dprint("-------------------------------")
dprint("Running analysis 6")
dprint("-------------------------------")

dataDirectory1 = "hdfs:///user/hadoop/spring-index/BloomFinalLowPR/"
bandNum1 = 0
dataDirectory2 = "hdfs:///user/hadoop/spring-index/LeafFinalLowPR/"
bandNum2 = 0
maskFile = "hdfs:///user/hadoop/spring-index/BloomFinalLowPR/1989.tif"
resultDirectory = "hdfs:///user/emma/svd/BloomFinalLowPRLeafFinalLowPR/"

#Create the result directory ('hadoop dfs' is deprecated; -p avoids an error if it already exists)
subprocess.run(['hadoop', 'fs', '-mkdir', '-p', resultDirectory])

runAnalysis(dataDirectory1, dataDirectory2, bandNum1, bandNum2, maskFile, resultDirectory)

dprint("-------------------------------")
dprint("Ending analysis 6")
dprint("-------------------------------")


2017-12-19 14:15:07 | -------------------------------
2017-12-19 14:15:07 | Running analysis 6
2017-12-19 14:15:07 | -------------------------------
2017-12-19 14:15:09 | Running runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-19 14:15:09 | Running getDataSet(directoryPath)
2017-12-19 14:15:10 | Number of files: 26
2017-12-19 14:15:34 | Reading files: [------------------->] 100%
2017-12-19 14:15:35 | dataSet.shape: (30870, 26)
2017-12-19 14:15:35 | Ending getDataSet(directoryPath)
2017-12-19 14:15:35 | Running getDataSet(directoryPath)
2017-12-19 14:15:36 | Number of files: 26
2017-12-19 14:16:00 | Reading files: [------------------->] 100%
2017-12-19 14:16:01 | dataSet.shape: (30870, 26)
2017-12-19 14:16:01 | Ending getDataSet(directoryPath)
2017-12-19 14:16:01 | Running getMask(filePath)
2017-12-19 14:16:01 | mask_data.shape: (210, 147)
2017-12-19 14:16:01 | Ending getMask(filePath)
2017-12-19 14:16:01 | Running filterDataSet(dataSet, maskIndex)
2017-12-19 14:16:01 | dataSetFiltered.shape: (30870, 26)
2017-12-19 14:16:01 | Ending filterDataSet(dataSet, maskIndex)
2017-12-19 14:16:01 | Running filterDataSet(dataSet, maskIndex)
2017-12-19 14:16:01 | dataSetFiltered.shape: (30870, 26)
2017-12-19 14:16:01 | Ending filterDataSet(dataSet, maskIndex)
2017-12-19 14:16:01 | U.shape: (30870, 26)
2017-12-19 14:16:01 | s.shape: (26,)
2017-12-19 14:16:01 | V.shape: (30870, 26)
2017-12-19 14:16:01 | Singular values of product: 
2017-12-19 14:16:01 | [  3.68226752e+09   8.06563531e+05   4.85824678e+05   3.10463362e+05
   2.30006476e+05   1.61068309e+05   1.12546660e+05   9.62783613e+04
   7.68308440e+04   6.31607624e+04   5.01396173e+04   4.52413785e+04
   4.27108688e+04   3.49840664e+04   3.29666237e+04   3.12362959e+04
   2.51067545e+04   2.44929799e+04   2.18680338e+04   2.03446059e+04
   1.83310559e+04   1.52361370e+04   1.48177991e+04   1.34717678e+04
   1.03196921e+04   8.50797897e+03]
2017-12-19 14:16:01 | Running validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-19 14:16:01 | Validating norms: [------------------->] 100%
2017-12-19 14:16:01 | Largest norm difference: 3.212667582506447e-11
2017-12-19 14:16:01 | Ending validateNorms(dataSet1, dataSet2, U, s, V)
2017-12-19 14:16:01 | Running plotSingularValues(resultDirectory, s)
2017-12-19 14:16:04 | Ending plotSingularValues(resultDirectory, s)
2017-12-19 14:16:04 | Running plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-19 14:17:52 | Plotting modes: [------------------->] 100%
2017-12-19 14:17:56 | Ending plotModes(resultDirectory, U, s, V, maskData, maskTransform)
2017-12-19 14:17:56 | Ending runAnalysis(dataDirectory1, dataDirectory2, maskFile, resultDirectory)
2017-12-19 14:17:56 | -------------------------------
2017-12-19 14:17:56 | Ending analysis 6
2017-12-19 14:17:56 | -------------------------------
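
In every run above the first singular value is several orders of magnitude larger than the rest. One common way to quantify this in SVD-based covariance analyses is the squared fraction each mode contributes, s_i^2 / sum(s^2). The sketch below applies it to the 26 values printed for analysis 6. The notebook itself does not report this diagnostic, so the variable names and the choice of measure are assumptions.

```python
import numpy as np

# Singular values copied from the analysis 6 log
s = np.array([
    3.68226752e+09, 8.06563531e+05, 4.85824678e+05, 3.10463362e+05,
    2.30006476e+05, 1.61068309e+05, 1.12546660e+05, 9.62783613e+04,
    7.68308440e+04, 6.31607624e+04, 5.01396173e+04, 4.52413785e+04,
    4.27108688e+04, 3.49840664e+04, 3.29666237e+04, 3.12362959e+04,
    2.51067545e+04, 2.44929799e+04, 2.18680338e+04, 2.03446059e+04,
    1.83310559e+04, 1.52361370e+04, 1.48177991e+04, 1.34717678e+04,
    1.03196921e+04, 8.50797897e+03,
])

# Share of the total squared singular values carried by each mode
squared_fraction = s**2 / np.sum(s**2)

# Ratio between the first and second singular value
leading_ratio = s[0] / s[1]

print("Fraction in mode 1:", squared_fraction[0])
print("s[0] / s[1]:", leading_ratio)
```

Because mode 1 carries nearly all of the total, it can be more informative to recompute the fractions over `s[1:]` when comparing the remaining modes with each other.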

End of Notebook