Welcome to R from a Jupyter notebook!

This is what a Jupyter Notebook looks like. Here I will create the dataset for our toy lake example and introduce some basic (and some more advanced) plotting functionality.

First a test that the R kernel works:


In [1]:
1+1


2

It works! By the way, the text you see here is Markdown. A Jupyter Notebook is a good way to combine text (formatted with Markdown) and R or Python code to create reproducible analytical pipelines.

In this notebook, I will use bold face whenever I write R syntax we discussed during the lecture.

Generating data for Lake Poing

The first step after our design lecture is to generate the data we wanted to sample from our lake. We agreed on 50 stations and gained some knowledge about the geography of the lake (i.e. its surroundings, etc.). We are going to use this knowledge to conduct our initial sampling and later in the course to design our experiments in a meaningful way.

Next I will generate a dataframe that will contain the following variables. To aggregate the variable in our dataframe I will create 1D arrays that will represent a physical/chemical parameter of the lake (e.g. temperature). Once I have all the sampling points I want in the vectors I will aggregate them into a dataframe.

We agreed that we wanted to measure the following parameters at each station:

  • Distance from coast line (meters)
  • Depth (meters)
  • Surface temperature (degrees Celsius)
  • Turbidity (Secchi disk depth, meters)
  • Nitrate concentration (mg/ml)
  • Phosphate concentration (mg/ml)
  • pH
  • Dissolved oxygen (mg/ml)
  • Chlorophyll concentration (mg/ml)

And we mentioned different areas in which the lake can be divided:

  • Natural reserve
  • Agricultural land use
  • Town area
  • Industrial area (the mining processing company that wants to expand)

I think we need to include a fifth area, something like open waters, that will group stations that are too far away from land. The problem is, of course, that at the beginning of the study we don't know what "far" means. For now we only document the distance to land of each station we sample. We can use these data later if an open water category is needed to group some stations.
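As a sketch of how that later grouping could work (the 1000 m cut-off below is purely hypothetical and only for illustration), ifelse() can turn recorded distances into an open-water label:

```r
#toy distances from the coast line in meters (made-up values)
distance_to_land <- c(120, 950, 1800, 300, 2500)

#stations farther than a (purely hypothetical) 1000 m threshold are labelled open waters
area_label <- ifelse(distance_to_land > 1000, "open_waters", "shore")
print(area_label)
```

The threshold itself would have to come from the data (or the design discussion), not from this sketch.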

To generate the data, we need to learn a bit about probability distributions. I will introduce a few of them in the next blocks of R code and I will try to give examples of how these distributions can be used.

Let's start with the normal distribution. This one looks... normal.


In [2]:
#this is how you load packages into your session

library(MASS)#needed to generate multivariate normal data
library(tmvtnorm)#needed to generate truncated multivariate normal data
library(lavaan)#needed to get the function cor2cov


Loading required package: mvtnorm
Loading required package: Matrix
Loading required package: stats4
Loading required package: gmm
Loading required package: sandwich
This is lavaan 0.5-23.1097
lavaan is BETA software! Please report any bugs.

In [3]:
plot(function(x) dnorm(x, 0, 1), -5, 5, ylab="Density", main="A normal distribution with mean=0 and stdev=1")


If you only want positive numbers (we will use this later to generate the data for some variables), you can use a lognormal distribution, which only produces values greater than zero.


In [4]:
plot(function(x) dlnorm(x, 1, 1), -1, 20, ylab="Density", main="A lognormal distribution with meanlog=1 and sdlog=1")
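
A quick sanity check of the claim above (a minimal sketch): samples drawn with rlnorm() are always strictly positive.

```r
set.seed(42)
#draw 1000 values from a lognormal distribution
lognormal_sample <- rlnorm(1000, meanlog = 1, sdlog = 1)

#every sampled value is strictly greater than zero
print(all(lognormal_sample > 0))#prints [1] TRUE
```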


These two are good for continuous variables. What if you have counts? Usually, counts are modelled using a Poisson distribution.


In [5]:
plot(dpois(x=1:6, lambda=1), ylab="Probability", xlab="x", main="A Poisson distribution with lambda=1")
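
As a small sketch of why the Poisson suits counts: samples drawn with rpois() are always non-negative integers, and their mean is close to lambda.

```r
set.seed(42)
#draw 1000 counts from a Poisson distribution with lambda=1
count_sample <- rpois(1000, lambda = 1)

#Poisson samples are non-negative whole numbers
print(all(count_sample >= 0 & count_sample == round(count_sample)))#prints [1] TRUE
#the sample mean should be close to lambda=1
print(mean(count_sample))
```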


We could keep adding distributions to the Notebook, but I am going to stop here. I just want to add that R implements many distributions and that we use them to generate data points. For instance, suppose we want to simulate sampling 50 data points from a normal distribution with mean=0 and stdev=1 (a standard normal distribution). We do the following:


In [6]:
set.seed(12345)#this ensures that we always get the same results when the next function is called
normal_data<-rnorm(50,0,1)
print(normal_data)


 [1]  0.58552882  0.70946602 -0.10930331 -0.45349717  0.60588746 -1.81795597
 [7]  0.63009855 -0.27618411 -0.28415974 -0.91932200 -0.11624781  1.81731204
[13]  0.37062786  0.52021646 -0.75053199  0.81689984 -0.88635752 -0.33157759
[19]  1.12071265  0.29872370  0.77962192  1.45578508 -0.64432843 -1.55313741
[25] -1.59770952  1.80509752 -0.48164736  0.62037980  0.61212349 -0.16231098
[31]  0.81187318  2.19683355  2.04919034  1.63244564  0.25427119  0.49118828
[37] -0.32408658 -1.66205024  1.76773385  0.02580105  1.12851083 -2.38035806
[43] -1.06026555  0.93714054  0.85445172  1.46072940 -1.41309878  0.56740325
[49]  0.58318765 -1.30679883

These data come from the distribution plotted above. However, if you plot them, the result doesn't look particularly normal, or very similar to the curve we saw before.


In [7]:
hist(normal_data)


This is because we only took 50 points. If you increase the number of sampled points to 5000, things change.


In [8]:
set.seed(12345)#this ensures that we always get the same results when the next function is called
lots_of_normal_data<-rnorm(5000,0,1)
hist(lots_of_normal_data)


Let's now generate the data for Lake Poing.

We want:

  • Distance from coast line (meters)
  • Depth (meters)
  • Surface temperature (degrees Celsius)
  • Turbidity (Secchi disk depth, meters)
  • Nitrate concentration (mg/ml)
  • Phosphate concentration (mg/ml)
  • pH
  • Dissolved oxygen (mg/ml)
  • Chlorophyll concentration (mg/ml)

The function below generates data from the 50 stations for each of the variables above (assuming of course that they are normally distributed).

To keep it simple, I will generate 50 data points per variable first, assuming we went there and sampled once so far.

The function needs a vector of means, and because I want to generate data that makes some sense, I will draw from a multivariate normal distribution to add some (correlation) structure to it. For that I need a covariance matrix. However, I cannot come up with sensible covariance values on the fly (I don't know which covariance values to use...). The good news is that I can calculate the covariance matrix from the correlation matrix, which is easy to specify. All I need is the correlation matrix and a vector with the standard deviation of each variable, and the function cor2cov from the package lavaan will do the magic.
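
For reference, cor2cov() implements the standard identity "covariance = D %*% R %*% D", where R is the correlation matrix and D a diagonal matrix of standard deviations. Here is a minimal sketch with toy numbers, not using lavaan:

```r
#a small toy correlation matrix and the standard deviations of the two variables
R <- matrix(c(1.0, 0.5,
              0.5, 1.0), 2, 2, byrow = TRUE)
sds <- c(2, 3)

#scale the correlations by the standard deviations on both sides
Sigma <- diag(sds) %*% R %*% diag(sds)
print(Sigma)#diagonal holds the variances 4 and 9, off-diagonal the covariance 3
```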

As discussed in class, I don't want negative values for most variables, so I will use a truncated multivariate normal distribution to avoid values < 0.


In [9]:
#btw, this is a comment...


genMultiVariateNormalData <- function(n, mu=c(0), sigma=matrix(1), varnames, truncate=T, left_truncate=c(-Inf), right_truncate=c(Inf)) {
  
  #check that the mean vector and the covariance matrix have matching dimensions.
  if (length(mu) != NCOL(sigma)) {
    stop("mu and sigma must have the same number of columns", call.=FALSE)
  }

  stationID<-1:n

  if(truncate){

    x <- rtmvnorm(n, mean=mu, sigma=sigma, lower=left_truncate, upper=right_truncate)
      
  }else{
  
    #note the argument name is sigma (lower case) in mvtnorm's rmvnorm
    x <- rmvnorm(n, mean=mu, sigma=sigma)
      
  }
  
  dd<-data.frame(var1=stationID, var2=x)
  colnames(dd)<-varnames
  dd  
}

#the mean value for each variable
meanVector<-c(500,35,17.8,22.4,27.79,2.03,8.4,4.1,398.73)

#the value of the standard deviation of each variable
sdVector<-c(150,17,3.3,8,7.8,1.8,0.3,3.1,200)

#the correlation matrix specifying how the different variables relate to each other (one row/column per variable; the matrix is symmetric).
corMatrix<-matrix(c( 1.00, 0.23,-0.36,-0.43,-0.62,-0.51, 0.10, 0.15, 0.27,
                     0.23, 1.00,-0.34,-0.54,-0.57,-0.47,-0.12, 0.17, 0.32,
                    -0.36,-0.34, 1.00, 0.07, 0.45, 0.39, 0.09,-0.24, 0.23,
                    -0.43,-0.54, 0.07, 1.00, 0.70, 0.59,-0.03,-0.27, 0.80,
                    -0.62,-0.57, 0.45, 0.70, 1.00, 0.79, 0.21, 0.40, 0.87,
                    -0.51,-0.47, 0.39, 0.59, 0.79, 1.00, 0.13, 0.57, 0.91,
                     0.10,-0.12, 0.09,-0.03, 0.21, 0.13, 1.00, 0.07, 0.23,
                     0.15, 0.17,-0.24,-0.27, 0.40, 0.57, 0.07, 1.00, 0.28,
                     0.27, 0.32, 0.23, 0.80, 0.87, 0.91, 0.23, 0.28, 1.00), 9, 9, byrow=T)
#here we calculate the covariance matrix.
covMatrix<-cor2cov(corMatrix,sdVector)

#and here we generate the data. Note we get some warnings!
LakePoing_PhysicoChemData<-genMultiVariateNormalData(50, meanVector, covMatrix, c("StationID","DistanceFromCoastLine","Depth","Temperature","Turbidity","NitrateConcentration","PhosphateConcentration","pH","DissolvedOxygen","ChlorophylConcentration"), truncate=T, left_truncate=rep(0,9), right_truncate=rep(Inf,9))


Warning message:
In rtmvnorm.rejection(n, mean, sigma, lower, upper, ...): Acceptance rate is very low (0) and rejection sampling becomes inefficient. Consider using Gibbs sampling.
Warning message:
In rmvnorm(nproposals, mean = mean, sigma = sigma): sigma is numerically not positive semidefinite

The data has been generated, but is it how we wanted it to be?


In [10]:
LakePoing_PhysicoChemData[1,2]<-NA#set one value to NA (missing data); note below how it affects mean() and sd()
#check the header of the data
head(LakePoing_PhysicoChemData)

#print the meanVector used to generate the data
meanVector
#apply the function mean to all the columns in the dataframe (na.rm=TRUE would make mean() skip the NA)
sapply(LakePoing_PhysicoChemData, mean)
#print the sdVector used to generate the data
sdVector
#apply the function sd to all the columns in the dataframe
sapply(LakePoing_PhysicoChemData, sd)


  StationID DistanceFromCoastLine    Depth Temperature Turbidity NitrateConcentration PhosphateConcentration       pH DissolvedOxygen ChlorophylConcentration
1         1                    NA 66.25608    19.58872  17.00429             24.99985               1.757394 8.155144        3.633303                433.2720
2         2              406.1727 39.42994    15.99275  34.23102             39.45532               5.198505 8.163536        6.349586                587.0538
3         3              749.9593 52.85495    18.52725  25.13875             28.12810               4.056897 8.378468        4.963911                732.9942
4         4              511.9407 70.05928    17.24642  17.56738             29.62252               3.210157 8.379396        8.481022                578.2946
5         5              490.1708 21.29867    19.73065  17.47653             26.19793               1.061041 8.588204        3.977796                255.7663
6         6              406.1146 44.73939    16.85460  32.09275             36.91632               4.762509 8.770558        5.268300                569.1504
[1] 500.00  35.00  17.80  22.40  27.79   2.03   8.40   4.10 398.73

StationID                 25.5
DistanceFromCoastLine     NA
Depth                     35.6221835580608
Temperature               18.7932010447359
Turbidity                 24.7425744096173
NitrateConcentration      32.9751833129685
PhosphateConcentration    3.14902905966643
pH                        8.40507489017078
DissolvedOxygen           5.30868257423964
ChlorophylConcentration   456.269838526354

[1] 150.0  17.0   3.3   8.0   7.8   1.8   0.3   3.1 200.0

StationID                 14.5773797371133
DistanceFromCoastLine     NA
Depth                     16.6911668632812
Temperature               3.06820799305462
Turbidity                 8.73061357623073
NitrateConcentration      7.97899025277311
PhosphateConcentration    1.90228191159872
pH                        0.275940524092886
DissolvedOxygen           2.5117915641513
ChlorophylConcentration   168.476392031186

Looks decent. What about the correlation structure of the variables?


In [11]:
#print correlation matrix used
corMatrix
#we don't care about the first column = stationID; print actual correlation matrix
cor(LakePoing_PhysicoChemData[,2:10])


 1.00  0.23 -0.36 -0.43 -0.62 -0.51  0.10  0.15  0.27
 0.23  1.00 -0.34 -0.54 -0.57 -0.47 -0.12  0.17  0.32
-0.36 -0.34  1.00  0.07  0.45  0.39  0.09 -0.24  0.23
-0.43 -0.54  0.07  1.00  0.70  0.59 -0.03 -0.27  0.80
-0.62 -0.57  0.45  0.70  1.00  0.79  0.21  0.40  0.87
-0.51 -0.47  0.39  0.59  0.79  1.00  0.13  0.57  0.91
 0.10 -0.12  0.09 -0.03  0.21  0.13  1.00  0.07  0.23
 0.15  0.17 -0.24 -0.27  0.40  0.57  0.07  1.00  0.28
 0.27  0.32  0.23  0.80  0.87  0.91  0.23  0.28  1.00
(columns in the same order as the rows: DistanceFromCoastLine ... ChlorophylConcentration)

DistanceFromCoastLine     1            NA           NA          NA         NA          NA          NA          NA         NA
Depth                    NA   1.000000000  0.006836875 -0.28713456 -0.2245421 -0.12865580 -0.10911953  0.07600874  0.2852102
Temperature              NA   0.006836875  1.000000000 -0.05344176  0.1721365  0.17918399  0.15425890 -0.44786711 -0.0984531
Turbidity                NA  -0.287134559 -0.053441763  1.00000000  0.8702103  0.62758425  0.13935539 -0.13901238  0.5863490
NitrateConcentration     NA  -0.224542143  0.172136490  0.87021031  1.0000000  0.77123670  0.32658433  0.11916594  0.5272028
PhosphateConcentration   NA  -0.128655801  0.179183987  0.62758425  0.7712367  1.00000000  0.06232714  0.28630361  0.4355788
pH                       NA  -0.109119528  0.154258901  0.13935539  0.3265843  0.06232714  1.00000000  0.15459914  0.1156120
DissolvedOxygen          NA   0.076008739 -0.447867115 -0.13901238  0.1191659  0.28630361  0.15459914  1.00000000  0.2328112
ChlorophylConcentration  NA   0.285210156 -0.098453102  0.58634903  0.5272028  0.43557884  0.11561198  0.23281121  1.0000000

Another way to do it is to use the function pairs().


In [12]:
#again skipping the StationID column
pairs(LakePoing_PhysicoChemData[,2:10])


Now the usual exploratory data analysis round. This time I'll do it the lazy way... I mean with a bit of programming.


In [13]:
for(column in colnames(LakePoing_PhysicoChemData[,2:10])){
    
    title<-paste("Histogram of", column)#paste glues its arguments together with a space by default
    hist(LakePoing_PhysicoChemData[[column]], main=title, xlab=column)
    
}


This is the equivalent of:

>hist(LakePoing_PhysicoChemData$DistanceFromCoastLine, main="DistanceFromCoastLine")

>hist(LakePoing_PhysicoChemData$Temperature, main="Temperature")

...

>hist(LakePoing_PhysicoChemData$ChlorophylConcentration, main="ChlorophylConcentration")

but way faster... I mean I have to write less code (but think/google more).
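
One subtlety that bites here is paste(): sep only separates multiple arguments element-wise, while collapse is what joins the elements of a single vector into one string. A quick sketch:

```r
#sep glues separate arguments together (with a space by default)
print(paste("Histogram of", "Depth"))#prints [1] "Histogram of Depth"

#a single vector with sep comes back as a vector of length 2, not one string
print(paste(c("Histogram of", "Depth"), sep = " "))

#collapse joins the elements of one vector into a single string
print(paste(c("Histogram of", "Depth"), collapse = " "))#prints [1] "Histogram of Depth"
```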

If you are interested in something similar to what the pairs() function does, but only for selected variables, you can plot them directly against each other. For example:


In [14]:
plot(LakePoing_PhysicoChemData$ChlorophylConcentration~LakePoing_PhysicoChemData$PhosphateConcentration)


Or you can try to code something:


In [15]:
#remember we don't want column 1...
#note that y stops at the second-to-last column; otherwise x would run past the last column and R would complain about undefined columns

for(y in 2:(ncol(LakePoing_PhysicoChemData)-1)){#for all columns from column 2 up to the second-to-last
    for(x in (y+1):ncol(LakePoing_PhysicoChemData)){#for all columns after the currently selected one
        title<-paste(colnames(LakePoing_PhysicoChemData)[y], "vs.", colnames(LakePoing_PhysicoChemData)[x])
        plot(LakePoing_PhysicoChemData[[y]]~LakePoing_PhysicoChemData[[x]], main=title)
    } 
}

#how can you modify the x and y labels in the previous nested for loop to plot nicer graphs?

Now, remember we wanted a sampling that reflected the zonation of the lake margins. These areas were:

  • Natural reserve
  • Agricultural land use
  • Town area
  • Industrial area (the mining processing company that wants to expand)

Using the function genMultiVariateNormalData, we could generate n data points for each of these areas, as long as they add up to 50 (our N), because we sampled only 50 stations.

Above, I mentioned I wanted a fifth class to group stations that cannot be assigned to any area. With five areas, this will allow us to sample 10 data points per area.

Now, coupling a for loop with a call to the function genMultiVariateNormalData, I can generate 10 data points for all the variables (9) in all the areas (5).


In [ ]:
areaDF<-data.frame(StationID=as.numeric(),DistanceFromCoastLine=as.numeric(),Depth=as.numeric(),Temperature=as.numeric(),Turbidity=as.numeric(),NitrateConcentration=as.numeric(),PhosphateConcentration=as.numeric(),pH=as.numeric(),DissolvedOxygen=as.numeric(),ChlorophylConcentration=as.numeric())

for(area in letters[1:5]) {
    
    #the mean value for each variable
    meanVector<-c(500,35,17.8,22.4,27.79,2.03,8.4,4.1,398.73)

    #the value of the standard deviation of each variable
    sdVector<-c(150,17,3.3,8,7.8,1.8,0.3,3.1,200)

    #the correlation matrix specifying how different variables relate to each other.
    corMatrix<-matrix(c(1,0.23,-0.36,-0.43,-0.62,-0.51,0.1,0.15,0.27,0.23,1,-0.34,-0.54,-0.57,-0.47,-0.12,0.17,0.32,-0.36,-0.34,1,0.07,0.45,0.39,0.09,-0.24,0.23,-0.43,-0.54,0.07,1,0.7,0.59,-0.03,-0.27,0.8,-0.62,-0.57,0.45,0.7,1,0.79,0.21,0.4,0.87,-0.51,-0.47,0.39,0.59,0.79,1,0.13,0.57,0.91,0.1,-0.12,0.09,-0.03,0.21,0.13,1,0.07,0.23,0.15,0.17,-0.24,-0.27,0.4,0.57,0.07,1,0.28,0.27,0.32,0.23,0.8,0.87,0.91,0.23,0.28,1),9,9)
    
    #here we calculate the covariance matrix.
    covMatrix<-cor2cov(corMatrix,sdVector)
    tmpDF<-data.frame(genMultiVariateNormalData(10, meanVector, covMatrix, c("StationID","DistanceFromCoastLine","Depth","Temperature","Turbidity","NitrateConcentration","PhosphateConcentration","pH","DissolvedOxygen","ChlorophylConcentration"), truncate=T, left_truncate=rep(0,9), right_truncate=rep(Inf,9)),Area=rep(area,10))
    areaDF<-merge(areaDF,tmpDF, all=T)
}

Let's check the generated data for one of the variables:


In [ ]:
tapply(areaDF$PhosphateConcentration,areaDF$Area, mean)
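
The same per-area summary can also be computed with aggregate(), which returns a data frame rather than the named vector tapply() gives you; a minimal sketch on a toy data frame standing in for areaDF:

```r
#toy data with two areas and made-up phosphate values
toy <- data.frame(Area = rep(c("a", "b"), each = 3),
                  PhosphateConcentration = c(1, 2, 3, 4, 5, 6))

#tapply: named vector of per-area means
print(tapply(toy$PhosphateConcentration, toy$Area, mean))

#aggregate: the same means, returned as a data frame
print(aggregate(PhosphateConcentration ~ Area, data = toy, FUN = mean))
```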

To make the different areas of the lake differ in a meaningful way, one would pass a matrix specifying the means (and possibly the standard deviations) of the different categories, and iterate over its rows (each row representing the mean or standard deviation vector of one area) to sample from a multivariate normal that matches those particular values. For this example I will keep the correlation structure constant (as far as possible), i.e. it will be the same for all localities, and I will use the same standard deviation for each variable; only the mean values will change between areas.

One way to achieve this is to call the function genMultiVariateNormalData inside a for loop that iterates over the matrix rows.

For example:


In [ ]:
#the mean value for each variable will appear as a column in the matrix. Rows will represent localities.

#the order of the variables is:
#c("DistanceFromCoastLine","Depth","Temperature","Turbidity","NitrateConcentration","PhosphateConcentration","pH","DissolvedOxygen","ChlorophylConcentration")

#the order of the areas will be:
#1=lake, 2=city, 3=natural reserve, 4=agriculture, 5=mining

meanMatrix<-matrix(c(420,32,16.2,17,25.85,2.21,8.1,3.9,360.30,
                     510,31,17.1,16,26.90,2.45,8.2,4.1,380.65,
                     370,23,16.8,19,22.20,1.03,8.3,4.3,300.03,
                     368,28,17.5,15,29.90,3.50,8.0,3.0,450.79,
                     480,35,16.4,14,27.79,2.03,7.4,2.2,398.47), ncol=9, nrow=5, byrow=T)

#the value of the standard deviation of each variable
sdVector<-c(50,7,2.1,4,2.8,0.8,0.3,1.1,120)

#the correlation matrix specifying how the different variables relate to each other (same structure as before).
corMatrix<-matrix(c( 1.00, 0.23,-0.36,-0.43,-0.62,-0.51, 0.10, 0.15, 0.27,
                     0.23, 1.00,-0.34,-0.54,-0.57,-0.47,-0.12, 0.17, 0.32,
                    -0.36,-0.34, 1.00, 0.07, 0.45, 0.39, 0.09,-0.24, 0.23,
                    -0.43,-0.54, 0.07, 1.00, 0.70, 0.59,-0.03,-0.27, 0.80,
                    -0.62,-0.57, 0.45, 0.70, 1.00, 0.79, 0.21, 0.40, 0.87,
                    -0.51,-0.47, 0.39, 0.59, 0.79, 1.00, 0.13, 0.57, 0.91,
                     0.10,-0.12, 0.09,-0.03, 0.21, 0.13, 1.00, 0.07, 0.23,
                     0.15, 0.17,-0.24,-0.27, 0.40, 0.57, 0.07, 1.00, 0.28,
                     0.27, 0.32, 0.23, 0.80, 0.87, 0.91, 0.23, 0.28, 1.00), 9, 9, byrow=T)

#here we calculate the covariance matrix.
covMatrix<-cor2cov(corMatrix,sdVector)

#empty dataframe to store the data.
areaDF<-data.frame(StationID=as.numeric(),DistanceFromCoastLine=as.numeric(),Depth=as.numeric(),Temperature=as.numeric(),Turbidity=as.numeric(),NitrateConcentration=as.numeric(),PhosphateConcentration=as.numeric(),pH=as.numeric(),DissolvedOxygen=as.numeric(),ChlorophylConcentration=as.numeric())

for(row in 1:nrow(meanMatrix)){
    
    tmpDF<-data.frame(genMultiVariateNormalData(10, meanMatrix[row,], covMatrix, c("StationID","DistanceFromCoastLine","Depth","Temperature","Turbidity","NitrateConcentration","PhosphateConcentration","pH","DissolvedOxygen","ChlorophylConcentration"), truncate=T, left_truncate=rep(0,9), right_truncate=rep(Inf,9)),Area=rep(letters[row],10))
    areaDF<-merge(areaDF,tmpDF, all=T)
}

#note this saves the 50-station dataframe generated earlier, not areaDF
write.csv(LakePoing_PhysicoChemData, "./LakePoing_PhysicoChemData.csv")
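
If you want to read the file back in later, note that write.csv() adds the row names as an extra first column by default; a small round-trip sketch using a temporary file:

```r
#toy data frame standing in for the lake data
df <- data.frame(StationID = 1:3, Depth = c(12.5, 40.1, 7.3))

#write.csv() stores the row names as an unnamed first column
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp)

#row.names=1 turns that column back into row names on the way in
df_back <- read.csv(tmp, row.names = 1)
print(identical(dim(df), dim(df_back)))#prints [1] TRUE
```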

In [ ]:
plot(areaDF$ChlorophylConcentration~areaDF$Area)