First of all, we have a cloud of XYZ points and we want to predict the value at each vertex of a 2D structured grid based on this cloud of points (I will abbreviate the cloud of points as CP and the structured grid as SG).
It's important to note that each point of the CP is described by 3 values: the x and y coordinates and a z value, which is the property we want to interpolate.
So, I'm going to use the KNN (K-Nearest Neighbors) strategy to search for the nearest points of each vertex of the SG and predict the property value at that vertex.
However, if even the nearest point is too far away from the vertex, I will assume that vertex has to be blank (in this case I will just assign a zero value). I'm going to explain in the next sections how to apply this interpolation.
Note: for more information about the KNN algorithm, please visit:
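Before diving into the notebook, here is a minimal sketch of the whole idea condensed into one hypothetical helper (the names `knn_interpolate`, `cloud`, and `grid_xy` are mine; the cells below build the same pipeline step by step):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import normalize

def knn_interpolate(cloud, grid_xy, k=5, radius=1.0):
    """Interpolate cloud z values onto grid vertices; blanked vertices get 0."""
    nn = NearestNeighbors(n_neighbors=k).fit(cloud[:, :2])
    dist, idx = nn.kneighbors(grid_xy)       # (M, k) distances and indices
    z = np.zeros(len(grid_xy))               # blank (zero) by default
    ok = dist[:, 0] <= radius                # is the nearest point close enough?
    w = 1 - normalize(dist[ok], axis=1)      # closer neighbors get larger weights
    z[ok] = (w * cloud[idx[ok], 2]).sum(axis=1) / w.sum(axis=1)
    return z
```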
In [1]:
# Imports
import math
import seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import normalize
In [2]:
# Loading the xyz files (reference map and scattered cloud of points)
correct_map = pd.read_csv('correct_map.xyz', sep=' ', dtype='d', header=None, names=['x', 'y', 'z'])
scattered_data_10000 = pd.read_csv('scattered_data_10000.xyz', sep=' ', dtype='d', header=None, names=['x', 'y', 'z'])
NI = 100
NJ = 100
number_neighbors = 5
# Creating grid points
x_grid = np.linspace(1, 10, NI)
y_grid = np.linspace(1, 10, NJ)
grid_points = pd.DataFrame()
grid_points['x'] = np.tile(x_grid, NJ)
grid_points['y'] = y_grid.repeat(NI)
grid_points['z'] = np.zeros(NI*NJ)
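To make the vertex ordering explicit: `np.tile` cycles the x values while `repeat` duplicates each y value, so x varies fastest and the DataFrame sweeps the grid row by row. A tiny illustration:

```python
import numpy as np
xs = np.linspace(1, 3, 3)   # [1. 2. 3.]
ys = np.linspace(1, 2, 2)   # [1. 2.]
print(np.tile(xs, 2))       # [1. 2. 3. 1. 2. 3.]  -> x varies fastest
print(ys.repeat(3))         # [1. 1. 1. 2. 2. 2.]  -> one y value per grid row
```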
Structured Grid:
In [3]:
import matplotlib as mpl
%matplotlib inline
mpl.rcParams['savefig.dpi'] = 250
mpl.rcParams['figure.dpi'] = 250
grid_points.plot(kind='scatter', x='x', y='y', marker='.', s=5)
plt.show()
Cloud of Points:
In [4]:
scattered_data_10000.plot(kind='scatter', x='x', y='y', marker='.', s=5)
plt.show()
In [5]:
# Applying KNN
neighbors = NearestNeighbors(n_neighbors=number_neighbors, algorithm='ball_tree').fit(scattered_data_10000.loc[:, ['x', 'y']])
# Distance and index of the nearest cloud points for each vertex of the grid
distances, indexes = neighbors.kneighbors(grid_points.loc[:, ['x', 'y']])
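Both returned arrays have one row per grid vertex and one column per neighbor, which is worth checking before moving on:

```python
# One row per grid vertex (NI * NJ = 10000), one column per neighbor (5)
print(distances.shape)  # (10000, 5)
print(indexes.shape)    # (10000, 5)
```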
Symbol | Meaning |
---|---|
$$ S_x $$ | Step in the X axis |
$$ x_{max} $$ | Maximum value of X in the grid |
$$ x_{min} $$ | Minimum value of X in the grid |
$$ N_i $$ | Number of vertices in the X axis |
$$ S_y $$ | Step in the Y axis |
$$ y_{max} $$ | Maximum value of Y in the grid |
$$ y_{min} $$ | Minimum value of Y in the grid |
$$ N_j $$ | Number of vertices in the Y axis |
$$ R $$ | Radius |
$$ d_{norm} $$ | Normalized distance |
$$ n $$ | Number of neighbors |
$$ w $$ | One minus the normalized distance; the weight of each neighbor |
$$ P $$ | Interpolated scalar value at a grid vertex |
$$ Z $$ | Property value of a cloud point |
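Putting the symbols together, the procedure reduces to the following relations (reconstructed from the table above and the code that follows; the normalization uses the Euclidean norm of each vertex's distance row, matching scikit-learn's default):

$$ S_x = \frac{x_{max} - x_{min}}{N_i} \qquad S_y = \frac{y_{max} - y_{min}}{N_j} $$

$$ R = 2 \sqrt{S_x^2 + S_y^2} $$

$$ d_{norm,i} = \frac{d_i}{\sqrt{\sum_{j=1}^{n} d_j^2}} \qquad w_i = 1 - d_{norm,i} $$

$$ P = \frac{\sum_{i=1}^{n} w_i Z_i}{\sum_{i=1}^{n} w_i} $$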
In Python:
In [6]:
# Maximum and minimum values in X axis
max_x = grid_points.loc[:, 'x'].max()
min_x = grid_points.loc[:, 'x'].min()
# Maximum and minimum values in Y axis
max_y = grid_points.loc[:, 'y'].max()
min_y = grid_points.loc[:, 'y'].min()
# Step X and Step Y
step_x = (max_x - min_x) / NI
step_y = (max_y - min_y) / NJ
# Radius
radius = 2 * math.sqrt((step_x ** 2) + (step_y ** 2))
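Plugging in the values used here: $S_x = S_y = (10 - 1)/100 = 0.09$, so $R = 2\sqrt{2 \times 0.09^2} \approx 0.2546$. Any vertex whose nearest cloud point is farther than this will be blanked.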
Selecting the grid vertices whose distance to the nearest cloud point is less than or equal to the radius:
In [7]:
less_radius = distances[:, 0] <= radius
distances = distances[less_radius, :]
indexes = indexes[less_radius, :]
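`less_radius` is a boolean mask over all grid vertices; the vertices it filters out are never written to again, so they keep their initial z = 0 ("blank") value. A quick way to inspect it:

```python
print(less_radius.shape)  # (10000,) -> one flag per grid vertex
print(less_radius.sum())  # how many vertices will actually be interpolated
```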
In [8]:
# Using scikit-learn's normalize to turn distances into weights in [0, 1]
weight_norm = 1 - normalize(distances, axis=1)
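For clarity, `normalize` with its default `norm='l2'` divides each row of `distances` by that row's Euclidean norm; the equivalent NumPy computation would be:

```python
# Equivalent to normalize(distances, axis=1) with the default L2 norm
# (scikit-learn additionally guards against all-zero rows)
row_norms = np.linalg.norm(distances, axis=1, keepdims=True)
weight_norm_manual = 1 - distances / row_norms
```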
In [9]:
# Weight each neighbor's property value (column 2 of the cloud) by its weight
prod = weight_norm * scattered_data_10000.values[indexes, 2]
# Weighted average; vertices outside the radius keep their initial z = 0
grid_points.loc[less_radius, 'z'] = prod.sum(axis=1) / weight_norm.sum(axis=1)
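As a sanity check (assuming the arrays above are still in scope), each interpolated value is a weighted average with non-negative weights, so it must lie between the minimum and maximum z of its contributing neighbors:

```python
neighbor_z = scattered_data_10000.values[indexes, 2]
interp = grid_points.loc[less_radius, 'z'].values
assert np.all(interp >= neighbor_z.min(axis=1) - 1e-12)
assert np.all(interp <= neighbor_z.max(axis=1) + 1e-12)
```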
In [10]:
plt.pcolor(correct_map.values[:, 0].reshape(NI, NJ), correct_map.values[:, 1].reshape(NI, NJ), correct_map.values[:, 2].reshape(NI, NJ), cmap=cm.jet)
Out[10]:
In [11]:
plt.pcolor(grid_points.values[:, 0].reshape(NI, NJ), grid_points.values[:, 1].reshape(NI, NJ), grid_points.values[:, 2].reshape(NI, NJ), cmap=cm.jet)
Out[11]:
In [12]:
dif_map = correct_map.z - grid_points.z
In [13]:
dif_map.describe()
Out[13]:
In [14]:
error = (grid_points.z / correct_map.z) - 1  # relative error vs. the reference map
In [15]:
plt.hist(error)
Out[15]:
In [16]:
error[error < 0] *= -1  # take the absolute value of the relative error
In [17]:
error.describe()
Out[17]:
Analyzing the histogram, the mean, and the standard deviation, we can see that the mean error is around 0.33% and most of the values stay close to 0.46% or below.