Generating Noisy Data

For testing learning algorithms we may need to generate our own dataset with random noise built in.

A noiseless sine-wave

We start by creating a dataset: a clean sine wave given by $y = \sin ({2\pi x})$, sampled at 700 evenly spaced points on $[0, 1)$.


In [86]:
import warnings

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

warnings.filterwarnings('ignore')

# Sample the clean signal y = sin(2*pi*x) at evenly spaced points in [0, 1).
num_points = 700
x = np.arange(0, 1, 1 / num_points)
y = np.sin(2 * np.pi * x)

fig = plt.figure(figsize=(17, 9), dpi=120)
ax1 = fig.add_subplot(111)
# Thin line in the default color, overlaid with a wide translucent band.
ax1.plot(x, y)
ax1.plot(x, y, linewidth=10, color='lightblue', alpha=0.7)
ax1.set_xlim(0, 1)
ax1.set_ylim(np.min(y) - 0.2, np.max(y) + 0.2)
ax1.grid()
#ax1.set_axis_off()
ax1.set_title('A noiseless sine-wave', fontsize=20)
ax1.set_ylabel(r'$\mathrm{sin}(2\pi x)$')
# The trailing semicolon suppresses the notebook's text output for this cell.
ax1.set_xlabel(r'$x$');


Adding noise to data

To add noise to the data, all we have to do is generate random variates and add them to the pristine signal. As an example, I add noise drawn from a Normal distribution with mean $\mu = 0$ and standard deviation (spread) $\sigma = 0.2$. For the sake of completeness, the probability density function of the Normal distribution is: $$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^2}$$
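
As a quick sanity check on the density above, the formula can be evaluated directly with NumPy. This is only a sketch: the helper name `normal_pdf` and the integration grid are my own choices for illustration. It confirms that the area under the curve is close to 1 and shows how tall the peak is for $\sigma = 0.2$.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=0.2):
    """Evaluate the Normal density f(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

# Approximate the integral of f over +/- 6 sigma with a simple Riemann sum.
grid = np.linspace(-1.2, 1.2, 20001)
step = grid[1] - grid[0]
area = np.sum(normal_pdf(grid)) * step

# The peak sits at x = mu and has height 1 / (sigma * sqrt(2 * pi)).
peak = normal_pdf(0.0)
print(area, peak)
```

Because the peak height scales as $1/\sigma$, a smaller $\sigma$ gives a taller, narrower density, which means noise values that stay closer to zero.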


In [85]:
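fig = plt.figure(figsize=(17, 9), dpi=120)
ax1 = fig.add_subplot(111)
# set_axis_bgcolor was removed from Matplotlib; set_facecolor replaces it.
ax1.set_facecolor('gray')

# Draw one noise sample per point from a Normal distribution (mu=0, sigma=0.2).
epsilon = np.random.normal(0, 0.2, num_points)
z = y + epsilon

# Clean signal: thin red line under a wide translucent band.
ax1.plot(x, y, color='red')
ax1.plot(x, y, linewidth=10, color='lightblue', alpha=0.7)
# Noisy observations.
ax1.scatter(x, z, linewidth=5, color='lightgreen', alpha=0.7)
ax1.set_xlim(0, 1)
ax1.set_ylim(np.min(z) - 0.2, np.max(z) + 0.2)
ax1.grid()
ax1.set_title('A sine-wave with superimposed noise', fontsize=20)
# Turns off the axis lines, ticks, tick labels, grid and axis labels;
# comment this line out to show them.
ax1.set_axis_off()
ax1.set_ylabel(r'$\mathrm{sin}(2\pi x)$')
ax1.set_xlabel(r'$x$');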
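
For repeated experiments it can help to wrap the two steps (clean signal, then added noise) in one function with a fixed seed, so the same noisy dataset can be regenerated on demand. The following is a minimal sketch, not part of the original notebook: the function name `make_noisy_sine`, its parameters, and the use of NumPy's `default_rng` generator instead of `np.random.normal` are assumptions made for illustration.

```python
import numpy as np

def make_noisy_sine(num_points=700, sigma=0.2, seed=0):
    """Return (x, clean, noisy) for y = sin(2*pi*x) plus Normal(0, sigma) noise.

    Hypothetical helper: the name and the seed argument are illustrative.
    """
    rng = np.random.default_rng(seed)          # seeded generator for reproducibility
    x = np.arange(num_points) / num_points     # evenly spaced points in [0, 1)
    clean = np.sin(2 * np.pi * x)
    noisy = clean + rng.normal(0.0, sigma, num_points)
    return x, clean, noisy

xs, clean, noisy = make_noisy_sine()

# The residual is exactly the noise that was added, so its sample mean and
# standard deviation should land near 0 and sigma respectively.
residual = noisy - clean
print(residual.mean(), residual.std())
```

Calling `make_noisy_sine(seed=0)` twice returns identical arrays, while changing `seed` gives a fresh noise realization over the same clean signal.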