Finding Asteroids


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as pl

Figures

pl.subplots() returns the figure and an array of axes, in the shape you want to plot:


In [4]:
img1 = np.loadtxt('../Tutorials/starfield_exp1.txt')
img2 = np.loadtxt('../Tutorials/starfield_exp2.txt')

# Side-by-side plots!
# Subplot values: (number of rows, number of columns, plot number)
fig, (ax1, ax2) = pl.subplots(ncols=2, figsize=(12,6))

#  Show each image
ax1.imshow(img1, origin='lower')
ax2.imshow(img2, origin='lower')

#  Get the coordinates of the middle of the image
xMid = img1.shape[0] / 2
yMid = img1.shape[1] / 2

# Overplot circle around starting point.
ax1.scatter(xMid, yMid, color='k', marker='o', s=100, facecolors='none') # Hollow points of size = 100
ax2.scatter(xMid, yMid, color='k', marker='o', s=100, facecolors='none')

#  Diff the images
img3 = img2 - img1

#  Get 
endpt = np.argwhere(img3 == img3.min())

ax2.scatter(endpt[0][0], endpt[0][1], color='r', marker='o', s=100, facecolors='none')


Out[4]:
<matplotlib.collections.PathCollection at 0x116141400>

Q. Guesses for where this data came from?

Procedure for making the **fake asteroid data**: 1) Create two 2-D arrays of zeros (representing two different observations half an hour apart). 2) Create N stars, randomly positioned, and with random brightness. Same in each image. 3) Add an asteroid, with a different position in each image. 4) Add noise (Gaussian random numbers) to each image. Different for each image. 5) Convolve each image with telescope point-spread function. -Take ASTR 3510-3520 and maybe PHYS 4510 (Optics) to learn more!

Q. Why would this asteroid exercise be useful in practice?

Imagine you're the director of an observatory whose purpose is to identify near Earth asteroids and determine their orbital parameters. Time is of the essence, so you must automate the process. How could you be confident in your automated procedure for asteroid finding? How about: 1) Randomly generate images of the sky. Add asteroids with varying brightnesses, velocities, distances from Earth, etc. 2) Apply your automated procedure to these synthetic datasets. **The key: you know where the asteroid is in each image (you put it there!).** Assessments: 1) What fraction of the time does your automated procedure successfully find the asteroid(s)? 2) Does your procedure struggle in particular cases (e.g., faint asteroids, fast moving asteroids, etc.)? 3) How can your observing strategy be altered to ensure that your procedure can handle such problem cases?

Random Numbers (Text Ch. 8)

Used primarily to simulate physical processes (say when processes are stochastic, like emission and absorption of photons) and analysis of data (in the presence of noise). Because they are "cheap" many simulations are run to assess statistics.

"Drawing" random numbers

Python's module for this is called "random". It generates numbers distributed between 0 and 1 with uniform probability (these are not really perfectly random -- they are "pseudo-random").

In [6]:
import random
print(random.random())
print(random.random())
print(random.random())


0.15459930321802728
0.9529400964403417
0.11695419921492656
If the seed is specified, the same random number will be generated:

In [7]:
random.seed(2)
print(random.random())

random.seed(2)
print(random.random())

random.seed(2)
print(random.random())


0.9560342718892494
0.9560342718892494
0.9560342718892494

Q. So, if we didn't set the seed, how did we get different numbers each of the first 3 times I called random?

Q. What could the seed be based on that would appear random?

Q. If we need random numbers generated between 0 and 100 what could we do?

Could multiply by 100:

In [8]:
print(100 * random.random())
print(100 * random.random())
print(100 * random.random())


94.78274870593494
5.655136772680869
8.487199515892163
Or use random.uniform:

In [9]:
random.uniform(10, 20)


Out[9]:
18.354988781294495

In [12]:
import random

# Pick a seed:
random.seed(5)

# Let's have 1000 samples:
N = 1000
x = range(N)

# List comprehension to generate random numbers in y between 0 and 100:
y = [random.uniform(0, 100) for _ in x]
To visualize the random numbers:

In [13]:
pl.scatter(x, y)


Out[13]:
<matplotlib.collections.PathCollection at 0x115d7aba8>
This is not a very convenient way to view the distribution of random numbers to see if they are truly uniformly distributed.

Q. How could we visualize the distribution better?

Same as above to begin with:

In [14]:
random.seed(5)
N = 1000      # 1000 samples
x = range(N)
y = [random.uniform(0, 100) for _ in x]

# Now, plot a histogram of the data
#
# hist() returns:
#    the number of values in each bin <array>
#    the bin x-coord values <array>
#    the patches <array>
#
# A Patch describes a bin geometrically
nArray, binArray, patchArray = pl.hist(y, bins=50, facecolor='g')

# Label the plot and add a grid for visualization:
pl.xlabel('Random Number')
pl.ylabel('Number of Occurances')
pl.grid(True)

#  Print each returned value from hist()
print(nArray)
print()
print(binArray)
print()
print(patchArray)
print()

print(patchArray[0])


[ 21.  25.  26.  20.  23.  22.  19.  23.  19.  14.  17.  18.  17.  21.  27.
  24.  11.  18.  31.  23.  17.  21.  17.  23.  24.  14.  16.  18.  18.  22.
  21.  21.  19.  18.  21.  19.  22.  19.  22.  21.  15.  22.  13.  12.  18.
  20.  23.  23.  26.  16.]

[  4.39632140e-02   2.04168385e+00   4.03940448e+00   6.03712511e+00
   8.03484574e+00   1.00325664e+01   1.20302870e+01   1.40280076e+01
   1.60257283e+01   1.80234489e+01   2.00211695e+01   2.20188902e+01
   2.40166108e+01   2.60143314e+01   2.80120521e+01   3.00097727e+01
   3.20074933e+01   3.40052140e+01   3.60029346e+01   3.80006552e+01
   3.99983759e+01   4.19960965e+01   4.39938171e+01   4.59915377e+01
   4.79892584e+01   4.99869790e+01   5.19846996e+01   5.39824203e+01
   5.59801409e+01   5.79778615e+01   5.99755822e+01   6.19733028e+01
   6.39710234e+01   6.59687441e+01   6.79664647e+01   6.99641853e+01
   7.19619060e+01   7.39596266e+01   7.59573472e+01   7.79550679e+01
   7.99527885e+01   8.19505091e+01   8.39482298e+01   8.59459504e+01
   8.79436710e+01   8.99413916e+01   9.19391123e+01   9.39368329e+01
   9.59345535e+01   9.79322742e+01   9.99299948e+01]

<a list of 50 Patch objects>

Rectangle(0.0439632,0;1.99772x21)

Q. Is the normalization about right? That is, how many samples are there?


In [15]:
# Hint:  What is the average number of hits 
# in each bin and how many bins are there?

print(len(nArray))


50

Q. From these data, how would we compute the probability of a random number occuring in each bin?


In [16]:
# Same as above
random.seed(5)
N = 1000
x = range(N)
y = [random.uniform(0, 100) for _ in x]

# Now, note "normed = True" to normalize to 1:
nArray, binArray, patchArray = pl.hist(y, bins=50, normed=True, facecolor='g')

# Then plot
pl.xlabel('Random Number')
pl.ylabel('Probability of Occurance')
pl.grid(True)

nArray


Out[16]:
array([ 0.01051198,  0.01251426,  0.01301483,  0.01001141,  0.01151312,
        0.01101255,  0.00951084,  0.01151312,  0.00951084,  0.00700799,
        0.0085097 ,  0.00901027,  0.0085097 ,  0.01051198,  0.0135154 ,
        0.01201369,  0.00550628,  0.00901027,  0.01551769,  0.01151312,
        0.0085097 ,  0.01051198,  0.0085097 ,  0.01151312,  0.01201369,
        0.00700799,  0.00800913,  0.00901027,  0.00901027,  0.01101255,
        0.01051198,  0.01051198,  0.00951084,  0.00901027,  0.01051198,
        0.00951084,  0.01101255,  0.00951084,  0.01101255,  0.01051198,
        0.00750856,  0.01101255,  0.00650742,  0.00600685,  0.00901027,
        0.01001141,  0.01151312,  0.01151312,  0.01301483,  0.00800913])
numpy also has a random number generator to efficiently generate a large number of random numbers:

In [17]:
import numpy as np
randUniArray = np.random.uniform(0, 10, size=10000) # Range from 0 to 10

Q. What should the sum of all the numbers be (on average)?


In [18]:
sum(randUniArray)


Out[18]:
49967.920533300348
To make a normal or Gaussian distribution:

In [20]:
import numpy as np

mu    = 100  # Average of 100
sigma = 20   # Standard deviation of 20

# randn = normal distribution (unity normalized by default)
randNormArray = mu + sigma * np.random.randn(10000)

# Q. What data type should x be?

type(randNormArray)


Out[20]:
numpy.ndarray

In [21]:
# Histogram of the data:
#nArray, binArray, patchArray = pl.hist(randNormArray, 100, normed=True, facecolor='g')
pl.hist(randNormArray, 100, normed=True, facecolor='g')

pl.xlabel('x', size='large')
pl.ylabel('Probability of x', size='large')
pl.title('Normal Distribution')
pl.text(40, 0.015, r'$\mu=100,\ \sigma=20$', size='x-large')
pl.grid(True)


Again (100x more samples):

In [22]:
mu    = 0
sigma = 20 

randNormArray = mu + sigma * np.random.randn(1000000) 

nArray, binArray, patchArray = pl.hist(randNormArray, 100, normed=1, facecolor='r')

pl.xlabel('x', size='large')
pl.ylabel('Probability of x', size='large')
pl.title('Normal Distribution')
pl.text(40, .015, r'$\mu=0,\ \sigma=20$', size='x-large')
pl.grid(True)


Q. When might you be inclined to provide the random number generator with a seed?

Alternative approach (using numpy.random.normal):

In [24]:
mu    = 100
sigma = 20

# Method #1
randNormArray1 = mu + sigma * np.random.randn(10000) 

# Method #2
randNormArray2 = np.random.normal(size=10000, loc=mu, scale=sigma)

# Compare
_, _, _ = pl.hist(randNormArray1, bins=100, color='b')
_, _, _ = pl.hist(randNormArray2, bins=100, color='g')



In [ ]: