Goal: Use the trained quasar classifier to classify the Quasar Candidates identified by the SIMBAD Astronomical Database that fall within the coverage of the Sloan Digital Sky Survey.
In the Data Collection notebook, we downloaded the Quasar Candidate images from the Sloan Digital Sky Survey. We will now use our trained model to classify these images.
In [ ]:
import os
import sys
from scipy import ndimage
import matplotlib.pyplot as plt
%matplotlib inline
from PIL import Image
import numpy as np
import pandas as pd
import tensorflow as tf
from QuasarClassifier import QuasarClassifier, ImportImages
We create a list of the Quasar Candidate filenames.
In [ ]:
ImageNames = list(
filter(lambda name: 'QuasarC' in name, os.listdir('./Images/')))
ImageNames = list(map(lambda name: './Images/' + name, ImageNames))
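The filtering step above can also be sketched as a small helper. This is only an illustration under stated assumptions: `candidate_paths` and the file names in `listing` are hypothetical and not part of the notebook. The sketch sorts the result because `os.listdir` returns names in arbitrary order.

```python
import os


def candidate_paths(names, directory='./Images/', marker='QuasarC'):
    """Return sorted full paths for the names that contain `marker`."""
    return sorted(os.path.join(directory, name)
                  for name in names if marker in name)


# Hypothetical directory listing, for illustration only.
listing = ['QuasarCandidate_1.png', 'Star_3.png', 'QuasarCandidate_0.png']
paths = candidate_paths(listing)
print(paths)
```

With a real directory, `candidate_paths(os.listdir('./Images/'))` would play the role of the two `ImageNames` lines above.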
We now run the QuasarClassifier function, imported from the QuasarClassifier module, on these filenames.
In [ ]:
Classification = QuasarClassifier(ImageNames, 120, 120)
In [ ]:
Classification.head()
Out[ ]:
In [ ]:
Classification.tail()
Out[ ]:
We will now look at 20 random positive results and 20 random negative results. We will use the ImportImages function from QuasarClassifier to transform the images to numpy arrays.
In [ ]:
input_images = ImportImages(ImageNames, 120, 120)
In [ ]:
positives = Classification[Classification['QuasarProbability'] >= 0.5]
# Viewing the positives (sample at most 20 rows so a small result set does not raise)
pos_sample = positives.sample(min(20, len(positives)))
for ind in pos_sample.index:
    imageplot = plt.figure(figsize=(10, 3))
    # add_axes takes [left, bottom, width, height] in figure fractions
    axis1 = imageplot.add_axes([0, 0, .24, .9])
    axis1.imshow(input_images[ind, :, :, 0], cmap='Reds')
    axis2 = imageplot.add_axes([.25, 0, .24, .9])
    axis2.imshow(input_images[ind, :, :, 1], cmap='Greens')
    axis3 = imageplot.add_axes([.50, 0, .24, .9])
    axis3.imshow(input_images[ind, :, :, 2], cmap='Blues')
    axis4 = imageplot.add_axes([.75, 0, .24, .9])
    current_img = positives.loc[ind, 'Filename']
    axis4.imshow(Image.open(current_img))
    imageplot.suptitle('Model Determined Quasar Image Number: %s. Quasar Probability %.2f%%' % (
        str(ind), positives.loc[ind, 'QuasarProbability'] * 100))
In [ ]:
negatives = Classification[Classification['QuasarProbability'] < 0.5]
# Viewing the negatives (sample at most 20 rows so a small result set does not raise)
neg_sample = negatives.sample(min(20, len(negatives)))
for ind in neg_sample.index:
    imageplot = plt.figure(figsize=(10, 3))
    # add_axes takes [left, bottom, width, height] in figure fractions
    axis1 = imageplot.add_axes([0, 0, .24, .9])
    axis1.imshow(input_images[ind, :, :, 0], cmap='Reds')
    axis2 = imageplot.add_axes([.25, 0, .24, .9])
    axis2.imshow(input_images[ind, :, :, 1], cmap='Greens')
    axis3 = imageplot.add_axes([.50, 0, .24, .9])
    axis3.imshow(input_images[ind, :, :, 2], cmap='Blues')
    axis4 = imageplot.add_axes([.75, 0, .24, .9])
    current_img = negatives.loc[ind, 'Filename']
    axis4.imshow(Image.open(current_img))
    imageplot.suptitle('Model Determined Non-Quasar Image Number: %s. Quasar Probability %.2f%%' % (
        str(ind), negatives.loc[ind, 'QuasarProbability'] * 100))
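The split at a probability of 0.5 used in the two cells above can be sketched on toy data. The rows, the `toy_` names, and the `safe_sample` helper below are hypothetical and only illustrate the idea; the column names `Filename` and `QuasarProbability` are the ones used in this notebook. The helper caps the sample size because `DataFrame.sample(n)` raises a `ValueError` when `n` exceeds the number of rows and `replace` is left at its default.

```python
import pandas as pd

# Hypothetical classifier output, for illustration only.
toy = pd.DataFrame({
    'Filename': ['a_0.png', 'a_1.png', 'a_2.png', 'a_3.png'],
    'QuasarProbability': [0.91, 0.50, 0.49, 0.03],
})

threshold = 0.5
# A probability exactly at the threshold counts as a positive.
toy_positives = toy[toy['QuasarProbability'] >= threshold]
toy_negatives = toy[toy['QuasarProbability'] < threshold]


def safe_sample(frame, n, seed=0):
    """Sample up to n rows without replacement, never more than the frame holds."""
    return frame.sample(min(n, len(frame)), random_state=seed)


toy_pos_sample = safe_sample(toy_positives, 20)
print(len(toy_positives), len(toy_negatives), len(toy_pos_sample))
```

Passing a fixed `random_state` makes the sampled rows reproducible between runs, which can help when comparing figures.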
We will now merge these probabilities with the data in the QuasarCandidatesData.csv file.
In [ ]:
QuasarCandidates = pd.read_csv('QuasarCandidatesData.csv')
In [ ]:
QuasarCandidates.head()
Out[ ]:
In [ ]:
QuasarCandidates.tail()
Out[ ]:
The index of the QuasarCandidates dataframe is the number attached to each Quasar Candidate image. As this is not the index of the Classification dataframe, we must join the two dataframes on the index of the QuasarCandidates dataframe and the image number embedded in the Filename column of the Classification dataframe.
In [ ]:
# Create an index column for the join in the QuasarCandidates dataframe
QuasarCandidates.reset_index(inplace=True)
In [ ]:
# Extract the image number from each Filename in the Classification dataframe.
# The number sits between the '_' and the following '.' in each filename.
# It must be an integer so that it can be joined with the index of QuasarCandidates.
Classification['ImageNumber'] = Classification['Filename'].apply(
    lambda name: int(name.split(sep='_')[1].split(sep='.')[0]))
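The split-based extraction above relies on the first '_' in the full path being the one before the image number. A slightly more defensive variant, shown here only as a sketch, works on the base name and uses a regular expression. The helper `image_number` and the example file names are hypothetical and not part of the QuasarClassifier module.

```python
import os
import re


def image_number(path):
    """Return the integer between the last '_' and the extension of the file name."""
    match = re.search(r'_(\d+)\.[^.]+$', os.path.basename(path))
    if match is None:
        raise ValueError('no image number found in %r' % path)
    return int(match.group(1))


# Hypothetical file names, for illustration only.
print(image_number('./Images/QuasarCandidate_12.png'))
# A directory name containing '_' would break the split-based version.
print(image_number('./My_Images/QuasarCandidate_7.png'))
```

It could be applied in the same way as the lambda, for example `Classification['Filename'].apply(image_number)`.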
In [ ]:
Classification.head()
Out[ ]:
In [ ]:
OutputDF = pd.merge(QuasarCandidates, Classification, how='inner',
                    left_on='index', right_on='ImageNumber')
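To make the behaviour of the inner join concrete, here is a minimal sketch on toy data. The frames `toy_candidates` and `toy_scores` and their values are made up for illustration; only the column names `index`, `ImageNumber`, and `QuasarProbability` mirror the notebook. The optional `validate='one_to_one'` argument of `pd.merge` is an addition, not something the notebook uses: it raises an error if either key column contains duplicates.

```python
import pandas as pd

# Hypothetical candidate table; reset_index() exposes the row number as an 'index' column.
toy_candidates = pd.DataFrame({'Name': ['A', 'B', 'C']}).reset_index()

# Hypothetical classifier output: only images 2 and 0 were classified.
toy_scores = pd.DataFrame({
    'ImageNumber': [2, 0],
    'QuasarProbability': [0.8, 0.1],
})

toy_merged = pd.merge(toy_candidates, toy_scores, how='inner',
                      left_on='index', right_on='ImageNumber',
                      validate='one_to_one')
print(toy_merged)
```

Candidate 'B' has no matching image number, so the inner join drops it. Comparing `len(OutputDF)` with `len(QuasarCandidates)` would show in the same way how many candidates had no classified image.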
In [ ]:
OutputDF.head()
Out[ ]:
In [ ]:
OutputDF.tail()
Out[ ]:
In [ ]:
OutputDF.drop(['index', 'ImageNumber'], axis=1, inplace=True)
In [ ]:
OutputDF.head()
Out[ ]:
In [ ]:
OutputDF.to_csv('QuasarCandidateDataWithClassification.csv', index=False)