Sudoku recognizer

Alejandro Hernández Cordero (GitHub ID: ahcorde)

This notebook details how to recognize a sudoku puzzle from a picture. We'll make use of simple image processing algorithms (edge detection, thresholds... ) and character recognition using the K-Nearest Neighbors (KNN) algorithm. It is a very simple but effective algorithm for solving multi-class classification problems. This puzzle matrix is a 9x9 array of known numbers 1-9 or 0s where the number is unknown

This ipython notebook is divided into two parts. The first part is related to computer vision and the second part is related to machine learning. To complete this task it is necessary to use a computer vision library, in this case we are going to use OpenCV library.

Introduction

The first thing we need to do is to identify the puzzle. We have some challenges: The lines are not perfect, the black grid lines have similar color as a lot of elements in the image or the small squares are difficult to extract.

To simplify the problem we assumed: the puzzle is the biggest square in the image and the puzzle will be orientated reasonably correctly.

In the next picture you can see a sudoku puzzle in a newspaper. To load the image it's used imread.


In [ ]:
%matplotlib inline
import pylab as pl
#import Opencv library
try:
    import cv2
except ImportError:
    print "You must have OpenCV installed"
    
#load image
image_sudoku_original = cv2.imread('../../../data/ocr/sudoku.jpg')
#Show Images
_=pl.imshow(image_sudoku_original) 
_=pl.axis("off")

Step 1: Segmenting the Sudoku

Once we have read the image we have to detect the lines. For this task, It used an adaptativeThreshold to extract the edge of the image (for each pixel in the image take the average value of the surrounding area). This function accepts a gray scale image, with the function cvtColor we change the color space (from RGB to gray scale). In this picture it's possible to see letters, lines or numbers. Light pixels are the paper and dark pixels are the ink.

The adaptativeThreshold function takes as first argument the input image, the second argument returns the binary image, the third argument is the non-zero value assigned to the pixels for which the condition is satisfied. The fourth is the adaptive thresholding algorithm, the fifth argument is the threshold type, the next argument is size of the pixel neighborhood and finally the last argument is a constant that is subtracted from the mean or weight mean.


In [ ]:
#gray image
image_sudoku_gray = cv2.cvtColor(image_sudoku_original,cv2.COLOR_BGR2GRAY)
#adaptive threshold
thresh = cv2.adaptiveThreshold(image_sudoku_gray,255,1,1,11,15)

#show image
_=pl.imshow(thresh, cmap=pl.gray())
_=pl.axis("off")

We have to find the connected contours in the image (using the function findContours). These function returns a vector with the corners of the contours. We'll find the sukodu in this contours.

The assumption is that the sudoku puzzle has 4 sides and it's convex. Checking the number of the contour is equal to four and using the funcion of OpenCV isContourConvex to check if the square is convex. We obtain the possible candidates.

We need to filter the candidates using the assumption: the sudoku puzzle is the biggest square in the image. We calculate the area of the possible contours (contourArea). The biggest square is the sudoku puzzle.


In [ ]:
#find the countours 
contours0,hierarchy = cv2.findContours( thresh,
                                        cv2.RETR_LIST,
                                        cv2.CHAIN_APPROX_SIMPLE)
#size of the image (height, width)
h, w = image_sudoku_original.shape[:2]

#copy the original image to show the posible candidate
image_sudoku_candidates = image_sudoku_original.copy()

#biggest rectangle
size_rectangle_max = 0; 
for i in range(len(contours0)):
    #aproximate countours to polygons
    approximation = cv2.approxPolyDP(contours0[i], 4, True)
        
    #has the polygon 4 sides?
    if(not (len (approximation)==4)):
        continue;
    #is the polygon convex ?
    if(not cv2.isContourConvex(approximation) ):
        continue; 
    #area of the polygon
    size_rectangle = cv2.contourArea(approximation)
    #store the biggest
    if size_rectangle> size_rectangle_max:
        size_rectangle_max = size_rectangle 
        big_rectangle = approximation

In the image below it's possible to see the 4 sides of the sudoku puzzle in red


In [ ]:
#show the best candidate
approximation = big_rectangle
for i in range(len(approximation)):
    cv2.line(image_sudoku_candidates,
             (big_rectangle[(i%4)][0][0], big_rectangle[(i%4)][0][1]), 
             (big_rectangle[((i+1)%4)][0][0], big_rectangle[((i+1)%4)][0][1]),
             (255, 0, 0), 2)
#show image
_=pl.imshow(image_sudoku_candidates, cmap=pl.gray()) 
_=pl.axis("off")

Now we have the sudoku puzzle segmented. We have got the corner points of the puzzle. It's currently not really usable for much. The sudoku puzzle is a bit distorted. It's necessary to correct the skewed perspective of the image. We need a way to mapping from the puzzle in the original picture back into a square. Where each corner of the sudoku puzzle corresponds to a corner on the a new image.

We use a transformation that will map one arbitrary 2D quadrilateral into another. We can use a perspective transformation:

$$X = \frac{ax + by + c}{gx + hy +1}$$$$Y = \frac{dx + ey + f}{gx + hy +1}$$

This perspective transformation maps a point $ (x, y) $ in one quadrilateral into a new $ (X, Y) $ in another quadrilateral. These two equations contain 8 unknowns, but we have 8 values. (the corners $x$ and $y$ coordinates of the puzzle). Solving these equations gives us the $a,b,c,d,e,f,g,h$ which provide us with a mapping to get our puzzle out nice and straight.

The OpenCV function getperspectivetransform resolved the perspective transformation. Calculates a perspective transform from four pairs of the corresponding points, where the first parameter are the coordinates of quadrangle vertices in the source image and the second parameter are the coordinates of the corresponding quadrangle vertices in the destination image.

To applies a perspective transformation to an image we used the function warpPerspective. This function transforms the source image using the specified matrix. We obtains the image below.

We need to sort the corner of the sudoku puzzle and then associate each point with the new image dimension.


In [ ]:
import numpy as np
IMAGE_WIDHT = 16
IMAGE_HEIGHT = 16
SUDOKU_SIZE= 9
N_MIN_ACTVE_PIXELS = 10

#sort the corners to remap the image
def getOuterPoints(rcCorners):
    ar = [];
    ar.append(rcCorners[0,0,:]);
    ar.append(rcCorners[1,0,:]);
    ar.append(rcCorners[2,0,:]);
    ar.append(rcCorners[3,0,:]);
    
    x_sum = sum(rcCorners[x, 0, 0] for x in range(len(rcCorners)) ) / len(rcCorners)
    y_sum = sum(rcCorners[x, 0, 1] for x in range(len(rcCorners)) ) / len(rcCorners)
    
    def algo(v):
        return (math.atan2(v[0] - x_sum, v[1] - y_sum)
                + 2 * math.pi) % 2*math.pi
        ar.sort(key=algo)
    return (  ar[3], ar[0], ar[1], ar[2])

The dataset images have 16x16 pixels. The size of the new image will be 144x144, because we have to divide each row and col by 9 and we have to return the same size of the images.


In [ ]:
#point to remap
points1 = np.array([
                    np.array([0.0,0.0] ,np.float32) + np.array([144,0], np.float32),
                    np.array([0.0,0.0] ,np.float32),
                    np.array([0.0,0.0] ,np.float32) + np.array([0.0,144], np.float32),
                    np.array([0.0,0.0] ,np.float32) + np.array([144,144], np.float32),
                    ],np.float32)    
outerPoints = getOuterPoints(approximation)
points2 = np.array(outerPoints,np.float32)

#Transformation matrix
pers = cv2.getPerspectiveTransform(points2,  points1 );

#remap the image
warp = cv2.warpPerspective(image_sudoku_original, pers, (SUDOKU_SIZE*IMAGE_HEIGHT, SUDOKU_SIZE*IMAGE_WIDHT));
warp_gray = cv2.cvtColor(warp, cv2.COLOR_BGR2GRAY)

#show image
_=pl.imshow(warp_gray, cmap=pl.gray())
_=pl.axis("off")

we've now got an undistorted square Sudoku puzzle

Step 2: Segmenting the numbers

After remapping, we can divide the puzzle into a 9×9 grid. Each cell in the grid will correspond (approximately) to a cell on the puzzle. The correspondence might not be perfect, but it should be good enough.

Now, we know the size of the bounding box. We also know the size of the image. We can easily center the image. We create a new image that’s the same size as the original.

Sending images directly to k-Nearest Neighbors is not a good idea. A little bit of work on the image can increase accuracy. Here, we’ll center the actual digit in the image. The dataset has been made in such a way, all digits are centered by their bounding box.

To remove the noise around the number (lines in general) we delete all the pixels outside the center of the image from a radius.


In [ ]:
def extract_number(x, y):
    #square -> position x-y
    im_number = warp_gray[x*IMAGE_HEIGHT:(x+1)*IMAGE_HEIGHT][:, y*IMAGE_WIDHT:(y+1)*IMAGE_WIDHT]

    #threshold
    im_number_thresh = cv2.adaptiveThreshold(im_number,255,1,1,15,9)
    #delete active pixel in a radius (from center) 
    for i in range(im_number.shape[0]):
        for j in range(im_number.shape[1]):
            dist_center = math.sqrt( (IMAGE_WIDHT/2 - i)**2  + (IMAGE_HEIGHT/2 - j)**2);
            if dist_center > 6:
                im_number_thresh[i,j] = 0;

    n_active_pixels = cv2.countNonZero(im_number_thresh)
    return [im_number, im_number_thresh, n_active_pixels]

Sometimes after remove the noise of the images can appear other particles close to the number. Because of this, we find the biggest bounding box in the small squared. We make the bounding box a little more bigger to improve the matching with the dataset.


In [ ]:
def find_biggest_bounding_box(im_number_thresh):
    contour,hierarchy = cv2.findContours(im_number_thresh.copy(),
                                         cv2.RETR_CCOMP,
                                         cv2.CHAIN_APPROX_SIMPLE)

    biggest_bound_rect = [];
    bound_rect_max_size = 0;
    for i in range(len(contour)):
         bound_rect = cv2.boundingRect(contour[i])
         size_bound_rect = bound_rect[2]*bound_rect[3]
         if  size_bound_rect  > bound_rect_max_size:
             bound_rect_max_size = size_bound_rect
             biggest_bound_rect = bound_rect
    #bounding box a little more bigger
    x_b, y_b, w, h = biggest_bound_rect;
    x_b= x_b-1;
    y_b= y_b-1;
    w = w+2;
    h = h+2; 
                
    return [x_b, y_b, w, h]

Now we have to separate all the numbers from the grid. First we extract the numbers. If we have more active pixels than a threshold, we find the biggest bounding box in the image if the size of this bounding box is bigger than 0, we store this number in a matrix. In another case we store a matrix of zeros.


In [ ]:
import math
import numpy as np

#sudoku representation
sudoku = np.zeros(shape=(9*9,IMAGE_WIDHT*IMAGE_HEIGHT))

def Recognize_number( x, y):
    """
    Recognize the number in the rectangle
    """    
    #extract the number (small squares)
    [im_number, im_number_thresh, n_active_pixels] = extract_number(x, y)

    if n_active_pixels> N_MIN_ACTVE_PIXELS:
        [x_b, y_b, w, h] = find_biggest_bounding_box(im_number_thresh)

        im_t = cv2.adaptiveThreshold(im_number,255,1,1,15,9);
        number = im_t[y_b:y_b+h, x_b:x_b+w]

        if number.shape[0]*number.shape[1]>0:
            number = cv2.resize(number, (IMAGE_WIDHT, IMAGE_HEIGHT), interpolation=cv2.INTER_LINEAR)
            ret,number2 = cv2.threshold(number, 127, 255, 0)
            number = number2.reshape(1, IMAGE_WIDHT*IMAGE_HEIGHT)
            sudoku[x*9+y, :] = number;
            return 1

        else:
            sudoku[x*9+y, :] = np.zeros(shape=(1, IMAGE_WIDHT*IMAGE_HEIGHT));
            return 0

Here you can see the segmented numbers in a grid


In [ ]:
index_subplot=0
n_numbers=0
indexes_numbers = []
for i in range(SUDOKU_SIZE):
    for j in range(SUDOKU_SIZE):
        if Recognize_number(i, j)==1:
            if (n_numbers%5)==0:
                index_subplot=index_subplot+1
            indexes_numbers.insert(n_numbers, i*9+j)
            n_numbers=n_numbers+1

#create subfigures
f,axarr= pl.subplots(index_subplot,5)

width = 0;
for i in range(len(indexes_numbers)):
    ind = indexes_numbers[i]
    if (i%5)==0 and i!=0:
        width=width+1
    axarr[i%5, width].imshow(cv2.resize(sudoku[ind, :].reshape(IMAGE_WIDHT,IMAGE_HEIGHT), (IMAGE_WIDHT*5,IMAGE_HEIGHT*5)), cmap=pl.gray())
    axarr[i%5, width].axis("off")

Step 3: Recognizing digits (Machine learning)

In machine learning, classification is the problem of identifying to which of a set of categories a new observation belongs. Classification is an example of the more general problem of pattern recognition. Which is the assignment of some sort of output value to a given input value. The k-Nearest Neighbors algorithm (or KNN) is a non-parametric method used for classification and regression.

In SHOGUN, you can use CKNN to perform k-Nearest Neighbors algorithm, it's a non-parametric method used for classification and regression. To construct a KNN machine, you must choose the hyper-parameter K and a distance function. Usually, we simply use the standard CEuclideanDistance, but in general, any subclass of CDistance could be used.

We need to load the training data. but the dataset values are between -1 and 1. We need to normalize the data between 0-255.


In [ ]:
from modshogun import MulticlassLabels,RealFeatures
from modshogun import KNN, EuclideanDistance
from scipy.io import loadmat

from numpy    import random

#load data
mat  = loadmat('../../../data/multiclass/usps.mat')
Xall = mat['data']
#normalize between 0-255
normalize = np.zeros((Xall.shape[1], Xall.shape[0],1), np.uint8)
normalize= cv2.normalize(Xall, normalize, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1);
Xall = np.asmatrix(normalize, np.float64)

#load labels
Yall = np.array(mat['label'].squeeze(), dtype=np.double)
Yall = Yall - 1

#random position to train knn
random.seed(0)

subset = random.permutation(len(Yall))
test_images = Xall[:, subset[:5000]]
test_labels = Yall[subset[:5000]]

We initialize the knn, set up the features and configure the kind of distance that we want to use (in this case Euclidean distance).


In [ ]:
k=10

labels_numbers = MulticlassLabels(test_labels)
feats  = RealFeatures(test_images)
dist = EuclideanDistance()
knn = KNN(k, dist, labels_numbers)
_=knn.train(feats)

Now we have to take all the segmented numbers and returns what digit it is.


In [ ]:
sudoku_puzzle = np.zeros(shape=(9,9))

#Predict the number
for i in range(SUDOKU_SIZE):
    for j in range(SUDOKU_SIZE):         
        #has the image some active pixel?    
        n_active_pixels = cv2.countNonZero(np.asmatrix(sudoku[i*9+j, :]))
        if n_active_pixels>20 :
            feats_test  = RealFeatures( np.asmatrix(sudoku[i*9+j, :], np.float64).T)
            sudoku_puzzle[i,j] = knn.apply_multiclass(feats_test)[0]
        else:
            sudoku_puzzle[i,j] = 0
            
print sudoku_puzzle
#show image
_=pl.imshow(image_sudoku_original)
_=pl.axis("off")

The recognition isn't very accurate because the training data is for handwritten text.

The accuracy of the process could be improved. In this example we have used a holistic method, all the image is introduced in the classifier. But adding new features to the classifier, for example the number of holes, number of straight lines or length of the lines, could improve the accuracy of the classifier.

References