In [ ]:

    
#%autosave 0
from IPython.core.display import HTML, display
display(HTML('<style>.container { width:100%; !important } </style>'))

Object Identification in Images

We use matplotlib to read and show images.



In [ ]:

    
import matplotlib.image  as img
import matplotlib.pyplot as plt

The image of the car is taken from here: https://commons.wikimedia.org/w/index.php?curid=32044247. For performance reasons I have rescaled the image.

If you want to try this on your own machine, you have to install the Python image procesing library which is known as pillow.



In [ ]:

    
image = img.imread('Resources/kaefer.jpg')

The image has a height of 959 pixels and a width of 1439 pixels:



In [ ]:

    
image.shape

We store width and height in the variables mand n.



In [ ]:

    
m, n, _ = image.shape



In [ ]:

    
plt.rcParams["figure.figsize"] = (15,22)



In [ ]:

    
plt.imshow(image)

We use the union-find data structure implemented earlier.



In [ ]:

    
import union_find_oo as uf

The list M contains all coordinates $(x, y)$ of our image. This list has over a million coordinates.



In [ ]:

    
M = [ (x, y) for x in range(m) for y in range(n) ]
len(M)



In [ ]:

    
UF = uf.UnionFind(M)

The color of a pixel is a vector with three components. These components specify the red, green, and blue value of the pixel. The function $\texttt{distance}(X, Y)$ takes two pixel-vectors $X$ and $Y$ and computes a value that measures how much these pixels differ.



In [ ]:

    
def distance(X, Y):
    return abs(int(X[0]) - int(Y[0])) + \
           abs(int(X[1]) - int(Y[1])) + \
           abs(int(X[2]) - int(Y[2]))

Neighbouring pixels that differ by less than threshold will be considered equivalent, i.e. we assume that they are part of the same component.



In [ ]:

    
threshold = 16

The function $\texttt{simlar}(p_1, p_2)$ takes two coordinate pairs $p_1 = (x_1,y_1)$ and $p_2 = (x_2,y_2)$ and checks, whether the pixels at these coordinates are equivalent.



In [ ]:

    
def similar(p1, p2):
    x1, y1 = p1
    x2, y2 = p2
    X = image[x1, y1]
    Y = image[x2, y2]
    return distance(X, Y) < threshold

We iterate over all pixels and for every pixel over all of its neighbouring pixels. If a pixel is similar to a neighbouring pixel, we declare these pixels as equivalent in our union-find data structure UF.



In [ ]:

    
%%time
for x in range(m-1):
    if x % 10 == 0:
        print('.', end='')
    for y in range(n-1):
        if y > 0:
            L = [(1,0), (0,1), (1,1), (1, -1)]
        else:
            L = [(1,0), (0,1), (1,1)]
        for (a,b) in L:
            p1 = (x    , y    )
            p2 = (x + a, y + b)
            if similar(p1, p2):
                UF.union(p1, p2)
print()

Given a coordinate pair $p = (x,y)$, the function $\texttt{extract_component}(p)$ extracts the set of all coordinates that point to equivalent pixels. It returns a pair $(c, i)$ where $c$ is the number of equivalent pixels and $i$ is an image that only contains the pixels equivalent to the pixel at $(x,y)$.



In [ ]:

    
def extract_component(p):
    x, y      = p
    new_image = image.copy()
    m, n, _   = image.shape
    root_xy   = UF.find((x, y))
    count     = 0
    mean      = [0, 0, 0]
    for a in range(0, m):
        for b in range(0, n):
            root_ab = UF.find((a, b))
            if root_xy != root_ab:
                for c in range(3):
                    new_image[a, b, c] = 255
            else:
                for c in range(3):
                    mean[c] += image[a, b, c]
                count += 1
    for c in range(3):
        mean[c] /= count
    print(f'component at {p} has {count} pixels')
    return (count, new_image, mean)

The function all_components computes all those components that have more than $2000$ pixels.



In [ ]:

    
def all_components():
    Roots = set()
    for x in range(0, m):
        for y in range(0, n):
            root = UF.find((x, y))
            if UF.mSize[root] > 2000 and not root in Roots:
                print(UF.mHeight[root])
                Roots.add(root)
    return Roots



In [ ]:

    
Roots = all_components()
len(Roots)

We have found $28$ different components that have at least 2000 pixels. Let us extract the corresponding images.



In [ ]:

    
Mean = {}
for r in Roots:
    size, component, mean = extract_component(r)
    Mean[r] = mean

Those images that have a similar mean will be unified.



In [ ]:

    
for r1 in Roots:
    for r2 in Roots:
        if distance(Mean[r1], Mean[r2]) < 30:
            UF.union(r1, r2)



In [ ]:

    
Roots = all_components()
len(Roots)

Now we are left with $9$ components. Lets view them.



In [ ]:

    
Mean = {}
for r in Roots:
    size, component, mean = extract_component(r)
    Mean[r] = mean
    plt.imshow(component)
    plt.show()

Since the picture has been taken in bright sunlight, the illumination is inhomogeneous: The front part of the car is much brighter than the sides. Hence we have to join these parts manually.



In [ ]:

    
Parts = [(465, 499), (656, 163), (691, 1132), (267, 434), (736, 247)]
for p in Parts[1:]:
    UF.union(p, Parts[0])
_, car, _ = extract_component(Parts[0])
plt.imshow(car)



In [ ]: