Approximating Pulses

In PVQ, codewords aren't taken from a codebook; instead, they must follow a specific rule: their sum of absolute values, or L1 norm, must equal K

$$\mathbf{y} \in \mathbb{Z}^N : \sum_{i=0}^{N-1} \lvert \mathbf{y}_i \rvert = K\:.$$

As such, finding the codeword corresponding to a given vector is not trivial.

In this notebook, we will look into approximating pulses to see how accurately we can estimate how the energy is distributed between the left and right halves of the codeword. More precisely, we are interested in the absolute sum of the left half $K_{\text{Left}}$ and the absolute sum of the right half $K_{\text{Right}}$.
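
For example, the constraint and the left/right split look like this (a minimal sketch with a made-up codeword; y, KLeft and KRight below are illustrative, not taken from the dataset):

In [ ]:
import numpy as np

# A hypothetical codeword for N = 15 with K = 6 pulses
y = np.array([3, -1, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0])
K = np.abs(y).sum()              # L1 norm of the codeword
assert K == 6

# Split the pulses between the two halves of the codeword
KLeft = np.abs(y[0:7]).sum()     # 5 pulses in the first 7 coefficients
KRight = np.abs(y[7:15]).sum()   # 1 pulse in the last 8 coefficients
assert KLeft + KRight == K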

Basic Setup

First, we need to import a few things:


In [1]:
%matplotlib inline
import csv
import numpy as np
import matplotlib.pyplot as plt

Loading data files

We extracted the vector coefficients right before PVQ, their corresponding codewords, and their K values into seqF10_coeffs4x4.csv, seqF10_pulses4x4.csv and seqF10_k4x4.csv respectively (one set of files per sequence). For now, we only consider the first 15 AC coefficients.

We gathered data from the first 10 frames of the following sequences from the video-1-short dataset:

  • akiyo_cif
  • ducks_take_off
  • grandma
  • park
  • sign_irene
  • sintel_trailer_2k
  • soccer

In [2]:
def LoadData(coeffFile, pulseFile, kFile):
    _coeffs = np.genfromtxt(coeffFile, delimiter=',')
    _coeffs = _coeffs[:,0:15]
    _pulses = np.genfromtxt(pulseFile, delimiter=',')
    _pulses = _pulses[:,0:15]
    _ks = np.genfromtxt(kFile, delimiter=',')

    # Sanity check that the pulses match with k
    _numLines = len(_ks)
    for i in range(0, _numLines):
        assert(_ks[i] == sum(abs(_pulses[i,:])))
    return _coeffs, _pulses, _ks

[akiyoCoeffs, akiyoPulses, akiyoKs] = LoadData("data/akiyoF10_coeffs4x4.csv", "data/akiyoF10_pulses4x4.csv",
                                               "data/akiyoF10_k4x4.csv");

[ducksCoeffs, ducksPulses, ducksKs] = LoadData("data/ducksF10_coeffs4x4.csv", "data/ducksF10_pulses4x4.csv",
                                               "data/ducksF10_k4x4.csv");

[grandmaCoeffs, grandmaPulses, grandmaKs] = LoadData("data/grandmaF10_coeffs4x4.csv", "data/grandmaF10_pulses4x4.csv",
                                               "data/grandmaF10_k4x4.csv");

[parkCoeffs, parkPulses, parkKs] = LoadData("data/parkF10_coeffs4x4.csv", "data/parkF10_pulses4x4.csv",
                                            "data/parkF10_k4x4.csv");

[signCoeffs, signPulses, signKs] = LoadData("data/signF10_coeffs4x4.csv", "data/signF10_pulses4x4.csv",
                                            "data/signF10_k4x4.csv");

[sintelCoeffs, sintelPulses, sintelKs] = LoadData("data/sintelF10_coeffs4x4.csv", "data/sintelF10_pulses4x4.csv",
                                                  "data/sintelF10_k4x4.csv");

[soccerCoeffs, soccerPulses, soccerKs] = LoadData("data/soccerF10_coeffs4x4.csv", "data/soccerF10_pulses4x4.csv",
                                            "data/soccerF10_k4x4.csv");

Approximating Pulses

In PVQ, vectors are normalized; the resulting unit-norm vector is referred to as the shape of the vector. By taking ratios of the energy, we can get a good approximation of where to spread the K units. We refer to these values as pulses.


In [4]:
def gain(v):
    return np.sqrt(np.dot(v,v))

def shape(v):
    return np.true_divide(v, gain(v))

def approximatePulses(_coeffs, _ks):
    _approxPulses = np.zeros(shape=_coeffs.shape)
    for i in range(0, len(_ks)):
        s = shape(_coeffs[i,:])
        # Distribute the k pulses proportionally to the L1-normalized shape
        _approxPulses[i] = np.true_divide(s, sum(np.abs(s))) * _ks[i]

    # Sanity check that the approximated pulses match with k
    _numLines = len(_ks)
    for i in range(0, _numLines):
        assert(abs(_ks[i] - sum(abs(_approxPulses[i,:]))) < 0.0001)

    return _approxPulses

akiyoApproxPulses = np.round(approximatePulses(akiyoCoeffs, akiyoKs))
ducksApproxPulses = np.round(approximatePulses(ducksCoeffs, ducksKs))
grandmaApproxPulses = np.round(approximatePulses(grandmaCoeffs, grandmaKs))
parkApproxPulses = np.round(approximatePulses(parkCoeffs, parkKs))
signApproxPulses = np.round(approximatePulses(signCoeffs, signKs))
sintelApproxPulses = np.round(approximatePulses(sintelCoeffs, sintelKs))
soccerApproxPulses = np.round(approximatePulses(soccerCoeffs, soccerKs))

Note that with this approach we have fractional pulses. The absolute sum of these fractional pulses is K. However, if we round, floor or ceil the pulses, the absolute sum may no longer be K.

Based on this initial estimate, the encoder will adjust the pulses to obtain integer values and to improve the rate-distortion trade-off of the codeword.
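
As an illustration of the rounding issue, here is a naive greedy fix-up (a sketch only; naiveIntegerPulses is a hypothetical helper, not Daala's actual rate-distortion search) that rounds the fractional pulses and then adds or removes single pulses until the L1 sum equals K again:

In [ ]:
def naiveIntegerPulses(fracPulses, k):
    # Round to the nearest integers; the L1 sum may now differ from k
    pulses = np.round(fracPulses)
    while np.abs(pulses).sum() != k:
        residual = fracPulses - pulses
        # Direction "away from zero" for each position
        s = np.where(pulses != 0, np.sign(pulses), np.sign(fracPulses))
        undershoot = s * residual  # > 0 where |pulses| < |fracPulses|
        if np.abs(pulses).sum() < k:
            # Add a pulse where we undershot the most
            i = np.argmax(undershoot)
            pulses[i] += s[i]
        else:
            # Remove a pulse where we overshot the most (only where pulses != 0)
            i = np.argmin(np.where(pulses != 0, undershoot, np.inf))
            pulses[i] -= np.sign(pulses[i])
    return pulses

# e.g. naiveIntegerPulses(approximatePulses(akiyoCoeffs, akiyoKs)[0], akiyoKs[0])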

Let's measure the mean absolute error between our approximated pulses and the pulses of the Daala encoder.


In [5]:
akiyoError = np.abs(akiyoPulses - akiyoApproxPulses).sum(axis=1)
ducksError = np.abs(ducksPulses - ducksApproxPulses).sum(axis=1)
grandmaError = np.abs(grandmaPulses - grandmaApproxPulses).sum(axis=1)
parkError = np.abs(parkPulses - parkApproxPulses).sum(axis=1)
signError = np.abs(signPulses - signApproxPulses).sum(axis=1)
sintelError = np.abs(sintelPulses - sintelApproxPulses).sum(axis=1)
soccerError = np.abs(soccerPulses - soccerApproxPulses).sum(axis=1)


print("Average Absolute Approximation Error (in pulses)")
print("Akiyo: %0.2f" % np.mean(akiyoError))
print("Ducks: %0.2f" % np.mean(ducksError))
print("Grandma: %0.2f" % np.mean(grandmaError))
print("Park: %0.2f" % np.mean(parkError))
print("Sign: %0.2f" % np.mean(signError))
print("Sintel: %0.2f" % np.mean(sintelError))
print("Soccer: %0.2f" % np.mean(soccerError))


Average Absolute Approximation Error (in pulses)
Akiyo: 1.27
Ducks: 1.64
Grandma: 1.31
Park: 1.52
Sign: 1.32
Sintel: 1.41
Soccer: 1.25

On average, our estimate is off by one or two pulses.

Error distribution

Now, let's look at how the error distributes over the codewords.


In [6]:
sequences = [("Akiyo", akiyoPulses, akiyoApproxPulses),
             ("Ducks", ducksPulses, ducksApproxPulses),
             ("Grandma", grandmaPulses, grandmaApproxPulses),
             ("Park", parkPulses, parkApproxPulses),
             ("Sign", signPulses, signApproxPulses),
             ("Sintel", sintelPulses, sintelApproxPulses),
             ("Soccer", soccerPulses, soccerApproxPulses)]

plt.figure(figsize=(16,20))
for i, (name, pulses, approxPulses) in enumerate(sequences):
    plt.subplot(4, 2, i + 1)
    plt.stem(np.abs(pulses - approxPulses).mean(axis=0))
    plt.ylabel("Mean Absolute Error (in pulses)")
    plt.xlabel("Codeword index")
    plt.title(name)
    plt.gca().set_xlim([-0.5, 14.5])


It would appear that Daala's PVQ encoder is left-biased, as most of the error is in the left part of the codeword.

Negative vs Positive bias

Next, let's run the same test, but this time without the absolute value. This way, we can see whether there is a positive or negative bias at each codeword index.


In [7]:
plt.figure(figsize=(16,20))
for i, (name, pulses, approxPulses) in enumerate(sequences):
    plt.subplot(4, 2, i + 1)
    plt.stem((pulses - approxPulses).mean(axis=0))
    plt.ylabel("Mean Error (in pulses)")
    plt.xlabel("Codeword index")
    plt.title(name)
    plt.gca().set_xlim([-0.5, 14.5])


Although the left bias is present in every sequence of the dataset, the positive vs. negative bias appears to be content-specific.

Estimating $K_{\text{Left}}$

Now on to the main event: predicting the value of K for the left half.


In [8]:
akiyoLeftError = np.abs(akiyoPulses[:,0:7]).sum(axis=1) - np.abs(akiyoApproxPulses[:,0:7]).sum(axis=1)
ducksLeftError = np.abs(ducksPulses[:,0:7]).sum(axis=1) - np.abs(ducksApproxPulses[:,0:7]).sum(axis=1)
grandmaLeftError = np.abs(grandmaPulses[:,0:7]).sum(axis=1) - np.abs(grandmaApproxPulses[:,0:7]).sum(axis=1)
parkLeftError = np.abs(parkPulses[:,0:7]).sum(axis=1) - np.abs(parkApproxPulses[:,0:7]).sum(axis=1)
signLeftError = np.abs(signPulses[:,0:7]).sum(axis=1) - np.abs(signApproxPulses[:,0:7]).sum(axis=1)
sintelLeftError = np.abs(sintelPulses[:,0:7]).sum(axis=1) - np.abs(sintelApproxPulses[:,0:7]).sum(axis=1)
soccerLeftError = np.abs(soccerPulses[:,0:7]).sum(axis=1) - np.abs(soccerApproxPulses[:,0:7]).sum(axis=1)

print("Average Absolute Approximation Error Left (in pulses)")
print("Akiyo: %0.2f" % np.mean(np.abs(akiyoLeftError)))
print("Ducks: %0.2f" % np.mean(np.abs(ducksLeftError)))
print("Grandma: %0.2f" % np.mean(np.abs(grandmaLeftError)))
print("Park: %0.2f" % np.mean(np.abs(parkLeftError)))
print("Sign: %0.2f" % np.mean(np.abs(signLeftError)))
print("Sintel: %0.2f" % np.mean(np.abs(sintelLeftError)))
print("Soccer: %0.2f" % np.mean(np.abs(soccerLeftError)))


Average Absolute Approximation Error Left (in pulses)
Akiyo: 1.06
Ducks: 1.38
Grandma: 1.14
Park: 1.28
Sign: 1.16
Sintel: 1.23
Soccer: 1.04

Estimating $K_{\text{Right}}$

Let's see if we can do a better job with $K_{\text{Right}}$; remember that most of the error is located in the left part of the codeword.


In [9]:
akiyoRightError = np.abs(akiyoPulses[:,7:15]).sum(axis=1) - np.abs(akiyoApproxPulses[:,7:15]).sum(axis=1)
ducksRightError = np.abs(ducksPulses[:,7:15]).sum(axis=1) - np.abs(ducksApproxPulses[:,7:15]).sum(axis=1)
grandmaRightError = np.abs(grandmaPulses[:,7:15]).sum(axis=1) - np.abs(grandmaApproxPulses[:,7:15]).sum(axis=1)
parkRightError = np.abs(parkPulses[:,7:15]).sum(axis=1) - np.abs(parkApproxPulses[:,7:15]).sum(axis=1)
signRightError = np.abs(signPulses[:,7:15]).sum(axis=1) - np.abs(signApproxPulses[:,7:15]).sum(axis=1)
sintelRightError = np.abs(sintelPulses[:,7:15]).sum(axis=1) - np.abs(sintelApproxPulses[:,7:15]).sum(axis=1)
soccerRightError = np.abs(soccerPulses[:,7:15]).sum(axis=1) - np.abs(soccerApproxPulses[:,7:15]).sum(axis=1)

print("Average Absolute Approximation Error Right (in pulses)")
print("Akiyo: %0.2f" % np.mean(np.abs(akiyoRightError)))
print("Ducks: %0.2f" % np.mean(np.abs(ducksRightError)))
print("Grandma: %0.2f" % np.mean(np.abs(grandmaRightError)))
print("Park: %0.2f" % np.mean(np.abs(parkRightError)))
print("Sign: %0.2f" % np.mean(np.abs(signRightError)))
print("Sintel: %0.2f" % np.mean(np.abs(sintelRightError)))
print("Soccer: %0.2f" % np.mean(np.abs(soccerRightError)))


Average Absolute Approximation Error Right (in pulses)
Akiyo: 0.15
Ducks: 0.16
Grandma: 0.12
Park: 0.18
Sign: 0.11
Sintel: 0.15
Soccer: 0.11

It appears we are much better at predicting $K_{\text{Right}}$ than we are for $K_{\text{Left}}$.

The Glass is not half empty

It turns out that predicting the left half is tricky, because the codewords deviate from our estimates in content-specific ways. However, the right half is much more stable, and since we know K, we can use the complement $K - K_{\text{Right}}$ to estimate $K_{\text{Left}}$.


In [10]:
akiyoLeftErrorC = np.abs(akiyoPulses[:,0:7]).sum(axis=1) - (akiyoKs - np.abs(akiyoApproxPulses[:,7:15]).sum(axis=1))
ducksLeftErrorC = np.abs(ducksPulses[:,0:7]).sum(axis=1) - (ducksKs - np.abs(ducksApproxPulses[:,7:15]).sum(axis=1))
grandmaLeftErrorC = np.abs(grandmaPulses[:,0:7]).sum(axis=1) - (grandmaKs - np.abs(grandmaApproxPulses[:,7:15]).sum(axis=1))
parkLeftErrorC = np.abs(parkPulses[:,0:7]).sum(axis=1) - (parkKs - np.abs(parkApproxPulses[:,7:15]).sum(axis=1))
signLeftErrorC = np.abs(signPulses[:,0:7]).sum(axis=1) - (signKs - np.abs(signApproxPulses[:,7:15]).sum(axis=1))
sintelLeftErrorC = np.abs(sintelPulses[:,0:7]).sum(axis=1) - (sintelKs - np.abs(sintelApproxPulses[:,7:15]).sum(axis=1))
soccerLeftErrorC = np.abs(soccerPulses[:,0:7]).sum(axis=1) - (soccerKs - np.abs(soccerApproxPulses[:,7:15]).sum(axis=1))

print("Average Absolute Approximation Error Left (in pulses)")
print("Akiyo: %0.2f" % np.mean(np.abs(akiyoLeftErrorC)))
print("Ducks: %0.2f" % np.mean(np.abs(ducksLeftErrorC)))
print("Grandma: %0.2f" % np.mean(np.abs(grandmaLeftErrorC)))
print("Park: %0.2f" % np.mean(np.abs(parkLeftErrorC)))
print("Sign: %0.2f" % np.mean(np.abs(signLeftErrorC)))
print("Sintel: %0.2f" % np.mean(np.abs(sintelLeftErrorC)))
print("Soccer: %0.2f" % np.mean(np.abs(soccerLeftErrorC)))


Average Absolute Approximation Error Left (in pulses)
Akiyo: 0.15
Ducks: 0.16
Grandma: 0.12
Park: 0.18
Sign: 0.11
Sintel: 0.15
Soccer: 0.11

Why is the $K_{\text{Left}}$ complement better than $K_{\text{Left}}$?

Knowing that our predictions for the right half are better, it is statistically sound to trust them more. As such, we discard our imprecise direct estimate of $K_{\text{Left}}$ and use the complement instead, as shown in the example below. It is not perfect, but it is more precise: in this extreme example, the error goes from -3 to +2, while in most cases it goes from 1 to 0.
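
In fact, the complement's error is exactly the $K_{\text{Right}}$ error in disguise. Since the true pulses always satisfy $K_{\text{Left}} = K - K_{\text{Right}}$, denoting our estimate by $\hat{K}_{\text{Right}}$ we have

$$K_{\text{Left}} - (K - \hat{K}_{\text{Right}}) = \hat{K}_{\text{Right}} - K_{\text{Right}}\:,$$

which is why the complement error table above is identical to the $K_{\text{Right}}$ error table.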


In [13]:
idx = 5833;
print("Akiyo pulses: %s" % akiyoPulses[idx,:])
KLeft = np.abs(akiyoPulses[idx,0:7]).sum()
KRight = np.abs(akiyoPulses[idx,7:15]).sum()
print("KLeft = %d KRight = %d " % (KLeft, KRight))
print("Akiyo pulses approximation: %s" % akiyoApproxPulses[idx,:])
K = np.abs(akiyoPulses[idx,:]).sum()
KLeftApprox = np.abs(akiyoApproxPulses[idx, 0:7]).sum()
KRightApprox = np.abs(akiyoApproxPulses[idx, 7:15]).sum()
print("KLeftApprox = %d (%d) KRightApprox = %d " % (KLeftApprox, KLeftApprox - KLeft, KRightApprox))
print("KLeftComplement (%d - %d) = %d (%+d)" % (K, KRightApprox, K-KRightApprox, K-KRightApprox-KLeft))


Akiyo pulses: [-29.  -6.  -3.  17.   1.   4.  -1.  -2.   1.  -2.   0.   0.   1.   0.   1.]
KLeft = 61 KRight = 7 
Akiyo pulses approximation: [-29.  -5.  -3.  17.   0.   3.  -1.  -1.   0.  -2.   0.  -0.   1.  -0.   1.]
KLeftApprox = 58 (-3) KRightApprox = 5 
KLeftComplement (68 - 5) = 63 (+2)

$K_{\text{Left}}$ estimate vs $K_{\text{Left}}$ complement estimate

Our complement-based estimates are much more accurate; this is confirmed by looking at the histograms of the error:


In [14]:
leftErrors = [("Akiyo", akiyoLeftError, akiyoLeftErrorC),
              ("Ducks", ducksLeftError, ducksLeftErrorC),
              ("Grandma", grandmaLeftError, grandmaLeftErrorC),
              ("Park", parkLeftError, parkLeftErrorC),
              ("Sign", signLeftError, signLeftErrorC),
              ("Sintel", sintelLeftError, sintelLeftErrorC),
              ("Soccer", soccerLeftError, soccerLeftErrorC)]

plt.figure(figsize=(16,20))
for i, (name, leftError, leftErrorC) in enumerate(leftErrors):
    plt.subplot(4, 2, i + 1)
    plt.hist([leftError, leftErrorC])
    plt.xlabel("Approx Error (in pulses)")
    plt.ylabel("Occurrences")
    plt.title(name)
    plt.legend(["Left Error", "Left Complement Error"])


Questions:

"Given a starting pulse allocation based on L1 energy, what's the range of K values I need to consider on each side to ensure I get the optimal value some (large) percentage of the time?"


In [15]:
print("Percentage of time KLeft estimate +-x is not optimal for x=0, x=1, x=2")

def checkCorrect(absErr):
    numElements = absErr.size
    # Percentage of estimates with |error| > 0, > 1 and > 2 pulses
    return (np.array([(absErr != 0).sum(), (absErr > 1).sum(), (absErr > 2).sum()]) / numElements * 100)

print("Akiyo: %s" % (checkCorrect(np.abs(akiyoLeftErrorC))))
print("Ducks: %s" % (checkCorrect(np.abs(ducksLeftErrorC))))
print("Grandma: %s" % (checkCorrect(np.abs(grandmaLeftErrorC))))
print("Park: %s" % (checkCorrect(np.abs(parkLeftErrorC))))
print("Sign: %s" % (checkCorrect(np.abs(signLeftErrorC))))
print("Sintel: %s" % (checkCorrect(np.abs(sintelLeftErrorC))))
print("Soccer: %s" % (checkCorrect(np.abs(soccerLeftErrorC))))


Percentage of time KLeft estimate +-x is not optimal for x=0, x=1, x=2
Akiyo: [ 13.59014075   1.51525355   0.14815813]
Ducks: [ 12.51030144   2.48080243   0.53526036]
Grandma: [ 10.90367829   1.2361875    0.17659821]
Park: [ 15.67943539   2.04988801   0.28243072]
Sign: [ 10.05801749   1.04133953   0.10082811]
Sintel: [ 13.41829398   1.19625303   0.09560919]
Soccer: [ 9.97834183  1.0293624   0.07752186]

In [336]:
print("Percentage of time KRight estimate +-x is not optimal for x=0, x=1, x=2")

print("Akiyo: %s" % (checkCorrect(np.abs(akiyoRightError))))
print("Ducks: %s" % (checkCorrect(np.abs(ducksRightError))))
print("Grandma: %s" % (checkCorrect(np.abs(grandmaRightError))))
print("Park: %s" % (checkCorrect(np.abs(parkRightError))))
print("Sign: %s" % (checkCorrect(np.abs(signRightError))))
print("Sintel: %s" % (checkCorrect(np.abs(sintelRightError))))
print("Soccer: %s" % (checkCorrect(np.abs(soccerRightError))))


Percentage of time KRight estimate +-x is not optimal for x=0, x=1, x=2
Akiyo: [ 13.59014075   1.51525355   0.14815813]
Ducks: [ 12.51030144   2.48080243   0.53526036]
Grandma: [ 10.90367829   1.2361875    0.17659821]
Park: [ 15.67943539   2.04988801   0.28243072]
Sign: [ 10.05801749   1.04133953   0.10082811]
Sintel: [ 13.41829398   1.19625303   0.09560919]
Soccer: [ 9.97834183  1.0293624   0.07752186]

Conclusion

In this notebook, we looked at the accuracy of estimating the energy of the pulses based on the energy of the coefficients. From our dataset, we concluded that the left half of the codeword is the hardest to estimate, and that by estimating the right half and using the complement we obtain better results.

