Feature extraction

1. Prerequisites:

  • Download a set of cover images and put them in the ad_cover directory, located in the same folder as this notebook;
  • Create an ad_stego directory in the same folder as this notebook.
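
The directory setup can also be checked from MATLAB before running the cells. This is a minimal sketch, assuming the notebook's working directory is the folder that contains ad_cover; the variable n_covers is introduced here only for illustration.

```matlab
% Directory names follow the prerequisites above.
COVER_DIR = 'ad_cover';
STEGO_DIR = 'ad_stego';

% Create the stego directory if it is missing.
if ~exist(STEGO_DIR, 'dir')
    mkdir(STEGO_DIR);
end

% Count the cover images that the later cells will pick up (0 if none are found).
n_covers = numel(dir(fullfile(COVER_DIR, '*.jpg')));
fprintf('%d cover images found in %s\n', n_covers, COVER_DIR);
```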

2. Prepare stego images using the nsF5 steganographic system

Note: the code here is a MATLAB script. MATLAB is widely used for prototyping in steganography and steganalysis because it has a convenient library for working with JPEG images and a rich set of examples.


In [ ]:
% ========================================================================
COVER_DIR = 'ad_cover/';
STEGO_DIR = 'ad_stego/';
% ========================================================================
files = dir(strcat(COVER_DIR, '*.jpg'));
% Take at most the first 1000 pictures from the dataset
pic_num = min(1000, numel(files));
total_payload = 20000;
embed_rate = 0.05;
for k=1:pic_num
    filename = strcat(COVER_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    dct_coeff = jpeg_read(filename);
    stego_coeff = nsf5_simulation_cste_payload_color(dct_coeff , total_payload, embed_rate, cputime);
    jpeg_write(stego_coeff, [STEGO_DIR name '.jpg']);
    disp(name);
end

3. Extract features using intrablock and interblock correlation

This method is explained in detail in the original article. The main steps are:

  1. Extract intrablock features (transition probabilities for vertical, horizontal, diagonal and minor diagonal shifts in the difference matrix). With threshold T = 4, each transition probability matrix (TPM) has $(2T+1)^2 = 81$ entries:
\begin{equation*} N_{feat\_intra} = TPM(V) + TPM(H) + TPM(MD) + TPM(mD) = 4 \cdot 81 = 324,\ T = 4 \end{equation*}
  2. Extract interblock features (transition probabilities for vertical and horizontal shifts in the difference mode averaged matrix):
\begin{equation*} N_{feat\_inter} = TPM(V) + TPM(H) = 2 \cdot 81 = 162,\ T = 4 \end{equation*}
\begin{equation*} N_{feat} = N_{feat\_intra} + N_{feat\_inter} = 486 \end{equation*}
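
To make the notion of a transition probability matrix concrete, here is a minimal sketch for a single direction (horizontal) on a toy array. It is an illustration under stated assumptions, not the chen486 implementation: the array A merely stands in for the absolute values of the quantized DCT coefficients, and the names D, cur, nxt, counts and tpm are introduced here.

```matlab
% Toy input standing in for the array of absolute quantized DCT coefficients.
A = [1 3 2 5 4;
     0 2 2 1 7;
     3 3 4 0 1];
T = 4;

% Horizontal difference array, clipped to the range [-T, T].
D = A(:, 1:end-1) - A(:, 2:end);
D = max(min(D, T), -T);

% Pairs of horizontally adjacent differences: current value and its right neighbour.
cur = D(:, 1:end-1);
nxt = D(:, 2:end);

% Joint counts in a (2T+1)-by-(2T+1) table; adding T+1 maps -T..T to 1..2T+1.
counts = accumarray([cur(:) + T + 1, nxt(:) + T + 1], 1, [2*T + 1, 2*T + 1]);

% Row-normalise to get P(next | current); rows that were never observed stay zero.
row_sums = sum(counts, 2);
tpm = bsxfun(@rdivide, counts, max(row_sums, 1));

% Flatten to a 1-by-81 feature vector for this direction.
features = tpm(:).';
```

Repeating the same procedure for the other directions and concatenating the resulting vectors gives the feature counts listed above.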

In the code below, the function chen486 takes a JPEG filename and returns the feature set as a one-dimensional vector.


In [ ]:
STEGO_DIR = 'ad_stego/';
COVER_DIR = 'ad_cover/';
DATASET = 'images_db_chen.csv';
START_PIC = 1;
END_PIC = 1000;
% #############################
% ########### Prepare dataset #
% #############################
n = 486;
heads = strings([1, n+2]);
heads(1)  = "filename";
for k = 1:n
    heads(k+1) = strcat("feature_", int2str(k));
end
heads(n+2) = "embedded";
% ###################################
% ########### Write down the header #
% ###################################
fid = fopen(DATASET, 'wt');
for i=1:n+1
    fprintf(fid, '%s,', heads(i));
end
fprintf(fid, '%s\n', heads(n+2));
% ########################################
% ########### Scan directory, pick file, #
% ########### extract features           #
% ########### and put to the dataset     #
% ########################################
embedded = 0;
files = dir(strcat(COVER_DIR, '*.jpg'));
for k=START_PIC:END_PIC
    filename = strcat(COVER_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    features = transpose(chen486(filename));
    fprintf(fid, '%s,', [name ext]);
    % Use a separate index so the outer loop variable k is not shadowed
    for j = 1:n
        fprintf(fid, '%.4f,', features(j));
    end
    fprintf(fid, '%d\n', embedded);
    disp(name);
end
embedded = 1;
files = dir(strcat(STEGO_DIR, '*.jpg'));
for k=START_PIC:END_PIC
    filename = strcat(STEGO_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    features = transpose(chen486(filename));
    fprintf(fid, '%s,', [name ext]);
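    % Use a separate index so the outer loop variable k is not shadowed
    for j = 1:n
        fprintf(fid, '%.4f,', features(j));
    end
    fprintf(fid, '%d\n', embedded);
    disp(name);
end
% Close the file so that all rows are flushed to disk
fclose(fid);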
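
The row-writing pattern used above can be written more compactly, because fprintf reuses its format string for every element of an array argument. The following is a minimal sketch with a toy three-element feature vector; the output path and the file name cover_0001.jpg are hypothetical.

```matlab
DATASET = [tempname '.csv'];     % hypothetical output path
features = [0.5 0.25 0.125];     % toy feature vector
embedded = 0;

fid = fopen(DATASET, 'wt');
assert(fid ~= -1, 'Cannot open %s', DATASET);

fprintf(fid, '%s,', 'cover_0001.jpg');   % file name column
fprintf(fid, '%.4f,', features);         % the format is applied to each element in turn
fprintf(fid, '%d\n', embedded);          % label column ends the row

fclose(fid);
row = strtrim(fileread(DATASET));
disp(row);
```

This avoids the inner loop entirely, so there is no loop index to clash with the outer one.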

4. Extract features using intrablock and interblock correlation with Cartesian calibration

Another approach, calibration, was proposed in the article to improve practical steganalysis. The idea of calibration is to estimate the cover-image features from the stego image itself. Calibration starts with the JPEG image J1 under investigation, decompresses it into the spatial domain using the inverse DCT, crops it by four pixels in both directions, and recompresses the cropped image using the quantization matrix of J1. The result is a different JPEG image, J2 (ref. image). In Cartesian calibration, the features of J1 and J2 are concatenated, which doubles the dimensionality from 486 to 972.
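
The decompress, crop and recompress steps can be sketched as follows. This is only an approximation under stated assumptions: a random toy image stands in for J1, and imwrite with a quality factor replaces "the quantization matrix of J1", which matches only when J1 was saved with the standard tables at that quality. The variable names are introduced here for illustration.

```matlab
QF = 75;                                   % assumed quality factor of J1
J1_PATH = [tempname '.jpg'];               % hypothetical paths
J2_PATH = [tempname '.jpg'];

% Toy stand-in for the image under investigation.
imwrite(uint8(randi([0 255], 64, 64)), J1_PATH, 'jpg', 'Quality', QF);

X = imread(J1_PATH);                       % decompress J1 into the spatial domain
X = X(5:end, 5:end, :);                    % crop four pixels from the top and from the left
imwrite(X, J2_PATH, 'jpg', 'Quality', QF); % recompress to obtain the reference image J2

disp(size(imread(J2_PATH)));
```

Cropping by four pixels shifts the 8x8 block grid, so the block structure of J2 no longer coincides with that of J1.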
The code below calls ccchen972 for each cover image and each stego image.


In [ ]:
STEGO_DIR = 'ad_stego/';
COVER_DIR = 'ad_cover/';
DATASET = 'images_db_ccchen.csv';
START_PIC = 1;
END_PIC = 1000;
% #############################
% ########### Prepare dataset #
% #############################
n = 972;
heads = strings([1, n+2]);
heads(1)  = "filename";
for k = 1:n
    heads(k+1) = strcat("feature_", int2str(k));
end
heads(n+2) = "embedded";
% ###################################
% ########### Write down the header #
% ###################################
fid = fopen(DATASET, 'wt');
for i=1:n+1
    fprintf(fid, '%s,', heads(i));
end
fprintf(fid, '%s\n', heads(n+2));
% ########################################
% ########### Scan directory, pick file, #
% ########### extract features           #
% ########### and put to the dataset     #
% ########################################
embedded = 0;
files = dir(strcat(COVER_DIR, '*.jpg'));
for k=START_PIC:END_PIC
    filename = strcat(COVER_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    features = transpose(ccchen972(filename, 75));
    fprintf(fid, '%s,', [name ext]);
    % Use a separate index so the outer loop variable k is not shadowed
    for j = 1:n
        fprintf(fid, '%.4f,', features(j));
    end
    fprintf(fid, '%d\n', embedded);
    disp(name);
end
embedded = 1;
files = dir(strcat(STEGO_DIR, '*.jpg'));
for k=START_PIC:END_PIC
    filename = strcat(STEGO_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    features = transpose(ccchen972(filename, 75));
    fprintf(fid, '%s,', [name ext]);
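    % Use a separate index so the outer loop variable k is not shadowed
    for j = 1:n
        fprintf(fid, '%.4f,', features(j));
    end
    fprintf(fid, '%d\n', embedded);
    disp(name);
end
% Close the file so that all rows are flushed to disk
fclose(fid);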

5. Extract features using Cartesian Calibrated JPEG domain rich model

The state-of-the-art approach to steganalysis proposed in the article is called the rich model. The model consists of several qualitatively different parts. First, along the lines of the CF∗ features, individual DCT modes are modeled separately, and many of these submodels are collected and put together. They are naturally diverse because they capture dependencies among different DCT coefficients. The second part of the proposed JPEG rich model (JRM) is formed as integral statistics from the whole DCT plane. The increased statistical power makes it possible to extend the range of co-occurrence features and therefore cover a different spectrum of dependencies than the mode-specific features from the first part. The features of both parts are further diversified by modeling not only the DCT coefficients themselves, but also their differences calculated in different directions.

Here we use the ccJRM routine to compute the proposed features, 22510 per image.


In [ ]:
STEGO_DIR = 'ad_stego/';
COVER_DIR = 'ad_cover/';
DATASET = 'images_db_ccjrm.csv';
START_PIC = 1;
END_PIC = 1000;
% #############################
% ########### Prepare dataset #
% #############################
n = 22510;
heads = strings([1, n+2]);
heads(1)  = "filename";
for k = 1:n
    heads(k+1) = strcat("feature_", int2str(k));
end
heads(n+2) = "embedded";
% ###################################
% ########### Write down the header #
% ###################################
fid = fopen(DATASET, 'wt');
for i=1:n+1
    fprintf(fid, '%s,', heads(i));
end
fprintf(fid, '%s\n', heads(n+2));
% ########################################
% ########### Scan directory, pick file, #
% ########### extract features           #
% ########### and put to the dataset     #
% ########################################
embedded = 0;
files = dir(strcat(COVER_DIR, '*.jpg'));
for k=START_PIC:END_PIC
    filename = strcat(COVER_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    C = struct2cell([ccJRM(filename, 75)]);
    features = transpose(cat(1, C{:}));
    fprintf(fid, '%s,', [name ext]);
    % Use a separate index so the outer loop variable k is not shadowed
    for j = 1:n
        fprintf(fid, '%.4f,', features(j));
    end
    fprintf(fid, '%d\n', embedded);
    disp(name);
end
embedded = 1;
files = dir(strcat(STEGO_DIR, '*.jpg'));
for k=START_PIC:END_PIC
    filename = strcat(STEGO_DIR, files(k).name);
    [filepath, name, ext] = fileparts(filename);
    C = struct2cell([ccJRM(filename, 75)]);
    features = transpose(cat(1, C{:}));
    fprintf(fid, '%s,', [name ext]);
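    % Use a separate index so the outer loop variable k is not shadowed
    for j = 1:n
        fprintf(fid, '%.4f,', features(j));
    end
    fprintf(fid, '%d\n', embedded);
    disp(name);
end
% Close the file so that all rows are flushed to disk
fclose(fid);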
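
The cell above flattens the structure returned by ccJRM with struct2cell and cat(1, ...), which only works when every field has the same number of columns. The sketch below shows a flattening that does not depend on the orientation of the fields. The toy struct and its field names are made up for illustration and do not reflect the actual ccJRM output.

```matlab
% Toy struct standing in for a feature structure with one field per submodel.
F = struct('submodel_a', [0.1; 0.2; 0.3], 'submodel_b', [0.4 0.5]);

C = struct2cell(F);                                  % one cell per field, in field order
C = cellfun(@(v) v(:), C, 'UniformOutput', false);   % force every block into a column
features = vertcat(C{:}).';                          % concatenate into a 1-by-N row vector

disp(numel(features));
```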