2020-11-22

Digital Image Processing: Histograms and Frequency Domain Filtering

Starry, starry night
Paint your palette blue and grey
Look out on a summer’s day
With eyes that know the darkness in my soul
Shadows on the hills
Sketch the trees and the daffodils
Catch the breeze and the winter chills
In colors on the snowy linen land

Click here to download the code and sample images

1. Code Structure and Usage Instructions

1.1 Code Structure

The program written for this experiment is a C++ command-line executable for the Linux environment, rather than the long-outdated Windows MFC program mentioned in the course slides.

  • ps.cpp: Main program code (our own PhotoShop)
  • bmp.cpp and bmp.hpp: Encapsulate BMP image processing operations
  • color.cpp and color.hpp: Encapsulate color space operations
  • fft.cpp and fft.hpp: Fast Fourier Transform (FFT) and frequency domain filtering operations for images
  • utils.cpp and utils.hpp: General utility functions

1.2 Compilation

Compile with the provided Makefile, which produces the ps binary. The paths of the input and output files, along with the processing parameters, are passed on the command line.

Since this experiment uses the fftw3 Fast Fourier Transform library, ensure the library is installed before compilation:

sudo apt install libfftw3-dev

When compiling, link with the following flags:

-lfftw3 -lm

This has already been implemented in the Makefile.
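The Makefile itself ships with the download; as a sketch of what it must do (source file names from Section 1.1, linker flags as above), a minimal equivalent might look like the following. Note that recipe lines must be indented with tabs:

```makefile
CXX      = g++
CXXFLAGS = -O2 -Wall
LDLIBS   = -lfftw3 -lm
OBJS     = ps.o bmp.o color.o fft.o utils.o

ps: $(OBJS)
	$(CXX) $(CXXFLAGS) -o $@ $(OBJS) $(LDLIBS)

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $<

clean:
	rm -f ps $(OBJS)
```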

1.3 Usage Instructions

This program reads BMP files by specifying paths via the command line, and can perform corresponding image processing with specified parameters. Parameters can be stacked, in which case the processing operations will be executed sequentially. The usage instructions are as follows:

Usage:          ./ps input_img [-d|-l|-h|-e|-n|-b|-o]
Parameters:
-d              Show DFT Transform spectrum
-l filter_size  Gaussian low pass filter
-h filter_size  Gaussian high pass filter
-e              Equalization histogram of input image
-n ref_img      Normalization histogram of input image with reference image
-b threshold    Binarize image into 2-color with given threshold
-o output_img   Output image

For example, to perform low-pass filtering on the image test.bmp, then equalize it, and output to test_out.bmp, use:

./ps test.bmp -l 20 -e -o test_out.bmp

2. BMP Image Format and Reading/Writing

The BMP class is defined in bmp.hpp:

// BMP Read & Write & Manipulate Class
class BMP
{
private:
    Header header;  // BMP Header
    Color *px;      // pixels
    Color *colors;  // color palette
    int colors_num; // size of color palette
    int padding;    // padding bytes of each line
    int *his;       // V channel histogram

    Color get_color(int index);
    int find_color(Color &px);
    void read1(FILE *fp);
    void read4(FILE *fp);
    void read8(FILE *fp);
    void read24(FILE *fp);
    void write1(FILE *fp);
    void write4(FILE *fp);
    void write8(FILE *fp);
    void write24(FILE *fp);
    void to1();
    void to8();
    void to24();
    void hsv();
    void rgb();

public:
    BMP(const char *filename);
    ~BMP();
    void info();
    void dft();
    void low_pass_filter(double filter_size);
    void high_pass_filter(double filter_size);
    void equalize();
    void normalize(int *his);
    void binarize(int threshold);
    int *histogram();
    void write(const char *filename);
};

Subsequent BMP file processing is carried out around this class, where:

  • header: Stores header information
  • px: Stores pixel data
  • colors: Stores color palette information
  • colors_num: Number of color palette entries
  • padding: Number of padding bytes for 4-byte alignment per line (mentioned later)
  • his: Stores histogram information

Subsequent functions and variables will be explained or reflected in comments, and will not be repeated here.

2.1 BMP File Header Format

The Header structure is defined in bmp.hpp, with 1-byte alignment. The cstdint library is used to ensure consistent integer bit widths:

#pragma once
#pragma pack(push, 1)
// BMP File Header
struct Header
{
    std::uint16_t bmp;              // Magic number for file
    std::uint32_t file_size;        // Size of file
    std::uint16_t reserved_1;       // Reserved
    std::uint16_t reserved_2;       // Reserved
    std::uint32_t offset;           // Offset to bitmap data
    std::uint32_t header_size;      // Size of info header
    std::uint32_t width;            // Width of image
    std::uint32_t height;           // Height of image
    std::uint16_t planes;           // Number of color planes
    std::uint16_t depth;            // Number of bits per pixel
    std::uint32_t compression;      // Type of compression to use
    std::uint32_t image_size;       // Size of image data
    std::uint32_t ppm_x;            // X pixels per meter
    std::uint32_t ppm_y;            // Y pixels per meter
    std::uint32_t colors;           // Number of colors used
    std::uint32_t colors_important; // Number of important colors
};
#pragma pack(pop)

Key points to note are:

  • offset: Offset from the start of the file to the bitmap data area
  • width: Image width
  • height: Image height
  • depth: Bit depth (number of bits per pixel), which determines the type of image: 1 for 2-color images, 4 for 16-color images, 8 for 256-color images, and 24 for true-color images
  • colors: Number of colors in the palette. In most cases, this value is set to 0, and the actual number of palette entries is determined by the bit depth:
    • A default of 2 palette entries for a bit depth of 1
    • A default of 16 palette entries for a bit depth of 4
    • A default of 256 palette entries for a bit depth of 8
    • No palette for a bit depth of 24

2.2 BMP Palette Area Format

Each palette entry consists of 4 bytes, in the order of R, G, B, and Reserved. Adjacent palette entries are contiguous. The Color structure can be defined in color.hpp:

// Color Palette
struct Color
{
    bool type = false; // false for HSV, true for RGB
    std::uint8_t r;    // Red
    std::uint8_t g;    // Green
    std::uint8_t b;    // Blue
    std::uint8_t a;    // Reserved
    void hsv();
    void rgb();
};

Considering subsequent color space conversion, two conversion functions and a type flag are provided here, so an array of Color structures can represent the palette. In addition, two functions, get_color and find_color, map between indices in the palette array and the corresponding Color structures for subsequent file reading and writing:

// Get a color from color table
// index: color index
Color BMP::get_color(int index)
{
    Color color;
    color.r = colors[index].r;
    color.g = colors[index].g;
    color.b = colors[index].b;
    color.a = colors[index].a;
    return color;
}

// Find a color index in color table
// color: color, which index is to be found
int BMP::find_color(Color &color)
{
    for (int i = 0; i < colors_num; i++)
    {
        if (color.r == colors[i].r && color.g == colors[i].g &&
            color.b == colors[i].b && color.a == colors[i].a)
            return i;
    }
    return -1;
}

2.3 BMP Data Area Format

The data area is stored from left to right within each row and from bottom to top across rows. Pixels within a row are contiguous, but zero-valued padding bytes are appended to each row so that its length is a multiple of 4 bytes (4-byte alignment). Direct reading/writing with plain nested loops is therefore not feasible; the padding must be calculated first:

padding = (4 - (((header.width * header.depth + 7) / 8) % 4)) % 4;

Since the minimum unit for each read/write operation is 1 byte, corresponding calculations must be performed for bit depths less than 8:

  • For 2-color images (1-bit depth): the 8 bits of each byte represent 8 pixels, with the most significant bit corresponding to the leftmost pixel. The value (0 or 1) of each bit is its color index in the palette.
  • For 16-color images (4-bit depth): the 8 bits of each byte are divided into two groups of 4 bits, representing two pixels. Per the BMP format, the higher 4 bits represent the first (leftmost) pixel and the lower 4 bits the second. The value of each 4-bit group is again a color index in the palette.
  • For 256-color images (8-bit depth): 1 byte represents 1 pixel, and the value here also indicates the color index in the palette, not the grayscale value from 0 to 255.
  • For true-color images (24-bit depth): 3 bytes represent 1 pixel, arranged in B, G, R channel order. The content of 1 byte for each channel represents the pixel value (0 to 255) of that channel (true-color images have no palette).

2.4 Reading and Writing BMP Images

Writing functions are similar to reading functions, so reading is used as the example here. In the BMP class, the constructor acts as the read function; it takes the filename as a C string:

// Constructor
// filename: BMP image file
BMP::BMP(const char *filename)
{
    uint8_t byte;
    FILE *fp;

    // File header
    if ((fp = fopen(filename, "rb")) == NULL)
        throw_ex("fopen");
    if (fread(&header, sizeof(Header), 1, fp) != 1) // fread returns the item count, which is never negative
        throw_ex("fread");

    // Color palette
    colors_num = (header.colors == 0 && header.depth != 24) ? (1 << header.depth) : header.colors;
    colors = new Color[colors_num];
    for (int i = 0; i < colors_num; i++)
    {
        fread(&byte, 1, 1, fp);
        colors[i].r = byte;
        fread(&byte, 1, 1, fp);
        colors[i].g = byte;
        fread(&byte, 1, 1, fp);
        colors[i].b = byte;
        fread(&byte, 1, 1, fp);
        colors[i].a = byte;
    }

    // Pixels
    padding = (4 - (((header.width * header.depth + 7) / 8) % 4)) % 4;
    px = new Color[header.width * header.height];
    switch (header.depth)
    {
    case 1:
        read1(fp);
        break;
    case 4:
        read4(fp);
        break;
    case 8:
        read8(fp);
        break;
    case 24:
        read24(fp);
        break;
    }
    fclose(fp);
    his = new int[256];
}

The file header, color palette, and pixels are read in sequence. After reading the file header, the number of palette entries (color_num) and the number of alignment padding bytes per row of pixels (padding) are calculated using information from the file header. Then, different reading functions are called based on different bit depth scenarios. read4 is used as an example:

// Read 4-bit BMP file
// fp: file pointer
void BMP::read4(FILE *fp)
{
    uint8_t byte;
    fseek(fp, header.offset, SEEK_SET);
    for (int i = header.height - 1; i >= 0; i--)
    {
        for (int j = 0; j < header.width; j += 2)
        {
            if (fread(&byte, 1, 1, fp) != 1) // fread returns the item count, which is never negative
                throw_ex("fread");
            px[header.width * i + j] = get_color(byte >> 4); // high nibble = leftmost pixel
            if (j + 1 < header.width)
                px[header.width * i + j + 1] = get_color(byte & 0x0F);
        }
        fseek(fp, padding, SEEK_CUR);
    }
}

First, fseek is used to jump to the data area. Since the BMP format is stored from bottom to top, but in digital image processing, the x-axis is generally from top to bottom, rows are read in reverse order. When reading pixels, since it is a 4-bit image (16-color), one byte contains two pixels, so these two pixels are split using bitwise operations and written to the px array in sequence. Additionally, the padding area is skipped after reading each row.

Writing operations are similar. For 1-bit (2-color) BMP images, the write operation is as follows:

// Write 1-bit BMP file
// fp: file pointer
void BMP::write1(FILE *fp)
{
    uint8_t byte;
    fseek(fp, header.offset, SEEK_SET);
    for (int i = header.height - 1; i >= 0; i--)
    {
        for (int j = 0; j < header.width; j += 8)
        {
            byte = 0;
            for (int k = 0; k < 8 && j + k < header.width; k++)
                byte += find_color(px[i * header.width + j + k]) << (7 - k);
            fwrite(&byte, 1, 1, fp);
        }
        for (int k = 0; k < padding; k++) // "\0" holds only 2 bytes, so write padding one byte at a time
            fwrite("\0", 1, 1, fp);
    }
}

For the convenience of subsequent operations, images can be uniformly converted to 24-bit color storage, so the to24 function is written, which simply adjusts the image size, offset, number of palette entries, and bit depth information in the header structure:

// Convert current BMP image into 24-bit
void BMP::to24()
{
    header.offset = 54;
    header.depth = 24;
    padding = (4 - (((header.width * header.depth + 7) / 8) % 4)) % 4;
    header.image_size = (header.width * 3 + padding) * header.height;
    header.file_size = header.offset + header.image_size;
    colors_num = 0;
}

Meanwhile, for the convenience of binarization and grayscale processing, the to1 function (converts the image to 1-bit storage) and to8 function (converts to 256-color grayscale image storage) are also written:

// Convert current BMP image into 1-bit
void BMP::to1()
{
    header.offset = 62;
    header.depth = 1;
    header.colors = 0;
    padding = (4 - (((header.width * header.depth + 7) / 8) % 4)) % 4;
    header.image_size = ((header.width + 7) / 8 + padding) * header.height;
    header.file_size = header.offset + header.image_size;

    delete[] colors;
    colors_num = 2;
    colors = new Color[colors_num];
    colors[0].r = 0;
    colors[0].g = 0;
    colors[0].b = 0;
    colors[0].a = 0;
    colors[1].r = 255;
    colors[1].g = 255;
    colors[1].b = 255;
    colors[1].a = 0;
}

// Convert current BMP image into 8-bit
void BMP::to8()
{
    header.offset = 1078;
    header.depth = 8;
    header.colors = 0;
    padding = (4 - (((header.width * header.depth + 7) / 8) % 4)) % 4;
    header.image_size = (header.width + padding) * header.height;
    header.file_size = header.offset + header.image_size;

    delete[] colors;
    colors_num = 256;
    colors = new Color[colors_num];
    for (int i = 0; i < colors_num; i++)
    {
        colors[i].r = i;
        colors[i].g = i;
        colors[i].b = i;
        colors[i].a = 0;
    }
}

3. Histogram Equalization and Normalization

Since all pixel data is stored in the px array, subsequent image processing operations can be carried out around the px array, as well as the width (header.width) and height (header.height) in the header information.

Equalization and normalization here only consider cases where grayscale values range from 0 to 255.

3.1 Obtaining Histogram Array

A histogram is the statistical distribution of grayscale values, with the horizontal axis representing grayscale values and the vertical axis representing the number of pixels with that grayscale value. Therefore, the histogram function is written to obtain the histogram array:

// Calculate image histogram
int *BMP::histogram()
{
    hsv();
    for (int i = 0; i < 256; i++)
        his[i] = 0;
    for (int i = 0; i < header.width * header.height; i++)
        his[px[i].b]++;
    rgb();
    return his;
}

The result is stored in the member array his, which is also returned.

3.2 Grayscale Image Equalization

According to the formula in the “Digital Image Processing” textbook, let map[i] be the grayscale value that a pixel with the original grayscale value i should be converted to after equalization. Then:

map[i]=\frac{255}{width\times height}\sum_{k=0}^{i}histogram[k]

That is, map[i] is the cumulative distribution of the histogram up to gray level i, scaled to the 0–255 range. Then, replace each pixel's gray value i with map[i] to complete equalization. Partial code can be written as follows (taking the B channel as an example):

// Histogram equalization
void BMP::equalize()
{
    ...
    int *map = new int[256];
    int sum;
    double size = header.width * header.height;
    // Value Channel
    sum = 0;
    for (int i = 0; i < 256; i++)
    {
        sum += his[i];
        map[i] = (int)round(255 * sum / size);
    }
    for (int i = 0; i < size; i++)
        px[i].b = map[px[i].b];
    for (int i = 0; i < colors_num; i++)
        colors[i].b = map[colors[i].b];
    ...
    histogram();
    delete[] map;
}

Note that the colors in the palette must also be equalized to ensure correct correspondence during saving. The effect is shown in the upper right corner of the figure below.

3.3 Color Image Equalization

However, the above only applies to grayscale images where the RGB channel values are the same. For color images, the first thought is to process the three channels separately and perform equalization on each. However, this leads to color inconsistency issues, as shown in the lower right corner:

Because the purpose of equalization is to balance brightness, and none of the RGB channels can independently reflect brightness, color space conversion is required first. Here, conversion from RGB to HSV is chosen, where the V channel represents brightness.

Two mutual conversion functions for these channels are written in color.cpp:

// RGB to HSV in-place
void Color::hsv()
{
    if (type)
        return;
    double R, G, B, H, S, V;
    R = r / 255.0;
    G = g / 255.0;
    B = b / 255.0;
    double X_max = max(R, max(G, B));
    double X_min = min(R, min(G, B));
    double C = X_max - X_min;
    V = X_max;
    if (C == 0)
        H = 0;
    else if (V == R)
        H = 60 * (0 + (G - B) / C);
    else if (V == G)
        H = 60 * (2 + (B - R) / C);
    else
        H = 60 * (4 + (R - G) / C);
    if (H < 0)
        H += 360; // keep hue in [0, 360) so the cast below cannot wrap
    if (V == 0)
        S = 0;
    else
        S = C / V;
    r = (uint8_t)round(H / 2);
    g = (uint8_t)round(S * 255);
    b = (uint8_t)round(V * 255); // H/S/V all in 0~255
    type = true;
}

// HSV to RGB in-place
void Color::rgb()
{
    if (!type)
        return;
    double R, G, B, H, S, V;
    H = (r * 2) / 60.0;
    S = g / 255.0;
    V = b / 255.0;
    double C = V * S;
    double X = C * (1 - fabs(fmod(H, 2) - 1));
    double M = V - C;
    if (0 <= H && H <= 1)
        R = C, G = X, B = 0;
    else if (1 < H && H <= 2)
        R = X, G = C, B = 0;
    else if (2 < H && H <= 3)
        R = 0, G = C, B = X;
    else if (3 < H && H <= 4)
        R = 0, G = X, B = C;
    else if (4 < H && H <= 5)
        R = X, G = 0, B = C;
    else if (5 < H && H <= 6)
        R = C, G = 0, B = X;
    else
        R = 0, G = 0, B = 0;
    r = (uint8_t)round((R + M) * 255);
    g = (uint8_t)round((G + M) * 255);
    b = (uint8_t)round((B + M) * 255);
    type = false;
}

An in-place mode is adopted, where the original r, g, and b fields hold H, S, and V respectively. Since in the HSV color space H typically ranges from 0 to 360 while S and V range from 0 to 1, the values are rescaled here to fit uniformly in 0 to 255 (H is halved so it fits in a byte).

With channel conversion, for color image equalization, it is only necessary to convert to the HSV space before calculation, perform equalization on the V channel, and then convert back to the RGB space after calculation. The effect is shown in the figure below:

3.4 Histogram Matching

Similarly, the V channel is used for explanation here (the V channel of a grayscale image corresponds to its grayscale value). According to the textbook formula, let map[i] be the grayscale value that a pixel with the original grayscale value i should be converted to after equalization; histogram_2 is the target histogram to be approximated. Then, after first equalizing the image:

g[i]=\frac{255}{width\times height}\sum_{k=0}^{i}histogram_2[k]

map[i]=g^{-1}[i]

Since g[i] is non-decreasing, the value of g^{-1}[i] can be obtained through a simple binary search; the lower_bound function from the C++ <algorithm> header can be used to implement this. The program is written as follows:

// Histogram normalization
// his: new distribution of histogram
void BMP::normalize(int *his)
{
    equalize();
    hsv();
    int *g = new int[256];
    int *map = new int[256];
    int sum;
    double size = header.width * header.height; // assumes the reference histogram covers the same pixel count
    // We process only the value channel
    sum = 0;
    for (int i = 0; i < 256; i++)
    {
        sum += his[i];
        g[i] = (int)round(255 * sum / size);
    }
    for (int i = 0; i < 256; i++)
        map[i] = lower_bound(g, g + 256, i) - g;
    for (int i = 0; i < size; i++)
        px[i].b = map[px[i].b];
    rgb();
    histogram();
    delete[] g;
    delete[] map;
}

Using the histogram distribution of a 256-color grayscale image to normalize the histogram of a color image, the effect is shown in star24_out.bmp on the right:

3.5 Image Binarization

We can binarize the image based on the histogram distribution. In fact, it is only necessary to set each pixel to black or white depending on whether its V channel exceeds a given threshold. To save space, the binarized image is then converted to 2-color storage using the previously written to1 function:

// Binarize an image into black and white
// threshold: Value channel threshold
void BMP::binarize(int threshold)
{
    hsv();
    for (int i = 0; i < header.height; i++)
        for (int j = 0; j < header.width; j++)
            if (px[i * header.width + j].b > threshold)
            {
                px[i * header.width + j].b = 255;
                px[i * header.width + j].g = 255;
                px[i * header.width + j].r = 255;
            }
            else
            {
                px[i * header.width + j].b = 0;
                px[i * header.width + j].g = 0;
                px[i * header.width + j].r = 0;
            }
    to1();
}

Binarization with a threshold of 100 is performed on both color and 16-color images, with the effect shown in the figure below:

4. FFT and Frequency Domain Filtering

The FFT algorithm is implemented here using the fftw3 library, which can be installed on Linux with the following command:

sudo apt install libfftw3-dev

4.1 Image Conventions

For image coordinates, the same convention as in the textbook is used here: the origin is at the upper left corner, the x-axis represents height (denoted by i in the program), and the y-axis represents width (denoted by j in the program).

For spectrum display: For grayscale images, DFT can be directly performed using grayscale values, but for color images, conversion to the HSV space is also performed here, and the V channel is used to calculate the DFT.

However, for frequency domain filtering: For color images, DFT, filtering, and IDFT calculations are performed separately for each of the three channels.

4.2 2D DFT Spectrum Display

4.2.1 Code Framework

According to the FFTW documentation, the code framework for the usage steps is as follows:

#include <fftw3.h>
...
{
    fftw_complex *in, *out;
    fftw_plan p;
    ...
    in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    ...
    fftw_execute(p); /* repeat as needed */
    ...
    fftw_destroy_plan(p);
    fftw_free(in); fftw_free(out);
}

  • First, create storage space for input and output complex arrays using fftw_malloc.
  • Then, create a computation plan using fftw_plan* series functions.
  • Next, execute the computation using fftw_execute.
  • Finally, release the plan and the space for the complex arrays using destroy_plan and fftw_free.

fftw_complex represents a complex number; in fact, each fftw_complex is a double [2], where index 0 represents the real part and index 1 represents the imaginary part.

4.2.2 Format Conversion

To implement spectrum display, the px array first needs to be converted to an fftw_complex array. The real part can be simply assigned here, but this results in the spectrum origin being at the upper left corner (same as the image). To center the spectrum origin in the image, the conclusion from the textbook can be used:

If the Fourier transform of an image f(x,\,y) is F(u,\,v), then:

f(x,\,y)(-1)^{x+y}\Leftrightarrow F\left(u-\frac{height}{2},v-\frac{width}{2}\right)

Therefore, (-1)^{i+j} can be multiplied when assigning the real part:

in = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * width * height);
out = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * width * height);
if (in == NULL || out == NULL)
    throw_ex("fftw_malloc");

for (int i = 0; i < height; i++)
    for (int j = 0; j < width; j++)
    {
        in[i * width + j][0] = px[i * width + j].b * ((i + j) % 2 ? -1 : 1); // Make sure (0, 0) is at the center of the spectrum
        in[i * width + j][1] = 0;
    }

4.2.3 Calculating DFT

According to the documentation, a fftw_plan_dft_2d plan needs to be created here:

p = fftw_plan_dft_2d(height, width, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(p);

After calculation, the results are stored in out.

4.2.4 Spectrum Visualization

Since the calculation results are complex numbers, the modulus of the spectrum is used to determine the grayscale value (consistent with the textbook). Additionally, as mentioned on page 268 of the textbook (page 155 of the Chinese version), the spectral intensity varies enormously across frequencies, so taking the logarithm compresses the dynamic range for display.

Finally, since the actual grayscale values range from 0 to 255, the grayscale values need to be uniformly scaled. Let the spectrum image be f'(x,\,y), and the previous Fourier transform result be F(u,\,v). The final formula is as follows:

f'(x,\,y) = \mathrm{round}\left(\frac{\log( |F(x,\,y)|+1)}{\displaystyle{\max_{0\leq u<height,\,0\leq v< width}\log(|F(u,\,v)|+1)}}\times255\right)

The code can be written as follows:

double f_max = 0;
for (int i = 0; i < height; i++)
    for (int j = 0; j < width; j++)
    {
        double re = out[i * width + j][0];
        double im = out[i * width + j][1];
        f_max = max(f_max, log(sqrt(re * re + im * im) + 1)); // use log to enhance low-power frequencies
    }
for (int i = 0; i < height; i++)
    for (int j = 0; j < width; j++)
    {
        double re = out[i * width + j][0];
        double im = out[i * width + j][1];
        uint8_t value = (uint8_t)round(log(sqrt(re * re + im * im) + 1) / f_max * 255); // use log to enhance low-power frequencies
        px[i * width + j].b = value;
        px[i * width + j].g = value;
        px[i * width + j].r = value;
    }

4.2.5 Function Prototype and Testing

The above calculation steps are written as the fft_dft function with the following prototype:

// Calculate DFT of an image, and replace its px in-place
// width:  image width
// height: image height
// px:     image pixel data, and after this, its data will be replaced by its Fourier spectrum in grayscale
void fft_dft(int width, int height, Color *px)

This function is then called in bmp.cpp. Since the spectrum is a grayscale image, the image is converted to an 8-bit 256-color image to reduce storage space:

// Convert image to Fourier spectrum of its V channel
void BMP::dft()
{
    hsv();
    fft_dft(header.width, header.height, px);
    to8();
}

Testing on images similar to those in the textbook yields the following results, with the spectra of the two images shown on the right:

Testing on Lena’s grayscale and color images yields the following results:

It can be seen that there is no significant difference between the spectra of the grayscale and color images, because the spectrum of the color image is calculated using the V channel.

4.3 Frequency Domain Filtering

4.3.1 Filter Design

Gaussian low-pass and high-pass filters from the textbook are used here. The filter can be expressed as H(u,\,v), and multiplying H(u,\,v) by the frequency-domain F(u,\,v) yields the new frequency domain. Let size be the filter size, then:

The Gaussian low-pass filter is:

H(u,\,v)=\exp\left(-\frac{(u-u_0)^2+(v-v_0)^2}{2\times size^2}\right)

The Gaussian high-pass filter is:

H(u,\,v)=1-\exp\left(-\frac{(u-u_0)^2+(v-v_0)^2}{2\times size^2}\right)

where:

u_0=\frac{height}{2}\quad v_0=\frac{width}{2}

For a low-pass filter, a larger size means more frequencies are allowed to pass, resulting in less impact on the original image; for a high-pass filter, a larger size means more frequencies are removed, resulting in a greater impact on the original image. These two filters can be written as the following functions:

// Gaussian low-pass filter
// width:       image width
// height:      image height
// out:         frequency domain data
// filter_size: size of filter in frequency domain
void gauss_lpf(int width, int height, fftw_complex *out, double filter_size)
{
    double x_center = height / 2.0;
    double y_center = width / 2.0;

    for (int i = 0; i < height; i++)
        for (int j = 0; j < width; j++)
        {
            double d2 = (i - x_center) * (i - x_center) + (j - y_center) * (j - y_center);
            double h = exp(-d2 / (2 * filter_size * filter_size));
            out[i * width + j][0] *= h;
            out[i * width + j][1] *= h;
        }
}

// Gaussian high-pass filter
// width:       image width
// height:      image height
// out:         frequency domain data
// filter_size: size of filter in frequency domain
void gauss_hpf(int width, int height, fftw_complex *out, double filter_size)
{
    double x_center = height / 2.0;
    double y_center = width / 2.0;

    for (int i = 0; i < height; i++)
        for (int j = 0; j < width; j++)
        {
            double d2 = (i - x_center) * (i - x_center) + (j - y_center) * (j - y_center);
            double h = 1 - exp(-d2 / (2 * filter_size * filter_size));
            out[i * width + j][0] *= h;
            out[i * width + j][1] *= h;
        }
}

4.3.2 IDFT Inverse Transform

After obtaining the new out complex array through the filter, inverse transform needs to be performed on it. According to the textbook formula, IDFT can be calculated using FFT:

If:

F(u,\,v)=\mathrm{DFT}[f(x,\,y)]

then:

\overline{f(x,\,y)}=\frac{1}{width\times height}\mathrm{DFT}[\overline{F(u,\,v)}]

Only the complex conjugate needs to be taken and the result divided by the image size; the outer conjugate can be dropped because only the real part is used afterward. Additionally, since the image was multiplied by (-1)^{x+y} before DFT, it needs to be multiplied by (-1)^{x+y} again here to restore the original image.

Furthermore, only the real part of the inverse transform result, i.e., \mathrm{Re}\,f(x,\,y), needs to be taken. The code can be written as follows (the snippet runs once per color channel; c is the channel index, with 0 for B, 1 for G, and 2 for R):

for (int i = 0; i < height; i++)
    for (int j = 0; j < width; j++)
        out[i * width + j][1] *= -1;
p = fftw_plan_dft_2d(height, width, out, out, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(p);
for (int i = 0; i < height; i++)
    for (int j = 0; j < width; j++)
    {
        double re = out[i * width + j][0] / (width * height) * ((i + j) % 2 ? -1 : 1);
        if (re > 255)
            re = 255;
        if (re < 0)
            re = 0;
        uint8_t value = (uint8_t)round(re);
        if (c == 0)
            px[i * width + j].b = value;
        else if (c == 1)
            px[i * width + j].g = value;
        else
            px[i * width + j].r = value;
    }

4.3.3 Function Prototype and Testing

The above steps are written as the fft_dft_idft function with the following prototype:

// Calculate DFT of an image, perform some processing in the frequency domain, and then IDFT
// width:       image width
// height:      image height
// px:          image pixel data, and after this, its data will be replaced by the processed image
// proc:        processing function for the frequency domain
// filter_size: size of filter in frequency domain
void fft_dft_idft(int width, int height, Color *px, void (*proc)(int, int, fftw_complex *, double), double filter_size)

where proc is one of the two filter functions.

This function is called in bmp.cpp. Since filtering 2-color and 16-color images generates new colors, they need to be converted to 256-color; color images remain unchanged:

// Gaussian low-pass filter
// filter_size: image is blurrier if smaller
void BMP::low_pass_filter(double filter_size)
{
    if (header.depth <= 8)
        to8();
    fft_dft_idft(header.width, header.height, px, gauss_lpf, filter_size);
}

// Gaussian high-pass filter
// filter_size: image is sharper if larger
void BMP::high_pass_filter(double filter_size)
{
    if (header.depth <= 8)
        to8();
    fft_dft_idft(header.width, header.height, px, gauss_hpf, filter_size);
}

Low-pass filtering is performed on Lena’s color image with filter sizes of 2, 4, 8, 16, and 32:

It can be found that a smaller filter size means fewer high-frequency components are allowed to pass, resulting in a smoother image. High-pass filtering is then performed with the same sizes (2, 4, 8, 16, 32):

It can be seen that a larger filter size means more low-frequency components are removed, resulting in more prominent edges. Additionally, as low-frequency components close to DC are removed, the image tends to be black in smooth areas and has other colors only at edges. To increase contrast, histogram equalization is performed on these five images:

This further shows that a larger filter size results in more obvious extraction of edge features.

The following are tests on the test images: low-pass filtering is applied to the first row, and high-pass filtering to the second row, with a filter size of 10 for both:

Significant changes in the spectra can also be observed.

For the second row, since the range is scaled when displaying the spectrum, the color at the center (where frequency intensity is originally highest) remains the same, but it can be seen that the originally dark areas around become lighter.

The algorithm is also effective for non-square patterns. Below is the result of low-pass filtering with a size of 10:


Click here to download the code and sample images