Designing a Lightweight Image Data Structure

Background

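In day-to-day image-processing code you inevitably need a data structure for images. C++ has no image type in its standard library, and since an image is essentially just an array, developers often roll their own, usually as a simple struct wrapper: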

struct Image
{
    void* data;
    std::size_t width;
    std::size_t height;
};

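This kind of implementation is simple and clear to use, but it has drawbacks: memory has to be managed by hand, and there is no way to set an ROI (region of interest). Memory management is the bigger problem. With a struct that merely wraps raw data, you have to watch for leaks every time an image is passed as an argument or copied. Mature third-party libraries such as OpenCV handle this well, but they tend to be large and heavyweight. So when one of your own projects involves image processing, it is worth designing an image data structure that is both lightweight and pleasant to use.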
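To make the memory-management pitfall concrete, here is a minimal sketch. `RawImage`, `MakeRawImage`, `FreeRawImage`, and `SharesBuffer` are hypothetical names introduced only for illustration; `RawImage` mirrors the plain struct above and assumes a single-channel 8-bit buffer.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the plain struct shown above.
struct RawImage {
    void* data;
    std::size_t width;
    std::size_t height;
};

// Allocates one byte per pixel; the caller must call FreeRawImage exactly once.
inline RawImage MakeRawImage(std::size_t width, std::size_t height) {
    return RawImage{std::malloc(width * height), width, height};
}

// Frees the buffer and clears the pointer of this one object only.
inline void FreeRawImage(RawImage& img) {
    std::free(img.data);
    img.data = nullptr;
}

// Copying the struct copies only the pointer, so both objects alias one buffer.
inline bool SharesBuffer(const RawImage& a, const RawImage& b) {
    return a.data == b.data;
}
```

After `RawImage b = a;` both objects point at the same buffer. Freeing through `a` leaves `b.data` dangling, freeing through both is a double free, and freeing through neither is a leak. Nothing in the struct records who is responsible.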

The ChatGPT implementation

AI-assisted programming is in fashion, so let's first see what ChatGPT comes up with:

Below is a simple C++ image class example that supports shallow and deep copies, multi-channel image data, and setting an ROI (region of interest):

#include <iostream>
#include <cstring>

class Image {
private:
    int width;
    int height;
    int channels;
    unsigned char *data;

public:
    Image(int w, int h, int ch) : width(w), height(h), channels(ch) {
        data = new unsigned char[width * height * channels];
    }

    // Copy constructor for deep copy
    Image(const Image& other) : width(other.width), height(other.height), channels(other.channels) {
        data = new unsigned char[width * height * channels];
        std::memcpy(data, other.data, width * height * channels);
    }

    // Assignment operator for deep copy
    Image& operator=(const Image& other) {
        if (this != &other) {
            width = other.width;
            height = other.height;
            channels = other.channels;
            delete[] data;
            data = new unsigned char[width * height * channels];
            std::memcpy(data, other.data, width * height * channels);
        }
        return *this;
    }

    // Destructor
    ~Image() {
        delete[] data;
    }

    // Function to set ROI
    void setROI(int x, int y, int w, int h) {
        // Implement your logic to set ROI
        // For simplicity, let's assume setting ROI means updating width, height, and data pointers
        width = w;
        height = h;
        data += (y * width + x) * channels;
    }

    // Function to print image details
    void printDetails() const {
        std::cout << "Width: " << width << ", Height: " << height << ", Channels: " << channels << std::endl;
    }
};

int main() {
    // Create an image object
    Image img(800, 600, 3);
    img.printDetails();

    // Create a copy using copy constructor
    Image imgCopy(img);
    imgCopy.printDetails();

    // Create a copy using assignment operator
    Image imgAssign = img;
    imgAssign.printDetails();

    // Set ROI
    img.setROI(100, 100, 400, 300);
    img.printDetails(); // After setting ROI, width, height should be updated

    return 0;
}

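In this example, the Image class has the member variables width, height, and channels for the image's width, height, and number of channels, plus a pointer data to the image data. Deep copy is implemented by copying the image data rather than just the pointer. A shallow copy (copy constructor and assignment operator) that copies only the data pointer makes several objects share one block of memory, which can lead to errors. The setROI method sets the region of interest; in this example it simply updates the image's width, height, and data pointer, and a real application may need more elaborate logic.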

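As you can see, ChatGPT's answer looks plausible at first glance. On closer inspection, though, both the copy handling and the ROI setting have problems: only deep copies are actually implemented, and setROI overwrites width before using it as the row stride and moves the very pointer that the destructor later passes to delete[]. No real thought is given to reusing image data when images are passed as arguments either, so this solution does not meet the requirements.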
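The ROI problem is easy to check with a little arithmetic. The sketch below uses two hypothetical helpers, `PixelOffset` and `OffsetAsInSetROI`, and assumes a tightly packed buffer as in the ChatGPT example (800 x 600, 3 channels, ROI starting at (100, 100) with width 400).

```cpp
#include <cstddef>

// Byte offset of pixel (x, y) in a tightly packed image whose full row holds
// `stride_pixels` pixels of `channels` bytes each.
constexpr std::size_t PixelOffset(std::size_t x, std::size_t y,
                                  std::size_t stride_pixels,
                                  std::size_t channels) {
    return (y * stride_pixels + x) * channels;
}

// What setROI in the listing above effectively computes: `width` has already
// been overwritten with the ROI width, so the ROI width is used as the stride.
constexpr std::size_t OffsetAsInSetROI(std::size_t x, std::size_t y,
                                       std::size_t roi_width,
                                       std::size_t channels) {
    return PixelOffset(x, y, roi_width, channels);
}
```

With the original stride, the ROI origin sits at `PixelOffset(100, 100, 800, 3)`, that is (100 * 800 + 100) * 3 = 240300 bytes into the buffer. `OffsetAsInSetROI(100, 100, 400, 3)` gives (100 * 400 + 100) * 3 = 120300 instead, which points at a different pixel. And even with the right offset, rows of the ROI are still 800 pixels apart in memory, which a class that only remembers the ROI width cannot express.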

Detailed design

Here is the actual design, starting with the header:

// Declarations
#ifndef MANGO_CORE_IMAGE_H
#define MANGO_CORE_IMAGE_H

#include <vector>
#include <cassert>
#include <cstdint>  // std::uint8_t, used by Image::Channels()

using namespace std;

typedef unsigned short uattr;
typedef unsigned char uchar;
typedef uchar Vec3b[3];
typedef uchar Vec4b[4];
typedef float Vec3f[3];
typedef float Vec4f[4];

#define UNITATTR(id, size) ((uattr)(((id) << 8) | (size)))
#define UNITATTR_ID(v) ((uattr)((v) >> 8))
#define UNITATTR_SIZE(v) ((uattr)((v) & 0x00ff))  // size in bytes of one element

class Point
{
public:
    int x, y;

public:
    Point(int x = 0, int y = 0)
            : x(x), y(y) {}
};

class Size
{
public:
    int width, height;

public:
    Size(int width = 0, int height = 0)
            : width(width), height(height) {}

    bool operator==(const Size& other) const
    {
        return (width == other.width) && (height == other.height);
    }

    bool operator!=(const Size& other) const
    {
        return !(*this == other);
    }
};


class Rect
{
public:
    int x, y, width, height;

public:
    Rect(int x = 0, int y = 0, int width = 0, int height = 0)
            : x(x), y(y), width(width), height(height) {}

public:
    Size inline size() const
    {
        return Size(width, height);
    }
};


enum IMAGE_TYPE
{
    IMAGE_8UC1 = UNITATTR(0, 1),
    IMAGE_8UC2 = UNITATTR(1, 2),
    IMAGE_8UC3 = UNITATTR(2, 3),
    IMAGE_8UC4 = UNITATTR(3, 4),
    IMAGE_32FC1 = UNITATTR(4, 4),
    IMAGE_32FC2 = UNITATTR(5, 8),
    IMAGE_32FC3 = UNITATTR(6, 12),
    IMAGE_32FC4 = UNITATTR(7, 16)
};

enum BORDER_TYPE
{
    NONE = 0,               ///< no padding
    MIRROR = 1,             ///< mirror (default): 123 | 321
    CONSTANT = 2,           ///< constant fill:    123 | ccc
    REPLICATE = 3,          ///< replicate edge:   123 | 333
};

class Image
{
private:
    uchar *ptr_;
    Rect origin_rect_;
    Rect my_rect_;
    uattr unit_type_;
    bool owner_;

    int rows;
    int cols;

public:
    Image();

    Image(const Size &size, uattr type);

    Image(const Size &size, uattr type, uchar* data);

    Image(const Image &other);
    Image(const Image &&other);

    Image& operator=(const Image &other);
    Image operator()(const Rect &roi);

    virtual ~Image();

private:
    void Release();
public:
    void Create(const Size &size, uattr type);
    void CopyTo(Image &other) const;
public:
    template<typename E>
    E& At(int x, int y);

    template<typename E>
    E& At(const Point &point);

    template<typename E>
    E* Ptr(int x, int y);

    template<typename E>
    E* Ptr(const Point &point);

    std::uint8_t Channels() const;

    bool IsEmpty() const;

    Size GetSize() const;
};

Image LoadImage(const char* img_path);

int WriteImage(Image img, const char* img_path);

#endif //MANGO_CORE_IMAGE_H

Next, the implementation:

#include "image.hpp"

#define STB_IMAGE_IMPLEMENTATION
#include "stb_image/stb_image.h"

#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image/stb_image_write.h"

#include <cstdlib>   // malloc, free
#include <cstring>   // memcpy
#include <iostream>
#include <string>    // std::string in WriteImage

Image::Image()
{
    ptr_ = nullptr;
    owner_ = false;
    unit_type_ = 0;
    origin_rect_ = Rect();
    my_rect_ = Rect();
    rows = 0;
    cols = 0;
}

Image::Image(const Size &size, uattr type)
{
    ptr_ = (uchar*)malloc(size.width * size.height * UNITATTR_SIZE(type));

    assert(ptr_ != nullptr);

    owner_ = true;  // this object allocated the buffer, so it must free it
    my_rect_ = origin_rect_ = Rect(0, 0, size.width, size.height);
    unit_type_ = type;
    rows = size.height;
    cols = size.width;
}

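Image::Image(const Size &size, uattr type, uchar* data)
{
    assert(data != nullptr);
    ptr_ = data;
    // Takes ownership of `data`. Release() frees it with free(), so the buffer
    // must come from malloc (stbi_load allocates with malloc by default).
    owner_ = true;
    my_rect_ = origin_rect_ = Rect(0, 0, size.width, size.height);
    unit_type_ = type;
    rows = size.height;
    cols = size.width;
}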

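Image::Image(const Image &other)
{
    ptr_ = other.ptr_;
    // A reference is required here: `auto t = ...` would copy `other` and
    // recurse into this constructor instead of modifying the source.
    auto &t = const_cast<Image&>(other);
    owner_ = t.owner_;   // take over ownership only if the source held it
    t.owner_ = false;
    unit_type_ = other.unit_type_;
    origin_rect_ = other.origin_rect_;
    my_rect_ = other.my_rect_;
    rows = other.rows;
    cols = other.cols;
}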

Image::Image(const Image &&other)
{
    ptr_ = other.ptr_;
    // Bind by reference so that the source itself gives up ownership.
    auto &t = const_cast<Image&>(other);
    owner_ = t.owner_;   // take over ownership only if the source held it
    t.owner_ = false;
    unit_type_ = other.unit_type_;
    origin_rect_ = other.origin_rect_;
    my_rect_ = other.my_rect_;
    rows = other.rows;
    cols = other.cols;
}

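Image &Image::operator=(const Image &other)
{
    if (this != &other)
    {
        // Bind by reference so that the source itself gives up ownership.
        auto &t = const_cast<Image&>(other);

        // If both objects already share one buffer, it must stay alive and
        // whichever of the two owned it passes that ownership to *this.
        const bool shared = (ptr_ == other.ptr_);
        const bool new_owner = t.owner_ || (shared && owner_);
        if (shared)
            owner_ = false;  // keep Release() from freeing the shared buffer

        Release();

        ptr_ = other.ptr_;
        origin_rect_ = other.origin_rect_;
        my_rect_ = other.my_rect_;
        unit_type_ = other.unit_type_;
        owner_ = new_owner;
        t.owner_ = false;
        rows = other.rows;
        cols = other.cols;
    }

    return *this;
}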

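Image Image::operator()(const Rect &roi)
{
    const bool was_owner = owner_;  // the copy below may move ownership to `ret`
    Image ret = *this;
    ret.owner_ = false;             // an ROI view never frees the buffer
    this->owner_ = was_owner;       // the source keeps whatever ownership it had
    ret.my_rect_ = roi;

    return ret;
}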

void Image::Create(const Size &size, uattr type)
{
    Release();

    ptr_ = (uchar*)malloc(size.width * size.height * UNITATTR_SIZE(type));

    assert(ptr_ != NULL);

    my_rect_ = origin_rect_ = Rect(0, 0, size.width, size.height);
    unit_type_ = type;
    rows = size.height;
    cols = size.width;
}

void Image::Release()
{
    if (owner_ && ptr_)
        free(ptr_);

    ptr_ = nullptr;
    owner_ = true;
    unit_type_ = 0;
    origin_rect_ = Rect();
    my_rect_ = Rect();
    rows = 0;
    cols = 0;
}

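std::uint8_t Image::Channels() const
{
    // UNITATTR_SIZE is the pixel size in bytes, which equals the channel count
    // only for 8-bit types. The ids in IMAGE_TYPE are laid out as
    // 0..3 (8U, 1..4 channels) and 4..7 (32F, 1..4 channels).
    if (UNITATTR_SIZE(unit_type_) == 0)
        return 0;
    return (std::uint8_t)(UNITATTR_ID(unit_type_) % 4 + 1);
}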

bool Image::IsEmpty() const
{
    return  ptr_ == nullptr || rows <= 0 || cols <= 0;
}

Size Image::GetSize() const
{
    return Size(my_rect_.width, my_rect_.height);
}


void Image::CopyTo(Image &other) const
{
    other.Create(my_rect_.size(), unit_type_);

    const std::size_t unit = UNITATTR_SIZE(unit_type_);
    const std::size_t row_bytes = unit * my_rect_.width;

    // Fast path: the ROI starts at the origin and spans full rows, so the data is contiguous.
    if (my_rect_.x == 0 && my_rect_.y == 0 && my_rect_.width == origin_rect_.width)
    {
        memcpy(other.ptr_, ptr_, row_bytes * my_rect_.height);
        return;
    }

    // General case: copy row by row, using the original width as the source stride.
    for (int row = 0; row < my_rect_.height; ++row)
    {
        const uchar *src = ptr_ + ((std::size_t)origin_rect_.width * (row + my_rect_.y) + my_rect_.x) * unit;
        memcpy(other.ptr_ + row_bytes * row, src, row_bytes);
    }
}

Image::~Image()
{
    Release();
}

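// Note: these member templates are defined in the .cpp file, so they can only
// be instantiated from this translation unit; move them into the header to
// use them from other files.
template<typename E>
E &Image::At(int x, int y)
{
    // (x, y) is relative to the ROI; rows are addressed with the original width.
    const int index = origin_rect_.width * (y + my_rect_.y) + (x + my_rect_.x);
    return *reinterpret_cast<E*>(ptr_ + UNITATTR_SIZE(unit_type_) * index);
}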

template<typename E>
E &Image::At(const Point &point)
{
    return At<E>(point.x, point.y);
}

template<typename E>
E *Image::Ptr(int x, int y)
{
    return &At<E>(x, y);
}

template<typename E>
E *Image::Ptr(const Point &point)
{
    return Ptr<E>(point.x, point.y);
}

static IMAGE_TYPE ChannelToType(const int channels)
{
    if(channels == 1)
    {
        return  IMAGE_TYPE::IMAGE_8UC1;
    }else if(channels == 2)
    {
        return IMAGE_TYPE::IMAGE_8UC2;
    }else if(channels == 3)
    {
        return IMAGE_TYPE::IMAGE_8UC3;
    }else if(channels == 4)
    {
        return IMAGE_TYPE::IMAGE_8UC4;  // stbi_load returns 8-bit data
    }else{
        return IMAGE_TYPE::IMAGE_32FC4;
    }
}

Image LoadImage(const char* img_path)
{
    Image matrix;
    assert(img_path != nullptr);
    if(img_path != nullptr)
    {
        int w, h, n;
        w = h = n = 0;
        std::uint8_t *data = stbi_load(img_path, &w, &h, &n, 0);
        if(nullptr != data)
        {
            auto tmp = Image(Size(w, h), ChannelToType(n), data);
            matrix = tmp;
        }
    }
    return matrix;
}

int WriteImage(Image img, const char* img_path)
{
    if(img.IsEmpty() || img_path == nullptr)
    {
        return  -1;
    }
    std::uint8_t* data = img.Ptr<std::uint8_t>(0, 0);
    auto src_size = img.GetSize();
    int  h = src_size.height;
    int  w = src_size.width;
    auto n = img.Channels();

    std::string file_name = std::string(img_path);
    size_t dot_position = file_name.find_last_of(".");
    if (dot_position != std::string::npos)
    {
        std::string file_extension = file_name.substr(dot_position + 1);
        if(file_extension == "jpg" || file_extension == "jpeg" )
        {
            return stbi_write_jpg(img_path, w, h, n, data, 100);
        }else if(file_extension == "png")
        {
            return stbi_write_png(img_path,w, h, n, data, w * n );
        }else if(file_extension == "bmp")
        {
            return stbi_write_bmp(img_path, w, h, n, data);
        }else
        {
            std::cerr<< "The image format unsupported!!!" << std::endl;
        }
    }
    return -1;
}

A brief summary of the key design points of this solution:

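Image data types are defined and decoded with macros and bit shifts: the high byte of a type value holds a type id and the low byte holds the size in bytes of one pixel. This supports several pixel types and is easy to extend later.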

#define UNITATTR(id, size) ((uattr)(((id) << 8) | (size)))
#define UNITATTR_ID(v) ((uattr)((v) >> 8))
#define UNITATTR_SIZE(v) ((uattr)((v) & 0x00ff))  // size in bytes of one element

//...

enum IMAGE_TYPE
{
    IMAGE_8UC1 = UNITATTR(0, 1),
    IMAGE_8UC2 = UNITATTR(1, 2),
    IMAGE_8UC3 = UNITATTR(2, 3),
    IMAGE_8UC4 = UNITATTR(3, 4),
    IMAGE_32FC1 = UNITATTR(4, 4),
    IMAGE_32FC2 = UNITATTR(5, 8),
    IMAGE_32FC3 = UNITATTR(6, 12),
    IMAGE_32FC4 = UNITATTR(7, 16)
};
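As a rough illustration of the encoding, the same packing can be written with constexpr functions. This is a sketch rather than part of the design above; `MakeUnitAttr`, `UnitAttrId`, `UnitAttrSize`, `BufferBytes`, and the `k...` constants are hypothetical names that mirror the macros.

```cpp
#include <cstddef>
#include <cstdint>

using uattr = std::uint16_t;  // plays the role of the article's `uattr`

// Function equivalents of UNITATTR, UNITATTR_ID and UNITATTR_SIZE.
constexpr uattr MakeUnitAttr(unsigned id, unsigned size) {
    return static_cast<uattr>((id << 8) | size);
}
constexpr unsigned UnitAttrId(uattr v) { return v >> 8; }
constexpr unsigned UnitAttrSize(uattr v) { return v & 0x00ffu; }

constexpr uattr kImage8UC3 = MakeUnitAttr(2, 3);    // id 2, 3 bytes per pixel
constexpr uattr kImage32FC3 = MakeUnitAttr(6, 12);  // id 6, 12 bytes per pixel

// Buffer size in bytes for a tightly packed width x height image of `type`.
constexpr std::size_t BufferBytes(int width, int height, uattr type) {
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) *
           UnitAttrSize(type);
}
```

`kImage8UC3` packs to 0x0203, so `BufferBytes(800, 600, kImage8UC3)` is 800 * 600 * 3 = 1440000 bytes. Adding a new pixel type only takes a fresh id and its pixel size, for example a (hypothetical) 16-bit single-channel type as `MakeUnitAttr(8, 2)`.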

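An ownership flag combined with the destructor releases the data automatically. This settles who frees the memory when images are shallow-copied or passed as arguments: a shallow copy hands ownership over to the copy, so the buffer is released when the object that last received it is destroyed. The flip side is that an object that has been copied from must not be used after that last copy is gone; in particular, passing an Image by value hands the buffer to the callee.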

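Image &Image::operator=(const Image &other)
{
    if (this != &other)
    {
        // Bind by reference so that the source itself gives up ownership.
        auto &t = const_cast<Image&>(other);

        // If both objects already share one buffer, it must stay alive and
        // whichever of the two owned it passes that ownership to *this.
        const bool shared = (ptr_ == other.ptr_);
        const bool new_owner = t.owner_ || (shared && owner_);
        if (shared)
            owner_ = false;  // keep Release() from freeing the shared buffer

        Release();

        ptr_ = other.ptr_;
        origin_rect_ = other.origin_rect_;
        my_rect_ = other.my_rect_;
        unit_type_ = other.unit_type_;
        owner_ = new_owner;
        t.owner_ = false;
        rows = other.rows;
        cols = other.cols;
    }

    return *this;
}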
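The ownership idea can be reduced to a few lines. The following is a minimal sketch, not the article's Image class: `SharedBuffer` is a hypothetical name, and it uses a `mutable` flag instead of `const_cast` so that a copy can take ownership from a const source.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical minimal class showing ownership transfer on shallow copy.
class SharedBuffer {
public:
    explicit SharedBuffer(std::size_t bytes)
        : ptr_(static_cast<unsigned char*>(std::malloc(bytes))),
          owner_(ptr_ != nullptr) {}

    // Shallow copy: share the pointer and take over the duty to free it,
    // but only if the source actually held that duty.
    SharedBuffer(const SharedBuffer& other)
        : ptr_(other.ptr_), owner_(other.owner_) {
        other.owner_ = false;
    }

    // Assignment is left out to keep the sketch short.
    SharedBuffer& operator=(const SharedBuffer&) = delete;

    // Only the current owner frees the buffer.
    ~SharedBuffer() {
        if (owner_) std::free(ptr_);
    }

    bool IsOwner() const { return owner_; }
    const unsigned char* Data() const { return ptr_; }

private:
    unsigned char* ptr_;
    mutable bool owner_;  // mutable so that copying can update a const source
};
```

After `SharedBuffer b(a);` both objects see the same bytes, `b` is the owner and `a` is not, so the buffer is freed exactly once, when `b` is destroyed. If `a` outlives `b`, its pointer dangles, which is the price of this scheme compared with reference counting.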

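ROI support works by recording the ROI rectangle while keeping the original image size and region. In principle the same idea extends to polygonal or circular ROIs; only the pixel-access logic would need to change.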

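template<typename E>
E &Image::At(int x, int y)
{
    // (x, y) is relative to the ROI; rows are addressed with the original width.
    const int index = origin_rect_.width * (y + my_rect_.y) + (x + my_rect_.x);
    return *reinterpret_cast<E*>(ptr_ + UNITATTR_SIZE(unit_type_) * index);
}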
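The indexing rule in `At` can be tried out in isolation. In this sketch `RoiRect`, `RoiByteOffset`, and `InsideRoi` are hypothetical helpers; the bounds check is an addition that `At` itself does not perform.

```cpp
#include <cstddef>

// Hypothetical stand-in for the article's Rect.
struct RoiRect {
    int x, y, width, height;
};

// Byte offset of the ROI-relative pixel (x, y) inside the full buffer.
// Rows are addressed with the width of the original image, not of the ROI.
constexpr std::size_t RoiByteOffset(const RoiRect& roi, int origin_width,
                                    std::size_t unit_size, int x, int y) {
    return unit_size * (static_cast<std::size_t>(origin_width) * (y + roi.y) +
                        (x + roi.x));
}

// True if the ROI-relative coordinate lies inside the rectangle.
constexpr bool InsideRoi(const RoiRect& roi, int x, int y) {
    return x >= 0 && y >= 0 && x < roi.width && y < roi.height;
}
```

For a 400 x 300 ROI at (100, 100) in an 800-pixel-wide, 3-bytes-per-pixel image, pixel (0, 0) of the ROI sits at byte 3 * (800 * 100 + 100) = 240300 and pixel (0, 1) one full original row later, at 242700. A circular or polygonal ROI would swap `InsideRoi` for a different membership test while keeping the same offset formula.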

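Image encoding and decoding are built on stb_image, which is probably the most lightweight image codec library around (single-header files) and performs reasonably well. It supports several common image formats.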

Image LoadImage(const char* img_path);

int WriteImage(Image img, const char* img_path);
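`WriteImage` picks the encoder from the file extension. That dispatch step could be sketched on its own as follows; `LowerExtension`, `ImageFormat`, and `FormatFromPath` are hypothetical names, and the case-insensitive comparison is an assumption that goes beyond the implementation above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the lower-cased extension without the dot, or "" if there is none.
inline std::string LowerExtension(const std::string& path) {
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos) return "";
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext;
}

// Hypothetical enum naming the three formats WriteImage handles.
enum class ImageFormat { kJpeg, kPng, kBmp, kUnknown };

// Maps a file path to the encoder that should be used for it.
inline ImageFormat FormatFromPath(const std::string& path) {
    const std::string ext = LowerExtension(path);
    if (ext == "jpg" || ext == "jpeg") return ImageFormat::kJpeg;
    if (ext == "png") return ImageFormat::kPng;
    if (ext == "bmp") return ImageFormat::kBmp;
    return ImageFormat::kUnknown;
}
```

Separating the format decision from the actual `stbi_write_*` call keeps the extension handling easy to test without touching the file system.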

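That is the whole implementation of a lightweight image data structure. It should be enough for the vast majority of project scenarios, and I will be using it in all of my own future projects.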
