Image processing is an integral part of computer vision. We almost always want to resize images, do data augmentation, view images in a grid, and so on. OpenCV (Open Source Computer Vision), scikit-image, and Pillow are some popular image processing libraries in Python. In this article, I've covered some of the most commonly used image processing techniques.
Here’s the Jupyter notebook I’ve used for the article: https://jovian.ml/aakanksha-ns/image-processing
1) Reading an Image
Images are represented as arrays of pixel values. 8-bit images have pixel values ranging from 0 (black) to 255 (white). Depending on the color space, an image has one or more channels, each channel holding the pixel values for one particular color. RGB (red, green, blue) is the most commonly used color space, and all the images in my examples are RGB images.
We can easily read an image into an array using the imread function from OpenCV. One thing to remember here is that OpenCV reads images in BGR order by default.
2) Cropping an Image
Cropping is a widely used augmentation technique. However, be careful not to crop out important parts of the image (pretty obvious, but easy to miss when you have many images of various sizes). Since images are represented as arrays, cropping is equivalent to taking a slice of an array:
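A sketch of cropping via slicing (the image is a dummy NumPy array standing in for a real photo; the crop coordinates are arbitrary):

```python
import numpy as np

# A dummy 100x200 RGB "image"
img = np.arange(100 * 200 * 3, dtype=np.uint8).reshape(100, 200, 3)

# Keep rows 20-79 and columns 50-149: array slicing is all it takes
cropped = img[20:80, 50:150]

print(cropped.shape)  # (60, 100, 3)
```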
3) Resizing an Image
Most deep learning model architectures expect all input images to have the same dimensions, which makes resizing one of the most common preprocessing steps.
4) Flipping an Image
This is another very popular image augmentation technique. The only thing to remember here is that the flip should make sense for your use case. For example, if you're classifying building types, you wouldn't encounter any upside-down buildings in your test set, so a vertical flip doesn't make sense there.
5) Rotate Image:
In most cases, it is okay to rotate the image by a small angle. However, a naive rotation (for example, transposing the array or rotating in 90° steps) can change the entire orientation of the image.
Hence, a better way to rotate is to apply an affine transform using OpenCV. An affine transformation preserves collinearity and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after the transformation). You can also fill the exposed borders by passing the borderValue argument to warpAffine.
6) Change Brightness and Contrast:
This involves applying the following function to each pixel:

new_pixel = alpha * old_pixel + beta
Here alpha (>0) is called gain and beta is called bias, these parameters are said to control contrast and brightness respectively. Since we represent images using arrays, this function can be applied to each pixel by traversing through the array:
However, this could take a while on bigger images (like the one below), so you’d want to use the optimized library functions:
7) Displaying Bounding Box:
Object detection is a very popular computer vision problem that involves finding a bounding box enclosing the object of interest. Displaying the bounding box on the picture can help us visually inspect the problem and requirements. One thing to remember while dealing with these problems is that if you’re planning to flip the image, make sure you flip the box coordinates accordingly too. Here’s an easy way to display an image with its bounding box.
8) Showing multiple images in a grid:
Often we want to inspect multiple images in one go. It can be easily done using subplots in matplotlib.
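A sketch of a 2x4 image grid with matplotlib (the images are random arrays, and the Agg backend is set so it runs headlessly):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Eight random "images" shown in a 2x4 grid
images = [np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
          for _ in range(8)]

fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for ax, im in zip(axes.flat, images):
    ax.imshow(im)
    ax.axis("off")  # hide the ticks; we only care about the pictures
fig.savefig("grid.png")
```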
9) Converting images to Black and white:
Although not widely used in computer vision, it’s nice to know how to convert color images to greyscale.
10) Blurring an Image:
Blurring can be useful for making your model more robust to image quality issues: if a model performs well on blurred images, it may indicate that the model generalizes well.
Of course, there are many more problem-specific image processing techniques. For example, for self-driving cars you might want to mark other cars on the road and view them from various angles. However, for most problems, the functions above should be useful!