Data augmentation technique for degraded images without losing the classification ability of clean images

Kazuki Endo

doi:10.1117/1.JEI.33.2.023009

7 March 2024 Data augmentation technique for degraded images without losing the classification ability of clean images

Kazuki Endo

Author Affiliations +

Journal of Electronic Imaging, Vol. 33, Issue 2, 023009 (March 2024). https://doi.org/10.1117/1.JEI.33.2.023009

Abstract

Classification networks of degraded images need to deal with various strengths of degradation, referred to as degradation levels, in practical applications. However, there has been limited exploration of data augmentation techniques for degraded images with various degradation levels. We propose a data augmentation technique to apply distinct data augmentations to both clean and degraded image domains. Specifically, the proposed method uses random erasing and CutBlur data augmentations for a clean and degraded image, respectively. Experimental results show that the proposed method can effectively train a classification network of degraded images without losing the classification ability of clean images. Furthermore, the results also confirm the proposed method’s efficacy across various degradations, multiple network architectures, and several datasets.

1. Introduction

Image recognition has seen remarkable progress through the use of deep convolutional neural networks (CNNs).¹^–⁷ Typically, these CNNs are trained with only clean images and are designed to input clean images. However, in real-world applications, such as autonomous driving, the input images of the networks often contain various degradations, such as noise, blur, and compression. Prior studies⁸^,⁹ have pointed out that CNNs trained with only clean images cannot recognize degraded images well due to degradations. Therefore, recognizing degraded images becomes a more critical and realistic challenge compared with recognizing only clean images. This paper focuses on the CNN-based classification of degraded images because classification is the typical task of image recognition.

Recently, the classification of degraded images has become an increasingly researched topic.⁹^–¹⁹ A straightforward approach to address the classification of degraded images is to train a classification network using degraded images. Notably, even if these images include a single degradation, the degradation usually has various strengths of degradation. Therefore, the classification network should be trained over the various strengths of degradation. This paper uses a degradation level as a parameter representing the strength of degradation. For instance, the degradation levels are noise levels for additive white Gaussian noise (AWGN). This paper assumes that the original clean image of a degraded image can be acquired. Consequently, degraded images are assumed to be synthesized from clean images without any degradations using a degradation operator while changing degradation levels.

Note that synthesizing degraded images can be regarded as a kind of data augmentation. To the best of the author’s knowledge, there is limited literature on data augmentation methods of degraded images over various levels of degradation. One such method is mixed training,¹⁰ which is a straightforward approach for augmenting degraded images with various levels of degradation. Mixed training involves the following four steps. (1) A clean image is randomly sampled from a training dataset. (2) A degradation level, followed by a uniform distribution, is randomly sampled. (3) A degraded image is acquired from the clean image by applying a degradation operator with the sampled degradation level. (4) A classification network is trained using the degraded image. However, a classification network trained by mixed training loses the classification ability of clean images compared with a network trained with only clean images⁸ because mixed training trains a network averagely over various levels of degradation, as illustrated in Fig. 1. To overcome this drawback, Endo et al.¹⁸^,²⁰ introduced a network structure termed the feature adjustor. This paper proposes a data augmentation technique to overcome this drawback without relying on special network structures, in which degraded images are assumed to have a single known degradation with unknown levels of degradation. Therefore, our goal is to construct a data augmentation technique for degraded images that can train a classification network of degraded images without losing the classification ability of clean images, as depicted in Fig. 1.

Fig. 1

Classification accuracy over various degradation levels with different training methods.

Figure 2 illustrates several data augmentations of degraded images. Mixed training, as shown in Fig. 2(a), has already been mentioned. Figure 2(b) shows mixed training with random erasing,²¹ which first generates degraded images by mixed training and then applies random erasing to them. CutBlur²² is presented in Fig. 2(c). CutBlur generates a degraded image with a clean region or a clean image with a degraded region. These regions are highlighted as rectangles with white edges in Fig. 2(c). Generally, the same data augmentation methods are applied to both clean and degraded images during the training of classification networks. However, this might not be the best strategy because clean images belong to a distinct domain from degraded images. A more intuitive approach would be to apply appropriate augmentation methods to images based on their respective domains. Based on this idea, this paper proposes a data augmentation technique that applies different operations to clean and degraded images. Specifically, as illustrated in Fig. 2(d), the proposed method is a combination of random erasing for clean images and CutBlur for degraded images. Applying different data augmentations for each clean and degraded image enhances a classification network of degraded images without losing the classification ability of clean images.

Fig. 2

Data augmentations of degraded images: (a) mixed training, (b) mixed training with random erasing, (c) CutBlur, and (d) proposed method. Degradation is JPEG distortion, in which the degradation level denotes a 101 − JPEG quality factor. Black rectangles denote erased regions. Rectangles with white edges denote regions replaced by other image qualities. “D.L.” stands for degradation level.

This paper’s contributions are as follows.

(1) Recognizing that clean and degraded images belong to distinct domains, this paper proposes a data augmentation technique that applies separate data augmentations to each domain. Specifically, the proposed method utilizes random erasing for clean images and CutBlur for degraded images.
(2) Unlike mixed training, a classification network of degraded images trained using the proposed method does not lose the classification ability of clean images. Furthermore, the proposed method shows a more stable performance in classifying high-quality images than mixed training with random erasing and CutBlur.
(3) To validate the effectiveness of the proposed method against various degradations, the proposed method was tested on four types: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise.
(4) To evaluate the robustness of the proposed method against different network structures and datasets, the proposed method was confirmed using four classification CNNs, i.e., VGG16,¹ ResNet50,⁴^,⁵ ResNet56,⁴^,⁵ and PyramidNet110-270⁶ with ShakeDrop regularization,⁷ and on three datasets: CIFAR-10,²³ CIFAR-100,²³ and Tiny ImageNet.²⁴

The remainder of this paper is organized as follows. Section 2 describes the related works of this paper. Section 3 explains the proposed method. Then experiments are described in Sec. 4. Finally, conclusions are described in Sec. 5.

2. Related Works

2.1.

Data Augmentations of Degraded Images

There are few papers related to data augmentations of degraded images with various degradation levels. Peng et al.¹⁰ investigated the fine-grained classification of low-resolution images. They proposed staged training in which a classifier is trained with high-resolution images before training the classifier with low-resolution images. Their aim was to transfer the knowledge of high-resolution images to the classifier of low-resolution images rather than to provide data augmentation. In their experiments, they used mixed training, which involves randomly sampling both low- and high-resolution images to train a network. Their results showed that mixed training is superior to staged training for the classification of high-resolution images. In this paper, mixed training is used as the baseline of data augmentation of degraded images with various levels of degradation. Meanwhile, Yoo et al.²² introduced the data augmentation technique CutBlur for single-image super-resolution. CutBlur replaces a region of either a high-resolution or a low-resolution image with the corresponding region of its paired image, assuming a pair of high-resolution and low-resolution images exists. This paper applies CutBlur to the classification of degraded images with various levels of degradation. Specifically, in the proposed method, CutBlur is applied to only degraded images.

2.2.

Data Augmentations of Image Mixing and Deleting

There are many data augmentations of image mixing and deleting, as surveyed by Naveed et al.²⁵ DeVries et al.²⁶ introduced Cutout, which is a data augmentation technique that deletes a fixed-size square region from images. Cutout always deletes a square region from images but randomly selects the position of the region. Zhong et al.²¹ proposed a data augmentation technique called random erasing, which randomly deletes a rectangle region inside images. Notably, the size of the rectangle region is randomly determined. Moreover, random erasing allows for the replacement of the rectangle region with various colors or random noise. Yun et al.²⁷ proposed CutMix data augmentation, which replaces a rectangle region of an image with a region of another image. Cutout, random erasing, and CutMix do not consider the presence of image degradation. By contrast, the proposed method takes into account degradation and changes data augmentation methods based on the presence or absence of degradation in a training image. Specifically, the proposed method applies random erasing to only clean images without any degradation.

Hendrycks et al.²⁸ introduced AUGMIX to boost robustness against a domain gap between training and testing images. First, AUGMIX performs multiple operations on an image independently and acquires corresponding images. Then those images are merged into a single image. Notably, AUGMIX does not incorporate any degradations, which testing images include, in data augmentations. This paper uses the same degradation operators between the training and testing image domains. Thus the problem setting of this paper differs from that of AUGMIX.

2.3.

Classification of Degraded Images

Three primary approaches exist for the classification of degraded images: a straightforward approach, a restoration approach, and a knowledge distillation approach. The straightforward approach¹⁰^,¹¹ trains an image classification network with degraded images directly. By contrast, the restoration approach⁹^,¹²^–¹⁵ is a sequential network composed of a restoration network and a classification network, in which the classification network is trained with clean images without any degradation. First, degraded images are restored by the restoration network. Next, restored images are input into the classification network trained with clean images. The classification network may be fine-tuned with restored images. The knowledge distillation approach for degraded images¹⁶^–²⁰^,²⁹ transfers the knowledge of a teacher network into a student network. Typically, the teacher network is a classification network trained with only clean images. The student network is trained with degraded images to coincide with image features or the predicted distribution of the teacher network, which has clean images as input. This paper focuses on the straightforward approach, which offers a clearer impact of data augmentations than other approaches.

3. Proposed Method

Although clean and degraded images belong to different domains, the same data augmentation methods are usually performed on them. However, it would be more reasonable to apply distinct data augmentations to each domain. Based on this concept, this paper proposes a data augmentation technique that applies distinct transformations to clean and degraded images, as illustrated in Fig. 3. Initially, mixed training¹⁰ is employed to sample degradation levels using a discrete uniform distribution, in which a clean image is not sampled. With the determined degradation level, a degraded image is synthesized by applying a degradation operator to a clean image. Subsequently, either a clean or degraded image is randomly sampled, as demonstrated in Fig. 3. If a clean image is selected, random erasing²¹ is applied to the clean image with a decision probability, $p_{d}$ , in which random erasing is used to erase a region inside the clean image. Though original random erasing can replace the erased region with random noise or various colors, the proposed method always sets the region colored black. Conversely, when a degraded image is selected, CutBlur²² is applied to the degraded image with decision probability $p_{d}$ , in which CutBlur replaces a region inside the degraded image with its corresponding clean image. The detailed algorithm of the proposed method is shown in Algorithm 1.

Fig. 3

Proposed method. $p_{d}$ denotes a decision probability to apply either random erasing or CutBlur. $p$ is a random number that is randomly sampled from $[0,1]$ .

Algorithm 1

Detailed algorithm of the proposed method. Random [a,b] and Uniform [a,b] denote discrete and continuous uniform random number generators on [a,b], respectively. ⌊z⌋ returns a maximum integer not to exceed z.

Input: Clean image $I_{c}$

Decision probability

p_{d}

The range of ratio

[R_{\min}, R_{\max}]

, where

0 < R_{\min}, R_{\max} < 1

The range of degradation level

[L_{\min}, L_{\max}]

Output: Output image $I$

1:

l

← Random

[L_{\min}, L_{\max}]

2:

p

← Uniform

[0,1]

3:

I_{d} \leftarrow D (I_{c}, l)

, where

D

denotes a degradation operator.

4:

I

← Randomly sample from

{I_{c}, I_{d}}

5: if $p \leq p_{d}$ then

6:

(W, H)

← Width and height of

I

7:

(r_{w}, r_{h})

← Uniform

[R_{\min}, R_{\max}] \times

Uniform

[R_{\min}, R_{\max}]

8:

(x, y)

← Random

[0, W - ⌊ r_{w} W ⌋] \times

Random

[0, H - ⌊ r_{h} H ⌋]

9: for $0 \leq i < ⌊ r_{w} W ⌋, 0 \leq j < ⌊ r_{h} H ⌋$ do

10: if $I = I_{c}$ then

11:

I (x + i, y + j) \leftarrow 0

(Black)

12: else

13:

I (x + i, y + j) \leftarrow I_{c} (x + i, y + j)

14: end if

15: end for

16: end if

17: return $I$

4. Experiments

This section validates the efficacy of the proposed method. Table 1 shows four existing methods compared with the proposed method in terms of domain sampling probabilities and data augmentations, in which the domain sampling probability means a probability of sampling each domain for clean and degraded images. “Clean” signifies a naive training method using only clean images without any degradations. On the other hand, “Mixed” denotes mixed training, which randomly samples degradation levels, including a clean image and applies degradation. For “Mixed” the sampling probability of each degradation level is $\frac{1}{N + 1}$ , where $N$ denotes the number of degradation levels, excluding a clean image. “Clean” and “Mixed” are baselines for clean images and degraded images, respectively. Then “Mixed R.E.” denotes mixed training combined with random erasing data augmentation. After generating degraded images using mixed training, “Mixed R.E.” applies random erasing. As based on mixed training, the domain sampling probability of “Mixed R.E.” is the same as that of “Mixed.” For a fair comparison with the proposed method, the region of an image was always erased and filled with black. Although the standard random erasing selects a height and an aspect ratio randomly, “Mixed R.E.” randomly samples a height and a width in the same way as Algorithm 1. Furthermore, “CutBlur” denotes the data augmentation method by the same name. “CutBlur” has a domain sampling probability of $\frac{1}{2}$ for each clean and degraded image domain. For a fair comparison with the proposed method, the decision probability of “CutBlur” was always set to 1.0. This implies that clean and degraded images were always mixed. The implementation of “CutBlur” mirrored the CutBlur part of Algorithm 1. Finally, “Proposed” stands for the proposed method. The decision probability $p_{d}$ of “Proposed” was set to 1.0, as discussed in Appendix 6.2. The parameters $R_{\min}$ and $R_{\max}$ for “Mixed R.E.,” “CutBlur,” and “Proposed” were fixed at 0.125 and 0.75, respectively. All experiments were executed using Python 3.7.7, Pytorch 1.9.0, CUDA 11.7, and PIL 9.3 on an NVIDIA RTX-A6000 GPU and an Intel Core i9-10940X clocked at CPU 3.3 GHz. Other detailed experiments are described in Appendix A.

Table 1

Comparison of existing and proposed methods in terms of domain sampling probabilities and data augmentations. “Mixed” and “Mixed R.E.” denote mixed training10 and mixed training with random erasing (R.E.),21 respectively. N represents the number of degradation levels used in the experiments, excluding a clean image.

Conditions	Image domain	Existing methods				Proposed
Conditions	Image domain	Clean	Mixed10	Mixed R.E.	CutBlur22	Proposed
Domain sampling probability	Clean image	1	$\frac{1}{N + 1}$	$\frac{1}{N + 1}$	$\frac{1}{2}$	$\frac{1}{2}$
Domain sampling probability	Degraded image	0	$\frac{N}{N + 1}$	$\frac{N}{N + 1}$	$\frac{1}{2}$	$\frac{1}{2}$
Data augmentation	Clean image	None	None	R.E.	CutBlur	R.E.
Data augmentation	Degraded image	None	None	R.E.	CutBlur	CutBlur

4.1.

Training and Evaluation Procedures

The training procedure for experiments is as follows. First, the horizontal flip is randomly applied to a clean image sampled from a dataset. Then the clean image is transformed by each training condition, as seen in Table 1. Subsequently, random cropping yields an augmented image. Finally, a classification network is trained with the augmented image while minimizing the expectation of cross-entropy loss between estimated and true labels. The expectation is replaced by the sample mean on a minibatch in training. The details of the optimizer settings are explained in subsequent analyses because they depend on the structure of classification networks.

The evaluation procedure is as follows. First, degraded testing images are synthesized from clean testing images by applying a degradation operator across every degradation level. Then a trained classification network infers class labels for both clean and degraded images. Finally, two evaluation metrics are calculated: accuracy and interval mean accuracy.⁸ The accuracy, denoted by Acc, gives the number of correct predictions divided by the total number of testing images. The interval mean accuracy is defined as

Eq. (1)

\bar{Acc} (f_{θ} | q_{α}, q_{β}) ≔ \frac{1}{β - α + 1} \sum_{i = α}^{β} Acc (f_{θ} (D (I_{c}, q_{i})), Y),

where

f_{θ}

denotes a classification network with parameter

θ

.

D

denotes a degradation operator such as JPEG distortion, Gaussian blur, etc.

I_{c}

,

Y

, and

q_{i}

represent a clean image, the associated true label of the clean image, and the

i

’th degradation level, respectively.

α

and

β

are some integers satisfying

0 \leq α \leq β

. The rationale behind using the interval mean accuracy is as follows. A calculated Acc for each degradation level may exhibit fluctuations due to the variance in a classification network’s predictions. This implies that the Acc does not necessarily show a smooth change for degradation levels, even if degradation levels are consecutive. Therefore, the interval mean accuracy, which averages accuracies over a range of degradation levels, provides a more coherent and effective metric than analyzing individual Acc. The interval mean accuracy simplifies the process of understanding the classification performance over various levels of degradation.

4.2.

Analysis for JPEG CIFAR-10

Now, experimental comparisons are performed to confirm the proposed method’s effectiveness. Furthermore, we also demonstrate that the proposed method does not depend on a specific network architecture by evaluating three classification CNNs: VGG16,¹ ResNet50,⁴^,⁵ and PyramidNet110-270⁶ with ShakeDrop regularization.⁷ These CNNs differ in architectures and numbers of parameters. In these experiments, the following dataset, degradation, and optimizers were used.

Dataset. The CIFAR-10 dataset,²³ comprising 60,000 RGB images with a resolution of $32 \times 32 pixels$ , was used. This dataset is split into 50,000 training images and 10,000 testing images across 10 classes.

Degradation. JPEG distortion was focused on because JPEG compression is the de facto standard of image compression. In all experiments for JPEG distortion, JPEG quality factors, ranging from 1 to 100, were used instead of degradation levels. For clarity, we call the CIFAR-10 dataset that applies JPEG distortion as “JPEG CIFAR-10.”

Optimizers. For VGG16 and ResNet50, RAdam³⁰ optimizer was used with an initial learning rate of 0.001 and a weight decay of 0.0001. On the other hand, PyramidNet110-270 with ShakeDrop regularization was trained using stochastic gradient decent (SGD), consistent with the approach reported by Yamada et al.⁷ The learning rate was set to 0.1 initially and scaled down by multiplying 0.1 at the 75th and 150th epochs. Additionally, a momentum of 0.9 and a weight decay of 0.0001 were applied.

Table 2 shows the interval mean accuracy of JPEG CIFAR-10 for VGG16, ResNet50, and PyramidNet110-270 with ShakeDrop regularization. Regarding the accuracy of JPEG CIFAR-10, the results are shown in Appendix B. First, focusing on the results of VGG16, the proposed method demonstrates classifying degraded images without losing the classification ability of clean images. Comparing “Clean” and “Mixed,” “Clean” shows a higher accuracy in $\bar{Acc} (81,100)$ and clean images than “Mixed,” but it is worse for other interval mean accuracies. In particular, the performance of “Mixed” drops significantly by 0.052 in classifying clean images. In the case of “Mixed R.E.,” “Mixed R.E.” significantly underperforms “Clean” by 0.040 in the classification of clean images, and “Mixed R.E.” outperforms “Mixed.” Regarding “CutBlur,” “CutBlur” almost outperforms “Mixed” but still underperforms “Clean” by 0.026 in the classification of clean images. When comparing “Proposed” with “Clean,” “Proposed” exhibits only a difference of 0.006, indicating good classification performance of clean images. Moreover, “Proposed” almost outperforms three existing methods, i.e., “Mixed,” “Mixed R.E.,” and “CutBlur,” except for $\bar{Acc} (1,20)$ . The results show that a classification CNN trained by the proposed method can classify degraded images without losing the classification ability of clean images.

Table 2

Interval mean accuracy of JPEG CIFAR-10 with three networks: VGG16, ResNet50, and PyramidNet110-270 with ShakeDrop regularization. The JPEG quality factor is used instead of a degradation level. “All” means the interval mean accuracy for all quality factors, including a clean image. All results are averaged over three runs. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.

Network architecture	Interval mean accuracy	Existing methods				Proposed
Network architecture	Interval mean accuracy	Clean	Mixed	Mixed R.E.	CutBlur	Proposed
VGG16	$\bar{Acc} (1,20)$	0.469 ± 0.016	0.752 ± 0.008	0.757 ± 0.003	0.755 ± 0.003	0.752 ± 0.000
	$\overline{Acc} (21,40)$	0.730 ± 0.010	0.850 ± 0.003	0.858 ± 0.004	0.862 ± 0.002	0.865 ± 0.001
	$\overline{Acc} (41,60)$	0.800 ± 0.005	0.862 ± 0.004	0.871 ± 0.005	0.876 ± 0.004	0.884 ± 0.002
	$\overline{Acc} (61,80)$	0.843 ± 0.002	0.870 ± 0.004	0.880 ± 0.005	0.885 ± 0.003	0.896 ± 0.001
	$\overline{Acc} (81,100)$	0.897 ± 0.001	0.875 ± 0.003	0.887 ± 0.006	0.895 ± 0.003	0.911 ± 0.001
	Clean images	0.928 ± 0.001	0.876 ± 0.004	0.888 ± 0.005	0.902 ± 0.003	0.922 ± 0.001
	All	0.750 ± 0.006	0.842 ± 0.003	0.851 ± 0.004	0.855 ± 0.002	0.862 ± 0.001
ResNet50	$\overline{Acc} (1,20)$	0.429 ± 0.011	0.767 ± 0.005	0.771 ± 0.003	0.759 ± 0.001	0.760 ± 0.003
	$\overline{Acc} (21,40)$	0.687 ± 0.016	0.861 ± 0.001	0.874 ± 0.000	0.875 ± 0.002	0.877 ± 0.001
	$\overline{Acc} (41,60)$	0.771 ± 0.012	0.873 ± 0.001	0.888 ± 0.001	0.893 ± 0.002	0.897 ± 0.001
	$\overline{Acc} (61,80)$	0.832 ± 0.007	0.881 ± 0.003	0.897 ± 0.001	0.904 ± 0.003	0.911 ± 0.002
	$\overline{Acc} (81,100)$	0.904 ± 0.002	0.885 ± 0.003	0.904 ± 0.001	0.915 ± 0.002	0.928 ± 0.001
	Clean images	0.944 ± 0.001	0.887 ± 0.005	0.906 ± 0.001	0.923 ± 0.001	0.943 ± 0.001
	All	0.727 ± 0.009	0.854 ± 0.001	0.867 ± 0.001	0.870 ± 0.001	0.875 ± 0.001
PyramidNet 110-270 with ShakeDrop regularization	$\overline{Acc} (1,20)$	0.410 ± 0.010	0.806 ± 0.001	0.816 ± 0.002	0.790 ± 0.001	0.791 ± 0.001
	$\overline{Acc} (21,40)$	0.698 ± 0.001	0.907 ± 0.002	0.919 ± 0.001	0.909 ± 0.001	0.912 ± 0.001
	$\overline{Acc} (41,60)$	0.794 ± 0.001	0.920 ± 0.002	0.932 ± 0.000	0.926 ± 0.001	0.931 ± 0.001
	$\overline{Acc} (61,80)$	0.857 ± 0.003	0.928 ± 0.002	0.942 ± 0.001	0.939 ± 0.001	0.945 ± 0.001
	$\overline{Acc} (81,100)$	0.932 ± 0.001	0.934 ± 0.002	0.948 ± 0.001	0.952 ± 0.001	0.960 ± 0.001
	Clean images	0.966 ± 0.001	0.936 ± 0.001	0.951 ± 0.002	0.960 ± 0.001	0.970 ± 0.001
	All	0.740 ± 0.001	0.899 ± 0.002	0.912 ± 0.000	0.904 ± 0.001	0.908 ± 0.001

Subsequently, we confirm that the proposed method is effective not only for VGG16 but also for other CNNs. Regarding ResNet50, the interval mean accuracy of JPEG CIFAR-10 is shown in the middle row of Table 2. Only “Proposed” achieves an almost equivalent accuracy to “Clean” in classifying clean images. However, “Proposed” underperforms “Mixed” in $\bar{Acc} (1,20)$ . Furthermore, the interval mean accuracy for PyramidNet110-270 with ShakeDrop regularization is shown in the bottom of Table 2. Comparing “Proposed” and “Mixed,” “Proposed” outperforms “Mixed” except for $\overline{Acc} (1,20)$ . This tendency is similar to the tendency of ResNet50. However, the relationship between “Proposed” and “Mixed R.E.” is slightly different from VGG16 and ResNet50. Specifically, “Proposed” performs better than “Mixed R.E.” for $\bar{Acc} (61,80)$ , $\overline{Acc} (81,100)$ , and clean images but worse for other interval mean accuracies, as seen in Table 2. In other words, “Proposed” shows a good performance for high-quality images, and “Mixed R.E.” shows a good performance for low-quality images. Notably, only “Proposed” outperforms “Clean” in classifying clean images. As a result, the proposed method is also effective for ResNet50 and PyramidNet110-270 with ShakeDrop regularization. This indicates that the proposed method does not depend on a specific network architecture.

4.3.

Application to Other Degradations

The proposed method is evaluated for other degradations with the CIFAR-10 dataset: Gaussian blur, AWGN, and salt-and-pepper noise. In these evaluations, PyramidNet110-270 with ShakeDrop regularization was used because it showed the best performance in classifying JPEG CIFAR-10.

The second row of Table 3 shows the interval mean accuracy of Gaussian blurring CIFAR-10. The degradation level denotes the standard deviation of a Gaussian blur kernel, varying from 0 to 5 in increments of 0.1. In classifying clean images, “Proposed” outperforms “Clean” and “CutBlur.” Moreover, “Proposed” is superior for high-quality images, whereas “Mixed R.E.” is superior for low-quality images. This tendency is almost similar to one of JPEG CIFAR-10.

Table 3

Interval mean accuracy of CIFAR-10 under several degradations using PyramidNet110-270 with Shakedrop regularization. The degradation level denotes a standard deviation of a kernel for Gaussian blur, a standard deviation for AWGN, and a density for salt-and-pepper noise. “All” means the interval mean accuracy for all degradation levels, including a clean image. All results are averaged over three runs. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.

Degradation type	Interval mean accuracy	Existing methods				Proposed
Degradation type	Interval mean accuracy	Clean	Mixed	Mixed R.E.	CutBlur	Proposed
Gaussian blur	Clean images	0.966 ± 0.001	0.930 ± 0.002	0.940 ± 0.002	0.968 ± 0.001	0.974 ± 0.002
	$\overline{Acc} (0.1, 1)$	0.864 ± 0.004	0.929 ± 0.002	0.938 ± 0.002	0.963 ± 0.001	0.967 ± 0.001
	$\overline{Acc} (1.1, 2)$	0.237 ± 0.008	0.913 ± 0.002	0.921 ± 0.002	0.932 ± 0.001	0.935 ± 0.001
	$\overline{Acc} (2.1, 3)$	0.161 ± 0.009	0.885 ± 0.001	0.892 ± 0.002	0.885 ± 0.002	0.885 ± 0.002
	$\overline{Acc} (3.1, 4)$	0.167 ± 0.005	0.848 ± 0.001	0.853 ± 0.002	0.829 ± 0.001	0.828 ± 0.003
	$\overline{Acc} (4.1, 5)$	0.170 ± 0.004	0.807 ± 0.001	0.813 ± 0.001	0.774 ± 0.001	0.772 ± 0.000
	All	0.332 ± 0.002	0.878 ± 0.001	0.885 ± 0.002	0.879 ± 0.001	0.879 ± 0.001
AWGN	Clean images	0.966 ± 0.001	0.948 ± 0.001	0.960 ± 0.001	0.964 ± 0.000	0.972 ± 0.001
	$\overline{Acc} (1,10)$	0.919 ± 0.002	0.946 ± 0.002	0.957 ± 0.002	0.960 ± 0.000	0.966 ± 0.001
	$\overline{Acc} (11,20)$	0.629 ± 0.030	0.937 ± 0.001	0.947 ± 0.001	0.945 ± 0.000	0.951 ± 0.001
	$\overline{Acc} (21,30)$	0.316 ± 0.038	0.923 ± 0.001	0.933 ± 0.001	0.926 ± 0.000	0.931 ± 0.002
	$\overline{Acc} (31,40)$	0.185 ± 0.019	0.905 ± 0.001	0.916 ± 0.001	0.903 ± 0.001	0.908 ± 0.002
	$\overline{Acc} (41,50)$	0.140 ± 0.010	0.883 ± 0.000	0.895 ± 0.002	0.876 ± 0.001	0.880 ± 0.002
	All	0.448 ± 0.017	0.919 ± 0.001	0.930 ± 0.001	0.923 ± 0.001	0.928 ± 0.001
Salt-and-pepper noise	Clean images	0.966 ± 0.001	0.962 ± 0.002	0.970 ± 0.002	0.964 ± 0.001	0.973 ± 0.001
	$\overline{Acc} (0, 0.05)$	0.592 ± 0.004	0.962 ± 0.002	0.969 ± 0.003	0.962 ± 0.002	0.972 ± 0.001
	$\overline{Acc} (0.06, 0.1)$	0.229 ± 0.012	0.960 ± 0.002	0.968 ± 0.003	0.959 ± 0.003	0.970 ± 0.002
	$\overline{Acc} (0.11, 0.15)$	0.144 ± 0.004	0.958 ± 0.002	0.966 ± 0.003	0.956 ± 0.004	0.968 ± 0.002
	$\overline{Acc} (0.16, 0.2)$	0.124 ± 0.008	0.956 ± 0.003	0.965 ± 0.004	0.952 ± 0.004	0.966 ± 0.001
	$\overline{Acc} (0.21, 0.25)$	0.118 ± 0.010	0.953 ± 0.003	0.962 ± 0.005	0.947 ± 0.005	0.963 ± 0.002
	All	0.269 ± 0.005	0.958 ± 0.003	0.966 ± 0.003	0.955 ± 0.004	0.968 ± 0.001

Subsequently, the evaluation shifts to CIFAR-10 with added white Gaussian noise, termed AWGN CIFAR-10. The interval mean accuracy of AWGN CIFAR-10 is presented in the third row of Table 3. The degradation level denotes the standard deviation of Gaussian noise, varying from 0 to 50 in increments of 1.0, in which the intensity is the 8-bit base. “Proposed” outperforms “Clean” in classifying clean images. In addition, “Proposed” is superior for high-quality images, whereas “Mixed R.E.” is superior for low-quality images. This tendency is similar to one of Gaussian blurring CIFAR-10.

Finally, the last row of Table 3 shows the interval mean accuracy of CIFAR-10 with added salt-and-pepper noise. The degradation level signifies the density of salt-and-pepper noise, varying from 0 to 0.25 in increments of 0.01. “Proposed” outperforms other existing methods for all interval mean accuracies.

These results show that the proposed method can train a classification network without losing the classification ability of high-quality images, including clean images, for the above three degradations, that is, the proposed method is effective for not only JPEG distortion but also other degradations.

4.4.

Analysis for CIFAR-100

To evaluate the efficacy of the proposed method on other datasets, the proposed method is applied to CIFAR-100.²³ CIFAR-100 has 100 classes and contains 50,000 training images and 10,000 testing images. Regarding a classification network, PyramidNet110-270 with ShakeDrop regularization was used. The training strategy was almost the same as that of CIFAR-10 except for the detailed settings of SGD. The learning rate was set to 0.5 initially and scaled down by multiplying 0.1 at the 150th and 225th epochs. Additionally, a momentum of 0.9 and a weight decay of 0.0001 were applied. Regarding degradations, four types were analyzed: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise. The range of JPEG quality factors and degradation levels was consistent with that of CIFAR-10 for each type.

Table 4 shows the interval mean accuracy of CIFAR-100 degraded by four types of degradation. Only “Proposed” attains the same level of performance as “Clean” in classifying clean images for all degradations. However, “Proposed” underperforms “Mixed” or “Mixed R.E.” for the classification of low-quality images except for salt-and-pepper noise. Regarding salt-and-pepper noise, “Mixed R.E.” and “Proposed” show almost similar performance for every interval mean accuracy.

Table 4

Interval mean accuracy of CIFAR-100 under several degradations using PyramidNet110-270 with Shakedrop regularization. For JPEG distortion, the JPEG quality factor is used instead of a degradation level. Regarding the other degradations, the degradation level denotes a standard deviation of a kernel for Gaussian blur, a standard deviation for AWGN, and a density for salt-and-pepper noise. “All” means the interval mean accuracy for all quality factors or degradation levels, including a clean image. All results are based on a single run. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.

Degradation type	Interval mean accuracy	Existing methods				Proposed
Degradation type	Interval mean accuracy	Clean	Mixed	Mixed R.E.	CutBlur	Proposed
JPEG distortion	$\overline{Acc} (1,20)$	0.143	0.568	0.573	0.537	0.536
	$\overline{Acc} (21,40)$	0.391	0.712	0.718	0.706	0.716
	$\bar{Acc} (41,60)$	0.517	0.733	0.739	0.738	0.752
	$\overline{Acc} (61,80)$	0.610	0.747	0.754	0.762	0.776
	$\overline{Acc} (81,100)$	0.739	0.758	0.769	0.790	0.804
	Clean images	0.838	0.762	0.772	0.814	0.835
	All	0.484	0.704	0.711	0.708	0.718
Gaussian blur	Clean images	0.838	0.721	0.730	0.829	0.838
	$\bar{Acc} (0.1, 1)$	0.677	0.718	0.727	0.813	0.821
	$\bar{Acc} (1.1, 2)$	0.120	0.701	0.704	0.749	0.749
	$\bar{Acc} (2.1, 3)$	0.032	0.671	0.670	0.662	0.661
	$\overline{Acc} (3.1, 4)$	0.019	0.624	0.623	0.581	0.575
	$\overline{Acc} (4.1, 5)$	0.015	0.567	0.573	0.502	0.505
	All	0.186	0.657	0.661	0.665	0.666
AWGN	Clean images	0.838	0.784	0.794	0.824	0.838
	$\overline{Acc} (1,10)$	0.628	0.778	0.790	0.813	0.822
	$\overline{Acc} (11,20)$	0.190	0.760	0.771	0.779	0.788
	$\overline{Acc} (21,30)$	0.057	0.736	0.744	0.741	0.748
	$\overline{Acc} (31,40)$	0.028	0.707	0.716	0.699	0.709
	$\overline{Acc} (41,50)$	0.018	0.672	0.685	0.659	0.663
	All	0.197	0.732	0.742	0.740	0.748
Salt-and-pepper noise	Clean images	0.838	0.830	0.844	0.837	0.847
	$\overline{Acc} (0, 0.05)$	0.230	0.830	0.844	0.836	0.845
	$\overline{Acc} (0.06, 0.1)$	0.039	0.830	0.842	0.835	0.844
	$\overline{Acc} (0.11, 0.15)$	0.021	0.829	0.842	0.834	0.842
	$\overline{Acc} (0.16, 0.2)$	0.015	0.827	0.840	0.832	0.839
	$\overline{Acc} (0.21, 0.25)$	0.012	0.826	0.837	0.828	0.836
	All	0.093	0.828	0.841	0.833	0.841

The results show that the proposed method can classify high-quality images well, including clean images, whereas “Mixed R.E.” is superior for the classification of low-quality images. This tendency is the same as one observed in CIFAR-10 evaluations, that is, the proposed method is also effective for the CIFAR-100 dataset.

4.5.

Analysis for TINY ImageNet

In this section, the efficacy of the proposed method is evaluated using TINY ImageNet.²⁴ This dataset was chosen due to its higher resolution compared with the CIFAR datasets. TINY ImageNet contains 100,000 training images and 10,000 testing images for 200 classes. Each class has 500 images for training and 50 images for testing. TINY ImageNet are RGB images with a resolution of $64 \times 64 pixels$ . For the network architecture, ResNet56⁴^,⁵ was utilized instead of PyramidNet110-270 with ShakeDrop regularization to reduce the training time. The optimization method was SGD. The learning rate was set to 0.1 initially and scaled down by multiplying 0.1 at epochs 60, 120, 160, 200, 240, and 280. Additionally, a momentum of 0.9 and a weight decay of 0.0001 were applied.

Table 5 shows the interval mean accuracy of TINY ImageNet degraded by four types of degradation: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise. Regarding AWGN and salt-and-pepper noise, “Mixed” outperforms “Clean” in the classification of clean images and already attains this paper’s goal. Consequently, “Mixed R.E.” shows almost the best performance for AWGN and salt-and-pepper noise. This superior performance appears to result from the higher resolution of TINY ImageNet in addition to naive pixel-wise degradations. However, “Proposed” outperforms “Mixed R.E.” in classifying clean images and almost outperforms “Mixed.”

Table 5

Interval mean accuracy of TINY ImageNet under several degradations with ResNet56. For JPEG distortion, the JPEG quality factor is used instead of a degradation level. Regarding the other degradations, the degradation level denotes a standard deviation of a kernel for Gaussian blur, a standard deviation for AWGN, and a density for salt-and-pepper noise. “All” means the interval mean accuracy for all quality factors or degradation levels, including a clean image. All results are based on a single run. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.

Degradation type	Interval mean accuracy	Existing methods				Proposed
Degradation type	Interval mean accuracy	Clean	Mixed	Mixed R.E.	CutBlur	Proposed
JPEG distortion	$\overline{Acc} (1,20)$	0.126	0.428	0.444	0.396	0.402
	$\overline{Acc} (21,40)$	0.398	0.541	0.557	0.533	0.541
	$\overline{Acc} (41,60)$	0.464	0.553	0.568	0.547	0.557
	$\overline{Acc} (61,80)$	0.558	0.570	0.587	0.578	0.588
	$\overline{Acc} (81,100)$	0.574	0.571	0.590	0.581	0.595
	Clean images	0.578	0.571	0.589	0.585	0.596
	All	0.426	0.533	0.550	0.528	0.537
Gaussian blur	Clean images	0.578	0.494	0.490	0.572	0.594
	$\overline{Acc} (0.1, 1)$	0.415	0.496	0.492	0.553	0.574
	$\overline{Acc} (1.1, 2)$	0.077	0.493	0.489	0.514	0.526
	$\overline{Acc} (2.1, 3)$	0.034	0.475	0.474	0.466	0.474
	$\overline{Acc} (3.1, 4)$	0.025	0.446	0.449	0.415	0.416
	$\overline{Acc} (4.1, 5)$	0.020	0.407	0.419	0.361	0.365
	All	0.123	0.464	0.465	0.464	0.473
AWGN	Clean images	0.578	0.579	0.594	0.597	0.601
	$\overline{Acc} (1,10)$	0.562	0.578	0.592	0.594	0.597
	$\overline{Acc} (11,20)$	0.446	0.568	0.585	0.574	0.585
	$\overline{Acc} (21,30)$	0.284	0.554	0.569	0.552	0.563
	$\overline{Acc} (31,40)$	0.161	0.531	0.550	0.524	0.535
	$\overline{Acc} (41,50)$	0.085	0.502	0.528	0.490	0.501
	All	0.313	0.547	0.565	0.548	0.558
Salt-and-pepper noise	Clean images	0.578	0.581	0.596	0.583	0.597
	$\overline{Acc} (0, 0.05)$	0.362	0.582	0.597	0.583	0.595
	$\overline{Acc} (0.06, 0.1)$	0.131	0.582	0.596	0.580	0.595
	$\overline{Acc} (0.11, 0.15)$	0.055	0.582	0.596	0.580	0.591
	$\overline{Acc} (0.16, 0.2)$	0.030	0.581	0.595	0.576	0.588
	$\overline{Acc} (0.21, 0.25)$	0.021	0.578	0.593	0.574	0.584
	All	0.137	0.581	0.595	0.579	0.591

Next, regarding JPEG distortion, the accuracy difference between “Clean” and “Mixed” is relatively minor by 0.007 in classifying clean images. Thus “Mixed R.E.” can outperform “Clean” in classifying clean images and is the best in the interval “All.” This results from the image quality being not so degraded as JPEG CIFAR-10 due to TINY ImageNet being higher resolution even if JPEG distortion is applied. However, “Proposed” outperforms “Mixed R.E.” for classifying high-quality images including clean images and almost outperforms “Mixed.”

Finally, focusing on Gaussian blur, only “Proposed” outperforms “Clean” in classifying clean images as well as high-quality images. In addition, “Proposed” is the best in the interval “All.”

As a result, only “Proposed” stably outperforms “Clean” in classifying clean images for all types of degradation. The proposed method is effective for not only CIFAR-100 but also for TINY ImageNet.

5. Conclusions

This paper proposed a data augmentation technique for degraded images with various levels of degradation by applying distinct data augmentations for each clean and degraded image. This paper also showed that the proposed method can effectively train a classification network of degraded images without losing the classification ability of clean images. In addition, experimental results showed that the proposed method creates a more stable performance in classifying high-quality images than mixed training with random erasing and CutBlur. Furthermore, the proposed method’s effectiveness was confirmed for four types of degradation: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise. Finally, the robustness of the proposed method was demonstrated for three datasets, i.e., CIFAR-10, CIFAR-100, and TINY ImageNet, and for four classification CNNs, i.e., VGG16, ResNet50, ResNet56, and PyramidNet110-270 with ShakeDrop regularization.

Although this paper proved the effectiveness of the proposed method, we found an improvement opportunity. The proposed method enhances the classification of high-quality images; however, it reduces the classification ability of low-quality images. A straightforward extension could consider the domain sampling probability as an adjustable parameter. Optimizing the domain sampling probability might reduce the observed trade-off. This trade-off should be further investigated, especially by analyzing the feature discrepancy between clean and degraded images, in the near future.

6. Appendix A: Detailed Analyses for the Proposed Method

In this section, the proposed method is deeply analyzed in terms of the combination of data augmentation, decision probability, and domain sampling probability. All analyses are performed using VGG16 trained with JPEG CIFAR-10.

6.1.

Combination of Data Augmentations

This section provides a numerical validation to ensure the appropriateness of the combined data augmentations implemented in the proposed method. The key point of the proposed method is to apply distinct data augmentations for each clean and degraded image domain. Here three distinct data augmentations are considered for each domain: random erasing, CutBlur, and the absence of any data augmentation. In total, nine possible combinations are examined, as shown in Table 6. The notation “X–Y” in Table 6 denotes that X and Y are applied for each clean and degraded image, where “N,” “R,” and “C” represent the absence of any data augmentation (none), random erasing (R.E.), and CutBlur, respectively.

Table 6

Interval mean accuracy of JPEG CIFAR-10 based on VGG16 with respect to nine possible combinations of random erasing (R.E.), CutBlur, and the absence of data augmentation (none). Erasing and decision probabilities were set to 1.0. Moreover, domain sampling probabilities were 12 for both clean and degraded image domains. R–C is averaged over three runs, and the others are based on a single run. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all combinations.

Notation	N–N	R–N	C–N	N–R	N–C	R–R	C–C	C–R	R–C
Clean images	None	R.E.	CutBlur	None	None	R.E.	CutBlur	CutBlur	R.E.
Degraded images	None	None	None	R.E.	CutBlur	R.E.	CutBlur	R.E.	CutBlur
$\overline{Acc} (1,20)$	0.761	0.756	0.757	0.759	0.754	0.746	0.754	0.758	0.752 ± 0.000
$\overline{Acc} (21,40)$	0.856	0.860	0.855	0.864	0.866	0.861	0.864	0.864	0.865 ± 0.001
$\overline{Acc} (41,60)$	0.867	0.875	0.868	0.879	0.880	0.878	0.880	0.878	0.884 ± 0.002
$\overline{Acc} (61,80)$	0.875	0.888	0.878	0.889	0.889	0.889	0.889	0.885	0.896 ± 0.001
$\overline{Acc} (81,100)$	0.881	0.901	0.885	0.899	0.899	0.902	0.899	0.894	0.911 ± 0.001
Clean images	0.886	0.909	0.890	0.905	0.905	0.910	0.905	0.898	0.922 ± 0.001
All	0.848	0.857	0.849	0.859	0.858	0.856	0.858	0.856	0.862 ± 0.001

Regarding a clean image domain, R–N almost outperforms C–N and shows better performance for high-quality images. By contrast, for a degraded image domain, the differences between N–R and N–C are relatively minor. Consequently, two combinations of data augmentations emerge as candidates: R–R and R–C. Comparing R–R and R–C, R–C outperforms R–R except for $\overline{Acc} (1,20)$ , as seen in Table 6. In addition, R–C shows the best performance for the classification of clean images and almost outperforms other combinations except for $\overline{Acc} (1,20)$ . These results indicate that the proposed method R–C is a rational combination of data augmentations.

Furthermore, the following analysis discusses the impact of random erasing and CutBlur on the proposed method R–C. First, random erasing is discussed. Focusing on $\overline{Acc} (1,20)$ , R–R notably underperforms both R–N and N–R by a margin of over 0.1, as shown in Table 6. Comparing R–N and N–R, R–N is slightly more accurate than N–R in classifying clean images. As random erasing reinforces extracting image features for a classification network, these results suggest that random erasing may amplify the differences between features of clean and degraded images. Therefore, random erasing should be selectively applied to one image domain. In the proposed method, random erasing enhances the classification ability of high-quality images, including clean images.

Next, CutBlur is discussed. Comparing N–N with C–N, their performance is nearly identical. This indicates that applying CutBlur to clean images does not significantly improve the results. However, N–C outperforms N–N by 0.1 in the interval “All” and is almost the same performance as C–C for all interval mean accuracies. This implies that CutBlur is not effective for a clean image domain and should be applied to a degraded image domain. Within the context of the proposed method, CutBlur boosts classification across all degradation levels, excluding extremely low-quality images. Moreover, CutBlur might enhance the classification of clean images because degraded images processed with CutBlur retain a region of clean images.

6.2.

Decision Probability

In Sec. 4, a decision probability of 1.0 was used. This section confirms how the proposed method performs by changing a decision probability. Table 7 shows the interval mean accuracy when the decision probability is varied from 0 to 1.0 in increments of 0.2. The classification performance for high-quality images improves as the decision probability increases. Regarding the classification of clean images, a decision probability of 1.0 outperforms all other probabilities. Therefore, a decision probability of 1.0 seems to be the most reasonable choice.

Table 7

Interval mean accuracy for VGG16 trained on JPEG CIFAR-10, with the decision probability varying from 0 to 1.0 in increments of 0.2. The case of 1.0 is averaged over three runs, and the others are based on a single run. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all six decision probabilities.

Decision probability	0.0	0.2	0.4	0.6	0.8	1.0
$\overline{Acc} (1,20)$	0.761	0.758	0.761	0.757	0.753	0.752 ± 0.000
$\overline{Acc} (21,40)$	0.856	0.861	0.865	0.868	0.863	0.865 ± 0.001
$\overline{Acc} (41,60)$	0.867	0.878	0.882	0.886	0.881	0.884 ± 0.002
$\overline{Acc} (61,80)$	0.875	0.888	0.891	0.896	0.892	0.896 ± 0.001
$\overline{Acc} (81,100)$	0.881	0.897	0.904	0.908	0.908	0.911 ± 0.001
Clean images	0.886	0.902	0.914	0.914	0.919	0.922 ± 0.001
All	0.848	0.857	0.861	0.863	0.860	0.862 ± 0.001

6.3.

Domain Sampling Probability

The proposed method always uses a domain sampling probability of $\frac{1}{2} (= 0.5)$ . Here we numerically validate the impact of a domain sampling probability. Table 8 shows the interval mean accuracy when the domain sampling probability of degraded images is varied from 0.1 to 0.9 in increments of 0.1. From the point of this paper’s goal, the proposed method needs to be as close to both “Clean” in the classification of clean images and “Mixed” in the classification of low-quality images as possible. As seen in Table 2, “Clean” and “Mixed” show an accuracy of 0.928 for the classification of a clean image and $\overline{Acc} (1,20)$ of 0.752, respectively. Compared with these two values, probabilities around 0.5 and 0.6 are good choices, as seen in Table 8, that is, using a domain sampling probability of 0.5 seems plausible.

Table 8

Interval mean accuracy for VGG16 trained on JPEG CIFAR-10, with the domain sampling probability of degraded images varying from 0.1 to 0.9 in increments of 0.1. The case of 0.5 is averaged over three runs, and the others are based on a single run. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all nine domain sampling probabilities.

Domain sampling probability	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
$\overline{Acc} (1,20)$	0.700	0.724	0.736	0.743	0.752 ± 0.000	0.750	0.756	0.758	0.763
$\overline{Acc} (21,40)$	0.841	0.853	0.856	0.860	0.865 ± 0.001	0.864	0.867	0.868	0.861
$\overline{Acc} (41,60)$	0.866	0.875	0.876	0.880	0.884 ± 0.002	0.881	0.882	0.885	0.877
$\overline{Acc} (61,80)$	0.886	0.890	0.891	0.892	0.896 ± 0.001	0.895	0.893	0.894	0.887
$\overline{Acc} (81,100)$	0.910	0.911	0.913	0.910	0.911 ± 0.001	0.909	0.906	0.904	0.898
Clean images	0.930	0.926	0.928	0.922	0.922 ± 0.001	0.921	0.914	0.913	0.902
All	0.842	0.851	0.855	0.858	0.862 ± 0.001	0.860	0.861	0.862	0.858

7. Appendix B: Accuracy of JPEG CIFAR-10

The accuracy of JPEG CIFAR-10 is presented for each JPEG quality factor using three networks: VGG16, ResNet50, and PyramidNet with ShakeDrop regularization. Figure 4 shows the classification accuracy of JPEG CIFAR-10 for each degradation level, where the accuracy of clean images is plotted next to the JPEG quality factor of 100. For all networks, “Proposed” outperforms “Mixed” over the JPEG quality factor of 20. Moreover, “Proposed” approaches to “Clean” as the JPEG quality factors increase. These observations are consistent with the analysis using the interval mean accuracy.

Fig. 4

Accuracy of JPEG CIFAR-10 for (a) VGG16, (b) ResNet50, and (c) PyramidNet110-270 with ShakeDrop regularization. The accuracy of clean images is plotted next to the JPEG quality factor of 100. All values are averaged over three runs. The accuracy for a JPEG quality factor of over 90 is zoomed in the right graphs.

Code and Data Availability

The code used to generate the results is available in a GitHub repository ( https://github.com/kendo-al/da-degradedimg_pytorch).

Acknowledgments

This work was supported by JSPS KAKENHI (Grant No. JP22K21311).

References

1.

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Int. Conf. Learn. Represent., (2015). Google Scholar

2.

J. Springenberg et al., “Striving simplicity: the all convolutional net,” in Int. Conf. Learn. Represent. Workshop Track, (2015). Google Scholar

3.

J. Tompson et al., “Efficient object localization using convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 648 –656 (2015). https://doi.org/10.1109/CVPR.2015.7298664 Google Scholar

4.

K. He et al., “Deep residual learning for image recognition,” in IEEE Conf. Comput. Vision and Pattern Recognit., 770 –778 (2016). https://doi.org/10.1109/CVPR.2016.90 Google Scholar

5.

K. He et al., “Identity mappings in deep residual networks,” in Eur. Conf. Comput. Vision, 630 –645 (2016). https://doi.org/10.1007/978-3-319-46493-0_38 Google Scholar

6.

D. Han, J. Kim, J. Kim, “Deep pyramidal residual networks,” in IEEE Conf. Comput. Vision and Pattern Recognit., 5927 –5935 (2017). https://doi.org/10.1109/CVPR.2017.668 Google Scholar

7.

Y. Yamada et al., “ShakeDrop regularization for deep residual learning,” IEEE Access, 7 186126 –186136 https://doi.org/10.1109/ACCESS.2019.2960566 (2019). Google Scholar

8.

K. Endo, M. Tanaka and M. Okutomi, “CNN-based classification of degraded images,” in Proc. IS&T Int. Symp. Electron. Imaging, (2020). https://doi.org/10.2352/ISSN.2470-1173.2020.10.IPAS-028 Google Scholar

9.

Y. Pei et al., “Effects of image degradation and degradation removal to CNN-based image classification,” IEEE Trans. Pattern Anal. Mach. Intell., 43 (4), 1239 –1253 https://doi.org/10.1109/TPAMI.2019.2950923 ITPIDJ 0162-8828 (2021). Google Scholar

10.

X. Peng et al., “Fine-to-coarse knowledge transfer for low-res image classification,” in IEEE Int. Conf. Image Process., (2016). https://doi.org/10.1109/ICIP.2016.7533047 Google Scholar

11.

N. Das et al., “Keeping the bad guys out: protecting and vaccinating deep learning with JPEG compression,” (2017). https://arxiv.org/abs/1705.02900 Google Scholar

12.

D. Cai et al., “Convolutional low-resolution fine-grained classification,” Parttern Recognit. Lett., 119 166 –171 https://doi.org/10.1016/j.patrec.2017.10.020 (2019). Google Scholar

13.

Y. Pei et al., “Effects of image degradations to CNN-based image classification,” (2018). https://arxiv.org/abs/1810.05552 Google Scholar

14.

K. Endo, M. Tanaka and M. Okutomi, “Classifying degraded images over various levels of degradation,” in IEEE Int. Conf. Image Process., (2020). https://doi.org/10.1109/ICIP40778.2020.9191087 Google Scholar

15.

K. Endo, M. Tanaka and M. Okutomi, “CNN-based classification of degraded images with awareness of degradation levels,” IEEE Trans. Circuits Syst. Video Technol., 31 4046 –4057 https://doi.org/10.1109/TCSVT.2020.3045659 ITCTEM 1051-8215 (2021). Google Scholar

16.

Y. Pei, Y. Huang and X. Zhang, “Consistency guided network for degraded image classification,” IEEE Trans. Circuits Syst. Video Technol., 31 2231 –2246 https://doi.org/10.1109/TCSVT.2020.3016863 ITCTEM 1051-8215 (2021). Google Scholar

17.

S. Wan et al., “Feature consistency training with JPEG compressed images,” IEEE Trans. Circuits Syst. Video Technol., 30 4769 –4780 https://doi.org/10.1109/TCSVT.2019.2959815 ITCTEM 1051-8215 (2020). Google Scholar

18.

K. Endo, M. Tanaka and M. Okutomi, “CNN-based classification of degraded images without sacrificing clean images,” IEEE Access, 9 116094 –116104 https://doi.org/10.1109/ACCESS.2021.3105957 (2021). Google Scholar

19.

D. Daultani et al., “ILIAC: efficient classification of degraded images using knowledge distillation with cutout data augmentation,” Electron. Imaging, 35 (9), 296-1 –296-6 https://doi.org/10.2352/EI.2023.35.9.IPAS-296 ELIMEX (2023). Google Scholar

20.

K. Endo, M. Tanaka and M. Okutomi, “Semantic segmentation of degraded images using layer-wise feature adjustor,” in IEEE/CVF Winter Conf. Appl. Comput. Vision (WACV), 3204 –3212 (2023). https://doi.org/10.1109/WACV56688.2023.00322 Google Scholar

21.

Z. Zhong et al., “Random erasing data augmentation,” in Proc. AAAI Conf. Artif. Intell. (AAAI), (2020). https://doi.org/10.1609/aaai.v34i07.7000 Google Scholar

22.

J. Yoo, N. Ahn and K. Sohn, “Rethinking data augmentation for image super-resolution: a comprehensive analysis and a new strategy,” in IEEE/CVF Conf. Comput. Vision and Pattern Recognit., 8372 –8381 (2020). https://doi.org/10.1109/CVPR42600.2020.00840 Google Scholar

23.

A. Krizhevsky, “Learning multiple layers of features from tiny images,” (2009). Google Scholar

24.

Y. Le and X. Yang, “Tiny imagenet visual recognition challenge,” (2015). http://vision.stanford.edu/teaching/cs231n/reports/2015/pdfs/yle_project.pdf Google Scholar

25.

H. Naveed et al., “Survey: image mixing and deleting for data augmentation,” (2021). https://arxiv.org/abs/2106.07085 Google Scholar

26.

T. DeVries and G. W. Taylor, “Improved regularization of convolutional neural networks with cutout,” (2017). https://arxiv.org/abs/1708.04552 Google Scholar

27.

S. Yun et al., “CutMix: regularization strategy to train strong classifiers with localizable features,” in IEEE/CVF Int. Conf. Comput. Vision, 6022 –6031 (2019). https://doi.org/10.1109/ICCV.2019.00612 Google Scholar

28.

D. Hendrycks et al., “AugMix: a simple data processing method to improve robustness and uncertainty,” in Proc. Int. Conf. Learn. Represent. (ICLR), (2020). Google Scholar

29.

D. Guo et al., “Degraded image semantic segmentation with dense-Gram networks,” IEEE Trans. Image Process., 29 782 –795 https://doi.org/10.1109/TIP.2019.2936111 IIPRE4 1057-7149 (2020). Google Scholar

30.

L. Liu et al., “On the variance of the adaptive learning rate and beyond,” in Int. Conf. Learn. Represent., (2020). Google Scholar

Biography

Kazuki Endo is an associate professor at Teikyo Heisei University. He received his bachelor’s degree in mathematics, his master’s degree in industrial engineering and management, and his DEng degree in systems and control engineering from Tokyo Institute of Technology in 1997, 1999, and 2022, respectively. He joined Industrial Bank of Japan, Ltd. (current Mizuho bank) in 1999. Since 2022, he has been an associate professor in the Department of Business at Teikyo Heisei University.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.

Citation Download Citation

Kazuki Endo "Data augmentation technique for degraded images without losing the classification ability of clean images," Journal of Electronic Imaging 33(2), 023009 (7 March 2024). https://doi.org/10.1117/1.JEI.33.2.023009

Received: 29 June 2023; Accepted: 5 February 2024; Published: 7 March 2024

Access the abstract

JOURNAL ARTICLE
17 PAGES

DOWNLOAD PAPER SAVE TO MY LIBRARY

GET CITATION

KEYWORDS

Image classification

Education and training

Distortion

Image quality

Network architectures

Image resolution

Image compression

1.

Introduction