Classification networks for degraded images must handle various strengths of degradation, referred to as degradation levels, in practical applications. However, there has been limited exploration of data augmentation techniques for degraded images with various degradation levels. We propose a data augmentation technique that applies distinct data augmentations to the clean and degraded image domains. Specifically, the proposed method uses random erasing and CutBlur for clean and degraded images, respectively. Experimental results show that the proposed method can effectively train a classification network for degraded images without losing the classification ability on clean images. Furthermore, the results confirm the proposed method's efficacy across various degradations, multiple network architectures, and several datasets.
1. Introduction

Image recognition has seen remarkable progress through the use of deep convolutional neural networks (CNNs).1–7 Typically, these CNNs are trained with only clean images and are designed to take clean images as input. However, in real-world applications, such as autonomous driving, the input images often contain various degradations, such as noise, blur, and compression. Prior studies8,9 have pointed out that CNNs trained with only clean images cannot recognize degraded images well. Therefore, recognizing degraded images is a more critical and realistic challenge than recognizing only clean images. This paper focuses on the CNN-based classification of degraded images because classification is a typical image recognition task. Recently, the classification of degraded images has become an increasingly researched topic.9–19 A straightforward approach is to train a classification network using degraded images. Notably, even if these images include a single type of degradation, that degradation usually occurs at various strengths. Therefore, the classification network should be trained over the various strengths of degradation. This paper uses a degradation level as a parameter representing the strength of degradation. For instance, the degradation levels are noise levels for additive white Gaussian noise (AWGN). This paper assumes that the original clean image of a degraded image can be acquired. Consequently, degraded images are assumed to be synthesized from clean images without any degradations by applying a degradation operator while changing degradation levels. Note that synthesizing degraded images can be regarded as a kind of data augmentation. To the best of the author's knowledge, there is limited literature on data augmentation methods for degraded images over various levels of degradation.
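As a concrete illustration of synthesizing degraded images from clean images, the following is a minimal NumPy sketch of two degradation operators parameterized by a degradation level (the function names are ours; JPEG distortion would additionally require an encoder such as PIL's, which is omitted here):

```python
import numpy as np

def awgn(img, sigma, rng):
    """Additive white Gaussian noise; the degradation level is the
    noise standard deviation sigma on the 8-bit intensity scale."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def salt_and_pepper(img, density, rng):
    """Salt-and-pepper noise; the degradation level is the density of
    corrupted pixels in [0, 1]."""
    out = img.copy()
    corrupted = rng.random(img.shape[:2]) < density  # which pixels to corrupt
    salt = rng.random(img.shape[:2]) < 0.5           # salt (255) vs. pepper (0)
    out[corrupted & salt] = 255
    out[corrupted & ~salt] = 0
    return out
```

Sweeping the level argument over a range of values yields the family of degraded datasets discussed below.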
One such method is mixed training,10 which is a straightforward approach for augmenting degraded images with various levels of degradation. Mixed training involves the following four steps. (1) A clean image is randomly sampled from a training dataset. (2) A degradation level is randomly sampled from a uniform distribution. (3) A degraded image is acquired from the clean image by applying a degradation operator with the sampled degradation level. (4) A classification network is trained using the degraded image. However, a classification network trained by mixed training loses the classification ability on clean images compared with a network trained with only clean images8 because mixed training trains a network on average over various levels of degradation, as illustrated in Fig. 1. To overcome this drawback, Endo et al.18,20 introduced a network structure termed the feature adjustor. This paper proposes a data augmentation technique that overcomes this drawback without relying on special network structures, in which degraded images are assumed to have a single known degradation with unknown levels of degradation. Therefore, our goal is to construct a data augmentation technique for degraded images that can train a classification network of degraded images without losing the classification ability on clean images, as depicted in Fig. 1. Figure 2 illustrates several data augmentations of degraded images. Mixed training, as shown in Fig. 2(a), has already been described. Figure 2(b) shows mixed training with random erasing,21 which first generates degraded images by mixed training and then applies random erasing to them. CutBlur22 is presented in Fig. 2(c). CutBlur generates a degraded image with a clean region or a clean image with a degraded region. These regions are highlighted as rectangles with white edges in Fig. 2(c).
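The four steps of mixed training above can be sketched as follows (a hedged NumPy sketch; `dataset` as a list of (image, label) pairs and `degrade` as the degradation operator are our illustrative assumptions):

```python
import numpy as np

def mixed_training_sample(dataset, degrade, levels, rng):
    """One training sample under mixed training."""
    img, label = dataset[rng.integers(len(dataset))]  # (1) sample a clean image
    level = levels[rng.integers(len(levels))]         # (2) sample a degradation level uniformly
    degraded = degrade(img, level)                    # (3) apply the degradation operator
    return degraded, label                            # (4) train the network on this pair
```

Including the clean case in `levels` (e.g., level 0 mapped to the identity) reproduces the variant of mixed training used as a baseline in the experiments.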
Generally, the same data augmentation methods are applied to both clean and degraded images during the training of classification networks. However, this might not be the best strategy because clean images belong to a distinct domain from degraded images. A more intuitive approach would be to apply augmentation methods appropriate to each image's domain. Based on this idea, this paper proposes a data augmentation technique that applies different operations to clean and degraded images. Specifically, as illustrated in Fig. 2(d), the proposed method combines random erasing for clean images with CutBlur for degraded images. Applying a different data augmentation to each of the clean and degraded image domains enhances a classification network of degraded images without losing the classification ability on clean images. This paper's contributions are as follows.
The remainder of this paper is organized as follows. Section 2 describes related works. Section 3 explains the proposed method. Then experiments are described in Sec. 4. Finally, conclusions are given in Sec. 5.

2. Related Works

2.1. Data Augmentations of Degraded Images

There are few papers on data augmentations of degraded images with various degradation levels. Peng et al.10 investigated the fine-grained classification of low-resolution images. They proposed staged training, in which a classifier is trained with high-resolution images before being trained with low-resolution images. Their aim was to transfer the knowledge of high-resolution images to the classifier of low-resolution images rather than to provide data augmentation. In their experiments, they used mixed training, which involves randomly sampling both low- and high-resolution images to train a network. Their results showed that mixed training is superior to staged training for the classification of high-resolution images. In this paper, mixed training is used as the baseline data augmentation for degraded images with various levels of degradation. Meanwhile, Yoo et al.22 introduced the data augmentation technique CutBlur for single-image super-resolution. CutBlur replaces a region of either a high-resolution or a low-resolution image with the corresponding region of its paired image, assuming a pair of high- and low-resolution images exists. This paper applies CutBlur to the classification of degraded images with various levels of degradation. Specifically, in the proposed method, CutBlur is applied to only degraded images.

2.2. Data Augmentations of Image Mixing and Deleting

There are many data augmentations based on image mixing and deleting, as surveyed by Naveed et al.25 DeVries et al.26 introduced Cutout, a data augmentation technique that deletes a fixed-size square region from images.
Cutout always deletes a square region from images but randomly selects the position of the region. Zhong et al.21 proposed a data augmentation technique called random erasing, which randomly deletes a rectangular region inside images. Notably, the size of the rectangular region is randomly determined. Moreover, random erasing allows the rectangular region to be replaced with various colors or random noise. Yun et al.27 proposed CutMix data augmentation, which replaces a rectangular region of an image with a region of another image. Cutout, random erasing, and CutMix do not consider the presence of image degradation. By contrast, the proposed method takes degradation into account and changes the data augmentation method based on the presence or absence of degradation in a training image. Specifically, the proposed method applies random erasing only to clean images without any degradation. Hendrycks et al.28 introduced AUGMIX to boost robustness against a domain gap between training and testing images. First, AUGMIX performs multiple operations on an image independently and acquires corresponding images. Then those images are merged into a single image. Notably, AUGMIX does not incorporate into its augmentations the degradations that testing images contain. This paper uses the same degradation operators for the training and testing image domains. Thus the problem setting of this paper differs from that of AUGMIX.

2.3. Classification of Degraded Images

Three primary approaches exist for the classification of degraded images: a straightforward approach, a restoration approach, and a knowledge distillation approach. The straightforward approach10,11 trains an image classification network with degraded images directly. By contrast, the restoration approach9,12–15 uses a sequential network composed of a restoration network and a classification network, in which the classification network is trained with clean images without any degradation.
First, degraded images are restored by the restoration network. Next, restored images are input into the classification network trained with clean images. The classification network may be fine-tuned with restored images. The knowledge distillation approach for degraded images16–20,29 transfers the knowledge of a teacher network into a student network. Typically, the teacher network is a classification network trained with only clean images. The student network is trained with degraded images to match the image features or the predicted distribution of the teacher network, which takes clean images as input. This paper focuses on the straightforward approach, which offers a clearer view of the impact of data augmentations than the other approaches.

3. Proposed Method

Although clean and degraded images belong to different domains, the same data augmentation methods are usually performed on them. However, it would be more reasonable to apply distinct data augmentations to each domain. Based on this concept, this paper proposes a data augmentation technique that applies distinct transformations to clean and degraded images, as illustrated in Fig. 3. Initially, mixed training10 is employed to sample degradation levels using a discrete uniform distribution, in which a clean image is not sampled. With the determined degradation level, a degraded image is synthesized by applying a degradation operator to a clean image. Subsequently, either a clean or degraded image is randomly sampled, as shown in Fig. 3. If a clean image is selected, random erasing21 is applied to the clean image with a given decision probability; random erasing erases a region inside the clean image. Although the original random erasing can replace the erased region with random noise or various colors, the proposed method always fills the region with black.
Conversely, when a degraded image is selected, CutBlur22 is applied to the degraded image with the same decision probability; CutBlur replaces a region inside the degraded image with the corresponding region of its clean counterpart. The detailed algorithm of the proposed method is shown in Algorithm 1.

Algorithm 1 Detailed algorithm of the proposed method. Random[a,b] and Uniform[a,b] denote discrete and continuous uniform random number generators on [a,b], respectively. ⌊z⌋ denotes the greatest integer not exceeding z.
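As a rough illustration of the per-sample procedure in Algorithm 1, consider the following NumPy sketch. It is not a faithful reimplementation: the rectangle-size sampling is simplified relative to the paper's bounds, and `degrade`, `levels`, and the uniform size range are our assumptions.

```python
import numpy as np

def proposed_augment(clean, degrade, levels, p=1.0, rng=None):
    """Sketch of the proposed augmentation: random erasing for the clean
    domain, CutBlur for the degraded domain, each with decision probability p."""
    if rng is None:
        rng = np.random.default_rng()
    level = levels[rng.integers(len(levels))]        # sample a degradation level
    degraded = degrade(clean, level)                 # synthesize the degraded image
    h, w = clean.shape[:2]
    # sample a random rectangle (size range simplified relative to the paper)
    rh, rw = rng.integers(1, h + 1), rng.integers(1, w + 1)
    top, left = rng.integers(h - rh + 1), rng.integers(w - rw + 1)
    if rng.random() < 0.5:                           # clean image domain
        out = clean.copy()
        if rng.random() < p:                         # random erasing: fill with black
            out[top:top + rh, left:left + rw] = 0
    else:                                            # degraded image domain
        out = degraded.copy()
        if rng.random() < p:                         # CutBlur: paste the clean region back
            out[top:top + rh, left:left + rw] = clean[top:top + rh, left:left + rw]
    return out
```

The domain choice is an even coin flip here, matching the 1/2 domain sampling probability used in the experiments.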
4. Experiments

This section validates the efficacy of the proposed method. Table 1 compares four existing methods with the proposed method in terms of domain sampling probabilities and data augmentations, where the domain sampling probability is the probability of sampling each of the clean and degraded image domains. "Clean" signifies a naive training method using only clean images without any degradations. On the other hand, "Mixed" denotes mixed training, which randomly samples degradation levels, including a clean image, and applies degradation. For "Mixed," the sampling probability of each degradation level is 1/(N+1), where N denotes the number of degradation levels, excluding a clean image. "Clean" and "Mixed" are baselines for clean images and degraded images, respectively. Then "Mixed R.E." denotes mixed training combined with random erasing data augmentation. After generating degraded images using mixed training, "Mixed R.E." applies random erasing. Because it is based on mixed training, the domain sampling probability of "Mixed R.E." is the same as that of "Mixed." For a fair comparison with the proposed method, the region of an image was always erased and filled with black. Although standard random erasing randomly selects a height and an aspect ratio, "Mixed R.E." randomly samples a height and a width in the same way as Algorithm 1. Furthermore, "CutBlur" denotes the data augmentation method of the same name. "CutBlur" has a domain sampling probability of 1/2 for each of the clean and degraded image domains. For a fair comparison with the proposed method, the decision probability of "CutBlur" was always set to 1.0. This implies that clean and degraded images were always mixed. The implementation of "CutBlur" mirrored the CutBlur part of Algorithm 1. Finally, "Proposed" stands for the proposed method. The decision probability of "Proposed" was set to 1.0, as discussed in Appendix 6.2.
The two parameters used by "Mixed R.E.," "CutBlur," and "Proposed" were fixed at 0.125 and 0.75, respectively. All experiments were executed using Python 3.7.7, PyTorch 1.9.0, CUDA 11.7, and PIL 9.3 on an NVIDIA RTX A6000 GPU and an Intel Core i9-10940X CPU clocked at 3.3 GHz. Other detailed experiments are described in Appendix A.

Table 1 Comparison of existing and proposed methods in terms of domain sampling probabilities and data augmentations. "Mixed" and "Mixed R.E." denote mixed training10 and mixed training with random erasing (R.E.),21 respectively. N represents the number of degradation levels used in the experiments, excluding a clean image.
4.1. Training and Evaluation Procedures

The training procedure for the experiments is as follows. First, a horizontal flip is randomly applied to a clean image sampled from a dataset. Then the clean image is transformed according to each training condition, as seen in Table 1. Subsequently, random cropping yields an augmented image. Finally, a classification network is trained with the augmented image while minimizing the expectation of the cross-entropy loss between estimated and true labels. The expectation is replaced by the sample mean over a minibatch in training. The details of the optimizer settings are explained in the subsequent analyses because they depend on the structure of the classification networks. The evaluation procedure is as follows. First, degraded testing images are synthesized from clean testing images by applying a degradation operator at every degradation level. Then a trained classification network infers class labels for both clean and degraded images. Finally, two evaluation metrics are calculated: accuracy and interval mean accuracy.8 The accuracy, denoted by Acc, is the number of correct predictions divided by the total number of testing images. The interval mean accuracy is defined as

$$\mathrm{Acc}_{[i,j]} = \frac{1}{j - i + 1} \sum_{k=i}^{j} \mathrm{Acc}\left(f_{\theta}\left(D_{\ell_k}(x)\right), y\right),$$

where $f_{\theta}$ denotes a classification network with parameter $\theta$, $D_{\ell}$ denotes a degradation operator, such as JPEG distortion or Gaussian blur, at degradation level $\ell$, and $x$, $y$, and $\ell_k$ represent a clean image, the associated true label of the clean image, and the $k$'th degradation level, respectively. $i$ and $j$ are integers satisfying $i \le j$. The rationale behind using the interval mean accuracy is as follows. The Acc calculated for each degradation level may fluctuate due to the variance in a classification network's predictions. This implies that the Acc does not necessarily change smoothly with the degradation level, even if degradation levels are consecutive.
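The interval mean accuracy defined above can be computed as in the following sketch (the array-based `model` and `degrade` interfaces are illustrative assumptions):

```python
import numpy as np

def interval_mean_accuracy(model, degrade, levels, images, labels, i, j):
    """Average the per-level accuracy Acc over degradation levels i..j (inclusive)."""
    accs = []
    for level in levels[i:j + 1]:
        preds = model(degrade(images, level))   # classify images degraded at this level
        accs.append(np.mean(preds == labels))   # Acc at one degradation level
    return float(np.mean(accs))
```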
Therefore, the interval mean accuracy, which averages accuracies over a range of degradation levels, provides a more coherent and effective metric than analyzing individual Acc values. The interval mean accuracy simplifies the process of understanding classification performance over various levels of degradation.

4.2. Analysis for JPEG CIFAR-10

Now, experimental comparisons are performed to confirm the proposed method's effectiveness. Furthermore, we also demonstrate that the proposed method does not depend on a specific network architecture by evaluating three classification CNNs: VGG16,1 ResNet50,4,5 and PyramidNet110-2706 with ShakeDrop regularization.7 These CNNs differ in architecture and number of parameters. In these experiments, the following dataset, degradation, and optimizers were used. Dataset. The CIFAR-10 dataset,23 comprising 60,000 RGB images with a resolution of 32 × 32 pixels, was used. This dataset is split into 50,000 training images and 10,000 testing images across 10 classes. Degradation. JPEG distortion was chosen because JPEG compression is the de facto standard of image compression. In all experiments for JPEG distortion, JPEG quality factors, ranging from 1 to 100, were used instead of degradation levels. For clarity, we refer to the CIFAR-10 dataset with JPEG distortion applied as "JPEG CIFAR-10." Optimizers. For VGG16 and ResNet50, the RAdam30 optimizer was used with an initial learning rate of 0.001 and a weight decay of 0.0001. On the other hand, PyramidNet110-270 with ShakeDrop regularization was trained using stochastic gradient descent (SGD), consistent with the approach reported by Yamada et al.7 The learning rate was set to 0.1 initially and scaled by a factor of 0.1 at the 75th and 150th epochs. Additionally, a momentum of 0.9 and a weight decay of 0.0001 were applied. Table 2 shows the interval mean accuracy of JPEG CIFAR-10 for VGG16, ResNet50, and PyramidNet110-270 with ShakeDrop regularization.
Regarding the accuracy of JPEG CIFAR-10, the results are shown in Appendix B. First, focusing on the results of VGG16, the proposed method demonstrates that it can classify degraded images without losing the classification ability on clean images. Comparing "Clean" and "Mixed," "Clean" shows higher accuracy than "Mixed" for clean images and the highest-quality interval but is worse for the other interval mean accuracies. In particular, the performance of "Mixed" drops significantly, by 0.052, in classifying clean images. In the case of "Mixed R.E.," it significantly underperforms "Clean" by 0.040 in the classification of clean images, although it outperforms "Mixed." Regarding "CutBlur," it almost always outperforms "Mixed" but still underperforms "Clean" by 0.026 in the classification of clean images. When comparing "Proposed" with "Clean," "Proposed" exhibits a difference of only 0.006, indicating good classification performance on clean images. Moreover, "Proposed" almost always outperforms the three existing methods, i.e., "Mixed," "Mixed R.E.," and "CutBlur," except for one interval. These results show that a classification CNN trained by the proposed method can classify degraded images without losing the classification ability on clean images.

Table 2 Interval mean accuracy of JPEG CIFAR-10 with three networks: VGG16, ResNet50, and PyramidNet110-270 with ShakeDrop regularization. The JPEG quality factor is used instead of a degradation level. "All" means the interval mean accuracy over all quality factors, including a clean image. All results are averaged over three runs. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.
Subsequently, we confirm that the proposed method is effective not only for VGG16 but also for other CNNs. Regarding ResNet50, the interval mean accuracy of JPEG CIFAR-10 is shown in the middle rows of Table 2. Only "Proposed" achieves accuracy almost equivalent to "Clean" in classifying clean images. However, "Proposed" underperforms "Mixed" in one interval. Furthermore, the interval mean accuracy for PyramidNet110-270 with ShakeDrop regularization is shown at the bottom of Table 2. Comparing "Proposed" and "Mixed," "Proposed" outperforms "Mixed" except for one interval. This tendency is similar to that of ResNet50. However, the relationship between "Proposed" and "Mixed R.E." differs slightly from that for VGG16 and ResNet50. Specifically, "Proposed" performs better than "Mixed R.E." for clean images and the higher-quality intervals but worse for the other interval mean accuracies, as seen in Table 2. In other words, "Proposed" performs well for high-quality images, and "Mixed R.E." performs well for low-quality images. Notably, only "Proposed" outperforms "Clean" in classifying clean images. As a result, the proposed method is also effective for ResNet50 and PyramidNet110-270 with ShakeDrop regularization. This indicates that the proposed method does not depend on a specific network architecture.

4.3. Application to Other Degradations

The proposed method is evaluated for other degradations with the CIFAR-10 dataset: Gaussian blur, AWGN, and salt-and-pepper noise. In these evaluations, PyramidNet110-270 with ShakeDrop regularization was used because it showed the best performance in classifying JPEG CIFAR-10. The second row of Table 3 shows the interval mean accuracy of Gaussian blurred CIFAR-10. The degradation level denotes the standard deviation of a Gaussian blur kernel, varying from 0 to 5 in increments of 0.1.
In classifying clean images, "Proposed" outperforms "Clean" and "CutBlur." Moreover, "Proposed" is superior for high-quality images, whereas "Mixed R.E." is superior for low-quality images. This tendency is almost the same as that of JPEG CIFAR-10.

Table 3 Interval mean accuracy of CIFAR-10 under several degradations using PyramidNet110-270 with ShakeDrop regularization. The degradation level denotes the standard deviation of the kernel for Gaussian blur, the standard deviation for AWGN, and the density for salt-and-pepper noise. "All" means the interval mean accuracy over all degradation levels, including a clean image. All results are averaged over three runs. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.
Subsequently, the evaluation shifts to CIFAR-10 with added white Gaussian noise, termed AWGN CIFAR-10. The interval mean accuracy of AWGN CIFAR-10 is presented in the third row of Table 3. The degradation level denotes the standard deviation of the Gaussian noise, varying from 0 to 50 in increments of 1.0, where intensity is on an 8-bit scale. "Proposed" outperforms "Clean" in classifying clean images. In addition, "Proposed" is superior for high-quality images, whereas "Mixed R.E." is superior for low-quality images. This tendency is similar to that of Gaussian blurred CIFAR-10. Finally, the last row of Table 3 shows the interval mean accuracy of CIFAR-10 with added salt-and-pepper noise. The degradation level signifies the density of the salt-and-pepper noise, varying from 0 to 0.25 in increments of 0.01. "Proposed" outperforms the other existing methods for all interval mean accuracies. These results show that the proposed method can train a classification network without losing the classification ability on high-quality images, including clean images, for the above three degradations; that is, the proposed method is effective not only for JPEG distortion but also for other degradations.

4.4. Analysis for CIFAR-100

To evaluate the efficacy of the proposed method on other datasets, the proposed method is applied to CIFAR-100.23 CIFAR-100 has 100 classes and contains 50,000 training images and 10,000 testing images. As the classification network, PyramidNet110-270 with ShakeDrop regularization was used. The training strategy was almost the same as that of CIFAR-10 except for the detailed settings of SGD. The learning rate was set to 0.5 initially and scaled by a factor of 0.1 at the 150th and 225th epochs. Additionally, a momentum of 0.9 and a weight decay of 0.0001 were applied. Regarding degradations, four types were analyzed: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise.
The range of JPEG quality factors and degradation levels was consistent with that used for CIFAR-10 for each type. Table 4 shows the interval mean accuracy of CIFAR-100 degraded by the four types of degradation. Only "Proposed" attains the same level of performance as "Clean" in classifying clean images for all degradations. However, "Proposed" underperforms "Mixed" or "Mixed R.E." in the classification of low-quality images, except for salt-and-pepper noise. Regarding salt-and-pepper noise, "Mixed R.E." and "Proposed" show almost the same performance for every interval mean accuracy.

Table 4 Interval mean accuracy of CIFAR-100 under several degradations using PyramidNet110-270 with ShakeDrop regularization. For JPEG distortion, the JPEG quality factor is used instead of a degradation level. For the other degradations, the degradation level denotes the standard deviation of the kernel for Gaussian blur, the standard deviation for AWGN, and the density for salt-and-pepper noise. "All" means the interval mean accuracy over all quality factors or degradation levels, including a clean image. All results are based on a single run. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.
The results show that the proposed method classifies high-quality images well, including clean images, whereas "Mixed R.E." is superior in the classification of low-quality images. This tendency is the same as that observed in the CIFAR-10 evaluations; that is, the proposed method is also effective for the CIFAR-100 dataset.

4.5. Analysis for TINY ImageNet

In this section, the efficacy of the proposed method is evaluated using TINY ImageNet.24 This dataset was chosen for its higher resolution compared with the CIFAR datasets. TINY ImageNet contains 100,000 training images and 10,000 testing images for 200 classes. Each class has 500 images for training and 50 images for testing. TINY ImageNet images are RGB with a resolution of 64 × 64 pixels. For the network architecture, ResNet564,5 was utilized instead of PyramidNet110-270 with ShakeDrop regularization to reduce the training time. The optimization method was SGD. The learning rate was set to 0.1 initially and scaled by a factor of 0.1 at epochs 60, 120, 160, 200, 240, and 280. Additionally, a momentum of 0.9 and a weight decay of 0.0001 were applied. Table 5 shows the interval mean accuracy of TINY ImageNet degraded by four types of degradation: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise. Regarding AWGN and salt-and-pepper noise, "Mixed" outperforms "Clean" in the classification of clean images and thus already attains this paper's goal. Consequently, "Mixed R.E." shows almost the best performance for AWGN and salt-and-pepper noise. This superior performance appears to result from the higher resolution of TINY ImageNet combined with these naive pixel-wise degradations. However, "Proposed" outperforms "Mixed R.E." in classifying clean images and almost always outperforms "Mixed."

Table 5 Interval mean accuracy of TINY ImageNet under several degradations with ResNet56. For JPEG distortion, the JPEG quality factor is used instead of a degradation level.
For the other degradations, the degradation level denotes the standard deviation of the kernel for Gaussian blur, the standard deviation for AWGN, and the density for salt-and-pepper noise. "All" means the interval mean accuracy over all quality factors or degradation levels, including a clean image. All results are based on a single run. In each interval, the bold value indicates the highest interval mean accuracy among all five methods.
Next, regarding JPEG distortion, the accuracy difference between "Clean" and "Mixed" is relatively minor, at 0.007, in classifying clean images. Thus "Mixed R.E." can outperform "Clean" in classifying clean images and is the best in the interval "All." This is because image quality is not degraded as much as in JPEG CIFAR-10, owing to the higher resolution of TINY ImageNet, even when JPEG distortion is applied. However, "Proposed" outperforms "Mixed R.E." in classifying high-quality images, including clean images, and almost always outperforms "Mixed." Finally, focusing on Gaussian blur, only "Proposed" outperforms "Clean" in classifying clean images as well as high-quality images. In addition, "Proposed" is the best in the interval "All." As a result, only "Proposed" stably outperforms "Clean" in classifying clean images for all types of degradation. The proposed method is thus effective not only for CIFAR-100 but also for TINY ImageNet.

5. Conclusions

This paper proposed a data augmentation technique for degraded images with various levels of degradation that applies distinct data augmentations to the clean and degraded image domains. This paper also showed that the proposed method can effectively train a classification network for degraded images without losing the classification ability on clean images. In addition, experimental results showed that the proposed method yields more stable performance in classifying high-quality images than mixed training with random erasing and CutBlur. Furthermore, the proposed method's effectiveness was confirmed for four types of degradation: JPEG distortion, Gaussian blur, AWGN, and salt-and-pepper noise. Finally, the robustness of the proposed method was demonstrated on three datasets, i.e., CIFAR-10, CIFAR-100, and TINY ImageNet, and four classification CNNs, i.e., VGG16, ResNet50, ResNet56, and PyramidNet110-270 with ShakeDrop regularization.
Although this paper demonstrated the effectiveness of the proposed method, we identified an opportunity for improvement. The proposed method enhances the classification of high-quality images; however, it reduces the classification ability on low-quality images. A straightforward extension could treat the domain sampling probability as an adjustable parameter. Optimizing the domain sampling probability might reduce the observed trade-off. This trade-off should be further investigated in the near future, especially by analyzing the feature discrepancy between clean and degraded images.

6. Appendix A: Detailed Analyses of the Proposed Method

In this section, the proposed method is analyzed in depth in terms of the combination of data augmentations, the decision probability, and the domain sampling probability. All analyses are performed using VGG16 trained with JPEG CIFAR-10.

6.1. Combination of Data Augmentations

This section provides a numerical validation of the appropriateness of the combination of data augmentations used in the proposed method. The key point of the proposed method is to apply distinct data augmentations to the clean and degraded image domains. Here, three options are considered for each domain: random erasing, CutBlur, and the absence of any data augmentation. In total, nine possible combinations are examined, as shown in Table 6. The notation "X–Y" in Table 6 denotes that X and Y are applied to clean and degraded images, respectively, where "N," "R," and "C" represent the absence of any data augmentation (none), random erasing (R.E.), and CutBlur, respectively.

Table 6 Interval mean accuracy of JPEG CIFAR-10 based on VGG16 with respect to nine possible combinations of random erasing (R.E.), CutBlur, and the absence of data augmentation (none). Erasing and decision probabilities were set to 1.0. Moreover, the domain sampling probabilities were 1/2 for both the clean and degraded image domains.
R–C is averaged over three runs, and the others are based on a single run. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all combinations.
For the clean image domain, R–N outperforms C–N in almost all intervals and shows better performance for high-quality images. By contrast, for the degraded image domain, the differences between N–R and N–C are relatively minor. Consequently, two combinations of data augmentations emerge as candidates: R–R and R–C. Comparing R–R and R–C, R–C outperforms R–R in all but one interval, as seen in Table 6. In addition, R–C shows the best performance for the classification of clean images and outperforms the other combinations in almost all intervals. These results indicate that the proposed method R–C is a rational combination of data augmentations.

Furthermore, the following analysis discusses the impact of random erasing and CutBlur on the proposed method R–C. First, random erasing is discussed. In one interval, R–R notably underperforms both R–N and N–R by a margin of over 0.1, as shown in Table 6. Comparing R–N and N–R, R–N is slightly more accurate than N–R in classifying clean images. Because random erasing reinforces the extraction of image features by a classification network, these results suggest that random erasing may amplify the differences between the features of clean and degraded images. Therefore, random erasing should be applied selectively to only one image domain. In the proposed method, random erasing enhances the classification ability for high-quality images, including clean images.

Next, CutBlur is discussed. Comparing N–N with C–N, their performance is nearly identical, which indicates that applying CutBlur to clean images does not significantly improve the results. However, N–C outperforms N–N by 0.1 in the interval “All” and performs almost the same as C–C for all interval mean accuracies. This implies that CutBlur is not effective for the clean image domain and should instead be applied to the degraded image domain. Within the proposed method, CutBlur boosts classification across all degradation levels except for extremely low-quality images.
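The R–C scheme discussed above can be sketched as two small image operations: random erasing overwrites a random rectangle of a clean image with random values, while CutBlur pastes the corresponding clean region into a degraded image. The following is a minimal sketch assuming images are float arrays in [0, 1]; the function names and the fixed-size rectangle are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def random_erasing(img, rng, area_frac=0.25):
    # Random erasing (R.E.): overwrite a random rectangle of the clean image
    # with uniform random values.
    h, w = img.shape[:2]
    eh, ew = max(1, int(h * area_frac)), max(1, int(w * area_frac))
    y, x = rng.integers(0, h - eh + 1), rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = rng.uniform(0.0, 1.0, size=out[y:y + eh, x:x + ew].shape)
    return out

def cutblur(degraded, clean, rng, area_frac=0.25):
    # CutBlur: paste the corresponding clean-image rectangle into the
    # degraded image, so part of the degraded input stays clean.
    h, w = degraded.shape[:2]
    rh, rw = max(1, int(h * area_frac)), max(1, int(w * area_frac))
    y, x = rng.integers(0, h - rh + 1), rng.integers(0, w - rw + 1)
    out = degraded.copy()
    out[y:y + rh, x:x + rw] = clean[y:y + rh, x:x + rw]
    return out
```

The pasted clean region in `cutblur` is what makes degraded images processed with CutBlur retain a piece of the clean domain, which is consistent with the observation above that CutBlur might also help the classification of clean images.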
Moreover, CutBlur might enhance the classification of clean images because degraded images processed with CutBlur retain a region of the clean image.

6.2. Decision Probability

In Sec. 4, a decision probability of 1.0 was used. This section confirms how the proposed method performs when the decision probability is changed. Table 7 shows the interval mean accuracy when the decision probability is varied from 0 to 1.0 in increments of 0.2. The classification performance for high-quality images improves as the decision probability increases. For the classification of clean images, a decision probability of 1.0 outperforms all other probabilities. Therefore, a decision probability of 1.0 seems to be the most reasonable choice.

Table 7. Interval mean accuracy for VGG16 trained on JPEG CIFAR-10, with the decision probability varying from 0 to 1.0 in increments of 0.2. The case of 1.0 is averaged over three runs, and the others are based on a single run. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all six decision probabilities.
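The decision probability studied above acts as a simple stochastic gate: each time an image is drawn, its domain's augmentation is applied only with probability p, so p = 1.0 always augments and p = 0 never does. A minimal sketch follows; `augment_with_decision` is an illustrative name, not from the paper's code.

```python
import numpy as np

def augment_with_decision(img, augment_fn, rng, decision_prob=1.0):
    # Apply the domain-specific augmentation only with probability
    # `decision_prob`; otherwise, pass the image through unchanged.
    if rng.random() < decision_prob:
        return augment_fn(img)
    return img
```

With `decision_prob=1.0` (the value used in the paper's experiments), the gate always fires because `rng.random()` lies in [0, 1), so every sampled image is augmented.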
6.3. Domain Sampling Probability

The proposed method always uses a domain sampling probability of 1/2. Here, we numerically validate the impact of the domain sampling probability. Table 8 shows the interval mean accuracy when the domain sampling probability of degraded images is varied from 0.1 to 0.9 in increments of 0.1. Given this paper's goal, the proposed method needs to come as close as possible to both “Clean” in the classification of clean images and “Mixed” in the classification of low-quality images. As seen in Table 2, “Clean” shows an accuracy of 0.928 for the classification of clean images, and “Mixed” shows an accuracy of 0.752 for low-quality images. Compared with these two values, probabilities around 0.5 and 0.6 are good choices, as seen in Table 8; that is, using a domain sampling probability of 0.5 seems plausible.

Table 8. Interval mean accuracy for VGG16 trained on JPEG CIFAR-10, with the domain sampling probability of degraded images varying from 0.1 to 0.9 in increments of 0.1. The case of 0.5 is averaged over three runs, and the others are based on a single run. Italic numbers represent standard deviations. In each interval, the bold value indicates the highest interval mean accuracy among all nine domain sampling probabilities.
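Per-sample domain selection with the probability examined above amounts to a coin flip: each training image is kept clean with probability 1 − q, or degraded at a uniformly drawn level with probability q (q = 0.5 in the proposed method). A minimal sketch, where `degrade_fn` and the grid of degradation levels are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np

def sample_training_image(clean, degrade_fn, rng, degraded_prob=0.5,
                          levels=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    # With probability `degraded_prob`, synthesize a degraded image from the
    # clean one at a uniformly sampled degradation level (e.g., a JPEG quality
    # factor); otherwise, keep the clean image.
    if rng.random() < degraded_prob:
        level = int(rng.choice(levels))
        return degrade_fn(clean, level), "degraded"
    return clean, "clean"
```

Treating `degraded_prob` as an adjustable parameter in this sketch is exactly the extension suggested in the conclusion for mitigating the clean/degraded trade-off.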
7. Appendix B: Accuracy of JPEG CIFAR-10

The accuracy of JPEG CIFAR-10 is presented for each JPEG quality factor using three networks: VGG16, ResNet50, and PyramidNet with ShakeDrop regularization. Figure 4 shows the classification accuracy of JPEG CIFAR-10 for each degradation level, where the accuracy of clean images is plotted next to the JPEG quality factor of 100. For all networks, “Proposed” outperforms “Mixed” for JPEG quality factors above 20. Moreover, “Proposed” approaches “Clean” as the JPEG quality factor increases. These observations are consistent with the analysis using the interval mean accuracy.

Code and Data Availability

The code used to generate the results is available in a GitHub repository (https://github.com/kendo-al/da-degradedimg_pytorch).

References

K. Simonyan and A. Zisserman,
“Very deep convolutional networks for large-scale image recognition,” in Int. Conf. Learn. Represent. (2015).

J. Springenberg et al., “Striving for simplicity: the all convolutional net,” in Int. Conf. Learn. Represent. Workshop Track (2015).

J. Tompson et al., “Efficient object localization using convolutional networks,” in IEEE Conf. Comput. Vision and Pattern Recognit., 648–656 (2015). https://doi.org/10.1109/CVPR.2015.7298664

K. He et al., “Deep residual learning for image recognition,” in IEEE Conf. Comput. Vision and Pattern Recognit., 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

K. He et al., “Identity mappings in deep residual networks,” in Eur. Conf. Comput. Vision, 630–645 (2016). https://doi.org/10.1007/978-3-319-46493-0_38

D. Han, J. Kim and J. Kim, “Deep pyramidal residual networks,” in IEEE Conf. Comput. Vision and Pattern Recognit., 5927–5935 (2017). https://doi.org/10.1109/CVPR.2017.668

Y. Yamada et al., “ShakeDrop regularization for deep residual learning,” IEEE Access, 7, 186126–186136 (2019). https://doi.org/10.1109/ACCESS.2019.2960566

K. Endo, M. Tanaka and M. Okutomi, “CNN-based classification of degraded images,” in Proc. IS&T Int. Symp. Electron. Imaging (2020). https://doi.org/10.2352/ISSN.2470-1173.2020.10.IPAS-028

Y. Pei et al., “Effects of image degradation and degradation removal to CNN-based image classification,” IEEE Trans. Pattern Anal. Mach. Intell., 43(4), 1239–1253 (2021). https://doi.org/10.1109/TPAMI.2019.2950923

X. Peng et al., “Fine-to-coarse knowledge transfer for low-res image classification,” in IEEE Int. Conf. Image Process. (2016). https://doi.org/10.1109/ICIP.2016.7533047

N. Das et al., “Keeping the bad guys out: protecting and vaccinating deep learning with JPEG compression,” (2017). https://arxiv.org/abs/1705.02900

D. Cai et al., “Convolutional low-resolution fine-grained classification,” Pattern Recognit. Lett., 119, 166–171 (2019). https://doi.org/10.1016/j.patrec.2017.10.020

Y. Pei et al., “Effects of image degradations to CNN-based image classification,” (2018). https://arxiv.org/abs/1810.05552

K. Endo, M. Tanaka and M. Okutomi, “Classifying degraded images over various levels of degradation,” in IEEE Int. Conf. Image Process. (2020). https://doi.org/10.1109/ICIP40778.2020.9191087

K. Endo, M. Tanaka and M. Okutomi, “CNN-based classification of degraded images with awareness of degradation levels,” IEEE Trans. Circuits Syst. Video Technol., 31, 4046–4057 (2021). https://doi.org/10.1109/TCSVT.2020.3045659

Y. Pei, Y. Huang and X. Zhang, “Consistency guided network for degraded image classification,” IEEE Trans. Circuits Syst. Video Technol., 31, 2231–2246 (2021). https://doi.org/10.1109/TCSVT.2020.3016863

S. Wan et al., “Feature consistency training with JPEG compressed images,” IEEE Trans. Circuits Syst. Video Technol., 30, 4769–4780 (2020). https://doi.org/10.1109/TCSVT.2019.2959815

K. Endo, M. Tanaka and M. Okutomi, “CNN-based classification of degraded images without sacrificing clean images,” IEEE Access, 9, 116094–116104 (2021). https://doi.org/10.1109/ACCESS.2021.3105957

D. Daultani et al., “ILIAC: efficient classification of degraded images using knowledge distillation with cutout data augmentation,” Electron. Imaging, 35(9), 296-1–296-6 (2023). https://doi.org/10.2352/EI.2023.35.9.IPAS-296

K. Endo, M. Tanaka and M. Okutomi, “Semantic segmentation of degraded images using layer-wise feature adjustor,” in IEEE/CVF Winter Conf. Appl. Comput. Vision (WACV), 3204–3212 (2023). https://doi.org/10.1109/WACV56688.2023.00322

Z. Zhong et al., “Random erasing data augmentation,” in Proc. AAAI Conf. Artif. Intell. (AAAI) (2020). https://doi.org/10.1609/aaai.v34i07.7000

J. Yoo, N. Ahn and K. Sohn, “Rethinking data augmentation for image super-resolution: a comprehensive analysis and a new strategy,” in IEEE/CVF Conf. Comput. Vision and Pattern Recognit., 8372–8381 (2020). https://doi.org/10.1109/CVPR42600.2020.00840

A. Krizhevsky, “Learning multiple layers of features from tiny images,” (2009).

Y. Le and X. Yang, “Tiny ImageNet visual recognition challenge,” (2015). http://vision.stanford.edu/teaching/cs231n/reports/2015/pdfs/yle_project.pdf

H. Naveed et al., “Survey: image mixing and deleting for data augmentation,” (2021). https://arxiv.org/abs/2106.07085

T. DeVries and G. W. Taylor, “Improved regularization of convolutional neural networks with cutout,” (2017). https://arxiv.org/abs/1708.04552

S. Yun et al., “CutMix: regularization strategy to train strong classifiers with localizable features,” in IEEE/CVF Int. Conf. Comput. Vision, 6022–6031 (2019). https://doi.org/10.1109/ICCV.2019.00612

D. Hendrycks et al., “AugMix: a simple data processing method to improve robustness and uncertainty,” in Proc. Int. Conf. Learn. Represent. (ICLR) (2020).

D. Guo et al., “Degraded image semantic segmentation with dense-Gram networks,” IEEE Trans. Image Process., 29, 782–795 (2020). https://doi.org/10.1109/TIP.2019.2936111

L. Liu et al., “On the variance of the adaptive learning rate and beyond,” in Int. Conf. Learn. Represent. (2020).
Biography

Kazuki Endo is an associate professor at Teikyo Heisei University. He received his bachelor's degree in mathematics, his master's degree in industrial engineering and management, and his DEng degree in systems and control engineering from Tokyo Institute of Technology in 1997, 1999, and 2022, respectively. He joined the Industrial Bank of Japan, Ltd. (now Mizuho Bank) in 1999. Since 2022, he has been an associate professor in the Department of Business at Teikyo Heisei University.