Recognizing fine-grained categories is a challenging task that has attracted increasing attention in recent years. Unlike traditional image recognition, fine-grained image recognition aims to distinguish different sub-classes under the same general category. Due to variations in posture, illumination and other factors, fine-grained recognition suffers from large intra-class diversity and subtle inter-class differences. Most existing works focus on localizing discriminative regions and learning fine-grained features, but they neglect the structure of the fine-grained labels. In this paper, we propose a label hierarchy constraint network (LHC) for fine-grained classification. The network includes two branches: a coarse-level branch and a fine-level branch. In the middle layers of the network, the coarse branch predicts coarse labels, which can be regarded as guidance for the fine-grained labels; in the upper layers, we predict the fine-grained labels. We then map the probability distribution predicted by the coarse-grained branch onto the fine-grained branch. Experiments show the effectiveness of our method.
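A minimal PyTorch sketch of the two-branch idea follows, assuming a fixed binary coarse-to-fine mapping matrix and hypothetical layer sizes; it shows only how a coarse probability distribution could be mapped onto the fine branch as a log-prior guidance, not the paper's exact architecture.

```python
# Hypothetical sketch of a label-hierarchy two-branch network.
import torch
import torch.nn as nn

class LHCSketch(nn.Module):
    def __init__(self, n_coarse, n_fine, mapping):
        # mapping: (n_coarse, n_fine) 0/1 matrix, 1 if the coarse class contains the fine class
        super().__init__()
        self.backbone_mid = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.backbone_top = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
        self.coarse_head = nn.Linear(64, n_coarse)
        self.fine_head = nn.Linear(128, n_fine)
        self.register_buffer("mapping", mapping)

    def forward(self, x):
        mid = self.backbone_mid(x)                # middle-layer features
        coarse_logits = self.coarse_head(mid)     # coarse prediction from middle layers
        fine_logits = self.fine_head(self.backbone_top(mid))  # fine prediction from upper layers
        # Map the coarse probability distribution onto the fine classes as guidance.
        coarse_prob = coarse_logits.softmax(dim=1)
        guidance = coarse_prob @ self.mapping     # (B, n_fine) prior implied by coarse branch
        return coarse_logits, fine_logits + guidance.clamp_min(1e-6).log()
```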
State-of-the-art object detection networks have reduced running time and achieved better detection results. However, in remote sensing scenes, targets may be tilted along a given projection direction, and the horizontal bounding boxes used by existing detection algorithms then overlap heavily; after non-maximum suppression (NMS), targets can be lost. In this paper, we propose Rotated Faster R-CNN (R-FRCNN), a target localization method based on arbitrary-angle bounding boxes, which localizes targets without redundancy and thus reduces the missed detection rate when targets are densely distributed or arbitrarily oriented. Compared with traditional and state-of-the-art object detection algorithms, our approach obtains superior performance.
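As a sketch of why arbitrary-angle boxes avoid the overlap problem, the following hypothetical rotated NMS represents each box as (cx, cy, w, h, angle) and computes polygon IoU with shapely; the paper's actual rotated-box implementation may differ.

```python
# Greedy NMS over rotated boxes, using shapely for polygon IoU.
import numpy as np
from shapely.geometry import Polygon

def rbox_to_polygon(b):
    cx, cy, w, h, a = b
    t = np.deg2rad(a)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    corners = np.array([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
    return Polygon(corners @ R.T + [cx, cy])

def rotated_nms(boxes, scores, iou_thr=0.5):
    order = np.argsort(scores)[::-1]          # highest score first
    polys = [rbox_to_polygon(b) for b in boxes]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = []
        for j in order[1:]:
            inter = polys[i].intersection(polys[j]).area
            iou = inter / (polys[i].area + polys[j].area - inter)
            if iou <= iou_thr:                # suppress only genuinely overlapping boxes
                rest.append(j)
        order = np.array(rest, dtype=int)
    return keep
```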
With the development of geospatial and remote sensing technology, a large number of remote sensing images have come into application with the assistance of computers. Although convolutional networks perform well in computer vision, the features they extract are not rotation invariant, which means current neural network methods cannot adapt to rotated objects. Considering the multi-angle characteristics of remote sensing images, we propose a Rotation Invariance Spatial Transformation Network (RI-ST-NET) to extract rotation-invariant object features. RI-ST-NET combines convolutional neural networks with the Spatial Transformer Network (STN), rotating the object to an angle that is easier to identify, and is trained as a Siamese network whose two branches share the same weights. RI-ST-NET can therefore adapt to object features under different rotation patterns, which effectively improves the accuracy of remote sensing retrieval. The network combines the advantages of the STN and tuple training, and can capture rotations of the same object when used in image retrieval. A series of contrast experiments on the chosen dataset demonstrates the performance of the proposed method.
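A minimal sketch of the STN-style rotation step, with hypothetical layer sizes: a small localization network predicts an angle and the input is resampled at that rotation; reusing one module for both inputs mirrors the Siamese weight sharing.

```python
# STN rotation module: predict an angle, then resample the input at that rotation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationSTN(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(nn.Conv2d(3, 8, 7), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, 1))   # predicted rotation angle (radians)

    def forward(self, x):
        theta = self.loc(x).squeeze(1)
        cos, sin = torch.cos(theta), torch.sin(theta)
        zeros = torch.zeros_like(cos)
        # 2x3 affine matrix for a pure rotation, one per sample.
        mat = torch.stack([torch.stack([cos, -sin, zeros], 1),
                           torch.stack([sin, cos, zeros], 1)], 1)
        grid = F.affine_grid(mat, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

stn = RotationSTN()
a, b = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
fa, fb = stn(a), stn(b)   # shared weights: both Siamese branches use the same module
```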
Aircraft detection in remote sensing images is a challenging task that has attracted increasing attention in recent years. Existing methods based on fully supervised convolutional neural networks (CNNs) require expensive labeling information such as bounding boxes, which is time consuming and difficult to obtain. Recently, weakly supervised methods using only image-level labels have drawn increasing attention for natural imagery. The class activation map (CAM), a weakly supervised approach, performs well for object detection in natural scene images, but suffers from inaccurate localization when applied to remote sensing images. In this paper, we propose a method called Active Region Corrected (ARC) to locate aircraft accurately. We find that a localization map generated from the features before the last pooling layer of the classification network contains more accurate position information but also a lot of noise, while the CAM provides a localization map with rough location information. Combining these two localization maps yields the exact position of the aircraft. Experiments conducted on the dataset verify that our proposal obtains superior performance for aircraft detection and localization in remote sensing images.
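The following sketch illustrates the two maps being combined, under simple assumptions: the CAM as the classifier-weighted channel sum, the pre-pooling map as a channel-wise maximum, and an element-wise product as the correction; the paper's exact fusion may differ.

```python
# CAM plus pre-pooling activations: the product keeps responses confirmed by both.
import numpy as np

def cam(features, fc_weights, cls):
    # features: (C, H, W) activations before global average pooling
    # fc_weights: (n_classes, C) classifier weights
    m = np.tensordot(fc_weights[cls], features, axes=1)   # weighted sum over channels
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

def arc_localization(features, fc_weights, cls):
    coarse = cam(features, fc_weights, cls)               # rough location, low noise
    fine = features.max(axis=0)                           # detailed activations, noisy
    fine = (fine - fine.min()) / (fine.max() - fine.min() + 1e-8)
    return coarse * fine                                  # corrected active region
```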
With the increasing volume of high-resolution remote sensing images, large-scale remote sensing image retrieval (RSIR) becomes more and more significant and has attracted great attention. Traditional image retrieval methods generally use hand-crafted features, which are not only time-consuming but also often perform poorly. Deep learning has recently achieved remarkable performance due to its powerful ability to learn high-level semantic features, so researchers attempt to take advantage of features derived from convolutional neural networks (CNNs) in RSIR. But remote sensing images differ from natural scene images: their backgrounds are more complicated and noisy, and existing deep learning methods do not handle this well, yielding unsatisfactory speed and accuracy. In this paper, we propose a rotation-invariant hashing network that represents an image as a binary hash code, retrieving images faster while accounting for the rotation invariance of the same target. Experiments on several available remote sensing datasets show that our method is effective and outperforms other features commonly used in RSIR.
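A minimal sketch of the hashing-retrieval step: binarize network outputs into hash codes and rank by Hamming distance. The random embeddings below stand in for the rotation-invariant network's outputs, which this sketch does not model.

```python
# Binary hash codes and Hamming-distance retrieval.
import numpy as np

def to_hash(embeddings):
    return (embeddings > 0).astype(np.uint8)   # sign binarization -> {0,1} codes

def retrieve(query_code, db_codes, top_k=10):
    dists = np.count_nonzero(db_codes != query_code, axis=1)   # Hamming distance
    return np.argsort(dists)[:top_k]

db = to_hash(np.random.randn(1000, 64))   # stand-in: 1000 images, 64-bit codes
q = to_hash(np.random.randn(1, 64))[0]
print(retrieve(q, db))                    # indices of the closest database images
```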
Object detection is one of the most important issues in remote sensing analysis. The lack of semantic information about objects makes it difficult for traditional methods to explore effective features for object discrimination. Capable of learned feature extraction, a series of region-based convolutional neural networks (R-CNNs) has recently been widely and successfully applied to object detection in natural images. However, most of them suffer from poor detection of small-sized targets, so few can be applied directly to small-sized object detection in remote sensing images. This paper proposes a modified method based on Faster R-CNN, composed of a feature extraction network, a region proposal network and an object detection network. Compared to Faster R-CNN, the proposed method removes the fourth pooling layer in the feature extraction network and employs dilated convolutions in all subsequent convolutional layers to enhance the resolution of the final feature maps, providing more detailed and semantic feature information that helps detect objects, especially small-sized ones. In the object detection network, contextual features around the region proposals are added as complementary information to help distinguish objects accurately. Experiments conducted on two datasets verify that our proposal obtains superior performance on small-sized object detection in remote sensing images.
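A toy sketch of the feature-extraction change, assuming a VGG-like tail: the fourth pooling layer is dropped and the following convolutions are dilated, so the receptive field is preserved while the final feature map keeps twice the resolution.

```python
# Original tail: conv4 block -> pool4 -> conv5 block (stride-32 output).
# Modified tail: pool4 removed, conv5 dilated (stride-16 output).
import torch.nn as nn

modified_tail = nn.Sequential(
    nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(),               # conv4 block (unchanged)
    # pool4 removed here
    nn.Conv2d(512, 512, 3, padding=2, dilation=2), nn.ReLU(),   # conv5 block, dilated
    nn.Conv2d(512, 512, 3, padding=2, dilation=2), nn.ReLU(),   # to keep the same
    nn.Conv2d(512, 512, 3, padding=2, dilation=2), nn.ReLU(),   # receptive field
)
```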
Existing methods for visual-saliency-based image retrieval typically target single-instance images. However, without prior knowledge, the content of a single-instance image is ambiguous, and these methods cannot effectively reflect the object of interest. In this paper, we propose a novel image retrieval framework based on a multi-instance saliency model. First, feature saliency is computed from global contrast, local contrast and sparsity, and a synthesized saliency map is obtained by using a Multi-Instance Learning (MIL) algorithm to dynamically weight the feature saliency. Then we employ a fuzzy region-growing algorithm on the synthesized saliency map to extract the salient object. Finally, we extract color and texture features as the retrieval features and measure feature similarity by Euclidean distance. In the experiments, the proposed method achieves higher multi-instance image retrieval accuracy than other single-instance retrieval methods based on saliency models.
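A small sketch of the fusion and matching steps, with fixed weights standing in for the MIL-learned ones; the saliency cues and retrieval features themselves are assumed given.

```python
# Weighted fusion of per-cue saliency maps, and Euclidean-distance ranking.
import numpy as np

def fuse_saliency(maps, weights):
    # maps: list of (H, W) saliency maps (global contrast, local contrast, sparsity)
    s = sum(w * m for w, m in zip(weights, maps))
    return (s - s.min()) / (s.max() - s.min() + 1e-8)   # synthesized saliency map

def rank_by_euclidean(query_feat, db_feats):
    # query_feat: (D,) color/texture feature of the salient object; db_feats: (N, D)
    d = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(d)                                # most similar images first
```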
Multiple-instance learning (MIL) has been successfully utilized in image retrieval. Existing approaches cannot select positive instances correctly from positive bags, which may result in low accuracy. In this paper, we propose a new image retrieval approach called multiple-instance learning based on instance consistency (MILIC) to mitigate this issue. First, we select potential positive instances effectively in each positive bag by ranking the instance-consistency (IC) values of its instances. Then, we design a feature representation scheme based on the potential positive instances, which captures the relationship among bags and instances, to convert each bag into a single instance. Finally, we can use a standard single-instance learning strategy, such as the support vector machine, for object-based image retrieval. Experimental results on two challenging datasets show the effectiveness of our proposal in terms of accuracy and run time.
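Under simple assumptions (instance consistency taken as mean similarity to all positive-bag instances; bag embedding as maximal similarity to each selected prototype), the pipeline could look like this sketch:

```python
# Instance selection by consistency, bag embedding, then a standard SVM.
import numpy as np
from sklearn.svm import SVC

def select_positives(pos_bags):
    pool = np.vstack(pos_bags)
    chosen = []
    for bag in pos_bags:
        sim = -np.linalg.norm(bag[:, None] - pool[None], axis=2)   # (n_i, N)
        ic = sim.mean(axis=1)                  # instance-consistency score
        chosen.append(bag[np.argmax(ic)])      # most consistent instance per bag
    return np.array(chosen)

def embed_bag(bag, prototypes):
    sim = -np.linalg.norm(bag[:, None] - prototypes[None], axis=2)
    return sim.max(axis=0)                     # bag -> single fixed-length vector

rng = np.random.default_rng(0)                 # toy data, 8-dim instances
pos = [rng.normal(1, 1, (5, 8)) for _ in range(4)]
neg = [rng.normal(-1, 1, (5, 8)) for _ in range(4)]
protos = select_positives(pos)
X = np.array([embed_bag(b, protos) for b in pos + neg])
clf = SVC().fit(X, [1] * 4 + [0] * 4)          # single-instance learning on bags
```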
Image representation is the key part of image classification, and the Fisher kernel has been considered one of the most effective image feature coding methods. For Fisher encoding, a critical issue is that a single GMM only models features at a single, rough granularity. In this paper, we propose a method named Multi-scale and Multi-GMM Pooling (MMP), which can effectively represent an image at various granularities. We first conduct pooling using multiple GMMs instead of a single GMM. Then, we introduce multi-scale images to enrich the model's inputs, which improves performance further. Finally, we validate our proposal on the PASCAL VOC 2007 dataset, and the experimental results show an obvious superiority over the basic Fisher model.
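A simplified sketch of the multi-GMM pooling idea, keeping only the first-order (mean-gradient) Fisher statistics; the component sizes are hypothetical, and the multi-scale part would add descriptors extracted from resized images:

```python
# Concatenate first-order Fisher statistics under GMMs of different granularities.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_mean_stats(descriptors, gmm):
    q = gmm.predict_proba(descriptors)                          # (N, K) posteriors
    diff = (descriptors[:, None] - gmm.means_[None]) / np.sqrt(gmm.covariances_[None])
    fv = (q[..., None] * diff).sum(axis=0) / len(descriptors)   # (K, D) mean gradients
    return fv.ravel()

def mmp_encode(descriptors, component_sizes=(4, 8, 16)):
    parts = []
    for k in component_sizes:
        gmm = GaussianMixture(k, covariance_type="diag").fit(descriptors)
        parts.append(fisher_mean_stats(descriptors, gmm))
    return np.concatenate(parts)                                # multi-granularity code
```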
Existing visual saliency detection methods are usually based on a single image; however, without prior knowledge, the contents of a single image are ambiguous, so saliency detection based on a single image cannot reliably extract the region of interest. To address this, we propose a novel saliency detection method based on multi-instance images. Our method considers human visual psychological factors and measures visual saliency from global contrast, local contrast and sparsity. It first uses multiple-instance learning to obtain cluster centers and then computes the relative dispersion of each feature. By fusing the weighted feature saliency maps, the final synthesized saliency map is generated. Compared with other saliency detection methods, our method achieves a higher hit rate.
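As an illustration of one of the cues, a minimal global-contrast saliency map scores each pixel by its color distance from the image mean; local contrast and sparsity would be computed analogously and fused as in the sketch above.

```python
# Global-contrast saliency: distance of each pixel's color from the image mean.
import numpy as np

def global_contrast_saliency(img):
    # img: (H, W, 3) float array
    mean_color = img.reshape(-1, 3).mean(axis=0)
    s = np.linalg.norm(img - mean_color, axis=2)
    return (s - s.min()) / (s.max() - s.min() + 1e-8)
```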