Pattern Recognition Letters

Ensembles of dense and dense sampling descriptors for the HEp-2 cells classification problem

Loris Nanni a,*, Alessandra Lumini b, Florentino Luciano Caetano dos Santos c, Michelangelo Paci c and Jari Hyttinen c

aDEI, University of Padua, viale Gradenigo 6, Padua, Italy

bDISI, Università di Bologna, Via Venezia 52, 47521 Cesena, Italy

cELT, Tampere University of Technology, BioMediTech, Korkeakoulunkatu 3, Tampere 33720, Finland

ABSTRACT

The classification of Human Epithelial (HEp-2) cell images, acquired through Indirect Immunofluorescence (IIF) microscopy, is an effective method to identify staining patterns in patient sera. Indeed, it can be used for diagnostic purposes, in order to reveal autoimmune diseases. However, the automated classification of IIF HEp-2 cell patterns represents a challenging task, due to the large intra-class and the small inter-class variability. Consequently, recent HEp-2 cell classification contests have greatly spurred the development of new IIF image classification systems.

Here we propose an approach for the automatic classification of IIF HEp-2 cell images that fuses several texture descriptors through an ensemble of Support Vector Machines combined by sum rule. Its effectiveness is evaluated on the HEp-2 cells dataset used for the “Performance Evaluation of Indirect Immunofluorescence Image Analysis Systems” contest, hosted by the International Conference on Pattern Recognition in 2014: the accuracy on the testing set is 79.85%.

The same dataset was used to test an ensemble of ternary-encoded local phase quantization descriptors, built by perturbation approaches: the accuracy on the training set is 84.16%. Finally, this ensemble was validated on 14 additional datasets, obtaining the best performance on 11 datasets.

Our MATLAB code is available at .

Keywords: HEp-2 cell classification; bag-of-features; texture descriptors; machine learning; support vector machine; ensemble

© 2015 Elsevier Ltd. All rights reserved.

Introduction

Indirect Immunofluorescence (IIF) is used to detect specific antibodies in patient serum for the diagnosis of autoimmune diseases (ADs). These are caused by the abnormal activity of the immune system, which attacks the body's own tissues [1]. Although ADs are considered relatively rare (e.g. compared to cardiac diseases), they show high mortality and morbidity, and their etiology is still far from being fully understood. Over the years, special emphasis has been given to genetic and environmental factors [1], and important epidemiologic studies have been published about the prevalence of the most common ADs. In 1997, Jacobson et al. [2] estimated a 3.2% prevalence in the US, focusing on a subset of 24 ADs. In 2007, Eaton et al. [3] estimated a prevalence of 5.4% in a subset of 31 ADs, based on the Danish National Patient Register. This study was updated by Cooper et al. in 2009 [4], reporting a higher prevalence ranging from 7.6% to 9.4%. Moreover, ADs show interesting demographic patterns. The research of Cooper & Stroehla [5] underlines how, in the US, the incidence of ADs is higher in women than in men: e.g. 85% of patients with ADs such as systemic lupus erythematosus, Sjögren’s syndrome or scleroderma are women. For other ADs, the prevalence ratio drops to 60-75%. Racial factors also influence the prevalence of ADs, as illustrated in the US by blacks showing a higher risk for systemic lupus erythematosus and scleroderma, while whites show a higher risk of multiple sclerosis compared to blacks and Asians.

The primary test for the evaluation of ADs is the Antinuclear Antibody (ANA) test, which has been reported to be particularly effective in the diagnosis of many ADs [6]. The gold standard procedure for ANA testing is the IIF assay [7], which consists in using two antibodies: a primary naked one, which binds to the target antigen, and a secondary fluorescent antibody that binds to the primary one. A cellular substrate, e.g. a monolayer of HEp-2 cells, is used to incubate the patient serum, allowing the ANAs to bind to the nuclei of the cells. The HEp-2 substrate allows the expression of many antigens to which the ANAs can bind and, once the primary-secondary complex binds to the ANAs, several staining patterns can be produced. As reported by Lane & Gravel [8], staining patterns are specific to one or a few diseases, e.g. homogeneous for systemic lupus erythematosus, speckled for Sjögren’s syndrome and mixed connective tissue disease, or nucleolar for scleroderma. The fluorescent samples are examined by means of a fluorescence microscope for the assessment of (i) the fluorescence intensity with respect to positive and negative controls and (ii) the staining patterns in the positive samples. This second task in particular is the most challenging, even for expert physicians, and may lead to high intra- and inter-operator variability. Bizzaro et al. [9] reported an inter-lab consensus of 92.6% for fluorescence intensity, but only 76% for fluorescence pattern classification. A computer-aided diagnosis (CAD) system can minimize these limitations and speed up sample screening.

The feasibility of this task, and the interest it has aroused in the scientific community, are proven by the high number of publications in the field and by the contests in HEp-2 cell IIF image classification hosted in 2012 and 2014 by the International Conference on Pattern Recognition (ICPR) and, in 2013, by the International Conference on Image Processing (ICIP). Foggia et al. [10] summarized the state of the art in HEp-2 cell IIF image classification. In general, classical feature sets were used, such as morphological measurements (e.g. number and circularity of connected regions, size of the cells, properties related to the holes inside the cells, etc.) and texture descriptors (e.g. Haralick features from the grey level co-occurrence matrix and variations of the local binary pattern (LBP) descriptor). Approaches specific to this dataset were also developed. Stoklasa et al. [11] proposed a granularity-based descriptor which computes as features the distribution of grains in the image through a series of morphological openings. In the same paper, a specific implementation of the surface descriptor is provided, which computes the statistical properties of the image, considered as a topographic surface made of valleys and hills. In Wiliem et al. [12], the cell pyramid matching framework was tailored to the HEp-2 problem. It is a region-based approach which pools local histograms of visual words into three histograms associated with (i) the whole cell region, (ii) the inner region and (iii) the outer region, thus also exploiting spatial information. Moreover, Foggia et al. [10] report some strategies for augmenting the training set, such as image rotation [13] or spontaneous activity patterns [14]. For the ICPR 2014 contest, two different tasks were assigned: Task 1 for Cell Level Classification and Task 2 for Specimen Level Classification. It is worth noting that many methods, e.g. [15–17], increased the number of training samples by patch extraction, flipping and rotation. The best approach for Task 1 was reported by Manivannan et al. [15], by means of an ensemble of support vector machines (SVMs) trained with multi-resolution local patterns, scale-invariant feature transform, random projections and intensity histograms, combined with original image rotations, dense patch extraction and a bag-of-words-based feature encoding. In 2014, Gao et al. [16] exploited convolutional neural networks together with image rotation to increase the number of training samples. Codrescu [17] also augmented the available dataset by image rotation and classified it through an extended version of the finite impulse response multilayer perceptron. The same two tasks were proposed for this special issue; in this paper we focus on Task 1 only, which consists in the classification of the six staining patterns (homogeneous, speckled, nucleolar, centromere, Golgi and nuclear membrane) in pre-segmented single cell images (dataset details in Section 4.1). During the ICPR 2014 contest [18] we used four descriptors based on the local binary pattern [19], namely pyramid LBP [20], local configuration pattern [21], rotation invariant co-occurrence among adjacent LBPs [13] and extended LBP [22], as well as the Strandmark morphological features [23], with an ensemble of SVMs as classifier [24,25] (accuracy on the testing set: 78.27%).

Here we propose an improved version of our ICPR 2014 approach, based on the following ideas:

• Since IIF images present large intra-class and small inter-class variability, different dense and dense sampling descriptors can be useful to represent images. We fuse several texture descriptors having different characteristics.

• The Bag of Features (BoF) technique has recently emerged as one of the most powerful methods for image representation. In this work, we propose to design ensembles of codebooks for BoF using different strategies for codebook building. Each different texton vocabulary is used to train an SVM classifier. The final ensemble is the fusion by sum rule of all the trained SVMs.

• Ensembles built through the perturbation of the parameters of a texture descriptor improve the classification performance of the single stand-alone descriptor. In this work we test and validate different methods for building ensembles of different descriptors. The most interesting result is the performance, validated on 15 datasets, of an ensemble of local phase quantization descriptors based on a ternary coding.

Material and methods

Texture descriptors

In this section we review the "dense" (Sections 2.1.1 - 2.1.8) and "dense sampling" (Sections 2.1.9 and 2.1.10) descriptors used in this work. "Dense" means that features are extracted from the whole image (or from a whole region), in contrast with "dense sampling", which denotes extraction only at specific, densely sampled points.

RIC-LBP

The Rotation Invariant Co-occurrence among adjacent Local Binary Pattern (RIC-LBP) [13] is a novel variant of LBP [19]. RIC-LBP takes into account the spatial relation, i.e. the co-occurrence, among LBP codes by a histogram of LBP pairs. In this work we computed RIC-LBP with 3 parameter sets (LBP radius, displacement among LBPs): (1, 2), (2, 4) and (4, 8).
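As an illustration of the pairing idea, the following is a minimal sketch, assuming square-neighborhood (non-interpolated) LBP codes and a fixed horizontal displacement; the merging of rotation-equivalent pairs performed by the full RIC-LBP is omitted here.

```python
import numpy as np

def lbp_codes(img, radius=1):
    """8-neighbour LBP code per pixel (simplified: axis-aligned offsets,
    no interpolation). Returns an array cropped by `radius` on each side."""
    offs = [(-radius, -radius), (-radius, 0), (-radius, radius), (0, radius),
            (radius, radius), (radius, 0), (radius, -radius), (0, -radius)]
    c = img[radius:-radius, radius:-radius]
    code = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offs):
        n = img[radius + dy: img.shape[0] - radius + dy,
                radius + dx: img.shape[1] - radius + dx]
        code |= (n >= c).astype(np.int32) << bit
    return code

def cooccurrence_hist(codes, d=2):
    """Histogram of horizontally co-occurring LBP code pairs at displacement d
    (the spatial relation among LBP codes exploited by RIC-LBP)."""
    a = codes[:, :-d].ravel()
    b = codes[:, d:].ravel()
    hist = np.bincount(a * 256 + b, minlength=256 * 256)
    return hist / hist.sum()

img = np.random.default_rng(0).integers(0, 256, (32, 32))
h = cooccurrence_hist(lbp_codes(img, radius=1), d=2)
```

In the full method this would be repeated for the three (radius, displacement) parameter sets listed above and the histograms concatenated.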

ELBP

Extended Local Binary Patterns (ELBP) [22] is another generalization of LBP which extracts both intensity-based and difference-based descriptors from local patches, for a total of four descriptors: (i) the central pixel intensity; (ii) the intensities in the neighborhood; (iii) the radial difference and (iv) the angular difference. In this work, we do not use the angular difference, since we experimentally observed a performance decrease. The final descriptor is obtained by concatenating the histograms from the two intensity-based and the radial-difference descriptors evaluated at different neighborhood granularities (radius, #pixels): (1, 8) and (2, 16).

LCP

Local Configuration Pattern (LCP) [21] is designed to consider both the microscopic configuration and local structures of an image. For local structural information, standard LBP is used. A novel microscopic configuration model (MiC) is developed; MiC is computed by estimating the optimal weights to linearly reconstruct the central pixel intensity by exploiting the intensities of the neighboring pixels.
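The MiC weight estimation can be sketched as a least-squares problem. Note this is a simplified, hypothetical illustration: it fits one global weight vector over all pixels, whereas LCP estimates weights per local pattern type.

```python
import numpy as np

# MiC sketch: estimate the optimal weights that linearly reconstruct each
# central pixel intensity from its 8 neighbours, via ordinary least squares.
rng = np.random.default_rng(9)
img = rng.random((20, 20))
centers, neighbours = [], []
for i in range(1, 19):
    for j in range(1, 19):
        centers.append(img[i, j])
        neighbours.append([img[i-1, j], img[i+1, j], img[i, j-1], img[i, j+1],
                           img[i-1, j-1], img[i-1, j+1],
                           img[i+1, j-1], img[i+1, j+1]])
A = np.array(neighbours)          # (num_pixels, 8) neighbour intensities
b = np.array(centers)             # (num_pixels,) central intensities
w, *_ = np.linalg.lstsq(A, b, rcond=None)   # 8 reconstruction weights
```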

LPQ

Local Phase Quantization (LPQ) is a texture descriptor [26] that uses the local phase information extracted from the 2-D short-term Fourier transform (STFT). For each pixel, the STFT is computed over a rectangular neighborhood of radius r, from which four complex coefficients, corresponding to the 2-D frequencies u1 = [a, 0]^T, u2 = [0, a]^T, u3 = [a, a]^T and u4 = [a, -a]^T (where a is a scalar frequency parameter), are considered and quantized to construct the final descriptor. The four complex coefficients are decorrelated by a whitening transform before quantization, assuming a Gaussian distribution and a fixed correlation coefficient between adjacent pixel values. Then the vector Gx, which contains the decorrelated STFT coefficients, is quantized using a scalar quantizer: if gj is the jth component of Gx, then qj = 1 if gj ≥ 0, and qj = 0 otherwise. Afterwards, these quantized coefficients are represented as integers in [0, 255] using binary coding (Eq. 1):

b = Σ_{j=1..8} q_j · 2^(j−1)    (1)

Finally, a histogram of these integer values is built and used as the feature vector. The final feature vector is given by the concatenation of the two histograms obtained with window sizes 3 and 5.
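The steps above can be sketched as follows. This is a minimal illustration that omits the whitening/decorrelation step of the full LPQ and uses a = 1/win, so the histograms will not match a reference implementation exactly.

```python
import numpy as np

def lpq_hist(img, win=3):
    """Simplified LPQ sketch (no whitening): compute the four STFT
    coefficients u1..u4 per pixel via separable complex filters, sign-quantize
    their real and imaginary parts into q_1..q_8, and pack them into an
    8-bit code as in Eq. (1)."""
    x = np.arange(win) - win // 2
    a = 1.0 / win                                   # scalar frequency
    w0 = np.ones(win, dtype=complex)                # DC along one axis
    w1 = np.exp(-2j * np.pi * a * x)                # frequency a along one axis
    patches = np.lib.stride_tricks.sliding_window_view(
        img.astype(float), (win, win))
    coeffs = [np.einsum('ijkl,kl->ij', patches, np.outer(row, col))
              for row, col in [(w0, w1), (w1, w0), (w1, w1), (w1, np.conj(w1))]]
    parts = [c.real for c in coeffs] + [c.imag for c in coeffs]
    code = np.zeros(coeffs[0].shape, dtype=np.int32)
    for j, g in enumerate(parts):
        code |= (g >= 0).astype(np.int32) << j      # q_j contributes 2^(j-1)
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()

img = np.random.default_rng(1).integers(0, 256, (32, 32))
feat = np.concatenate([lpq_hist(img, 3), lpq_hist(img, 5)])  # windows 3 and 5
```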

PLBP

The Pyramid Transform Domain LBP (PLBP) [20] is another extension of conventional LBP, based on the Gaussian pyramid decomposition of the original image. In this work we considered 3-level pyramids obtained with a 5×5 lowpass kernel and a downsampling ratio of Rx = Ry = 2. Rotation invariant LBPs are considered with (radius, #pixels) parameter sets (1, 8) and (2, 16).
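The pyramid decomposition can be sketched as follows; the 5×5 binomial kernel is a common choice, assumed here for illustration.

```python
import numpy as np

# 3-level Gaussian pyramid sketch for PLBP: smooth with a separable 5x5
# binomial lowpass kernel, then downsample by 2 in each direction.
K1 = np.array([1., 4., 6., 4., 1.]); K1 /= K1.sum()
KERNEL = np.outer(K1, K1)  # 5x5 lowpass

def pyramid_level(img):
    """One REDUCE step: 5x5 smoothing followed by 2x downsampling."""
    pad = np.pad(img.astype(float), 2, mode='edge')
    win = np.lib.stride_tricks.sliding_window_view(pad, (5, 5))
    smooth = np.einsum('ijkl,kl->ij', win, KERNEL)
    return smooth[::2, ::2]

levels = [np.random.default_rng(2).random((64, 64))]
for _ in range(2):                      # 3 levels in total
    levels.append(pyramid_level(levels[-1]))
# Rotation invariant LBP histograms would then be extracted from each level.
```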

HASC

Heterogeneous Auto-Similarities of Characteristics (HASC) [27] is an image descriptor designed to encode at the same time linear and non-linear relations. In this work, we considered the HASC descriptors obtained from the grey-level image, the first and second order gradient in both X and Y directions, and the first order gradient magnitude.

BSIF

The Binarized Statistical Image Features (BSIF) [28] computes a binary code for each pixel by linearly projecting local image patches onto a subspace. The basis vectors are learnt via independent component analysis and linear projections on these bases are binarized via thresholding.
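The binarization-and-packing step can be sketched as below. Note that real BSIF learns its filters by independent component analysis on natural image patches; random zero-mean filters stand in here purely for illustration.

```python
import numpy as np

def bsif_hist(img, filters):
    """BSIF sketch: project local patches onto learnt linear filters,
    binarize responses at 0, pack the bits into an integer code per pixel,
    and histogram the codes."""
    k = filters.shape[2]  # square filter size
    win = np.lib.stride_tricks.sliding_window_view(img.astype(float), (k, k))
    code = np.zeros(win.shape[:2], dtype=np.int32)
    for bit in range(filters.shape[0]):
        resp = np.einsum('ijkl,kl->ij', win, filters[bit])
        code |= (resp > 0).astype(np.int32) << bit
    n_bins = 1 << filters.shape[0]
    hist = np.bincount(code.ravel(), minlength=n_bins)
    return hist / hist.sum()

rng = np.random.default_rng(3)
filts = rng.standard_normal((8, 7, 7))            # 8 filters -> 256-bin codes
filts -= filts.mean(axis=(1, 2), keepdims=True)   # zero-mean stand-ins, NOT ICA
h = bsif_hist(rng.integers(0, 256, (40, 40)), filts)
```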

Morpho

With Morpho we denote a set of morphological descriptors and intensity-based features computed on a collection of preprocessed images derived from the original image [23].

SIFT

Scale invariant feature transform (SIFT) [29] is an image descriptor associated with a selected image point. SIFT is computed as a set of histograms over a given region: the gradient locations (quantized into a 4×4 location grid) and orientations (quantized into 8 values) are weighted by the gradient magnitude and by a Gaussian window superimposed over the region. In this work, we use a grid sampling strategy with a spacing of 6 pixels and patch sizes for extracting the SIFT features of 8 and 16 pixels.

SID

Scale-invariant signal descriptors (SID), introduced in [30], provide a method for obtaining scale invariance without scale selection. The method is based on a logarithmic sampling coupled with multi-scale signal processing; scale invariance is obtained by computing the Fourier transform magnitude. In this work, we use the default parameter settings: sampling with a spacing of 5 pixels and patch sizes from 6 to 30 pixels.

BoF

Bag of Features (BoF) [31] is based on the construction of a vocabulary of feature descriptors, used to describe a whole image. The BoF procedure implemented in this work is based on the following steps:

• CODEBOOK CREATION

o Patch extraction: in this work, BoF is coupled with SIFT or SID for extracting features. Each image is divided into five overlapping sub-windows of dimension equal to a quarter of the whole image, and a set of descriptors is extracted from each sub-window. It is a pyramidal approach [32], although from each sub-window a different codebook is extracted and used to train a different SVM; the SVMs are then combined by sum rule.

o Clustering: a different set of textons is created for each class, one for each subgroup of N images (N = 50) of the training set (due to computational issues). Each texton vocabulary is built by clustering descriptors with k-means, where the number of clusters k is set to 25. We ran tests with k = 10:5:40; the lowest performance was obtained with k = {10, 15, 20}, while similar performance was obtained for all k > 20. To reduce the computation time we chose k = 25.

• CODEBOOK ASSIGNMENT

o Patch extraction: the same feature extraction approach is used to obtain a set of descriptors from the query image.

o Assignment: each extracted descriptor is assigned to one element of the codebook according to the minimum Euclidean distance criterion.
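The codebook creation and assignment steps above can be sketched as follows, with random vectors standing in for the dense SIFT/SID descriptors (a minimal Lloyd's k-means rather than a production clustering routine).

```python
import numpy as np

def build_codebook(descriptors, k=25, iters=20, seed=0):
    """CODEBOOK CREATION sketch: plain k-means (Lloyd's algorithm) over the
    local descriptors; the k cluster centers are the textons."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bof_histogram(descriptors, codebook):
    """CODEBOOK ASSIGNMENT sketch: assign each descriptor to the nearest
    texton (minimum Euclidean distance) and return the normalized histogram."""
    d = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook))
    return hist / hist.sum()

rng = np.random.default_rng(4)
train_desc = rng.random((500, 16))      # stand-in for dense SIFT/SID descriptors
codebook = build_codebook(train_desc, k=25)
h = bof_histogram(rng.random((200, 16)), codebook)
```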

Proposed ensemble of dense descriptors

When the weighted sum rule is used, the weights are obtained (for performance maximization) in each fold of the 10-fold cross-validation, with an internal 5-fold cross-validation performed only on the training data of that fold. Since the same weights were obtained in each fold, we directly report those values, e.g. as in ERP.
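The weight selection can be sketched as a grid search over fusion weights for two score sources; a single validation split stands in here for the paper's internal 5-fold cross-validation, and the score matrices are synthetic stand-ins.

```python
import numpy as np

def select_weight(scores_a, scores_b, y, grid=np.linspace(0, 1, 11)):
    """Pick the weight w maximizing the accuracy of the weighted sum rule
    w*scores_a + (1-w)*scores_b on validation data."""
    best_w, best_acc = 0.0, -1.0
    for w in grid:
        acc = ((w * scores_a + (1 - w) * scores_b).argmax(axis=1) == y).mean()
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

rng = np.random.default_rng(5)
y = rng.integers(0, 6, 300)                        # six HEp-2 classes
onehot = np.eye(6)[y]
scores_a = onehot + 2.5 * rng.random((300, 6))     # noisier stand-in classifier
scores_b = onehot + 1.5 * rng.random((300, 6))     # less noisy stand-in
w, acc = select_weight(scores_a, scores_b, y)
```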

EPP

In the Ensemble based on Parameter Perturbation (EPP) we tested different configurations of LPQ [26]. We show that it is possible to design an effective ensemble by combining sets of LPQ features extracted by varying the parameters r (the neighborhood size, r ∈ {1, 3, 5}), a (the scalar frequency, a ∈ {0.8, 1, 1.2, 1.4, 1.6}) and ρ (the correlation coefficient between adjacent pixel values, ρ ∈ {0.75, 0.95, 1.15, 1.35, 1.55, 1.75, 1.95}). We used the same sets proposed by Paci et al. [33] to avoid data overfitting. Each LPQ variant was used to train a different classifier, and all the classifiers were eventually combined by sum rule. Moreover, in EPP, we tested a modified version of LPQ based on ternary coding [33].
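The ensemble structure can be sketched as one classifier per parameter configuration, fused by sum rule. Random matrices stand in for the per-member SVM scores; only the enumeration and fusion logic is shown.

```python
import numpy as np
from itertools import product

# EPP sketch: one ensemble member per (r, a, rho) LPQ configuration.
R_SET = [1, 3, 5]
A_SET = [0.8, 1.0, 1.2, 1.4, 1.6]
RHO_SET = [0.75, 0.95, 1.15, 1.35, 1.55, 1.75, 1.95]

def sum_rule(score_list):
    """Fuse per-classifier score matrices (samples x classes) by summing."""
    return np.sum(score_list, axis=0)

rng = np.random.default_rng(6)
configs = list(product(R_SET, A_SET, RHO_SET))   # 3 * 5 * 7 = 105 members
# stand-in scores: one (samples x classes) matrix per trained member
scores = [rng.random((10, 6)) for _ in configs]
fused = sum_rule(scores)
pred = fused.argmax(axis=1)                      # final class decision
```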

ERP

Ensemble based on Region Perturbation (ERP) [34] is another approach to create ensembles, which extracts textural features from different regions of an image. The idea was introduced by Abdesselam et al. [35] to combine texture descriptors (e.g. LBP and LTP) extracted from the edge and non-edge regions of an image. It is supported by the fact that the most observed locations in an image/scene are those featuring the highest spatial frequency, e.g. edges. The general method consists in:

1. Dividing the image into salient regions by means of a segmentation algorithm.

2. Extracting textural features from each region.

3. Combining the textural information extracted from the different regions. In Abdesselam et al. [35], region-based histograms (summarizing the texture information) were concatenated into a comprehensive feature vector. Here, we use each region's feature set to train a single SVM; all the SVMs are eventually fused by sum rule.

In this paper we test only one descriptor, RIC-LBP, with two different segmentation algorithms:

• Edge (Ed): by means of a Sobel filter, edges are detected in the original image, which is then divided into the edge and the non-edge regions; hence two SVMs are trained and their scores are combined by sum rule.

• Wavelet (Wa): by means of four wavelets, the original image is decomposed into its horizontal, vertical and diagonal coefficient matrices, as in Daubechies [36]. Each matrix is resized to match the original image size and its local mean value mv is computed. The original image is eventually thresholded into two regions containing the pixels with values greater and smaller than mv, respectively. In this way six descriptors are extracted from each image; hence, six SVMs are trained and their scores are combined by sum rule.

The final score of ERP is given by the following weighted sum rule: the SVM trained using the features extracted from the original image (with weight 2) + Ed (with weight 1) + Wa (with weight 1). Before the fusion, the scores of each method (e.g. Ed, Wa) were normalized to mean 0 and standard deviation 1.
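The final fusion rule can be sketched as follows, with random matrices standing in for the per-region SVM scores (2 for Ed, 6 for Wa); only the normalization and weighting logic is shown.

```python
import numpy as np

def znorm(s):
    """Normalize a score matrix to mean 0, standard deviation 1."""
    return (s - s.mean()) / s.std()

def erp_fusion(orig, ed_scores, wa_scores):
    """ERP final rule: 2*original + 1*Ed + 1*Wa, where Ed and Wa are each
    the sum-rule fusion of their region SVMs, z-normalized before fusion."""
    ed = znorm(sum(ed_scores))        # sum rule over the 2 edge-region SVMs
    wa = znorm(sum(wa_scores))        # sum rule over the 6 wavelet-region SVMs
    return 2 * znorm(orig) + ed + wa

rng = np.random.default_rng(7)
orig = rng.random((20, 6))                        # stand-in SVM scores
ed = [rng.random((20, 6)) for _ in range(2)]
wa = [rng.random((20, 6)) for _ in range(6)]
final = erp_fusion(orig, ed, wa)
pred = final.argmax(axis=1)
```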

The proposed approach

In Figure 1, a scheme of the proposed approach is shown. Each input image is first processed by two image enhancing methods in order to improve its quality. Afterwards, several poses are generated in order to increase the size of the training set. For BoF, an additional sub-windowing step is required. The feature extraction step is performed on the original enhanced image and on all the artificial poses, in order to obtain several feature vectors. Each vector is classified by a different SVM and the final decision is obtained as the result of the fusion by sum rule of all the classifier responses.

Preprocessing

The HEp-2 dataset contains images with strong cell intensity differences; to improve the quality of the images, we used two preprocessing approaches:

1. The contraharmonic mean filter (CHM) was used to remove possible noise present in the image Iorig while preserving the cell and organelle edge features. CHM (Eq. 2) behaves as a high-pass filter if the order Q > 0 and as a low-pass filter if Q < 0:

ICHM(x, y) = Σ_{(s,t)∈Sxy} Iorig(s, t)^(Q+1) / Σ_{(s,t)∈Sxy} Iorig(s, t)^Q    (2)

where Sxy is the filter window centered at (x, y).
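The contraharmonic mean filter of Eq. 2 can be sketched directly; the window size and Q value below are illustrative, not the paper's settings.

```python
import numpy as np

def chm_filter(img, size=3, Q=1.5):
    """Contraharmonic mean filter: for each pixel, the ratio of the sums of
    g^(Q+1) and g^Q over a size x size window centered on it. Positive Q
    removes pepper-like noise, negative Q removes salt-like noise."""
    g = img.astype(float) + 1e-12          # avoid 0 raised to a negative power
    pad = size // 2
    gp = np.pad(g, pad, mode='edge')
    win = np.lib.stride_tricks.sliding_window_view(gp, (size, size))
    num = (win ** (Q + 1)).sum(axis=(2, 3))
    den = (win ** Q).sum(axis=(2, 3))
    return num / den

img = np.random.default_rng(8).integers(0, 256, (16, 16))
out = chm_filter(img, size=3, Q=1.5)
```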