The Use of Biplot Analysis and Euclidean Distance with Procrustes Measure for Outliers Detection
An outlier is an object whose characteristics differ markedly from those of other objects. Outliers need to be detected to avoid errors in data-driven decision-making, and also to determine the cause and meaning of their difference from the other objects. Two established methods for outlier detection are the Minimum Covariance Determinant (MCD) and the Fast Minimum Covariance Determinant (FMCD). Unfortunately, MCD and FMCD require lengthy iteration, which makes it difficult to detect outliers in large data sets. In this work, we introduce alternative methods for detecting outliers, namely direct and indirect biplot analyses and direct and indirect Euclidean distances, and we assess the effectiveness of each method using the Procrustes measure. The larger the Procrustes measure, the better the method is at detecting outliers. Six generated data sets are used in the simulation analysis, with characteristics including one-group data, one-group data with an outlier, one-group data with top and bottom outliers, two-group data, and three-group data. The simulation results indicate that MCD, FMCD, indirect biplot, and indirect Euclidean distance cannot detect outliers in grouped data. As an application, we detect outliers in welfare data of the Indonesian people based on seven indicators. All methods identify the provinces of Papua and DKI Jakarta as outliers. Further analysis reveals that direct Euclidean, indirect Euclidean, and indirect biplot are the best methods; of these, direct Euclidean is the simplest.
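As a rough illustration of distance-based outlier detection, the sketch below flags objects that lie unusually far from the data centroid in Euclidean distance. The abstract does not define the direct and indirect Euclidean variants, so this is only one plausible reading and not the authors' procedure. The function names, the median-plus-MAD cutoff, the parameter `k`, and the toy data are all assumptions made for the example.

```python
import math
from statistics import mean, median


def centroid_distances(rows):
    """Euclidean distance of each object (row) from the column-wise mean."""
    centroid = [mean(col) for col in zip(*rows)]
    return [math.dist(row, centroid) for row in rows]


def flag_outliers(rows, k=3.0):
    """Return indices of rows whose distance exceeds median + k * MAD.

    The cutoff rule and the default k are hypothetical choices for this
    sketch; the paper may use a different criterion.
    """
    d = centroid_distances(rows)
    med = median(d)
    mad = median(abs(x - med) for x in d)  # median absolute deviation
    cutoff = med + k * mad
    return [i for i, x in enumerate(d) if x > cutoff]


# Toy data (made up): four similar objects and one distant object.
data = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.0), (8.0, 9.0)]
print(flag_outliers(data))
```

With indicators measured on different scales, such as the seven welfare indicators, the columns would normally be standardized before computing distances so that no single indicator dominates.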
Copyright (c) 2018 International Journal of Engineering and Management Research
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.