Samir Brahim Belhaouari

Associate Professor, Hamad Bin Khalifa University

sbelhaouari [AT] hbku.edu.qa

Unsupervised outlier detection in multidimensional data

Efficient and Robust Outlier Detection Using Unidimensional Distance Space and Joint Probability Density Estimation

Abstract. Detecting and removing outliers in high-dimensional data is critical for robust machine learning performance. This paper introduces novel unsupervised statistical methods for outlier detection that exploit data compactness and a transformation to a unidimensional distance space using D-k-NN. The \(N\)-dimensional data are mapped to a distance vector, \(d_k \in \mathbb{R}\), whose lower and upper extreme values are calculated from its quartiles as \(LE_{d_k} = Q1_{d_k} - c_1(Q2_{d_k} - Q1_{d_k})\) and \(UE_{d_k} = Q3_{d_k} + c_2(Q3_{d_k} - Q2_{d_k})\) to optimize outlier identification. A joint probability density estimate based on normal distributions is also used, with \(\zeta = \beta Q3_{d_k}\) and \(f(x,y) = \frac{1}{2\pi \zeta I} e^{-\left( \frac{(x-x_i)^2 + (y-y_i)^2}{2\zeta^2} \right)}\). A comprehensive performance analysis on benchmark datasets shows that the proposed methods outperform state-of-the-art methods in scenarios with non-normal or mixed noise distributions.
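The quartile-based extreme values in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The sketch assumes that the D-k-NN distance of a point is the Euclidean distance to its k-th nearest neighbour, computed by brute force, and that only distances above the upper extreme are flagged. The function names (`knn_distance`, `extreme_bounds`, `flag_outliers`) and the default values k = 3 and c_1 = c_2 = 1.5 are hypothetical choices made for illustration.

```python
import numpy as np

def knn_distance(X, k):
    """Euclidean distance from each row of X to its k-th nearest neighbour.

    Brute-force O(n^2) computation; an assumption of this sketch, not
    necessarily how D-k-NN is computed in the paper.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(dist, axis=1)[:, k]

def extreme_bounds(d_k, c1=1.5, c2=1.5):
    """Lower and upper extremes LE and UE from the quartiles of d_k.

    LE = Q1 - c1 * (Q2 - Q1),  UE = Q3 + c2 * (Q3 - Q2).
    The default values of c1 and c2 are illustrative assumptions.
    """
    q1, q2, q3 = np.percentile(d_k, [25, 50, 75])
    lower = q1 - c1 * (q2 - q1)
    upper = q3 + c2 * (q3 - q2)
    return lower, upper

def flag_outliers(X, k=3, c1=1.5, c2=1.5):
    """Boolean mask of points whose k-NN distance exceeds the upper extreme.

    Flagging only the upper side is a choice made in this sketch.
    """
    d_k = knn_distance(X, k)
    _, upper = extreme_bounds(d_k, c1, c2)
    return d_k > upper

if __name__ == "__main__":
    # Eight evenly spaced points on a line plus one distant point.
    X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [50]], dtype=float)
    print(flag_outliers(X, k=1))
```

In this toy example every point except the last has a nearest-neighbour distance of 1, so all three quartiles equal 1, the upper extreme is 1, and only the distant point at 50 is flagged. Unequal values of c_1 and c_2 let the two bounds respond differently to skewed distance distributions.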
    

Illustration of the proposed experimental design.