Exploring The Hubness-Related Properties of Oceanographic Sensor Data
This publication was submitted to the Conference on Data Mining and Data Warehouses (SiKDD 2011), 10 October 2011, Ljubljana, Slovenia.
In this paper, we examine how the high dimensionality of oceanographic sensor data affects the potential use of nearest-neighbor machine learning methods. We focus on one particular consequence of the curse of dimensionality: hubness. We examine the hubness of oceanographic data and show how it can be used to visualize and detect both prototypical sensors/locations and ambiguous, potentially erroneous ones. We then define an easy classification problem on the data and show that recently developed hubness-aware classification methods may help to overcome some of the hubness-related issues in sensor data.
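
For readers unfamiliar with the term, hubness is usually quantified through the k-occurrence count of each point, that is, how many times it appears in the k-nearest-neighbor lists of other points, together with the skewness of that count distribution. The following is a minimal sketch of that measurement, not the authors' implementation: the function names `k_occurrences` and `skewness` are hypothetical, Euclidean distance is an assumption, and the data is synthetic Gaussian noise standing in for sensor feature vectors.

```python
import numpy as np

def k_occurrences(X, k=5):
    """Count how often each row of X appears among the k nearest
    neighbors of the other rows (Euclidean distance assumed)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest rows
    return np.bincount(knn.ravel(), minlength=n)

def skewness(counts):
    """Third standardized moment of the k-occurrence distribution;
    strongly positive values indicate the presence of hubs."""
    c = np.asarray(counts, dtype=float)
    std = c.std()
    if std == 0:
        return 0.0
    return float(np.mean((c - c.mean()) ** 3) / std ** 3)

# Synthetic stand-in for sensor data: 500 points in 100 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))
counts = k_occurrences(X, k=5)
hub_skew = skewness(counts)

# Points with the highest counts are hub candidates; points with a
# count of zero never appear in any neighbor list.
hub_candidates = np.argsort(counts)[::-1][:10]
never_neighbors = np.flatnonzero(counts == 0)
```

Under these assumptions, `hub_candidates` lists the rows that occur most often as neighbors, and `never_neighbors` lists those that occur in no neighbor list at all. Whether a given hub is a good prototype or a source of misclassification depends on the labels of the points that reference it, which this sketch does not examine.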