In a recent study, researchers attempted to cluster countries by their performance in several categories, with performance measured as a percentage or fraction. Because the category labels are categorical and the scores are numerical, the dataset mixes both data types. The goal was to group countries that are similar across all categories combined. The researchers one-hot encoded the performance categories and ran K-Means with K = 3, aiming for High, Medium, and Low performance clusters. The results did not meet expectations: 33% of countries, China among them, ended up in more than one cluster at once because their performance varied across categories. K-Means assigns each input row to exactly one cluster, so this outcome suggests the data contained one row per country and category rather than one row per country. The result illustrates how difficult clustering becomes with mixed data types and points to the need for techniques that handle such data directly.
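One possible way to guarantee a single cluster per country is to reshape the data so that each country becomes one row and each category becomes a numeric column, then cluster those rows. The sketch below is not the method used in the study. It assumes a long-format input of (country, category, score) rows with no missing values, and every country name, category name, and score in it is invented for illustration. The helpers `pivot_wide` and `kmeans` are hypothetical names. The K-Means loop is a plain, standard-library version of Lloyd's algorithm, written out only to keep the example self-contained; in practice a library implementation would normally be used.

```python
from collections import defaultdict
from math import dist

# Hypothetical long-format rows: (country, category, score as a fraction).
# All names and values are invented for illustration only.
rows = [
    ("A", "health", 0.90), ("A", "education", 0.85), ("A", "economy", 0.95),
    ("B", "health", 0.88), ("B", "education", 0.92), ("B", "economy", 0.80),
    ("C", "health", 0.55), ("C", "education", 0.50), ("C", "economy", 0.60),
    ("D", "health", 0.45), ("D", "education", 0.58), ("D", "economy", 0.52),
    ("E", "health", 0.15), ("E", "education", 0.20), ("E", "economy", 0.10),
    ("F", "health", 0.22), ("F", "education", 0.12), ("F", "economy", 0.18),
]


def pivot_wide(rows):
    """Turn (country, category, score) rows into one score vector per country."""
    categories = sorted({cat for _, cat, _ in rows})
    table = defaultdict(dict)
    for country, cat, score in rows:
        table[country][cat] = score
    # Assumes every country has a score in every category (no missing values).
    return {c: [scores[cat] for cat in categories] for c, scores in table.items()}


def kmeans(vectors, k=3, iterations=100):
    """Plain Lloyd's algorithm for k >= 2; returns {name: cluster index}."""
    names = sorted(vectors, key=lambda n: sum(vectors[n]))
    # Deterministic start: seed centroids spread from lowest to highest total score.
    centroids = [list(vectors[names[i * (len(names) - 1) // (k - 1)]]) for i in range(k)]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each country goes to its nearest centroid.
        new_labels = {
            n: min(range(k), key=lambda j: dist(vectors[n], centroids[j]))
            for n in names
        }
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [vectors[n] for n in names if labels[n] == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


wide = pivot_wide(rows)
labels = kmeans(wide, k=3)
for country in sorted(labels):
    print(country, labels[country])
```

Because each country contributes exactly one vector, it receives exactly one label. The category names become column positions, so no one-hot encoding is needed in this layout. Whether three clusters correspond to High, Medium, and Low still depends on the data, and scores on different scales would need to be normalized first.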
Source: stackoverflow.com
