40 Features, 5 Dimensions: The Superposition Dilemma in Neural Networks

In an ideal neural network, each neuron activates for exactly one feature. In practice, models often have fewer dimensions than the features they need to represent, which forces superposition: a single neuron encodes several unrelated features. In InceptionV1, for example, one neuron responds to cat faces, car fronts, and cat legs simultaneously. Superposition complicates model explainability, especially in deep networks where individual neurons encode complex patterns. A study using a synthetic dataset with 40 features compressed into 5 dimensions showed that ReLU-based models form superposition far more readily than purely linear models, and increasingly so as feature sparsity rises. At a sparsity of 0.9, the off-diagonal elements in the weight matrix visualization grew significantly larger, indicating that many features share the same embedding directions and interfere with one another. This suggests that sparsity and non-linearity together determine how neural networks pack many features into few dimensions.
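A minimal sketch of such a toy setup helps make the experiment concrete. The details below are assumptions for illustration, not the study's exact recipe: uniform feature values, an MSE reconstruction loss, hand-derived gradients, and a reconstruction of the form ReLU(x·WᵀW + b), where the off-diagonal entries of WᵀW reveal features sharing directions of the 5-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, N_DIMS = 40, 5  # 40 features squeezed into 5 dimensions
SPARSITY = 0.9              # each feature is zero with probability 0.9

def sample_batch(batch_size):
    """Sparse synthetic features: each is active with prob (1 - SPARSITY)."""
    x = rng.uniform(0.0, 1.0, size=(batch_size, N_FEATURES))
    mask = rng.uniform(size=x.shape) < (1.0 - SPARSITY)
    return x * mask

# Toy model (assumed form): compress with W, reconstruct with W^T, then ReLU.
#   x_hat = ReLU(x @ W^T @ W + b)
W = rng.normal(scale=0.1, size=(N_DIMS, N_FEATURES))
b = np.zeros(N_FEATURES)

def train_step(x, W, b, lr=0.1):
    z = x @ W.T @ W + b               # linear reconstruction through 5 dims
    x_hat = np.maximum(0.0, z)        # ReLU non-linearity
    loss = np.mean((x_hat - x) ** 2)
    # Manual backprop through ReLU(x @ G + b), where G = W^T W.
    d_z = 2.0 * (x_hat - x) * (z > 0) / x.size
    d_G = x.T @ d_z                   # gradient w.r.t. G
    d_W = W @ (d_G + d_G.T)           # chain rule: G appears as W^T W
    d_b = d_z.sum(axis=0)
    return loss, W - lr * d_W, b - lr * d_b

losses = []
for _ in range(1000):
    loss, W, b = train_step(sample_batch(256), W, b)
    losses.append(loss)

# Superposition shows up as large off-diagonal entries of W^T W:
# distinct features occupying the same directions of the 5-dim space.
gram = W.T @ W
off_diag = np.abs(gram - np.diag(np.diag(gram)))
print(f"final loss: {losses[-1]:.4f}, mean |off-diagonal|: {off_diag.mean():.4f}")
```

Raising `SPARSITY` toward 1.0 makes features co-occur less often, which lowers the cost of sharing directions; the paragraph above reports exactly this effect at sparsity 0.9.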

Source: towardsdatascience.com