Sean Pedersen

On the Structure of Neural Embeddings

A small collection of insights on the structure of embeddings (latent spaces) produced by deep neural networks.

Manifold Hypothesis: High-dimensional data sampled from natural (real-world) processes lies on (or near) a low-dimensional manifold embedded in the high-dimensional ambient space.
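
A rough way to probe this is to ask how many directions are needed to explain most of the variance in the data. Here is a minimal sketch on synthetic data whose intrinsic dimension is 2 by construction; PCA only gives a linear estimate, but it illustrates the gap between ambient and intrinsic dimension (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a 2-D latent process observed through a random linear map
# into 100 ambient dimensions, plus a little measurement noise.
n, ambient_dim, intrinsic_dim = 2000, 100, 2
latent = rng.uniform(-1, 1, size=(n, intrinsic_dim))
X = latent @ rng.normal(size=(intrinsic_dim, ambient_dim))
X += 0.01 * rng.normal(size=(n, ambient_dim))

# PCA via SVD: how many directions explain 95% of the variance?
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = np.cumsum(s**2) / np.sum(s**2)
print("directions for 95% variance:", int(np.searchsorted(explained, 0.95)) + 1)
```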

Hierarchical Organization: Features organize hierarchically across layers - earlier layers capture low-level (small context) features while deeper layers represent increasingly abstract (large context) concepts.
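
One way to inspect this empirically is to capture activations at different depths with forward hooks and compare them. A minimal PyTorch sketch, using a small untrained stand-in model (the layer indices and names are purely illustrative):

```python
import torch
import torch.nn as nn

# Illustrative model: a tiny conv net; any trained vision model works the same way.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # early layer: local, low-level features
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # late layer: global, abstract features
    nn.Linear(32, 10),
)

activations = {}

def capture(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hooks on an early and a late layer to compare their embeddings.
model[0].register_forward_hook(capture("early_conv"))
model[5].register_forward_hook(capture("pooled_features"))

x = torch.randn(4, 3, 32, 32)          # dummy batch of images
_ = model(x)
for name, act in activations.items():
    print(name, tuple(act.shape))
```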

Linear Hypothesis: Neural networks represent features as directions in their activation space, so a feature's presence and strength can be read out with a simple projection (dot product).
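
A common way to test this is a difference-of-means probe: estimate a feature's direction from embeddings with and without the feature, then read the feature out as a dot product. A minimal sketch on synthetic embeddings (the "hidden feature" setup and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # embedding dimension

# Synthetic embeddings: a hidden binary feature shifts activations along one
# fixed direction (unknown to the probe) on top of isotropic noise.
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)
pos = rng.normal(size=(200, d)) + 3.0 * true_direction  # feature present
neg = rng.normal(size=(200, d))                         # feature absent

# Estimate the feature direction as the difference of class means.
direction = pos.mean(axis=0) - neg.mean(axis=0)
direction /= np.linalg.norm(direction)

# If the hypothesis holds, a single dot product reads the feature out.
print("mean score (feature present):", round(float((pos @ direction).mean()), 2))
print("mean score (feature absent): ", round(float((neg @ direction).mean()), 2))
print("cosine with true direction:  ", round(float(direction @ true_direction), 3))
```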

Superposition Hypothesis: Neural nets can represent more “independent” features than a layer has neurons (dimensions) by encoding each feature as a linear combination of neurons; since nearly orthogonal directions are plentiful in high dimensions, many sparse features can coexist with only mild interference.
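
A toy sketch in the spirit of Anthropic's "toy models of superposition": assign each of many sparse features a random direction in a smaller layer and read them back by projection. Everything below is synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims = 300, 100   # more features than neurons in the layer

# Assign each feature a random unit direction; in high dimensions random
# directions are nearly orthogonal, so interference stays small.
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse feature vector: only a few features are active at once.
f = np.zeros(n_features)
active = rng.choice(n_features, size=3, replace=False)
f[active] = 1.0

x = f @ W          # superposed representation in the 100-dim layer
recovered = W @ x  # read each feature back via its own direction

# Active features carry most of the signal; inactive ones only pick up interference.
print("values at active features:   ", recovered[active].round(2))
print("largest value among inactive:", round(float(np.delete(recovered, active).max()), 2))
```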

Universality Hypothesis: Similar features and circuits reappear across different models trained on the same (or similar) data.
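
Universality is usually studied by comparing features or circuits directly, but a common quantitative proxy for representational similarity is linear CKA (Kornblith et al., 2019). A minimal sketch with synthetic activation matrices standing in for two models' embeddings of the same inputs:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Stand-in activations; in practice X and Y would be embeddings from two
# different models run on the same inputs.
rng = np.random.default_rng(0)
shared = rng.normal(size=(1000, 32))        # shared underlying factors
X = shared @ rng.normal(size=(32, 256))     # "model A" embedding of them
Y = shared @ rng.normal(size=(32, 512))     # "model B" embedding of them
Z = rng.normal(size=(1000, 512))            # unrelated representation

print("CKA(model A, model B):", round(linear_cka(X, Y), 3))   # high
print("CKA(model A, random): ", round(linear_cka(X, Z), 3))   # near zero
```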

Adversarial Vulnerability: Small changes in input space can cause large shifts in embedding space, and therefore in the predictions made from it, suggesting that the learned manifolds have irregular geometric properties.
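
The classic construction here is the Fast Gradient Sign Method (Goodfellow et al.): perturb the input by a small step along the sign of the loss gradient. A minimal PyTorch sketch with an untrained stand-in classifier and a made-up input; with a trained model, such a tiny step often flips the prediction:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier; in practice this would be a trained model.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28)            # an input "image"
y = torch.tensor([3])                   # its (assumed) true label
x.requires_grad_(True)

# FGSM: step in the direction that maximally increases the loss.
loss = loss_fn(model(x), y)
loss.backward()
epsilon = 0.05                          # small per-pixel perturbation budget
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

with torch.no_grad():
    print("prediction before:", model(x).argmax(dim=1).item())
    print("prediction after: ", model(x_adv).argmax(dim=1).item())
    # Shift of the penultimate-layer embedding caused by the tiny input change.
    print("embedding shift:  ", (model[:-1](x_adv) - model[:-1](x)).norm().item())
```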

Neural Collapse: After extensive training, class features in the final layer cluster tightly around their means, with the network's classification weights aligning with these mean directions. Within-class variation becomes minimal compared to between-class differences, effectively creating distinct, well-separated clusters for each class.
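
One simple diagnostic from the neural-collapse literature is the ratio of within-class to between-class feature scatter, which shrinks toward zero as collapse sets in. A sketch with synthetic features standing in for a trained network's last-layer embeddings (the cluster geometry is constructed to mimic the collapsed regime):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, per_class, d = 10, 100, 64

# Synthetic last-layer features: tight clusters around well-separated class means.
class_means = 5.0 * rng.normal(size=(n_classes, d))
features = np.concatenate(
    [mu + 0.1 * rng.normal(size=(per_class, d)) for mu in class_means]
)
labels = np.repeat(np.arange(n_classes), per_class)

global_mean = features.mean(axis=0)
within, between = np.zeros((d, d)), np.zeros((d, d))
for c in range(n_classes):
    fc = features[labels == c]
    mu_c = fc.mean(axis=0)
    within += (fc - mu_c).T @ (fc - mu_c) / len(fc)
    between += np.outer(mu_c - global_mean, mu_c - global_mean)
within /= n_classes
between /= n_classes

# Collapse diagnostic: within-class scatter becomes tiny relative to between-class scatter.
print("trace(Sigma_W) / trace(Sigma_B):", round(np.trace(within) / np.trace(between), 4))
```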