Sean Pedersen

Recently I came across an interesting phenomenon in the CLIP text encoder's embedding space: yet another gap, which I call the semantic complexity gap. While playing around with embeddings of words and sentences, I noticed a pattern: atomic words occupy the same cluster as single-concept sentences, while multi-concept sentences form a distinct cluster of their own.

Single- vs Multi-Concept Sentences

By single-concept sentences I mean sentences that combine related concepts which likely co-occur in the training data. By multi-concept sentences I mean sentences that combine unrelated concepts which are unlikely to appear together in the training data.

A list of single-concept sentences:

A list of multi-concept sentences:

Experimental Setup

2D projection of Atomic words + Single-concept sentences

[Figure: 2D projection of atomic words + single-concept sentences]

The single-concept sentences spread out nicely across the big cluster of atomic words, occupying the same latent sub-space.

2D projection of Atomic words + Multi-concept sentences

[Figure: 2D projection of atomic words + multi-concept sentences]

The multi-concept sentences, on the other hand, form a tight, separate cluster that does not spread out among the atomic words.
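
The full setup lives in the notebook linked below under "Show me the code". As a rough, hedged sketch of the pipeline (the checkpoint, the placeholder texts, and the PCA projection here are my assumptions, not necessarily what the notebook uses), one can embed the texts with a pretrained CLIP text encoder and project the embeddings to 2D:

```python
# Minimal sketch (not the author's notebook): embed texts with CLIP's text
# encoder and project the embeddings to 2D. The model checkpoint, example
# texts, and PCA projection are assumptions; the linked notebook may differ.
import torch
from transformers import CLIPModel, CLIPTokenizer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

atomic_words = ["dog", "car", "tree", "ocean", "violin"]           # placeholder examples
single_concept = ["a dog chasing a ball in the park"]              # placeholder example
multi_concept = ["a violin dissolving into ocean waves of code"]   # placeholder example
texts = atomic_words + single_concept + multi_concept

with torch.no_grad():
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    emb = model.get_text_features(**inputs)        # shape: (n_texts, 512)
    emb = emb / emb.norm(dim=-1, keepdim=True)     # unit-normalize, as CLIP does for similarity

# Project to 2D for visualization
coords = PCA(n_components=2).fit_transform(emb.numpy())
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), t in zip(coords, texts):
    plt.annotate(t, (x, y), fontsize=7)
plt.show()
```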

How does this relate to the Linear Hypothesis?

The distribution of the single-concept sentences is nicely explained by the linear hypothesis, which states that neural networks learn to organize semantic concepts in a way that makes them linearly separable and that meaningful semantic transformations can often be represented as linear operations in the embedding space.
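
To make the "linear operations" part of the hypothesis concrete, here is a hedged word-analogy-style probe on CLIP text embeddings: compose embeddings by vector arithmetic and check which candidate the result lands nearest to. The checkpoint and candidate words are assumptions chosen for illustration, and CLIP is not guaranteed to pass such probes.

```python
# Illustrative probe of linear semantic operations in CLIP's text embedding
# space (word-analogy style). Not a claim about what CLIP guarantees; results
# can vary by model and prompt.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    with torch.no_grad():
        inputs = tokenizer(texts, padding=True, return_tensors="pt")
        e = model.get_text_features(**inputs)
        return e / e.norm(dim=-1, keepdim=True)

king, man, woman = embed(["king", "man", "woman"])
composed = king - man + woman
composed = composed / composed.norm()

candidates = ["queen", "prince", "castle", "woman"]
sims = embed(candidates) @ composed  # cosine similarity to the composed vector
for c, s in sorted(zip(candidates, sims.tolist()), key=lambda x: -x[1]):
    print(f"{c}: {s:.3f}")
```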

The distribution of the multi-concept (unrelated) sentences may hint at the limits of the linear hypothesis, namely that it does not hold for out-of-distribution samples.

Show me the code

Here is my code (Jupyter Notebook): https://github.com/SeanPedersen/semantic-complexity-gap/blob/main/CLIP_Concept_Gap.ipynb

Open Questions

References

#machine-learning