
Semantic Complexity Gap

Recently I came across an interesting phenomenon in the CLIP text encoder embedding space: yet another gap, which I call the semantic complexity gap. I was playing around with embeddings of words and sentences when I noticed a pattern: atomic words occupy the same cluster as single-concept sentences, while multi-concept sentences form a distinct cluster of their own.

Single- vs Multi-Concept Sentences

By single-concept sentences I mean sentences that bring together related concepts which likely co-occur in the training data. By multi-concept sentences I mean sentences that combine unrelated concepts which are unlikely to co-occur in the training data.

A list of single-concept sentences:

  • Love and friendship go hand in hand just like best friends.
  • The sun sets beautifully over the ocean, creating a perfect end to the day.
  • Music and art often inspire each other, leading to creative masterpieces.
  • Cooking is an art that brings people together around the dinner table.

A list of multi-concept sentences:

  • The cat danced with the moon while eating chocolate.
  • A guitar floated across the ocean, singing about a dragon.
  • The computer played a symphony for a mountain of pizza.
  • A sunflower rode a bicycle through a jungle of clouds.

Experimental Setup

  • Embed the 10,000 most common English words (atomic concepts)
  • Embed a curated list of multi-concept sentences (unrelated concepts)
  • Embed a curated list of single-concept sentences (related concepts)
  • Create a 2D projection of atomic words + multi-concept sentences
  • Create a 2D projection of atomic words + single-concept sentences (see the code sketch below)

2D Projection of Atomic Words + Single-Concept Sentences

[Figure: 2D projection of atomic words + single-concept sentences]

The single-concept sentences spread out nicely across the big cluster of atomic words, occupying the same latent sub-space.

2D Projection of Atomic Words + Multi-Concept Sentences

[Figure: 2D projection of atomic words + multi-concept sentences]

The multi-concept sentences, on the other hand, form a tight, separate cluster that does not mix with the atomic words.
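One way to sanity-check this beyond the 2D projection would be to measure the separation directly in the full embedding space, for example via each sentence's nearest atomic-word neighbor. This is only a suggestion of mine (not part of the notebook); it reuses the embed_texts helper sketched above, and the sentence lists are assumed to be the curated ones quoted earlier.

    import numpy as np

    def mean_nearest_word_similarity(sentence_embs, word_embs):
        """Average cosine similarity of each sentence to its nearest atomic word.

        Both arguments are L2-normalized embedding arrays, e.g. produced by the
        embed_texts() sketch above.
        """
        sims = sentence_embs @ word_embs.T          # pairwise cosine similarities
        return float(sims.max(axis=1).mean())       # nearest-word similarity, averaged

    # Expectation if the gap is real: single-concept sentences end up closer to
    # the word cluster than multi-concept sentences.
    single_embs = embed_texts(single_concept_sentences)  # hypothetical variables holding
    multi_embs = embed_texts(multi_concept_sentences)    # the curated sentence lists above
    print(mean_nearest_word_similarity(single_embs, word_embs))
    print(mean_nearest_word_similarity(multi_embs, word_embs))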

How does this relate to the Linear Hypothesis?

The distribution of the single-concept sentences is nicely explained by the linear hypothesis: neural networks learn to organize semantic concepts so that they become linearly separable, and meaningful semantic transformations can often be represented as linear operations in the embedding space.

The distribution of the multi-concept (unrelated) sentences may hint at the limits of the linear hypothesis, namely that it does not hold for out-of-distribution samples.
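A crude way to probe this linearity story would be to compare each sentence embedding with a simple linear combination (the mean) of its constituent word embeddings; under the linear hypothesis one would expect single-concept sentences to be better approximated this way than multi-concept ones. This probe is my own sketch, not something from the notebook, and it again reuses embed_texts from the code above.

    import re
    import numpy as np

    def bag_of_words_similarity(sentence):
        """Cosine similarity between a sentence embedding and the (re-normalized)
        mean of its word embeddings. Uses embed_texts() from the sketch above."""
        tokens = re.findall(r"[a-zA-Z]+", sentence.lower())
        word_embs = embed_texts(tokens)             # (n_words, 512), L2-normalized
        bag = word_embs.mean(axis=0)
        bag /= np.linalg.norm(bag)                  # re-normalize the linear combination
        sentence_emb = embed_texts([sentence])[0]
        return float(sentence_emb @ bag)

    # If linear structure holds in-distribution, the single-concept sentence should
    # score noticeably higher than the multi-concept one.
    print(bag_of_words_similarity("The sun sets beautifully over the ocean, creating a perfect end to the day."))
    print(bag_of_words_similarity("The cat danced with the moon while eating chocolate."))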

Show me the code

Here is my code (Jupyter Notebook): https://github.com/SeanPedersen/semantic-complexity-gap/blob/main/CLIP_Concept_Gap.ipynb

Open Questions

  • Is this just a random artifact or a novel phenomenon? (I did not put much time and rigor into this, so I am curious whether it holds up.)
  • Does this semantic complexity gap between related and unrelated concepts also occur for ImageNet classifiers (e.g. an image of a cat and a dog vs. an image of a cat and an airplane)?
