Real Use Cases for Cosine Similarity, Dot Product, and Euclidean Distance

Understanding cosine similarity, dot product, and Euclidean distance is much easier with real-world analogies. Each measure captures "similarity" or "distance" in its own way: direction only (cosine), direction plus magnitude (dot product), or straight-line distance (Euclidean). We explore two narrative-style scenarios for each measure, showing when to use it and why.

When dealing with vector similarity, it's easy to get lost in the math. Each measure is best understood through everyday scenarios, so the examples below pair a short story with a simple illustration and a small code sketch for each one.

  1. Cosine Similarity

What It Is: Cosine similarity focuses on the direction of two vectors rather than their magnitude. If two vectors are pointing in similar directions, they’ll have a high cosine similarity score—even if one is much longer than the other.

Example 1: Movie Night Matchmaking

Scenario: Alice has rated 50 movies on a streaming platform; Bob has only rated 5. Despite the difference in volume, the two share similar tastes in science fiction and historical dramas.

  • Why Cosine Similarity? Because Bob’s smaller number of rated movies shouldn’t penalize him if they happen to align perfectly in theme with Alice’s much larger set.

Illustration (Cosine Similarity in Movies):

┌───────────────────────────────────┐
│  Movie Genres Axis (Embeddings)   │
└───────────────────────────────────┘

        ↑ Alice's Vector A
        |
        |   ↗
        |  ↗  Bob's Vector B
        | ↗      (Despite fewer data points,
        |↗        the angle θ is what matters!)
        •
  • Notice how A and B make a small angle (θ). Cosine similarity doesn’t care that |A| may be much bigger; it cares how close the lines are in direction.
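In code, cosine similarity is just the normalized dot product. Here's a minimal NumPy sketch with made-up genre counts ([sci-fi, drama, comedy]), where Bob's vector is deliberately a scaled-down copy of Alice's:

    import numpy as np

    # Hypothetical genre counts: [sci-fi, drama, comedy].
    # Alice has rated many movies; Bob has rated only a few.
    alice = np.array([40.0, 25.0, 5.0])
    bob   = np.array([4.0, 2.5, 0.5])   # same direction, 1/10 the magnitude

    def cosine_similarity(a, b):
        # cos(θ) = (a · b) / (|a| |b|)
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine_similarity(alice, bob))  # ≈ 1.0 despite very different lengths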

Example 2: Resumes vs. Job Descriptions

Scenario: A job post detailing 20 required skills (Vector A) vs. a concise résumé with just 5 highlighted skills (Vector B). Both documents focus on data analysis, but one is significantly longer.

  • Why Cosine Similarity? Because we only care about overlapping skills, not the total length of the document. If the résumé’s 5 key skills match 5 of the job post’s primary skills, the angle is small, meaning high similarity.

Illustration (Matching Skills Dimension):

(Job Description)
┌────────────────┐
│   Skills A     │   A → → → → →
│  (long vector) │
└────────────────┘

(Résumé)
┌────────────────┐
│   Skills B     │   B → →
│ (short vector) │
└────────────────┘

Cosine similarity measures how closely
these two point in the same direction in
“skills-space,” ignoring that A is longer.
  • The résumé’s brevity doesn’t overshadow the fact that B points in the same direction as A for those crucial data analysis topics.
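A quick sketch of the same idea, using hypothetical binary skill vectors (1 = skill mentioned, 0 = not) over an assumed shared vocabulary of 20 skills:

    import numpy as np

    job = np.ones(20)        # job post lists all 20 skills
    resume = np.zeros(20)
    resume[:5] = 1.0         # résumé highlights 5 of those same skills

    cos = np.dot(job, resume) / (np.linalg.norm(job) * np.linalg.norm(resume))
    print(round(cos, 2))     # 0.5 = 5 / (√20 × √5); the length difference
                             # alone never drives the score to zero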


  2. Dot Product

What It Is: The dot product captures both the direction and magnitude of vectors. When two vectors are big and aligned, you get a large dot product.

Example 1: Personalized Product Picks

Scenario: Charlie loves camping gear—his preference vector (A) emphasizes outdoor-related categories. Two tents, X and Y, have different “magnitudes” in the camping dimension.

  • Why Dot Product? Because we care not just that both items are aligned with camping interest, but also how strongly they align. If Tent Y’s “camping factor” is huge, it will yield a larger dot product with Charlie’s vector.

Illustration (Higher Magnitude, Higher Match):

   Charlie's Preference (A)
   ┌─────────────┐
   │ Camping = 5 │
   │ Elec    = 1 │
   └─────────────┘
          •------------>  (high value in camping)

   Tent X (B₁): Camping = 3
   Tent Y (B₂): Camping = 7

   DotProduct(A, B₂) > DotProduct(A, B₁)
   because 5 × 7 > 5 × 3
  • Here, 5×7 = 35 is bigger than 5×3 = 15, so Tent Y gets a higher rank when recommended to Charlie.
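The ranking logic is a single multiply-and-sum per item. A minimal sketch, treating the values above as hypothetical [camping, electronics] vectors:

    import numpy as np

    charlie = np.array([5.0, 1.0])   # preference vector A
    tent_x  = np.array([3.0, 0.0])   # B₁: moderate camping factor
    tent_y  = np.array([7.0, 0.0])   # B₂: strong camping factor

    # The dot product rewards alignment *and* magnitude.
    print(np.dot(charlie, tent_x))   # 15.0
    print(np.dot(charlie, tent_y))   # 35.0 → Tent Y ranks higher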

Example 2: Trending News Articles

Scenario: Two articles, A1 and A2, both aligned to a user’s topic of interest (direction). A1 is extremely popular (high magnitude in “popularity” dimension), A2 is less so.

  • Why Dot Product? Because we want to factor in popularity as well as alignment. A1, with higher magnitude in the popularity axis, yields a larger product with the user’s interest vector.

Illustration (Article Popularity Dimension):

  User Interest Vector U
      ↑
      |  (Prefers renewable energy)
      |

  Article A1: 
    - Topic: Renewable Energy (aligned with U)
    - Popularity: Very High (big magnitude)

  Article A2:
    - Topic: Renewable Energy (aligned with U)
    - Popularity: Moderate

  DotProduct(U, A1) >> DotProduct(U, A2)
  because A1's vector is "bigger" in the 
  popularity dimension.
  • The final recommendation algorithm might rank A1 higher because it checks both alignment (topic match) and intensity (popularity).
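One common way to get this behavior (an assumption here, not the only possible design) is to scale a unit-length topic vector by a popularity score before taking the dot product:

    import numpy as np

    def embed(topic, popularity):
        # Hypothetical scheme: popular articles get longer vectors
        # pointing in the same topical direction.
        return popularity * topic / np.linalg.norm(topic)

    user = np.array([1.0, 0.0])              # interest: renewable energy
    a1 = embed(np.array([1.0, 0.0]), 100.0)  # very popular, aligned
    a2 = embed(np.array([1.0, 0.0]), 10.0)   # moderately popular, aligned

    print(np.dot(user, a1))  # 100.0
    print(np.dot(user, a2))  # 10.0 → A1 ranks far higher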


  3. Euclidean Distance

What It Is: Euclidean distance is the “straight-line” distance between two points in multi-dimensional space. It's especially useful when you want to measure the overall difference across all dimensions.

Example 1: Customer Segmentation in Retail

Scenario: A supermarket plots each customer in a multi-dimensional space (one axis for dairy spend, another for produce, etc.).

  • Why Euclidean Distance? Because we want to see which customers are truly “close” in terms of actual spending behavior. Smaller distance = more similarity in purchasing patterns.

Illustration (Customer Clustering):

 Spends on Dairy (y-axis)
          ↑
          |         • (Bob)
          |
          |      • (Alice)
          |
          |                      • (Charlie)
  --------+---------------------------→ Spends on Produce (x-axis)

In this 2D example:
distance(Alice, Bob) < distance(Alice, Charlie),
so Alice and Bob are more similar.
  • A typical clustering algorithm (like k-means) would try to group Alice and Bob together if they’re near each other.
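The distance itself is one line of NumPy. A sketch with invented weekly spend figures ([produce, dairy]) that match the picture above:

    import numpy as np

    alice   = np.array([40.0, 35.0])
    bob     = np.array([45.0, 45.0])
    charlie = np.array([90.0, 40.0])

    def euclidean(a, b):
        return np.linalg.norm(a - b)   # √(sum of squared differences)

    print(euclidean(alice, bob))      # ≈ 11.2
    print(euclidean(alice, charlie))  # ≈ 50.2 → Alice clusters with Bob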

Example 2: Similar Patients in Healthcare

Scenario: A hospital represents each patient by multiple health metrics (age, blood pressure, cholesterol, etc.). Two patients with near-identical stats are close in the vector space.

  • Why Euclidean Distance? Because we care about cumulative difference across all metrics, not just direction or weighted alignment. Patients with nearly the same blood pressure, cholesterol, etc. end up close by, indicating potentially similar diagnoses or treatments.

Illustration (Patient Profiles in 3D Health Space):

┌───────────────────────┐
│    3D Health Space    │
│ (Age, BP, Cholesterol)│
└───────────────────────┘

(Patient A) •
                 Distance = ?
                              • (Patient B)
  • If the distance between A and B is small, the healthcare system may look at B’s diagnosis to inform A’s treatment plan.
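A minimal nearest-neighbor sketch with invented patient metrics ([age, systolic BP, cholesterol]); in practice you'd standardize each metric first so no single axis dominates the distance:

    import numpy as np

    patients = {
        "A": np.array([54.0, 130.0, 210.0]),
        "B": np.array([56.0, 128.0, 205.0]),   # near-identical to A
        "C": np.array([30.0, 110.0, 160.0]),
    }

    target = patients["A"]
    others = {k: v for k, v in patients.items() if k != "A"}
    nearest = min(others, key=lambda k: np.linalg.norm(others[k] - target))
    print(nearest)   # "B" → consult B's diagnosis to inform A's treatment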


Key Takeaways

  1. Cosine Similarity

    • Focus on direction, ignore magnitude.

    • Perfect for scenarios like text document comparison (aligning topics, ignoring total length).

  2. Dot Product

    • Incorporates magnitude as well as direction.

    • Great for scenarios where larger vectors have higher importance (e.g., popular articles, strong interest levels).

  3. Euclidean Distance

    • Measures straight-line difference across all dimensions.

    • Ideal for clustering and “neighbors” analysis (e.g., segmenting customers, matching patient profiles).

Each of these metrics – Cosine Similarity, Dot Product, and Euclidean Distance – has its place. Whether you’re comparing text documents of different lengths, ranking items by popularity, or grouping similar consumers by behavior, choosing the right similarity measure can help you capture the essence of what “similar” really means in your application.
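To make the contrast concrete, here is one final sketch applying all three measures to the same pair of vectors, where b points in the same direction as a but is twice as long:

    import numpy as np

    a = np.array([3.0, 4.0])
    b = np.array([6.0, 8.0])   # same direction, double the magnitude

    cosine    = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    dot       = np.dot(a, b)
    euclidean = np.linalg.norm(a - b)

    print(cosine)     # 1.0  → identical direction, magnitude ignored
    print(dot)        # 50.0 → rewards the extra magnitude
    print(euclidean)  # 5.0  → the points are still 5 units apart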

Additional resources:
Choosing Between Cosine Similarity, Dot Product, and Euclidean Distance for RAG Applications