# The Content Recommendation Engine ("What to Watch")

> This spec is a living starting point — open for revision once the prototype exists.

## Purpose

The engine matches movies and shows to each user's taste profile, independent of subscriptions or budget. Goal: build the best possible platform-agnostic watchlist.

## Two Scoring Approaches, Blended

1. **Collaborative Filtering ("People Like You")** — aggregate user patterns. If users who liked X also liked Y, and you liked X, Y gets a boost.
2. **Thematic/Content-Based Filtering ("Vibes & Themes")** — specific tags (e.g. "Dystopian Sci-Fi", "Deadpan Humor", "Ensemble Cast"), not just broad genres. Builds a multidimensional taste profile per user.

### Hybrid Score Formula

```
Hybrid Score = (Alpha × Thematic Score) + ((1 − Alpha) × Collaborative Score)
```

Alpha is a tunable weight between 0 and 1. Alpha = 0.7 means the score is 70% theme-based and 30% community-based. Raise Alpha if results feel too generic; lower it if they feel too niche.
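
A minimal sketch of the blend in Python, assuming both component scores are already normalized to the same 0–1 range (the spec does not say how they are scaled). The function and parameter names are illustrative, not part of the spec.

```python
def hybrid_score(thematic: float, collaborative: float, alpha: float = 0.7) -> float:
    """Blend the two scores; alpha is the share given to the thematic score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * thematic + (1.0 - alpha) * collaborative


# Made-up scores: strong thematic match, weaker community signal.
score = hybrid_score(thematic=0.9, collaborative=0.4, alpha=0.7)
print(round(score, 2))  # 0.75
```

With Alpha = 0 the result is purely collaborative, and with Alpha = 1 it is purely thematic.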

### Recency Decay (Continuous, Not a Hard Cliff)

```
Time Multiplier = 1.0 + e^(−Decay Constant × Days Since Rating)
```

| Age of Rating | Multiplier |
|---|---|
| Day 1 | ~1.99× (nearly double weight) |
| Day 90 | ~1.50× |
| Day 180 | ~1.25× |
| Day 365+ | ~1.06× and falling toward the 1.0× baseline (the floor; a rating's weight never decays to zero) |

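Decay Constant ≈ 0.0077 per day (about ln 2 / 90) for a 180-day focus curve: the recency boost halves every 90 days and has fallen to a quarter of its initial value by day 180.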
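
The multiplier can be checked with a few lines of Python. This is a sketch that assumes days are measured as a plain number since the rating; `time_multiplier` and `DECAY_CONSTANT` are illustrative names.

```python
import math

DECAY_CONSTANT = 0.0077  # per day, from the spec; roughly ln(2) / 90


def time_multiplier(days_since_rating: float, decay_constant: float = DECAY_CONSTANT) -> float:
    """Weight multiplier for a rating; starts at 2.0 and decays toward 1.0."""
    return 1.0 + math.exp(-decay_constant * days_since_rating)


for days in (1, 90, 180, 365):
    print(days, round(time_multiplier(days), 2))
```

This prints roughly 1.99, 1.5, 1.25 and 1.06 for the four ages. The multiplier approaches 1.0 but never drops below it.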

---

## Inputs

### 1. The 4-Point Explicit Feedback System

Each rating a user gives a title updates their taste profile:

- 👎 **Thumbs Down** — hard block on the title; slight negative weight on its primary themes
- 😶 **Neutral** — no algorithmic movement; doesn't influence future suggestions
- 🧩 **Right Vibe, Wrong Movie** — dislike the specific title (blocked from re-recommendation), but significantly boost its underlying themes for future discovery. *UI note: hidden by default; appears only as a secondary interaction after a Thumbs Down or Neutral (e.g. hover state, slide-up prompt). Prevents clutter while preserving the nuance.*
- 👍 **Liked It** — massive positive weight on the title's franchise, direct sequels/spin-offs, AND all underlying themes
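
One way the four ratings could map onto profile updates is sketched below. The numeric weights are placeholders invented for illustration (the spec only gives their sign and relative size), and `Rating`, `FeedbackEffect`, and `apply_feedback` are hypothetical names. For a Thumbs Down, the caller is assumed to pass only the title's primary themes.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    THUMBS_DOWN = "thumbs_down"
    NEUTRAL = "neutral"
    RIGHT_VIBE_WRONG_MOVIE = "right_vibe_wrong_movie"
    LIKED_IT = "liked_it"


@dataclass(frozen=True)
class FeedbackEffect:
    block_title: bool      # exclude this title from future recommendations
    theme_weight: float    # delta applied to each of the title's themes
    boost_franchise: bool  # also boost franchise, sequels and spin-offs


# Placeholder magnitudes: only the signs and relative sizes follow the spec.
EFFECTS = {
    Rating.THUMBS_DOWN: FeedbackEffect(True, -0.1, False),
    Rating.NEUTRAL: FeedbackEffect(False, 0.0, False),
    Rating.RIGHT_VIBE_WRONG_MOVIE: FeedbackEffect(True, 0.5, False),
    Rating.LIKED_IT: FeedbackEffect(False, 1.0, True),
}


def apply_feedback(profile: dict[str, float], themes: list[str], rating: Rating) -> bool:
    """Update theme weights in place; return True if the title should be blocked."""
    effect = EFFECTS[rating]
    for theme in themes:
        profile[theme] = profile.get(theme, 0.0) + effect.theme_weight
    return effect.block_title
```

Keeping the effects in a single table makes the relative weights easy to tune without touching the update logic.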

### 2. Implicit Signals (Watch Behavior)

- **Binge Velocity** — finishing a 10-episode season over a weekend counts more than watching it over two months
- **Abandonment** — stop watching + no resume within 14 days = soft Thumbs Down for that title's themes
- **Cooldown Expiration** — when a user's rewatch timer hits zero, the title re-enters recommendations at max priority weight, treated like a new release
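
The first two signals can be sketched as small helper functions. The 14-day window comes from the spec; measuring binge velocity as episodes per day is an assumption (the spec does not define a formula), and the function names are illustrative.

```python
from datetime import datetime, timedelta

ABANDONMENT_WINDOW = timedelta(days=14)  # from the spec


def is_abandoned(last_watched: datetime, finished: bool, now: datetime) -> bool:
    """True if an unfinished title has gone more than 14 days without a resume."""
    if finished:
        return False
    return now - last_watched > ABANDONMENT_WINDOW


def binge_velocity(episodes_watched: int, days_elapsed: float) -> float:
    """Episodes per day; a hypothetical proxy for how fast a season was watched."""
    return episodes_watched / max(days_elapsed, 1.0)


# A 10-episode season over a weekend vs. over two months.
print(binge_velocity(10, 2), round(binge_velocity(10, 60), 2))  # 5.0 0.17
```

How velocity translates into a score weight is left open here, since the spec only says that faster completion counts for more.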

### 3. Emphasis Toggles (Set at Watchlist Add)

- **"Must Watch"** — engine aggressively hunts for similar content
- **"Nice to Watch"** — queued without significantly skewing discovery

---

## Group Dynamics ("Watch Together")

Primary rule: any Thumbs Down or Abandoned status from any member is a hard veto for the entire room.

### Group Intersection & Veto Formula

```
Group Score = (Σ Individual Scores) × Group Veto Factor
```

- **Group Veto Factor** = 0 if anyone vetoed, 1 if nobody vetoed
- One veto collapses the entire title's Group Score to 0 — the title is instantly eliminated
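
As a sketch, the veto rule is a one-line multiplication. The code assumes each member contributes one score and one veto flag per title; `group_score` is an illustrative name.

```python
def group_score(individual_scores: list[float], vetoes: list[bool]) -> float:
    """Sum of member scores, collapsed to 0 if any member vetoed the title."""
    veto_factor = 0 if any(vetoes) else 1
    return sum(individual_scores) * veto_factor


print(group_score([0.8, 0.6, 0.9], [False, False, False]))  # about 2.3
print(group_score([0.8, 0.6, 0.9], [False, True, False]))   # 0.0
```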

### Fallback — Majority Rules

Triggers if strict intersection yields fewer than 3 results:

```
Compromise Score = Σ Individual Scores  IF  (Non-Vetoes ≥ Total Users − 1),  ELSE 0
```

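For a room of 4: at least 3 must not have vetoed, so a title with a single veto can still get through. If the threshold isn't met, the title stays buried. Note that the threshold is all-but-one member, which is stricter than a simple majority once the room has 5 or more members.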
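
The fallback could be wired in as follows. This sketch assumes the vetoing member's score still counts toward the sum (the formula sums all individual scores) and that the compromise list replaces the strict list when the fallback triggers; all names are illustrative.

```python
MIN_STRICT_RESULTS = 3  # fallback triggers below this many strict results


def passes_compromise(vetoes: list[bool]) -> bool:
    """True if at most one member vetoed (Non-Vetoes >= Total Users - 1)."""
    return vetoes.count(False) >= len(vetoes) - 1


def compromise_score(individual_scores: list[float], vetoes: list[bool]) -> float:
    """Sum of member scores if the threshold is met, otherwise 0."""
    return sum(individual_scores) if passes_compromise(vetoes) else 0.0


def rank_for_group(titles: dict[str, tuple[list[float], list[bool]]]) -> list[str]:
    """Rank by strict veto first; fall back to the compromise rule if too few remain."""
    strict = {
        name: sum(scores)
        for name, (scores, vetoes) in titles.items()
        if not any(vetoes)
    }
    if len(strict) >= MIN_STRICT_RESULTS:
        scored = strict
    else:
        scored = {
            name: compromise_score(scores, vetoes)
            for name, (scores, vetoes) in titles.items()
            if passes_compromise(vetoes)
        }
    return sorted(scored, key=scored.get, reverse=True)
```

Titles with no vetoes also satisfy the compromise threshold, so the fallback list always contains the strict results plus any single-veto titles.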

Overlapping highly boosted themes (multiple users with high individual scores) float to the top of the group's recommendation list.

---

## Cold Start & Onboarding Strategy

New users get seeded with "High-Footprint" titles — massive cultural reach, heavy and distinct thematic clustering.

**Example:** *Brooklyn Nine-Nine* is a useful calibration case. A single rating on it establishes a baseline for both the thematic engine (workplace comedy, ensemble cast, deadpan humor) and the collaborative side (a large dataset of what other fans also watch). By prompting new users to rate strategically chosen titles like this, the algorithm can offer informed recommendations from day one instead of an empty screen.

---

## Outputs

- **Populates "Recommended For You"** on the home/landing view, filtered to active subscriptions
- **Drives Search clusters** — personalized categories like "Because you liked The Matrix → Dystopian Cyberpunk" instead of generic "Sci-Fi"
- **Feeds the Subscription Optimizer** — platforms with catalogs rich in the user's boosted themes gain background weight advantage in the Rotate Churn algorithm, even if the user hasn't explicitly added those titles to a list

---

## A Note on Embeddings vs. Explicit Tags

The spec above uses hand-tagged themes + collaborative scoring. This is implementable without ML infrastructure and produces explainable results ("You liked this because it's Dystopian Cyberpunk").

A modern alternative worth evaluating before the production build: **vector embeddings + cosine similarity**. Each title gets a high-dimensional vector representation (trained on viewing behavior, descriptions, cast, etc.). Recommendations are found by nearest-neighbor search in that vector space. This approach scales better, surfaces non-obvious matches, and needs no manual theme taxonomy.
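
To make the idea concrete, here is a brute-force nearest-neighbor sketch in plain Python. The 3-dimensional vectors and title names are invented for illustration; real embeddings would be learned, have far more dimensions, and typically be searched with an approximate index rather than a full sort.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_titles(query: list[float], catalog: dict[str, list[float]], k: int = 3) -> list[str]:
    """Brute-force top-k titles by cosine similarity to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors, invented for illustration only.
catalog = {
    "Title A": [0.9, 0.1, 0.0],
    "Title B": [0.8, 0.2, 0.1],
    "Title C": [0.0, 0.1, 0.9],
}
print(nearest_titles([1.0, 0.0, 0.0], catalog, k=2))  # ['Title A', 'Title B']
```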

Tradeoff: embeddings require ML infrastructure and cold-start handling, and they are harder to debug and explain. Explicit tags are more transparent and tunable, and that explainability is itself a UX feature.

**The two can coexist:** embeddings for broad discovery underneath; explicit theme labels as the UX surface ("Because you liked..."). This is worth a deliberate decision before the production build, not during prototyping.

**For the prototype:** all recommendation output is static fake data. This doesn't need to be resolved now.
