A Bitter Lesson
August 14, 2025
I've been helping my friends at hooper.gg build something pretty cool: you film yourself playing basketball, and their product automatically counts your stats, tracks your movements, and cuts together highlight reels. No extra cameras needed, just your phone and some clever deep learning models.
My part of the project involved building models for player tracking (using SAM2 as a foundation) and shot classification (determining whether a shot is a free throw, layup, three-pointer, etc.). The technical work was fascinating, but this is the story of how I learned an expensive lesson about machine learning.
The Initial Success
I started with a ResNet model for shot classification, beginning with free throws since they seemed to have the least variability. Plus, my hooper.gg friends told me that tracking in-game free throws would be genuinely valuable for users' stat lines (otherwise they'd have to input them manually).
The early results looked promising. With about 100 data points of 4-second video clips, split roughly 20% free throws and 80% everything else, I trained the model on ~90 clips and tested on 10. The model classified every test case correctly. Time to scale up!
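A split like that can be sketched with a small stratified helper, so both classes show up in the test set despite the 20/80 imbalance. This is an illustrative sketch, not the actual training code; the file names and function are hypothetical:

```python
import random

def stratified_split(clips, labels, test_frac=0.1, seed=42):
    """Split (clip, label) pairs so each class is represented in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for clip, label in zip(clips, labels):
        by_class.setdefault(label, []).append(clip)
    train, test = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_frac))  # at least one per class
        test += [(c, label) for c in items[:n_test]]
        train += [(c, label) for c in items[n_test:]]
    return train, test

# ~100 clips, ~20% free throws, ~80% everything else (hypothetical names)
clips = [f"clip_{i:03d}.mp4" for i in range(100)]
labels = ["free_throw"] * 20 + ["other"] * 80
train, test = stratified_split(clips, labels)
print(len(train), len(test))  # 90 10
```

With only ~10 test clips, a perfect score is weak evidence, which is part of why the optimism here didn't survive contact with more data.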
The Data Wall
I set out to collect a few hundred free-throw clips, but while combing through the data I realized that most people playing pickup basketball don't play with free throws.
Even when players are filming themselves to improve their game, they typically don't call fouls, or if they do, they just take the ball out at the top rather than shooting free throws. After personally watching hours of basketball footage (thankfully at 3x speed), I was coming up mostly empty-handed.
The "Clever" Solution
Then I had what felt like a breakthrough: why not get GPT-4o to do the labeling for me?
My friends already had a model that could identify shot timestamps in full sessions. So I devised this plan:
- Spin up a GPU node on RunPod (with all the usual CUDA/PyTorch setup pain)
- Run the shot detection model to get timestamps of shots
- Clip ±2 seconds around each timestamp
- Extract 5 still frames from each 4-second clip
- Feed each frame to GPT-4o via API to classify as "free throw" or "not free throw"
- Aggregate the votes across frames for the final classification
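The clip-level logic behind those last three steps can be sketched as follows. The function names and the stand-in classifier are illustrative; in the real pipeline, each frame went to GPT-4o via the OpenAI API:

```python
def frame_times(clip_len_s=4.0, n_frames=5):
    """Evenly spaced timestamps across the clip (endpoints included)."""
    step = clip_len_s / (n_frames - 1)
    return [round(i * step, 2) for i in range(n_frames)]

def classify_clip(frames, classify_frame):
    """Majority vote over per-frame labels from the vision model."""
    votes = [classify_frame(f) for f in frames]
    ft_votes = sum(v == "free throw" for v in votes)
    return "free throw" if ft_votes > len(votes) / 2 else "not free throw"

# Stand-in for the GPT-4o call: pretend frames before t=2.5s look like a free throw
fake_classifier = lambda t: "free throw" if t < 2.5 else "not free throw"

print(frame_times())                                  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(classify_clip(frame_times(), fake_classifier))  # free throw (3 of 5 votes)
```

Voting across frames smooths over the occasional frame the model misreads, at the cost of five API calls per clip instead of one.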
Initial testing of a few samples in ChatGPT looked promising, so I was optimistic.
The (Partial) Success
The system worked! Sort of.
The good news: we could now process videos at scale. The bad news: 90%+ of clips were coming back as "not free throws," which we already had plenty of. We still needed to find videos with actual free throws to get those crucial 100-200 positive examples.
We decided to turn that pattern to our advantage: if the first 20% of a video contained no free throws, we could pretty safely assume the players weren't playing with free throws at all and skip the rest. This saved us from processing entire videos unnecessarily.
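That early-exit check can be sketched as follows, assuming the shot detector emits (timestamp, label) pairs from a first pass over the opening of the video (the function name and data shapes are illustrative):

```python
def should_skip(shot_results, video_len_s, early_frac=0.2):
    """shot_results: (timestamp_s, label) pairs from the early pass.

    If no free throw appears in the first 20% of the video, assume these
    players aren't shooting free throws and skip the rest of the video.
    """
    cutoff = video_len_s * early_frac
    early_labels = [label for t, label in shot_results if t <= cutoff]
    return "free throw" not in early_labels

# Hypothetical hour-ish video with detected shots in its opening stretch
shots = [(30, "not free throw"), (95, "not free throw"), (180, "not free throw")]
print(should_skip(shots, video_len_s=3600))  # True -> skip the remaining 80%
```

The trade-off is a small miss rate (a game where the first free throw happens late) in exchange for cutting most videos' API cost by roughly 80%.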
I estimated $100 in OpenAI credits would be enough to label 30+ hours of footage.
The Reality Check
GPT-4o wasn't perfect. It was overzealous: roughly half of its "free throw" calls were false positives, so I still needed a manual review pass at the end.
But the real kicker came when my friend casually mentioned: "You know, we could just hire someone in Vietnam to label this data for $50..."
The Lesson
I had spent days engineering a sophisticated pipeline combining ResNet models with GPT-4o for automated data labeling. The result? A system that was still 3x more expensive than human labelers and still required manual verification.
The harsh but valuable lesson: Always check if human labelers in lower-cost regions can do the job before building your own AI solution—even when you're convinced that AI is "good enough" for the task.
Sometimes the most elegant solution isn't the most engineered one. Sometimes it's just asking the right person to look at some videos for a few hours.
Still, building the models was a blast, and hooper.gg is doing some genuinely impressive work with computer vision for basketball analytics. The technical challenges were real and the solutions were satisfying—I just wish I'd thought to price out human alternatives before diving into the automation rabbit hole.