It all started with a simple question:
"Can i count how many pizzas are sold just by watching a video?"
At first glance, it seemed like an easy task. Point a camera at the kitchen, run a model, and start counting. But as i dove deeper, i realized this was more than just detection. It was about understanding context and covering a lot of edge cases. SO what really defines a pizza sale?
As i explored different strategies, one critical decision had to be made early:
"What exactly does a pizza sale look like on video?"
First answer: We count a pizza is sold when the employee hand the pizza box to the customer.
At first, it made sense. That moment when a customer receives their box feels like the natural end point of a sale.
But there are ton of cases i have to address.
It was too fragile. Too dependent on things i couldn’t control.
Second answer: A pizza is considered sold once it is placed into an open box
Now! Let’s get straight to the next phase:
We count a pizza is sold when employee put the pizza into box