CUHK-Shenzhen FreedomAI × Tencent Hunyuan reveal: what really blocks phone agents from deployment isn't success rate — it's the privacy boundary

Over the past two years, the direction of AI agents has become unmistakable.

From chatting and writing code to reading GUIs, tapping buttons, and filling forms, agents are steadily taking over real device operations. After systems like OpenClaw and Claude Computer Use took off, the community began to take one realization seriously:

Agents aren't just chatbots anymore. They're getting close to actually doing things for you.

But once agents start moving onto phones, a much more immediate question emerges:

Would you actually trust it to run on your phone?

The worry isn't that it can't do the job. Quite the opposite: it is all too likely to do too much.

It might request one extra permission, fill in one form field it shouldn't have, or hand your phone number to some tiny entry point you never even noticed. None of this involves hacking or adversarial prompts. It happens during the most normal, everyday mobile tasks.

To address this, the FreedomAI team at CUHK-Shenzhen, together with Tencent Hunyuan Vision, CUHK, HKU, HKUST, and SJTU, present new work. Its most important contribution isn't yet another leaderboard. It is that, for the first time, the community can seriously answer:

While completing a normal task, did the phone agent actually respect the user's privacy boundary?

📄 Paper: Do Phone-Use Agents Respect Your Privacy?
💻 Code: github.com/FreedomIntelligence/MyPhoneBench
📊 Trajectories: MyPhoneBench-Trajectories


Ordering a Hamburger — Why Should That Send a Chill Down Your Spine?

Let's start with the most everyday example imaginable. The figure below makes the problem brutally clear: all the user wanted was a hamburger, but the agent gave away their information step by step along the way.

[Figure: while ordering a hamburger, the agent gives away the user's information step by step]

You ask a phone agent to order a hamburger from the KFC WeChat Mini Program. Sounds perfectly normal.

But look at what it actually did: