Thico (Sike) Xiang

向思科

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/44aa5dbc-d55c-477f-9d96-25e2fe4ed8c1/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/44aa5dbc-d55c-477f-9d96-25e2fe4ed8c1/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" width="40px" /> GitHub

</aside>

<aside> 👨🏼‍🎓 Google Scholar

</aside>

<aside> 📧 Email

</aside>

<aside> 💬 WeChat

</aside>

👋 Hi there!

I’m Thico (Sike) Xiang, a PhD student in Computer Science at Durham University, advised by Dr. Amir Atapour-Abarghouei in the Adaptive Intelligence Lab. My research focuses on computer vision, especially Vision-Language Models (VLMs), Multimodal Representation Learning, and Medical Imaging Analysis. I also work on Agent-based Systems, Medical Agents, Embodied AI, and Autonomous Robotics.

Previously, I worked on multimodal learning and medical AI at the University of Electronic Science and Technology of China and at the Internet Center of Mianyang Third People's Hospital. I enjoy developing vision-language models and exploring their potential across diverse tasks. I believe impactful AI research should not only solve practical problems but also advance fundamental methods and theories.

📜 Selected Publications

Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation

Sike Xiang, Shuang Chen, Kevin Qinghong Lin, Jialin Yu, Yijia Sun, Philip Torr, Amir Atapour-Abarghouei

arXiv. [paper]

Hierarchical Connectivity-Based Optimisation for Multi-UAV Relative State Estimation

Jia Cheng, Sike Xiang, Amir Atapour-Abarghouei

AIM 2026. [Accepted]

BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion

Sike Xiang, Shuang Chen, Amir Atapour-Abarghouei

EMNLP 2025. [paper]

💡 Projects

Checkup2Action

Checkup2Action is a multimodal clinical check-up report dataset and benchmark for generating safe, prioritised, patient-facing Action Cards from complex medical reports.

BcQLM

BcQLM is a lightweight yet high-performing multimodal large language model that integrates BreezeCLIP with a Q-Gated fusion mechanism to achieve efficient and scalable vision-language understanding under resource constraints.

🔬 Experience

Durham University, 09/2025–Present

Demonstrator (Combined Academic Role)

Durham University, 09/2024–Present

PhD Student, with Dr. Amir Atapour-Abarghouei

University of Electronic Science and Technology of China, 10/2023–07/2024

Research Assistant, with Dr. Wang Wenyi

University of Glasgow, 09/2022–09/2023

Master of Science

Mianyang Third People's Hospital, 07/2022–08/2022

Internet Center Intern

Beijing Institute of Technology, Zhuhai, 09/2018–06/2022

Bachelor of Engineering

Acknowledgments

The template of this personal website is shamelessly borrowed from here.