We are seeking a highly skilled NPU Runtime Engineer specializing in LLM serving to design, optimize, and deploy large language models (LLMs) for efficient inference in production environments. The role involves working with cutting-edge AI serving frameworks, optimizing inference performance on neural processing units (NPUs), and integrating LLMs with scalable distributed systems.

Responsibilities and Opportunities

Key Qualifications

Ideal Qualifications