NPU Runtime Software Engineer

Designing an RBLN runtime module that interfaces with the RBLN compiler and driver modules
Designing and implementing user-level APIs, adding support for various language bindings, deploying ML models to the RBLN SDK, and maintaining user documentation
Benchmarking and profiling the existing runtime system and implementing optimizations to improve the overall performance of RBLN NPU products
Optimizing inference serving using vLLM for NPU and conducting state-of-the-art (SOTA) research to improve model serving performance

Bachelor's or higher degree in Computer Science, Electrical Engineering, or a related field
Comprehensive understanding of deep learning models and their applications in vision, natural language processing, speech recognition, and other domains
Familiarity with system software such as compilers, runtimes, drivers, and firmware
Proficiency in programming languages: C++ and Python
Knowledge of data structures, algorithms, and OOP design patterns
Strong written and verbal communication skills

Hands-on experience with AI accelerator (e.g., GPU) driver APIs and runtimes
Exposure to ML frameworks such as PyTorch, TensorFlow, ONNX Runtime, and TensorRT, and their respective optimization techniques
Solid understanding of operating systems, resource management, and high-performance computing principles
Deep expertise in Python or modern C++, including the advanced language features needed to write efficient, high-performance code
Experience with multithreading and parallel programming
Experience with serving platforms such as vLLM, TorchServe, and Triton Inference Server