Responsibilities and Opportunities
- Designing an RBLN runtime module that interfaces with the RBLN compiler and driver modules
- Designing and implementing user-level APIs, adding support for various language bindings, deploying ML models to the RBLN SDK, and maintaining user documentation
- Benchmarking and profiling the existing runtime system to evaluate its performance, and implementing optimizations that improve the overall system performance of RBLN NPU products
- Optimizing inference serving using vLLM for NPU and conducting state-of-the-art (SOTA) research to improve model serving performance
Key Qualifications
- Bachelor's or higher degree in Computer Science, Electrical Engineering, or a related field
- Comprehensive understanding of deep learning models and their applications in vision, natural language processing, speech recognition, and other domains
- Familiarity with system software, including compilers, runtimes, drivers, and firmware
- Proficiency in C++ and Python
- Knowledge of data structures, algorithms, and OOP design patterns
- Strong written and verbal communication skills
Ideal Qualifications
- Hands-on experience with AI accelerator (e.g., GPU) driver APIs and runtimes
- Exposure to ML frameworks such as PyTorch, TensorFlow, ONNX Runtime, and TensorRT, and their respective optimization techniques
- Solid understanding of operating systems, resource management, and high-performance computing principles
- Deep expertise in Python or modern C++ and its advanced features for writing efficient, high-performance code
- Experience with multithreading and parallel programming
- Experience with serving platforms such as vLLM, TorchServe, and Triton Inference Server