Location: Remote, Africa
Full-time
Role Overview
We are looking for a highly skilled HPC / Full-Stack Infrastructure Engineer to architect and manage the compute, data, and deployment backbone of our scientific AI systems. You will build and scale infrastructure that supports climate modeling, geospatial pipelines, and physics-informed machine learning workloads.
Key Responsibilities
- Architect, deploy, and maintain high-performance computing (HPC) environments
- Build reliable data and model pipelines for scientific AI workloads
- Implement MLOps and DevOps systems for model training, evaluation, and deployment
- Optimize compute workflows (GPU clusters, Kubernetes, distributed systems)
- Manage APIs, backend services, cloud infrastructure, and developer tooling
- Ensure secure, scalable, and efficient infrastructure across the stack
Requirements
- 3-5 years of experience in DevOps engineering, infrastructure engineering, or similar roles
- Expertise in Linux systems, Docker, Kubernetes, and GPU computing
- Experience with HPC schedulers or distributed computing frameworks (Slurm, MPI, Ray, Dask)
- Strong backend engineering experience (Python, Go, or Node.js)
- Experience with cloud platforms (AWS, GCP, Azure)
- Understanding of CI/CD pipelines and infrastructure-as-code tools (Terraform, Ansible)
- Experience supporting scientific or ML teams is a strong plus
- Comfortable communicating in English, both in writing and verbally