🪃 About Memorang
Our mission is to automate how humans acquire and master skills. The first step in our journey is to build the AI stack for education, transforming the credentialing and publishing industries with the best tools for building curricula, assessments, and apps at scale.
After placing #1 (of 1,500) in the Vercel AI Accelerator, we bootstrapped to millions in revenue and profitability while delivering over 200 million assessments via our AI platform. We also just closed our first strategic investment to scale faster and deploy our solution to millions of additional learners.
We're a lean, talented team that rewards agency, curiosity, and shipping real products that directly impact the lives of millions.
"Memorang has literally the best gen AI results I've seen applied to a real world problem."
- Director of Trust and Safety, Google DeepMind
🎯 Role
As a Principal DevOps Engineer, you'll work with the Head of Platform Engineering to deliver the infrastructure foundation powering our AI-driven education platform that serves millions of learners. You'll help execute our serverless AWS strategy, improve reliability toward 99.99% uptime, and raise the bar for scale, security, and developer velocity. You'll partner closely with engineering leadership to turn the infrastructure roadmap into production-ready systems and standards.
🛠️ Sample projects could include…
- Designing and implementing Infrastructure as Code using tools like Terraform, AWS CDK, Pulumi, or similar to provision complete Dev, Staging, and Production environments across AWS, Supabase, Neo4j, Vercel, and third-party services.
- Building disaster recovery plans, including RTO/RPO targets, multi-region deployment templates, DNS failover, and quarterly DR simulation runbooks.
- Simplifying our infrastructure by migrating legacy services to a fully serverless architecture using AWS Lambda and Terraform.
- Designing a self-healing infrastructure layer that automatically detects and resolves common production issues.
- Improving database performance by optimizing our graph database (Neo4j) and vector search infrastructure.
- Driving developer velocity by building a platform that enables product engineers to ship code to production safely in minutes.
- Working with the Head of Platform Engineering to deliver infrastructure roadmaps, security standards, and pragmatic rollout plans.
- Implementing OpenTelemetry across Lambda, Hono, and Next.js services, with structured logs, distributed traces, key service metrics, and actionable alerting through Better Stack or similar platforms.
🤝 You might be a fit if you…
- Have 8+ years of DevOps/SRE experience with deep expertise in AWS serverless architectures.
- Have strong hands-on experience with Infrastructure as Code tools such as Terraform, AWS CDK, Pulumi, CloudFormation, or similar.
- Have a strong background in database reliability engineering (PostgreSQL, Neo4j, Redis).
- Are obsessed with observability and have experience implementing comprehensive monitoring/tracing (Datadog, OpenTelemetry).
- Have a security-first mindset and experience with compliance standards (SOC 2, FERPA).
- Write high-quality, maintainable code in TypeScript, Python, or Go.
- Have DevSecOps experience, including secure CI/CD, secrets management, vulnerability scanning, policy-as-code, or security automation.