Sr. Director - Backend Engineering

Coupang

Software Engineering
Seoul, South Korea
Posted on Oct 28, 2025

    Key Skills and Role Responsibilities:

    This role is for a strategic and technical leader who will define, build, and operate the infrastructure orchestration systems that power our organization's cutting-edge Artificial Intelligence (AI) initiatives. The Senior Director will lead a team responsible for ensuring a robust, scalable, cost-efficient, and high-performance platform for all stages of the AI lifecycle, from experimentation and training to deployment and inference.

    Strategy and Leadership

    • Define and execute the long-term vision and roadmap for the company’s AI infrastructure Network Services, aligning it with overall business and AI Services goals.

    • Lead, mentor, and grow a high-performing engineering and operations team focused on AI infrastructure and platform engineering.

    • Manage budget and resource allocation for AI infrastructure Network Services deliverables.

    • Act as a key liaison between the AI infrastructure team and other service owners and consumers, core engineering, cloud infrastructure, and executive leadership.

    AI Infra Development and Operations

    • Oversee the design, implementation, and maintenance of the core network orchestration platforms for large-scale AI model training (e.g., distributed training, hyperparameter tuning) and deployment (e.g., containerization, serverless functions, edge deployment).

    • Ensure reliability, security, and compliance of the AI infrastructure, meeting strict standards for data governance and model integrity.

    • Establish Service Level Objectives (SLOs) and Key Performance Indicators (KPIs) for the AI platform services and lead efforts for continuous optimization and performance tuning.

    Technology and Architecture

    • Evaluate, select, and integrate the core technologies required for the AI stack (e.g., cloud overlay/underlay networking, InfiniBand, load balancers, DNS, core networking, Kubernetes, Ray, GPU/accelerator management, distributed file systems).

    • Champion infrastructure-as-code (IaC) principles to manage and provision AI resources consistently and at scale.

    Qualifications

    Required

    • Education: Bachelor's or Master’s degree in Computer Science, Engineering, or a related technical field.

    • Experience:

      • 15+ years of progressive experience in software engineering, infrastructure, or platform operations.

      • 5+ years of experience leading and managing technical teams, ideally at the Director or Sr. Director level or in an equivalent capacity.

      • Deep, hands-on experience designing and operating large-scale distributed systems and cloud-native network architectures.

      • Proven experience specifically with AI infrastructure orchestration (e.g., using Kubernetes) and managing accelerated compute resources (GPUs, TPUs, etc.).

      • 15+ years of experience in cloud backend engineering, cloud design, deployment, and DevOps.

      • 15+ years of experience leading system design and architecture using private clouds and AWS, Azure, or GCP.

      • 10+ years of demonstrable experience building and operating infrastructure as code and infrastructure automation, plus comfort with various flavors of Linux.

      • 15+ years of experience in building high-performance, highly available, and scalable distributed systems in the cloud.

      • 15+ years of experience in building and managing high-performance, highly available, and scalable Hybrid Cloud environments.

      • Excellent cross-group collaboration and outstanding verbal and written communication skills.

    • Skills:

      • Expert-level knowledge of containerization and orchestration (Docker, Kubernetes).

      • Software-defined cloud networking.

      • Strong background in DevOps and MLOps principles and tooling.

      • Proficiency in at least one modern programming language (e.g., Python, Go).

      • Exceptional strategic planning, organizational, and written/verbal communication skills.

    Preferred

    • Prior experience managing infrastructure for training and inference of large language models (LLMs) or foundation models.

    • Experience in a regulated industry with strict compliance requirements.

    • Experience building and operating an AI private cloud.

    Success Metrics

    A successful Senior Director - AI Infrastructure Orchestration will be measured by:

    • The time to market for building, scaling, and operating AI infrastructure.

    • The resource utilization rate and cost efficiency of the AI compute infrastructure.

    • The reliability and uptime of the core AI platform services.

    • Talent retention and development within the AI Infrastructure team.