Director - IT Reliability Engineering

Coupang

IT
Bengaluru, Karnataka, India
Posted 6+ months ago
  • Continuously transform traditional operations from a manual, reactive organization into a self-healing, automated one.
  • Implement best practices with a strategic focus on automation and continuous improvement initiatives to enhance infrastructure and application performance and reliability.
  • Implement and maintain advanced system monitoring tools and practices to proactively identify and address issues.
  • Ensure 24/7 coverage, managing shifts and rotations to maintain continuous support for clients. Proactively monitor real-time alerts, incidents, and requests from customers, and coordinate technician responses to resolve issues promptly.
  • Partner across Technology and Business units to exceed the standards of availability and performance that enable our business partners to perform their business functions and ensure an exceptional experience for our internal customers.
  • Develop and manage budget, optimizing resource allocation to meet operational goals.
  • Maintain a customer-centric approach by ensuring timely, accurate, and professional responses to customer inquiries and incidents. Implement best practices to enhance customer satisfaction and loyalty, striving to exceed business expectations.
  • Partner with Engineering and IT teams to govern the creation of break/fix standard operating procedures.
  • Develop and implement key performance indicators (KPIs), metrics packages, and related information for CIO and Executive Leadership team.
  • Lead technology-related change initiatives, ensuring smooth transitions and minimal disruptions to the organization.
  • Own incident, problem, and change management processes and operations.
  • Build and execute the crisis management and business continuity strategy and plan.
  • Own internal and external event communication during outages.