Senior Site Reliability Engineer I - Monitoring
Careem is building the Everything App for the greater Middle East, making it easier than ever to move around, order food and groceries, manage payments, and more. Careem is led by a powerful purpose to simplify and improve the lives of people and build an awesome organisation that inspires. Since 2012, Careem has created earnings for over 2.5 million Captains, simplified the lives of over 50 million customers, and built a platform for the region’s best talent to thrive and for entrepreneurs to scale their businesses. Careem operates in over 70 cities across 10 countries, from Morocco to Pakistan.
ABOUT THE ROLE
We are looking for someone passionate about automation, tooling, and frameworks to join the Monitoring team. You will be part of the team mandated to build Careem's infrastructure and application monitoring framework and to enable all projects across Careem to improve visibility: gaining insight into system events, defining alerts, and getting notified in case of an incident.
Key responsibilities include:
- Develop our distributed monitoring system to meet the challenging functional, scalability, and reliability requirements of our fast-growing business
- Design and architect solutions with a focus on scalability, testability, and maintainability
- Encourage and support others to take on responsibility, authority, and accountability
- Coach and mentor colleagues on an energetic, growing team
- Facilitate collaboration with other engineers, product owners, and designers to solve interesting and challenging problems across our platform
- Build and ship new features and systems, with an emphasis on code quality, maintainability, readability, and testing
- Develop, maintain, and extend a variety of systems, including open-source, ready-made, and in-house applications.
- Be a valued member of an autonomous, cross-functional agile team
- Focus on quality and know what it means to ship high quality code.
Requirements:
- 5+ years of experience with monitoring systems such as Prometheus, New Relic, or AppDynamics
- Experience in developing and debugging in at least one of the following languages: Java, Python, Bash, or Go
- Expert knowledge of Kubernetes
- Experience with Cloud Infrastructure (AWS preferred)
- Experience with infrastructure automation (Infrastructure as Code)
- Experience in architecting, developing, operating, and troubleshooting highly available systems at scale
- Experience in building and owning tools for medium to large engineering teams.
- Experience in building systems, dashboards, and metrics that facilitate a data-driven approach to problem resolution
- Strong Unix or Linux background, including topics around network stack and scripting
- An obsession with keeping costs low while building solutions
Nice to Have:
- Experience in multi-tiered distributed systems
- Proficiency in configuring, managing, and optimizing the Prometheus and Thanos stack for effective monitoring
- Experience with CI/CD
- Experience with the ELK stack and/or log management
- Experience with cloud-centric application development and deployment (AWS preferred)
What we’ll provide you
We offer colleagues the opportunity to drive impact in the region while they learn and grow. As a Careem colleague you will be able to:
- Work and learn from great minds by joining a community of inspiring colleagues.
- Put your passion to work in a purposeful organisation dedicated to creating impact in a region with a lot of untapped potential.
- Explore new opportunities to learn and grow every day.
- Enjoy the flexibility that comes with the trust of being an owner: work in a hybrid style with a mix of days at the office and at home, work remotely from any country in the world for 30 days a year, and take unlimited vacation days.
- Access healthcare benefits and fitness reimbursements for health activities, including gym, health club, and training classes.