Our client, a leading e-commerce start-up in the MENA region, is seeking an experienced Head of SRE to lead its SRE organisation and ensure the reliability, scalability, and performance of its mission-critical e-commerce platform. This role combines deep technical expertise with strategic leadership to drive operational excellence across the company's AWS and Cloudflare infrastructure, serving millions of customers globally.
Responsibilities and Duties
- Develop and execute a comprehensive infrastructure strategy aligned with company goals and objectives.
- Oversee the design, implementation, and maintenance of multi-cloud infrastructure.
- Evaluate and implement new policies and processes to improve infrastructure efficiency, scalability, and compliance with industry standards and regulatory requirements.
- Lead and manage the infrastructure teams, including hiring, training, and performance management.
- Oversee and mentor managers and team leads within the infrastructure team to ensure effective leadership and operational efficiency.
- Manage relationships with infrastructure vendors and service providers.
- Negotiate contracts and service level agreements (SLAs) to ensure optimal service delivery.
- Work with the Cybersecurity department to develop and enforce security policies and procedures to protect company data and infrastructure.
- Optimise infrastructure costs and performance through continuous analysis and improvement.
Qualifications
- 8+ years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles
- 3+ years in engineering leadership positions managing teams of 5+ engineers
- 5+ years of hands-on experience with AWS services including EC2, ECS/EKS, RDS, Lambda, CloudFormation/CDK
- 3+ years of experience with Cloudflare or similar CDN/edge computing platforms
- Proven track record managing infrastructure for high-traffic e-commerce or consumer-facing applications
Technical Skills
- Cloud Platforms: Expert-level AWS knowledge with deep expertise in managed services (Solutions Architect Professional certification required)
- Container Orchestration: Advanced EKS management, Kubernetes administration, Helm, Docker, and container security
- AWS Managed Services: Extensive experience with RDS, Elasticsearch, MSK, ECS Fargate, Lambda, API Gateway, ALB/NLB, CloudFront, and AWS security services
- Infrastructure as Code: Terraform for managing complex AWS environments
- Monitoring & Observability: AWS CloudWatch, Prometheus, Grafana, ELK Stack
- Security Tools: AWS Security Hub, GuardDuty, Config, CloudTrail, IAM, Secrets Manager, and third-party security platforms
- Programming Languages: Proficiency in Python, Go, or similar languages for automation, security tooling, and infrastructure management