Associate Director - AWS SRE

PepsiCo

Hyderabad, Telangana, India

Overview

We are looking for a seasoned Associate Director of Site Reliability Engineering (SRE) to lead our AWS-focused SRE initiatives. In this role, you will oversee the reliability, scalability, and performance of critical applications and infrastructure hosted on AWS. You will lead a team of experienced SREs, drive strategic operational improvements, and ensure the seamless functioning of our cloud ecosystem to meet business and customer needs.

Responsibilities

  • Leadership and Team Management:
    • Lead and mentor a team of SRE professionals, fostering a culture of innovation, collaboration, and accountability.
    • Develop and implement career development plans, provide coaching, and facilitate knowledge-sharing within the team.
  • Operational Excellence:
    • Drive the adoption of SRE principles, including SLAs, SLOs, and error budgets, to enhance system reliability and performance.
    • Oversee incident management processes, ensuring timely resolution and comprehensive root cause analysis.
    • Establish and monitor operational KPIs to measure and improve system availability and performance.
  • Automation and Tooling:
    • Champion the use of automation to reduce manual processes, improve efficiency, and enhance system reliability.
    • Implement and optimize Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or CDK.
  • AWS Infrastructure Management:
    • Design, build, and maintain scalable and secure AWS-based infrastructure to support current and future workloads.
    • Leverage AWS services such as EC2, RDS, Lambda, S3, CloudWatch, and others to enhance operational capabilities.
  • Collaboration and Stakeholder Engagement:
    • Partner with engineering, product, and DevOps teams to align SRE initiatives with business objectives.
    • Act as a key liaison between the SRE team and executive stakeholders, communicating updates on reliability and risks.
  • Risk and Security Management:
    • Ensure compliance with security standards and best practices within AWS environments.
    • Identify risks related to cloud infrastructure and implement strategies for mitigation.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 15+ years of overall experience, including 10+ years in cloud-based infrastructure and operations and at least 4 years in a leadership role.
  • Deep expertise in AWS services, architecture, and tools, including hands-on experience with core AWS services (e.g., EC2, ECS, Lambda, S3, VPC, IAM).
  • Proficiency in automation scripting (e.g., Python, Bash) and Infrastructure as Code (e.g., Terraform, CloudFormation).
  • Strong knowledge of monitoring and observability tools like CloudWatch, Prometheus, Grafana, or Datadog.
  • Proven experience managing large-scale production environments, incident response, and operational scaling.
  • Hands-on experience with CI/CD pipelines and DevOps methodologies.

Preferred Qualifications

  • AWS certifications, such as AWS Certified Solutions Architect - Professional or AWS Certified DevOps Engineer - Professional.
  • Experience with Kubernetes (EKS) and containerization technologies like Docker.
  • Familiarity with FinOps principles for cost optimization in AWS environments.
  • Strong analytical skills and a data-driven approach to decision-making.
  • Exceptional communication, leadership, and stakeholder management abilities.