Find Your Dream Job

Search through thousands of job postings to find your next opportunity

SRE Engineer

Infosys

Bengaluru East, Karnataka, India

  • Bachelor's degree in Computer Science, Engineering, or related field
  • 5+ years of experience in Site Reliability Engineering or similar roles
  • Strong experience with cloud platforms (AWS/Azure/GCP) and infrastructure-as-code
  • Extensive knowledge of monitoring tools (e.g., Prometheus, Grafana, ELK Stack)
  • Proficiency in at least one programming language (Python, Go, or Java preferred)
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Strong understanding of networking, system design, and distributed systems

Key Responsibilities

Command Center Design & Implementation:

  • Architect and implement a centralized command center that provides comprehensive visibility into both infrastructure and application layers
  • Establish standardized operational procedures, runbooks, and escalation protocols for incident management
  • Design and implement monitoring solutions that provide real-time insights into system health, performance metrics, and business KPIs

Operations Management:

  • Lead the development of automated remediation solutions for common operational issues
  • Implement and maintain SLOs/SLIs across critical services and applications
  • Drive continuous improvement in incident response times and system reliability metrics
  • Collaborate with development teams to ensure applications are designed with operational excellence in mind

Tool Development & Integration:

  • Develop and maintain monitoring dashboards that provide actionable insights for both technical and non-technical stakeholders
  • Implement and customize monitoring tools for infrastructure and application performance monitoring
  • Create automation scripts and tools to streamline operational processes
  • Integrate various monitoring and alerting systems to provide a unified view of system health

Leadership & Collaboration:

  • Mentor junior engineers in SRE practices and command center operations
  • Collaborate with security, development, and infrastructure teams to ensure comprehensive monitoring coverage
  • Partner with business stakeholders to align monitoring strategies with business objectives
  • Lead post-incident reviews and drive implementation of learned improvements

Preferred Qualifications:

  • Experience in designing and implementing enterprise-scale command centers
  • Knowledge of AIOps and machine learning for IT operations
  • Certification in relevant cloud platforms or technologies
  • Experience with chaos engineering and resilience testing
  • Background in implementing ITIL practices for IT services
  • Excellent problem-solving and analytical abilities
  • Strong communication skills and ability to work with cross-functional teams
  • Experience in incident management and on-call rotations
  • Proven track record of improving system reliability and performance
  • Ability to handle high-pressure situations and make quick decisions
  • Strong documentation and technical writing skills

NewSREJobs

Connecting top SRE talent with leading companies.