Principal Site Reliability Engineer (Cortex Cloud Security Posture Management)

SIDRAM TECHNOLOGIES

Atlanta, GA

Atlanta, GA (Internal)

As a Principal SRE with the Cortex Cloud Security Posture Management team, you will:

  • Cloud Expertise - Apply your expertise in monitoring cloud platforms, particularly Google Cloud Platform, to optimize our infrastructure using cloud-native technologies
  • Incident Management - Apply incident management processes to resolve system issues efficiently and minimize their impact on services
  • Automation - Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto-scaling
  • CI/CD - Develop and maintain application deployment tooling built on Terraform and Helm
  • Continuously Improve - Stay up-to-date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate
  • On-Call - Join our DevOps team in providing follow-the-sun operational coverage for our production SaaS product
  • Collaborate - Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services

Qualifications

Your Experience

  • Incident and Alert Management - Clear understanding of incident and alert management in Site Reliability Engineering
  • DevOps/SRE Expertise - 5+ years of experience as a DevOps/SRE engineer, with a passion for technology and a strong drive to achieve high service-level reliability
  • Cloud Proficiency - High proficiency in either Google Cloud Platform or Amazon Web Services
  • Kubernetes and Docker - High proficiency with Docker for containerization and Kubernetes for container orchestration
  • Scripting and Automation - High proficiency in Python programming and Linux shell commands; experience with Terraform for infrastructure as code
  • Security - Strong grasp of security concepts and best practices
  • Observability - Experience with observability and incident response tools
  • Communication Skills - Effective communication and interpersonal skills, with the ability to work and coordinate across multiple teams
  • Troubleshooting - Ability to effectively troubleshoot and address emerging and complex problems
  • Independence - Ability to operate independently, make decisions, take action, and take responsibility
