As a Principal SRE with the Cortex Cloud Security Posture Management team, you will:
Cloud Expertise - Apply your expertise in monitoring cloud platforms, particularly Google Cloud Platform, to optimize our infrastructure using cloud-native technologies
Incident Management - Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services
Automation - Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto-scaling
CI/CD - Develop and maintain application deployment tooling built on tools such as Terraform and Helm
Continuously Improve - Stay up-to-date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate
On-Call - Join our DevOps team in providing follow-the-sun operational coverage for the production environment of our SaaS product
Collaborate - Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services
Qualifications
Your Experience
Incident and Alerts Management - Clear understanding of incident and alerts management in Site Reliability Engineering
DevOps/SRE Expertise - 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level
Cloud Proficiency - High proficiency in either Google Cloud Platform or Amazon Web Services
Kubernetes and Docker - High proficiency with Docker for containerization and Kubernetes for container orchestration
Scripting and Automation - High proficiency in Python programming and Linux shell commands, plus experience with Terraform for infrastructure as code
Security - Strong grasp of security concepts and best practices
Observability - Experience with observability and incident response tools
Communication Skills - Effective communication and interpersonal skills, with the ability to work and coordinate between multiple teams
Troubleshooting - Ability to effectively troubleshoot and address emerging and complex problems
Independence - Ability to operate independently, make decisions, take action, and take responsibility