We are looking for highly skilled professionals who can contribute to our team's success and help us maintain and improve the reliability of our systems.

Our customer is a multinational corporation with more than a century of history and offices in over 180 countries. Their most ambitious goal at present is to introduce a range of Reduced-Risk Products (RRPs). The target audience is more than 1 billion consumers around the globe.

Requirements:

Must-Have Capabilities:
Intermediate understanding of SRE principles and practices.
Ability to handle more complex tasks and contribute to the improvement of processes.
Intermediate troubleshooting and problem-solving skills.
Intermediate knowledge of the following technologies:
New Relic: Advanced monitoring and alerting setup, including custom dashboards.
ELK: Advanced log management, analysis, and visualization.
Opsgenie: Advanced alert management, including integration with other tools.
Terraform/Terraform Enterprise: Advanced IaC tasks, including module creation and management.
Bitbucket/GitHub: Advanced version control, including branching strategies and code reviews.
Python: Advanced scripting and automation, including API integrations.
JavaScript: Advanced scripting for automation tasks and tool integrations.
Jenkins: Advanced CI/CD pipeline setup, including complex workflows and integrations.
AWS: Intermediate understanding of cloud platforms and services.
Should-Have Capabilities:
Ability to mentor junior engineers and share knowledge.
Strong communication and collaboration skills.
Nice-to-Have Capabilities:
Understanding of Node.js.
Familiarity with container technologies (e.g., Docker, Kubernetes).
Familiarity with Ansible.

Responsibilities:

Design, build, and maintain software delivery pipelines and infrastructure that support continuous integration, delivery, and deployment.
Collaborate with development and operations teams to ensure that software is delivered with high quality, speed, and reliability.
Automate manual processes, such as testing, deployment, and monitoring, to improve efficiency and reduce errors.
Develop and maintain monitoring and alerting systems to proactively identify and address issues in production environments.
Troubleshoot production issues, conduct root cause analysis, and implement remediation plans.
Manage and scale infrastructure resources, such as servers, databases, and cloud services, to ensure optimal performance and cost-effectiveness.
Implement security best practices and ensure compliance with industry standards and regulations.
Continuously learn and keep up to date with new technologies and industry trends to improve system performance, security, and efficiency.
