Site Reliability Engineer

At the intersection of machine learning and large-scale infrastructure, the SRE team for our Applied Machine Learning group is redefining how intelligent systems operate at global scale. We blend the principles of software engineering with systems reliability to keep our AI and recommendation systems resilient, high-performing, and ever-evolving.

As a Site Reliability Engineer on this team, you'll be hands-on with some of the most advanced AI technologies, helping architect, maintain, and scale machine learning platforms that serve millions, if not billions, of users. You'll also play a critical role in optimizing system performance, making hardware and capacity recommendations, and automating everything possible.

What You'll Do:

Ensure our ML systems run smoothly, efficiently, and reliably, no matter how complex or large they get.
Dive deep into the guts of distributed systems to identify and resolve bottlenecks before they become outages.
Contribute to and lead the automation of infrastructure, pipelines, and operational routines.
Collaborate with engineering and hardware teams on capacity planning, architecture choices, and performance tuning.

What You Bring:

Deep knowledge of distributed systems and the experience to troubleshoot them with precision.
A Bachelor's or Master's degree in Computer Science or a closely related field focused on software development or systems engineering.
Solid programming chops in at least one of the following: Python, C/C++, or Go.
Strong foundation in algorithms, data structures, and computer science fundamentals.

Preferred Extras:

Experience designing and operating high-scale, high-availability systems.
Passion for writing clean, optimized code and automating away manual tasks.
Prior SRE experience in large distributed production environments.
