Senior Site Reliability Engineer

Optomi

United States

Optomi, in partnership with one of our premier clients, is seeking a Senior Site Reliability Engineer (SRE) to join the Data Platform team at a leading global media organization. In this mission-critical role, you'll design, scale, and maintain the infrastructure powering data products and real-time insights across digital and physical experiences. The position sits at the intersection of DevOps, data engineering, and platform reliability: you'll work closely with cross-functional teams to ensure the scalability, observability, and reliability of high-throughput data systems.

You’ll drive innovation across petabyte-scale data pipelines using automation, infrastructure-as-code, and cloud-native technologies—reducing operational overhead, improving incident response, and unlocking greater velocity for data-driven products.


What the right candidate will enjoy

  • A mission-critical role shaping the backbone of real-time data products at a global media leader
  • Full remote flexibility with a high-impact team working on cutting-edge infrastructure
  • Hands-on work with petabyte-scale pipelines and cloud-native tooling
  • A solid runway: an initial 6-month contract with a likely long-term extension


Required Qualifications

  • 6+ years in software engineering focused on SRE, DevOps, or platform infrastructure
  • Fluent in Python and one statically typed language (e.g., Go, Java, TypeScript)
  • Deep AWS experience: Lambda, ECS/EKS, Kinesis, S3, IAM, SNS/SQS, API Gateway
  • Strong background in distributed systems at scale
  • Expert in observability: metrics, logs, traces, and system health design
  • Skilled with Terraform, AWS CDK, and CI/CD automation
  • Comfortable working with SQL/NoSQL data systems and understanding architectural trade-offs
  • History of managing SLAs, SLOs, SLIs, and leading incident response
  • Strong communication skills across teams and functions


Nice to Have

  • Real-time data infra or analytics pipeline experience
  • Datadog and serverless observability chops
  • Experience with performance tuning, distributed tracing, and post-incident retros
  • Proven impact on system reliability metrics (MTTR, MTTD, deployment cadence)
  • Understanding of cloud data compliance and security best practices
  • Bonus points for media, streaming, or high-availability consumer tech backgrounds
