Client is seeking exceptional Site Reliability Engineers to manage, tune, and debug large-scale, highly available distributed systems. You will work with a team of passionate and talented engineers on the automation, tuning, and troubleshooting of Apache Pinot and SQL databases. We are looking for motivated, hardworking, and focused individuals who have a real passion for operational excellence, data systems, and automation.

Project overview:

Client is a cloud-based software company that enables business customers to derive advanced insights from real-time and historical data. Client was founded by the core software engineering team and inventors of Apache Pinot, which currently powers hundreds of user-facing applications at companies across industries, including LinkedIn, Uber, Target, 7Eleven, Etsy, Walmart, WePay, Factual, Weibo, and more. Client's cloud solution has enabled even more companies to deploy and operate real-time analytics at scale, including Stripe, Sovrn, Roadie, Just Eat Takeaway.com, Dialpad, Guitar Center, Blinkit, and more.

Requirements:

5+ years of experience as an engineer (SRE, SDET, or development)
Experience managing highly available, production-facing distributed systems and in-depth knowledge of Java are a plus
Experience with cloud platforms such as AWS, GCP, or Azure
Experience with Kubernetes and container orchestration
Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
Knowledge of standard methodologies related to security, performance, and disaster recovery
Strong troubleshooting and critical thinking skills

Responsibilities:

Use monitoring and alerting services to solve intricate programming problems at scale
Manage and tune multiple critical customer-facing Apache Pinot clusters
Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
Build rapport with customers and work closely with them to mitigate and resolve incidents
Execute disaster recovery strategies with minimal downtime
Collaborate with other engineers to understand and troubleshoot systems and use the experience gained to influence the roadmap of other teams
