Site Reliability Engineer

Intellias

United States

The client is seeking exceptional Site Reliability Engineers to manage, tune, and debug large-scale, highly available distributed systems. You will work with a team of passionate and talented engineers on automating, tuning, and troubleshooting Apache Pinot and SQL databases. We are looking for motivated, hardworking, and focused individuals who have a real passion for operational excellence, data systems, and automation.

Project overview:

The client is a cloud-based software company that enables business customers to derive advanced insights from real-time and historical data. It was founded by the core software engineering team and inventors of Apache Pinot, which currently powers hundreds of user-facing applications at companies across industries, including LinkedIn, Uber, Target, 7-Eleven, Etsy, Walmart, WePay, Factual, Weibo, and more. The client's cloud solution has enabled even more companies to deploy and operate real-time analytics at scale, including Stripe, Sovrn, Roadie, Just Eat Takeaway.com, Dialpad, Guitar Center, Blinkit, and more.

Requirements:

  • 5+ years of experience as an engineer (SRE, SDET, or development)
  • Experience managing highly available, production-facing distributed systems and in-depth knowledge of Java are a plus
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Experience with Kubernetes and container orchestration
  • Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
  • Knowledge of standard methodologies related to security, performance, and disaster recovery
  • Strong troubleshooting and critical thinking skills

Responsibilities:

  • Use monitoring and alerting services to solve complex programming problems at scale
  • Manage and tune multiple critical customer-facing Apache Pinot clusters
  • Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
  • Build rapport and work closely with customers to mitigate and resolve incidents
  • Execute disaster recovery strategies with minimal downtime
  • Collaborate with other engineers to understand and troubleshoot systems, and use the insights gained to influence other teams' roadmaps

New SRE Jobs
