Ready to shape the future of AI infrastructure and build systems that power the most advanced unstructured data pipelines in the world?

At Unstructured, we’re building the backbone of generative AI—enabling companies to transform PDFs, HTML, Word docs, images, and more into high-performance data pipelines that scale. Our tools are already used by half of the Fortune 500, and our open-source package has been downloaded 26+ million times. Now we’re entering our next chapter—and we’re hiring a Site Reliability Engineer to help scale our systems and safeguard our infrastructure.

If you’re energized by reliability, love solving infrastructure challenges at scale, and want to help define how modern AI systems run in production, this is your moment. You’ll work closely with Engineering, Product, and Customer teams to build scalable systems, streamline CI/CD, and make reliability a first-class citizen across everything we deploy.

🏢 This role is hybrid in San Francisco—join us in-office 3x a week for deep collaboration, whiteboard sessions, and hands-on impact.

🔧 What You’ll Own & Drive

🛠 Scale & Stability at the Core

Design and implement highly available, observable, and scalable infrastructure across cloud environments

Build resilient systems that meet the demands of enterprise-grade, production AI workloads

⚙️ Automate Everything

Develop Infrastructure-as-Code using Terraform, Pulumi, and similar tools

Own CI/CD automation and build reusable pipelines with GitHub Actions and modern DevOps tooling

🚀 Own Kubernetes & Orchestration

Manage and optimize our Kubernetes clusters and containerized environments

Tune Helm charts, service mesh configs, and orchestration systems for performance and security

📊 Obsess Over Observability

Implement and maintain monitoring, logging, and alerting with tools like Prometheus, Grafana, Datadog, and Elastic

Ensure we can see, understand, and respond to system behavior in real time

🧪 Drive Production Readiness

Partner with engineering to prepare features and systems for production rollouts

Contribute to capacity planning, deployment strategies, and fault-tolerant system design

🔥 Lead Incident Response

Support and lead incident response processes, postmortems, and root cause analysis

Champion a culture of blameless retrospectives and continuous improvement

💻 Accelerate Engineering Velocity

Improve developer experience through tooling, automation, and streamlined feedback loops

Help teams move faster without sacrificing quality or uptime

🧬 What You Bring

4+ years in SRE, DevOps, or Infrastructure Engineering roles supporting high-scale production environments
Deep experience with cloud platforms like AWS, GCP, or Azure
Expertise in Kubernetes, Docker, and container orchestration at scale
Strong Linux systems and networking fundamentals
Scripting and automation skills (Python, Bash, or Go preferred)
Proficiency with Infrastructure-as-Code (Terraform, Pulumi, Ansible, or similar)
Solid understanding of monitoring and observability best practices
A calm, systems-thinking approach to incident response and reliability

💎 Bonus Points

Experience supporting ML infrastructure or real-time data pipelines
Exposure to serverless or event-driven architectures
Contributions to open-source DevOps projects or communities
Familiarity with security and compliance in cloud-native environments

🌟 Why You’ll Love It Here

Impact That Matters: Own the core infrastructure behind AI systems used by the Fortune 500

Big Technical Challenges: Solve hard, meaningful problems at the cutting edge of cloud and data

Elite Team: Join a sharp, humble group of engineers who value execution and impact

SF Office Vibes: Collaborate live with real whiteboards and real humans (not just Slack threads)

Flexible Culture: Hybrid structure with async-friendly, low-ego collaboration

This role's salary is benchmarked against San Francisco market rates to remain competitive with top-tier talent in high-cost-of-living regions. Final compensation may vary based on experience, skill set, and location.
