Site Reliability Engineer (SRE) – US Hours

Radix

Pilar, Cordillera Admin Region, Philippines

About Radix

Radix is a fast-growing SaaS company serving the multifamily industry with actionable data and insights. Our values (Curiosity, Resilience, Impact, Courage, and Responsibility) are at the heart of how we operate and grow. At Radix, we believe that exceptional people build exceptional companies. Our Site Reliability Engineer (SRE) will play a critical role in ensuring the stability, performance, and observability of our platform, supporting our mission to provide transparency and industry-leading standards.

Your Impact

As a Site Reliability Engineer (SRE) at Radix, you will be at the forefront of ensuring our production systems are reliable, scalable, and performant. You will collaborate closely with DevOps and Engineering teams to proactively monitor, troubleshoot, and enhance system reliability. Your expertise will contribute to a seamless user experience, helping Radix maintain its reputation as the trusted source of data-driven insights in the multifamily industry. This role works US hours to support US operations, ensuring coverage for business needs.

Key Outcomes

  • Proactive System Monitoring: Implement and improve monitoring solutions to detect outages and performance issues before they impact users.
  • Incident Response & Resolution: Work with DevOps and Engineering to quickly diagnose and resolve production issues.
  • Enhanced Observability: Collaborate with teams to improve system visibility, logging, and alerting mechanisms.
  • Performance Optimization: Identify and implement improvements to ensure high system performance and scalability.
  • Post-Mortem Analysis: Conduct thorough post-incident reviews to prevent recurring issues and enhance system resilience.
  • Documentation & Process Improvement: Maintain detailed documentation of systems, workflows, and incident resolutions to ensure knowledge sharing and operational efficiency.

Key Responsibilities

  • Monitor production servers for outages and performance degradation, ensuring rapid issue detection and resolution.
  • Work closely with DevOps and Engineering teams to diagnose and address system failures and performance bottlenecks.
  • Enhance system observability by improving logging, metrics, and alerting capabilities.
  • Analyze system performance trends and implement optimizations to improve reliability and efficiency.
  • Conduct post-mortems for outages, documenting findings and recommendations for future prevention.
  • Develop and maintain detailed documentation for system architecture, troubleshooting procedures, and operational workflows.

What You Bring

Experience

  • 3+ years of experience in Site Reliability Engineering, DevOps, or related roles.
  • Strong background in monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
  • Ability to work US hours to support US operations.
  • Experience working with cloud platforms (AWS, Azure, or GCP).
  • Proficiency in scripting and automation (Python, Bash, or similar).
  • Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
  • Experience conducting post-mortem analyses and implementing reliability improvements.

Skills

  • Strong analytical and problem-solving skills with a data-driven approach to system reliability.
  • Excellent collaboration and communication skills, working effectively with cross-functional teams.
  • Ability to thrive in a fast-paced, evolving environment with a proactive and ownership-driven mindset.

Personal Attributes

  • Curiosity: A drive to explore new technologies and continuously improve systems.
  • Resilience: Ability to adapt, troubleshoot, and lead incident response effectively.
  • Impact-Focused: A commitment to delivering high-performing, reliable systems.
  • Courage: Willingness to challenge assumptions and drive meaningful improvements.
  • Responsibility: Deep ownership of system reliability and performance.

How We Work At Radix

At Radix, we thrive in an environment built on trust, innovation, and collaboration. Our values guide everything we do, empowering team members to take ownership, make data-driven decisions, and continuously iterate to drive success for our clients and our company.

Join us at Radix, where your expertise in site reliability will help shape the future of an industry ready for transformation.

Job Posted by ApplicantPro
