Radix is a fast-growing SaaS company serving the multifamily industry with actionable data and insights. Our values (Curiosity, Resilience, Impact, Courage, and Responsibility) are at the heart of how we operate and grow. At Radix, we believe that exceptional people build exceptional companies, and our Site Reliability Engineer (SRE) will play a critical role in ensuring the stability, performance, and observability of our platform, supporting our mission to provide transparency and industry-leading standards.
Your Impact
As a Site Reliability Engineer (SRE) at Radix, you will be at the forefront of ensuring our production systems are reliable, scalable, and performant. You will collaborate closely with DevOps and Engineering teams to proactively monitor, troubleshoot, and enhance system reliability. Your expertise will contribute to a seamless user experience, helping Radix maintain its reputation as the trusted source of data-driven insights in the multifamily industry. This role works US hours to support US operations, ensuring coverage for business needs.
Key Outcomes
Proactive System Monitoring: Implement and improve monitoring solutions to detect outages and performance issues before they impact users.
Incident Response & Resolution: Work with DevOps and Engineering to quickly diagnose and resolve production issues.
Enhanced Observability: Collaborate with teams to improve system visibility, logging, and alerting mechanisms.
Performance Optimization: Identify and implement improvements to ensure high system performance and scalability.
Post-Mortem Analysis: Conduct thorough post-incident reviews to prevent recurring issues and enhance system resilience.
Documentation & Process Improvement: Maintain detailed documentation of systems, workflows, and incident resolutions to ensure knowledge sharing and operational efficiency.
Key Responsibilities
Monitor production servers for outages and performance degradation, ensuring rapid issue detection and resolution.
Work closely with DevOps and Engineering teams to diagnose and address system failures and performance bottlenecks.
Enhance system observability by improving logging, metrics, and alerting capabilities.
Analyze system performance trends and implement optimizations to improve reliability and efficiency.
Conduct post-mortems for outages, documenting findings and recommendations for future prevention.
Develop and maintain detailed documentation for system architecture, troubleshooting procedures, and operational workflows.
What You Bring
Experience
3+ years of experience in Site Reliability Engineering, DevOps, or related roles.
Strong background in monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
Ability to work US hours to support US operations.
Experience working with cloud platforms (AWS, Azure, or GCP).
Proficiency in scripting and automation (Python, Bash, or similar).
Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
Experience conducting post-mortem analyses and implementing reliability improvements.
Skills
Strong analytical and problem-solving skills with a data-driven approach to system reliability.
Excellent collaboration and communication skills, working effectively with cross-functional teams.
Ability to thrive in a fast-paced, evolving environment with a proactive and ownership-driven mindset.
Personal Attributes
Curiosity: A drive to explore new technologies and continuously improve systems.
Resilience: Ability to adapt, troubleshoot, and lead incident response effectively.
Impact-Focused: A commitment to delivering high-performing, reliable systems.
Courage: Willingness to challenge assumptions and drive meaningful improvements.
Responsibility: Deep ownership of system reliability and performance.
How We Work At Radix
At Radix, we thrive in an environment built on trust, innovation, and collaboration. Our values guide everything we do, empowering team members to take ownership, make data-driven decisions, and continuously iterate to drive success for our clients and our company.
Join us at Radix, where your expertise in site reliability will help shape the future of an industry ready for transformation.