Binary Defense is seeking a Senior Site Reliability Engineer (SRE) to join our engineering team.
In this role, you will be responsible for designing, implementing, and maintaining our infrastructure, deployment pipelines, and operational processes to ensure high availability, scalability, and reliability of our systems.
Key Responsibilities
Design, build, and maintain scalable and reliable infrastructure using Infrastructure as Code (IaC) principles
Implement and manage CI/CD pipelines to enable frequent, reliable software deployments
Monitor system performance and availability, and respond to incidents as needed
Collaborate with development teams to improve application performance and reliability
Automate routine operational tasks and implement self-healing systems
Participate in on-call rotations to provide 24/7 support for critical systems
Document processes, configurations, and infrastructure components
Conduct capacity planning and performance tuning
Implement and enforce security best practices across infrastructure
Requirements
5+ years of experience in SRE, DevOps, or similar roles
Strong understanding of Linux/Unix systems administration
Experience with cloud platforms (AWS, GCP, or Azure)
Proficiency with Infrastructure as Code (IaC) tools (Terraform, CloudFormation, or similar)
Experience with containerization technologies (Docker, Kubernetes)
Working knowledge of CI/CD tools (Jenkins, GitLab CI, GitHub Actions, or similar)
Familiarity with monitoring and observability tools (Prometheus, Grafana, Datadog)
Scripting and automation skills (Python, Bash, or similar)
Understanding of networking concepts and security best practices
Excellent problem-solving and troubleshooting skills
Preferred
Experience with database administration (MySQL, PostgreSQL, MongoDB)
Experience with configuration management tools (Ansible, Chef, Puppet)
Familiarity with distributed systems and microservices architecture
Experience with incident management and postmortem processes