Site Reliability Engineer (SRE)

Astra Tech

Abu Dhabi, Abu Dhabi Emirate, United Arab Emirates

About Us


Established in 2022, Astra Tech has rapidly expanded its influence by strategically acquiring and developing key platforms such as PayBy, Rizek, Quantix, and Botim. These acquisitions have culminated in the creation of the world’s first Ultra App, Botim, which seamlessly integrates fintech, e-commerce, AI-powered tech solutions, and communication services into one intuitive and user-friendly experience. This powerful combination allows users to manage their finances, shop, and stay connected—all within a single, cohesive platform.


With over 150 million users across 155 countries, Astra Tech is more than just a tech company—it is a movement committed to enhancing lives through innovation. As a visionary leader in tech development and investment, our mission is clear: to revolutionize technology solutions for consumers and businesses, harnessing the power of AI to elevate digital experiences to unprecedented heights globally.


Role Summary


As a Site Reliability Engineer, you will be responsible for enhancing the reliability, scalability, and efficiency of our infrastructure and operations. You will automate routine tasks, optimize middleware components, manage Kubernetes clusters, and maintain robust monitoring systems using Prometheus. The role also involves contributing to CI/CD pipeline development, managing cloud resources with a focus on cost optimization, and driving improvements in operational processes through automation and proactive incident resolution.


Key Responsibilities

  • Automate routine operational tasks using Shell scripting, ensuring efficiency in log analysis, batch management, and system optimization.
  • Maintain and optimize middleware components supporting infrastructure operations, ensuring stability and performance.
  • Administer and optimize Kubernetes clusters, ensuring scalability, security, and performance.
  • Maintain and optimize monitoring and alerting systems based on Prometheus, ensuring high availability of services.
  • Contribute to the development of CI/CD pipelines.
  • Manage cloud resources efficiently, implementing cost optimization strategies to reduce cloud expenditure.
  • Improve operational processes, develop automation tools, troubleshoot incidents, and enhance system stability and reliability.


Key Requirements

  • Proficiency in Shell scripting for automating operational workflows and system management tasks.
  • Experience in Python or Go, preferably for system automation, tooling, or backend services.
  • At least 5 years of experience in operations and maintenance roles, including at least 2 years of hands-on Kubernetes administration with expertise in CSI, CNI, and managing production clusters of 20+ nodes.
  • Experience with Prometheus for monitoring and alerting in an enterprise environment.
  • Familiarity with CI/CD deployment processes, with knowledge of GitOps principles. Hands-on experience with GitOps is a plus.
  • Experience managing cloud platforms using Infrastructure as Code (IaC) tools like Terraform/OpenTofu. Azure experience is a plus.
  • Strong problem-solving skills, a proactive approach to troubleshooting, and a commitment to improving operational efficiency and system reliability.
