Site Reliability Engineer (Data & Algorithm)

Unison Consulting

George Town, Penang, Malaysia

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our Data & Algorithm team, where you'll be pivotal in building and maintaining resilient, scalable, and high-performing systems. You will act as the bridge between development and operations, championing reliability, reducing operational toil, and driving excellence through observability, automation, and deep system-level expertise.

This is a hands-on, high-impact role for someone who thrives in a fast-paced, multitasking environment and has a strong foundation in infrastructure, automation, and modern cloud-native tools.

Key Responsibilities:

  • Design and implement resilient and scalable system architectures to ensure high availability.
  • Drive the adoption and monitoring of Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services.
  • Develop automation tools and scripts (Python & Bash) to reduce manual interventions and operational toil.
  • Troubleshoot and resolve infrastructure and application issues, especially around Kubernetes, storage modules, and containerization.
  • Collaborate closely with engineering, data, and DevOps teams to implement best practices for system reliability and incident management.
  • Conduct root cause analysis and post-incident reviews, implementing improvements to prevent recurrence.
  • Use tools like Grafana to monitor system health, derive insights, and tune system performance.
  • Manage and maintain documentation for all systems, processes, and incident responses.
  • Support and troubleshoot key-value and NoSQL databases, as well as Kafka or BMQ (forked Kafka) for data streaming.
  • Handle multitasking under pressure, prioritize workloads, and maintain effective communication during high-stress scenarios.
  • Convert data between formats (CSV, JSON, etc.) using scripts to support analytics and system configuration.


Requirements

Required Qualifications:

  • Strong programming/scripting skills in Python and Bash.
  • Deep understanding of Kubernetes internals, containerization, and troubleshooting at the infrastructure level.
  • Experience in cloud platforms like AWS, GCP, or Azure.
  • Solid background in Linux system administration and networking fundamentals.
  • Proficient with tools like Git and VS Code.
  • Hands-on experience with monitoring tools, especially Grafana.
  • Familiarity with NoSQL databases and data streaming platforms (Kafka, BMQ).
  • Strong grasp of SRE principles: SLOs, SLIs, SLA management, toil reduction, incident handling.
  • Ability to multitask and thrive in high-pressure environments.

NewSREJobs

Connecting top SRE talent with leading companies.
