Site Reliability Engineer (Transformation)

UnionBank of the Philippines

Pasig, National Capital Region, Philippines

Job Description


As an SRE, you will ensure the reliability, availability, and performance of our services and infrastructure. You will work closely with development teams to build and operate scalable, fault-tolerant systems while driving automation and improving operational efficiency. The ideal candidate has a strong background in system administration and scripting, and a passion for building high-performance, resilient systems.

Key Responsibilities:

  • Monitor and maintain the reliability, availability, and performance of production systems and applications.
  • Develop and maintain monitoring, alerting, and logging solutions to ensure system health and performance.
  • Collaborate with software engineering teams to design and implement scalable, reliable, and efficient architectures.
  • Automate routine operational tasks such as system deployment, configuration, and scaling using infrastructure-as-code (IaC) tools.
  • Implement and improve incident response processes, including root cause analysis and post-mortem reports.
  • Optimize system performance, troubleshoot issues, and provide solutions to enhance reliability.
  • Manage and maintain cloud infrastructure (AWS, GCP, Azure), including provisioning and scaling resources.
  • Participate in capacity planning and in designing disaster recovery and high-availability strategies.
  • Ensure security and compliance best practices are followed across infrastructure and applications.
  • Drive cultural improvements in operational efficiency, quality, and speed, including implementing change-management best practices.
  • Monitor system performance and troubleshoot issues, ensuring the stability, scalability, and security of infrastructure and applications.

Required Skills and Qualifications:

  • Strong experience in Linux/Unix systems administration and networking.
  • Proficiency in programming/scripting languages such as Python, Go, Bash, or similar.
  • Expertise with cloud platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes).
  • Familiarity with configuration management and automation tools like Ansible, Puppet, or Chef.
  • Experience with monitoring and observability tools like Prometheus, Grafana, Nagios, or the ELK stack.
  • Solid understanding of incident management, disaster recovery, and business continuity planning.
  • Knowledge of distributed systems and microservices architectures.
  • Familiarity with version control systems (e.g., Git) and CI/CD pipelines (e.g., Jenkins, GitLab CI/CD).
  • Experience with tools such as Qlik, Confluent, Snowflake, and machine learning platforms (SageMaker, CML).
  • Strong problem-solving, debugging, and troubleshooting skills.
  • Ability to work in a collaborative environment and effectively communicate across teams.
  • Understanding of security practices and policies, including network security and data protection.

Preferred Qualifications:

  • Experience with infrastructure-as-code (IaC) tools such as Terraform, CloudFormation, or similar.
  • Cloud platform certification (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).
  • Experience with automated deployment systems and container orchestration tools (e.g., Kubernetes).
  • Knowledge of Site Reliability Engineering principles and practices.
  • Strong understanding of the Linux operating system.
  • Strong understanding of networking, security, and compliance.
  • Strong understanding of cloud infrastructure.
  • Experience with microservices and containerization (Docker, Kubernetes, etc.).
