Senior Engineer - Site Reliability

Okada Manila

Parañaque, National Capital Region, Philippines

I. MAJOR RESPONSIBILITIES AND DUTIES:

  • Configure and maintain the enterprise monitoring tool to provide real-time visibility into the health of the entire technology stack.
  • Design and build dashboards that provide multi-level views based on functional requirements, such as executive and tactical views.
  • Define and maintain key thresholds across all monitoring elements to ensure proactive, early detection of impending incidents or problems.
  • Analyze and correlate events across all observability and monitoring tools to identify trends and behavior patterns that inform proactive courses of action.
  • Design, develop, and use automation tools and scripts to handle repetitive tasks and, where possible, create corrective actions that prevent or shorten prolonged outages.
  • Work closely with the operations team during incident and problem management to respond quickly to issues identified by the monitoring tools.
  • Regularly review and optimize infrastructure performance using logs, metrics, and traces as part of continuous improvement, adjusting thresholds and monitoring requirements as the environment changes.
  • Develop and maintain a robust alerting strategy, including integration with on-call tools, to ensure timely escalation and resolution of critical issues.
  • Implement and manage end-to-end event lifecycle processes to ensure accurate incident detection and efficient response.

II. JOB SPECIFICATIONS:

Educational Requirement:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field; or equivalent work experience.

Experience Requirement:

  • 2–5+ years of extensive experience as a systems and network administrator.
  • Hands-on experience managing monitoring tools such as, but not limited to, SolarWinds and Nagios.
  • Demonstrated understanding of what observability is and what it does.

Skills and Attributes:

  • Proficient with major cloud platforms such as AWS, GCP, Azure, and Alibaba Cloud.
  • Hands-on experience with SNMP-based monitoring tools such as SolarWinds, Nagios, and Checkmk.
  • Good grasp of observability platforms such as Splunk and Dynatrace.
  • Experience with containerization platforms such as Docker and Kubernetes.
  • Extensive experience with virtualization technologies such as VMware.
  • Strong knowledge of enterprise networking, including collapsed-core or similar architectures.
  • Knowledgeable in scripting languages such as Python, Bash, or PowerShell.
  • AWS Certified Solutions Architect, Azure Solutions Architect, or equivalent certification.
  • Certified Kubernetes Administrator (CKA).
  • Solid understanding of disaster recovery and business continuity practices.

Other Qualifications:

  • Strong analytical skills to identify, troubleshoot, and resolve complex technical issues.
  • Excellent verbal and written communication skills for interacting with team members, stakeholders, and end-users. Ability to explain technical concepts to non-technical audiences.
  • Ability to work effectively in a team environment and collaborate with other IT groups.
  • Ability to effectively prioritize and manage multiple tasks and projects.
  • Flexibility to adapt to changing technologies, tools, and business requirements.
  • Proactive in identifying areas for improvement and suggesting enhancements.
  • Ability to train junior team members.
  • Ability to work under pressure and remain decisive.