As a Site Reliability Developer/Engineer, you will be responsible for operating production environments, including systems and databases, that support critical business operations for Singapore’s governmental sovereign cloud environment. You will focus on automating and optimizing operations across multiple production environments, and you will recommend novel solutions to improve availability, performance, and supportability. This is an opportunity to combine deep technical knowledge with administration and analysis knowledge of Oracle's Cloud Infrastructure to provide escalation support for a wide range of complex production environment problems related to immense growth, scaling, leveraging the cloud, extremely high performance, and high availability requirements. You will also guide junior engineers in solving complex problems, take part in large-scale incident bridges, and help build and optimize processes and procedures.
This role is open to Singaporeans and PRs only.
This role will involve the successful applicant working on government projects which may require security clearance being obtained and maintained as a condition of employment. Candidates applying for this role must be willing to provide necessary personal details for the application and maintenance of necessary security clearance.
Experience required: at least 4-5 years
Responsibilities:
Develop automation and optimizations focused on operational excellence.
Deep dive into systemic issues, identify their root causes, and resolve them.
Improve the quality of operations outcomes through scalable automation.
Install, monitor, maintain, support, and optimize all production server hardware and software.
Provide escalated technical support for complex technical issues which may include leading problem management cases and providing management status.
Coordinate escalated support cases, leading the appropriate internal technical resources and/or third-party vendors to resolution, and coordinate the storage infrastructure of Oracle system and database appliances.
Take responsibility for Oracle production environments: assist with server operating system and application upgrades, bug fixes, and patching, and work on standardization projects for both hardware and software under the Oracle technology stack, while providing the consistent system uptime expected in a Cloud environment.
Lead communications with key partners in solving complex technical problems.
Provide technical guidance and leadership to junior members to enable them to grow in their careers.
This team provides support and administration on a 24/7 basis, and the role requires rotation across day and night shifts.
Requirements:
Experience with Linux System Administration, Networking, Storage, Compute, and Virtualization
Understanding of, and experience working with, technologies such as Kubernetes, Terraform, Ansible, Chef, and Puppet
Experience participating in or running incident bridges of significant scale
Customer focus, with a passion for delighting customers
Experience in SRE, cloud technical support, cloud operations, NOC or similar
Demonstrated ability to quickly learn new technical disciplines and then train others