Our client, a well-known enterprise financial services organization, is looking for a Senior Site Reliability Engineer to join their team. It is important that your most recent experience has been focused on core SRE skills such as observability, monitoring, reliability, and availability.
Requirements:
Experience with SRE design for reliability and resiliency, targeting five-nines (99.999%) availability
Experience working in a cloud environment (OCP and AWS EMR)
Experience with application monitoring tools, observability, and performance assessments.
Strong experience with CI/CD pipelines (Jenkins or similar; Git/GitHub)
High level of familiarity with the Linux command line and scripting
Proven skills in high availability and scalability design, as well as performance monitoring and testing
Experience developing automation solutions in Java, Bash, Python, Perl, or similar languages
Extremely comfortable with production environments, firewalls, and networking
Knowledge of architectural patterns at scale, including thoughtfully designed APIs, repeatable delivery pipelines, and efficient computer engineering principles
Experience as part of an Agile engineering or development team
Strong experience in deploying, observing, alerting, logging, and monitoring systems (Splunk, Datadog) with a mindset towards predictive analysis.
Working knowledge of Ansible and Terraform.
Responsibilities:
Analyze, design, code, test, and deploy new user stories and product features with high quality (security, reliability, operations) to production. Understand the software development lifecycle and apply critical thinking to evaluate features and functionality.
Guide early-career engineers by providing learning tasks as well as work-related tasks, direct the work of emerging talent, and help them grow their technical skill set through mentorship.
Oversee application, system, and architecture design decisions and guide the team to achieve key results for the products assigned to them.
Remediate issues using engineering principles and create proactive design solutions for potential failures to ensure high reliability of technical solutions.
Achieve team commitments (and influence others to do the same) through collaboration with other engineers, architects, product owners and data scientists.
Contribute to and lead technology communities in areas of design thinking, tools/technology, agile software development, security, architecture, and/or data.
Create and enforce IT standards within the system/application infrastructure and ensure compatibility with the architecture of the platform.