Manager, Cloud Platforms

Site Reliability Engineering

Information Technology/Infrastructure

Remote: CST or EST time zone highly preferred!

OVERVIEW

As FTD’s Cloud Platform and SRE Manager, you will champion a technological and cultural transformation toward DevOps and SRE practices, enabling efficient delivery and operation of high-quality, reliable, secure software at scale. As a hands-on leader, you will architect, engineer, optimize, and operate our Google Cloud Platform environment, including Google Kubernetes Engine; re-envision and innovate Continuous Integration & Continuous Delivery and Infrastructure as Code solutions; and incubate and proliferate Site Reliability Engineering principles and practices to ensure the stability and reliability of our commerce platforms.

KEY RESPONSIBILITIES

Provide thought leadership and strategic guidance to FTD’s technology division in cloud architectures, as well as DevOps and SRE principles and practices
Lead and develop a team of engineers engaged in Google Cloud architecture and engineering, CI/CD, Site Reliability Engineering, Kubernetes administration, and related operational support
Drive adoption of SRE principles including SLOs and SLIs, error budgets, metrics-driven observability and decision-making, automating repetitive tasks, chaos engineering, and incident and problem management processes
Collaborate with technology teams to streamline, document, and support CI/CD automation leveraging Jenkins, Bitbucket, and other tools, with an eye toward modernization and innovation
Provision and manage cloud resources and configuration using Terraform, Google Cloud SDK, kubectl, Google Cloud Console and other tools, and drive adoption of consistent provisioning practices
Promote development and security best practices and implement supporting automation, with a “shift left” mentality
Implement and maintain effective infrastructure and application observability solutions to improve visibility and streamline incident detection, response, and prevention
Troubleshoot and resolve an array of issues in CI/CD, Google Cloud Platform (GCP), Google Kubernetes Engine (GKE) and other technologies
Provide leadership in incident response and problem management activities to rapidly restore service and subsequently prevent recurrence
Perform continuous cloud cost analysis, attribution, and optimization
Maintain compliance with relevant security frameworks (e.g. SOC 2, CIS), standards (e.g. PCI-DSS) and regulations (e.g. CCPA), including participation in audits and assessments
Promote and practice agile workflows and processes within your team (Kanban preferred)
Create and maintain technical, procedural, and educational documentation and diagrams related to FTD’s network ecosystem
Embrace a culture of collaboration, enablement, customer service, continuous improvement, transparency, and financial responsibility
Perform other duties as directed

KNOWLEDGE, SKILLS AND ABILITIES

Bachelor's or advanced degree in Computer Science, Information Systems, or a related field, or equivalent experience
5+ years architecting, delivering, and operating scalable, reliable, high-performance, and secure infrastructure and applications in on-prem and cloud environments (Google Cloud Platform or similar)
2+ years managing one or more high-performing teams in close collaboration with resources in various technical disciplines
2+ years in software engineering with languages such as Java, C#, Python, JavaScript, and related frameworks, ideally in a fast-paced 24x7 e-commerce environment
Google Professional Cloud Architect or similar certification desired
Broad experience in infrastructure technologies including networking, systems engineering, databases, information security, virtualization, backup and restore, observability, etc.
Advanced experience with CI/CD methodologies and technologies, including Jenkins implementations leveraging various plug-ins and Groovy-based customizations
Proficiency with microservices principles and orchestration, including containerization (e.g. Docker) and Kubernetes (e.g. Google Kubernetes Engine)
Experience with rapid detection and troubleshooting of technical issues using various observability and Application Performance Monitoring (APM) tools
Strong experience leveraging Infrastructure as Code (e.g. Terraform) and related tools for infrastructure provisioning and configuration
Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity
Ability to effectively articulate technical concepts via oral, written, and non-verbal communication to audiences at varying levels of proficiency
Demonstrated desire and ability to be self-directed, take ownership of issues, share knowledge, and establish a prominent level of credibility
Ability to operate effectively under pressure, both independently and in collaboration with others across multiple disciplines

DIRECT REPORTS

Senior DevOps Engineer (US)
Senior Cloud Platform Engineer (India)
Cloud Platform Engineer (India)
Site Reliability Engineer (India)

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by applicable laws, regulations and ordinances.
