As FTD’s Manager, Cloud Platforms, you will champion a technological and cultural transformation toward DevOps and SRE practices, enabling efficient delivery and operation of high-quality, reliable, secure software at scale. As a hands-on leader, you will architect, engineer, optimize, and operate our Google Cloud Platform environment, including Google Kubernetes Engine; re-envision and innovate Continuous Integration & Continuous Delivery and Infrastructure as Code solutions; and incubate and proliferate Site Reliability Engineering principles and practices to ensure the stability and reliability of our commerce platforms.
KEY RESPONSIBILITIES
Provide thought leadership and strategic guidance to FTD’s technology division in cloud architectures, as well as DevOps and SRE principles and practices
Lead and develop a team of engineers engaged in Google Cloud architecture and engineering, CI/CD, Site Reliability Engineering, Kubernetes administration, and related operational support
Drive adoption of SRE principles including SLOs and SLIs, error budgets, metrics-driven observability and decision-making, automation of repetitive tasks, chaos engineering, and incident and problem management processes
Collaborate with technology teams to streamline, document, and support CI/CD automation leveraging Jenkins, Bitbucket, and other tools, with an eye toward modernization and innovation
Provision and manage cloud resources and configuration using Terraform, Google Cloud SDK, kubectl, Google Cloud Console, and other tools, and drive adoption of consistent provisioning practices
Promote development and security best practices and implement supporting automation, with a “shift left” mentality
Implement and maintain effective infrastructure and application observability solutions to improve visibility and streamline incident detection, response, and prevention
Troubleshoot and resolve an array of issues in CI/CD, Google Cloud Platform (GCP), Google Kubernetes Engine (GKE), and other technologies
Provide leadership in incident response and problem management activities to rapidly restore service and subsequently prevent recurrence
Perform continuous cloud cost analysis, attribution, and optimization
Maintain compliance with relevant security frameworks (e.g. SOC 2, CIS), standards (e.g. PCI DSS), and regulations (e.g. CCPA), including participation in audits and assessments
Promote and practice agile workflows and processes within your team (Kanban preferred)
Create and maintain technical, procedural, and educational documentation and diagrams related to FTD’s network ecosystem
Embrace a culture of collaboration, enablement, customer service, continuous improvement, transparency, and financial responsibility
Perform other duties as directed
KNOWLEDGE, SKILLS AND ABILITIES
Bachelor's or advanced degree in Computer Science, Information Systems, or a related field, or equivalent experience
5+ years architecting, delivering, and operating scalable, reliable, high-performance, and secure infrastructure and applications in on-prem and cloud environments (Google Cloud Platform or similar)
2+ years managing one or more high-performing teams in close collaboration with colleagues in various technical disciplines
2+ years in software engineering with languages such as Java, C#, Python, and JavaScript, and related frameworks, ideally in a fast-paced 24x7 e-commerce environment
Google Professional Cloud Architect or similar certification desired
Broad experience in infrastructure technologies including networking, systems engineering, databases, information security, virtualization, backup and restore, observability, etc.
Advanced experience with CI/CD methodologies and technologies, including Jenkins implementations leveraging various plug-ins and Groovy-based customizations
Proficiency with microservices principles and orchestration, including containerization (e.g. Docker) and Kubernetes (e.g. Google Kubernetes Engine)
Experience with rapid detection and troubleshooting of technical issues using various observability and Application Performance Monitoring (APM) tools
Strong experience leveraging Infrastructure as Code (e.g. Terraform) and related tools for infrastructure provisioning and configuration
Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity
Ability to effectively articulate technical concepts via oral, written, and non-verbal communication to audiences at varying levels of proficiency
Demonstrated desire and ability to be self-directed, take ownership of issues, share knowledge, and establish a high level of credibility
Ability to operate effectively under pressure, both independently and in collaboration with others across multiple disciplines