
Cloud Platform and SRE Manager

FTD

Decatur, IL

Manager, Cloud Platforms


Information Technology/Infrastructure


 


OVERVIEW


As FTD’s Manager, Cloud Platforms, you will champion a technological and cultural transformation toward DevOps and SRE practices, enabling efficient delivery and operation of high-quality, reliable, secure software at scale. As a hands-on leader, you will architect, engineer, optimize, and operate our Google Cloud platform environment, including Google Kubernetes Engine; re-envision and innovate Continuous Integration & Continuous Delivery and Infrastructure as Code solutions; and incubate and proliferate Site Reliability Engineering principles and practices to ensure the stability and reliability of our commerce platforms.


 


KEY RESPONSIBILITIES



  • Provide thought leadership and strategic guidance to FTD’s technology division in cloud architectures, as well as DevOps and SRE principles and practices

  • Lead and develop a team of engineers engaged in Google Cloud architecture and engineering, CI/CD, Site Reliability Engineering, Kubernetes administration, and related operational support

  • Drive adoption of SRE principles including SLOs and SLIs, error budgets, metrics-driven observability and decision-making, automating repetitive tasks, chaos engineering, and incident and problem management processes

  • Collaborate with technology teams to streamline, document, and support CI/CD automation leveraging Jenkins, Bitbucket, and other tools, with an eye toward modernization and innovation

  • Provision and manage cloud resources and configuration using Terraform, Google Cloud SDK, kubectl, Google Cloud Console, and other tools, and drive adoption of consistent provisioning practices

  • Promote development and security best practices and implement supporting automation, with a “shift left” mentality

  • Implement and maintain effective infrastructure and application observability solutions to improve visibility and streamline incident detection, response, and prevention

  • Troubleshoot and resolve an array of issues in CI/CD, Google Cloud Platform (GCP), Google Kubernetes Engine (GKE), and other technologies

  • Provide leadership in incident response and problem management activities to rapidly restore service and subsequently prevent recurrence

  • Perform continuous cloud cost analysis, attribution, and optimization

  • Maintain compliance with relevant security frameworks (e.g. SOC 2, CIS), standards (e.g. PCI-DSS), and regulations (e.g. CCPA), including participation in audits and assessments

  • Promote and practice agile workflows and processes within your team (Kanban preferred)

  • Create and maintain technical, procedural, and educational documentation and diagrams related to FTD’s network ecosystem

  • Embrace a culture of collaboration, enablement, customer service, continuous improvement, transparency, and financial responsibility

  • Perform other duties as directed


KNOWLEDGE, SKILLS AND ABILITIES



  • Bachelor's or advanced degree in Computer Science, Information Systems, or a related field, or equivalent experience

  • 5+ years architecting, delivering, and operating scalable, reliable, high-performance, and secure infrastructure and applications in on-prem and cloud environments (Google Cloud Platform or similar)

  • 2+ years managing one or more high-performing teams in close collaboration with resources in various technical disciplines

  • 2+ years in software engineering with languages such as Java, C#, Python, JavaScript, and related frameworks, ideally in a fast-paced 24x7 e-commerce environment

  • Google Professional Cloud Architect or similar certification desired

  • Broad experience in infrastructure technologies including networking, systems engineering, databases, information security, virtualization, backup and restore, observability, etc.

  • Advanced experience with CI/CD methodologies and technologies, including Jenkins implementations leveraging various plug-ins and Groovy-based customizations

  • Proficiency with microservices principles and orchestration, including containerization (e.g. Docker) and Kubernetes (e.g. Google Kubernetes Engine)

  • Experience with rapid detection and troubleshooting of technical issues using various observability and Application Performance Monitoring (APM) tools

  • Strong experience leveraging Infrastructure as Code (e.g. Terraform) and related tools for infrastructure provisioning and configuration

  • Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity

  • Ability to effectively articulate technical concepts via oral, written, and other forms of communication to audiences at varying levels of proficiency

  • Demonstrated desire and ability to be self-directed, take ownership of issues, share knowledge, and establish a prominent level of credibility

  • Ability to operate effectively under pressure, both independently and in collaboration with others across multiple disciplines


DIRECT REPORTS



  • Senior DevOps Engineer (US)

  • Senior Cloud Platform Engineer (India)

  • Cloud Platform Engineer (India)

  • Site Reliability Engineer (India)
