IT Engagements is a global staff augmentation firm providing a wide range of on-demand talent and total workforce solutions. We have an immediate opening for the position below with one of our premium clients.
We're Hiring: SRE with Strong Python Automation Experience
Mode of Hiring: Full-Time Permanent. Only US citizens (USC) and green card (GC) holders are eligible; no H-1B or OPT candidates.
Note: This role is not open for remote work.
Are you a highly skilled Automation Engineer with a strong systems engineering background? Do you thrive in fast-paced environments and enjoy solving complex challenges with innovative automation solutions? If so, we want to hear from you!
Key Responsibilities
Develop Python-based automation solutions to streamline on-prem and cloud infrastructure management (GCP and Kubernetes).
Continuously identify and implement opportunities to enhance operational excellence.
Build proactive and scalable solutions to improve reliability.
Implement and manage configuration automation using Ansible (desirable).
Integrate tools and services via APIs for seamless interoperability.
Enhance deployment reliability with automated chaos engineering strategies, failover mechanisms, and self-healing infrastructure.
Develop proactive monitoring solutions using Splunk, GCP Operations Suite, Grafana, and Prometheus.
Perform deep root cause analysis (RCA) and incident management for complex system failures.
Work on system resilience and performance tuning to ensure mission-critical applications run efficiently.
Apply AI/ML techniques to automation workflows for anomaly detection, predictive scaling, and intelligent alerting.
Required Skills & Experience
Strong background in Systems Engineering with a focus on automation and reliability.
Proficiency in Python (intermediate to expert level) for automation development.
Hands-on expertise with Kubernetes and cloud platforms (GCP or other major cloud platforms).
Experience integrating tools and platforms via APIs.
Expertise in monitoring tools like Splunk, GCP Operations Suite, Grafana, and Prometheus.
Strong problem-solving skills and ability to excel in high-stakes environments.
Experience with Ansible for infrastructure automation (desirable).
Prior experience in mission-critical teams managing large-scale, high-availability systems is a plus.
Enthusiasm for AI/ML and AIOps to enhance automation workflows.