5+ years of experience in SRE, DevOps, or Platform Engineering roles.
Containers & Orchestration – Kubernetes, Docker, and other containerization and orchestration tools.
DevOps Tools – CI/CD pipelines built with tools such as Jenkins and Git.
Programming & Automation – Java and Python; configuration automation with Ansible.
Monitoring & Logging – Deep understanding of observability solutions using Grafana, Prometheus, and the ELK Stack.
Databases – DBMS knowledge (preferably MySQL/PostgreSQL).
Good to Have
Kafka.
Exposure to cloud platforms.
JavaScript and RESTful web services.
Responsibilities
Automation – Develop and maintain automation scripts to streamline repetitive tasks such as provisioning infrastructure, deploying updates, scaling systems, and managing configurations.
Monitoring & Alerting – Implement robust monitoring systems that detect anomalies, performance bottlenecks, and potential failures, and trigger alerts that enable timely intervention.
System Design & Architecture – Collaborate with development teams to design reliable and maintainable system architectures, considering fault tolerance, redundancy, and disaster recovery strategies.
Performance Optimization – Analyze system metrics to identify performance bottlenecks and implement optimizations to improve system responsiveness and efficiency.
Capacity Planning – Assess system capacity needs to ensure scalability and prevent performance degradation.