Site Reliability Engineer (SRE)

Weekday AI (YC W21)

Bengaluru, Karnataka, India

This role is for one of Weekday's clients.

Min Experience: 2 years

Location: Bengaluru

Job Type: Full-time

We are seeking a motivated and detail-oriented Site Reliability Engineer (SRE) with 2+ years of experience to join our growing technology team. In this role, you will play a critical part in ensuring the reliability, scalability, and performance of our systems and applications. Your primary responsibility will be maintaining and improving our infrastructure while providing application-level support to internal and external stakeholders.

You will work closely with DevOps, development, and support teams to manage production environments, streamline deployment processes, and maintain service availability. If you're passionate about automating operations, optimizing system performance, and improving incident response, we'd love to hear from you.

Requirements


Key Responsibilities:

  • Maintain and manage Linux-based production systems, ensuring high availability and optimal performance.
  • Automate deployment pipelines and manage CI/CD workflows using tools like Jenkins, GitLab CI/CD, or similar.
  • Administer Kubernetes (K8s) clusters, ensuring seamless container orchestration and deployment.
  • Provide ongoing application support, monitor performance metrics, and resolve incidents within defined SLAs.
  • Collaborate with development teams to design and implement reliable, scalable solutions.
  • Apply ITIL practices for incident, change, and problem management to improve operational efficiency.
  • Conduct root cause analysis and contribute to postmortems for production incidents.
  • Participate in a 24x7 on-call rotation and proactively monitor services to minimize downtime.
  • Continuously improve observability using monitoring and alerting tools like Prometheus, Grafana, or similar.


Key Skills and Qualifications:

  • 2+ years of hands-on experience in a Site Reliability, DevOps, or System Administration role.
  • Strong proficiency in Linux system administration, including performance tuning and troubleshooting.
  • Solid understanding of, and experience with, CI/CD pipelines and deployment tools.
  • Experience managing Kubernetes clusters in production environments.
  • Familiarity with ITIL processes and their practical application in support and incident handling.
  • Hands-on experience with monitoring tools and logging frameworks.
  • Scripting knowledge (e.g., Bash, Python) to automate routine tasks and deployments.
  • Strong analytical and problem-solving skills, with a proactive and detail-oriented mindset.
  • Excellent communication and collaboration skills to work effectively across teams.


Nice to Have:

  • Cloud platform experience (AWS, GCP, or Azure).
  • Experience with Infrastructure as Code (IaC) tools like Terraform or Ansible.
  • Exposure to databases and basic SQL querying.
  • Previous experience in a fast-paced, high-availability SaaS environment.
