Job Summary
- We are looking for a Junior Site Reliability Engineer (SRE) with strong Java coding and debugging skills to help maintain the reliability, performance, and scalability of our critical systems.
- This role involves working closely with senior engineers to monitor systems, automate processes, and enhance infrastructure reliability.
- Ideal candidates are passionate about Java, DevOps, cloud technologies, and automation, and thrive in a fast-paced environment.
Experience: 2-4 years
Key Responsibilities
System Reliability & Performance
- Monitor and maintain the availability of key services and applications.
- Define and enhance SLIs, SLOs, and SLAs for system reliability.
- Identify and resolve performance bottlenecks and inefficiencies.
Incident Management & Monitoring
- Assist in incident response, troubleshoot production issues, and conduct root cause analysis (RCA).
- Improve monitoring, logging, and alerting systems using tools like Prometheus, Grafana, and Elastic APM.
- Participate in on-call rotations for incident handling.
Java Coding & Debugging
- Write and debug Java-based applications to enhance system reliability.
- Analyze logs, troubleshoot issues, and optimize Java services.
- Gain exposure to JVM monitoring, thread dumps, and heap analysis.
- Collaborate with developers to improve the reliability of Java applications.
Automation & Infrastructure
- Work with Infrastructure as Code (IaC) and configuration tools such as Helm and Ansible.
- Optimize configurations for scalability and reliability.
- Automate operational tasks to improve efficiency.
Collaboration & Learning
- Collaborate with senior SREs and software engineers to enhance system reliability.
- Continually build knowledge of cloud computing (AWS, Azure, GCP), Kubernetes, and DevOps practices.
Skills & Qualifications
Required Skills
- Strong Java programming and debugging skills (must-have).
- Experience with Linux systems, networking, and cloud platforms (AWS, Azure, GCP).
- Familiarity with monitoring tools such as Prometheus, Grafana, or New Relic.
- Experience troubleshooting and analyzing Java application performance.
- Strong problem-solving and analytical skills.
Preferred Skills
- Scripting skills in Python, Bash, or Go for automation.
- Experience with Kubernetes and containerization.
- Familiarity with infrastructure-as-code tools like Terraform or Ansible.
Skills: Java programming, debugging, troubleshooting, Linux systems, networking, cloud platforms (AWS, Azure, GCP), monitoring tools (Prometheus, Grafana, New Relic), scripting (Python, Bash, Go), Kubernetes, containerization, infrastructure as code (Helm, Ansible, Terraform), DevOps, problem-solving, analytical capabilities