Site Reliability Engineer

ITC Infotech

Bengaluru, Karnataka, India

Location: Bangalore / Remote


Mandatory Skills: Strong experience in SRE, production support, incident management, AWS, Python, and shell scripting


Job Description:

What you'll do:

  • Contribute to all aspects of the production environment for all merchant loyalty use cases
  • Contribute to strategies for all facets of observability
  • Identify areas of improvement in production
  • Understand MTTR, SLO, and SLI definitions and apply them to services
  • Respond to incidents and own/drive the incident manager role during active CIs
  • Keep mitigation/resolution efforts on task by asking for updates and contributing data/investigation (when appropriate)
  • Provide progress summaries and comms suggestions to Support within SLAs to enable effective customer comms during CIs
  • Contribute to reliable, fault-tolerant, efficiently scalable and cost-effective services and infrastructure
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Practice sustainable incident response and blameless postmortems
  • Create and execute queries against big data platforms and relational data tables to identify process issues or perform mass updates (preferred)
  • Isolate problems between hardware and software
  • Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
  • Execute sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity
  • Work with a global team spread across tech hubs in multiple geographies and time zones

What experience you need:

  • Experience in Splunk and SignalFx
  • Experience with Amazon Web Services including RDS
  • Relevant data DevOps, SRE, or general systems engineering experience
  • Experience in managing large production platforms
  • Experience architecting and implementing data governance processes and tooling (data catalogs, lineage tools, role-based access control, PII handling)
  • Strong coding ability in Python or other languages such as Java, C#, Golang, C, C++, Perl or Ruby
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • Ability to help debug and optimize code and automate routine tasks
  • Ability to support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems
  • Appetite for change and pushing the boundaries of what can be done with automation
  • Experience working across development, operations, and product teams to prioritize needs and build relationships is a must
  • Good handle on change management and release management aspects of software
