We are seeking a motivated and experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong background in application performance monitoring, logging and tracing, and web performance optimization. You will play a crucial role in ensuring system reliability, scalability, and performance across our applications.
Key Responsibilities:
- Application Performance Monitoring (APM): Utilize tools such as Splunk Cloud, New Relic, Dynatrace, and AppDynamics to monitor application performance and ensure optimal user experience.
- Logging & Tracing: Implement and manage logging and tracing solutions using OpenTelemetry, AWS X-Ray, and Fluentd for effective performance tracking.
- Performance Debugging: Identify and resolve performance bottlenecks in React and Node.js applications through memory leak detection, CPU profiling, and load testing.
- Database Optimization: Optimize MongoDB and DynamoDB performance by implementing indexing, caching strategies, and TTL management.
- Adobe Experience Manager (AEM) Optimization: Fine-tune AEM setups by improving caching mechanisms and dispatcher performance.
- API Gateway & Serverless Architecture: Manage AWS API Gateway, Lambda, EventBridge, and other serverless services to ensure high availability and fault tolerance.
- Circuit Breaker & Fault Tolerance: Design and implement resilience patterns, including circuit breakers and retries with backoff, to enhance system reliability.
- Monitoring & Incident Management: Use Sentry and LogRocket for frontend and backend error monitoring, define and track SLIs and SLOs, manage incident response, and conduct postmortems.
- Web Performance Optimization: Enhance web performance through CDN implementation, lazy loading, and caching strategies.
Qualifications:
Education: Bachelor’s degree in Computer Science or a related field.
Experience:
- Proven experience in an SRE or related role, with a focus on application performance monitoring and system reliability.
- Familiarity with cloud platforms (AWS preferred) and serverless architectures.
- Experience with Splunk Cloud is required.
Skills:
- Strong analytical and problem-solving skills.
- Proficiency in scripting languages (e.g., Python or Bash) and performance debugging tools.
- Excellent communication and collaboration skills to work effectively in a cross-functional team.
Preferred Qualifications:
- Experience with frontend technologies such as React.
- Knowledge of APM tools and performance tuning techniques.
- Familiarity with incident management processes and tools.