We are looking for a Lead Site Reliability Engineer (SRE) to ensure the reliability of critical applications and, in doing so, help patients. You will set up monitoring tools, create proactive alerting systems, and analyze incident data. You will also lead design reviews and develop robust scripts and automation tools.
Requirements
- extensive observability background
- OpenTelemetry
- AWS knowledge
- Kubernetes
- DevOps practices
- monitoring tools (Splunk, AppDynamics, Datadog)
- scripting (Python)
- L1 & L2 support
- incident management
- ITIL
- documentation skills
- disaster recovery
- business continuity planning
- creating ServiceNow dashboards
- Linux
- shell scripting
- ITIL Foundation certification
- AWS or other cloud certification(s)
- SQL
- Postman
- MuleSoft
- networking concepts
Benefits
- company bonus
- comprehensive benefit program
- 401(k)
- pension
- vacation benefits
- medical, dental, vision and prescription drug benefits
- flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts)
- life insurance and death benefits
- time off and leave of absence benefits
- well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities)