The Lead Site Reliability Engineer will mentor a growing SRE team, lead efforts to update production with new versions and infrastructure, and ensure the highest level of uptime to meet customer SLAs. The role requires a highly skilled technology professional with excellent communication skills, a strategic mindset, and strong analytical and troubleshooting skills on the AWS cloud platform.
Requirements
- Bachelor's or Master's degree in a Computer Science discipline
- 5+ years of experience in Site Reliability Engineering or a related position on the AWS cloud platform
- At least 2 AWS certifications (SysOps Administrator and Solutions Architect certifications preferred)
- Deep experience with AWS, Docker, Kubernetes, CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS, Amazon FSx, Elasticsearch, and networking concepts
- Ability to explain technical concepts in clear, non-technical language
- Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Knowledge of security and compliance standards such as SOC/PCI is a plus
- Ability to program at a high level in at least one language, such as Java, C#, JavaScript, Python, or Ruby
- Integration experience with PagerDuty, ServiceNow, Datadog, and CloudWatch
Benefits
- Flexible vacation
- Two company-wide Mental Health Days off
- Access to the Headspace app
- Retirement savings
- Tuition reimbursement
- Employee incentive programs
- Resources for mental, physical, and financial wellbeing
- Work from anywhere for up to 8 weeks per year
- Hybrid work model
- Paid volunteer days off annually
- Opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives