The Lead Site Reliability Engineer will be hands-on and will mentor other team members on core SRE principles and tools. They will participate in all end-to-end operational aspects of the production environment and work closely with architects and with the DevOps, product, and development teams.
Requirements
- Skilled in cloud operations and administration on Amazon Web Services (AWS).
- Bachelor’s or Master’s degree in a Computer Science discipline.
- 5+ years’ experience in Site Reliability Engineering or a related position on the AWS cloud platform.
- At least two AWS certifications are required (AWS SysOps Administrator and AWS Architect certifications preferred).
- Experience working with SQL, Windows Server, load balancers, and Linux.
- Deep experience with AWS, Docker, Kubernetes, CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS, Amazon FSx, Elasticsearch, and networking concepts is required.
- Ability to program at a high level in at least one language, such as Java, C#, JavaScript, Python, or Ruby.
- Experience with integrations involving PagerDuty, ServiceNow, Datadog, and CloudWatch.
- Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms, and tools, as well as SLO management, incident resolution, and automation.
Benefits
- Flexible vacation
- Two company-wide Mental Health Days off
- Access to the Headspace app
- Retirement savings
- Tuition reimbursement
- Employee incentive programs
- Resources for mental, physical, and financial wellbeing
- Flexible work arrangements, including a hybrid model and the option to work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance
- Two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives