Homecare Homebase is hiring a Platform Reliability Engineer to work in a hybrid environment, combining software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. The ideal candidate will have 3+ years of experience in a 24x7 production enterprise-class environment as an SRE or comparable role, with expertise in coding, complexity analysis, troubleshooting, and large-scale modern system design.
Requirements
- Practice sustainable incident response and blameless postmortems.
- Operationalize services, including system testing, instrumentation, monitoring, capacity model development, training, and transition to operations teams.
- Write engineering-level documentation and develop operationally excellent standard operating procedures and runbooks, with a bias toward automation.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Apply platform engineering and automation to maintain the scale and reliability of systems.
- Manage deployments of major releases.
Qualifications
- Strong written and verbal interpersonal skills.
- Excellent problem-solving and analytical skills, with attention to detail and a drive to see issues through to resolution.
- Experience solving problems via automation using orchestration platforms such as JAMS, Ansible, Azure Automation, and ServiceNow Flows.