CVS Health

Staff Observability Operations Engineer

Join CVS Health in CT as a Staff Observability Operations Engineer and oversee its observability platforms. Requires 7+ years in IT operations and SRE. Great benefits!

ServiceNow Role Type:
Application Developer
ServiceNow Modules:
IT Operations Management
Incident Management
DevOps
ServiceNow Certifications (nice to have):
Certified Implementation Specialist - Event Management
Certified Implementation Specialist - IT Service Management
Certified Implementation Specialist - Security Incident Response

Job description

Posted on: February 7, 2025

CVS Health is seeking a Staff Observability Operations Engineer to oversee and optimize its observability platform and keep it running smoothly and efficiently. The successful candidate will have a strong background in Site Reliability Engineering (SRE), modern observability practices, and implementing and managing observability and event management platforms.

Requirements

  • 7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications.
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions.
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana).
  • Experience developing and administering ServiceNow ITOM event management solutions, ensuring seamless integration with observability tools.
  • Experience deploying and managing service reliability platforms (e.g., xMatters, Opsgenie, PagerDuty), including configuring incident notifications and incident command workflows and automating incident remediation.
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift).
  • Proficiency in Python and other scripting languages such as PowerShell and Bash, plus automation tools such as Ansible, for automation and configuration. Experience with and passion for managing infrastructure and configuration “as code”.

Benefits

  • Medical, dental, and vision benefits
  • 401(k) retirement savings plan
  • Employee Stock Purchase Plan
  • Fully paid term life insurance plan
  • Short-term and long-term disability benefits
  • Well-being programs
  • Education assistance
  • Free development courses
  • CVS store discount
  • Discount programs with participating partners
  • Paid Time Off (“PTO”) or vacation pay
  • Paid holidays throughout the calendar year

Requirements Summary

7+ years of experience in IT operations, 5+ years in a Site Reliability Engineering (SRE) role, 5+ years managing and implementing observability and event management platforms