CVS Health

Staff Observability Operations Engineer

Join CVS Health as a Staff Observability Operations Engineer in Denver, CO. Leverage ServiceNow ITOM for observability solutions, manage platforms, and enhance system performance. Benefits include medical plans, 401(k), and wellness programs.

ServiceNow Role Type:
ServiceNow Modules:
Event Management
IT Operations Management
IT Service Management
Integration Hub
ServiceNow Certifications (nice to have):
Certified Implementation Specialist - Event Management

Job description

Posted on: March 31, 2025

We are seeking experienced Staff Observability Operations Engineers to oversee and optimize our observability platform. Responsibilities include deploying observability solutions, managing and administering observability and event management platforms, handling release management, system upgrades, patching, integrations, and troubleshooting incidents.

Requirements

  • 7+ years of experience in IT operations with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
  • Experience developing and administering ServiceNow ITOM event management solutions, ensuring seamless integration with observability tools
  • Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty), configuring incident notifications, incident command workflows, and automating incident remediation workflows
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Proficiency in Python and other scripting and automation tools such as Ansible, PowerShell, and Bash for automation and configuration
  • Experience with and passion for deploying infrastructure, configuration, and monitoring "as code"
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions (e.g., full-stack APM, RUM, Session Replay, Server, Storage, Network, Database, NLB, etc.) from legacy tools to modern platforms
  • Hands-on experience performing system upgrades, patching, and integrations to ensure platform stability and security
  • Experience developing and implementing monitoring and logging standards for infrastructure, platforms, and applications
  • Experience building and instrumenting dashboards to deliver technical and business process insights leveraging standard observability/BI platforms (e.g., AppDynamics, Grafana, Tableau, PowerBI)
  • Experience establishing and implementing event correlation policies and related rules to enrich event data, increase the signal-to-noise ratio of events, and reduce MTTD and MTTR
  • Excellent problem-solving skills, with the ability to handle multiple tasks, prioritize effectively, and work under pressure
  • Proven ability to troubleshoot and resolve complex technical issues related to observability platforms
  • Experience managing customer issues and requests, providing timely and effective solutions
  • Experience monitoring platform performance and implementing enhancements to support scalability and complexity
  • Experience leveraging telemetry data to automate performance optimization and capacity planning
  • Proficiency in scripting languages, automation tools, and data formats such as Ansible, PowerShell, Bash, Python, YAML, XML, and JSON to automate deployment, configuration, and instrumentation
  • Experience coordinating and managing release cycles for observability platforms
  • Knowledge of best practices in release management to ensure smooth and timely deployments
  • Experience configuring and leveraging source code management tools and workflows to manage and deploy Monitoring as Code
  • Excellent communication skills, both verbal and written
  • Ability to collaborate effectively with cross-functional teams and stakeholders
  • Strong interpersonal skills, with the ability to engage effectively with both technical teams and business stakeholders
  • Commitment to continuous improvement and staying current with industry trends and best practices
  • Ability to identify opportunities for process optimization and efficiency gains
  • Strong customer service orientation with the ability to manage customer relationships effectively
  • Experience in providing excellent customer service and support for observability solutions
  • Knowledge of compliance and security standards related to observability platforms
  • Ability to implement tools and processes to detect and remediate configuration drift and security risks
  • Experience managing operational data and systems access to ensure compliance with internal and external audit and regulatory requirements
  • Proficiency maintaining comprehensive documentation of observability platform configurations, processes, and procedures
  • Ability to generate and analyze reports on platform performance, incidents, and customer requests

Benefits

  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching
  • Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility

Requirements Summary

7+ years of experience in IT operations, 5+ years in SRE, 5+ years managing observability platforms, and experience with cloud environments, container orchestration tools, and scripting languages