CVS Health

Technical Lead, Observability Operations

Join CVS Health as a Technical Lead in Observability Operations in Connecticut. Oversee observability platforms, manage migrations, and enhance capabilities. 7+ years IT experience required. Benefits include 401(k), PTO, and education assistance.

ServiceNow Role Type: Technical Project Manager

Job description

Posted on: January 21, 2025

CVS Health is seeking a Technical Lead for Observability Operations to manage and optimize its core observability and event management platforms, migrate telemetry from legacy platforms to modern solutions, and deploy solutions that improve current observability capabilities and instrument new applications and platforms.

Requirements

  • 7+ years of experience in IT operations with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
  • 5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
  • 5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana for data visualization and monitoring)
  • Experience developing and administering ServiceNow ITOM event management solutions, ensuring seamless integration with observability tools
  • Experience deploying and managing the xMatters service reliability platform, including configuring incident notifications and incident command workflows and automating incident remediation workflows
  • Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
  • Experience with modern development tools and practices, including version control, continuous integration/continuous deployment (CI/CD), and agile methodologies
  • Proficiency in Python and other scripting and automation tools such as Ansible, PowerShell, and Bash for automation and configuration
  • Experience with and passion for deploying things “as code”
  • Hands-on experience deploying, managing, and administering observability platforms
  • Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions (e.g., full-stack APM, RUM, Session Replay, Server, Storage, Network, Database, NLB, etc.) from legacy tools to modern platforms
  • Hands-on experience performing system upgrades, patching, and integrations to ensure platform stability and security
  • Experience developing and implementing monitoring and logging standards for infrastructure, platforms, and applications
  • Experience building and instrumenting dashboards to deliver technical and business process insights leveraging standard observability/BI platforms (e.g., AppDynamics, Grafana, Tableau, PowerBI)
  • Experience establishing and implementing event correlation policies and related rules to enrich event data, increase the signal-to-noise ratio for events, and reduce MTTD and MTTR
  • Excellent problem-solving skills, with the ability to handle multiple tasks, prioritize effectively, and work under pressure
  • Strong analytical skills, with a focus on delivering actionable insights and improvements
  • Proven ability to troubleshoot and resolve complex technical issues related to observability platforms
  • Experience managing customer issues and requests, providing timely and effective solutions
  • Experience monitoring platform performance and implementing enhancements to support scalability and complexity
  • Experience leveraging telemetry data to automate performance optimization and capacity planning
  • Proficiency in scripting languages, automation tools, and data formats such as Ansible, PowerShell, Bash, Python, YAML, XML, and JSON to automate deployment, configuration, and instrumentation
  • Experience coordinating and managing release cycles for observability platforms
  • Knowledge of best practices in release management to ensure smooth and timely deployments
  • Experience configuring and leveraging source code management tools and workflows to manage and deploy Monitoring as Code
  • Excellent communication skills, both verbal and written
  • Ability to collaborate effectively with cross-functional teams and stakeholders
  • Strong interpersonal skills, with the ability to engage effectively with both technical teams and business stakeholders
  • Commitment to continuous improvement and staying current with industry trends and best practices
  • Ability to identify opportunities for process optimization and efficiency gains
  • Strong customer service orientation with the ability to manage customer relationships effectively
  • Experience in providing excellent customer service and support for observability solutions
  • Knowledge of compliance and security standards related to observability platforms
  • Ability to implement tools and processes to detect and remediate configuration drift and security risks
  • Experience managing operational data and systems access to ensure compliance with internal and external audit and regulatory requirements
  • Proficiency in maintaining comprehensive documentation of observability platform configurations, processes, and procedures
  • Ability to generate and analyze reports on platform performance, incidents, and customer requests

Benefits

  • CVS Health bonus, commission or short-term incentive program
  • 401(k) retirement savings plan
  • Employee Stock Purchase Plan
  • Fully-paid term life insurance plan
  • Short-term and long-term disability benefits
  • Paid Time Off (“PTO”) or vacation pay
  • Paid holidays throughout the calendar year
  • The number of paid holidays, sick time, and other time off is provided consistent with relevant state law and Company policies
  • Well-being programs
  • Education assistance
  • Free development courses
  • CVS store discount
  • Discount programs with participating partners

Requirements Summary

7+ years of experience in IT operations, 5+ years in a Site Reliability Engineering (SRE) role, and proficiency in Python and other scripting and automation tools such as Ansible, PowerShell, and Bash for automation and configuration.