CVS Health

Staff Observability Operations Engineer

Join CVS Health in Connecticut as a Staff Observability Operations Engineer. The role oversees observability platforms and requires 7+ years of experience in IT operations and SRE. Benefits include medical, 401(k), and education assistance.

Direct Hire

Senior

ServiceNow Role Type:

Application Developer

ServiceNow Modules:

IT Operations Management

DevOps

Incident Management

Customer Service Management

ServiceNow Certifications (nice to have):

Certified Implementation Specialist - Event Management

Certified Implementation Specialist - IT Service Management

Certified Implementation Specialist - Platform Analytics

Job description

Posted on:

February 7, 2025

We are seeking a Staff Observability Operations Engineer to oversee and optimize our observability platform, ensuring seamless and efficient operations. The ideal candidate will have a strong background in Site Reliability Engineering (SRE), modern observability practices, and the implementation and management of observability and event management platforms.

Requirements

7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications.
5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions.
5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana).
Experience developing and administering ServiceNow ITOM event management solutions, ensuring seamless integration with observability tools.
Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty), configuring incident notifications, incident command workflows, and automating incident remediation workflows.
Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift).
Proficiency in Python and in scripting and automation tools such as Ansible, PowerShell, and Bash for automation and configuration. Experience with, and passion for, deploying things “as code”.
Hands-on experience deploying, managing, and administering observability platforms.
Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions (e.g., full-stack APM, RUM, Session Replay, Server, Storage, Network, Database, NLB, etc.) from legacy tools to modern platforms.
Hands-on experience performing system upgrades, patching, and integrations to ensure platform stability and security.
Experience developing and implementing monitoring and logging standards for infrastructure, platforms, and applications.
Experience building and instrumenting dashboards to deliver technical and business process insights leveraging standard observability/BI platforms (e.g., AppDynamics, Grafana, Tableau, PowerBI).
Experience establishing and implementing event correlation policies and related rules to enrich event data, improve the signal-to-noise ratio of events, and reduce MTTD and MTTR.
Excellent problem-solving skills, with the ability to handle multiple tasks, prioritize effectively, and work under pressure.
Proven ability to troubleshoot and resolve complex technical issues related to observability platforms.
Experience managing customer issues and requests, providing timely and effective solutions.
Experience monitoring platform performance and implementing enhancements to support scalability and complexity.
Experience leveraging telemetry data to automate performance optimization and capacity planning.
Proficiency in scripting languages and automation tools (e.g., Ansible, PowerShell, Bash, Python) and in data formats (e.g., YAML, XML, JSON) to automate deployment, configuration, and instrumentation.
Experience coordinating and managing release cycles for observability platforms.
Knowledge of best practices in release management to ensure smooth and timely deployments.
Experience configuring and leveraging source code management tools and workflows to manage and deploy Monitoring as Code.
Excellent communication skills, both verbal and written.
Ability to collaborate effectively with cross-functional teams and stakeholders.
Strong interpersonal skills, with the ability to engage effectively with both technical teams and business stakeholders.
Commitment to continuous improvement and staying current with industry trends and best practices.
Ability to identify opportunities for process optimization and efficiency gains.
Strong customer service orientation with the ability to manage customer relationships effectively.
Experience in providing excellent customer service and support for observability solutions.
Knowledge of compliance and security standards related to observability platforms.
Ability to implement tools and processes to detect and remediate configuration drift and security risks.
Experience managing operational data and systems access to ensure compliance with internal and external audit and regulatory requirements.
Proficiency in maintaining comprehensive documentation of observability platform configurations, processes, and procedures.
Ability to generate and analyze reports on platform performance, incidents, and customer requests.

Benefits

Medical, dental, and vision benefits
401(k) retirement savings plan
Employee Stock Purchase Plan
Fully paid term life insurance plan
Short-term and long-term disability benefits
Well-being programs
Education assistance
Free development courses
CVS store discount
Discount programs with participating partners
Paid Time Off (PTO) or vacation pay
Paid holidays throughout the calendar year

Requirements Summary

7+ years of experience in IT operations, Site Reliability Engineering (SRE), and observability platform management. Proficiency in Python and other scripting languages.
