IT - Senior Observability Engineer
Software Guidance & Assistance, Inc. (SGA) is searching for an IT - Senior Observability Engineer for a contract assignment with one of our premier Insurance clients in Holmdel, NJ or Bethlehem, PA. Hybrid schedule: 2 days in Holmdel (Wed & Thurs required) or 3 days in Bethlehem.
We are seeking a dedicated and detail-oriented Senior Observability Engineer to join our Enterprise Observability Engineering team. The ideal candidate will bring deep expertise in AppDynamics, Splunk, OpenTelemetry, and AWS-native services, along with strong DevOps experience.
This role is responsible for the administration, configuration, implementation, and ongoing optimization of observability platforms that enable end-to-end visibility across applications, infrastructure, and cloud-native workloads. You will play a critical role in ensuring platform reliability, performance, and actionable insights to support engineering and business teams.
Responsibilities:
- Observability Platform Administration & Implementation
- Administer, configure, and support AppDynamics, Splunk, and OpenTelemetry (OTel) platforms to meet enterprise monitoring and observability needs.
- Design and implement observability solutions aligned to MELT (Metrics, Events, Logs, Traces) best practices.
- Perform regular upgrades, patching, and security hardening of observability platforms.
- Monitoring, Reliability & Maintenance
- Continuously monitor the health, availability, and performance of observability platforms.
- Ensure data integrity, retention, and availability across metrics, logs, and traces.
- Proactively identify and remediate platform performance, scalability, and reliability issues.
- Cloud & Full-Stack Observability
- Implement and support observability for AWS services, including:
- EKS, ECS, Lambda Functions
- SNS/SQS, S3, CloudWatch
- Deliver full-stack observability, including:
- Kubernetes cluster and workload metrics
- Service discovery, events, and application performance data
- Leverage OpenTelemetry for instrumentation, context propagation, collectors, and sampling strategies.
- Dashboard, Alerting & Reporting
- Create and maintain dashboards, reports, and alerts in AppDynamics and Splunk.
- Collaborate with application, platform, and DevOps teams to define meaningful monitoring and alerting standards.
- Reduce noise through alert tuning and promote actionable signal over raw data.
- DevOps & Automation
- Integrate observability into CI/CD pipelines using GitHub, Jenkins, ArgoCD, and automation frameworks.
- Develop scripts and automation using Python, JavaScript, or Bash to streamline onboarding, configuration, and maintenance activities.
- User Support, Enablement & Training
- Provide tier-2/3 support for observability-related issues.
- Assist internal teams with troubleshooting, root cause analysis, and performance investigations.
- Develop and deliver training materials and knowledge sessions to improve tool adoption and effective usage.
- Documentation & Best Practices
- Maintain documentation for:
- Platform configurations
- Onboarding procedures
- Operational runbooks and standards
- Define and enforce observability best practices across the organization.
- Incident Response & Collaboration
- Partner with IT, SRE, and DevOps teams to ensure comprehensive monitoring coverage.
- Participate in incident response efforts, leveraging observability data to accelerate detection, diagnosis, and resolution.
Required Skills:
- 5-7+ years of experience in Observability, Monitoring, SRE, or Platform Engineering roles.
- Proven hands-on experience implementing, managing, and maintaining AppDynamics, Splunk, and OpenTelemetry in enterprise environments.
- Technical Skills:
- AppDynamics (APM, dashboards, alerts)
- Splunk (configuration, administration, data onboarding)
- OpenTelemetry (instrumentation, collectors, sampling)
- AWS services: EKS, ECS, Lambda, SNS/SQS, S3, CloudWatch
- GitHub, Jenkins, ArgoCD
- CI/CD pipelines and GitOps practices
- Strong expertise in Metrics, Events, Logs, and Traces (MELT)
- Full-stack and cloud-native observability
- Automation & Scripting: Python, JavaScript, Bash
- Strong understanding of IT infrastructure, applications, and networking
- Excellent problem-solving and analytical skills
- Strong communication and collaboration abilities
- Ability to work independently as well as in a team-oriented environment
- Detail-oriented with a strong focus on operational excellence
- Bachelor's degree in Computer Science, Information Technology, or a related field
When we say let's work better together, we mean it. You'll join a diverse team built on these core values: customer service, employee development, and quality and integrity in everything we do. Be yourself, love what you do and find your passion at work.
SGA is an Equal Opportunity Employer and does not discriminate on the basis of Race, Color, Sex, Sexual Orientation, Gender Identity, Religion, National Origin, Disability, Veteran Status, Age, Marital Status, Pregnancy, Genetic Information, or Other Legally Protected Status. We are committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities in employment, and our services, programs, and activities. Please visit our company to request an accommodation or assistance regarding our policy.