Lead Site Reliability Engineer

Judge Group, Inc. | Philadelphia | 5/18/26

Location: Philadelphia, PA
Salary: $150,000.00 USD Annually - $180,000.00 USD Annually
Description:

We are seeking a Lead Site Reliability Engineer (SRE) who combines deep technical expertise with strong leadership and client-facing capabilities. This is a high-impact role responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and kiosk platform.

You will lead a team of engineers while remaining hands-on, owning uptime, SLAs, and incident management, and driving long-term improvements in system resilience and operational maturity. This role also requires working closely with Fortune 500 clients, translating complex technical concepts into clear, business-friendly insights.

What Makes This Role Unique

This is a rare opportunity for a hybrid leader who can:

Operate as a hands-on SRE expert
Lead and mentor a team of engineers
Act as a client-facing technical advisor
Drive both real-time operations and long-term reliability strategy

Key Responsibilities:

Reliability & Operations

Own platform uptime, SLAs, and overall system reliability
Lead incident response, root cause analysis, and postmortems
Develop and maintain disaster recovery and business continuity plans

Infrastructure & Automation

Design, build, and optimize cloud infrastructure and Kubernetes environments
Automate deployments and operational tasks using CI/CD and Infrastructure-as-Code (Terraform preferred)
Improve system scalability, performance, and resilience

Observability & Monitoring

Implement and enhance monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, New Relic)
Establish operational standards, runbooks, and best practices

Leadership & Collaboration

Lead, mentor, and develop a team of ~6 engineers
Partner with platform engineering, QA, and development teams to ensure operational readiness
Serve as a technical point of contact for clients, clearly communicating system health, risks, and solutions

Required Qualifications:

8+ years of experience in SRE, DevOps, or Platform Engineering
2+ years in a lead or managerial role
Strong expertise in:
  Cloud infrastructure (AWS, Azure, or Google Cloud Platform)
  Kubernetes and containerized environments
  CI/CD pipelines and release engineering
  Infrastructure-as-Code (Terraform preferred)
Proficiency in scripting/automation (Python, Bash, or Go)
Deep understanding of observability, monitoring, and logging systems
Experience with GitOps workflows (e.g., ArgoCD)
Proven experience managing production systems with strict uptime requirements

Preferred Experience:

Client-facing experience in enterprise or SaaS environments (required)
Experience communicating with non-technical stakeholders and Fortune 500 clients
Background in high-availability systems and large-scale distributed environments

What We're Looking For:

A hands-on technical leader who can balance execution and strategy
Strong communicator with executive presence
Someone who thrives in high-ownership, fast-paced environments
A mentor who can elevate team performance and operational excellence

By providing your phone number, you consent to: (1) receive automated text messages and calls from the Judge Group, Inc. and its affiliates (collectively "Judge") to such phone number regarding job opportunities, your job application, and for other related purposes.

Message & data rates apply and message frequency may vary. Consistent with Judge's Privacy Policy, information obtained from your consent will not be shared with third parties for marketing/promotional purposes. Reply STOP to opt out of receiving telephone calls and text messages from Judge and HELP for help.

Contact:

This job and many more are available through The Judge Group. Please apply with us today!
