- Company Name
- Optum
- Job Title
- Senior Site Reliability Engineer
- Job Description
-
**Role Summary**
Lead and manage site reliability engineering initiatives while driving reliability, resiliency, and observability improvements across cloud‑based services. Coordinate technical projects, lead agile ceremonies, and report progress to executive stakeholders, ensuring operational excellence and continuous improvement.
**Expectations**
- Deliver reliable, high‑availability services through proactive monitoring, incident response, and problem management.
- Manage multiple concurrent technical projects, driving them from planning through execution, including sprint planning, retrospectives, and status reporting.
- Communicate technical findings and project updates clearly to technical teams and non‑technical executives.
- Champion automation, security, and observability best practices to reduce toil and enhance system performance.
**Key Responsibilities**
1. **Project Management & Coordination**
- Own end‑to‑end delivery of SRE projects, managing schedules, deliverables, and risk mitigation.
- Facilitate agile ceremonies (sprint planning, stand‑ups, retrospectives) and ensure alignment across cross‑functional teams.
2. **Communication & Reporting**
- Prepare executive‑level presentations and status reports.
- Translate technical concepts for non‑technical audiences.
3. **SRE & Reliability**
- Design, implement, and iterate reliability solutions (SLIs/SLOs, chaos engineering, capacity planning).
- Lead incident response and post‑mortem analysis; drive continuous improvement.
- Optimize monitoring, alerting, and observability across services.
4. **Observability & Tooling**
- Adopt and integrate observability platforms such as Splunk, Dynatrace, and Grafana, along with other instrumentation tools.
- Recommend and implement monitoring best practices.
5. **On‑Call & Rotation**
- Participate in scheduled on‑call rotations, ensuring 24/7 availability and rapid incident resolution.
6. **Process & Security Improvement**
- Drive process enhancements for CI/CD, IaC, and security vulnerability remediation.
**Required Skills**
- Proven project management experience in agile or hybrid environments (3+ years).
- Strong written and verbal communication; experience presenting to executives.
- Technical background in SRE, cloud operations, or a related engineering role (5+ years).
- Deep understanding of cloud platforms: AWS, Azure, or GCP.
- Familiarity with infrastructure‑as‑code tools (Terraform, CloudFormation, ARM templates).
- Experience with observability tools: Splunk, Dynatrace, Grafana, or equivalents.
- Knowledge of security vulnerability management and remediation.
- Strong organizational and multi‑tasking abilities.
**Required Education & Experience**
- High school diploma, GED, or higher.
- 5+ years of technology experience.
**Preferred Certifications**
- AWS Certified SysOps Administrator or equivalent AWS credential.
- Google Cloud Professional Cloud Architect or AWS Certified Solutions Architect.
- Splunk Certified Architect or similar.
- Other relevant cloud or observability certifications.