- Company Name
- Swoon
- Job Title
- System Engineer
- Job Description
**Job Title**
Systems Reliability Engineer
**Role Summary**
Responsible for driving reliability, automation, and operational excellence across critical engineering platforms. Leads SRE strategy, implements observability, automates with Ansible, and mentors teams to adopt SRE principles, ensuring scalable, fault‑tolerant, and compliant multi‑cloud systems.
**Expectations**
- 5+ years in SRE, DevOps, or IT operations.
- Strong focus on reliability, performance, and scalable architecture.
- Expertise in observability tools (Grafana, AppDynamics, Sumo Logic).
- Proficient in Ansible for IaC and automation.
- Experience with multi‑cloud environments (AWS, Azure, or GCP).
- Ability to mentor and influence cross‑functional teams.
- Excellent problem‑solving, communication, and collaboration skills.
**Key Responsibilities**
- Develop and refine SRE strategy, release management, and best‑practice frameworks.
- Mentor engineering and product teams on SRE principles: service ownership, SLIs/SLOs, continuous improvement.
- Design and implement observability standards (logging, monitoring, dashboards, alerting).
- Enhance monitoring maturity and proactively detect and resolve issues using Grafana, AppDynamics, and Sumo Logic.
- Build reliable, scalable, fault‑tolerant systems across multi‑cloud environments.
- Automate operations and reduce toil with Ansible Automation Platform; create event‑driven workflows and IaC.
- Participate in incident response, post‑mortems, and action‑item tracking.
- Ensure adherence to security, audit, and compliance requirements, including privacy and disaster recovery.
- Produce and maintain system documentation, runbooks, and knowledge assets.
- Collaborate with DevOps, product, engineering, and vendor partners to improve system stability and performance.
**Required Skills**
- Site Reliability Engineering / DevOps / IT operations experience.
- Strong reliability engineering and scalable architecture knowledge.
- Proficiency in Grafana, AppDynamics, and Sumo Logic.
- Ansible expertise for provisioning, configuration, and automation.
- Multi‑cloud environment experience (AWS, Azure, GCP).
- Leadership and mentorship capabilities.
- Troubleshooting, root‑cause analysis, and problem‑solving.
- Excellent verbal and written communication.
- Customer‑service‑oriented collaboration.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- Legal authorization to work in the United States.
---