- Company Name
- Matlen Silver
- Job Title
- Site Reliability Engineer
- Job Description
Job Title: Site Reliability Engineer
Role Summary: Senior SRE responsible for designing, deploying, and operating highly secure, multi‑cloud infrastructure (Azure, AWS, GCP) with a strong focus on IaC, automation, and compliance. Leads large‑scale containerized environments (EKS, AKS, OpenShift) and implements robust security controls across IAM, encryption, patching, and vulnerability remediation.
Expectations:
- Lead end‑to‑end infrastructure architecture, security design, and deployment for complex systems.
- Drive automation through Terraform, Ansible/Chef, and integrate CI/CD pipelines.
- Mentor and collaborate with platform, security, and development teams.
- Run incident response, root cause analysis, and performance tuning for production services.
- Maintain and evolve SLOs/SLAs, scalability, resilience, and cost optimization.
Key Responsibilities:
- Architect and manage Azure VNets, subnets, NSGs, VPNs, CDN, Traffic Manager, DNS, DHCP, and virtual appliances.
- Create, maintain, and refactor Terraform modules for EKS clusters, Azure VMs, and multi‑cloud resources.
- Implement IAM policies (Azure AD, AWS IAM) and apply encryption and patching securely.
- Deploy and scale container orchestration platforms (EKS, AKS, OpenShift) with Docker and Kubernetes best practices.
- Configure monitoring and observability (Prometheus, Grafana, Datadog, New Relic).
- Build CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, CircleCI) for automated testing, build, and deployment.
- Conduct threat analysis, vulnerability scanning, and enforce remediation plans.
- Manage incident response, automated alerting, and root cause analysis to improve reliability metrics.
- Implement secure coding practices and controls that meet compliance frameworks (HIPAA, PCI, GDPR).
- Collaborate with network teams on DNS, load balancers, firewalls, and VPNs.
Required Skills:
- 13+ years of professional experience in Cloud Infrastructure, DevOps, or SRE.
- Advanced IaC with Terraform, including module design and versioning.
- Proficient in Azure and/or AWS; multi‑cloud experience strongly preferred.
- Strong security expertise: IAM, encryption, vulnerability remediation, patching.
- Containerization: Docker, Kubernetes (EKS, AKS, OpenShift).
- Scripting/Programming: Python, Go, Bash, or Ruby.
- Automation & Configuration Management: Terraform, Ansible, Chef.
- Cloud services: EC2, S3, Azure VMs, Kubernetes clusters, serverless environments.
- Networking fundamentals: DNS, VPN, load balancing, firewalls.
- Observability tools: Prometheus, Grafana, Datadog, New Relic.
- CI/CD tools: Jenkins, GitLab CI, GitHub Actions, CircleCI.
- Experience with HashiCorp Vault and advanced Terraform module architecture.
- Incident management, root cause analysis, performance tuning, SLO/SLA definition.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or related field (equivalent experience acceptable).
- Relevant certifications: Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Solutions Architect – Professional, or Terraform Associate.
- Security certifications such as CISSP, CISM, or equivalent are highly valued.
Alpharetta, United States
On-site
Senior
08-12-2025