- Company Name: Therapixel
- Job Title: Site Reliability Engineer, France
- Job Description:
**Job Title**
Site Reliability Engineer
**Role Summary**
Ensure reliability, scalability, and performance of production systems for AI‑driven medical imaging services. Automate deployment, monitoring, and recovery processes; collaborate with development, production, and AI implementation teams; and support cloud‑based AI workloads in a health‑IT environment.
**Expectations**
- Maintain high availability and security of cloud infrastructure.
- Proactively monitor and resolve incidents, meeting SLA/SLO targets.
- Drive automation and observability improvements.
- Communicate effectively with product, engineering, and customer‑success stakeholders.
- Contribute to capacity planning and continuous improvement initiatives.
**Key Responsibilities**
- Implement and enhance monitoring, alerting, and observability (Prometheus, Grafana, ELK, etc.).
- Automate deployment, scaling, and recovery using Terraform, Ansible, FluxCD, ArgoCD, etc.
- Participate in incident management: detection, troubleshooting, resolution, and post‑mortems.
- Track SLA/SLO/SLI metrics and support capacity planning.
- Design, build, and maintain highly available, secure infrastructure on public cloud platforms.
- Manage infrastructure security: secrets handling, hardening, and compliance.
- Collaborate with product and engineering to improve application reliability and performance.
**Required Skills**
- Linux systems administration in production environments.
- Strong Docker and Kubernetes expertise.
- Proficiency with Infrastructure‑as‑Code and Kubernetes deployment tooling (Terraform, Helm, FluxCD).
- CI/CD pipeline experience (GitLab CI, Jenkins, GitHub Actions).
- Scripting/programming ability (Bash, Python, or Go).
- Public cloud experience (GCP, AWS, or Azure).
- Observability stack knowledge (Prometheus, Grafana, Datadog, ELK).
- English fluency (written & spoken); French a plus.
- Self‑starter with strong organization, planning, and interpersonal skills.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Biomedical Engineering, or related field.
- 2+ years of relevant experience in health‑IT or cloud‑provider environments, including operating AI workloads in the cloud.
- No specific certifications required.