- Company Name
- Roku
- Job Title
- Senior Software Engineer, DevOps - Data Platform
- Job Description
**Role Summary**
DevOps engineer responsible for automating, scaling, and maintaining a large Big Data and analytics platform on cloud infrastructure. Leads CI/CD pipeline development, infrastructure provisioning, monitoring, and disaster recovery for a data lake hosting more than 70 PB of data. Drives reliability, performance, and security across distributed systems.
**Expectations**
- Build and evolve automated infrastructure and deployment workflows.
- Ensure high availability, fault tolerance, and rapid recovery for production clusters.
- Provide expert guidance on cloud best practices and cost‑effective scaling.
- Collaborate with distributed engineering teams to influence architecture and product roadmap.
**Key Responsibilities**
- Automate cloud infrastructure provisioning (Terraform, Kubernetes, Docker).
- Design and implement CI/CD pipelines for data platform components (Kafka, Spark, Presto, Flink, etc.).
- Set up and maintain monitoring, alerting, and incident response (Grafana, PagerDuty, log aggregation).
- Script and automate rapid responses to infrastructure issues; perform low-level debugging and performance tuning.
- Conduct architectural reviews, advise on scaling, resource utilization, and reliability.
- Participate in on‑call rotations and in systems engineering for edge cases and disaster recovery.
- Maintain and update tooling (Chef, Puppet, Ansible) and ensure security compliance.
**Required Skills**
- 8+ years of DevOps or Site Reliability Engineering experience.
- Proficiency with GCP (preferred); solid experience with AWS, Azure, or other public clouds.
- Strong hands‑on experience with at least three of: Hadoop, Kafka, Spark, Airflow, Presto, Druid, OpenSearch, HAProxy, Hive.
- Kubernetes and Docker expertise.
- Infrastructure as Code with Terraform.
- Linux/Unix system administration and shell scripting; Python scripting proficiency.
- Monitoring and alerting tools (Grafana, PagerDuty) and participation in incident rotations.
- Configuration management (Chef, Puppet, Ansible).
- Networking, network security, and data security fundamentals.
- AI literacy or hands‑on experience with generative AI technologies.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Engineering, or equivalent professional experience.
- Industry certifications (e.g., GCP Professional Data Engineer, AWS Certified DevOps Engineer, Certified Kubernetes Administrator) are advantageous but not mandatory.