- Company Name: NTT DATA, Inc.
- Job Title: Machine Learning Operations Engineer (F/M)
- Job Description:
**Job Title**
Machine Learning Operations Engineer
**Role Summary**
Architect, deploy, and maintain end‑to‑end ML and DevOps pipelines that deliver reliable, scalable AI solutions. Translate business requirements into technical designs, build CI/CD workflows, and manage infrastructure throughout development, testing, and production.
**Expectations**
- Deliver high‑quality ML models and associated software within agreed SLAs.
- Lead cross‑functional teams to integrate new pipelines and services.
- Continuously optimize deployment processes and infrastructure for performance and security.
**Key Responsibilities**
- Analyze business/user requirements and produce detailed system design documents.
- Design, implement, and automate CI/CD pipelines for ML code, data, and model artifacts.
- Provision, configure, and maintain cloud (AWS, Azure, GCP) and on‑prem environments.
- Deploy and release ML models with zero downtime and in compliance with release policies.
- Monitor and troubleshoot production deployments; respond to incidents per SLAs.
- Document development procedures, operational runbooks, and security controls.
- Conduct proofs of concept and collaborate with stakeholders to refine technical solutions.
- Review post‑implementation metrics and recommend process improvements.
- Support integration, testing, and quality assurance for all ML releases.
**Required Skills**
- Advanced expertise in DevOps practices and tools (Git, Docker, Kubernetes, Helm).
- Proficiency in scripting (Python, Bash, Ruby) and model management (MLflow, Kubeflow).
- Deep understanding of cloud services and infrastructure as code (IaC) tools such as Terraform and CloudFormation.
- Hands‑on experience with CI/CD pipeline design, automated testing, and release management.
- Strong analytical, problem‑solving, and documentation abilities.
- Excellent communication with technical teams and business stakeholders.
- Project management skills to coordinate multi‑team, multi‑phase deliveries.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Information Technology, or related field.
- DevOps certification (e.g., AWS Certified DevOps Engineer - Professional, Microsoft Certified: DevOps Engineer Expert, Google Cloud Professional Cloud DevOps Engineer).
- Agile certification (e.g., Scrum Master, SAFe Practitioner).
- Cloud platform certification (AWS, Azure, or GCP).
- Scripting language certification (Python, Bash, or equivalent).