Job Specifications
In office 4 days a week, downtown Toronto
This role will support a set of cloud- and ML-heavy initiatives, with a strong focus on production ML platforms, cloud networking, and API-driven architectures. We are looking for someone who is deeply hands-on, comfortable operating at the infrastructure and platform layer, and experienced in supporting ML teams running models in production.
You’ll work closely with software engineers, ML engineers, and data teams to ensure that cloud infrastructure, deployment pipelines, and ML services are secure, observable, and scalable.
Key Responsibilities:
Cloud Infrastructure & Networking
Design, build, and operate cloud infrastructure across GCP (preferred), AWS, or Azure
Own cloud networking architecture, including VPCs, load balancers, firewall rules, security policies, and IAM strategies
Ensure reliability, performance, and cost efficiency of cloud environments
DevOps & Platform Engineering
Build and operate microservices deployed into serverless environments such as Cloud Run or equivalent platforms
Implement and maintain CI/CD pipelines and automation using Terraform, GitHub Actions, and related tooling
Partner closely with application teams to enable safe, fast, and repeatable deployments
MLOps & ML Platform Support
Support ML and AI services in production, including deploying, operating, and monitoring models and pipelines
Work hands-on with Google Vertex AI and ML platform operations
Deploy and operate MCP servers and other AI/ML-driven services in live environments
Work closely with ML teams to productionize models and ensure operational excellence
API Management & Observability
Design and manage API gateways and API management platforms (Apigee or native cloud API gateways) at scale
Implement strong observability practices including logging, monitoring, alerting, and notification systems
Troubleshoot performance, reliability, and data flow issues across distributed systems
Qualifications & Experience
Required
5+ years of professional experience in DevOps, Cloud Engineering, Platform Engineering, or MLOps roles
Strong hands-on experience with cloud networking (VPCs, load balancers, firewall rules, IAM)
Advanced API management experience using Apigee or native cloud API gateways
Direct MLOps experience, including deploying and operating ML/AI services in production
Deep, recent experience with GCP, particularly Vertex AI
Hands-on experience with Docker, Kubernetes, and serverless platforms (e.g., Cloud Run)
Strong scripting and automation skills using Python and Bash (SQL familiarity is a plus)
Proven experience with Infrastructure as Code (Terraform) and CI/CD automation
Strong understanding of cloud observability and operational best practices
About the Company
In today's dynamic and fiercely competitive business environment, and with a war for talent under way, finding the right talent is crucial for an organization's success and growth. It's essential to ensure the right fit from the start. That's where Found People Inc steps in. Found People Inc is Canada's premier boutique contingency recruitment firm. Their expertise is in recruiting top talent and building high-performance teams for startups, SaaS, and Fortune 500 companies focusing on mid- to senior-level roles in techno...