- Company Name
- Envision Technology Solutions
- Job Title
- Sr. AI Platform Engineer (Guardrails, Observability & Evaluation Infrastructure)
- Job Description
**Job Title:**
Sr. AI Platform Engineer – Guardrails, Observability & Evaluation Infrastructure
**Role Summary:**
Design, implement, and maintain enterprise‑scale AI platform services that enforce data guardrails, ensure model safety, provide observability, and support evaluation of Generative AI applications. Drive platform consistency, SDK adoption, and cross‑team enablement for AI teams.
**Expectations:**
Deliver robust, reusable platform components that enable safe, compliant, and observable GenAI systems across multiple business units.
**Key Responsibilities:**
- Design & implement data guardrail frameworks: preprocessing, redaction, PII/PHI filtering, DLP integration, and prompt defenses.
- Build “Model Armor” for safe inputs/outputs: validation, prompt‑injection mitigation, harmful content detection, fact‑checking, and policy enforcement.
- Integrate safety tooling (policy engines, classifiers, DLP APIs, safety models).
- Develop observability pipelines (Arize AI, LangSmith, or equivalent): tracing LLM calls, token usage & cost tracking, latency, prompt/model versioning.
- Define LLM‑specific logging schemas and build monitoring dashboards: performance, cost, anomalies, safety events.
- Implement alerting, SLOs/SLIs, and telemetry for inference systems.
- Architect evaluation harnesses for GenAI: RAG evaluation, summarization/QA, human‑in‑the‑loop review, CI/CD integration.
- Build reusable libraries, APIs, and services: prompt versioning, embedding pipelines, retrieval adapters, data loaders, tool schemas.
- Provide documentation, onboarding, examples, and developer tooling.
- Conduct training and propagate best practices across engineering, product, and data science teams.
**Required Skills:**
- 5–10+ years of software/ML infrastructure engineering experience.
- Strong Python (FastAPI, async, typing, Pydantic, testing).
- Experience with model safety/guardrails: prompt injection, PII redaction, toxicity filters, policy enforcement.
- Hands‑on experience with LLM observability platforms (Arize AI, LangSmith).
- Proficiency in building evaluation frameworks (RAGAS, G‑Eval, custom rubrics).
- Familiarity with vector databases (Pinecone, Weaviate, Milvus) and retrieval pipelines.
- Knowledge of LLM architecture, tokenization, embeddings, context limits, and RAG patterns.
- Cloud experience (preferably GCP), Kubernetes/GKE, containers, CI/CD.
- Understanding of security, governance, DLP, data privacy, RBAC, and enterprise compliance.
- Excellent documentation, communication, and influence skills across stakeholder groups.
**Required Education & Certifications:**
- Bachelor’s degree or higher in Computer Science, Software Engineering, Data Science, or related field.
- Relevant certifications (e.g., Google Professional Cloud Architect, Certified Kubernetes Administrator) are a plus but not mandatory.