**Company:** TPA Technologies
**Job Title:** Data Science Engineer
**Role Summary:**
Design, build, and optimize end‑to‑end data pipelines and machine learning workflows on AWS and Snowflake. Deliver scalable data lake architectures, deploy models with SageMaker, and collaborate with data scientists and business stakeholders to extract actionable insights.
**Expectations:**
- 12+ month contract, hybrid schedule (4 days/week).
- Must attend a final onsite interview.
**Key Responsibilities:**
1. Develop and maintain ETL pipelines using AWS Glue for ingestion, transformation, and integration.
2. Preprocess data and engineer features with Python and Apache Spark.
3. Design and implement Snowflake data lake and data warehouse architectures backed by scalable Amazon S3 storage.
4. Train, evaluate, deploy, and monitor ML models using SageMaker.
5. Conduct exploratory data analysis (EDA) and build visualizations to communicate findings.
6. Work cross‑functionally with data scientists, analysts, and stakeholders to define requirements and deliver predictive models.
7. Implement robust data governance, quality assurance, and security measures across the data lifecycle.
8. Stay current with ML algorithms, data engineering best practices, and emerging AWS services.
**Required Skills:**
- Programming: Python, Spark (PySpark).
- Cloud & Data Platforms: AWS (Glue, S3, SageMaker), Snowflake (data lake and warehouse).
- Data Engineering: ETL, data lake architecture, feature engineering, scalability.
- Machine Learning: model training, evaluation, deployment, monitoring.
- Data Quality & Governance: metadata management, compliance, data security.
- Analytical Tools: SQL, EDA, visualization (Tableau, Power BI, or similar).
- Soft Skills: problem solving, communication, collaboration across teams.
**Required Education & Experience:**
- Bachelor’s degree in Computer Science, Data Science, Statistics, or related field (or equivalent experience).
- Proven experience with AWS data services and Snowflake.
- Practical expertise in Spark and Python for large‑scale data processing.
- ML model development and deployment experience with SageMaker or an equivalent platform.