Skills
Python
Java
Scala
SQL
Big Data
Data Engineering
Apache Kafka
Apache Spark
PostgreSQL
CI/CD
DevOps
Docker
Kubernetes
Jenkins
Azure Data Factory
Problem-solving
Analytics and Reporting
Machine Learning
Programming
Databases
Git
SQL Server
Azure
AWS
Spring
Snowflake
Hadoop
Spring Boot
Databricks
Microservices
Job Specification
Job Title: Java Data Engineer
Location: Phoenix, AZ
Job Overview
We are seeking a Java Data Engineer with strong experience in building scalable data pipelines, integrating real-time streaming systems, and optimizing data workflows for analytics and AI use cases. The ideal candidate will combine Java development expertise with a deep understanding of data engineering, ETL, and big data ecosystems (e.g., Spark, Hadoop, Kafka, Snowflake, or Azure).
Key Responsibilities
Design and develop data ingestion and transformation pipelines using Java, Spark, and SQL.
Work with streaming frameworks such as Apache Kafka or Event Hubs to process real-time data (see the sketch after this list).
Build and maintain ETL/ELT workflows that integrate structured and unstructured data sources.
Collaborate with data architects and analysts to optimize data models for analytics and reporting.
Develop and deploy microservices and APIs for data integration and consumption.
Implement data validation, lineage tracking, and governance standards across pipelines.
Optimize performance and reliability of distributed systems and batch/streaming jobs.
Collaborate with DevOps teams to automate CI/CD pipelines for data workflows.
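For illustration only, a minimal sketch of the kind of real-time ingestion pipeline described above, assuming Spark 3.x with the spark-sql-kafka-0-10 connector on the classpath; the broker address, topic name, and storage paths are hypothetical placeholders, not part of this specification.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class EventIngestionJob {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("event-ingestion")
                    .getOrCreate();

            // Subscribe to a Kafka topic; each record arrives as key/value bytes.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")
                    .option("subscribe", "events")
                    .load()
                    .selectExpr("CAST(value AS STRING) AS payload", "timestamp");

            // Land the raw stream in the lake; the checkpoint makes the sink restartable.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "/data/raw/events")
                    .option("checkpointLocation", "/data/checkpoints/events")
                    .start();

            query.awaitTermination();
        }
    }

Writing the stream with a checkpoint location is what lets the job restart without dropping or reprocessing records, which is the usual baseline for the reliability work listed above.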
Required Skills and Experience
5–10 years of experience in software or data engineering.
Strong programming skills in Java 8+ (Spring Boot preferred).
Experience with Apache Spark (Core, SQL, or Structured Streaming); see the batch sketch after this list.
Solid understanding of SQL and relational databases (PostgreSQL, SQL Server, etc.).
Hands-on experience with Kafka, Hadoop, Azure Data Factory, or AWS Glue.
Familiarity with data lake / lakehouse architectures (Delta Lake, Iceberg, etc.).
Proficient in writing optimized, reusable, and testable code.
Understanding of CI/CD, Git, Jenkins, Docker, and Kubernetes.
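As a companion to the streaming sketch, a minimal batch example combining Spark SQL with a PostgreSQL source over JDBC, again under assumed names: the connection URL, table, columns, and output path are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class OrdersDailyRollup {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("orders-daily-rollup")
                    .getOrCreate();

            // Read a relational table over JDBC; connection details are placeholders.
            Dataset<Row> orders = spark.read()
                    .format("jdbc")
                    .option("url", "jdbc:postgresql://db-host:5432/sales")
                    .option("dbtable", "public.orders")
                    .option("user", "etl_user")
                    .option("password", System.getenv("DB_PASSWORD"))
                    .load();

            // Express the rollup in Spark SQL, keeping the logic declarative.
            orders.createOrReplaceTempView("orders");
            Dataset<Row> daily = spark.sql(
                    "SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue "
                  + "FROM orders GROUP BY order_date");

            daily.write().mode("overwrite").parquet("/data/curated/orders_daily");
            spark.stop();
        }
    }

Registering the source as a temporary view and expressing the transformation in plain SQL keeps it easy to review and to unit-test against a local SparkSession.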
Nice to Have
Experience with Python or Scala for data engineering tasks.
Exposure to Snowflake, Databricks, or Azure Synapse.
Knowledge of Machine Learning model pipelines and MLOps workflows.
Strong problem-solving and analytical mindset.