Skills
Python
Java
Scala
SQL
Big Data
Data Engineering
Apache Kafka
Apache Spark
PostgreSQL
CI/CD
DevOps
Docker
Kubernetes
Jenkins
Azure Data Factory
Problem-solving
Analytics and Reporting
Machine Learning
Programming
Databases
Git
SQL Server
Azure
AWS
Spring
Snowflake
Hadoop
Spring Boot
Databricks
Microservices
Job Specification
Job Title: Java Data Engineer
Location: Phoenix, AZ
Job Overview
We are seeking a Java Data Engineer with strong experience in building scalable data pipelines, integrating real-time streaming systems, and optimizing data workflows for analytics and AI use cases. The ideal candidate will combine Java development expertise with a deep understanding of data engineering, ETL, and big data ecosystems (e.g., Spark, Hadoop, Kafka, Snowflake, or Azure).
Key Responsibilities
Design and develop data ingestion and transformation pipelines using Java, Spark, and SQL.
Work with streaming frameworks such as Apache Kafka or Event Hubs to process real-time data (see the sketch after this list).
Build and maintain ETL/ELT workflows that integrate structured and unstructured data sources.
Collaborate with data architects and analysts to optimize data models for analytics and reporting.
Develop and deploy microservices and APIs for data integration and consumption.
Implement data validation, lineage tracking, and governance standards across pipelines.
Optimize performance and reliability of distributed systems and batch/streaming jobs.
Collaborate with DevOps teams to automate CI/CD pipelines for data workflows.
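For illustration only, a minimal sketch of the kind of real-time ingestion pipeline described above, assuming Spark 3.x with the spark-sql-kafka-0-10 connector on the classpath; the broker address, topic name, and storage paths are hypothetical placeholders, not part of this specification.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class EventIngestionJob {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("event-ingestion")
                    .getOrCreate();

            // Subscribe to a Kafka topic; each record arrives as key/value bytes.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")
                    .option("subscribe", "events")
                    .load()
                    .selectExpr("CAST(value AS STRING) AS payload", "timestamp");

            // Land the raw stream in the lake; the checkpoint makes the sink restartable.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "/data/raw/events")
                    .option("checkpointLocation", "/data/checkpoints/events")
                    .start();

            query.awaitTermination();
        }
    }

Writing the stream with a checkpoint location is what lets the job restart without dropping or reprocessing records, which is the usual baseline for the reliability work listed above.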
Required Skills and Experience
5–10 years of experience in software or data engineering.
Strong programming skills in Java 8+ (Spring Boot preferred).
Experience with Apache Spark (Core, SQL, or Structured Streaming); see the batch sketch after this list.
Solid understanding of SQL and relational databases (PostgreSQL, SQL Server, etc.).
Hands-on experience with Kafka, Hadoop, Azure Data Factory, or AWS Glue.
Familiarity with data lake / lakehouse architectures (Delta Lake, Iceberg, etc.).
Proficient in writing optimized, reusable, and testable code.
Understanding of CI/CD, Git, Jenkins, Docker, and Kubernetes.
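As a companion to the streaming sketch, a minimal batch example combining Spark SQL with a PostgreSQL source over JDBC, again under assumed names: the connection URL, table, columns, and output path are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class OrdersDailyRollup {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("orders-daily-rollup")
                    .getOrCreate();

            // Read a relational table over JDBC; connection details are placeholders.
            Dataset<Row> orders = spark.read()
                    .format("jdbc")
                    .option("url", "jdbc:postgresql://db-host:5432/sales")
                    .option("dbtable", "public.orders")
                    .option("user", "etl_user")
                    .option("password", System.getenv("DB_PASSWORD"))
                    .load();

            // Express the rollup in Spark SQL, keeping the logic declarative.
            orders.createOrReplaceTempView("orders");
            Dataset<Row> daily = spark.sql(
                    "SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue "
                  + "FROM orders GROUP BY order_date");

            daily.write().mode("overwrite").parquet("/data/curated/orders_daily");
            spark.stop();
        }
    }

Registering the source as a temporary view and expressing the transformation in plain SQL keeps it easy to review and to unit-test against a local SparkSession.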
Nice to Have
Experience with Python or Scala for data engineering tasks.
Exposure to Snowflake, Databricks, or Azure Synapse.
Knowledge of Machine Learning model pipelines and MLOps workflows.
Strong problem-solving and analytical mindset.