Job Specifications
Data Engineer - Glasgow Hybrid - 6-month contract - Inside IR35
About Bigspark
We are creating a world of opportunity for businesses by responsibly harnessing data and AI to enable positive change. We adapt to our clients' needs and bring our engineering, development and consultancy expertise to bear. Our people and our solutions ensure they head into the future equipped to succeed.
Our clients include Tier 1 banking and insurance institutions, and we have been listed in the Sunday Times Top 100 Fastest Growing Private Companies.
The Role
We're looking for a Data Engineer to develop enterprise-scale data platforms and pipelines that power analytics, AI, and business decision-making. You'll work in a hybrid capacity, with 2 days per month onsite in our Glasgow offices.
What You'll Do
Develop highly available, scalable batch and streaming pipelines (ETL/ELT) using modern orchestration frameworks.
Integrate and process large, diverse datasets across hybrid and multi-cloud environments.
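To give a flavour of the orchestration work described above, here is a minimal, illustrative Python sketch of dependency-ordered ETL tasks. It is a toy example only: the task names and records are hypothetical, and in practice this ordering would be handled by a framework such as Airflow, Dagster, or Prefect rather than hand-rolled.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline stages (illustrative, not a real client pipeline).
def extract():
    return [{"id": 1, "amount": "12.50"}, {"id": 2, "amount": "7.25"}]

def transform(rows):
    # Cast string amounts to floats; a real job would also validate records.
    return [{**r, "amount": float(r["amount"])} for r in rows]

def load(rows):
    # Stand-in for a write to a warehouse or lakehouse table;
    # returns the number of rows "loaded".
    return len(rows)

def run_pipeline():
    # Declare ETL dependencies: transform needs extract, load needs transform.
    graph = {"transform": {"extract"}, "load": {"transform"}}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for task in order:
        if task == "extract":
            results["extract"] = extract()
        elif task == "transform":
            results["transform"] = transform(results["extract"])
        elif task == "load":
            results["load"] = load(results["transform"])
    return order, results["load"]
```

The point of the sketch is the shape of the work: pipelines are declared as dependency graphs and executed in topological order, which is exactly what production orchestrators do at scale.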
What You'll Bring
3+ years' commercial data engineering experience.
Strong programming skills in Python, Scala, or Java, with clean coding and testing practices.
Big Data & Analytics Platforms: Hands-on experience with Apache Spark (core, SQL, streaming), Databricks, Snowflake, Flink, Beam.
Data Lakehouse & Storage Formats: Expert knowledge of Delta Lake, Apache Iceberg, Hudi, and file formats like Parquet, ORC, Avro.
Streaming & Messaging: Experience with Kafka (including Schema Registry & Kafka Streams), Pulsar, AWS Kinesis, or Azure Event Hubs.
Data Modelling & Virtualisation: Knowledge of dimensional, Data Vault, and semantic modelling; tools like Denodo or Starburst/Trino.
Cloud Platforms: Strong AWS experience (Glue, EMR, Athena, S3, Lambda, Step Functions), plus awareness of Azure Synapse and GCP BigQuery.
Databases: Proficient with SQL and NoSQL stores (PostgreSQL, MySQL, DynamoDB, MongoDB, Cassandra).
Orchestration & Workflow: Experience with Autosys/CA7/Control-M, Airflow, Dagster, Prefect, or managed equivalents.
Observability & Lineage: Familiarity with OpenLineage, Marquez, Great Expectations, Monte Carlo, or Soda for data quality.
DevOps & CI/CD: Proficient in Git (GitHub/GitLab), Jenkins, Terraform, Docker, Kubernetes (EKS/AKS/GKE, OpenShift).
Security & Governance: Experience with encryption, tokenisation (e.g., Protegrity), IAM policies, and GDPR compliance.
Linux administration skills and strong infrastructure-as-code experience.
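As a small illustration of the "clean coding and testing practices" we look for, here is a hedged Python sketch of a pure, typed transform function (the `Transaction` type and field names are hypothetical, chosen only for the example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    account_id: str
    amount_pence: int  # money stored as integer pence to avoid float drift

def total_by_account(transactions: list[Transaction]) -> dict[str, int]:
    """Aggregate transaction amounts per account.

    A pure function like this is easy to unit-test in isolation,
    independent of Spark, Kafka, or any storage layer.
    """
    totals: dict[str, int] = {}
    for tx in transactions:
        totals[tx.account_id] = totals.get(tx.account_id, 0) + tx.amount_pence
    return totals
```

At scale the same aggregation would usually be pushed down to Spark SQL or the warehouse; the design point is that core business rules stay in small, unit-testable functions.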