• Full Time (allows remote)
  • Remote (UK)
  • No. of Vacancies: 1
  • Experience: 5 - 7 Years
  • Posted 2 months ago

Role overview

We now have a huge number of disparate data sources across the business, and the data currently sits on a variety of platforms. We are looking to build a data lake (AWS) to pool all the data, and then provide structured warehouses fed from the data lake.

We are looking for someone who has ideally done something similar, i.e. worked on a data lake project and built pipelines to fill the lake with raw data. You would be responsible for architecture, design and development. We are looking for solid coding ability (Python and SQL) and extensive experience with AWS.

The data lake will feed into Delta Lake, with PySpark/Spark being used for processing. Databricks will sit on top, allowing for structured cloud warehousing.

Databricks cloud services will be used for data ingestion, data transformation and processing in Delta Lake, and data serving.
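
For illustration, the sketch below shows this pattern in PySpark: raw files landing in the S3 lake are read, lightly cleaned, and written to a Delta table that Databricks can serve as a structured warehouse layer. It assumes a Databricks runtime (or a Spark cluster with the delta-spark package), and the bucket, paths and column name are hypothetical.

```python
# Minimal sketch: raw data lake files -> cleaned Delta table.
# Assumes Delta Lake support (Databricks runtime or the delta-spark package);
# bucket names, prefixes and the event_id column are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-to-delta").getOrCreate()

# Read raw JSON events from the data lake (hypothetical bucket/prefix).
raw = spark.read.json("s3://example-data-lake/raw/events/")

# Light cleanup before landing the data in the structured layer.
cleaned = raw.dropDuplicates().filter("event_id IS NOT NULL")

# Append to a Delta table; Databricks SQL and downstream
# warehouses read from here.
cleaned.write.format("delta").mode("append").save(
    "s3://example-data-lake/delta/events/"
)
```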

All candidates will need experience with Databricks, or significant experience with Spark (PySpark).

Core skills:

Data Engineering

  • Experience in data transformation solution design and development using batch and streaming data sources
  • Experience in the development of ETL pipelines using Python and SQL
  • Experience in the development of CI/CD pipelines using GitHub Actions and JavaScript
  • Experience in embedding data quality and validation into the release and execution of data pipelines (see the sketch after this list)
  • Experience in using traditional ETL tools
  • Experience delivering solutions using distributed processing technologies, principally Spark and MapReduce
  • Cloud-native tooling for AWS, including experience with AWS Glue, Lambda, SNS, Kinesis, RDS, Redshift, S3, Athena, etc.
  • Apache Hadoop, with knowledge of multiple distributions (Cloudera, Hortonworks, HDInsight, etc.) and the associated Apache big data products (Hive, Impala, Oozie, etc.)
  • Data ingestion design, including batch and real-time architectures using tools such as Kafka, Storm, Kinesis or equivalents
  • Data governance and metadata management using tools such as Apache Atlas
  • Data transformation technologies including, but not limited to, Spark, Python or NiFi
  • Data deployment experience on cloud-native and hybrid cloud solutions
  • Microservice / SOA / stateless approaches to data ingestion & consumption
  • Expertise and experience in producing solution and information architectures using some or all of the technologies above
  • Information Glossary tooling, e.g. IIGC, Informatica Enterprise Data Governance or Collibra
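
As a rough illustration of the data-quality point above, the sketch below embeds validation checks in a Python/Spark SQL pipeline step and only publishes the result if the checks pass. All table, path and column names are hypothetical, not taken from an actual HENI pipeline.

```python
# Hypothetical example of a data-quality gate embedded in an ETL step:
# transform with Spark SQL, validate, and only then publish to Delta.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("validated-etl").getOrCreate()

# Load the source Delta table (illustrative path).
orders = spark.read.format("delta").load("s3://example-data-lake/delta/orders/")

# Transformation expressed in SQL, per the Python + SQL stack above.
orders.createOrReplaceTempView("orders")
daily = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM orders
    GROUP BY order_date
""")

# Simple embedded quality checks: no null keys, no negative totals.
null_keys = daily.filter(F.col("order_date").isNull()).count()
bad_totals = daily.filter(F.col("total_amount") < 0).count()
if null_keys or bad_totals:
    raise ValueError(
        f"Validation failed: {null_keys} null dates, {bad_totals} negative totals"
    )

# Publish only once the checks pass.
daily.write.format("delta").mode("overwrite").save(
    "s3://example-data-lake/delta/daily_orders/"
)
```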

Data Management

  • Experience in data modelling and optimisation for row- and columnar-based environments, using tools such as InfoSphere Data Architect, Erwin, etc.
  • Data Governance approaches and technologies which cover Business Glossary, Metadata Management and Data Lineage
  • Security governance and access management at the infrastructure, server and application levels, including role- and attribute-based access
  • Consulting on recent data-related regulations, including the Data Protection Act (DPA) and the General Data Protection Regulation (GDPR)

General

  • Cross-sector consulting and delivery using the above technologies and capabilities
  • Experience in delivering solutions and capabilities using the above in both agile and waterfall delivery methodologies
  • Ability to translate requirements/problem statements into big data and/or analytics solutions using the above technologies and capabilities

Preferred Technical and Professional Expertise

  • Expertise and experience in developing data science solutions using tools such as Python, R, TensorFlow or spaCy
  • Expertise and experience in developing solutions across multiple cloud applications
  • Knowledge of hybrid cloud data development and containerisation techniques including Kubernetes, Docker or Cloud Foundry
  • Evidence of contributing to the community of data engineers, whether within a single department or across an organisation

We understand that you won’t have experience with every one of these tools and technologies. We are flexible!

Why HENI?

  • Competitive salary on offer, dependent on the level of experience and skills
  • Work in a dynamic and fast-paced environment with new challenges
  • A forward-thinking business with a solutions-oriented approach and focus on getting things done
  • A collaborative, agile team who are passionate about their work
  • Get involved in a variety of projects and see how they develop into polished products and services
