The Lead Data Engineer will oversee junior data engineering activities and help build the organization’s data collection systems and processing pipelines. They will oversee the infrastructure, tools and frameworks used to deliver end-to-end solutions to business problems through high-performing data infrastructure. They will also be responsible for expanding and optimising the organization’s data and data pipeline architecture, and for optimising data flow and collection to support data initiatives.
Oversee the activities of the junior data engineering teams, ensuring that their duties are properly executed and aligned with the organization’s vision and objectives.
Provide oversight and expertise to the Data Team responsible for the design, deployment, and maintenance of the business’s data platforms.
Prepare performance reports and strategic proposals for senior leadership, drawing on gathered knowledge and analysis results.
Contribute to the continual improvement of the business’s data platforms through careful observation and well-researched knowledge.
Keep track of industry best practices and trends, and use that knowledge to take advantage of process and system improvement opportunities.
Act as a subject matter expert on data and provide input into all decisions relating to data engineering and its use. Provide guidance on setting governance standards.
Liaise and collaborate with data analysts, data warehousing engineers, and data scientists to identify and apply best practices within the Data and Analytics department.
Define the business’s data requirements to ensure that collected data is of high quality and fit for use across the department and the wider business.
Manage the analysis of complex data elements and systems, data flows, dependencies, and relationships to contribute to conceptual, logical, and physical data models.
Oversee the design and development of algorithms for real-time data processing within the business, and create frameworks that enable fast, efficient data acquisition. Deploy sophisticated analytics programs, machine learning, and statistical methods.
Perform thorough testing and validation to ensure proper data governance and quality across EDO and the business as a whole.
Oversee data migrations across different databases and servers, and define and implement data stores based on system and consumer requirements.
Develop ETL processes that convert data into formats suitable for the data analyst team and for dashboard charts. Oversee large-scale Hadoop data platforms to support the business’s fast-growing data volumes.
Monitor existing metrics, analyse data, and lead partnerships with other Data and Analytics teams to identify and implement system and process improvements.
Use data to discover tasks that can be automated, and identify, design, and implement internal process improvements such as automating manual processes, optimising data delivery, and redesigning infrastructure for greater scalability.
Build analytics tools that utilise the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
Create data tools that help analytics and data science team members build and optimise the organization into an innovative industry leader.
Oversee the assembly of large, complex datasets that meet functional and non-functional business requirements, and align data architecture with business requirements.
Own and extend the business’s data pipeline, covering the collection, storage, processing, and transformation of large datasets.
Oversee the creation and maintenance of an optimal data pipeline architecture, including building performance-optimised databases, implementing schema changes, and maintaining data architecture standards across the organization’s databases.
Strong analytic skills related to working with unstructured datasets.
Build processes supporting data transformation, data structures, metadata, dependency and workload management.
A successful history of manipulating, processing and extracting value from large disconnected datasets.
Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
We are looking for a candidate with 8 to 10 years of experience. They should have experience with the following software/tools:
Big data tools: Hadoop, Spark, Kafka, etc.
Relational SQL and NoSQL databases, including Postgres and Cassandra.
Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
AWS cloud services: EC2, EMR, RDS, Redshift.
Stream-processing systems: Storm, Spark Streaming, etc.
Object-oriented and functional programming languages: Python, Java, C++, Scala, etc.