CS408
Fundamentals of Data Engineering

Faculty
Nikolay Markov
Data Platform Lead at Altenar.
Overview
The purpose of this course is to introduce the engineering perspective on data analytics in modern companies. It presents the essential knowledge, tools, and concepts required to design and implement data engineering processes that simplify data-driven decision-making. In practice, this makes it easier to access datasets, manage large volumes of data effectively, and gain insights while using the available hardware efficiently.
Students will learn how to select components for building data pipelines, including retrieving, processing, and storing data with popular open-source tools such as Python, Airflow, Hadoop, Spark, and Kafka, as well as various types of databases and storage systems for Data Lake and Data Warehouse architectures.
Learning highlights
- Study modern approaches to building data pipelines and data architectures for data-driven companies.
- Understand the key features and trade-offs involved in choosing specific solutions for different components of data processing architectures, and how to adapt them to particular business cases.
- Explore the typical problems and challenges that teams and organisations face when establishing an internal data culture, and learn a framework for addressing them.
Course outline
15 classes
Session 1
What do data engineers do? ETL and ELT concepts. Python as a language of choice for DE. Basic tooling and libraries.
Session 2
Linux basics. The terminal as the ultimate data engineering tool. Threads and parallel computation. Introduction to parallelism.
Session 3
Concurrent and asynchronous applications. Working with HTTP and web services.
Session 4
DevOps and MLOps concepts. Docker and configuration management. Python project structure, workflow for modern teams.
Session 5
Distributed file storage and processing the classic way: Apache Hadoop and the MapReduce concept.
Session 6
Introduction to Apache Spark: architecture and the RDD concept.
Session 7
Apache Spark: the DataFrame API.
Session 8
Relational Databases and MPP. Difference between OLTP and OLAP. Intro to dbt.
Session 9
Data Lake and Data Warehouse concepts. Layered data storage architecture. Slowly Changing Dimensions (SCD). Star Schema, Data Vault, and Anchor modeling.
Session 10
NoSQL and analytical databases. MongoDB, ClickHouse. Data Marts.
Session 11
Data Orchestration, DAGs. Introduction to Apache Airflow.
Session 12
Stream data processing. Introduction to Apache Kafka and Spark Structured Streaming.
Session 13
Cloud-based tools and components for data engineering.
Session 14
Problems and solutions for modern data-driven companies and teams.
Session 15
Final Exam.
Prerequisites
Familiarity with the basic Linux command line and Docker containers.
Basics of Python programming.
Basic understanding of databases and SQL.
Methodology
During the course students will:
Listen to lectures.
Solve quizzes.
Participate in discussions.
Practice with various tools using Python and Docker.
Implement multiple small projects as part of their homework.
Grading
Nikolay Markov is an industry expert with more than 12 years of active software development and system design experience, as well as an established teacher of many public and corporate courses covering data engineering, system design, Linux, Python/Go programming, and other areas. He has taken part in the ground-up design of data processing systems for many companies in the US, EU, and Russia, establishing the processes and data culture needed to gain insights from huge amounts of data with complex requirements. Nikolay also serves on the program committees of several conferences, including SmartData and Moscow Python Conf++. Passionate about sharing knowledge and experience, he has appeared on many podcasts and roundtable discussions and is one of the founders of the Data Breakfast event, which now brings together data specialists in many cities across the world.
Total hours
45 Hours
Dates
Feb 23 - Mar 13, 2026
Fee for single course
€1500
Fee for degree students
€750
How to secure your spot
Complete the form below to kickstart your application
Schedule your Harbour.Space interview
If successful, get ready to join us on campus
FAQ
Will I receive a certificate after completion?
Yes. Upon completion of the course, you will receive a certificate signed by the director of the program to which your course belongs.
Do I need a visa?
This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.
Can I get a discount?
Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.