What you'll learn
- Create plugins to add functionality to Apache Airflow.
- Use Docker with Airflow and different executors.
- Master core concepts such as DAGs, Operators, Tasks and Workflows.
- Understand and apply advanced concepts of Apache Airflow such as XComs, Branching and SubDAGs.
- Understand the difference between the Sequential, Local and Celery Executors, how they work and how you can use them.
- Use Apache Airflow in a Big Data ecosystem with Hive, PostgreSQL, Elasticsearch, etc.
- Install and configure Apache Airflow.
- Think through, design and implement solutions to real data processing problems using Airflow.
Requirements
- VirtualBox must be installed. A 3 GB VM will have to be downloaded.
- At least 8 GB of memory.
- Some prior programming or scripting experience. Python experience will help you a lot, but since it is a very easy language to learn, the course should not be too difficult if you are not familiar with it.
Description
Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. If you have many ETL pipelines to manage, Airflow is a must-have.
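To make "programmatically author" a little more concrete, here is a minimal sketch in plain Python. It is not Airflow's API: it only illustrates the underlying idea that a workflow is a set of tasks plus their dependencies, executed in dependency order. The task names (`extract`, `transform`, `load`) and the helper `run_workflow` are hypothetical and chosen for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run every task once, after all of its upstream tasks have finished.

    tasks:        maps a task name to a callable taking a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    """
    # Compute an execution order in which dependencies always come first.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Hand each task the results of the tasks it depends on.
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# A toy ETL workflow (hypothetical tasks, for illustration only).
tasks = {
    "extract": lambda upstream: [1, 2, 3],
    "transform": lambda upstream: [x * 10 for x in upstream["extract"]],
    "load": lambda upstream: sum(upstream["transform"]),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order, results = run_workflow(tasks, dependencies)
print(order)            # ['extract', 'transform', 'load']
print(results["load"])  # 60
```

Airflow builds on the same idea and adds what this sketch leaves out, such as scheduling, retries, executors and a monitoring UI.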
In this course you are going to learn everything you need to start using Apache Airflow through theory and practical videos. Starting from very basic notions, such as what Airflow is and how it works, we will dive into advanced concepts, such as how to create plugins and build truly dynamic pipelines.
Who this course is for:
- People who are curious about data engineering.
- People who want to learn basic and advanced concepts about Apache Airflow.
- People who like a hands-on approach.
About the instructor
My name is Marc Lamberti, I’m 27 years old and I’m very happy to arouse your curiosity! I currently work full-time as a Big Data Engineer for the biggest online bank in France, which serves more than 1,500,000 clients. For more than 3 years now, I have created different ETLs to address the problems that a bank encounters every day, such as a platform that monitors the information system in real time to detect anomalies and reduce the number of client calls, a tool that detects suspicious transactions or potential fraudsters in real time, an ETL that loads massive amounts of data into Cassandra to extract value from it, and so on.
The biggest issue when you are a Big Data Engineer is dealing with the growing number of available open source tools. You have to know how to use them, when to use them and how they connect to each other in order to build robust, secure and performant systems that meet your underlying business needs.
I strongly believe that the best way to learn and understand a new skill is by taking a hands-on approach, with just enough theory to explain the concepts and a big dose of practice to be ready for a production environment. That’s why in each of my courses you will always find practical examples paired with theoretical explanations.
Have a great learning time!