
Apache Airflow: The Hands-On Guide

15,198 Enrolled Students

Course Features

Level: All levels
Duration: 1 year
Access: Full lifetime access
Certificate: Certificate of completion

What you'll learn

  • Coding Production-Grade Data Pipelines by Mastering Airflow through Hands-on Examples
  • How to Follow Best Practices with Apache Airflow
  • How to Scale Airflow with the Local, Celery and Kubernetes Executors
  • How to Set Up Monitoring with Elasticsearch and Grafana
  • How to Secure Airflow with authentication, crypto and the RBAC UI
  • Core and Advanced Concepts with Pros and Limitations
  • Mastering DAGs with timezones, unit testing, backfill and catchup
  • Organising the DAG Folder and Keeping Things Clean

Requirements

  • Basic knowledge of Docker and Python
  • VirtualBox installed (only for the local Kubernetes cluster part)
  • Vagrant installed
  • The course "The Complete Hands-On Introduction to Apache Airflow" is a nice plus.

Description

Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.

It is scalable, dynamic, extensible and modular.

Without a doubt, mastering Airflow is becoming a must-have and attractive skill for anyone working with data.
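To make "programmatically author workflows" concrete: a workflow is a set of tasks plus their dependencies, and those dependencies must form a directed acyclic graph (DAG). The sketch below is plain Python using the standard library's `graphlib`, not Airflow's API. It only illustrates the underlying idea: resolving tasks into a valid execution order and rejecting cycles, in the same spirit as Airflow refusing cyclic DAGs. The task names are hypothetical placeholders loosely inspired by the Forex Data Pipeline project, not the course's actual task IDs.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order, given each task's upstream tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle, because a
    workflow has to be a directed *acyclic* graph to be schedulable.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical tasks: each key runs only after all tasks in its set.
    pipeline = {
        "download_rates": {"check_api_available"},
        "save_to_hdfs": {"download_rates"},
        "process_with_spark": {"save_to_hdfs"},
        "notify_slack": {"process_with_spark"},
    }
    print(run_order(pipeline))

    # A cycle (a waits for b, b waits for a) can never be scheduled.
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        print("rejected:", err.args[0])
```

In Airflow itself you would declare the same dependencies on operators inside a DAG file and let the scheduler handle ordering; this sketch only shows why the graph must stay acyclic.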

What you will learn in the course:

  • The fundamentals of Airflow are explained, such as what Airflow is and how the scheduler and the web server work.
  • The Forex Data Pipeline project is an incredible way to discover many operators in Airflow and to work with Slack, Spark, Hadoop and more.
  • Mastering your DAGs is a top priority: you will be able to play with timezones, unit test your DAGs, structure your DAG folder and much more.
  • Scaling Airflow through different executors, such as the Local Executor, the Celery Executor and the Kubernetes Executor, will be explained in detail. You will discover how to specialise your workers, how to add new workers and what happens when a node crashes.
  • A local Kubernetes cluster of 3 nodes will be set up with Rancher, together with Airflow and the Kubernetes Executor, to run your data pipelines.
  • Advanced concepts will be shown through practical examples, such as templating your DAGs, making a DAG dependent on another one, what SubDAGs and deadlocks are, and more.
  • You will set up a Kubernetes cluster in the cloud with AWS EKS and Rancher in order to use Airflow along with the Kubernetes Executor.
  • Monitoring Airflow is extremely important! That's why you will learn how to do it with Elasticsearch and Grafana.
  • Security will also be addressed in order to make your Airflow instance compliant with your company's requirements: specifying roles and permissions for your users with RBAC, preventing access to the Airflow UI with authentication and passwords, data encryption and more.
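Catchup, backfill and timezones are closely related: when a DAG's start date lies in the past, the scheduler can create one run per schedule interval it has missed. The sketch below is a simplified model written for illustration, not Airflow's code. It assumes a fixed-length schedule (a `timedelta` rather than a cron expression), that a run for an interval becomes due only once the interval has fully elapsed, and it ignores settings such as `max_active_runs`. The function name `catchup_intervals` is hypothetical. It insists on timezone-aware datetimes because naive ones make schedules ambiguous.

```python
from datetime import datetime, timedelta, timezone


def catchup_intervals(start: datetime, now: datetime,
                      every: timedelta = timedelta(days=1)) -> list[tuple[datetime, datetime]]:
    """List the (interval_start, interval_end) pairs still owed since `start`.

    Simplified model (an assumption, not Airflow's implementation): the run
    for an interval is only due once interval_end <= now.
    """
    if start.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguous schedules")
    intervals = []
    interval_start = start
    # Walk forward one fixed-length interval at a time until we reach `now`.
    while interval_start + every <= now:
        intervals.append((interval_start, interval_start + every))
        interval_start += every
    return intervals


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 4, 6, 30, tzinfo=timezone.utc)
    for begin, end in catchup_intervals(start, now):
        print(f"run for {begin:%Y-%m-%d} due at {end:%Y-%m-%d %H:%M} UTC")
```

With these example dates, three daily intervals (January 1 to 3) have fully elapsed, while the January 4 interval is still open; working in UTC sidesteps daylight-saving shifts that a local timezone would introduce.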

In addition:

  • Many practical exercises are given throughout the course so that you will have opportunities to apply what you learn.
  • Best practices are stated when needed to show you the best ways of using Airflow.
  • Quizzes are available at the end of each section to assess your comprehension.
  • Answering your questions quickly is my top priority, and I will do my best for you.

I put a lot of effort into giving you the best content, and I hope you will enjoy it as much as I enjoyed creating it.

By the end of the course, you will be more confident than ever using Airflow.

I wish you great success!

Marc Lamberti

Who this course is for:

  • Data Engineers
  • Aspiring Data Engineers
  • DevOps
  • Software Engineers
  • Data Scientists

About the instructor

Marc Lamberti

Hi there,

My name is Marc Lamberti, I’m 27 years old and I’m very happy to arouse your curiosity! I’m currently working full-time as a Big Data Engineer for the biggest online bank in France, which serves more than 1,500,000 clients. For more than 3 years now, I have been building ETLs to address the problems a bank encounters every day, such as a platform that monitors the information system in real time to detect anomalies and reduce the number of client calls, a tool that detects suspicious transactions or potential fraudsters in real time, an ETL that loads massive amounts of data into Cassandra to make it valuable, and so on.

The biggest challenge as a Big Data Engineer is dealing with the growing number of available open source tools. You have to know how to use them, when to use them and how they connect to each other in order to build robust, secure and performant systems that meet your underlying business needs.

I strongly believe that the best way to learn and understand a new skill is to take a hands-on approach, with just enough theory to explain the concepts and a big dose of practice to be ready for a production environment. That’s why in each of my courses you will always find practical examples paired with theoretical explanations.

Have a great learning time!
