Airflow is one of the most popular workflow management solutions; it authors, schedules, and monitors workflows. This blog post will walk through how to install Airflow on Ubuntu 18.04 Server.
Requirements
- Python 2.7
- pip
- Ubuntu 18.04 Server (at least 4 GB RAM size)
Install Python and pip
We will be using Python 2.7 in this tutorial. Let's start by installing Python on your Ubuntu machine.
sudo apt-get install python python-setuptools
pip is a Python package management tool. We'll be using it to install the packages Airflow requires.
sudo apt-get install python-pip
Note: It is recommended to use the latest pip version. To upgrade pip, use the command given below.
sudo pip install --upgrade pip
Installing PostgreSQL for Airflow
Airflow ships with a SQLite database backend by default, but SQLite only supports the SequentialExecutor, which runs one task at a time and is not suitable for real data pipelines. We need a more powerful database system such as PostgreSQL, an open-source database management system with a robust feature set, data integrity, and extensibility. We will install PostgreSQL and configure it to use with Airflow.
sudo apt-get install postgresql postgresql-contrib
We have installed the PostgreSQL database using the above command. We will now create a database for Airflow and grant access to the ubuntu user. Let's open psql, a command-line tool for Postgres.
sudo -u postgres psql
After logging in successfully, we will get the psql prompt (postgres=#). We will create a new role and grant it privileges.
CREATE ROLE ubuntu WITH LOGIN;
CREATE DATABASE airflow;
GRANT ALL PRIVILEGES ON DATABASE airflow TO ubuntu;
ALTER ROLE ubuntu SUPERUSER;
ALTER ROLE ubuntu CREATEDB;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ubuntu;
(WITH LOGIN is needed so the ubuntu role can actually connect later.)
Now connect to the airflow database and get the connection information.
postgres=# \c airflow
After a successful connection, the prompt changes to airflow=#. We will verify this by fetching the connection info:
airflow=# \conninfo
\conninfo command output:
You are connected to database "airflow" as user "postgres" via socket in "/var/run/postgresql" at port "5432".
We'll change settings in the pg_hba.conf file as required by Airflow. You can run the command SHOW hba_file; in psql to find the location of pg_hba.conf; it is most likely at /etc/postgresql/*/main/pg_hba.conf.
Open this file with vim and change the address on the IPv4 local connections line to 0.0.0.0/0. Then set listen_addresses = '*' in postgresql.conf, which lives in the same directory.
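For reference, the edited lines would look roughly like this (a sketch; the default entries vary between PostgreSQL versions). In pg_hba.conf, the IPv4 local connections line becomes:
host    all             all             0.0.0.0/0               trust
(trust allows passwordless connections, which matches the connection string used later; prefer md5 with a password on anything but a test box.) And in postgresql.conf:
listen_addresses = '*'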
We will restart PostgreSQL to load the changes.
sudo service postgresql restart
Install Airflow
PostgreSQL is now installed and configured. Next, we will install and configure Airflow.
Set AIRFLOW_HOME environment variable to ~/airflow.
export AIRFLOW_HOME=~/airflow
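Note that this export lasts only for the current shell session. To make it permanent, you can append it to your shell profile:
echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc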
Install the Ubuntu dependencies required for Apache Airflow; each can be installed on its own as listed below, or all at once with the combined command shown after the list.
- sudo apt-get install libmysqlclient-dev ( for the Airflow mysql subpackage )
- sudo apt-get install libssl-dev ( for the Airflow crypto package )
- sudo apt-get install libkrb5-dev ( for the Airflow kerberos package )
- sudo apt-get install libsasl2-dev ( for the Airflow hive package )
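If you prefer, all four dependencies can be installed with one combined command:
sudo apt-get install libmysqlclient-dev libssl-dev libkrb5-dev libsasl2-dev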
After installing the dependencies, install Airflow itself.
sudo pip install apache-airflow
For other subpackages like celery, async, crypto, rabbitmq, etc., you can check the Apache Airflow installation page.
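For example, subpackages are installed with pip's extras syntax; the sketch below assumes the celery, crypto, and rabbitmq extras (check the installation page for the exact names supported by your Airflow version):
sudo pip install 'apache-airflow[celery,crypto,rabbitmq]'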
After successfully installing Airflow, we will initialise Airflow's database:
airflow initdb
The airflow.cfg file should now have been generated in the Airflow home directory. We will tweak some of its settings to get better Airflow functionality.
We will be using the CeleryExecutor instead of the SequentialExecutor that comes with Airflow by default. Change:
executor = CeleryExecutor
For the DB connection, point Airflow at the PostgreSQL airflow database that we created in the earlier step:
sql_alchemy_conn = postgresql+psycopg2://ubuntu@localhost:5432/airflow
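This connection string relies on the psycopg2 driver, which is not installed with Airflow's core package; if it is missing, install it with pip:
sudo pip install psycopg2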
To remove the example DAGs from the home page, the load_examples variable can be set to False.
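In airflow.cfg this looks like:
load_examples = False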
Set broker_url and celery_result_backend to the same RabbitMQ URL, as shown below:
broker_url = amqp://guest:guest@localhost:5672//
celery_result_backend = amqp://guest:guest@localhost:5672//
After making all these changes, save the configuration file and exit.
To load the new configuration, run:
airflow initdb
Installing RabbitMQ
RabbitMQ is a message broker that is required to run Airflow DAGs with Celery. RabbitMQ can be installed with the following command.
sudo apt install rabbitmq-server
We will set NODE_IP_ADDRESS=0.0.0.0 in the configuration file located at
/etc/rabbitmq/rabbitmq-env.conf
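If the file does not already exist, it can be created containing just this line:
NODE_IP_ADDRESS=0.0.0.0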
Now start the RabbitMQ service:
sudo service rabbitmq-server start
Installing Celery
Celery is a Python distributed task queue that Airflow's CeleryExecutor uses together with RabbitMQ. We can install Celery using pip:
sudo pip install celery
Some Celery versions may not be compatible with Airflow, so you should check which versions are supported. Celery versions from 3.1.17 up to (but not including) 4.0 are known to be compatible with Airflow.
If you have a newer version installed, remove it first:
sudo pip uninstall celery
and then install a compatible version:
sudo pip install 'celery>=3.1.17,<4.0'
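You can verify which Celery version ended up installed with:
pip show celery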
Starting Airflow
All the required installation and configuration is done. We will now create a dags folder in the Airflow home directory, i.e. at /home/ubuntu/airflow:
mkdir -p /home/ubuntu/airflow/dags/
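To check that everything works end to end, you can drop a minimal DAG into this folder. The sketch below assumes Airflow 1.x import paths; the file name hello_airflow.py and the DAG id hello_airflow are hypothetical names used only for illustration:
# /home/ubuntu/airflow/dags/hello_airflow.py -- a minimal sketch, assuming Airflow 1.x
from datetime import datetime
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'ubuntu',
    'start_date': datetime(2018, 1, 1),
}

# 'hello_airflow' is a hypothetical DAG id used only for this example.
dag = DAG('hello_airflow', default_args=default_args, schedule_interval='@daily')

# A single task that echoes a greeting through bash.
say_hello = BashOperator(task_id='say_hello', bash_command='echo "Hello, Airflow!"', dag=dag)
Once the scheduler picks the file up, hello_airflow should appear in the web UI's DAG list.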
and then we'll start all the Airflow services to bring up the Airflow web UI:
airflow webserver
airflow scheduler
airflow worker
If you want to keep Airflow up continuously, run these commands with the -D flag, e.g. airflow webserver -D; this runs Airflow as a daemon in the background. You need to do this for each of the services you want to keep running.
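For example:
airflow webserver -D
airflow scheduler -D
airflow worker -D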
Stopping Airflow
When you are running Airflow as a daemon, it becomes a little trickier to stop. First you have to get the Airflow process ID and then kill it using sudo.
cat $AIRFLOW_HOME/airflow-webserver.pid
The above command will print the Airflow webserver's process ID; now kill it using the command:
sudo kill -9 {process_id of airflow}
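The two steps can also be combined into a single command:
sudo kill -9 $(cat $AIRFLOW_HOME/airflow-webserver.pid)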
To start Airflow again, use the commands airflow webserver, airflow scheduler, and airflow worker.
Airflow runs on port 8080 by default; the port can be changed in airflow.cfg. Visit localhost:8080 to find Airflow running with its user interface.

Pranav is a software developer at Vuja De. His work includes managing Amazon AWS, other cloud services, and internal infrastructure.