SQL to Tableau

The objective of this project is to showcase connecting Tableau (could be done with any BI software naturally) and thus reducing the size of the information that is extracted, which greatly improves the visualisation performance.

Database design (schema)

I first created a mock database in MS Access, with 2 tables:

flights – This one contains information extracted from the EUROSTAT database – the list of airport pairs for all flights from Germany. The extraction process is similar to the one described in this article.

I cleaned the data and kept only those for the years 2018, 2019 and 2020 and the following columns (airport of origin, destination country, destination airport, year, number of flights, number of passengers and number of seats available) :

This table 2500 rows, representing data for the top airport pairs, each of the pairs would have at least one flight a day in average (356 per year).

delays – The data comes from the EUROCONTROL’s Airport ATFM delay dataset. I extrected the airports from Germany and kept the following columns (year, airport ICAO code, airport name, minutes of delay, number of flights):

This tables has 484 rows.

The query

The objective of the visulisation was to aggregate the pairs of airports of origin with countries of destination, filtering out only the airports that had an average ATFM delay per flight higher than 0.90 min (average value for 2019 as per EUROCONTROL Standard Inputs for Economic Analyses.

To do this I wrote the following SQL query. The query combines both tables using the airport ICAO code and the year as a key.

SELECT f.apt_ori, f.ctr_des, f.YEAR, SUM(f.CAF_PAS_DEP) AS number_flights, SUM(f.PAS_BRD_DEP) AS pax, SUM(f.ST_PAS_DEP) AS seats_available, SUM(f.PAS_BRD_DEP)/SUM(f.ST_PAS_DEP)*100 AS load_factor_perc
FROM flights AS f LEFT JOIN delays AS d ON (f.apt_ori = d.APT_ICAO) AND (f.YEAR = d.YEAR)
WHERE f.ctr_des <> “DE”
GROUP BY f.apt_ori, f.ctr_des, f.YEAR
HAVING SUM(d.delay_min/d.arrivals_num) > 0.9

The result and the visulisation

The resulting query has 206 rows. In the next step, I could directly load the database into Tableau and use the query result for the following visualisation:


Posted

in

by

Tags: