{"id":137,"date":"2023-06-01T16:49:35","date_gmt":"2023-06-01T15:49:35","guid":{"rendered":"https:\/\/zdebla.me\/?p=137"},"modified":"2023-06-16T19:28:28","modified_gmt":"2023-06-16T18:28:28","slug":"2-ways-of-data-aggregation-busiest-airports","status":"publish","type":"post","link":"https:\/\/zdebla.me\/index.php\/2023\/06\/01\/2-ways-of-data-aggregation-busiest-airports\/","title":{"rendered":"3 ways of data aggregation (busiest airports) &#8211; Excel, SQL, R"},"content":{"rendered":"\n<p>In this post I\u202fwant to show 3 different ways to aggregate the data for the busiest airports in EUROCONTROL area over the period of 2016-2022 vizualization.\u00a0\u00a0<\/p>\n\n\n\n<p>In one of them I\u202fwill be using <strong>MS Excel<\/strong> with Pivot Table, second one in the second one a \u202f<strong>SQL statement<\/strong> (running on a mock database table created in MS Access). The third way will be using <strong>R <\/strong>language, running <strong>RStudio<\/strong>.\u00a0<\/p>\n\n\n\n<p>The source of the data is the Airport Traffic dataset from EUROCONTROL, available <a href=\"https:\/\/ansperformance.eu\/data\/\">here<\/a>.<\/p>\n\n\n\n<p class=\"has-small-font-size\" style=\"margin-top:var(--wp--preset--spacing--30);margin-right:var(--wp--preset--spacing--30);margin-bottom:var(--wp--preset--spacing--30);margin-left:var(--wp--preset--spacing--30);padding-top:0;padding-right:0;padding-bottom:0;padding-left:0\"><em>Disclaimer:&nbsp;<\/em><\/p>\n\n\n\n<p class=\"has-small-font-size\" style=\"margin-top:var(--wp--preset--spacing--30);margin-right:var(--wp--preset--spacing--30);margin-bottom:var(--wp--preset--spacing--30);margin-left:var(--wp--preset--spacing--30);padding-top:0;padding-right:0;padding-bottom:0;padding-left:0\"><em>This data is published by the EUROCONTROL Performance Review Unit in the interest of the exchange of information. It may be copied in whole or in part providing that this copyright notice and disclaimer are included. The information may not be modified without prior written permission from the EUROCONTROL Performance Review Unit.&nbsp;<\/em><\/p>\n\n\n\n<p>These are the metadata from the dataset:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/zdebla.me\/wp-content\/uploads\/2023\/06\/image-10.png\" alt=\"\" class=\"wp-image-138\" width=\"684\" height=\"203\" srcset=\"https:\/\/zdebla.me\/wp-content\/uploads\/2023\/06\/image-10.png 602w, https:\/\/zdebla.me\/wp-content\/uploads\/2023\/06\/image-10-300x89.png 300w\" sizes=\"auto, (max-width: 684px) 100vw, 684px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center has-vivid-cyan-blue-color has-text-color has-large-font-size\"><strong>Excel&nbsp;<\/strong><\/p>\n\n\n\n<p>Getting (aggregating) this data in a Pivot table is quite simple, which is one of the reasons why it is so popular.&nbsp;<\/p>\n\n\n\n<p>All we have to do is to choose the column FLT_TOT_1 (Number total IFR movements), SUM of it, for the <em>Values, <\/em>and YEAR and APT_NAME for the Rows (in most cases using the ICAO code would be more exact as it would eliminate the possibility of having incostistencies in the Name field of the airport, which would lead to aggregation mistakes. However the EUROCONTROL data are very constistent in this manner)<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/1a875c01-225d-42c0-8eec-32e591b6215b\"><\/p>\n\n\n\n<p>Next we want to sort the data.&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" style=\"\" src=\"blob:https:\/\/zdebla.me\/c5e1b0bc-f952-4c24-b288-ab301b43b59d\"><\/p>\n\n\n\n<p>The resulting data set is what we wanted, however the format is not usable yet for a\u202fBI application yet.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/d6af030b-b5bf-463f-afdd-a40dc7afcc3c\" style=\"\"><\/p>\n\n\n\n<p>To change the format, we must first change, in the tab <em>Design \u2013 Subtotals \u2013 Do not show subtotals, <\/em>next <em>Report Layout \u2013 Show in tabular form <\/em>and finally <em>Report Layout \u2013 Repeat all items labels. <\/em>The resulting table is in a\u202fformat usable in a\u202fBI application.&nbsp;&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/9c6dc6d3-b076-40a1-8a1f-23fe9846e3fb\"><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/503e478b-e4f4-45f9-8714-b1f5bb3bef7d\"><\/p>\n<\/div>\n<\/div>\n\n\n\n<p>Next we copy the data into a\u202fnew workbook and paste values only.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/77f2ec78-00eb-4220-b49f-f29cfa0d657e\"><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/b8d4b560-8f69-484a-b7e3-5a3fc9872b3b\"><\/p>\n<\/div>\n<\/div>\n\n\n\n<p>We may want to sort the pasta data by <em>Sum of FLT_TOT_1. <\/em>This was not possible in the Pivot table as it was limited to sorting by Year and SUM (see screenshot of sorting, a bit above).&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center has-vivid-cyan-blue-color has-text-color has-large-font-size\"><strong>SQL&nbsp;<\/strong><\/p>\n\n\n\n<p>The beauty of SQL is that we can get the results usable in BI application in a\u202fsingle SQL statement.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Considering the name of the mock table is \u201eairport_traffic\u201c and containing the data same column names as the excel file (see metadata) :&nbsp;<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\" style=\"margin-top:0;margin-right:0;margin-bottom:0;margin-left:0;padding-top:0;padding-right:0;padding-bottom:0;padding-left:0\"><strong>SELECT <\/strong>airport_traffic.apt_name, airport_traffic.Year, Sum(airport_traffic.flt_tot_1) AS sum_flt&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\" style=\"margin-top:0;margin-right:0;margin-bottom:0;margin-left:0;padding-top:0;padding-right:0;padding-bottom:0;padding-left:0\"><strong>FROM <\/strong>airport_traffic&nbsp;<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\" style=\"margin-top:0;margin-right:0;margin-bottom:0;margin-left:0;padding-top:0;padding-right:0;padding-bottom:0;padding-left:0\"><strong>GROUP BY<\/strong> airport_traffic.Year, airport_traffic.apt_name&nbsp;<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\" style=\"margin-top:0;margin-right:0;margin-bottom:0;margin-left:0;padding-top:0;padding-right:0;padding-bottom:0;padding-left:0\"><strong>ORDER BY&nbsp;<\/strong> Sum(airport_traffic.flt_tot_1) <strong>DESC<\/strong>, Table1.Year <strong>DESC<\/strong>;&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"blob:https:\/\/zdebla.me\/3392235c-6d8f-4572-b0fd-4004d3d520d2\"><\/p>\n\n\n\n<p>Compare the result with the final sorted data extracted in Excel.<\/p>\n\n\n\n<p class=\"has-text-align-center has-vivid-cyan-blue-color has-text-color has-medium-font-size\"><strong>R<\/strong><\/p>\n\n\n\n<p>Running a command in R (using RStudio) is another great way to aggregate the data, using also just one command.<\/p>\n\n\n\n<p class=\"has-ibm-plex-mono-font-family has-tiny-font-size\">airport_traffic &lt;- read_csv(&#8220;Airport_Traffic.csv&#8221;)<\/p>\n\n\n\n<p>Actually thanks to being able to import the CSV data into a data frame and work with it directly is even more straightforward than the previous steps.<\/p>\n\n\n\n<p class=\"has-ibm-plex-mono-font-family has-tiny-font-size\">airport_traffic %>% group_by(YEAR, APT_NAME) %>% summarise(sum_flt = sum(FLT_TOT_1)) %>% arrange(-sum_flt)<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"334\" height=\"196\" src=\"https:\/\/zdebla.me\/wp-content\/uploads\/2023\/06\/obrazok-44.png\" alt=\"\" class=\"wp-image-326\" srcset=\"https:\/\/zdebla.me\/wp-content\/uploads\/2023\/06\/obrazok-44.png 334w, https:\/\/zdebla.me\/wp-content\/uploads\/2023\/06\/obrazok-44-300x176.png 300w\" sizes=\"auto, (max-width: 334px) 100vw, 334px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>In this post I\u202fwant to show 3 different ways to aggregate the data for the busiest airports in EUROCONTROL area over the period of 2016-2022 vizualization.\u00a0\u00a0 In one of them I\u202fwill be using MS Excel with Pivot Table, second one in the second one a \u202fSQL statement (running on a mock database table created in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,1],"tags":[],"class_list":["post-137","post","type-post","status-publish","format-standard","hentry","category-articles","category-uncategorised"],"_links":{"self":[{"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/posts\/137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/comments?post=137"}],"version-history":[{"count":7,"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/posts\/137\/revisions"}],"predecessor-version":[{"id":327,"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/posts\/137\/revisions\/327"}],"wp:attachment":[{"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/media?parent=137"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/categories?post=137"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zdebla.me\/index.php\/wp-json\/wp\/v2\/tags?post=137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}