I think this time we got the numbers right … we just don’t know which ones to use.

Talking about CheckIn.com, we have been asked many times how we crunch our numbers. Or why our drivetimes are different from Google’s. Yes?

The second question is rather important, because before you crunch the numbers, you have to see what you work with. And this article on LinkedIn by Jasper Venema rang a bell last week…

So let’s talk about statistics today.

Passengers

We are in the process of adding a new region to CheckIn.com and as usual, the first and foremost figure we need is the passengers. We usually use Wikipedia, but even between different Wikipedia pages, and even more so between different Wikipedia languages, we find different passenger figures. So we usually compare them with the commercial data we get, and guess what: there are official sources such as ACI, IATA, national statistics and airport associations, but also commercial sources like ANNA.aero, Albatross, AEX or others, and in all cases we have – sometimes substantial – discrepancies in annual passengers for a(ny) given airport…

So we started to ask the airports. And again got other numbers.

We know of one difference, where an airport association doesn’t use departures and arrivals, but simply doubles the departures they get. Not very contemporary and definitely not state-of-the-art, but yes, it explains some of it. Jasper Venema’s article explains some more. But with numbers we don’t much care about explanations. It should be in our industry’s own and vital interest to use the same number for the same “item” (here “total airport passengers for a given year”). And quite honestly: if the airline has different numbers because it doesn’t count non-rev passengers, so be it. With most airlines not happy to give out “their” numbers for a given airport or route, the number that counts is the one the airport publishes.
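To make the doubling shortcut concrete, here is a minimal sketch in Python. The figures are hypothetical and not taken from any airport; the function names are ours, purely for illustration. It shows how two counting methods for the same airport and year already produce a measurable spread.

```python
def total_from_doubled_departures(departures):
    """Total passengers estimated by doubling departures (the shortcut described above)."""
    return 2 * departures


def total_from_both_directions(departures, arrivals):
    """Total passengers as counted departures plus counted arrivals."""
    return departures + arrivals


def relative_spread(figures):
    """Gap between the highest and lowest figure, relative to the lowest."""
    low, high = min(figures.values()), max(figures.values())
    return (high - low) / low


# Hypothetical counts for one airport and one year; not real data.
departures, arrivals = 510_000, 480_000
figures = {
    "doubled_departures": total_from_doubled_departures(departures),
    "departures_plus_arrivals": total_from_both_directions(departures, arrivals),
}
print(figures, f"{relative_spread(figures):.1%}")
```

The same `relative_spread` check can be run over any set of sources for one airport to flag where the published figures drift apart.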

Drivetimes

As for drive times, we showed a long time ago that they differ from tool to tool. And sorry, Google is neither the best nor the most accurate of those. We compared more than 20 different tools, ranging from our initial logistics software used by trucking companies to Google, MapPoint, Maptitude, Apple, Here, … Today we mostly use OpenStreetMap: in the tests we did in different countries, where other tools failed, it came up with the proper calculations. Even on ferries it is mostly accurate, whereas Microsoft and Google still treat long-haul ferries as zero drive time.

We cannot consider traffic jams, temporary construction sites or detours, but found that OpenStreetMap provides lower default speeds on highways that are likely to be overloaded. We don’t know how fast you drive, but neither do Google, Bing & Co. – we have to work with assumptions.

Start + End Points

Another bug that sits in our backlog and that we work on constantly (it’s “relax work”) is the city centers. We calculate population based on the municipality. But municipality borders are not easy to use for mapping. Take the example of Hamburg. For some reason, Hamburg “owns” a part of the North Sea. So we had to modify our boundary data for Hamburg to intentionally exclude that part, as it raised questions about our default example and put the map “off Hamburg”. Then you need a “geopoint”, a geographical point defined by latitude and longitude. For many municipalities such a point is defined, usually called the “admin center”. But many municipalities either have not defined such a point – or it’s a (stupid) “theoretical” centroid that does not relate to streets. Where it is missing, the drive time calculation falls back on a computed centroid, the center of the boundary. In many cases that results in a point somewhere inaccessible by road. The calculation then takes the spatially nearest road, which need not be easily accessible or well connected to the main roads. Or the centroid is too far from any road at all.
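The centroid problem is easy to reproduce. The following is a minimal sketch, not our production code: it assumes planar coordinates, a made-up U-shaped boundary and two made-up road nodes. It computes the area centroid with the shoelace formula, tests whether it lies inside the boundary by ray casting, and then snaps to the nearest road node, which is not necessarily the best connected one.

```python
from math import hypot


def polygon_centroid(poly):
    """Area centroid of a simple polygon (shoelace formula); poly is a list of (x, y)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(point, poly):
    """Ray-casting test: True if point lies inside the polygon."""
    px, py = point
    inside = False
    for (xi, yi), (xj, yj) in zip(poly, poly[-1:] + poly[:-1]):
        if (yi > py) != (yj > py) and px < (xj - xi) * (py - yi) / (yj - yi) + xi:
            inside = not inside
    return inside


def snap_to_nearest(point, candidates):
    """Nearest candidate by straight-line distance (ignores road connectivity)."""
    return min(candidates, key=lambda c: hypot(c[0] - point[0], c[1] - point[1]))


# Hypothetical U-shaped municipality boundary and hypothetical road nodes.
boundary = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
road_nodes = [(0.5, 0.5), (2.5, 2.5)]

centroid = polygon_centroid(boundary)
print(centroid, point_in_polygon(centroid, boundary))
print(snap_to_nearest(centroid, road_nodes))
```

For this boundary the centroid falls into the notch of the “U”, outside the municipality itself, which is the same effect as a centroid ending up in a lake or in the sea.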

Airports are also prime candidates. The geopoint used for navigation is very often not the terminal road, but the center of the main runway. The nearest road might also not be near the terminal, but on the other side of the airport. So for each and every airport in our database, we defined the geopoint at the terminal or as close to it as possible. For many smaller airports, there is no street data in any of the map tools we use, as those roads are managed “privately”.

Around Lugano, we found many municipalities located in the Alps, with a town and a lot of mountain with ski slopes. Unfortunately, without a defined city center, drive times differed substantially between a municipality with a defined city center and a neighboring one without. Having covered those, Lugano remains an “interesting map”, as there are also several municipalities with “exclaves”, split into different parts surrounded by other municipalities. But we can only color the municipality as a whole, even though some parts are in one drive time zone and others in the next. Look at Locarno, where there is no admin center, and the centroid ended up in the middle of the lake…

Helgoland has an airport, but cars are banned on the entire island. No drive times ツ

Population + Maps

And don’t underestimate that the population figures we have on file for all those municipalities are not the same, depending on whether they come from Eurostat, national statistics offices or the towns themselves. The naming differs between those sources and there is no “common code”, like we have in aviation, to uniquely identify those towns. That is likely also the cause of the +20% mistakes when using that commercial maps provider (€32K) for drive time calculations, which caused our ad hoc map change earlier this year. The little town of Münster, Bavaria is not the large city of Münster in North Rhine-Westphalia that the commercial mapping provider returned. And is that now Münster, Bayern or Munster, Bavaria or Muenster (Lech)? It’s worse in France, I can tell you… So we had to make sure we only use geopoints and not unreliable “names”, and we maintain an extensive list of “associations” to make sure the data is properly associated – until the next update, when a lot has changed again.
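As an illustration of matching on geopoints instead of names, here is a minimal Python sketch. It is not our actual matching logic: the coordinates are approximate and only illustrative, and the 5 km threshold and the function names are assumptions. It picks the nearest record by great-circle (haversine) distance and refuses to match when nothing is close enough.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def match_by_geopoint(lat, lon, candidates, max_km=5.0):
    """Name of the nearest candidate within max_km, or None if none is close enough."""
    best_name, best_dist = None, max_km
    for name, (c_lat, c_lon) in candidates.items():
        dist = haversine_km(lat, lon, c_lat, c_lon)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Approximate, illustrative coordinates; not taken from any official register.
population_records = {
    "Muenster, North Rhine-Westphalia": (51.96, 7.63),
    "Muenster (Lech), Bavaria": (48.61, 10.90),
}

# A boundary record that a pure name lookup could attach to the wrong "Muenster".
print(match_by_geopoint(48.62, 10.89, population_records))
```

Returning `None` instead of the “least bad” match keeps ambiguous records visible, so they end up in the manual list of associations rather than silently on the wrong town.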

But worse: you can’t use Eurostat everywhere, even within the EU. Their data is outdated the day they publish it, and regions like Scotland use a totally different and incompatible data model, so they publish “calculated estimates” for the wards. More guesstimates than estimates. And the EU covers just 28 states anyway; the entire Balkans are missing, as are Norway and most of the microstates … Are the Åland Islands independent or part of Finland? Those are just examples.

And then we need to associate cartography data from the cadastre offices, which is incompatible even within the same year with their (own) national statistics and with Eurostat. So that also goes into the number crunching. Doing this for one airport is bad enough. Doing it for Europe? We wouldn’t try that stunt again, now that we know what we had to go through… And no, the commercial “solutions” are just as bad, so we had to do it “again” for our own database. So we use OpenStreetMap for the mapping. But for our layers, we compiled our own database of administrative boundaries, by now mostly from national cadastre offices, with our own updates to make the maps match the population data.

Other variables

So we take into account the airport size by passengers, defining (assuming) the “reach” of the airport. That’s also variable, as some areas have a lot of large airports (e.g. Germany to BeNeLux), while in other regions airports are rather scarce. Spain, for example, has Madrid in the center and, except for two minor airports, all other airports are on the coast. To Bucharest, a substantial number of people drive eight hours. We calculate ferry times, including standard waiting times, but what about ferries that go once a day and then you have 18 hours to wait for the next day’s (once-daily) flight?
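The ferry question can be illustrated with a small Python sketch. It is a thought experiment under assumptions, not how our calculation works: the timetables are hypothetical, times are minutes after midnight, and the schedule is assumed to repeat daily. It compares a standard hourly ferry with a once-daily sailing for the same drive.

```python
MINUTES_PER_DAY = 24 * 60


def wait_for_ferry(arrival_min, sailings_min):
    """Minutes from arrival at the pier to the next sailing.

    Times are minutes after midnight; the timetable is assumed to repeat daily.
    """
    t = arrival_min % MINUTES_PER_DAY
    return min((s - t) % MINUTES_PER_DAY for s in sailings_min)


def access_time(drive_min, arrival_min, sailings_min, crossing_min):
    """Drive time plus schedule-dependent wait plus crossing time, in minutes."""
    return drive_min + wait_for_ferry(arrival_min, sailings_min) + crossing_min


hourly = list(range(0, MINUTES_PER_DAY, 60))  # hypothetical hourly ferry
once_daily = [8 * 60]                         # hypothetical single 08:00 sailing

# Same 90-minute drive, arrival at the pier at 10:10, 45-minute crossing.
print(access_time(90, 10 * 60 + 10, hourly, 45))
print(access_time(90, 10 * 60 + 10, once_daily, 45))
```

With a frequent ferry a fixed “standard” waiting time is a fair approximation; with a single daily sailing the wait dominates everything else, which is why such cases cannot be folded into a plain drive time zone.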

As I keep saying: despite all the data we provide on flown passengers on that or similar routes, on passenger potential in the catchment area, etc., I disagree with the recent statement by Marc Gordien in his very good article on air service forecasting maths. To look into the future was, is and will be a look into the Crystal Ball. We can only minimize the risk of failure by providing (and using) the best possible data to justify our gut feeling. But considering myself a professional, when I see new bases opened by easyJet, Wizz or other airlines, I frequently find myself at a loss; I would never have seen fit to justify the risk. Still, many of those routes work.

That gives reason to consider the soft factors. Reputation. Ticket prices (and ancillary revenues). Frequency. Ethnicity. Commercial relations. Tourism. And many others. Take my commonly used example, where a regional airline opened a route on a thrice-weekly basis, only to be cannibalized, just when it started making money, by a low cost carrier with bigger aircraft and less frequency, which dumped the route in less than a year. Unfortunately, the regional carrier was gone, and the route is no longer served. Data is not everything. But it helps to qualify the real cases and make sure you understand the risk taken on new routes.

Quo Vadis?

Do you find something “weird” on our maps? Please let us know! There are still many mistakes and bugs and we constantly work on the database to improve the information we have. But we believe we now have a rather well-working system; the bugs our users point us to are mostly either quickly corrected or result from reasons beyond our control. And the results match very nicely the facts we get for comparison from other sound sources at airlines and airports.

We also work very hard trying to simplify our analyses, compile meaningful facts in the dashboard and provide the more complex detail on the analysis page. We discuss options to also interface the data with other tools, though currently, most of the established companies prefer to live in their silos ツ

Working with (sound) assumptions, the numbers help you to understand and qualify the potential and the risk, but there are exceptions, no matter how many people work on the data to improve it. It will remain an ongoing development with ample room for improvement. And so we will gain a better and better understanding of the facts. But we can’t read the minds of the decision makers: the paying passengers. We can only assume a likelihood from sound statistical analyses.

We do not replace a route analyst or airline network planner. But we polish the Crystal Ball(s) in use and provide really nicely shining new ones to take a better look. Check it out.

Food for Thought
Comments welcome!

Comments

I have just been faced with another prime example. The data published by the Hellenic Civil Aviation Authority and Fraport for the 14 Greek airports operated by Fraport differ by a mere 2.8 million passengers.
You. Got. To. Be. Kidding. Me.
Given the ACI Statistics Manual, minor offsets might be explainable. That. Is. Not.
And after Fraport Bulgaria simply increased their catchment areas for Varna and Burgas from the 1.2 and 1.4 million we had calculated with a European standard algorithm to 3 million, arguing to us that “they know better”, I am slowly ceasing to believe Fraport. Are they now trying to beautify their passenger numbers in Greece?
https://www.checkin.com/news/2018/03/the-fraport-cautionary-tale-2018/
