Skewed probability distributions and airline fares

In O.R., we tend to work a great deal with statistical distributions.  Because randomness is part of life, randomness comes into our models.  Sometimes the probability distributions we use are nice, symmetrical ones like the Normal (Gaussian) distribution, but not always.  And when we encounter skewed distributions, some of the ideas that work for symmetrical ones no longer apply.  With the Normal distribution, mean, median and mode (or modal class) are the same, but with a skewed distribution they differ, so no single one of them is the obvious "typical" value.

The other day, British newspapers caught up with some research done by the travel website "Rome2Rio" about budget airfares.  If you enquire about flights on one of the (numerous) travel search websites, the results for fares are skewed.  Lots of results at or just above some minimum, then a range of increasing fares, and a few really high ones.  As an experiment, today I enquired about a return trip from London Heathrow to New York, JFK, flying the day before my next birthday and returning one day after that date.  The website came up with 697 quotes, from 474GBP to 2812GBP.  Now, hopefully, nobody is going to pay the latter sum, and so Rome2Rio asked the questions "What is an indicative amount to pay?", or "What sum indicates the typical 'target' fare?"

Their suggestion is to find the 20th percentile and use that.  That will be higher than the really low figures, which might be oddities, and far less than the high and unusable ones.  So for my experiment, I found 498GBP as the 20th percentile.
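To make the idea concrete, here is a minimal sketch in Python of picking the 20th percentile from a right-skewed set of fares.  I don't have the raw quotes to hand, so the data below is synthetic: a made-up lognormal spread sitting above a 474GBP minimum, with 697 values to match the size of my New York search.  The function name and the distribution parameters are my own assumptions, not anything Rome2Rio published.

```python
import random
import statistics


def indicative_fare(fares, percentile=20):
    """Return the requested percentile of a list of fares (default: 20th)."""
    # quantiles(n=100) returns the 99 cut points between the percentiles,
    # so the 20th percentile is at index 19.
    cuts = statistics.quantiles(fares, n=100, method="inclusive")
    return cuts[percentile - 1]


# Synthetic, right-skewed fares (hypothetical data, for illustration only):
# many quotes just above the minimum, and a long tail of expensive ones.
random.seed(1)
MIN_FARE = 474
fares = [MIN_FARE + random.lognormvariate(5.0, 1.0) for _ in range(697)]

mean_fare = statistics.mean(fares)
median_fare = statistics.median(fares)
target_fare = indicative_fare(fares)

print(f"minimum         {min(fares):8.0f}")
print(f"20th percentile {target_fare:8.0f}")
print(f"median          {median_fare:8.0f}")
print(f"mean            {mean_fare:8.0f}")
print(f"maximum         {max(fares):8.0f}")
```

With data shaped like this, the mean is dragged upwards by the handful of very high quotes and sits above the median, while the 20th percentile sits a little above the cluster of cheapest fares.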

I did two more experiments for the same dates: London to Singapore return, and London to Athens return.

To Singapore, there were 270 quotes, from 664GBP to a huge 6766GBP with 823GBP as the 20th percentile.

To Athens, there were 330 quotes, from 272GBP to 1972GBP, with 386GBP as the 20th percentile.

But Rome2Rio didn't stop there.  They did some regression analysis on the fares that they found to see how their indicative fares related to the length of the flight, and their simple regression equation is that the fare you should be looking for is:
50USD plus 0.11 times the flight length in miles (for British readers, 32GBP plus 0.07 times the flight length in miles)

Because Britain has a very competitive budget airline infrastructure, Rome2Rio's figures apparently tend to be too high.  So let's see what happens for my flights.

For New York, their formula predicts 548GBP against my 20th percentile of 498GBP, so the UK is cheap compared to the rest of the world
For Singapore, it gives 1010GBP against 823GBP, ditto
For Athens, it gives 274GBP ... the percentile (386GBP) is much higher than this.  Doesn't anyone want to go to Athens?
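For anyone who wants to repeat the comparison, here is a rough sketch of the arithmetic in Python.  Two things in it are my assumptions rather than anything stated above: the one-way distances are approximate figures that I have rounded, and the formula is applied to each leg and then doubled for a return trip.  Under those assumptions the results land within a pound or so of the predictions quoted above.

```python
def predicted_fare_gbp(miles_one_way, return_trip=True):
    """Rome2Rio-style indicative fare in GBP: 32 + 0.07 * miles per leg.

    Doubling for a return trip is an assumption made for this sketch.
    """
    one_way = 32 + 0.07 * miles_one_way
    return 2 * one_way if return_trip else one_way


# route: (approximate one-way miles from London, observed 20th percentile in GBP)
# The distances are rough assumed values; the percentiles are from my searches.
routes = {
    "New York": (3450, 498),
    "Singapore": (6760, 823),
    "Athens": (1500, 386),
}

predictions = {}
for city, (miles, p20) in routes.items():
    predicted = predicted_fare_gbp(miles)
    predictions[city] = predicted
    verdict = "cheaper" if p20 < predicted else "dearer"
    print(f"{city:10s} formula {predicted:5.0f}GBP, 20th percentile {p20}GBP "
          f"({verdict} than the formula suggests)")
```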

The regression is interesting for a lot of people, but the basis is also interesting for O.R. people.  How do you deal with a skewed distribution in a sensible way?  The analysts at Rome2Rio have developed a consistent approach, which reflects the psychology of the website user.  If you are handling other skewed distributions then you may need to adopt other approaches, but this is a model to bear in mind when you have to summarise one with a single figure.

(And my praise for the website - I entered my postcode for a journey, and - despite being an Australian website - I was given the times and fares for the bus at the end of the road.)
