Saturday, 31 August 2013

OR55 minus 2

The 55th OR Society conference starts in two days' time.  A couple of hundred members of the OR Society and a few non-members will arrive on Monday or Tuesday.  We have been working on the event for the last year; "we" means the conference organiser, Hilary, my co-chair Phil Jones, and others on the committee.  The material is prepared, and today we have had a flurry of emails about last-minute matters.  Two sessions lacked session chairs - could we have volunteers?  Is there a map to show how to move from session to accommodation and vice versa?  We have just learnt of a visitor to the university who is interested in OR - can we invite him to the reception?

Operational research is about problem solving, so we have done it!

More about the conference later.

Tuesday, 27 August 2013

Skewed probability distributions and airline fares

In O.R., we tend to work a great deal with statistical distributions.  Because randomness is part of life, randomness comes into our models.  Sometimes the probability distributions we use are nice, symmetrical ones like the Normal (Gaussian) distribution, but not always.  And when we encounter skewed distributions, some of the ideas from those symmetrical ones can't be used.  With the Normal distribution, mean, median and mode (or modal class) are the same, but not with skewed distributions.

The other day, British newspapers caught up with some research done by the travel website "Rome2Rio" about budget airfares.  If you enquire about flights on one of the (numerous) travel search websites, the results for fares are skewed.  Lots of results at or just above some minimum, then a range of increasing fares, and a few really high ones.  As an experiment, today I enquired about a return trip from London Heathrow to New York, JFK, flying the day before my next birthday and returning one day after that date.  The website came up with 697 quotes, from 474GBP to 2812GBP.  Now, hopefully, nobody is going to pay the latter sum, and so Rome2Rio asked the questions "What is an indicative amount to pay?", or "What sum indicates the typical 'target' fare?"

Their suggestion is to find the 20th percentile and use that.  That will be higher than the really low figures, which might be oddities, and far less than the high and unusable ones.  So for my experiment, I found 498GBP as the 20th percentile.
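For readers who want to try the calculation on their own numbers, here is a minimal Python sketch.  The list of fares is hypothetical (it is not the 697 quotes from my search), and the nearest-rank method is only one of several percentile conventions; I do not know which one Rome2Rio uses.

```python
import math
from statistics import mean, median


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value with at least pct% of the data at or below it
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical, right-skewed fares in GBP: many near the minimum, one very high
fares = [474, 480, 485, 498, 510, 530, 560, 620, 750, 2812]

print("mean:           ", mean(fares))
print("median:         ", median(fares))
print("20th percentile:", percentile_nearest_rank(fares, 20))
```

With data skewed like this, the single very high fare pulls the mean well above the median, while the 20th percentile stays close to the cluster of low fares.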

I did two more experiments for the same dates: London to Singapore return, and London to Athens return.

To Singapore, there were 270 quotes, from 664GBP to a huge 6766GBP with 823GBP as the 20th percentile.

To Athens, there were 330 quotes, from 272GBP to 1972GBP, with 386GBP as the 20th percentile.

But Rome2Rio didn't stop there.  They did some regression analysis on the fares that they found to see how their indicative fares related to the length of the flight, and their simple regression equation is that the fare you should be looking for is:
50USD plus 0.11 times the flight length in miles (British readers: 32GBP plus 0.07 times the flight length in miles)

Because Britain has a very competitive budget airline infrastructure, Rome2Rio's figures apparently tend to be too high.  So let's see what happens for my flights.

For New York, their formula predicts 548GBP, so the UK is cheap compared to the rest of the world
For Singapore, it gives 1010GBP, ditto
For Athens, it gives 274GBP ... the percentile is much higher than this.  Doesn't anyone want to go to Athens?
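
The comparison above can be sketched in a few lines of Python.  The function and variable names are my own, the coefficients are the GBP ones quoted above, and the three pairs of figures are those from my experiments.  The distances behind the three predictions are not given here, so the 1000-mile call is purely illustrative.

```python
def indicative_fare_gbp(distance_miles):
    """Rome2Rio's simple regression as quoted above, in GBP."""
    return 32 + 0.07 * distance_miles


def percentile_to_formula_ratio(percentile_gbp, predicted_gbp):
    """Ratio below 1 means the 20th percentile is cheaper than the formula."""
    return percentile_gbp / predicted_gbp


# (destination, 20th percentile in GBP, formula prediction in GBP), as quoted above
quoted = [
    ("New York", 498, 548),
    ("Singapore", 823, 1010),
    ("Athens", 386, 274),
]

for destination, percentile_gbp, predicted_gbp in quoted:
    ratio = percentile_to_formula_ratio(percentile_gbp, predicted_gbp)
    verdict = "below" if ratio < 1 else "above"
    print(f"{destination}: percentile is {verdict} the formula (ratio {ratio:.2f})")

# Illustrative only: a hypothetical 1000-mile flight
print(f"1000 miles: {indicative_fare_gbp(1000):.0f} GBP")
```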

The regression is interesting for a lot of people, but the basis is also interesting for OR people.  How do you deal with a skewed distribution in a sensible way?  The analysts at Rome2Rio have developed a consistent approach, which reflects the psychology of the website user.  If you are handling other skewed distributions then you may need to adopt other approaches, but this is a model to bear in mind.

(And my praise for the website - I entered my postcode for a journey, and - despite being an Australian website - I was given the times and fares for the bus at the end of the road.)

Saturday, 3 August 2013

Don't mangle your English in your reports (1)

OR people need to communicate.  They work with models, and talk fluently in the languages of mathematics and statistics (and other tongues).  Then they need to communicate their results.  Poor communication can kill a project.  I hope that whoever wrote the following was not involved with OR!

(from the label of Tesco's everyday sparkling water:)
Tesco Everyday Value Water is drawn from the mains supply and undergoes a filtration process to remove impurities which improves the taste.  Bottled for convenience.

Let's look at the label again:

Tesco Everyday Value Water is drawn from the mains supply (is taken from a big tap)
and undergoes a filtration process (is filtered)
to remove impurities (tap water has very few impurities, by law)
which improves the taste.   (that is how we justify the price - we think it tastes better, and if we tell you it does, then you will buy it)
Bottled for convenience. (How else could it be distributed?  In tins!)

I used to tell my OR project students that before I read their reports, they had to read them aloud to a fellow student who was allowed to tear their mangled English to pieces.  Maybe Tesco label-writers should be told to do the same?

Twice as good - again?

After writing the blog about school examinations being sat twice in order both to give students an increased chance of achieving a desired level of qualification before leaving school, and to try to boost the school's performance record, I have been given further suggestions about a school's strategy in such a case.

First, there is the constraint of cost, and also the potential benefit.  Entering a school pupil for an examination costs the school money.  Schools are constrained in their finances, and might need some constructive accounting to justify a practice of recording N pupils in year 10, with M>N entries to the maths GCSE examination.  Therefore, the school has to estimate the value of an extra GCSE pass which might enhance its reputation (which then can lead to financial reward for one or more academic years) and relate that to the cost of entering a student for an extra examination.  This looks like a messy problem (see here).  Input the current cohort of students, with estimates of each one's probability of success.  Given constants are the current reputation and the financial model that relates rewards and reputation.  Output - a probability distribution for the future rewards for various scenarios of entering the current cohort into examinations.
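
As a rough illustration of the input and output described above, here is a small Monte Carlo sketch in Python.  It stops at the distribution of the number of passes, because the financial model relating reputation and rewards is not something I can write down.  The cohort probabilities are hypothetical, and I assume that each entry by a pupil is an independent attempt with the same probability of success.

```python
import random
from collections import Counter


def simulate_pass_counts(pass_probs, entries_per_pupil, trials=10_000, seed=0):
    """Estimate the distribution of the number of pupils who pass at least once."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        passes = 0
        for p in pass_probs:
            # A pupil counts as a pass if any one of their entries succeeds
            if any(rng.random() < p for _ in range(entries_per_pupil)):
                passes += 1
        counts[passes] += 1
    return {k: v / trials for k, v in sorted(counts.items())}


def expected_passes(distribution):
    """Mean number of passes under an estimated distribution."""
    return sum(k * prob for k, prob in distribution.items())


# Hypothetical cohort: each number is one pupil's estimated probability of success
cohort = [0.9, 0.7, 0.5, 0.4, 0.2]

for entries in (1, 2):
    dist = simulate_pass_counts(cohort, entries)
    print(f"entries per pupil = {entries}")
    print("  distribution of passes:", dist)
    print("  expected passes:", round(expected_passes(dist), 2))
```

The extra passes gained by moving from one entry to two would then have to be set against the extra entry fees, which is where the school's own financial model comes in.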

Second, obtaining a "C" grade pass at GCSE is not the only statistic that the school may be interested in.  What about the high-flyers?  There is kudos for pupil and school in obtaining a starred "A" grade, so why not enter the high-flyers twice?  I have heard of parents who pay for their very able child to take an exam twice in this sort of case ... but might it not be in the school's interest as well?

Finally, some students in my experience came to university with the ingrained idea that they could take examinations in series until they were satisfied with the result.  And some of them actually asked their tutors, after the result of final examinations, whether they could retake the examinations because they wanted to get a better degree classification.  No, the tutors said, final means final.

Friday, 2 August 2013

Twice as good?

Headline from the Independent newspaper, dated 1st August 2013: "Schools ask pupils to sit maths exams twice to boost their league table scores".

The story which follows explains that several schools are making their pupils sit the GCSE maths examination twice in the same term, and then recording the better result for the school performance league tables.  It also applied to English exams.  School league tables record the percentage of students who achieve a grade "C" or better in maths and English, and such results are critical for the progress of those students.  (GCSE exams are at age 16, year 10)

For years, schools realised that students needed such grades for their careers, and so students would sit the examinations until they passed, in June, in December, in June, ad infinitum.  Those who wanted to teach, for instance, needed these grades.  As electricians might say, they sat the exams in series.  The headline was highlighting the practice of sitting exams in parallel, albeit with different examination boards, and therefore different examiners.  And this, not simply for the benefit of the pupils, but also for the school's reputation. 

It doesn't take a genius to work out that if a pupil has probability p of passing at grade "C" or better, and q=(1-p) of failing, then - assuming independence, justified by the different examiners - taking the exam twice in parallel means probability 1-q^2 of passing at least once.  Since 1-q^2 = (1-q)(1+q) = p(1+q), it follows that 1-q^2 > p whenever p is strictly between 0 and 1.

Why stop at two exams in parallel?  Take the exam with n examiners, and the probability rises to 1-q^n.  It is a constrained optimisation problem - constrained by time and psychological pressure on the pupils.  Presumably, most schools are split between n=1 and n=2, but I wonder if somewhere, there is a school where n=3?
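
A short Python sketch shows how quickly the returns diminish as n grows.  The value p = 0.6 is a hypothetical pupil, not a figure from the newspaper story, and the calculation keeps the independence assumption made above.

```python
def prob_pass_at_least_once(p, n):
    """Probability of at least one pass in n independent attempts, each with pass probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    q = 1.0 - p  # probability of failing a single attempt
    return 1.0 - q ** n


p = 0.6  # hypothetical pupil
previous = 0.0
for n in range(1, 6):
    current = prob_pass_at_least_once(p, n)
    # The gain from each additional exam shrinks by a factor of q each time
    print(f"n={n}: P(pass) = {current:.4f}, gain from this exam = {current - previous:.4f}")
    previous = current
```

Each extra exam adds p times q^(n-1) to the probability, so the benefit falls away geometrically while the time and pressure on the pupil do not.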

If you are reading this, you are probably numerate, and for you, p is very close to 1, and your school wouldn't have needed to enter you more than once.