Wednesday, 31 July 2013

Messy problems and messy data

There are times when those of us involved in operational research give the impression that the problems that we deal with are neat and tidy.  This problem is a queueing problem, that one needs an integer programming model, the third one requires an off-the-shelf forecasting model.  Or we can build a simulation model, or arrange a workshop using problem-structuring techniques.  Life is not that simple.  The better text-books used by students admit that their examples and exercises are simplifications of the real world, and some go on to explain the messiness of practical operational research.  I have even seen a few illustrations which resemble dust storms to drive the lesson home.  Yes, real world problems are generally messy.

The latest issue of ORMS Today, the journal of INFORMS (US OR Society) [cover date June 2013], has Anne Robinson, INFORMS president, writing about one source of messiness in our work - the data we use.  She writes, "Data is ugly ... really, really ugly".  (I will forgive her using data as a singular noun; pedants have lost that particular battle.)  She goes on to set out five ways to measure the quality of data, using five "C"s.  They are: Completeness, Correctness, Consistency, Currency, Collaborativeness.

(I have used nouns, where Anne used a mix of nouns and adjectives.)  So she suggests that we spend time before model-building running some checks on the data to see if the numbers and descriptors meet these standards.  Sometimes such checks lead to the need to edit the data, carefully, making assumptions that can be justified afterwards.

Educators in OR, and analytics and statistics, need to do as those better text-books do: describe methods for analysing high quality data and then lead into what happens when the data fail in one respect or more.  A few years ago, I was working with a student who had successfully completed a postgraduate module in multivariate statistical analysis.  Her project involved 800,000 records, each of which had a dozen items of data.  I asked her about the largest data-set she had met in that module; it comprised 200 records.  The first stage of her project was given over to exploring the transition from that number of records to one 4000 times larger, learning how to check the data for Completeness, Correctness and Consistency.  Currency and Collaborativeness were - for the most part - less significant.  I hope that the lessons learned helped her when she joined a large international OR group, where she would be working with millions of records.
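As a rough illustration of the first three checks, here is a minimal Python sketch.  The field names (id, age, joined, left), the plausible range for an age and the sample records are all invented for illustration; real checks would depend entirely on the data at hand.

```python
from datetime import date

# Hypothetical records: every field name and value here is made up.
records = [
    {"id": 1, "age": 34, "joined": date(2010, 5, 1), "left": date(2012, 3, 1)},
    {"id": 2, "age": None, "joined": date(2011, 1, 15), "left": None},
    {"id": 3, "age": 212, "joined": date(2012, 7, 1), "left": date(2011, 7, 1)},
]

REQUIRED = ("id", "age", "joined")  # assumed mandatory fields


def check_record(rec):
    """Return a list of the problems found in one record."""
    problems = []
    # Completeness: every required field is present and not empty.
    for field in REQUIRED:
        if rec.get(field) is None:
            problems.append(f"incomplete: {field} is missing")
    # Correctness: values fall in a plausible range (0 to 120 is an assumption).
    age = rec.get("age")
    if age is not None and not 0 <= age <= 120:
        problems.append(f"incorrect: age {age} is implausible")
    # Consistency: related fields must not contradict each other.
    joined, left = rec.get("joined"), rec.get("left")
    if joined is not None and left is not None and left < joined:
        problems.append("inconsistent: left before joined")
    return problems


report = {rec["id"]: check_record(rec) for rec in records}
for rec_id, problems in report.items():
    print(rec_id, problems or "ok")
```

With 800,000 records one would count the problems by type rather than print them, but the logic of each check stays the same.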

And I believe that one of the joys of operational research is the challenge of messy problems and messy data.




Friday, 19 July 2013

The fish in the sea, all that swim the paths of the seas.

I hadn't really thought about the word "paths" in Psalm 8, verse 8, before I read "The Old Ways" by Robert MacFarlane.  This book has received several very good reviews in the UK, and when I borrowed it from the library, it came with the proviso that I could not renew it.  [Here's an OR solution to the library problem of what to do about lending popular items - much as students can borrow material on overnight loan.  So I must: (1) finish the book within 21 days; (2) return it unfinished and hope that I can borrow it again soon; or (3) keep it out and pay the overdue fine.  I intend to follow option (1).]

In the book, the author describes some of the old pathways that criss-cross the world, from the times when everyone who travelled, travelled on foot or on the back of an animal (donkey, horse, camel, elephant).  The paths that people followed were not designed by engineers; they were pathways that satisfied numerous criteria about ease of following, shelter in bad weather, a destination and places to rest on the way.  There are many such prehistoric paths in Devon, especially on Dartmoor.  He mentions one on the moor which is marked with white stones to help travellers to avoid the numerous bogs and mires - think Grimpen Mire (below) in The Hound of the Baskervilles.  W G Hoskins, the expert on interpreting local history in the UK, who came from Devon, mentions a medieval road that was used by farmers who drove their cattle to pasture on the high ground of Cosdon Hill (a corruption of "Cows" down) in the summer.  I drove along it last month - it follows a straight line with the hill visible on the skyline, but in places it deviates from being straight to cross streams where the ground was firmer than in the marsh surrounding the springs which fed the stream.


An atmospheric image of the fictitious Grimpen Mire



Robert MacFarlane also discourses about some of the paths of the sea that have existed for millennia.  In the book, he follows some of them in an open boat.  He comments that travelling by water with a cargo was much more efficient than transport with pack animals until the Industrial Revolution offered canals, railways and well-made roads.  And to illustrate his point, he asks the reader to imagine what a map of Europe would look like if the land areas were treated as impenetrable, and the sea and rivers were to be criss-crossed with "The paths of the sea".  In an earlier blog, I mentioned the silver mines of Bere Ferrers - nearly all the mined material was moved by water, down the River Tamar, especially in the later years when the mines produced copper and tin and other minerals. 

However, there were exceptions to the general rule "everything goes by water".  Some of the paths of the seas converge on bays (not really ports or harbours) where goods would be transferred to pack animals (or humans) to be moved to another bay where the cargo would be reloaded into boats.  Why?  Because prehistoric man knew that there were places where the risk of sinking in rough seas was so great that it was better to avoid them.  It is the OR solution to a special path problem - to find the path from A to B which has the greatest chance of being successful given the transport available.  In Cornwall, one such place of risk was Land's End, and so the sea routes converge on St Michael's Mount on the south coast and on the Hayle estuary and St Ives on the north coast.  And a well-used path linked the two.  Other rocky headlands were treated in the same way.
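That route choice can be sketched as a most-reliable-path calculation.  If each leg of a journey has an independent probability of being completed safely, the chance of the whole route succeeding is the product of the probabilities of its legs, and maximising that product is the same as minimising the sum of -log(p), which Dijkstra's algorithm can do.  The place names come from the text above, but the network and every probability below are invented for illustration.

```python
import heapq
import math

# Hypothetical legs: (from, to) -> assumed probability of completing the leg safely.
# Each probability must be greater than 0 and at most 1.
legs = {
    ("St Michael's Mount", "Land's End"): 0.80,  # by sea towards the headland
    ("Land's End", "St Ives"): 0.80,             # by sea round the headland
    ("St Michael's Mount", "Hayle"): 0.98,       # overland by pack animal
    ("Hayle", "St Ives"): 0.97,                  # short sheltered sea leg
}


def safest_path(legs, start, goal):
    """Return (success probability, route) for the most reliable route."""
    graph = {}
    for (a, b), p in legs.items():
        # Legs are treated as usable in either direction.
        weight = -math.log(p)
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, route = heapq.heappop(queue)
        if node == goal:
            return math.exp(-cost), route
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, route + [nxt]))
    return 0.0, []  # no route exists


prob, route = safest_path(legs, "St Michael's Mount", "St Ives")
print(" -> ".join(route), round(prob, 3))
```

With these made-up figures the overland link through Hayle wins, because two risky sea legs multiply to a much smaller chance of arriving than one safe land leg and one sheltered sea leg.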

John Norman, my mentor for the OR technique of Dynamic Programming, used to insist that we defined  clearly what we meant by "optimal" when we derived DP recurrences.  Is optimal: shortest?  fastest? safest? cheapest? or what?
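
To make the question concrete, here are two recurrences for the same routing problem under different meanings of "optimal".  The notation is assumed purely for illustration: f(i) and g(i) are the best values achievable from place i to the destination d, c_ij is the cost (distance, time or money) of the leg from i to j, and p_ij is the probability of completing that leg safely.

```latex
% Shortest, fastest or cheapest: minimise an additive cost
f(i) = \min_{j} \bigl[ c_{ij} + f(j) \bigr], \qquad f(d) = 0

% Safest: maximise the probability of arriving
g(i) = \max_{j} \bigl[ p_{ij} \, g(j) \bigr], \qquad g(d) = 1
```

The two recurrences have the same structure but can select quite different routes, which is why the criterion has to be stated before the recurrence is written down.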

Sunday, 7 July 2013

The problem with cost-benefit analysis

OR people and economists often use cost-benefit analysis (CBA) to help advise decision-makers (DMs).  And in many cases it is a useful tool.  When the DM has a number of choices, and the outcomes are multi-dimensional and include some qualitative results, it is often helpful to try and assign costs to try and reduce the number of dimensions that the DM needs to look at.

So, at the outset, I give CBA thumbs up, some of the time.  But not all the time.

And the problem with CBA is in those three words, cost, benefit & analysis.

Measuring the cost of a decision and its outcome is a process that is fraught with pitfalls.  From time to time, the case of Stewkley church is mentioned.  When a new London airport was being considered, one site that was seriously considered was near Stewkley.  The church there would have to be demolished to make way for the proposed runways.  The study for the government valued the church at the insurance value, as it was insured against total destruction by fire.  But the church dated from about 1180, with the site having been used for at least 100 years before that.  It was an irreplaceable piece of architecture, a part of the local landscape.  Someone suggested, seriously, but with a mischievous glint in his eye, that the proper value should be the construction cost, say £100, extrapolated to the present day, roughly 800 years, at 10% per year.  That puts a value of £1.3e+35 on the building.  One costing makes the church of negligible value compared with the cost of the airport, the other makes the airport of negligible value compared with the church.  Although this is an extreme example, it is very hard to put a cost on anything which is rare or unique, or which has aesthetic value.  And, sometimes proponents of CBA overlook their philosophy that "one measures what is convenient to measure".
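The arithmetic behind that figure is ordinary compound growth, and a few lines of Python confirm the order of magnitude.  The £100 cost, the 10% rate and the 800 years are the round numbers quoted above; the 5% comparison is an extra assumption, added only to show how sensitive the result is to the rate chosen.

```python
def compounded(cost, rate, years):
    """Value of `cost` after `years` of compound growth at `rate` per year."""
    return cost * (1 + rate) ** years


value_10 = compounded(100.0, 0.10, 800)  # the figures quoted in the text
value_05 = compounded(100.0, 0.05, 800)  # assumed alternative rate

print(f"10% per year: £{value_10:.1e}")
print(f" 5% per year: £{value_05:.1e}")
```

Halving the rate cuts the answer by many orders of magnitude, which says a good deal about how much weight such a valuation can bear.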

What about "benefit"?  All OR models and economic models are simplifications of reality.  So the benefits that are shown by the models are only a simplification of what would happen as a result of the decisions.  I have mentioned in earlier blogs that there is a problem in defining the "system" when considering the effect of a decision, or the scope of a model.  As a result, the benefits in CBA are often limited to the ones that the modeller wishes to include, and the resulting scenario is poor in detail.  Society and the technology we use are changing rapidly, so talking of future benefits may simply extrapolate the benefits that are currently experienced.

And "analysis"?  Well, CBA is performed for clients.  Clients have their prejudices, and - shame! - sometimes the process of CBA can be biased towards those prejudices.

When I was a teenager, my father and I discussed replacing an old camera with a new, better one.  I would have needed an advance on my pocket money, I think.  So, I set out the cost of taking pictures with the camera that I then used, compared with the cost of taking the same number using a new one, and I manipulated the figures in my favour by showing that after about 150 pictures (on Kodachrome colour film) I would be better off with the new one.  And so I got the camera.  Had I known today's collector's value on the old camera, I should have kept it as well.  But I recall that experience for demonstrating how one could model decisions, and possibly bias the results.  My dad knew that I was manipulating the data but he went along with it, because he knew that I would need a better camera sooner or later.

Why mention CBA now?  Two events have prompted me to think about it.  The first has been the death of Doug Engelbart, who was best known for his work in human-computer interaction, leading to the invention of the computer mouse.  Had one been writing about the future of computers 40 years ago, and performing CBA on decisions involving computer installations, the proliferation of personal computers, in large part due to computer mice, would not have been foreseen.  Teletype terminals - yes - personal computers - no.  Teletypes would have extrapolated the present into the future.  And the second is the decision by the UK government to take one benefit out of their model for a new railway line, HS2.  Up to the past week, one of the benefits that has been costed has been the value to workers of time saved by having a faster train.  Now that benefit has been removed, because so many people work while on a train, using laptops, tablets and phones.  Saving 20 to 30 minutes on a journey does not change their productivity by as much as had been calculated earlier.  The definition of benefits has changed.