Always have an IT backup plan

Andrew Zolnai
Zolnai.ca
May 30, 2017
http://www.bbc.co.uk/news/av/uk-40071569/ba-delays-absolute-chaos-at-heathrow

A recent story in The Conversation, “BA meltdown: crisis researcher caught in the chaos reports on a massive airline failure”, and the news (above) surrounding it here in England reminded me of a personal story of managing an IT system for the Canadian Ski Patrol System (CSPS) at their Calgary ski sale 30 years ago. The bottom line: while we introduced a new computer system, we had a manual backup system should something go awry… which it did! That manual system helped us keep operating until the computer network was restored. It was the insistence, unpopular at the time, that we keep the then-current paper system at hand that saved the day.

This is by no means unique: the London Ambulance Service computer glitch in 1992, for example, was a textbook case of a failed transition from a manual to a computerised system. Now far be it from me to suggest that the BA system, or the NHS’s, in the news recently with the WannaCrypt attack, are simple and that a paper backup would have worked as such. But my story is a very simple one (a point-system installation, implementation, crisis response and tear-down for a one-weekend ski sale), and it underscores the need for fail-over systems.

https://en.wikipedia.org/wiki/Windows_1.0

These were early days in IT systems: Windows 1.0 had just been released the year before, most critical systems still ran on DOS or Unix desktops and mainframe or Unix networks, and Novell Netware was king… and how many remember life before the Internet? It was still three years away!

The CSPS ski sale was not only a means for the public to sell their old skis and buy used ones; it was also an equipment clinic that made sure only skis with properly working bindings were resold. If I recall correctly, about ¼ of ski injuries then were from bindings that didn’t release. The sale was set up in one of the halls on the Calgary Stampede grounds.

But we thought it would be great to set up a computerised inventory-in / cash-out system for this, to replace the manual system that had been in place for 20 years already. We were four volunteers: an IT whizz-kid, an accountant with a good head for cash systems who was then with Grant Thornton, an old hand who had managed previous ski sales, and myself, freelancing. We basically borrowed half a dozen PCs, a Novell network, four printers and a UPS (uninterruptible power supply). They were laid out along a long row of tables in a very large hall. In one half of the hall, the tables separated the public, who came in at one end and exited at the other, from the equipment testing and inventory area on the far side of the tables. The other half of the hall was where equipment was displayed for the public to examine and hopefully purchase. Needless to say, a host of volunteers ran this, and only the CSPS could provide them en masse. It was, by the way, a harbinger of things to come for the 1988 Calgary Winter Olympics, known for their sterling volunteer effort, which had vastly increased starting with the Sarajevo Games prior.

Two PCs at one end handled inventory-in, where we recorded incoming equipment. They had a check system that put through to the POS (point-of-sale) system only the equipment that had been tested as safe for resale. The network then passed the resulting inventory on to four PCs at the other end of the long line of tables. These had printers and a cash system to round things out.

Said PCs had a simple database with items, price and sold/unsold, which matched the paper trail used in previous years. The accountant, bless her heart, absolutely insisted not only that the database be simple, but also that it match and follow a parallel paper trail; again, this was doable only because the CSPS could double up on volunteers. It took us a couple of months to create, set up and dry-run the system, made easier by the fact that we had lots of help from the old paper-trail volunteers, who typically did dry runs in the fall prior to the pre-Xmas sale. Our whizz-kid also had a very large basement where we could set up the six-PC network for him to write the simple inventory-in / cash-out software and bare-bones network system.
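For readers who think in code, here is a minimal modern sketch in Python of that kind of record and check. It is not the software we ran back then; the names `Item`, `release_to_pos` and `sell`, the tag numbers and the use of cents for prices are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One line of the inventory, mirroring a line on a paper slip (hypothetical layout)."""
    tag: str             # assumed tag number shared by the paper slip and the database
    description: str
    price_cents: int     # price held in cents to avoid floating-point rounding
    passed_test: bool    # outcome of the equipment safety check
    sold: bool = False


def release_to_pos(intake):
    """Pass on to the point-of-sale side only the items tested as safe for resale."""
    return {item.tag: item for item in intake if item.passed_test}


def sell(pos, tag):
    """Mark an item as sold and return the amount to collect, in cents."""
    item = pos[tag]
    if item.sold:
        raise ValueError(f"item {tag} is already sold")
    item.sold = True
    return item.price_cents


intake = [
    Item("A1", "downhill skis with bindings", 4500, passed_test=True),
    Item("A2", "skis with bindings that fail to release", 3000, passed_test=False),
]
pos = release_to_pos(intake)
```

Because each record carries only what the paper slip carries, a volunteer with a pen can take over from the PC at any moment.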

http://cspscalgary.ca/2014-calgary-new-used-ski-sale/

Show time! The Sale had a pretty simple format then: the public brought in their equipment on evenings the week before, for equipment to be inventoried, tested and sorted for return or resale. The weekend was the sale itself, when people examined and hopefully took away their purchases. The following weekday evenings were for pick-up of items rejected from resale.

So everything hummed along through Saturday. But on Sunday morning disaster struck. Something happened overnight in the hall that cut the power for longer than the UPS could keep the system going. The database got corrupted and we simply had a non-functional POS system!

This is where our accountant became an instant hero, because not only did we have a parallel paper-trail system, but we also had a host of volunteers on call should that system need to be put in place. Also, the very long row of tables allowed the manual system to be carried out in the middle section, while three of us feverishly worked to get the POS working again at both ends.

Fortunately we got it working by 10 AM, were able to resync it with the paper trail by noon, and the POS system worked Sunday afternoon, whew! That was critical, as that afternoon is when over half the sales occur, when people procrastinate and make last-minute decisions. The joke then was: thank God, literally, for Sunday morning church… it reduced traffic and helped us recover!
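The resync step can be sketched in a few lines of Python. This is a hypothetical reconstruction, not what we actually ran: it assumes the database is back to its state before the outage, that each paper slip records a tag number and the price paid, and that anything which does not match is set aside for a human to check.

```python
def resync(database, paper_sales):
    """Apply sales recorded on paper to the restored database.

    database:    dict mapping tag -> {"price_cents": int, "sold": bool}
    paper_sales: list of (tag, price_paid_cents) copied from the paper slips
    Returns a list of (tag, reason) for slips that need a human to look at them.
    """
    discrepancies = []
    for tag, paid in paper_sales:
        record = database.get(tag)
        if record is None:
            discrepancies.append((tag, "not in database"))
        elif record["sold"]:
            discrepancies.append((tag, "already marked sold"))
        elif record["price_cents"] != paid:
            discrepancies.append((tag, "price mismatch"))
        else:
            record["sold"] = True   # paper and database agree: record the sale
    return discrepancies


database = {
    "A1": {"price_cents": 4500, "sold": False},
    "A3": {"price_cents": 2000, "sold": True},
    "A4": {"price_cents": 1500, "sold": False},
}
paper_sales = [("A1", 4500), ("A3", 2000), ("A4", 1000), ("Z9", 500)]
problems = resync(database, paper_sales)
```

Only slips that agree with the database are applied automatically; the rest go back to the volunteers, which keeps the paper trail as the final arbiter.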

Lessons learned: as I said, far be it from me to suggest this case study covers all complexities, but its simplicity allows us to draw out these salient points:

  • KISS: keep it simple and fit-for-purpose; this was an inventory-in / cash-out system with a simple database that mirrored the old paper trail
  • Listen: accept the advice of those not as gung-ho on IT systems, as they may insist on not-so-popular procedures that end up saving the day, as they did in this case
  • Back up, back up & back up: a) a UPS system helped with the power failure, b) an on-going tape backup system worked long enough after the power failure to allow the database to be restored to its pre-incident state, and c) a backup manual procedure allowed the sale to continue unabated
  • Resourcing: none of this is possible without the appropriate level of technical and manpower support, volunteers in this case, to carry it out
