Last changed 15 March 1999. Length about 1,400 words (10,000 bytes).
This is a WWW document by
Steve Draper,
installed at http://www.psy.gla.ac.uk/~steve/talks/mill.html.
You may copy it.
What is the significance of the Millennium bug for computer science?
(N.B. Another, related, seminar will be held in psychology.)
Organisers: Steve Draper; Ray Welland
Time: 4-5pm, Friday, 12 March 1999 in the
CAKES talk series.
Room F161, Computing Science dept. (17 Lilybank Gardens).
Abstract
- Do we need to do anything much about the systems/equipment in our charge?
- Need we worry about systems/equipment that is not in our charge?
Are society, or some important infrastructure services, going to collapse?
- What does it all mean for computer science?
- Does it show that the current computer science curriculum is out
of date, since the skills recommended for addressing the millennium bug are
completely missing from it?
Is it a mere Y2K bug, or a true millennial catastrophe?
A technical problem offering an employment boom for some,
or the end of society and life as we know it?
(How appropriate that in today's society, the fears expressed by millennial
movements are given a technical rather than religious form.)
Or is it not a technical problem, but a management one that computer scientists
are untrained to deal with, thus suggesting that the present DCS
curriculum is hopelessly out of date?
Plan
The session will be 60 (not 30, not 50) minutes long.
The basic plan is to have a string of speakers, mainly limited to 5 mins. each.
Speaker list and provisional times at the end.
1. Do we need to do anything much about the systems/equipment in our charge?
Speaker? David Fildes (university officer coordinating Year 2000 problems).
The problem
Sketch out the existence, size, impact.
Estimated cost and scale [David Fildes?]
Employment generated.
The scale of the problem in money already being spent and likely to be lost
means this is one of the biggest features of current computing practice.
Anyone who wants their students to be employed cannot regard it as
unimportant.
Approaches to fixes
Another is the range of quite different possible approaches to tackling it.
- One is to design the software better (and replace all existing software?).
- Fix the software: use the analysis tools now appearing to identify all the
places in existing code that need to be checked.
- Rely on testing (abandoning reliance on understanding the code): i.e. reset
clocks now, and try to respond to any resulting crashes.
- (The predominant advice in the university) View this not as a design or
technical problem, but as one of management: contingency planning ...
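As a sketch of the "fix the software" approach: once analysis tools have located two-digit year fields, one common repair was "windowing", i.e. expanding two-digit years relative to a pivot. The pivot value (70) and function name below are illustrative assumptions, not any particular tool's output:

```python
# Windowing: expand a two-digit year to four digits using a pivot.
# Years at or above the pivot are taken as 19xx, years below as 20xx.
# The pivot (70 here) is an arbitrary illustrative choice: each system
# must pick one suited to the date ranges in its own data.

PIVOT = 70

def expand_year(yy: int) -> int:
    """Map a two-digit year (0-99) to a four-digit year."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year, got %r" % yy)
    return 1900 + yy if yy >= PIVOT else 2000 + yy

# expand_year(99) -> 1999
# expand_year(0)  -> 2000
```

Note that windowing only defers the problem: with a pivot of 70, the scheme breaks again for dates from 2070 onwards.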
Official strategy
The university's response [Fildes]
DCS response /plan [Welland]
University of Glasgow newsletter, (206)
"The start of the year 2000 is to be marked by a four day close
down of the computing system and networks with the exception of
the main server into the University and one system in Computing
Science which may be studying how computing systems cope with
the start of the year 2000. Power and heating systems would not
be shut down and the director of Estates & Buildings reported that
he was considering bringing extra staff in over the period to deal
with any difficulties should they arise."
2. Need we worry about systems/equipment that is not in our charge?
Are society, or some important infrastructure services, going to collapse?
The argument is that there will certainly be some problems, and some of these
may cause major services (electricity, phones, water, ... food supply
to supermarkets, the banking system, ...) to fail. Arguments for this are:
- Too little work too late.
- The nature of major accidents in the past is that when a whole lot of
improbable causes all occur together, you get a major disaster. So there
will be a large number of problems, almost all of which will fail to spread
and trigger other problems, but in a few unforeseeable cases, a disastrous
cascade will occur.
- Some disasters will occur not because of technical failures, but because a
significant minority of people believe they may. The banking system has
always depended on confidence: if people all want their money back at once,
it collapses independent of any technology. Food supplies will fail if many
people start hoarding to be "safe"; particularly given the just-in-time (JIT)
approach now developed, but not yet tested by a severe perturbation.
Matthew Chalmers, who cannot participate on this date, says that rumours
around big banks indicate:
- The more closely someone is working on the technical programming problems,
the more they are stockpiling food and taking precautions against major
societal failures
- If a bank's software fails for 3 days, it will go bust. But that will
only be true if theirs fails and others' doesn't. (So: at the moment they are
working to fix it, and possibly hoping that competitors will fail; but
perhaps nearer the time cartels will agree to suspend life, I mean business,
together...)
Speaker: Steffie Plotnikoff.
3. What does it all mean for computer science?
One interesting feature is the enormous uncertainty and absence of consensus
about it, not just among journalists, but among big computer companies and
academic experts. How can we be so ignorant about the behaviour of the
machines we design and depend on (and which, unlike all the rest of the
universe, are in principle totally deterministic)?
- It marks the coming of age of computer science. In all other design
disciplines, design is normally about fitting a new artifact into a
surrounding that cannot be changed, however difficult that makes it for the
new technology. Few town planners, for instance, are allowed to demolish or
ignore surrounding buildings; and railway engineering similarly is about
fitting in with and building on the technology of the past.
Computer science up to now, because so little
software already existed (and what did, had such a short average lifetime),
has been allowed to proceed with the fantasy that each new design was going to
exist independently of other software; or, equivalently, was going to be a
part of a system whose design was coordinated and known. The millennium bug
shows that in reality, "legacy code" is some of the most important software
in existence, and grownup design (as opposed to academic design exercises)
requires designing around existing code which not only is outside the
designer's control, but typically is no longer understood. That is, its
behaviour is not fully specified, and cannot be reasoned about using methods
that might work within a fully designed system.
- Thus defensive programming needs to become a fundamental technique.
- Similarly, all designs need recovery methods for when situations occur
that are "impossible" and unforeseen. This approach is familiar in
"rebooting" (rebuilding a complete runtime code image), and provision is often
made for rebuilding software from components; but it is often missing for
rebuilding data.
- Fundamentally, all this means that methods for reasoning about and
designing to accommodate uncertainty must move to the fore. This in fact is a
fundamentally anti-digital move.
The point of digital systems (whether cogwheels or logic gates) is to achieve
complete predictability. However this mode of calculation not only does not
apply to interactions of digital machines with humans, it is proving
incompetent to manage interacting complexes of digital machines in the face
of uncertainties from other sources.
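The defensive, recovery-oriented style argued for above might be sketched like this (a minimal illustration under assumed names, not any real system's code): validate values arriving from legacy code outside your control, and fall back to an explicit recovery path when an "impossible" value appears.

```python
# Defensive handling of a date field produced by legacy code we do not
# control and cannot fully reason about. Instead of trusting the value,
# validate it and fall back to an explicit recovery path.

def parse_legacy_year(field, fallback: int = 2000) -> int:
    """Parse a year field defensively; recover rather than crash."""
    try:
        year = int(field)
    except (TypeError, ValueError):
        return fallback            # unparseable: recover, don't crash
    if year < 0:                   # "impossible" value from legacy code
        return fallback
    if year < 100:                 # a two-digit year slipped through
        return 1900 + year if year >= 70 else 2000 + year
    return year
```

The design choice is that every path returns a usable value: the caller is never exposed to the legacy system's failure modes, at the cost of having to choose a sensible fallback.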
Is technical understanding in fact not very useful in real computing
applications? If the effects, never mind the cure, of the millennium bug
cannot be predicted then surely there is no science whatsoever (and not much
engineering either) in "computer science"? Prediction is the test of science
(control is the test of engineering). Certainly there appears here to be a
stark limit to our understanding in practice, and this must have vital
implications for the kinds of understanding we should be seeking.
It could in fact be argued that this, rather than being the defeat of the
digital revolution, is simply an unrecognised consequence of going digital.
Whereas in analogue codes, nearby values have similar meanings, in most
digital codes nearby values have no semantic relationship. This means that
digital systems go chaotic in the chaos-theory sense as soon as any errors
appear. 00 being next to 99 in modular but not normal arithmetic is simply an
elementary example of this. So are digital designs fundamentally unstable
engineering (in the sense that bicycle steering and aircraft wings
can be designed to be stable or unstable)?
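The 00/99 discontinuity is easy to exhibit: in modulo-100 arithmetic the successor of 99 is 0, so any interval computed naively across the boundary is wildly wrong. A toy illustration (the function is hypothetical):

```python
# The 00/99 discontinuity: two-digit years wrap around modulo 100,
# so ordering and interval arithmetic break across the boundary.

def age_in_years(birth_yy: int, now_yy: int) -> int:
    """Naive age computed from two-digit years."""
    return now_yy - birth_yy

# Across the century boundary the naive answer is absurdly wrong:
# someone born in '98, measured in '00 (i.e. 2000):
print(age_in_years(98, 0))    # -98, not 2
# and the successor of 99 in modulo-100 arithmetic:
print((99 + 1) % 100)         # 0
```

A tiny error in representation (dropping two digits) thus produces an arbitrarily large error in the result, which is the chaotic sensitivity the text describes.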
4. Is the computer science curriculum totally out of date?
Does it show that the current computer science curriculum is out of
date, since the skills recommended for the millennium bug are completely
missing from it? The truly striking thing about the millennium bug is that
the best expert advice for dealing with it does not focus on programming
solutions, even though that is widely perceived as the original "cause".
Instead, it focuses on risk management techniques. These are not on the
computer science curriculum. Computer "scientists" are therefore
professionally unqualified and incompetent to deal with the millennium bug: if
you want your organisation to survive it, don't hire someone with a computer
science degree: you need management skills they don't have.
In fact, although the people currently running IT support services may come from
programming and computer science, the skills they use in their jobs are quite
different from this: they don't program, and they do manage. The curriculum
is already severely out of line.
Speakers: IT industry person, Pete Bailey, Norman Davis?
The question: what skills do they use day to day in fact?
Thus the millennium bug is showing up the possibility that IT professionals now
and in the future need quite different skills, and programming may soon be as
marginal to those in the business as hardware design skills are now. Hardware
used to be on the curriculum 25 years ago; now it is in electrical engineering.
Soon either computer science will transfer algorithms and programming out to a
dept. of programming, or it will wither away, while new departments such as
"the Glasgow school of business information management systems" will become
dominant in both undergraduate education and the interesting research in
systems design.
Steve Draper. Introduction. [1 min.]
Do we need to do anything much about the systems/equipment in our charge?
David Fildes (University Year 2000 officer). The technical problem. [5 mins.]
Ray Welland. The DCS response. [1 min.]
Need we worry about systems/equipment that is not in our charge?
Marie-Odile Bes. How unexpected combinations of rare events cause
disasters. [5 mins.]
David Fildes. Your responsibilities for others' breakdowns. [5 mins.]
Steffie Plotnikoff. Why this is not just a technical problem. [5 mins.]
What does it all mean for computer science?
Steve Draper. (For content, see above) [5 mins.]
The current computer science curriculum
Steve Draper. [5 mins.]
Norman Davis: the skills IT support REALLY uses [5 mins]