Last changed 19 Sep 1999. Length about 1,000 words (9,000 bytes).
This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/TandLresearch.html. You may copy it.

Web site logical path: [www.psy.gla.ac.uk] [~steve] [this page]

L&T research: strategic considerations


by
Stephen W. Draper

Preface

These are some notes on strategies or approaches to research in the field I will call L&T (learning and teaching). They arise from the workshop on 9-12 Sept. 1999 at Nottingham held by CREDIT.

Here are points I failed to express, or express clearly, in the workshop.

The system to be studied

An important consideration in reviewing the overall scope and strategy for L&T research (one that failed to get proper emphasis in the workshop) is the question of what system we should be studying. Programmers use "system" as slang for the bit of code they are designing, even though it cannot in any way function alone and its properties almost always depend strongly on other code such as the operating system and user interface software environment in which it must run, not to speak of the hardware and network environment. The lesson of HCI is that this is not the place to draw the system boundary if you have any interest in user interaction: the system studied and designed must include both machine and user. In the 1990s it became widely recognised in HCI and in much of computer science that even this is not enough: the social context of the workplace and the human-human interactions around the software often have decisive effects on the outcome, on the work done, and on the success or failure of the users' goals. In fact, although this is not often remarked, education has now become in many ways a typical and mainstream computer application domain, as HCI and computer science have developed their grasp of the relationship of a domain to design methods.

But what exactly the relevant system (or "activity", for devotees of activity theory) is should itself be questioned and studied. What is the scope of the main interactions and influences affecting the use of educational computing? Not the isolated learner (unless you believe, contrary to copious experimental evidence, that teachers have little effect). If, next, you think teacher, learner, and software form an approximately closed system, then you cannot explain much of teachers' behaviour, nor its effects on what they do with learners. If however you expand this, then you can address teachers' actions in producing reports on each learner, delivering required tests, and so on. Clearly software that supports this is much more helpful to teachers than software that doesn't; it may free up their time to support learners better, and so lead to better learning outcomes, to say nothing of attracting teachers' choice in purchase decisions. The software demonstrated at the workshop showed such attention to the users and their actual work context, in contrast to many of the criticisms voiced, which seemed to ignore that context in favour of the narrower interests of the critics, and so ran the danger of a reversion to software designed around the functions thought desirable by the programmers, rather than designed, and iteratively modified, to be integrated into an actual work situation.

Is this wider view enough? Both consensus public opinion and current required practice say no. Learning is thought to be strongly affected by the home as well as the school; and indeed studies have shown this. The system that determines learning, then, must be expanded again to include the home. Obviously software is increasingly likely to bridge this gap too. Whether we understand how it does so is much less clear. Parents, for instance, may influence children's learning less through technical help than through providing a motivational atmosphere that favours or militates against it. Uncertainty about the main role and effects of such major parts of the overall L&T system obviously suggests that research is required.

I have sketched this argument in terms of finding a definition, and then a description, of the system. This is a basic requirement of any thorough engineering design approach. A more specifically educational and theoretical approach to essentially the same issues may be found in the socio-cultural approach. Approaches that take a narrower view are likely both to be bewildered by the actual observed behaviour of teachers, children, parents, and politicians in ignoring their designs (which will fail to address most of the real requirements of the situation), and to prove as lacking in explanatory power for non-lab. situations as lab. studies of rat learning have proved to be.

Applied research

One way of describing the difference between science and engineering is to say that science aims at finding laws that describe a single relationship that is universally true (across all times, places, and contexts), while engineering aims to control all the relationships in a single context (a small part of time and space). Thus the law of gravity aims to be true everywhere and always, although of course there are contexts where, while true, it is unimportant (e.g. in the interactions inside an atomic nucleus, or even in the life of bacteria, where viscosity and Brownian motion are much greater than gravity). To build something, however, you have to deal at once with all the influences affecting causality in that situation, although you can ignore those that, while present, have a negligible effect compared to the others. To put it another way, for science generality is the first consideration and effect size is secondary; while for engineering, effect size is primary and generality, while nice, is secondary.

This is why experiments with scientific aims frequently try to eliminate causal factors, often without understanding or even identifying them. Random assignment to conditions, and holding procedures constant across conditions are examples of techniques for this: for control without understanding.
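As a toy illustration of "control without understanding" (my sketch, not from the original text; the participant and condition names are hypothetical), random assignment can be expressed in a few lines of Python — the point being that the procedure balances unknown causal factors across groups on average without ever identifying them:

```python
import random

def assign_conditions(participants, conditions, seed=None):
    """Randomly assign participants to conditions in (near-)equal groups.

    Shuffling then dealing round-robin balances unidentified causal
    factors across groups on average -- control without understanding.
    """
    rng = random.Random(seed)  # seed allows a reproducible assignment
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

# Hypothetical example: six learners, two teaching conditions.
groups = assign_conditions(
    ["p1", "p2", "p3", "p4", "p5", "p6"],
    ["software", "control"],
    seed=42,
)
```

The shuffle randomises which learner lands in which group; the round-robin deal merely keeps the group sizes equal, which randomisation alone would not guarantee.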

To justify the utility of scientific research, a development sequence is often referred to: first theory, then a lab. test of it, then a demo of an artifact based on it, then development, and so on to full practical deployment. The argument is that a theory established now may one day, after a lot more work, make a contribution to workable designs in one or more particular situations of practical importance. A complementary agenda is to promote applied research that is defined by a problem, not by a theory (i.e. a proposed solution). Such research necessarily must identify the main factors operating in the target situation, and investigate ways of controlling them. It is aimed at producing (and demonstrating the effectiveness of) a useful design now; but it may be argued that in the long term it will lead to important theories by identifying the causal factors at work. I.e. localised practical benefit first, theory as delayed spinoff; as opposed to generalised theory now, practical benefit in particular situations as delayed spinoff.

This is a general argument for a twin strategy of pure and applied, scientific and engineering, research in any area; and there are probably as many historical examples of practical problems leading eventually to new theory as there are of new theory leading to practically valuable applications. (Consider Pasteur as a role model, rather than Einstein or Darwin. Consider Florey rather than Fleming as the key actor in bringing us penicillin. Or consider Florence Nightingale: someone with the wrong theory of disease who nevertheless had an enormous effect on improving mortality rates through improving healthcare, by an essentially scientific approach of changing practice while reporting statistics on the mortality rates before and after the change.) Thus even if we take theory as the final goal of research, an equal balance of pure and applied projects is likely to be the best means (given that the pure projects will indicate their eventual relevance to socially important problems, and the applied projects will indicate how they will raise theoretical challenges).

However the case is even more compelling in education, because human learning is, many now think, intrinsically situational. So an ultra-pure scientific approach of studying isolated factors (or isolated learners) is not appropriate or fruitful in any case, as also follows from the discussion above of establishing the "system" to study. A less agreeable aspect is the fact that in education most "theories" are not in fact tested beyond a single case: not explored to establish whether they really do apply in all times, places, cultures, age groups, situations as a genuinely scientific orientation would require. In fact supposedly applied projects may in practice be better tests of theory than the supposedly theoretical ones, as they often show up limits to the theories better than the original experimental work.

Another aspect of this concerns research that looks at a single factor versus all the significant factors in a given situation. Applied projects must do the latter, and in so doing they point up the paucity of theoretical work in education that addresses the relationships between factors and so between theories. Outside the field the recurring spontaneous conception of a good research question is "do computers improve learning?". Inside the field, most think that's a bad question to which the answer is "it all depends e.g. on the content, the way it's used, ...". But is research that is designed around a single theory of a single factor all that much better? For learning outcomes "it all depends" on a lot more than a single factor. Applied projects force attention to this.

Surprises: Diagnosis vs. hypothesis testing

While hypothesis testing is the main approach to theory-driven research, diagnosis is the main approach in applied work. Instead of ignoring and excluding factors not mentioned in the prior hypothesis, the approach is to accept the situation as it actually is, try to find the factors controlling it, and identify which is the most important (i.e. which is having the biggest effect). Instead of applying the same measures to all subjects, diagnosis (as in clinical interviews) uses a sequence of measures, each contingent on the results of the previous ones. A prior hypothesis in diagnosis is a bias often leading to bad conclusions, as in the systematic under-diagnosis of osteoporosis in men and of heart problems in women, because statistically each is more frequent in the other sex. Diagnosis requires a focus on, and respect for, data rather than theory; all the more so as there is no reason to think we have all the theory we need. Thus diagnosis is not the easy task it is in closed or toy worlds, where you enumerate the possible causes and calculate an optimal tree of tests to identify which one it is this time. Whether in medicine, HCI, or education, diagnosis must be open to the real possibility that the problem is one not seen or understood before. That is how progress is made in applied fields. The only general approach, though it cannot be translated into a fixed procedure, is to lay oneself open as much as possible to surprises, and to clues independent of one's prior expectations.
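The closed-world case the text contrasts with real diagnosis can be sketched concretely (a toy illustration of mine, with hypothetical causes and tests, and a greedy even-split heuristic standing in for a computed optimal test tree). Once the possible causes are enumerated, each test simply partitions the remaining candidates — precisely the easy situation that open-ended diagnosis in medicine, HCI, or education does not enjoy:

```python
def diagnose(candidates, tests, run_test):
    """Toy closed-world diagnosis: repeatedly pick the test that splits
    the remaining candidate causes most evenly, and keep only the
    candidates consistent with the observed result.

    candidates: set of possible causes (assumed exhaustive -- the
                closed-world assumption)
    tests:      dict test_name -> set of causes for which that test
                comes back positive
    run_test:   callable(test_name) -> bool, the observed result
    """
    candidates = set(candidates)
    tests = dict(tests)  # copy, since we delete tests as they are used
    while len(candidates) > 1 and tests:
        # Greedy choice: the test whose positive set bisects the
        # remaining candidates most evenly (a stand-in for the
        # optimal-tree calculation mentioned in the text).
        name, positives = min(
            tests.items(),
            key=lambda kv: abs(len(candidates & kv[1]) - len(candidates - kv[1])),
        )
        if run_test(name):
            candidates &= positives
        else:
            candidates -= positives
        del tests[name]
    return candidates

# Hypothetical toy world: three causes, two binary tests.
causes = {"A", "B", "C"}
tests = {"t1": {"A", "B"}, "t2": {"A"}}
# Suppose the true cause is B: t1 comes back positive, t2 negative.
result = diagnose(causes, tests, lambda t: t == "t1")
```

The contrast with the surrounding argument is the point: this only works because the candidate set is enumerated in advance, whereas real diagnosis must stay open to a cause that is not on the list at all.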

It is interesting to re-view the literature for the techniques the best practitioners have developed as ways to be surprised; and one can estimate how productive a piece of research is likely to be by how well it adopts these. Most, for instance, have learned that simply issuing a questionnaire or taking fixed measurements is unwisely economical, and that interviewing the learners and observing their actual behaviour yields far more information, useful if for nothing else in trying to explain the numbers obtained. We have learned that in peer interaction we need to take both group measures (e.g. the joint task output or agreed result) and also individual measures, because they are usually not the same. We have learned from the work of Christine Howe and others that we should do both immediate and delayed post-tests, because again they vary, and in unpredictable ways (sometimes you get decay, sometimes enhancement of the learning). The literature on science education tells us that measures of knowledge based on behaviour (e.g. catching a ball), prediction (e.g. what trajectory will it follow), and explanation are often dissociated: so in fact studies that use only one kind of measure are not really measures of "knowledge" at all, but only of one unreliable test task. In none of these cases do we understand the relationships well enough to predict one measure from another, and so cut down on our data gathering. But they are increasing our understanding of the factors underlying learning situations, and they are increasing our knowledge of techniques to use in both theory-driven and applied research.
