Last changed 7 Nov 2019. Length: about 16,000 words (109,000 bytes).
(Document started ≈ 2000.) This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/hawth.html. You may copy it.


The Hawthorne, Pygmalion, Placebo and other effects of expectation: some notes

by
Stephen W. Draper,   Department of Psychology,   University of Glasgow.

Preface

This page is NOT up to date. It has had a few edits as recently as 2019; but most of it was assembled around 2006.

This began as my own notes; but over time I have taken stuff from others, with particularly important contributions from Morag Nimmo, Stephen Senn, the various workers on the wikiPedia entry on the Hawthorne effect, and Graeme Campbell.

Contents (click to jump to a section)

  1. Introduction
  2. The Hawthorne Effect
  3. Jastrow's effect of expectancy on punched card operators
  4. Rosenthal's Pygmalion effect of expectancy advantage
  5. Teacher effects in general
  6. The placebo effect: does it really exist?
  7. Ways of classifying and comparing such effects
  8. Research methods implications
  9. Acknowledgements
  10. References

Introduction: Issues in experimental design

This web page began as a note on the Hawthorne effect: often mentioned, but a simple account of it is not so easy to find. It has since been significantly revised to include reviews of related effects on experiments stemming from expectation and from the experimenters: the Pygmalion, placebo, and other effects. What they have in common is that performance, or other significant objective outcomes, can be produced by the non-objective causes of humans simply expecting something, or responding to the social rather than the material situation of the experiment.

These are notes: a starting point for others that might be helpful, because I couldn't find an authority on this and had to put together some points for myself. I'm not an expert on this.

Most people come to this wanting to address the point that the mere fact of researchers studying some human participants sometimes makes those participants behave differently (thus undermining the experiment). This effect is sometimes called the Hawthorne effect, but only on one interpretation of the actual Hawthorne studies (see below). It also goes by a number of other names, e.g. novelty effect, demand characteristics, etc.

While the Hawthorne studies, and the ensuing discussion of them, were one thing that drew attention to this broad area, I now think they have no special or clear place in how to think about the range of issues. Instead, I think the following are the most important categories of factors affecting experiments with human participants.

There are three very broad categories of factors which affect human behaviour (whether or not we want them to in a given study):

  1. Material factors e.g. lighting level, physiological changes, money offered us.
  2. Direct social effects e.g. legal obligations, wanting to please someone else. (These should not be confused with the many more cases where we derive information socially from others which changes our behaviour, but which was not given to us in order to manipulate us.)
  3. A large middle ground of cognitive issues, where behaviour is changed by what we want, what we do and don't know, what we guess (i.e. beliefs with some, but not certain, grounds). Within this probably the three most often problematic areas in experiments are:
    1. Participants' interpretation of the context they are in: how they understand what is said to them, and what task is being asked of them. Researchers frequently do not realise how each participant has understood the meaning of what they are asked to do.
    2. Expectancies: beliefs about the appropriateness and/or effect of doing something in one particular way. E.g. a simple instruction "walk across the room" is in fact massively under-specified: fast or slow? carefully or while talking to someone else? ...
    3. Learning effects. Humans learn all the time, whether they mean to or not. Scientific experiments have value only if they are repeatable: clearly this is very often in conflict with the fact of human learning, so that each person seldom does something exactly the same way twice. Within this, two things which stimulate unintended learning most often within experiments are:
      1. Feedback (learning can depend on the availability of feedback, i.e. on knowing the consequences of one's actions)
      2. Reflection (learning can depend on being prompted to reflect, to think about feedback and about memories of one's actions).

The Hawthorne Effect

No single fact, no single view about the Hawthorne Effect

The term "Hawthorne effect", coined by French (1953, p.101) in a chapter on field experiments in an edited book on social science research methods, refers back to a series of experiments on managing factory workers carried out around 1924-1933 in the Hawthorne works of the Western Electric Company in Chicago. However there is no one precise meaning for the term, since the results were puzzling to the original experimenters, and their interpretation continues to be sporadically debated. Generally, references to the Hawthorne effect all concern effects on an experiment's results of the awareness of participants that they are the subject of an intervention. However there are many different possible mechanisms, and all may be important in particular cases. What is not disputed is that there is an important issue here, and it is clear that there is a need for a term to refer to these issues: the term "Hawthorne effect" has often been re-appropriated for any issue in the general area. What is not understood is what the full range of issues is, and authors have often (re)defined the term solely in terms of the one aspect and interpretation that concerns them. An attempt to list some of the different mechanisms and effects is made below. Part of the variation in meaning comes from the different interpretations put on the original studies, part comes from the different disciplines concerned with studies of humans (e.g. management science, medicine, psychology, aircraft crash investigation), but underlying it all is the absence of a comprehensive catalogue of the ways in which human awareness sometimes affects the outcomes of experiments on human participants.

The issue of diverse meanings is partly exemplified in the following paper, though even then only from the single angle of organisational psychology:
Olson,R., Verley,J., Santos,L. & Salas,C. (2004) "What we teach students about the Hawthorne studies: A review of content within a sample of introductory I-O and OB textbooks" The Industrial-Organizational Psychologist vol.41 no.3 pp.23-39

Finding and referring to the Hawthorne effect in the literature

Note that "Hawthorne" is not the name of a researcher, but of the factory where the effect was first observed and described: the Hawthorne works of the Western Electric Company in Chicago.

One definition of the Hawthorne effect (out of a number) is: An experimental effect in the direction expected but not for the reason expected; i.e. a significant positive effect that turns out to have no causal basis in the theoretical motivation for the intervention, but is apparently due to the effect on the participants of knowing themselves to be studied in connection with the outcomes measured.

Parsons (1974) p.930 defined it as: "Generalizing from the particular situation at Hawthorne, I would define the Hawthorne Effect as the confounding that occurs if experimenters fail to realize how the consequences of subjects' performance affect what subjects do". (However this is just an effect of motivation and learning, and scarcely needs a new term. The universal propensity of humans to learn is a constant threat in almost every experiment on people.)

    A short way to refer to the Hawthorne effect is:

  1. French,J.R.P. (1953) "Experiments in field settings" ch.3 pp.98-135 in Festinger,L., & Katz,D. Research methods in the behavioral sciences (New York: Holt, Rinehart & Winston). [This is the paper that coined the term "Hawthorne effect" and discusses it in the context of research methods.]

    or

  2. Mayo,E. (1933) The human problems of an industrial civilization (New York: Macmillan) ch.3. [This is the earliest publication about it, and the one influential in the human resource management field (as opposed to research methods).]

    or

  3. Roethlisberger,F.J. & Dickson,W.J. (1939) Management and the Worker (Cambridge, Mass.: Harvard University Press). [This is the first detailed account of the actual studies, as opposed to conclusions from them.]

    or perhaps

  4. Gillespie, Richard, (1991) Manufacturing knowledge: a history of the Hawthorne experiments (Cambridge : Cambridge University Press)

    The longer way is:

  5. The studies were done 1924-1933 (although the phrase "Hawthorne effect" only appeared in 1953). Roethlisberger & Dickson (1939) give a great amount of detail, and little interpretation. Mayo (1933) gives a shorter account, and additionally the interpretation which has been so influential in the management field: essentially, that it was feeling they were being closely attended to which was the cause of the improvements in performance. French (1953) coined the term, and is probably responsible for seeing it as a general issue in experimental methodology.

What was the original Hawthorne effect?

Basically, a series of studies on the productivity of some factory workers manipulated various conditions (pay, light levels, rest breaks etc.), but each change resulted on average over time in productivity rising, including eventually a return to the original conditions. This was true of each of the individual workers as well as of the group mean.

Clearly the variables the experimenters manipulated were not the only nor the dominant causes of productivity. One interpretation, mainly due to Mayo, was that the important effect here was the feeling of being studied: it is this which is now often being referred to as "the Hawthorne effect".

More detail

From 1924 to 1927 there were 2.5 years of illumination-level experiments. In 1927 four studies began on selected small groups. In 1932 there was a questionnaire and interview study of some 20,000(?) employees.

Illumination studies pp.14-18 (part of ch.1) of Roethlisberger & Dickson (1939)
Study 1a-d. a-c were experiments on whole departments.
1a) No control group, experimental groups in 3 different departments. All showed an increase of productivity (from an initial base period), that didn't decrease even with illumination decreases.
1b) 2 groups. The control group got stable illumination; the other got a sequence of increasing levels. Got a substantial rise in production in both, but no difference between the groups.
1c) Experimental and control groups. Experimental group got a sequence of decreasing light levels. Both groups steadily increased production, until finally the light in the experimental group was so low they protested and production fell off.
1d) 2 girls only. Their production stayed constant under widely varying light levels, but their stated preferences tracked the experimenter's suggestions: (1) when the experimenter said bright was good, then the brighter they believed the light to be, the more they liked it; (2) ditto when he said dimmer was good. And if they were deceived about a change, they said they preferred it. That is, what mattered was their belief about the light level, not the actual light level; and what they thought the experimenter expected to be good, not what was materially good.

Study 2: the relay assembly experiments (2a,b) on a group of 1+5 female operators.
2a Rest pauses and hours of work (in a separate room). Small group piecework the only expt. var.
2b About a piecework payment system (on a separate bench, but normal room).
2c Mica splitting test room. Like 2a: separate room, but already and constantly on piecework rates.
2d Bank wiring: pure observation of a 14 man team. Group piecework. Could always easily see their own rate.

Study 2a: a group of 6 experienced female workers was segregated; 1 serving, 5 assembling telephone relays: a 1-minute task in good conditions. Output was carefully measured over a 5-year study. Output (the time for every relay produced) was secretly measured for 2 weeks before moving them to the experimental room. Then came 5 weeks of measures; then manipulations of pay rules (group piecework for the 5-person group); then two 5-minute breaks (after a discussion with them on the best length of time); then two 10-minute breaks (not their preference), which again produced improvement; then six 5-minute rests (disliked; reduced output); then (free?) food in the breaks; then the day was shortened by 30 minutes (output up); then shortened more (output per hour up, but overall down); then a return to an earlier condition (output peaked); etc. Attitudes as well as behaviour and output were measured.

Parsons (1974) argues that in 2a,2d they had feedback on their work rates; but in 2b they didn't. He argues that in the studies 2a-d, there is at least some evidence that the following factors were potent:

  1. Rest periods
  2. Learning, given feedback i.e. skill acquisition
  3. Piecework pay where an individual does get more pay for more work, without counter-pressures (e.g. believing that management will just lower pay rates).

He (re)defines "the Hawthorne effect as the confounding that occurs if experimenters fail to realize how the consequences of subjects' performance affect what subjects do" [i.e. learning effects, both permanent skill improvement and feedback-enabled adjustments to suit current goals]. So he is saying it is not attention or warm regard from experimenters, but either a) actual change in rewards or b) change in provision of feedback on performance. His key argument is that in 2a the "girls" had access to the counters of their work rate, which previously they did not know at all well. (To see how feedback can be crucial, think of bio-feedback, where people can learn to control normally unconscious physiological states of their own body when, but only when, they are given direct feedback on them.)

It is notable however that he refuses to analyse the illumination experiments, which don't fit his analysis, on the grounds that they haven't been properly published and so he can't get at details, whereas he had extensive personal communication with Roethlisberger & Dickson.

Possibly this was a longitudinal learning effect. But Mayo says it is to do with the fact that the workers felt better in the situation, because of the sympathy and interest of the observers. He does say that this experiment is about testing overall effect, not testing factors separately. He also discusses it not really as an experimenter effect but as a management effect: how management can make workers perform differently because they feel differently. A lot has to do with feeling free, not feeling supervised, but being more in control as a group. The experimental manipulations were important in convincing the workers that conditions really were different. The experiment was repeated with similar effects on mica splitting workers.

Franke & Kaul (1978) offered yet another interpretation for the management psychology field, and argued it better in Franke (1980).

When we refer to "the Hawthorne effect" we are pretty much referring to Mayo's interpretation in terms of workers' perceptions, but the data show strikingly continuous improvement. So quite a different interpretation might be possible: learning, expertise, reflection — all processes independent of the experimental intervention. However the usual Mayo interpretation is certainly a real possible issue in designing studies in education and other areas, regardless of the truth of the original Hawthorne study.

Recently the issue of "implicit social cognition" i.e. how much weight we actually give to what is implied by others' behaviour towards us (as opposed to what they say e.g. flattery) has been discussed: this must be an element here too.

Clark & Sugrue (1991, p.333), in a review of educational research, say that uncontrolled novelty effects cause on average a rise of 30% of a standard deviation (SD) (i.e. a move from the 50th to roughly the 62nd percentile), which decays to a small level after 8 weeks. In more detail: 50% of an SD for up to 4 weeks; 30% of an SD for 5-8 weeks; and 20% of an SD for more than 8 weeks (which is less than 1% of the variance).
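To make these effect sizes concrete, here is a minimal Python sketch (my own illustration; it uses only the standard normal-distribution conversions, not Clark & Sugrue's data) turning a rise measured in SD units into a percentile shift and into an approximate proportion of variance explained:

    # Convert an effect size in SD units (Cohen's d) into (a) the percentile an
    # average scorer moves to, and (b) the approximate proportion of variance
    # explained, via the usual d-to-r conversion for equal-sized groups.
    from math import sqrt
    from scipy.stats import norm

    def percentile_after_shift(d):
        return norm.cdf(d) * 100              # start at the 50th, move up d SDs

    def variance_explained(d):
        r = d / sqrt(d**2 + 4)                # point-biserial r, equal groups
        return r ** 2

    for d in (0.5, 0.3, 0.2):                 # the decaying novelty effects above
        print(f"d = {d}: 50th -> {percentile_after_shift(d):.0f}th percentile, "
              f"~{variance_explained(d) * 100:.1f}% of variance")

For d = 0.2 this gives roughly the 58th percentile and about 1% of variance, consistent with the parenthetical claim above; for d = 0.3 it gives roughly the 62nd percentile.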

Can we trust the research?


Candice Gleim says:
Broad experimental effects and their classifications can be found in Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally. and Cook, T.D., & Campbell, D.T. (1979), Quasi-Experimentation: Design and Analysis Issues. Houghton Mifflin Co.

A summary is provided at http://www.edpsycinteractive.org/topics/intro/research.html


Michael L. Kamil says:
You might want to be a bit careful about the scientific basis for the Hawthorne effect. Lee Ross has brought the concept into some question. There was a popular news story in the New York Times in 1998: http://www.nytimes.com/library/review/120698science-myths-review.html.


David Carter-Tod says of the same newspaper piece:
Interestingly in the process of doing a quick search on this I came across the following quote:
A psychology professor at the University of Michigan, Dr. Richard Nisbett, calls the Hawthorne effect "a glorified anecdote". "Once you've got the anecdote," he said, "you can throw away the data." A dismissive comment which back-handedly tells you something about the power of anecdote and narrative. There is, however, no doubt that there is a Hawthorne effect in education particularly.

Some references to it:
http://dhp.com/~laflemm/hmco/Ch7quiz2.htm


Don Smith says:
I recall studying the Hawthorne Effect as an undergraduate for a management degree years ago. At that time the message was that if a group knew they were being studied the results may be biased.

However, I found Harry Braverman's comments in his book "Labor and Monopoly Capital" more interesting. According to Braverman, the Hawthorne tests were based on behaviorist psychology and were supposed to confirm that workers' performance could be predicted by pre-hire testing. However, the Hawthorne study showed "that the performance of workers had little relation to ability and in fact often bore a reverse relation to test scores...".

What the studies really showed was that the workplace was not "a system of bureaucratic formal organization on the Weberian model, nor a system of informal group relations, as in the interpretation of Mayo and his followers but rather a system of power, of class antagonisms".

According to Braverman this discovery was a blow to those hoping to apply the behavioral sciences to manipulate workers in the interest of management.


My view: What is wrong about the quoted dismissiveness is that there was not 1 study, but 3 illumination experiments, and 4 other experiments: only 1 of these 7 is alluded to. What is right is that a) there certainly are significant criticisms of the method that can be made and b) most subsequent writing shows a predisposition to believe in the Hawthorne effect, and a failure to read the actual original studies.

So, can we trust the literature?

The experiments were quite well enough done to establish that there were large effects due to causal factors other than the simple physical ones the experiments had originally been designed to study. The output ("dependent") variables were human work, and we can expect educational effects to be similar (but it is not so obvious that medical effects would be). The experiments stand as a warning about simple experiments on human participants as if they were only material systems. There is less certainty about the nature of the surprise factor, other than it certainly depended on the mental states of the participants: their knowledge, beliefs, etc.

Candidate causes for the observed results in the Hawthorne studies are:

  1. Material factors, as originally studied e.g. illumination, ...
  2. Motivation or goals e.g. changes in actual rewards, piecework pay, ...
  3. Expectancies / expectation effects. Quite often it is not either personal motivation or social pressures, but simply the individual acquiring from others information about the speed of work which is usual, which you can expect of yourself; e.g. when you ask someone how long it will take you to climb this hill, or prepare this recipe.
  4. Learning effects. People get better at everything with practice. Counterbalancing in experimental designs can control for this. However asymmetric learning effects can occur and undermine the counterbalancing (see the sketch after this list).
  5. Feedback: a skill cannot be learned without good feedback. Simply providing proper feedback can be a big factor. This can often be a side effect of an experiment, and good ethical practice promotes this further. Indeed, providing the feedback and nothing else may by itself be a powerful factor.
  6. The attention of observers (e.g. experimenters).
  7. As an important special case of this: specific and known expectations of others (e.g. experimenters, observers, supervisors, oneself, ....)
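As promised in item 4 above, here is a toy Python simulation (with entirely invented numbers, not data from Hawthorne) of how an asymmetric learning or transfer effect can bias even a counterbalanced within-subjects design:

    # Treatment B is truly 2 points better than A, but doing A first teaches an
    # extra strategy that inflates whatever comes second. Counterbalancing the
    # order does not cancel this asymmetric carryover.
    import random

    random.seed(1)
    TRUE_B_ADVANTAGE = 2.0      # B really is 2 points better than A
    SYMMETRIC_CARRYOVER = 1.0   # ordinary practice gain on the second task
    EXTRA_AFTER_A = 2.0         # extra carryover only when A came first

    def score(treatment, position, first_was_a):
        s = random.gauss(10, 1)
        if treatment == "B":
            s += TRUE_B_ADVANTAGE
        if position == 2:
            s += SYMMETRIC_CARRYOVER + (EXTRA_AFTER_A if first_was_a else 0.0)
        return s

    a_scores, b_scores = [], []
    for _ in range(10_000):
        order = random.choice([("A", "B"), ("B", "A")])   # counterbalancing
        first_was_a = order[0] == "A"
        for pos, t in enumerate(order, start=1):
            (a_scores if t == "A" else b_scores).append(score(t, pos, first_was_a))

    estimate = sum(b_scores) / len(b_scores) - sum(a_scores) / len(a_scores)
    print(f"true B-A effect = {TRUE_B_ADVANTAGE}, estimate = {estimate:.2f}")

The estimate comes out near 3.0 rather than 2.0: the counterbalanced averaging hides, rather than removes, the asymmetric carryover.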

Parsons implies that (6) might be a "factor" as a major heading in our thinking, but as a cause it might turn out to reduce to a mixture of individual, not social, effects: (2, 3, 5). That is: people might take on pleasing the experimenter as a goal, at least if it doesn't conflict with any other motive; but also, improving their performance by improving their skill depends on getting feedback on their performance, and an experiment may give them this for the first time. So you would often see no Hawthorne effect at all: only when the attention brought with it either usable feedback, or information about what to expect of oneself, or a change in motivation.

Adair (1984) warns of gross factual inaccuracy in most secondary publications on the Hawthorne effect, and notes that many studies failed to find it, although some did. He argues that we should look at it as a variant of Orne's (1973) experimental demand characteristics. So for Adair, the issue is that an experimental effect depends on the participants' interpretation of the situation; this may not be at all like the experimenter's interpretation, and the right method is to do post-experimental interviews, in depth and with care, to discover the participants' interpretations. So he thinks it is not awareness per se, nor special attention per se: you have to investigate the participants' interpretation in order to discover if and how the experimental conditions interact with the participants' goals (as the participants see them). This can affect whether participants believe something, whether they act on it, whether they see it as in their interest, etc.

Rosenthal & Jacobson (1992) ch.11 also reviews and discusses the Hawthorne effect.

Its interpretation in management research

The research was and is relevant firstly in the "Human Resources Management" movement. The discovery of the effects in the Hawthorne studies was most immediately a blow to those hoping to apply the behavioural sciences to manipulate workers in the interest of management.

Other interpretations it has been linked to are: Durkheim's 'anomie' concept; the Weberian model of a system of bureaucratic formal organization; a system of informal group relations, as in the interpretation of Mayo and his followers; a system of power, of class antagonisms.

Franke & Kaul (1978) offered yet another interpretation for the management psychology field, argued more fully in Franke (1980).

Gillespie (1991) stresses the diversity of interpretation of the Hawthorne experiments at the time and among the researchers involved, as well as later and by others.

He also stresses that although workers (subjects) were extensively interviewed at times during the trial, Mayo developed arguments that were widely accepted for dismissing their interpretations, and imposing other interpretations.

He also points out that these researchers, and much of this field, assume that happier workers are more productive workers. This was not only used to justify seeking higher productivity (as being in the interests of workers as well as management), but led to using measures of productivity directly as measures of worker happiness.

With the advantage of hindsight, and of the wider methodological issues explored within this web page, I would now suggest that the Hawthorne studies are enough to dismiss as naive both (a) the simplest Taylorist view that studying only material aspects of workers' behaviour (e.g. time, motions, illumination) would be adequate; and equally (b) Mayo's view that thinking only of management's relationship to the workers, and of issues of human compassion and relationships, is adequate. Instead, and in fact more widely important, are issues of how people govern their behaviour by expectancies about the time, effort and quality they should aim for, and the manifold sources they use to acquire or modify those expectancies. Just because material factors on the one hand, and management-worker relationships on the other, sometimes have an effect does not mean that these exhaust or even dominate the important causal factors determining how productive a person is. Similarly, these areas do not exhaust, nor even predominate amongst, the issues which experimental designs with human participants should attempt to address.

My summary view of Hawthorne

In the light of the various critiques, I think we could see the Hawthorne effect at several levels.

At the top level, it seems clear that in some cases there is a large effect that experimenters did not anticipate, that is due to participants' reactions to the experiment itself. This is the analogue to the Heisenberg uncertainty principle BUT (unlike in quantum mechanics) it only happens sometimes. So as a methodological heuristic (that you should always consider this issue) it is useful, but as an exact predictor of effects, it is not: often there is no Hawthorne effect of any kind. To understand when and why we will see a Hawthorne or experimenter effect, we need more detailed considerations.

At a middle level, I would go with Adair (1984), and say that the most important (though not the only) aspect of this is how the participants interpret the situation. Interviewing them (after the "experiment" part) would be the way to investigate this, and to build in a precautionary check in every experiment.

This is important because factory workers, students, and most experimental participants are doing things at the request of the experimenter. What they do depends on what their personal goals are, how they understand the task requested, whether they want to please the experimenter, whether they see the task as impinging on other interests and goals they hold, and what they think the experimenter really wants. Besides all those issues that determine their goals and intentions in the experiment, further aspects of how they understand the situation can be important by affecting what they believe about the effects of their actions. Thus the experimenter effect is really not one of interference, but of a possible difference in the meaning of the situation for participants and experimenter. Since all voluntary action (i.e. the actions in most experiments) depends upon both the actors' goals AND their beliefs about the effects of their actions, differences in understanding of the situation can have big effects.

At the lowest level is the question of what types the specific causal factors might be. The rest of this web page elaborates on them; a preliminary set might be the candidate causes listed above (material factors, motivation, expectancies, learning effects, feedback, and the attention and expectations of others).

For further critique and contextualisation of the notion of "the Hawthorne effect" see the section below "Ways of classifying and comparing such effects".

Jastrow's effect of expectancy on punched card operators

According to Rosenthal & Jacobson (1968), Jastrow (1900) reported a different striking effect on workers being trained on the then-new Hollerith punched card machines in the US census bureau. The first group were expected by the inventor to produce 550 per day, and did so, but had great difficulty in improving on that. However a second group, who were isolated from that expectation, were soon doing 2,100 per day.

In my own practice, asking students to do the novel task of writing a paragraph on an unexpected topic in 5 minutes led to no words being written in the time; but the same request plus a mention that most students write about 15 lines in that time led to almost all of them meeting that quietly-stated expectation.

Rosenthal's Pygmalion effect of expectancy advantage

Rosenthal & Jacobson (1968/1992) report and discuss at length an important effect, which I shall call the Pygmalion effect. Basically, they showed that if teachers were led to expect enhanced performance from some children, then those children did indeed show that enhancement, which in some cases was about twice that shown by other children in the same class.

The biggest study was at "Oak School": a US primary school. Teachers were deceived into believing that a set of one fifth of their class were expected to develop much faster than the rest, as measured by IQ points. In fact this set was randomly selected; or rather, selected by stratified random sampling, the better to guarantee that they were extremely similar in both mean and variation to the rest of the class. The main measure was a kind of IQ test, administered at the start of the school year (pretest) and at 4 months (end of first semester), 8 months (end of second semester, i.e. of the first school year), and 20 months (end of the second school year, with a different teacher). The maximum overall effect was at 8 months, but a lot of the gain was still present at 20 months. There was a big effect on first and second grade children by the end of the first year. By the end of the second year, much of this had gone in those classes, but in other classes positive effects had emerged for the first time. Girls and boys gained in somewhat different ways (verbal vs. reasoning subscales). The advantage held in the pre/post comparison on the IQ test. It was also true of teacher assessments, e.g. reading grades, which showed a big effect in the third grade as well. They also did blind retesting of a sample by an examiner who was not the teacher, and who didn't know which children were supposed to do well, and got results showing a greater difference.
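As an aside on method: the stratified random sampling just mentioned is easy to sketch. Here is a toy Python version (my own illustration, not Rosenthal & Jacobson's actual procedure): rank the class by pretest score, cut it into blocks of five, and pick one child at random from each block, so that the chosen fifth matches the rest in both mean and spread:

    import random
    from statistics import mean, stdev

    random.seed(3)
    # An invented class of 30 pretest scores, sorted to form the strata.
    pretest = sorted(random.gauss(100, 15) for _ in range(30))

    chosen = [random.choice(pretest[i:i + 5])     # one random pick per stratum
              for i in range(0, len(pretest), 5)]
    rest = [s for s in pretest if s not in chosen]

    print(f"chosen: mean {mean(chosen):.1f}, sd {stdev(chosen):.1f}")
    print(f"rest:   mean {mean(rest):.1f}, sd {stdev(rest):.1f}")

A simple random fifth could by bad luck be noticeably abler than the rest; stratifying makes that much less likely.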

Another effect was that pupils in the control group who improved against expectation were disliked by teachers, or at least showed signs of being in conflict.

This is the biggest and most careful study. But besides primary school pupils, the effect has also been shown for algebra at the (US) Air Force Academy, and for university students as well.

Teacher effects in general

Although not of central importance here, of huge importance in educational research in general is the issue of teacher effects. Tim O'Shea once told me that in all studies where one of the variables was the teacher, the effect of different teachers was always bigger than the effect of different treatments (usually the thing actually meant to be studied). Basically, teachers have a huge effect, but one we don't understand at all.

If we did, we could train teachers to use best practice in the sense of getting the best effects: but we have no idea how to do that. Assuming this is true, this is the most important effect in the whole field of education. (Consider: if this were true in medicine, then it wouldn't matter much what treatment you gave a patient; the most important thing would be to get the best doctor, regardless of drugs, surgery or other treatments.) It also implies that the professionalisation of teaching does not entail improvement in learning or in any rational basis for treating learners, though it may from a social viewpoint, or of course from the viewpoint of the benefits to practitioners of restrictive practices and of regulation to exclude the worst practitioners. However we shouldn't be surprised. Medicine was organised into its current professional form before there was a single scientifically justified treatment available: in the UK, the governing professional body, the General Medical Council, was established by law in essentially its present form by the 1858 Medical Act. However on an optimistic view, Pasteur's rabies vaccination, established around 1885, was the first medical treatment based on scientific evidence; and it has been estimated that 1911 is the first year when a patient was objectively likely to benefit from being treated by a doctor. (L.J. Henderson: "somewhere between 1910 and 1912 in this country, a random patient with a random disease, consulting a doctor at random had, for the first time in the history of mankind, a better than a fifty-fifty chance of profiting from the encounter." As quoted in John Bunker (2001) "Medicine Matters After All: Measuring the benefits of medical care, a healthy lifestyle, and a just social environment" (Nuffield Trust).)

Note too that all this casts doubt on the value of training teachers, apart from giving them practice to learn for themselves: if we don't know what it is about teachers' behaviour that has such large effects on learning, how can we usefully train them? In the absence of this knowledge, the only measure of a teacher's worth is the comparative learning outcomes of their students; yet neither teachers nor teacher training is usually assessed by this. So while it is quite possible that teachers learn either by unaided practice or by unconscious imitation of other teachers (apprenticeship learning), there is almost no evidence on whether formal training makes a difference.

The empirical observation of the importance of teachers has major implications for theory. Because they are of such large importance, I prefer Laurillard's theory of the learning and teaching process to others since it gives equal weight to learners and to teachers, and I regard slogans such as "learner-centered" and theories such as neo-constructivism to be flawed because they do not acknowledge or give a place to teachers of the prominence that they in fact have in the causation of learning.

So given the importance of teacher effects, what is the evidence? I need to do a proper review of this. But the Pygmalion effect is one big demonstration of the effect of teachers, showing they can double the amount of pupil progress in a year. Rosenthal & Jacobson (1992) also briefly mention research showing that 10 seconds of soundless video of a teacher allows students to predict the ratings that person will get as a teacher. Similarly, hearing the sound without vision AND without content (rhythm and tone of voice only) was enough too. This is powerful evidence that teachers differ in ways they cannot easily or normally control, but which are very quickly perceptible, and which, at least in students' minds, determine their value as a teacher. (And Marsh's (1987) work shows that student ratings of teachers do relate to learning outcomes.)

This also brings out an essential difference between medicine and education. In education, the teacher is supposed (except by radicals) to be a major cause of learning; while in medicine it is supposed to be the "treatment" regardless of who administers it.

The placebo effect: does it really exist?

Placebos are things like sugar pills, that look like real treatments but in fact have no physical effect. They are used to create "blind" trials in which the participants do not know whether they are getting the active treatment or not, so that physical effects can be measured independently of the participants' expectations. There are various effects of expectations, and blind trials control all of these together by making whatever expectations there are equal for all cases. Placebos aren't the only possible technique for creating blindness (unawareness of the intervention): to test the effectiveness of prayer by others, you just don't tell the participants who has and has not had prayers said for them. To test the effect of changing the frequency of fluorescent lights on headaches, you just change the light fittings at night in the absence of the office workers (this is a real case).

Related to this is the widespread opinion that placebo effects exist, where belief in the presence of a promising treatment (even though it is in fact an inert placebo) creates a real result e.g. recovery from disease. Placebos as a technique for blinding will remain important even if there is no placebo effect, but obviously it is in itself interesting to discover whether placebo effects exist, how common they are, and how large they are. After all, if they cure people then we probably want to employ them for that.

Claims that placebo effects are large and widespread go back to at least Beecher (1955). However Kienle and Kiene (1997) did a reanalysis of his reported work, and concluded his claims had no basis in his evidence; and then Hrobjartsson & Gotzsche (2001) did a meta-analysis of the evidence, and concluded that most of these claims have no basis in the clinical trials published to date. The chief point of their sceptical argument is that apparent placebo effects are largely confounded with spontaneous recovery and with regression to the mean (see the section below on three reasons for a control group to show recovery).

Nevertheless, even they conclude that there is a real placebo effect for pain (not surprising, since this is partly understood theoretically (Wall, 1999)); and for some other continuously-valued, subjectively-assessed effects, e.g. fatigue, nausea. A recent experimental demonstration was reported by Zubieta et al. (2005) "Endogenous Opiates and the Placebo Effect" The Journal of Neuroscience vol.25 no.34 pp.7754-7762. This seems to show that the psychological cause (belief that the placebo treatment might be effective in reducing pain) causes opioid release in the brain, which then presumably operates in an analogous way to externally administered morphine.

A more extensive review of the overall dispute is Nimmo (2005) and another is Woolfson (2009). (See also Hrobjartsson, A. & Gotzsche, P.C. (2006) "Placebo interventions for all clinical conditions" Cochrane Database of Systematic Reviews issue 3.)

N.B. the opposite of placebo is nocebo: something that although in fact materially neutral, causes harm in the patient because they believe it will harm them: see a review by Barsky et al. (2002).

Ted Kaptchuk of Harvard Medical School seems to hold the following position. The placebo effect seems to exist in at least a few situations. If it is real, then it operates both in scientific treatments (where it will increase the apparent efficacy), and in non-scientific treatments, e.g. in alternative medicine. If we care about patients' recovery, we should attempt to optimise it, and possibly alternative medicine is actually better at eliciting it. If we care about knowledge, we should research how it works, what best promotes it, etc. See for example: Ted J. Kaptchuk (2002) "The Placebo Effect in Alternative Medicine: Can the Performance of a Healing Ritual Have Clinical Significance?" Annals of Internal Medicine vol.136 no.11 pp.817-825

Kaptchuk offers some ideas about what the key factors might be in the placebo effect (i.e. in positively enhancing healing, but not through the material effects of the treatment or drug). He suggests it is the effect of hope, attention and care; and that it is to do with social, not just personal, beliefs, i.e. with "the healing drama". That is, as with the Pygmalion effect, if those around you believe in the effect, that may itself have an important effect.

In turn that suggests to me these points:

Currently it appears that the placebo effect is real and important for pain, but may not exist elsewhere. This is not grasped by some important authors. So meta-analyses, e.g. by Hrobjartsson, may show no effect, but this is probably because he averages placebo effects together across many fields; and conversely Benedetti argues that all drug trials should be done differently, even though he has shown important effects only for pain. It also brings out that for good pure science you should compare no-treatment, placebo, and treatment (and additionally seriously consider learning effects from repeated administrations); but for applied science the standard treatment vs. placebo trial is mostly good enough. It will only miss how important placebo effects are in potentiating the drugs. But still, such standard trials measure drugs in approximately valid clinical conditions, where patients believe the treatment is probably effective.

Note too the strong distorting tendencies of different fields: in drug trials, researchers tend to attribute the whole efficacy of the control (placebo) condition to a placebo effect, whereas much of it will be due to spontaneous recovery; while physicians see it as like bedside manner, something to please the patient independently of objective healing mechanisms, whereas the neurophysiological effects of the placebo effect have now been well established.

Benedetti has shown that some pain drugs rely in part on the placebo effect: they are real, but there is a statistical interaction with the patient's knowledge of getting a drug, i.e. the placebo effect is necessary to potentiate the drug, which then adds value on top of placebo alone. The same is true of diazepam's (Valium) effect on anxiety.

Trials can be (and have been) done where the patient doesn't know when they are getting the drug; there can still be informed consent, because they know they might get it, or that they will at some time.

Finally: here's where you can buy one! "Obecalp, the First Standardized, Branded and Pharmaceutical Grade Placebo is Now Available for Sale at InventedByAMommy.com": "Obecalp"

Placebo vs. Hawthorne effects

The placebo and Hawthorne effects compare and contrast in these ways:

Positive thinking

Related to placebos is evidence about how positive thinking can improve health outcomes. This area shows that healing and recovery are affected by various kinds of positive thinking. This is mostly not about a placebo effect confounding an experiment, but about better medical outcomes for physically treated patients depending on whether they additionally have positive thinking. The mediating causes are probably: effects on pain; on how well we look after ourselves; and on reduction in persistent stress.

Marchant, Jo (2011) "Heal thyself: the power of mind over matter" (also titled "Heal thyself: think positive") New Scientist no.2827, 29 Aug. 2011 http://www.newscientist.com/special/heal-thyself

Optimists heal faster: "The Value of Positive Psychology for Health Psychology: Progress and Pitfalls in Examining the Relation of Positive Phenomena to Health" Annals of Behavioral Medicine vol.39 no.1 pp.4-15 doi:10.1007/s12160-009-9153-0

doi:10.1097/PSY.0b013e31818105ba

doi:10.1037/0022-3514.85.4.605

doi:10.1037/a0014663

Summary of the placebo effect

Sham controls against placebo effects

Graeme Campbell describes recent ways of tackling the problem of placebo effects undermining experimental tests as follows.

Designing appropriate controls in surgical research is difficult because surgical outcomes are "a cumulative effect of three main elements: critical surgical element, placebo effects, and non-specific effects" (Wartolowska et al., 2014). A solution to this problem has existed within medical research since at least 1959, when 20 years of accepted wisdom about the effectiveness of internal mammary artery ligation (tying off the internal mammary artery) for the relief of angina was overturned by randomising anaesthetised patients to receive either the surgery, or merely the incision (Cobb, Thomas, Dillard, Merendino, & Bruce, 1959).

Such controls, referred to as sham surgery, are understandably the subject of ethical debate — unlike placebo medications they are not inert, and they confer significant risks due to anaesthesia and surgical wounds (Angelos, 2007). However, there is growing consensus that the costs of errant surgical practices persisting outweigh these risks (Prasad & Cifu, 2019; Wartolowska et al., 2014). After being out of fashion for around 40 years (Stolberg, 1999), sham surgery is now used in areas as diverse as brain surgery (where control patients actually have holes drilled into their skulls; Kim et al., 2005) and knee surgery (Moseley et al., 2002).

The sham approach has proven powerful for evaluating procedural interventions beyond surgery. Sham acupuncture is identical to traditional Chinese acupuncture, save that the needles are deliberately placed in locations not claimed to be vital to the flow of chi. A 2016 review of sham acupuncture trials for menopausal hot flushes failed to demonstrate traditional acupuncture's efficacy — in direct contradiction of the non-sham literature (Carlos et al., 2016). Its results suggest that the improvement observed in previous research had nothing to do with needle placement, but rather with placebo and non-specific effects, such as those inherent in the therapeutic relationship.

Thus controlled experiments require not a no-treatment condition, but a sham treatment that participants cannot tell apart from the new-treatment condition. This applies not only to surgery, but to experiments on, for instance, mindfulness meditation, where participants are very likely to have heard enough about it that simply resting or doing nothing will not stop them expecting the effects claimed for meditation.

Ways of classifying and comparing such effects

Can we organise these (and other) various reported effects in some useful way?

What are the effects that might be related?

This section is a list of names in the literature purporting to identify "effects".

Why perhaps rational

No-one knows the mechanisms behind these effects. However it is not hard to generate speculations on how they might be advantageous and so quasi-rational. Note that not all conceivable effects are in fact observed. The cases where they are and are not observed may often be explained by considering where they would be rational or advantageous.

Summary

There are many terms that have been used in one way or another to refer to how the behaviour of human participants is modified by the experiment itself. These terms suggest different attitudes to this problem area. McCambridge et al. (2014) are largely right to conclude that: "there is no single Hawthorne effect. ... Consequences of research participation for [the] behaviors being investigated do exist, although little can be securely known about the conditions under which they operate, their mechanisms of effects, or their magnitudes. New concepts are needed to guide empirical studies."

Actually, I think we can already do a bit better than that. It is quite wrong to think that all these effects apply all the time, and that no useful experiments with human participants can be done.

The placebo effect is real, but only applies to very limited cases such as the treatment of pain, nausea and fatigue. There is no good evidence of it operating on other medical problems. And there seem to be good reasons for this (see previous section).

Unless the task in an experiment is unusually well specified, expectancies are likely to be an important factor for the reasons outlined above. This in turn interacts strongly with how participants understand "the task" and what is being asked of them: so their interpretation of the situation is often another large factor. To repeat: simply telling participants to speak, to walk, to pull, to sew on a button is massively under-specified; they MUST somehow fill in all the missing details, especially the desirable speed and accuracy, and are likely to fill them in by imagining what you want, or what interests them, or ...

A more general lesson is about the conduct of any study with human participants. Coombs & Smith (2003) argue that these issues justify action research rather than lab experiments as a method, but it may rather be that their criticism of naive positivism is more widely applicable. Many naive researchers say very little to participants "in order not to bias them"; but this just means the participants all have to make an interpretation of what the under-specified instructions mean, and will each make it differently. If you want to study individual differences in interpretation (as with Rorschach blots), then withholding a specified meaning is the way to proceed. But if you want to study other variables, then what is wanted is to specify the task more fully, and to compare two groups with the same interpretation of the task but differing in some other variable (e.g. illumination level). In other words, the solution is a better controlled, not an uncontrolled, study; which may be either in the lab or in the field.

The fields where such effects apply

Blind trials

In the medical field, a strong adherence to the method of double and triple blinding in trials, at least of drugs, has developed. We could also use this as one practical, applied, behaviouristic way of classifying effects in this area.

Thus from a practical point of view, there are three classes of humans to be managed in an experimental trial (the first, second, and third "parties" listed under the summary recommended method below), and the expectations of each have been shown sometimes to affect its outcome.

Three different reasons for a control group to show recovery from illness (including regression to the mean)

(I learned what is expressed in this section from Stephen Senn (2009).)

It is possible for an experiment to test for a placebo effect cleanly, by comparing those who get the placebo and those who get nothing at all. However in standard medical trials using a placebo, the new treatment is compared to a placebo. It is common for the placebo group to show an improvement (reduced illness) compared to the start of the trial. There are 3 different kinds of reason for this, and such standard experiments cannot tell which applies:

  1. A genuine placebo effect.
  2. Spontaneous recovery: many conditions improve over time with no treatment at all.
  3. Regression to the mean: patients tend to enter trials when their fluctuating symptoms are at their worst, so on average the symptoms will be milder when measured again later, regardless of any treatment.

Thus many experiments look as if they show a placebo effect because there is a group that receives a placebo and shows significant improvement during the trial. However in many cases this improvement is not due to the placebo, but to either spontaneous recovery or regression to the mean of fluctuating symptoms. (Note that this is a case where the within-subjects comparison is LESS informative than the between-subjects one.) The standard double blind, placebo controlled trial cannot discriminate between these three cases. Many published papers have asserted quite erroneous conclusions through not understanding this.
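A toy simulation (my own illustration, not Senn's example) makes the regression-to-the-mean case vivid: give each patient a stable chronic severity plus day-to-day fluctuation, enrol only those whose screening measurement crosses a threshold, and measure again later with no treatment whatsoever:

    import random

    random.seed(0)
    N, THRESHOLD = 100_000, 7.0
    baseline, followup = [], []

    for _ in range(N):
        chronic = random.gauss(5, 1)              # stable severity, this patient
        screening = chronic + random.gauss(0, 2)  # fluctuating symptom on the day
        if screening >= THRESHOLD:                # only bad days lead to enrolment
            baseline.append(screening)
            followup.append(chronic + random.gauss(0, 2))   # later, untreated

    print(f"enrolled {len(baseline)} of {N}")
    print(f"mean severity at enrolment: {sum(baseline) / len(baseline):.2f}")
    print(f"mean severity at follow-up: {sum(followup) / len(followup):.2f}")

Under these invented parameters the mean falls from roughly 8.2 at enrolment to roughly 5.6 at follow-up although no-one was treated: exactly the within-subjects "improvement" that gets misread as a placebo effect.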

Research methods implications

Shayer: pure and applied research

These are some notes stimulated by a valuable chapter by Shayer (1992).

There are two different aims for research: pure "science" studies, which aim to find causes and laws; and applied or "engineering" studies, which aim to produce a desired practical effect reliably. The next two subsections take these in turn.

Science studies

If you want just to find causes and laws, not to achieve any useful practical effect, then the focus is on isolating causes by controlling experiments and avoiding things such as the Hawthorne effect. Hence, in medical research, double blind trials etc.

Note that double blind trials (where neither experimenter nor patient know which intervention/treatment they are getting during the trial) are quite practicable for testing pills (where a dummy sugar pill can easily be made that the patient cannot tell apart from other pills); but not for major surgery, nor usually for educational interventions that require actions by the learner: in these cases participants necessarily know which treatment they have been given.

Double (or triple) blind trials "control for" most of the various effects above in the sense of making them equal for all groups by removing the ability of both experimenter and participants to even know which treatment is being given, much less to believe they know which is the more effective. They may tend to reduce the placebo effect since the patient knows they have only a 50% chance that they are getting the active treatment. However they do NOT remove the Hawthorne effect (only make it equal for all groups in the trial), since on the contrary the experiment almost certainly makes participants very aware of receiving special attention. This could mean that the effect sizes measured in some groups are misleading, and would not be seen later in normal practice. The trial would be a fair comparison between groups, but the magnitude of effect measured would not be predictive of the effect seen in non-experimental conditions, due to a similar "error" (i.e. effect due to the Hawthorne effect) applying to both groups.

This could, at least in theory, matter. A case in point could be comparing homeopathic and conventional medicine. Generally a patient will get about 50 minutes of the practitioner's attention in the former case, and 5 minutes in the latter. It is not hard to imagine that this might have a significant effect on patient recovery. A standard double blind experiment (comparing just two treatments) would be most seriously misleading in a case where both a drug and a "Hawthorne" effect of attention were of similar magnitude, but not additive (i.e. either one was effective, but getting both gave no extra benefit): then a conventional trial would see similar and useful effect magnitudes in both groups, but would not be able to tell that in fact either giving the drug or giving an hour's attention to the patient were alternative effective therapies, unless there were also a third "no treatment, no attention" control group. A thorough experiment would have to have at least five groups: with either 5 or 50 minutes of practitioner attention, with either conventional or homeopathic substances taken, plus a group that got no substance and no practitioner attention.
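The arithmetic of that scenario can be sketched in a few lines of Python (all the recovery rates are invented): either the drug or an hour of practitioner attention is sufficient on its own, and the two do not add:

    # Invented 'saturating' model: an active drug OR >= 50 minutes of attention
    # produces a 70% recovery rate; neither gives 30%; both together add nothing.
    def recovery_rate(active_drug, attention_minutes):
        return 0.7 if (active_drug or attention_minutes >= 50) else 0.3

    groups = {
        "conventional: drug, 5 min attention":   recovery_rate(True, 5),
        "conventional: drug, 50 min attention":  recovery_rate(True, 50),
        "homeopathy: inert, 5 min attention":    recovery_rate(False, 5),
        "homeopathy: inert, 50 min attention":   recovery_rate(False, 50),
        "no substance, no attention":            recovery_rate(False, 0),
    }
    for name, p in groups.items():
        print(f"{name:40s} {p:.0%}")

A two-group comparison of conventional practice (drug plus 5 minutes) against homeopathy as delivered (inert substance plus 50 minutes) sees 70% against 70%; only the five-group design, via the no-substance, no-attention group at 30%, reveals that the drug and the hour of attention are alternative effective therapies.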

Finally, neither medicine nor education habitually employs counter-balanced experimental designs, where all participants get both treatments: one group gets A then B, and the other gets B then A. This is partly because of the possibility of asymmetric transfer effects, i.e. the effect of B (say) differs depending on whether or not the participant had A first. For instance, learning French vocabulary first and then reading French literature is not likely to have the same effect as doing them the other way round.

Applied or engineering studies (Shayer)

Shayer thinks there are distinct questions and stages to address in applied as opposed to "scientific" research — i.e. in research on being able to generalise the creation of a desired effect:
  1. Study primary effect: Is there an effect (whatever the cause), what effect, what size of effect?
  2. Replication: can it be done by other enthusiasts (not only by the original researcher)?
  3. Generalisability: can it be done by non-enthusiasts? i.e. can it be transferred via training to the general population of teachers? i.e. without special enthusiasm or skills. This is actually a test of the training procedure, not of the effect — but that is a vital part of whether the effect can be of practical use.

One danger is the Hawthorne effect: you get an effect, but not due to the theory. The opposite is to get a null effect even though the theory is correct because transfer/training didn't work. So you need to do projects in several stages, showing effects at each.

In stage (1) you do an experiment and show there really is an effect, defensible against all worries. But you still haven't shown what it is caused by: whether the factors described in your theory, or by the experimenter: i.e. no defence against Hawthorne. Use one or two teachers, and control like crazy. In (2) you show it can be done by others: so at least it is not just a Papert charisma effect, but it still might be a learner enthusiasm effect (of novelty or halo). Use say 12 teachers. In (3) you are testing whether training can be done.

Note that if what you care about is improving learning and the learners' experience, then you may want to maximise not avoid novelty, halo, and Hawthorne effects. If you can improve learning by changing things every year, telling students this is the latest thing, then that is the ethical and practical and practically effective thing to do.

Rosenthal's suggestions on method

Rosenthal & Jacobson (1992) have a brief chapter proposing methods to address these effects, at least for "science" studies of primary effects.

They say firstly we should have Hawthorne controls, i.e. 3 groups: a control group (no treatment); an experimental group (the one we are interested in); and a Hawthorne control, which receives a change or treatment manifest to participants, but not one that could be effective in the same way as the experimental intervention. [This is the reply to wanting to do triple blind trials but not being able to avoid participants knowing something is being done; AND it is a response to wanting to measure the size of the placebo effect as well as of the experimental effect.]

Secondly, have "expectancy control designs": a 2 x 2 design of {control / experimental} x {with / without secondary participants led to expect a result}. [Hawthorne effects and controls are about subject expectancies; expectancy controls are about the Pygmalion effect, i.e. teachers' expectancies.]

So, combining these, they suggest a 2 x 3 design of {teacher led to expect an effect / not} x {control, experimental, Hawthorne control i.e. placebo treatment}. The point of these designs is not merely to avoid confounding factors but to measure their existence and size in the case being studied; a sketch of how the six cells might be analysed is given below.
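This sketch (Python with pandas and statsmodels) simulates the six cells and separates the effects with a two-way ANOVA. The cell means, effect sizes, and group sizes are all invented for illustration, not taken from Rosenthal & Jacobson:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
rows = []
# Invented true effects: material effect 1.0, placebo/Hawthorne effect
# 0.5 (for any manifest treatment), teacher-expectancy (Pygmalion) 0.3.
for teacher_expects in (0, 1):                        # 2nd-party expectancy
    for treatment in ("control", "experimental", "hawthorne"):
        mean = (1.0 * (treatment == "experimental") +
                0.5 * (treatment != "control") +
                0.3 * teacher_expects)
        for score in mean + rng.normal(0, 1.0, 30):   # 30 pupils per cell
            rows.append({"teacher_expects": teacher_expects,
                         "treatment": treatment,
                         "score": score})

df = pd.DataFrame(rows)
model = smf.ols("score ~ C(treatment) * C(teacher_expects)", data=df).fit()
print(anova_lm(model, typ=2))  # size of each effect and their interaction
```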

N.B. A medical trial with drug and placebo groups is most like having experimental and Hawthorne-control groups but no pure control group; adding the latter would require a matched group that was monitored but given no treatment. However, participants are normally told it is a blind trial, rather than being led to fully expect both treatment and placebo to be effective, so the parallel is not exact.

Adair (1984) suggests that the important (though not the only) aspect of these effects is how the participants interpret the situation. Interviewing them (after the "experiment" part) would be the way to investigate this. This is also essential in "blind" trials to check whether the blinding was in fact effective: some trials that are conducted, and probably published, as blind are in fact not. If the active treatment has a readily perceptible side effect on most patients (e.g. hair falls out, urine changes colour, pronounced dry mouth), both doctors and patients will quickly know who does and does not have the active drug. Blinding depends on human perception, and so these perceptions should be measured; one simple check is sketched below.
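For instance (a standard contingency-table test, not something Adair prescribes; the counts are hypothetical), exit-interview guesses can be tabulated against actual allocation and tested for association:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical exit-interview data: rows = actual allocation,
# columns = what the participant guessed they were given.
#                   guessed drug  guessed placebo
guesses = np.array([[38,          12],   # actually on drug
                    [15,          35]])  # actually on placebo

chi2, p, dof, expected = chi2_contingency(guesses)
print(f"chi-square = {chi2:.1f}, p = {p:.4g}")
# A small p means guesses track the true allocation far better than
# chance: the "blind" has failed and the trial is effectively open.
```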

Summary recommended method

First party (cf. "single blind"): the pupil or patient
Second party (cf. "double blind"): the teacher or doctor or researcher
(Third party (cf. "triple blind"): a rater or lab technician who makes observations or tests is also blind to the condition s/he is judging)
2nd party expectancy \ 1st party expectancy:
Teacher (mis)led to expect positive result | Experimental group | Control group: no treatment | Hawthorne control: irrelevant treatment / placebo
Teacher (mis)led to expect no effect       | Experimental group | Control group: no treatment | Hawthorne control: irrelevant treatment / placebo
Plus interview both first and second parties on how they see (interpret) the situation.

My comment

We know that all the above effects can be important and unexpected, so we cannot trust results that do not at least try to control for them. A double or triple blind procedure allows a 2-group experiment to control for them; Rosenthal's recommended 6-group approach is three times more costly. However it does not merely control for these effects but measures the size of all three (placebo, Hawthorne, and the material effect) separately, AND their interactions. If the effects aren't there, that might be grounds for doing it more simply and cheaply in future. But if they are, then without the larger design we cannot know what size of effect to expect in real life, only that there is an effect independent of expectations. Thus we could see a blind trial as somewhat like Shayer's stage 1 (establishing the existence of an effect), while the larger designs also address aspects of the later practical stages.

Because placebo effects are so large and so prevalent in medicine, blind trials have become the standard there. Nevertheless they do not give information about the size of benefit to be expected in real-life use. In fact the benefit may initially be greater than in the trials, because the placebo effect will be unfettered (after the trials, everyone will expect the drug to work), but it may decline to lower levels later. Another way of looking at it is that blind trials test the effect of the (say) drug, but resolutely refuse to investigate the placebo and Hawthorne benefits, even though these may be of similar size and value to the patient. Drug companies may reasonably stick to research that informs only their own concerns, but those who claim to investigate all causes, or all benefits to patients or pupils, have much less excuse.

Currently we don't understand how any of these effects work. This could probably be discovered, but it would require some concentrated research, e.g. on uncovering how expectancies are communicated (cf. "Clever Hans") unconsciously or at any rate implicitly, and on what expectancies are in fact generated.

Ann Brown's discussion of the Hawthorne effect in educational research

Ann Brown, a notable researcher in education and psychology, has a section on the Hawthorne effect as a criticism of studies in her field (Brown, 1992; p.163ff.). As in this web page, she went back to the original literature to find a considerable difference between the original work and what is often said about it now.

Her comments relate to several points:

Acknowledgements

This began as my own notes; but over time I have taken stuff from others. Particularly important contributions from Morag Nimmo, Stephen Senn, the various workers on the wikiPedia entry on the Hawthorne effect, and Graeme Campbell.

References

Hawthorne effect references

(See Gillespie (1991) for an extensive bibliography of primary sources on Hawthorne.)

Adair,G. (1984) "The Hawthorne effect: A reconsideration of the methodological artifact" J. Appl. Psych. vol.69 (2), 334-345 [Reviews references to Hawthorne in the psychology methodology literature.]

Angelos, P. (2007) "Sham Surgery in Clinical Trials" JAMA 297(14), 1545-1546. doi:10.1001/jama.297.14.1545-c

Bauernfeind, Robert H., and Carl J. Olson (1973) "Is the Hawthorne Effect in Educational Experiments a Chimera?" The Phi Delta Kappan Vol.55, No.4 (Dec., 1973), pp.271-273 Stable URL: http://www.jstor.org/stable/20297533

Brown, A.L. (1992) "Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings" The Journal of the Learning Sciences, 2(2), pp.141-178

Carey, A. (1967) "The Hawthorne Studies: A radical criticism" American Sociological Review vol.32 pp.403-416

Carlos, L., Cruz, L. A. P. da, Leopoldo, V. C., Campos, F. R. de, Almeida, A. M. de, & Silveira, R. C. de C. P. (2016). Effectiveness of Traditional Chinese Acupuncture versus Sham Acupuncture: A Systematic Review. Revista Latino-Americana De Enfermagem, 24, e2762. doi:10.1590/1518-8345.0647.2762

Clark,R.E. & Sugrue,B.M. (1991) "Research on instructional media, 1978-1988" in G.J.Anglin (ed.) Instructional technology: past, present, and future ch.30 pp.327-343 (Libraries unlimited: Englewood, Colorado).

Cobb, L. A., Thomas, G. I., Dillard, D. H., Merendino, K. A., & Bruce, R. A. (1959) "An Evaluation of Internal-Mammary-Artery Ligation by a Double-Blind Technic" New England Journal of Medicine 260(22), 1115-1118. doi:10.1056/NEJM195905282602204

Coombs,S.J. & Smith,I.D. (2003) "The Hawthorne effect: Is it a help or hindrance in social science research?" Change: Transformations in Education vol.6 no.1 pp.97-111 http://ses.library.usyd.edu.au//bitstream/2123/4494/1/Vol6No1Article7.pdf

Flynn, J.R. (2012) How to Improve your Mind: Twenty keys to unlock the modern world (Wiley-Blackwell)

Franke,R.H. & Kaul,J.D. (1978) "The Hawthorne experiments: First statistical interpretation" American sociological review vol.43 pp.623-643

Franke,R.H. (1980) "Worker productivity at Hawthorne" Amer. Sociol. Rev. vol.45 no.6 pp.1006-1027

French,J.R.P. (1953) "Experiments in field settings" ch.3 pp.98-135 in Festinger,L., & Katz,D. Research methods in the behavioral sciences (New York: Holt, Rinehart & Winston) Cited, with a long quotation, by Jones (1992, p.452).

Gillespie, Richard, (1991) Manufacturing knowledge : a history of the Hawthorne experiments (Cambridge : Cambridge University Press) [Has an extensive bibliography of primary sources on Hawthorne.]

Jastrow, J. (1900) Fact and fable in psychology (Boston: Houghton Mifflin) [I haven't seen this book myself.]

Jones, Stephen R. G. (1992) "Was There a Hawthorne Effect?" The American Journal of Sociology vol.98 no.3 (Nov., 1992), pp. 451-468, from the abstract "the main conclusion is that these data show slender to no evidence of the Hawthorne Effect"

Kim, S. Y. H., Frank, S., Holloway, R., Zimmerman, C., Wilson, R., & Kieburtz, K. (2005). "Science and Ethics of Sham Surgery: A Survey of Parkinson Disease Clinical Researchers" Archives of Neurology 62(9), 1357-1360. doi:10.1001/archneur.62.9.1357

Landsberger, Henry A. (1958) Hawthorne Revisited (Ithaca, NY: Cornell University)

Lovett,R. "Running on empty" New Scientist 20 March 2004 vol.181 no.2439 pp.42-45

Marsh, H.W. (1987) "Students' evaluations of university teaching: research findings, methodological issues, and directions for future research" Int. journal of educational research vol.11 no.3 pp.253-388.

Mayo, E. (1933) The human problems of an industrial civilization (New York: Macmillan)

McCambridge,J., Witton,J. & Elbourne,D.R. (2014) "Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects" Journal of clinical epidemiology vol.67 no.3 pp.267-277 doi:10.1016/j.jclinepi.2013.08.015

Moseley, J. B., O'Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., Wray, N. P. (2002) "A Controlled Trial of Arthroscopic Surgery for Osteoarthritis of the Knee" New England Journal of Medicine 347(2), 81-88. doi:10.1056/NEJMoa013259

Olson,R., Verley,J., Santos,L. & Salas,C. (1994) "What we teach students about the Hawthorne studies: A review of content within a sample of introductory I-O and OB textbooks"

Orne,M.T. (1973) "Communication by the total experimental situation: Why is it important, how it is evaluated, and its significance for the ecological validity of findings" in P.Pliner, L.Krames & T.Alloway (eds.) Communication and affect pp.157-191 (New York: Academic Press).

Parsons,H.M. (1974) "What happened at Hawthorne?" Science vol.183, pp.922-932 [A very detailed description, in a more accessible source, of some of the experiments; used to argue that the effect was due to feedback-promoted learning.]

Prasad, V., & Cifu, A. S. (2019). "The Necessity of Sham Controls" The American Journal of Medicine, 132(2), e29-e30. doi:10.1016/j.amjmed.2018.07.030

Rice, Berkeley (1982) "The Hawthorne Defect: Persistence of a Flawed Theory." Psychology Today vol.16 no.2 pp.71-74   Also at: http://wolfweb.unr.edu/homepage/markusk/Hawthorne.htm

Roethlisberger,F.J. & Dickson,W.J. (1939) Management and the Worker (Cambridge, Mass.: Harvard University Press).
[This is a large book (more than 600 pages) of details of the studies.]

Roethlisberger, F.J. (1941) Management and morale (Cambridge, MA: Harvard University Press)

Rosenthal,R. (1966) Experimenter effects in behavioral research (New York: Appleton).

Rosenthal,R. & Jacobson,L. (1968, 1992) Pygmalion in the classroom: Teacher expectation and pupils' intellectual development (Irvington publishers: New York)

Rhem,J. (1999) "Pygmalion in the classroom" in The national teaching and learning forum vol.8 no.2 pp.1-4

Saretsky, G. (1972) "The OEO PC experiment and the John Henry effect" The Phi Delta Kappan Vol.53 No.9 (May, 1972), pp.579-581 Stable URL: http://www.jstor.org/stable/20373317

Schön, D.A. (1983) The reflective practitioner: How professionals think in action (Temple Smith: London) (Basic books?)

Shayer,M. (1992) "Problems and issues in intervention studies" in Demetriou,A., Shayer,M. & Efklides,A. (eds.) Neo-Piagetian theories of cognitive development: implications and applications for education ch. 6 pp.107-121 (London : Routledge) GoogleBook

Stolberg, S. (1999). "Sham Surgery Returns as a Research Tool" Retrieved August 13, 2019, from https://archive.nytimes.com/www.nytimes.com/library/review/042599surgery-ethics-review.html

Wall,P.D. (1999) Pain: the science of suffering (Weidenfeld & Nicolson)

Wartolowska, K., Judge, A., Hopewell, S., Collins, G. S., Dean, B. J. F., Rombach, I., ... Carr, A. J. (2014) "Use of placebo controls in the evaluation of surgery: Systematic review" BMJ, 348. doi:10.1136/bmj.g3253

Zdep,S.M. & Irvine,S.H. (1970) "A reverse Hawthorne effect in educational evaluation" Journal of School Psychology vol.8 pp.89-95

Definitely important references on the placebo effect

Beecher,H.K. (1955) "The powerful placebo" Journal of the American Medical Association vol.159 pp.1602-1606 [Original article, most cited one, claiming a widespread placebo effect]

Carroll, Robert Todd (2001?) The Placebo Effect Accessed on 2004-05-19. [Part of the Skeptics Dictionary. Useful categorisation of possible types of mechanism for the placebo effect if it exists.]

Hróbjartsson, Asbjørn & Gøtzsche, Peter C. (2001) "Is the Placebo Powerless? An Analysis of Clinical Trials Comparing Placebo with No Treatment" New England Journal of Medicine vol.344 no.21 May 2001 pp.1594-1602 [Meta-analysis, destroying most but not all of the belief that there is evidence for a placebo effect.]

Kienle G.S. & Kiene H. (1997) "The powerful placebo effect: fact or fiction?" Journal of Clinical Epidemiology vol.50 no.12 pp.1311-8. [Destroys Beecher's original article]

Petrie,K.J. & Rief,W. (2019) "Psychobiological Mechanisms of Placebo and Nocebo Effects: Pathways to Improve Treatments and Reduce Side Effects" Annual Review of Psychology vol.70 pp.1-27 doi:10.1146/annurev-psych-010418-102907 https://www.annualreviews.org/doi/pdf/10.1146/annurev-psych-010418-102907

Some more references on the placebo effect

More references can also be found in the Nimmo review: PDF copy.

Barsky, Arthur J., Saintfort, Ralph, Rogers, Malcolm P. & Borus, Jonathan F. (2002) "Nonspecific Medication Side Effects and the Nocebo Phenomenon" JAMA (Journal of the American Medical Association) vol.287 pp.622-627.

Brooks, M. (2008) "Running on empty" New Scientist vol.?? issue 2670, 20 August 2008, pp.36-39

Dodes, John E. (2001?) The Mysterious Placebo Accessed on 2001-01-19. Originally published in the January/February 1997 issue of Skeptical Inquirer. A nice overview of the placebo effect and how it influences the study of alternative medicines.

Evans, Dylan (2003) Placebo: the Belief Effect (Harper Collins)

Evans, M. (2000) "Justified deception? The single blind placebo in drug research" Journal of Medical Ethics vol.26 no.3 pp.188-193.

Feynman, R.P. (1985) "Surely You're Joking, Mr. Feynman!" Adventures Of A Curious Character (London: Norton)

Kaptchuk, Ted J. (2002) "The Placebo Effect in Alternative Medicine: Can the Performance of a Healing Ritual Have Clinical Significance?" Annals of Internal Medicine vol.136 no.11 pp.817-825

Kriegeskorte, N., Simmons, W.K., Bellgowan, P.S.F. & Baker, C.I. (2009) "Circular analysis in systems neuroscience: the dangers of double dipping" Nature Neuroscience vol.12 pp.535-540 Published online: 26 April 2009 | doi:10.1038/nn.2303

McDonald, C.J., Mazzuca, S.A. & McCabe, G.P. Jr. (1983) "How much of the placebo 'effect' is really statistical regression?" Statistics in Medicine vol.2 no.4 pp.417-427.

Nimmo, Morag (2005) Placebo: Real, Imagined or Expected? A Critical Experimental Exploration Final year undergraduate Critical Review, Dept. of Psychology, University of Glasgow. PDF copy.

Nordenberg, Tamar (2000) "The Healing Power of Placebos" FDA Consumer magazine

Price,D.D., Finniss,D.G. & Benedetti,F. (2008) "A Comprehensive Review of the Placebo Effect: Recent Advances and Current Thought" Annu. Rev. Psychol. vol.59 pp.565-590

Senn, S.J. (1988) "How much of the placebo 'effect' is really statistical regression?" [letter] Statistics in Medicine vol.7 no.11 p.1203.

Senn, S.J. (1992) "The ignoble lie" [letter; comment] Journal of Clinical Epidemiology vol.45 no.11 pp.1338-1340.

Senn, S.J. (1995) "A personal view of some controversies in allocating treatment to patients in clinical trials" Statistics in Medicine vol.14 no.24 pp.2661-2674.

Senn, S.J. (1997) "Are placebo run ins justified?" British Medical Journal vol.314 no.7088 pp.1191-1193.

Senn, S.J. (2001) "The Misunderstood Placebo" Applied Clinical Trials vol.10 no.5 pp.40-46.

Senn, S.J. (2002) "Ethical considerations concerning treatment allocation in drug development trials" Statistical Methods in Medical Research vol.11 pp.403-411.

Senn SJ. (2003) Dicing with death (CUP: Cambridge)

Senn, S. (2009) "Three things that every medical writer should know about statistics" The Write Stuff vol.18 no.3 pp.159-162

Simon, Steve (2003) "Ethics of a placebo group"

Woolfson, Jenny (2009) Questioning the Power of the Placebo Given the Substantial Psychological and Physiological effects Generated by Placebos, should Pharmacologically Inactive Medicines be considered Ineffective or Indispensable? PDF copy.

Zubieta et al. (2005) "Endogenous Opiates and the Placebo Effect" The Journal of Neuroscience vol.25 no.34 pp.7754-7762

Web site logical path: [www.psy.gla.ac.uk] [~steve] [this page]
[Top of this page]