The third ELTHE evaluation workshop was held at Glasgow University on 19 September 1996. It was organised by Steve Draper, and supported both financially and administratively by TLTSN (the Glasgow University part of the regional TLTP support network). There were 34 participants altogether. Apart from the organisers there was little overlap in participants with the February workshop, suggesting that the UK could support at least two workshops a year, provided the times and places are varied to suit more people.
Feedback indicates that it was a definite success: no-one fell asleep after lunch, many seemed reluctant to leave, and remarks such as "I now feel refreshed about the whole business of evaluation" were heard. One reason for attending is that some people are the sole evaluator in their group and need peer interaction, or even training, to overcome the isolation. However I am equally struck by how those who do have substantial experience and local colleagues in evaluation still find these workshops stimulating and useful. Too often workshops and conferences are conceived of as either for "dissemination" (meaning talk-only, don't listen) or for training (meaning listen-only, don't contribute), but these workshops remind me how important it is to share problems and solutions: to hear about other people's problems and wonder how your own methods would stand up to them, and to hear about other people's solutions and wonder which bit of your own work might adopt them.
The workshop followed the plan tried out at the previous one, and if anything went further along that road. While many "workshops" actually consist of a string of invited talks, these workshops have become mainly discussion, while yielding just as much learning for participants (what I myself learned appears below). We had one proper invited talk by Sue Hewer; Steve Draper's was cut short, and all the indications were that even less monologue and more discussion, with participants organising their own topics, would have been better. An evaluation scenario had been provided as a topic for the afternoon group discussions, but in the event only one of the five groups addressed it. Two of them addressed an issue not foreseen by the organisers, but important to many participants: how to evaluate WWW-based materials. This was associated with a reshuffling of the groups in the light of this emergent interest. Feedback suggests that this spontaneous reorganisation had mixed effects: about equal numbers of people felt it was a good thing or a bad thing from their point of view. Besides the group discussions and main speakers, there were six short (5 minute) talks at the start by participants who wanted to air their positions and/or problems, and which in part set the agenda. Below is a section on the issues which came out of the day for me, and which were initiated by these short talks. After that are reports on each of the group discussion sessions. These were written by the coordinator or another participant, and have been given to group members for comment.
Warning to students who show initiative: "Self-directed learning equals self-destructive learning, because the exam hasn't changed."
"He's terribly blinkered, but he is facing towards the light." (Better read the education literature if you don't want to be patronised.)
We could begin to do this by, for instance, first asking learners what they think they learned, and only then administering a post-test to measure how much they learned of what we had thought they should learn. In later iterations of the evaluation, we could do systematic tests of any new benefits brought up by students. Similarly, as John Cowan pointed out, we can ask which outcome or learning objective is valued or valued most by students, and how they learned them (not just whether they did).
If I ask people, at least those of my age (I'm in my 40s), what was most important about their university education they tend to say things like "living away from home for the first time" or "meeting people with seriously different opinions than those I had met" or "having my views respected and taken seriously". They do not say "learning Maxwell's equations" nor "learning to write essays and give talks". If you look at those statements, you can see how relevant they are, not just for personal significance, but for many jobs; yet current learning objectives and notions of personal skills in HE fail to address them. Perhaps students do know more than teachers about the value of what they are learning, and we should ask them and attempt to measure it.
Similarly we could not measure the value of workshops such as this by fixed post-tests and an instructional design perspective: no-one knew before the workshop what I was going to learn; and the learning outcomes, though considerable, were probably different for different participants.
Thus small, medium, and large scale CAL may have quite different characteristics for evaluation, for funding, and for issues of institutional change.
Another advantage of evaluation becoming standard practice is that as data for successive years accumulate, they become more valuable: exam results can be added to the evaluation measures, the chance that any good results were a fluke of special students, novelty, particular care by the teacher and so on is reduced, and the likelihood that a stable result is being seen is increased. Furthermore in a number of cases the best results are only seen in later years, as the experience and evaluation of the first year are used by the teacher to make further improvements. All of these are reasons for making evaluation part of permanent practice, and for viewing reports of one-off evaluations as rather less informative. From this perspective, too, educational evaluation should be seen as more like meteorology and less like landmark scientific experiments; less to do with once-in-history discoveries and more to do with ongoing sets of measurements that are both of immediate use locally and build up large datasets for wider conclusions.
Conclusion:
We know about available evaluation methods, BUT how do we apply them? Especially for remote evaluations (WWW etc.) and where there are constraints (people, time and money).
For those who felt they had already achieved the IT Baseline there is an assessment-only option. Two versions of the courses are available:
1) Scheduled in individual departments or faculties
2) Open access in the central computing labs
GROUP DISCUSSION
We focussed on the statement from the course designer: "....the idea is supposed to be EDUCATION not TRAINING. .......... It is intended to make people AWARE of how IT can enable them to carry out their STUDY TASKS more EFFECTIVELY." We then discussed the necessary resources, the areas to be addressed, and the instruments to be used.
1) RESOURCES FOR EVALUATION. We hoped that these would be adequate for a comprehensive evaluation of the course, as the course was to be an important resource for all university students.
2) AREAS TO BE ADDRESSED
IT Baseline
Skill Level
Attainment of appropriate skills
Application of skills
Subsequent use of skills
IT contribution to future study skills (students' perception of utility and reality)
Effective course design
Actual learning time
Departments' perception of student attainment
Skills omitted
"Fear" of Computers (overcome or not?)
DATA required from all courses and from the different versions of the courses, including the assessment-only option.
3) POSSIBLE INSTRUMENTS FOR EVALUATION
All Documentation on Course and Students
Pre and Post Questionnaires
Delayed Questionnaires
Interviews of students and Departments
Observation of students
Sampling and examination of examples of assessments
Computer logs
CONCLUSIONS:
The evaluation must attempt to determine if what the course designer wanted is achieved.
It must determine if students attain appropriate skills which they can and do use effectively in their studies.
Also the skills obtained by the students need to be shown to be appropriate/acceptable to the requirements of the departments from which those students come (otherwise there may not be a case for a centrally run course), i.e. the improvement should be observable and able to be capitalised on by the individual departments, as opposed to being useful to the student alone. N.B. how this discussion, by focussing on testing "skills", seems to have gone back to interpreting the course as aimed at training, not education! Perhaps this shows how vital it is for real evaluations to have access to the teacher in order to elicit both their aims and objectives, and how to test them.
* Confidence logs are described in Draper et al., Observing and Measuring the Performance of Educational Technology. They are based on the learning outcomes of the situation and are administered as a pre-test and post-test.
* Could use interviews to find out why they are confident or not confident.
* Use diaries to highlight changes in confidence or areas that could be worked on.
* There is a bubble dialogue technique (any references to this?). This would allow students to enter their confidence levels using pictures with few words. This may be less threatening than other techniques.
We identified the following areas that are necessary for an evaluation. Some discussion is added to each of these areas.
Identify who the stakeholders are in the evaluation. In our scenario these were JISC, the project team, lecturers, students, content contributors, and others. When there is such a large group, the stakeholders should be prioritised and a core group identified that will receive priority.
What are the likely influences on student learning? What should you be trying to measure? It is better to know at the beginning of an evaluation what might affect students, so that the evaluator can be aware of it and students can have the opportunity to comment on it.
There may be some instruments that are easier to use on the Web. Questionnaires for users at remote sites are easier to administer on the Web. The Web does record the pages that people view, and this could be useful information. But access logs on their own are not really "an evaluation": you have to do something with them and be able to interpret the results. Frequent access to one part of the courseware might actually indicate a problem with navigation, and offer the possibility of a shortcut or streamlining.
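As a purely illustrative sketch (not something produced at the workshop), the following Python fragment shows the kind of minimal processing an access log needs before it says anything useful; the log path, the Common Log Format assumption and the page names are all hypothetical.

    # Count requests per page in a Web server access log (Common Log Format assumed).
    # Raw counts are not "an evaluation": a frequently revisited page may be popular
    # content, or may signal a navigation problem that keeps forcing users back to it.
    import re
    from collections import Counter

    LOG_PATH = "access.log"                     # hypothetical location of the server log
    REQUEST_RE = re.compile(r'"GET (?P<page>\S+) HTTP/[\d.]+"')

    page_hits = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            match = REQUEST_RE.search(line)
            if match:
                page_hits[match.group("page")] += 1

    # Print the ten most requested pages; interpretation still has to come
    # from talking to the students and teachers who used them.
    for page, hits in page_hits.most_common(10):
        print(f"{hits:6d}  {page}")

Even a summary like this only becomes evaluation evidence once it is set alongside what students and teachers say about how they actually used those pages.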
I think that questionnaires are best administered on paper, especially if there are open-ended questions. This allows all students to comment; those not used to computers may not be comfortable responding to questionnaires on computers.
The speed of the network may be an issue for Web based applications. This could be measured by timing how long files take to be delivered and then asking students if this is adequate.
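A small sketch of how such timing might be done, assuming the evaluator can fetch the course pages directly; the URLs below are placeholders, not real course addresses.

    # Time how long Web pages take to be delivered, so the measured times can be
    # compared with students' own judgements of whether delivery speed is adequate.
    import time
    import urllib.request

    TEST_URLS = [                               # hypothetical course pages
        "http://www.example.ac.uk/course/index.html",
        "http://www.example.ac.uk/course/unit1.html",
    ]

    for url in TEST_URLS:
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            data = response.read()              # force the full file to be delivered
        elapsed = time.perf_counter() - start
        print(f"{url}: {len(data)} bytes in {elapsed:.2f} s")

Repeating such measurements at different times of day would show whether network load, rather than the courseware itself, is what students are reacting to.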
Those least familiar with the work of TILT's Evaluation Group asked what were the evaluation tools used, where could they be found, and how could they be used. All had genuine evaluation jobs to be done, and wanted help by the end of the afternoon. All but one of the group had some degree of interest in evaluating use of the Web.
The need to evaluate is: Summative - to establish how long it takes to earn a CPD point by this method of study; Formative - to help each subsequent CD-ROM to be more effective, and perhaps to help make them into commercial products for export; Illuminative - to help Jane study her personal interest, probably for an MSc, in how good this is as an educational tool.
There are some ideas on changes to the integrated learning experience that evaluation may inform, e.g. the need for incentives and motivation to use the CDs, and for assistance such as a telephone helpline.
The learning system is unusual compared with most Higher Education in that there is no teacher and no assessment. The learners are professionals who probably do not like to admit ignorance. They tend to subscribe to the scientific paradigm of knowledge generation, so may respect the use of quantitative questionnaires and statistical analysis, but not an interpretative method of drawing conclusions. There will be an opportunity at a big conference/workshop in December to launch the CD-ROM series and carry out some evaluation. After that it may not be very easy to evaluate the users.
We felt that the following evaluation instruments would be appropriate: Computer and task experience questionnaires; Observation (e.g. of a group at the workshop - we discussed incentives to make them spend at least 30 minutes with the CD); Focus group at the workshop, followed by individual semi-structured interviews; Post-task questionnaires; Survey of other resources from which they update their knowledge and skills; Questions to establish what types of learners they tend to be; Comparison with learning theories of CPD (Laurillard's model not appropriate?). In the plenary discussion John Cowan suggested that the learners would be cooperative if asked to draft a notional letter to the Royal College on how this aspect of CPD could be improved.
We felt that rather than identifying particular instruments or methods, the issue was one of practice. Reflective evaluation should become a habit within teaching and learning practice: part of the culture, sort of thing... This can seem a bit of a vague and lofty aim, and not very useful as a directive, though experience suggests that it doesn't take long "in the business" before each of us realises that that, for better or worse, is how it is. However, this doesn't mean that there are no good guidelines to hand, and we discussed these - the "TILT instruments" are tried and true and can be adapted to many classroom uses of CAL. CMC resources may (or may not!) need a different approach - this was picked up during the afternoon sessions. There was felt to be a need for guidelines to practice that would cover the development as well as the implementation of learning programmes. Erica had found Judith Calder's book for open and distance learning systems pretty sensible; Judith (George!) seconded this and knew of others - it was agreed that she would mail a list round.
It is difficult to encapsulate the interactions over the rest of the session - especially as I didn't pick up that acetate at the end of the day! There was general agreement that gathering information from the various perspectives of interest (Who Are The Stakeholders?) is obviously important; that feedback between immediately interested parties (developer, teacher, evaluator, students if appropriate) should be quick and even casual, though not at the level of changing and changing back with every comment!; that some of the responsibility and effort of the exercise should be taken on by the users; and that reflection seems to imply ownership in some sense.
There was a fair amount of off-the-cuff exchange of problems and solutions - often in parallel, with the coordinator too involved to take notes on it all - and we agreed that such gatherings were really useful to new and old alike, as we questioned our own wisdoms in the face of others' questions.
The breadth of that title indicates how much our individual remits varied. Just looking round at our own examples it is clear that CMC learning resources can differ with respect to: the nature of the communication itself; the situation within which the communicating task is undertaken; the purpose of that task; the nature of the task; the output of the task.
Devising any "standard methodology" is a bit of a challenge - if only for the relevance of whatever is devised! But some standard axioms of evaluation apply - e.g. considering efficacy in terms of purpose. This requires careful consideration of purpose, and of the suitability of the programme for that purpose (a bit like looking hard at learning objectives and then wondering why you thought a particular teaching practice might achieve them!). And given that in a fairly new field such as this there is not much experience on which to predicate, it seems reasonable to take some trouble to closely monitor situations of use, since these are likely to vary widely for any specific CMC resource.
This should be done "from the qualitative to the quantitative": interviewing (by telephone even!) and facilitating focus group sessions of teaching staff, student users, and other protagonists - separately might be better, but not necessarily - to achieve a "story" of the resource from its different perspectives of use. The keeping of logs or use diaries may be sensible - or not. The thing is to establish what is critical, by asking the users. Where the users are spread around various sites, locate the evaluation at each site.
Then if more global data is required, questionnaires can be devised which address critical issues from the different perspectives, perhaps providing options for choice - N.B. to include some open response questions.
Methods for this were discussed - for some of us there was no problem as the resource had fairly closely defined use and users, who could be located and addressed easily. Others, for whom the target users were all inclusive ("the world") had problems which required both technological and social research knowhow to address.
There are issues for on-line questionnaire techniques which seriously mushroom when it is a question of generally accessible web resources.
Two issues raised by the group: "Anonymity culture" and "Techno-proficiency barrier". I'd really like some examples here, to flesh out my "intuitive" understanding of what they mean for evaluation strategists.
We agreed it would be useful to keep in touch, and to pass experiences along the line. Whilst no truly global tools are likely to emerge, the notion of "Cluster Evaluation" (Kozma and Quellmalz 1996) might work - a portfolio of diverse yet related projects whose features can be evaluated from common perspectives. Points for clustering could include primary goals, educational approach (e.g. online seminars, project work), intended participants, context of use, technology type...
The discussion then moved to a related set of points about what evaluation should be: early in development, involving many stakeholders and beginning with their concerns whether explicit or hidden. That is, it was argued (by Philip Crompton) that evaluation should begin by identifying the motivation behind the development of the courseware and behind doing the evaluation: unless an undertaking (e.g. writing a new piece of software or getting students to learn IT skills) has clearly defined aims and objectives how can we carry out a clear and realistic evaluation? Identifying the aims for the intervention, the stakeholder(s) in the intervention and the participants in the intervention at the beginning is crucial. The same applies to an evaluation: unless you know what you want to find out and/or what others expect you to examine, and what might be changed as a result, then how can you proceed? (Actually I don't fully agree with this. Often the most important findings are those that were not expected, and were not wanted or looked for deliberately. This is related to the issue of looking for unspecified educational gains: see above.)
Finally we discussed the notion that learners should become evaluators in their own right, and how that could be seen as part of the idea of evaluation becoming, not a specially commissioned enquiry, but part of permanent ongoing practice (see above).
During the discussion it was observed that there are often opposing objectives in evaluation of courseware - to demonstrate improved quality of learning and to demonstrate reduced costs. The primary aim of the evaluation should be clear.
In order to demonstrate improved pedagogy, how effectively the learning outcomes are achieved needs to be assessed. Pre- and post-tests were suggested as a means of assessing this, with three graded bands of questions starting off fairly simple and getting progressively harder, so that even at the pre-test the student would be able to answer some questions. The desirability of anonymity was recognised, to save any student's embarrassment, and some sort of coding, perhaps using matriculation numbers, was suggested.
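Purely as an illustration of the kind of coding meant here (nothing this specific was proposed in the discussion), a matriculation number could be turned into a stable but anonymous code as in the sketch below; the salt value and the example numbers are invented.

    # Derive an anonymous code from a matriculation number so that a student's
    # pre- and post-test papers can be matched without identifying the student.
    import hashlib

    SALT = "course-evaluation-1996"             # kept private by the evaluator

    def anonymous_code(matric_number: str) -> str:
        """Return a short, stable code for matching a student's two papers."""
        digest = hashlib.sha256((SALT + matric_number).encode("utf-8"))
        return digest.hexdigest()[:8]

    for matric in ["9612345", "9698765"]:       # hypothetical matriculation numbers
        print(matric, "->", anonymous_code(matric))

The same matriculation number always yields the same code, so the two tests can be paired, while the reported results need never show the number itself.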
In order to evaluate the best way of implementing the courseware with a view to reducing costs, it was suggested that the courseware should be implemented in a staggered fashion with different classes, i.e. 3 classes + 1 unsupervised lab for one class, 2 classes + 1 supervised lab + 1 unsupervised lab for another class, and 1 class + 2 supervised labs + 1 unsupervised lab for a third class. Exam marks and the views of the students could then be examined for the different approaches. This might point the way to identifying the optimum way to integrate the courseware with complementary teaching delivery, including the probably essential human teacher component. It was pointed out that there are many other factors involved in the delivery of a course, and a more holistic approach to such evaluation is often necessary.