This is an entry page into pages on A&F (assessment and feedback) in HE.
This began with my involvement with the REAP project (April 2005 - July 2007),
and continued with follow-up work.
Students
See also a two-part report by Thalheimer with practical advice on giving
learners feedback:
Part 1
Part 2
In this draft, I'm writing this section egocentrically, referring to practices
in this psychology dept., which is an essay-based discipline. I believe the
points are general, but I'm not writing here to bring that out.
The idea here is NOT to offer techniques for assessment BUT to provide
a clear statement about the conflicting criteria which any assessment must
satisfy or compromise over.
This is necessary for any rational thought, let alone discussion, about
choices in deciding on assessment design. Most of the literature lacks this.
a) Criteria / requirements / dimensions of merit / aims / constraints:
all of which independently apply to any assessment design.
List EXPLICITLY the key criteria that have to be considered, both the
naively aspirational educational slogans, and also the unspeakable but real
constraints. What is hard about redesigning assessment is that there isn't one
thing you want to improve; the problem is how to optimise, or at least
satisfice (reach an acceptability threshold on), multiple
requirements that often conflict. This is made much harder by some of them not
being written down in public, and so not being discussed rationally by staff.
(There is a provisional list below.)
b) Metrics:
For each of these criteria give a measurement scale that shows teachers the
degree to which it is satisfied. E.g. if you want to raise the NSS score,
then the NSS subscale is the measure (and could be administered every semester
by a course team). If you want to improve learning, then you must show (for
instance) grade rises year on year to demonstrate whether or not you
succeeded.
c) Marks:
I will also occasionally mention the marks or grades given to students as the
result of an assessment activity, to point out what they would (logically)
mean if they were to represent that educational aim (criterion) for the
assessment design.
Draft list of assessment constraints / dimensions
- Learning from doing.
The single biggest use of assessment at the moment, though one seldom
mentioned in the literature on assessment, is not to measure student
knowledge at all, but to mount an activity which is powerfully
"mathemagenic" (productive of learning). We learn mostly by doing; it often
doesn't matter whether the attempt succeeds or fails, and it often requires NO
feedback from staff (contrary to what Laurillard says), just the internal
changes that happen when we plan and attempt something new.
At a simple level, the whole of the Maths presentation at the workshop was
about the large demonstrated learning benefits from persuading students to
actually do some maths work every week. Similarly, students generally report
learning a lot from doing their final year project, although we don't measure
this. The Maths team's whole redesign addresses this criterion.
Metric:
The metric for satisfying this design criterion/aim is how much the student
learns from the activity, pre-to-post.
Mark: if aligned with this aim, the mark essentially measures attendance
(engaging in the learning activity with reasonable sincerity).
- Produces information that is useful to the student.
- The information might be used either formatively or summatively.
- It may be based on:
- A human judgement
- A fact (right answer), independently confirmable elsewhere
- Or most powerfully, the degree of success of a construction
(a bridge you built, a cake you baked).
- It may be in the form of:
- a mark/grade,
- written comments,
- or only an internal effect: a change in the learner's
degree of certainty / confidence in knowing something.
Here are three kinds of assessment to do with this:
- "Catalytic assessment" (Draper, 2009b) and peer discussion
in general is one way of problematising confidence: the
learner wonders if they have got it right, and is likely
to work later to resolve it.
- Formative tests: typically these have many items, and which items a
learner fails shows which topics they need to direct further effort to.
- Reassurance quizzes: essentially these are like formative tests
in that any missed items show something that needs further work, but
students may mostly use the overall score to tell them whether they
are on the right track or have misunderstood a lot and need a major
redirection of effort.
But the often neglected further issue is: to what use is it put?
As argued in Draper (2009a), egocentric academics hold whole conferences on A&F
while presupposing that the only use is to improve the technical knowledge of
the learner. Each type of learner use of assessment and feedback is in fact an
independent criterion for designing an assessment so that it produces the
information that use needs. Thus this one sub-criterion of providing
information useful to the learner in fact yields six alternative independent
criteria, all desirable.
Draper,S.W. (2009a)
"What are learners actually regulating when given feedback?"
British Journal of Educational Technology vol.40 no.2 pp.306-315
doi:10.1111/j.1467-8535.2008.00930.x
Draper,S.W. (2009b) "Catalytic assessment: understanding how MCQs and EVS can
foster deep learning"
British Journal of Educational Technology vol.40 no.2 pp.285-293
doi:10.1111/j.1467-8535.2008.00920.x
One list of learner uses follows.
- Self-regulate and allocate the learner's limited time and effort: if I
got a B grade, then I needn't think about this topic any more. Spend less time
on what I'm good at, more on what I am struggling with. As in "mastery
learning", with its use of formative testing to focus remedial learning each
week, this brings large gains.
Another form of this is "catalytic assessment" (Draper, 2009b):
designed, like a brain-teaser, to signal to the learner that this is something
they don't understand yet but want to.
- Decide future courses, based on what I did well on in the past.
Spend more time on what I'm good at, drop what I struggle with.
Our educational system requires students to make choices, but we fail to
design assessments to support that choice optimally.
- Decide on the quality of the marker. Seek out other opinions.
- Improve the learner's technical knowledge.
- Decide whether and how to adjust my learning method.
- The mark may be interpreted by a learner as feedback on their learning,
revision, and exam technique as a whole process.
Metric: measures of pre/post change in information picked up by the
learner.
- Cost to staff (in time, mostly).
Metric: Staff-hours on the assessment.
- Defensibility against student complaints, which cost both school
and senate office staff a lot of time and trouble. This criterion has always
been the main problem obstructing useful feedback from exams.
Metric: Staff-hours / money spent on complaints and appeals.
- A measure for employers
to use to discriminate amongst job applicants.
Metric: (Validity, reliability, and ..) One metric is variance. E.g.
coursework not only has a higher mean mark than exams, it has a lower standard
deviation, which makes it considerably less useful for discriminating
capability (see the sketch after this list).
- A measure of competence:
(if you want this, use senate schedule C for pass/fail course marks; if you
don't, then don't moan about competence assessment as an aim).
Metric: (Validity, reliability)
Mark: Pass / fail.
- A measure of how much specific knowledge a student has.
Our level 3 stats exam does this well; our other level 3 exams (1-hour essays)
do not, because they offer a choice of questions each of which requires only a
small proportion of the course's content knowledge.
Metric: (Validity, reliability)
- A measure of generic discipline skill.
Exam essays are our instrument for this, and quite good at it. The main
criterion is: to what extent is the essay written as a psychologist would
write it? There are low-level skills we teach but seldom assess directly. Then
(mid-level) we assess specific content knowledge (facts and
concepts rather than skills). We, like most departments, focus most on the
ultimate, deep, high-level skill of thinking and writing like a professional
in the discipline. This is why essays are fundamentally confusing to level 1
students: essays mean different things in each discipline, for a very
deep reason. The metric for this criterion is whether a given assessment
measures, usually tacitly, how well the candidate exhibits disciplinary
thinking (rather than reproduction of specific facts, names, etc.).
Metric: (Validity, reliability)
- Student enjoyment of the activity: Do students like doing it?
Giving students a choice of topic in an essay or project is motivated by this.
On all other criteria, a fixed topic would be better.
(Students may learn more if they enjoy it: that would be a positive secondary
effect. Equally, they may choose the topic that is least work for them: a
negative secondary effect on how much they learn.)
In choosing a topic for an assessment, students are in fact choosing part of
their curriculum: another deep educational issue disguised as an assessment
design choice by teachers.
Metric: student self-reports on enjoyment.
More sophisticated versions of this might ask for self-reports on how much
they feel they learned, and separately on how far the activity corresponded to
their intrinsic learning goals (as opposed to required curriculum learning
goals).
- Raise NSS scores for the A&F subscale.
There is generally little correlation between scores on the A&F subscale and
scores on overall course (programme) satisfaction, so there is no reason to
think that A&F contributes either to learning or to student satisfaction.
Metric: The NSS subscale: how much does it increase?
NSS: A&F scores don't affect the overall student rating of a course.
Perhaps feedback doesn't make a difference to the amount of learning:
teachers should have communicated it in advance, so feedback is not necessary;
and learners should know how to check and remediate their own learning, rather
than rely on being told.
F-Prompting seems to be extremely important, transforming whether students
learn from feedback.
The main problem seems to be that our students mostly do not have any concept
of learning from our written feedback: it doesn't occur to them to actively
use it.
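
As a rough illustration of the variance point under the employer-discrimination
criterion above, here is a minimal Python sketch; the mark distributions are
invented purely for illustration, not departmental data.

    # Hypothetical mark distributions (percentages) for the same cohort.
    # The numbers are invented purely for illustration.
    coursework = [68, 70, 72, 71, 69, 73, 70, 72]
    exam = [45, 58, 66, 72, 51, 80, 62, 38]

    def mean(xs):
        return sum(xs) / len(xs)

    def sd(xs):
        # Sample standard deviation.
        m = mean(xs)
        return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

    for name, marks in [("coursework", coursework), ("exam", exam)]:
        print(f"{name}: mean = {mean(marks):.1f}, sd = {sd(marks):.1f}")

Running this prints a higher mean and a much smaller standard deviation for the
coursework marks, so two applicants are far more likely to be indistinguishable
on coursework than on the exam: the point made above about discriminating
capability.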
Reflecting back on the success of REAP gave us some ideas on what does (and
does not) go into making a project effective at actually changing learning and
teaching in practice, and making it measurably better.
The papers below are about this, and so are effectively about how to design
and run large projects that bring about significant, large-scale change (in
areas such as A&F).
- Transformation in e-learning
Draper,S.W. and Nicol,D. (2006)
The content of a talk given at ALT-C, Sept 2006
Local copy (PDF)
- Understanding the prospects for transformation
Nicol,D. and Draper,S.W. (2006?)
Local copy (PDF)
REAP website copy (PDF)
- A blueprint for transformational organisational change in higher
education: REAP as a case study
Nicol,D. and Draper,S.W. (2009)
Local copy (PDF)
A shorter version of this is in:
Transforming Higher Education through Technology-Enhanced Learning
ed. Terry Mayes, Derek Morrison, Harvey Mellar, Peter Bullen and Martin Oliver
(2009) (York: Higher Education Academy) ch.14 pp.191-207
Local copy (PDF)
REAP website copy (PDF)
- Achieving transformational or sustainable educational change
Draper,S.W. & Nicol,D.J. (2013)
ch.16 pp.190-203 in
Reconceptualising feedback in higher education:
Developing dialogue with students
S.Merry, M.Price, D.Carless & M.Taras (eds.) (London: Routledge)
Local copy (PDF)