How machine delays change user strategies

Paddy O'Donnell & Stephen W. Draper
GIST (Glasgow Interactive Systems cenTre)
Department of Psychology
University of Glasgow
Glasgow G12 8QQ U.K.
email: steve@psy.gla.ac.uk

Preface

This was accepted as a short paper for HCI'95.

Abstract

As machine response delays vary, the most important effect on users may not be their annoyance but that they change the way they use an interface. Even the very simple task of copytyping three-digit numbers gives rise to at least three different user strategies (i.e. procedures). However the effect seems not to be a simple function of delay length, contrary to earlier reported work. Instead users are probably shifting between strategies more fluidly.

Keywords: time, system delay, user strategies

1 Introduction

Casual experience suggests that machine response delays, where users are kept waiting, cause annoyance. Initial attempts to provide a theory suggested that the best interface was the one with the shortest response times (Dannenbring 1983, Barber & Lucas 1983, Martin & Corl 1986, Lambert 1984). Later work focussed on predictability, the ability to assess the length of the delay as a function of the type of user action, and feedback during the delay. For example a mouse click might change the cursor to indicate a task in progress, or an execute command in a statistics package might display a progress indicator showing how much of the task still awaits completion (Rushinek & Rushinek, 1986).

More recently Teal & Rudnicky (1992) offered another view: instead of focussing on annoyance and how it depends on duration or unpredictability, they suggested that delays can change user strategies, i.e. the procedures users select for a task. Users do not necessarily sit passively, if impatiently, waiting for the interface to accept the next command, but organise their behaviour around delays, trying to anticipate the machine's behaviour. Teal & Rudnicky reported two experiments whose results they interpreted as showing that users had a choice of three strategies for the task, and that the strategy selected depended simply and directly on the magnitude of machine delay relative to an individual's typical reaction time. We set out to replicate and extend Teal & Rudnicky's findings, but encountered more mixed results suggesting that other issues also affect a user's choice of strategy.

2 The task

Teal & Rudnicky's task is one of the simplest in which such flexibility of user "strategy" (i.e. the detailed method a user adopts for the task) can arise. The task is routine data entry by copytyping of 3 digit numbers from a printed list. After entering a number (3 number key presses followed by <return>), there may be a delay, as if the machine were processing it, before the prompt appears and the next number can be keyed in. Premature keying results in an error. The delay is held constant throughout each block of many trials (i.e. numbers being entered), so that the user can get very used to it. Different blocks, with different fixed delays, allow any differences in user strategy to be studied. Because the task is so short, users are in effect trained over the first 15 trials, and the remaining 25 trials in the block are used to observe their stable strategy as experienced users of that machine setting.

Three user strategies (i.e. methods) for this task appear. The first is the "automatic" strategy, suitable for negligible delays, where the user enters numbers as fast as they can copy them, without checking the screen for the prompt. The third is the "monitoring" strategy, suitable for very long delays, where the user enters a number, reads the next one, looks at the screen until the prompt appears, then enters the next number. Between these is the "pacing" strategy, where the user does not use the perceptual cue of the prompt but instead estimates the delay needed before typing the next number. This can be faster for some delay lengths because, while avoiding (most) errors from premature typing, it saves moving the head and eyes to look at the screen for the prompt, and saves the time taken to react to seeing the prompt, i.e. the perceptual and other processing that imposes a response latency of at least simple reaction time. If users are good enough at estimating, they can in principle cut this wasted time gap to zero.
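To make the three strategies concrete, here is a minimal simulation sketch (our own illustration, not code from either experiment; all parameter values are assumptions chosen only for plausibility). It plays out one block under each strategy and reports the mean IKL and the number of anticipation errors:

    import random

    def simulate_block(strategy, machine_delay, k=0.45, reaction=0.25,
                       noise=0.15, n_trials=40, seed=1):
        # k = user's natural inter-entry time; reaction = simple reaction
        # time to the prompt; noise = relative spread of the user's
        # interval estimate. All values are illustrative assumptions.
        rng = random.Random(seed)
        ikls, errors = [], 0
        for _ in range(n_trials):
            if strategy == "automatic":    # type as soon as ready; ignore prompt
                t = rng.gauss(k, 0.05)
            elif strategy == "pacing":     # estimate the delay without looking
                t = rng.gauss(machine_delay, noise * machine_delay)
            else:                          # "monitoring": wait for prompt, react
                t = machine_delay + rng.gauss(reaction, 0.05)
            if t < machine_delay:          # keyed before the prompt appeared
                errors += 1                # anticipation error; IKL excluded
            else:
                ikls.append(t)
        mean_ikl = sum(ikls) / len(ikls) if ikls else float("nan")
        return mean_ikl, errors

    for s in ("automatic", "pacing", "monitoring"):
        print(s, simulate_block(s, machine_delay=1.0))

With a 1 second delay the automatic policy produces almost nothing but anticipation errors, while pacing undercuts monitoring's mean IKL by roughly the reaction time, as the argument above predicts.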

The variables used to show changes in user strategy are the initial keystroke latencies and anticipation errors. Subsequent keystroke latencies and other keying errors are also measured.

Initial Keystroke Latency (IKL): The time that elapses between when the computer signals it is ready for the next input and when the user actually strikes the first number key of the number being entered (i.e. the user delay).
Anticipation Errors (AE): When the user attempts to enter a keystroke prior to the system becoming available for the next user input.
Subsequent Keystroke Latency: The inter-keystroke times for all user inputs other than the initial keystrokes.
Keying Errors: All performance errors other than anticipation errors (e.g. transcription errors).
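All four measures can be recovered mechanically from a timestamped keystroke log. A minimal scoring sketch (our reconstruction, assuming a hypothetical record format of a prompt time plus (time, character) key presses):

    def score_trial(prompt_time, presses, target):
        # presses: list of (time, char) pairs; target: the 3 digits to copy.
        times = [t for t, _ in presses]
        typed = "".join(ch for _, ch in presses if ch.isdigit())
        ikl = times[0] - prompt_time                      # Initial Keystroke Latency
        skls = [b - a for a, b in zip(times, times[1:])]  # Subsequent Keystroke Latencies
        anticipation = times[0] < prompt_time             # Anticipation Error
        keying_error = not anticipation and typed != target  # e.g. transcription slip
        return ikl, skls, anticipation, keying_error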

Teal & Rudnicky argued that in their two experiments, users moved between the strategies as a function of machine delay, selecting the strategy which is most efficient. Their results supported the appearance of the three different strategies and showed the approximate machine delay ranges in which they appeared:
Automatic strategy: 0.00 - 0.625 secs
Pacing strategy: 0.625 - 2.25 secs
Monitoring strategy: > 2.25 secs.

In fact, as they argue, the optimum switchover point should (and apparently does) depend on the individual subject's speed of reaction. It is only worth pacing when the machine delay is longer than the user's delay: in particular, than the user's natural time between pressing <return> on the previous number group and pressing the first digit of the next number group. (This time is presumably consumed by reading the next number, deciding on the motor action, and actually pressing the key; less any time saved by overlap e.g. reading the next number while keying the previous group.) This user characteristic k can be measured as the subject's modal IKL on a block with zero machine delay i.e. their reaction time in this task when machine delays are removed.
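On this account the selection rule is simple enough to state as code. A sketch of our reading of the cost model (the 0.25 s reaction time is only a rough figure, and the 2.25 s ceiling on reliable pacing is taken from Teal & Rudnicky's reported ranges above):

    def predicted_strategy(d, k, reaction=0.25, pacing_limit=2.25):
        # d = machine delay; k = user's modal IKL at zero delay.
        if d <= k:
            return "automatic"   # the user's own latency already covers d
        if d <= pacing_limit:
            return "pacing"      # estimating d saves the reaction time
        return "monitoring"      # wait for the prompt and accept the extra cost

    print(predicted_strategy(d=0.5, k=0.6))  # automatic
    print(predicted_strategy(d=1.5, k=0.6))  # pacing
    print(predicted_strategy(d=3.0, k=0.6))  # monitoring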

3 Abbreviated method

The basic design is within-subjects, varying the machine delay i.e. every subject experiences every delay length. A pre-test block with zero delay is run to measure k; then 12 blocks are run with delays in the range k-0.25 to k+2.5 seconds at 0.25 second intervals. Subjects are randomly assigned to different orders, which are taken from a N*N Latin square design to counterbalance potential ordering effects. The computer recorded times and errors as well as displaying prompts and error signals, while the data to be copied was presented on a sheet of paper on an upright typing stand in a fixed position.
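The order assignment itself can be generated mechanically. A sketch using the standard cyclic construction (our own illustration; it guarantees each delay condition appears once in each serial position, though it does not balance all carry-over effects):

    def cyclic_latin_square(n):
        # Row i is the block order for subject i: every condition occurs
        # exactly once per row and once per serial position.
        return [[(i + j) % n for j in range(n)] for i in range(n)]

    for subject, row in enumerate(cyclic_latin_square(12)):
        print(f"subject {subject}: delay conditions in order {row}")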

As mentioned, each block consisted of 40 trials (one 3 digit number per trial) with a constant machine delay. The first 15 trials were discarded to allow users to settle on a strategy (they had no other way of discovering the delay within a block), and the last 25 trials were used as data for analysis. This was based on Teal & Rudnicky's experience that 15 trials was entirely adequate for stabilisation. When subjects made an anticipation error, the associated IKL was removed from the totals used for calculating means and standard deviations for IKLs. (These were rare enough for this not to threaten sample size.)
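In analysis terms, the exclusion and the per-block summaries (including the modal IKL used to estimate k) might look like the following sketch. The 50 ms histogram bin is our assumption, since raw latencies rarely repeat exactly:

    from collections import Counter
    from statistics import mean, stdev

    def block_summary(trials, bin_width=0.05):
        # trials: list of (ikl, anticipation_error) pairs for one block.
        kept = [ikl for ikl, ae in trials if not ae]   # drop AE-trial IKLs
        bins = Counter(round(ikl / bin_width) for ikl in kept)
        modal_ikl = bins.most_common(1)[0][0] * bin_width
        return modal_ikl, mean(kept), stdev(kept)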

4 Our Experiment

We attempted replication of the Teal & Rudnicky results with 12 subjects, aiming to produce the three different strategy regions i.e. an automatic strategy region, a pacing region and a monitoring region.

Figure 1 shows the modal IKLs and the AE rates obtained in each delay condition, comparing our results with Teal & Rudnicky's. The graph is marked for the interpretation of which strategy dominated in each band of delays. The peak was thought to correspond to a transition region, with no uniform strategy, between automatic and pacing. Our data are clearly not very similar to theirs, although the supposed pacing region does correspond quite well.

Figure 1: Our results plotted with those from Teal & Rudnicky (1992).

Statistically, evidence was sought for distinct regions, beginning with a one-way ANOVA comparing all 12 delay conditions with each other. Unlike Teal & Rudnicky, we failed to find any significant differences for the IKL measure, although we did for AEs (F[11, 121] = 2.68, p < 0.05). A Tukey HSD was therefore carried out for the AE rate to find any significant differences between delay regions. This produced two regions -- region 1 [k-0.25, k+0.25] and region 2 [k+0.25, k+2.25] at p < 0.05.

For IKLs a Tukey HSD could not be executed because the sample sizes were unequal (due to exclusion of entries on which an anticipation error was made), so a planned comparisons procedure was carried out instead. By comparing all delay conditions separately and then grouping them, significant regions were identified. For IKL two regions were produced -- region 1 [k-0.25, k+1.00] and region 2 [k+1.25, k+2.50] at p < 0.05. The second region identified for the anticipation error rate, [k+0.25, k+2.25], overlaps with the regions for initial keystroke latencies and their standard deviations, [k+1.25, k+2.50] and [k+1.00, k+1.75] respectively. In this region of overlap all the variables remain relatively low and stable, indicating the use of a pacing strategy.
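For readers wishing to reproduce this kind of analysis, a sketch with present-day tools (scipy and statsmodels are stand-ins for whatever package was actually used, the data layout is hypothetical, and a plain one-way ANOVA ignores the within-subjects structure, so this is only an approximation):

    import numpy as np
    from scipy.stats import f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    def analyse(rates_by_delay):
        # rates_by_delay: dict mapping delay condition -> array of
        # per-subject AE rates for that condition.
        groups = list(rates_by_delay.values())
        f, p = f_oneway(*groups)               # one-way ANOVA across delays
        values = np.concatenate(groups)
        labels = np.repeat(list(rates_by_delay.keys()),
                           [len(g) for g in groups])
        return f, p, pairwise_tukeyhsd(values, labels)  # pairwise contrasts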

5 Discussion

These results do give support for some of the strategies in the cost-based strategy selection model proposed by Teal & Rudnicky (1992). There is use of the pacing strategy and evidence indicating use of the monitoring strategy. Even though the IKL differences did not reach significance in the ANOVA (p < 0.05), certain regions were identified using the planned comparisons procedure, and a distinct pattern of results was obtained. The IKLs do not show evidence of an automatic strategy, which in Teal & Rudnicky's results is signalled by very low initial keystroke latencies.

Why the differences, and what does this say about the main issue of user selection of alternative strategies as machine delays vary? The first point is that when we recreated the experiment it became introspectively obvious to all who tried it out that there are indeed three strategies. This is much clearer from introspection than from reading accounts of the experiment. It is much less clear, however, how and when users select or switch a strategy. One issue, then, is whether the traditional measures of time and error are adequate to describe and examine these strategies. It does not seem possible to ask users to report their strategies systematically on each trial, as this would take a long time compared to the task, and since the task is meant to be highly practised, continuous verbal reporting is very likely to interfere with it.

The next point is that, even on the Teal & Rudnicky account, there is not in fact always a stable fixed strategy. Their interpretation of the large peak of errors and time is that this is the transition region where users oscillate between automatic and pacing strategies, and subjects' verbal reports back this up. In fact strategies may seldom stay constant throughout a block, let alone across all subjects for a given block. For instance, whenever an error is committed the subject is more or less bound to look at the screen while they correct the error, and will probably continue to look at the screen frequently for the next trial or two: but this in effect means they use the monitoring strategy during the recovery period, and will probably show markedly slower IKL times for a few trials. A promising direction for future work, then, is to develop an analysis that can identify and isolate such sequences within blocks.
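One simple form such an analysis could take is to flag a short recovery window after every error and compare latencies inside and outside it; a sketch (ours, with an illustrative two-trial window):

    def split_post_error(trials, window=2):
        # trials: (ikl, error) pairs in block order. Separates IKLs from
        # the `window` trials after each error (suspected temporary
        # monitoring) from all other error-free IKLs.
        recovery, normal = [], []
        cooldown = 0
        for ikl, error in trials:
            if error:
                cooldown = window      # start or refresh the recovery window
            elif cooldown > 0:
                recovery.append(ikl)
                cooldown -= 1
            else:
                normal.append(ikl)
        return recovery, normal

If the account above is right, the recovery IKLs should look like monitoring latencies even in blocks otherwise dominated by pacing.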

Errors are particularly important in the pacing strategy, where subjects are estimating how long to pause before typing without checking that the prompt has appeared. If subjects pause too long they waste time and make the strategy less worthwhile, but if they pause too little they commit an error. Errors incur time penalties which both motivate the subject to be more cautious in future and distort the summary measures. Caution may simply make subjects increase their estimate of the pause duration needed, but it may also tend to make them switch strategy (to monitoring), either for a few trials or permanently.
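This trade-off can be made explicit. If the produced interval is roughly Gaussian around the intended pause with a spread proportional to it (the scalar timing property described by Gibbon, 1977), then the expected cost of a given pause is the pause itself plus the error probability times the error penalty. A sketch with illustrative numbers:

    from math import erf, sqrt

    def pacing_cost(pause, d, penalty, cv=0.15):
        # Produced interval ~ Normal(pause, cv*pause); an interval shorter
        # than the true delay d is an anticipation error costing `penalty`.
        # cv and all numeric values below are illustrative assumptions.
        sigma = cv * pause
        p_early = 0.5 * (1 + erf((d - pause) / (sigma * sqrt(2))))
        return pause + p_early * penalty

    costs = {p / 100: pacing_cost(p / 100, d=1.0, penalty=3.0)
             for p in range(100, 161, 10)}
    print(min(costs, key=costs.get))  # pause with the lowest expected cost

Raising the penalty pushes the optimal pause upward, which is exactly the higher-IKL, lower-AE pattern discussed next.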

This issue was probably more important for us than for the original studies, because our setup imposed a greater penalty on users for each error. In Teal & Rudnicky's experiment subjects, although instructed to avoid errors, did not have to correct an incorrect number once entered. In our replication the computer would not move on to the next number until the previous number had been entered correctly (which corresponds better to normal use). Therefore, besides getting an error signal, a subject has the additional trouble of stopping and correcting the wrong entry before moving on to the next number. Presumably, then, subjects will try harder not to make errors, leading to the generally lower AE rate we obtained. This can be done by maintaining higher IKLs, and our results do show generally higher IKLs than Teal & Rudnicky's.

Similarly we believe that our failure to observe clear evidence of the automatic strategy was largely due to subject fatigue, leading in turn to a kind of strategy switch within the block. A pure execution of the automatic strategy would mean neither pausing nor glancing at the screen throughout the block. In fact more natural and typical behaviour is to pause at times for various reasons: to rest, adjust posture, look twice at the copy sheet, or check the screen for any evidence of errors. If this anomalous pause time is simply folded into the IKL means, simple analyses will be misled. The high variances we observed are consistent with this view.
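A small numeric illustration (invented figures): a handful of long pauses drags the mean well away from the typical latency, while the mode is untouched.

    from statistics import mean, mode

    ikls = [0.45] * 20 + [0.50] * 10 + [2.5, 3.0, 2.8]  # three anomalous pauses
    print(round(mean(ikls), 2))  # 0.68 -- inflated well above the typical 0.45
    print(mode(ikls))            # 0.45 -- the modal latency is unaffected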

6 Conclusion

Although we believe that machine response delays often do change user strategies, strategies are also affected by other factors such as the (subjective) cost of errors, and they probably also change on a finer time scale as users recover from errors, get distracted for a moment, or perhaps choose to lower their error rates by pausing for extra checks periodically. Furthermore there are important individual differences within our data (one subject had a 100% error rate on one block, as if not understanding something about the situation). In all these ways, this miniature problem is in fact representative of an under-analysed aspect of HCI: typically users have multiple alternative methods at their disposal, with different performance consequences (e.g. execution times), and they switch between them frequently for reasons that are not easy to identify.

This suggests that work in this area is a long way from producing recommendations for designers, apart from raising awareness that what looks like a single, simple design has in fact turned out to support three different user strategies with different requirements: in the automatic strategy the user does not use the prompt at all; in the pacing strategy the user relies on their skill at estimating time durations, and an audible prompt would help correct that estimate; while in the monitoring strategy the visual prompt is central.

Acknowledgements

This work was carried out as part of the JCI (Joint Councils Initiative) project "Temporal Aspects of Usability", no. 9201233. We are grateful to Lindsey Millar, who ran the experiments reported here, and to Steve McGowan for the software.

References

R.E.Barber & H.C.Lucas (1983) "System response time, operator productivity, and job satisfaction" CACM vol.26 no.11 pp.972-986.

G.L.Dannenbring (1983) "The effect of computer response time on user performance and satisfaction: a preliminary investigation" Behavior Research Methods & Instrumentation vol.15 no.2 pp.213-216.

J.Gibbon (1977) "Scalar expectancy theory and Weber's law in animal timing" Psychological Review vol.84 pp.279-335.

G.N.Lambert (1984) "A comparative study of system response time on program developer productivity" IBM Systems Journal vol.23 no.1 pp.36-43.

G.L.Martin & K.G.Corl (1986) "System response time effects on user productivity" Behaviour and Information Technology vol.5 no.1 pp.3-13.

A.Rushinek & S.F.Rushinek (1986) "What makes users happy?" CACM vol.29 no.7 pp.594-598.

S.L.Teal & A.I.Rudnicky (1992) "A performance model of system delay and user strategy selection" Proc. CHI '92 pp.295-305.