Friday 4 April 2014

Survey Testing

While developing the collection survey for the placement program, I pilot tested two revisions. The surveys were tested with fairly large groups of participants (roughly 30 for the first revision and 70 for the second). This was done partly to involve students in the development process for the program. The first revision (here) included a number of features that proved to be problematic.

The personality and skills sections formed the main body of the survey. Each contained six questions. For each question, participants were asked to rate themselves on a scale from 1 (least like me) to 10 (most like me), and then to rank the relative importance of the six questions from 1 to 6. The numbering from 1 to 10 did not work: most participants equated 10 with good and 1 with bad, leading to very skewed results. Additionally, the importance ranking was rarely applied correctly. In retrospect, the mismatched scales (1 to 10 for the ratings, 1 to 6 for the ranking) should have been enough for me to realize that participants would struggle to understand what was expected.

There was a basic demographics section consisting of six questions and the same importance ranking as the personality and skills sections. The section turned out to be too basic; participants wanted to supply more demographic information about themselves. The ranking was applied correctly more often in this section, but it remained a source of trouble: its significance was frequently misunderstood, and every participant ranked location as the most important factor.

The open-ended questions worked fairly well. They highlighted some of the problem areas and suggested features that needed extension. Unfortunately, they were also heavily dominated by comments along the lines of '...but what about my particular case that is unique and special...'.

The second revision of the survey (here) attempted to resolve some of the issues from the initial survey. Additional information was added in the form of more explicit instructions on the first page and a section detailing special-case restrictions on the third page.

For this survey the demographics section was expanded and its importance ranking was dropped. Questions in the personality and skills sections were reworded slightly to give a better sense that responses lie on a spectrum rather than being either right or wrong. More importantly, the numeric scale from 1 to 10 was eliminated and replaced with the endpoint labels 'least like me' and 'most like me'.

Unfortunately, the importance ranking continued to be a frequent source of confusion. In general, collecting two pieces of information (a ranking and a self-rating) in one section does not work well on a paper survey. When the survey was shifted to a digital format, those components were separated onto two consecutive pages. In the future I would separate the components for the paper test as well.
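A digital format also makes it possible to check each task before the participant moves on, which a paper form cannot do. Below is a minimal sketch of that idea in Python; the six-question structure matches the survey described above, but the function name and validation rule are hypothetical, not details of the actual survey software.

    # Hypothetical sketch of per-page validation in the digital survey.
    # Page 1 collects the self-ratings; page 2 collects the importance
    # ranking. Each page can refuse to advance until its input is valid.

    NUM_QUESTIONS = 6  # each section has six questions

    def validate_ranking(ranking):
        """A valid importance ranking uses each rank 1..6 exactly once."""
        return sorted(ranking) == list(range(1, NUM_QUESTIONS + 1))

    # The common paper mistake was reusing ranks, which a digital form
    # can simply reject:
    validate_ranking([3, 1, 4, 2, 6, 5])  # True: a proper permutation
    validate_ranking([1, 1, 2, 2, 3, 3])  # False: ranks reused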

The open-ended questions were unchanged, but the other changes to the survey led to more helpful responses in that section. Compared to the first pilot, more responses suggested useful extensions to the survey rather than highlighting its obvious failures.

The final version of the survey, which appears on the website, is different again based on the results of the second pilot test. The demographics section has been expanded further and now collects 18 items in 10 questions. Question text has again been reworded in some instances. And, finally, all numbers have been dropped from the survey: questions are not numbered, and all responses are given on a scale presented without numbers.
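Dropping the numbers from what participants see does not mean dropping them from the data; the scale positions still need ordinal codes at analysis time. The sketch below shows one plausible way to do that. The five-position scale width is an assumption for illustration, not something stated about the final survey.

    # Hypothetical sketch: numbers exist only on the analysis side.
    # Participants see an unnumbered scale with labelled endpoints;
    # the selected position (counted left to right) is coded afterwards.

    SCALE_POSITIONS = 5  # assumed width of the scale, for illustration

    def score(position_index):
        """Convert a 0-based position on the scale to an ordinal code.
        Participants never see this number."""
        if not 0 <= position_index < SCALE_POSITIONS:
            raise ValueError("selection is outside the scale")
        return position_index + 1  # leftmost = 1 ('least like me')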

1 comment:

  1. Thanks for sharing your process, Jeff. Great look at the changes you made and why you made them. This is a wonderful example of something that will actually be put into practice, which makes things even more worthwhile.