While developing the collection survey for the placement program, two revisions were pilot tested, partly to involve students in the development process for the program. Each revision was tested with a fairly large group of participants (~30 for the first, ~70 for the second). The first revision (here) included a number of features that proved to be problematic.
The personality and skills sections formed the main body of the survey, each containing six questions. For each question, participants were asked to rate themselves on a scale from 1 (least like me) to 10 (most like me), and then to rank the relative importance of the six questions from 1 to 6. The 1-to-10 scale did not work: most participants equated 10 with good and 1 with bad, leading to heavily skewed results. The importance ranking, meanwhile, was rarely applied correctly. In retrospect, asking participants to juggle two different scales (1 to 10 and 1 to 6) in the same section should have been enough for me to realize they would struggle to understand what was expected.
There was a basic demographics section consisting of six questions with the same importance ranking as the personality and skills sections. The section turned out to be too basic; participants wanted to supply more demographic information about themselves. The ranking was applied correctly more often here, but it was still a source of trouble: its significance was frequently misunderstood, and every participant ranked location as the most important factor.
The open-ended questions worked fairly well. They highlighted some of the problem areas and suggested features that needed extension. Unfortunately, they were also heavily dominated by comments along the lines of '...but what about my particular case that is unique and special...'.
The second revision of the survey (here) attempted to resolve some of the issues from the initial survey. Additional information was added in the form of more explicit instructions on the first page and a section detailing special-case restrictions on the third page.
For this survey the demographics section was expanded and its importance ranking was dropped. Questions in the personality and skills sections were reworded slightly to give a better sense that responses lie on a spectrum rather than being either right or wrong. More importantly, the 1-to-10 number scale was eliminated and replaced with the verbal anchors 'least like me' and 'most like me'.
Unfortunately, the importance ranking continued to be a frequent source of confusion. In general, collecting two pieces of information (a ranking and a self-rating) in one section does not work well on a paper survey. When the survey was shifted to a digital format, those components were separated onto two consecutive pages. In the future I would separate the components for the paper test as well.
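As a rough sketch of that split (the actual survey platform and field names aren't described here, so everything below is an illustrative assumption), the rating and ranking steps could be modeled as two separate pages, with the digital form enforcing the use-each-rank-once constraint that the paper version could not:

```python
# Illustrative sketch only: a hypothetical page model for the digital
# survey, not the actual implementation.

ITEMS = [f"skills question {i}" for i in range(1, 7)]  # six questions per section

def rating_page(items):
    """Page 1: rate each item on a scale anchored by verbal labels,
    with no numbers shown to the respondent."""
    return {
        "title": "How well does each statement describe you?",
        "questions": [
            {"text": text, "scale": ("least like me", "most like me")}
            for text in items
        ],
    }

def ranking_page(items):
    """Page 2: rank the same items by relative importance."""
    return {
        "title": "Rank these from 1 (most important) to 6 (least important).",
        "questions": [{"text": text, "rank": None} for text in items],
    }

def valid_ranking(ranks):
    """True only if each rank 1..n is used exactly once -- the
    constraint the paper survey could not enforce."""
    return sorted(ranks) == list(range(1, len(ranks) + 1))

pages = [rating_page(ITEMS), ranking_page(ITEMS)]  # two consecutive pages
assert valid_ranking([3, 1, 6, 2, 5, 4])
assert not valid_ranking([1, 1, 2, 3, 4, 5])
```

Beyond simply reducing clutter, the two-page flow lets the form validate the ranking before the respondent moves on, which is the real advantage over paper.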
The open-ended questions were unchanged, but the other changes to the survey led to more helpful responses in that section. Compared to the first survey, more responses suggested useful extensions to the survey rather than simply highlighting its obvious failures.
The final version of the survey that appears on the website is different again, based on the results of the second pilot test. The demographics section has been expanded further, now collecting 18 items across 10 questions. Question text has again been reworded in some instances. And, finally, all numbers have been dropped from the survey: questions are unnumbered, and all responses are given on scales presented without numbers.
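As a minimal sketch of what an unnumbered scale can look like (the plain-text rendering and the ten-point width below are assumptions, not the website's actual presentation), the respondent sees only the verbal anchors and a row of identical marks, while a numeric value is recovered internally from the position selected:

```python
def render_scale(n_points=10, left="least like me", right="most like me"):
    """Text shown to the respondent: endpoint labels and unlabeled
    marks, with no numbers anywhere."""
    marks = "   ".join("( )" for _ in range(n_points))
    return f"{left}   {marks}   {right}"

def score(selected_position, n_points=10):
    """Map the selected mark (0-indexed, left to right) back to a
    1..n_points value for analysis; the respondent never sees it."""
    assert 0 <= selected_position < n_points
    return selected_position + 1

print(render_scale())
```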
Thanks for sharing your process, Jeff. Great look at the changes you made and why you made them. This is a wonderful example of something that will actually be put into practice, which makes things even more worthwhile.