Saturday, January 17, 2009

Alternatives to Kirkpatrick

While the Kirkpatrick taxonomy is something of a sacred cow in training circles—and much credit goes to Donald Kirkpatrick for being the first to attempt to apply intentional evaluation to workplace training efforts—it is not the only approach. Apart from being largely atheoretical and ascientific (hence, 'taxonomy', not 'model' or 'theory'), several critics find the Kirkpatrick taxonomy seriously flawed. For one thing, the taxonomy invites evaluating everything after the fact, focusing too heavily on end results while gathering little data that will help inform training program improvement efforts. (Discovering after training that customer service complaints have not decreased only tells us that the customer service training program didn’t “work”; it tells us little about how to improve it.)

Too, the linear causality implied within the taxonomy (for instance, the assumption that passing a test at level 2 will result in improved performance on the job at level 3) masks the reality of transfer of training efforts into measurable results. There are many factors that enable or hinder the transfer of training to on-the-job behavior change, including support from supervisors, rewards for improved performance, culture of the work unit, issues with procedures and paperwork, and political concerns. Learners work within a system, and the Kirkpatrick taxonomy essentially attempts to isolate training efforts from the systems, context, and culture in which the learner operates.

In the interest of fairness, I would like to add that Kirkpatrick himself has pointed out some of the problems with the taxonomy and suggested that, in seeking to apply it, the training field has perhaps put the cart before the horse. He advises working backwards through his four levels, more as a design strategy than an evaluation strategy; that is: What business results are you after? What on-the-job behavior/performance change will this require? How can we be confident that learners, sent back to the work site, are equipped to perform as desired? And finally: how can we deliver the instruction in a way that is appealing and engaging?

An alternative approach to evaluation was developed by Daniel Stufflebeam. His CIPP model, originally covering Context-Input-Process-Product/Impact, and later extended to include Sustainability, Effectiveness, and Transportability, provides a different take on the evaluation of training. Western Michigan University has an extensive overview of the application of the model, complete with tools, and a good online bibliography of literature on the Stufflebeam model. Short story: this one is more about improving what you're doing than proving what you did.

More life beyond Kirkpatrick: Will Thalheimer endorses Brinkerhoff's Success Case evaluation method and commends him for advocating that learning professionals play a more “courageous” role in their organizations.

Enough already, Jane! More later on alternatives to the Kirkpatrick taxonomy. Yes, there are more.

(Some comments adapted from the 'evaluation' chapter in my book, From Analysis to Evaluation: Tools, Tips, and Techniques for Trainers. Pfeiffer, 2008.)


Anonymous said...

I've always thought Kirkpatrick's model ("model" in the sense of "sketch," not, as you say, a scientific theory) was a handy heuristic, but not much more than that. Unfortunately, some people in the training field (maybe pressed for time, maybe unsure what else to do) cling to it like a nervous swimmer grasping the side of the pool.

I recently painted myself in a corner, during a conference call for a project I'm working on, when I said that level one evaluations don't tell anything about learning. While I don't think there's much of a case to the contrary, longtime trainers on the call -- who felt they gained valuable information from smile sheets -- saw this as an effort to cut off one stream of data about the classes they conduct.

I've used what I call "level 1.5" evaluation -- like an email questionnaire, when people are back on the job. One took the form of three or four dozen discrete skills related to both custom and off-the-shelf software. The basic form was, "How easily can you add a contract to an existing account?" (" an attachment received in email?" "...complete your expense report?")

Why level 1.5? On the one hand, it's still self-reported data; on the other, for skills close to the core of the job, the sales reps routinely reported 3.5 and above (on a scale with 4 at the top).

This was useful data for us as we continued the program, and for comparing with the client's more indirect indicators.

I hadn't heard of Stufflebeam, so I'm glad you wrote about this.

Don Bolen said...

Back in the last century (in the 80s), we used an approach similar to the one Dave is describing. Within our hospital-based training group, we asked participants on the course eval forms to list three things they learned. In some cases we provided the form approach: name the x steps to performing y ...

This info was particularly helpful for the facilitators and in course design/development. We found that a significant percentage of the learners completed this form.

Don Bolen

Anonymous said...

I have never liked Kirkpatrick's Model and I've been in the ISD field for over 30 years.

The problem that I face is how to make the most of it when you are forced to do so by your client.

The solution is to use whatever you think will do a better job, and then shoehorn it into one of the four levels.

For starters, the whole notion of "Satisfaction" as a desirable outcome is one big bag of baloney. Measuring "Satisfaction" is probably the most meaningless exercise that educators and trainers can do because it does not assess ability or desire to learn.

It's an amorphous, "feel-good" measure that has no direct application to subsequent behavior.

Like, if the food at a restaurant sucks, or the silverware is dirty, who is going to care if the service was great?

I have had great instructors teaching sucky courses, and I've had sucky instructors teaching great courses.

The bottom line is still, "What the heck did I actually learn, and can I use it in my career?"

Sreya Dutta said...

Thanks, Jane. This is very useful and very much the kind of information I was looking for.

I set up a post to gather more about what people think on my blog.


Jason Willensky said...

Great piece, Jane. My clients rarely question Kirkpatrick. Of course, they don't question learning styles, either.

Jason Willensky said...

Here's a link to Brinkerhoff's material on Success Case, evaluation, and training from the 2003 ASTD conference (PDF):

Leslie Allan said...

Hello Jane. Thanks for a thought-provoking piece. I do think you are a bit harsh on Kirkpatrick in a couple of places. I see his taxonomy as not just inviting evaluation at the end of the course. If it is a long program, evaluations at Levels 1, 2 and 3 can identify weak points in the program that can be fixed whilst the program is being rolled out.

Secondly, the linear causality implied by the taxonomy is a form of "necessary" causality and not "sufficient" causality. What it is saying is that if you want trainees to apply the learning (Level 3), then they will need to actually learn the material first (Level 2). It makes the simple point that you cannot apply what you have not learned. I think it is a misreading of Kirkpatrick to say that learning at Level 2 necessarily leads to application at Level 3. If you read his writings and those of his son Jim, you will see that they urge the opposite. Kirkpatrick is entirely consistent with a systems model of organizational behaviour, and I think helps us understand the linkages. In that sense, it helps us understand the organizational context of training efforts rather than masking it.

Leslie Allan
Author: Training Evaluation Toolkit

Donald Clark said...

Jane, you asked for our $.02 worth on Twitter...

I quite agree with Leslie. I also agree with you on using it as a backwards planning model. If you use it that way, then it is a breeze to actually evaluate with once you move forward and implement the program.

The only part of the model that I find fault with is the Reaction part. When a learning platform is delivered, whether it be elearning, classroom training, social, or blended, the learner has to make a decision as to whether he or she will pay attention to it. If the goal or task is judged as important and doable, then the learner is normally motivated to engage in it (Markus, Ruvulo, 1990). However, if the task is presented as low-relevance or there is a low probability of success, then a negative effect is generated and motivation for task engagement is low.

Thus, reaction should go deeper than smiley sheets; that is, it should be used to evaluate whether the program is actually going to benefit the learners and whether it meets Markus and Ruvulo's two tests.

Someone noted that the customers don't want to hear about the different levels. I agree, but the first three levels should be used as formative evaluations to improve the learning platform while it is in progress. The last one or two levels may be used as summative evaluations to judge its worth.

At times you don't even need level four. For example, if I train employees to perform a new process, I only have to ensure they can perform at level three (perform the new process) as it is up to the creators of the new process to prove its worth.