Malcolm Gladwell's new collection of essays, What the Dog Saw, includes a piece on the Challenger explosion. Essentially, he asserts, most failures of this magnitude can't be traced to a single mistake or one bad decision-maker. Sure, hindsight being what it is, things could have been done differently -- but a failure like this requires several things to go wrong, sometimes in a particular chronological order or pattern. In other words, the problem is the result of a system failure.
And therein lies the central problem with the traditional (think 4-level) means of "evaluating" training. There are 1,001 things (let's call them 'variables') standing between a freshly-trained worker and successful performance, from bad tools to a bad hard drive to, yes, a bad supervisor. Attempting to isolate the worker from the rest of the system in which he or she works invalidates the evaluation by removing context and circumstance -- and if the desired performance still isn't there, this approach to evaluation doesn't tell us how to fix it.
If you've been led to believe there's only one approach to evaluating training, try Googling around for Stufflebeam, Brinkerhoff, Stake, and Scriven. And there are others, so keep Googlin'. Perhaps something else would better meet your needs for informing both your formative and summative evaluation processes.
Or maybe you're already using something else? If not the 4 (or 5)-level taxonomy, what are you using to figure out whether training is really "working"?