Fidelity to evaluation theory: how important is it in your practice?


Evaluation, I find, is all about mixing and matching your tools. But I’ve seen some evaluations that claim to adopt [insert evaluation theory], when really they’re skipping over several pieces of that theory.

So I’m wondering: When applying an evaluation theory to your work, how important to you is fidelity (as in, the degree to which implementation matches the theory)?

Taking Realist Evaluation as an example, complete fidelity might involve identifying mechanisms, contextual factors, and outcomes; analyzing C-M-O configurations; writing C-M-O pattern statements; and everything else that Realist Evaluation theory proposes. On the other hand, some people might use Realist Evaluation more as a mindset, just generally focusing on the interaction of mechanisms, context, and outcomes throughout a project (maybe that is considered “partial fidelity”?).

What do you think: when is it okay to just ‘borrow’ specific pieces from an evaluation theory? When is this problematic? Are some eval theories meant to be more descriptive and others more prescriptive?


Great question!! In my experience reading others’ evaluation reports, and based on Tina Christie’s dissertation, most evaluators seem to use only pieces of theories, making a mosaic for their evaluation.

Personally, I’m all for borrowing specific components of theories. However, I think we should be clear about that when communicating with other evaluators (not necessarily with stakeholders, who probably know/care nil about eval theory). I believe there is research on empowerment evaluation and utilization-focused evaluation (I think Devin Wisner did the UFE work) showing that few evaluators adhere to those approaches with high fidelity.

However, I’m of the mind that fidelity (even to program design) is not 100% necessary. When experienced individuals are able to use contextual information to adjust programmatic/evaluation activities, I think that can only improve the program/evaluation. Not everyone in the fidelity-of-implementation (FOI) literature is of that mind, though, and there is a study showing that this kind of adaptation is best done by experienced individuals (compared to novices); I can search for that study if you’re interested.


I tend not to think of evaluation “theories” as theories (i.e., explanatory models with predictive capacity) so much as practice approaches, ways of thinking about how to do evaluation grounded in experience and supportive research. And “fidelity” feels like a weird requirement, to be honest, since the approaches I’m drawn to offer quite a lot of leeway and are based on principles with interpretive flexibility, not hard-and-fast absolute requirements. So I prefer “alignment”, as in I do align my practice with certain approaches but I also adapt as needed to the requirements of the situation and its context. Context-sensitivity is so critical to doing good evaluation work that it trumps fidelity, in my opinion. What is important, though, is having good clarity and transparency about what decisions you are making and why, knowing what approaches are informing your work, and having a meaningful understanding of the implications of various approaches.


Some great points, @danawanzer & @c_camman. Thanks! And I appreciate the focus on transparency. Our area of work involves so, so many decisions throughout each project. Without transparency, stakeholders are left just wondering how these decisions get made. Just another example: I’ve seen ROI values reported with no indication of how they were calculated. WHYYYYYY.
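Just to illustrate the transparency point with a minimal sketch (all figures and variable names here are hypothetical, not from any real report): even a one-line ROI statement can show its working instead of appearing as a bare percentage.

```python
# Hypothetical illustration: reporting ROI with the calculation visible,
# rather than as an unexplained percentage. All numbers are made up.
program_costs = 50_000        # total program costs over the period ($)
monetized_benefits = 65_000   # benefits monetized for the same period ($)

# Standard ROI formula: net benefits divided by costs
roi = (monetized_benefits - program_costs) / program_costs

print(f"ROI = ({monetized_benefits} - {program_costs}) / {program_costs} "
      f"= {roi:.0%}")
# → ROI = (65000 - 50000) / 50000 = 30%
```

The point isn’t the arithmetic (it’s trivial); it’s that the reader can see which benefits were monetized, over what period, and against which costs, instead of having to trust a naked number.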

Having recently come from my grad studies, I sometimes still think about fidelity as a rigid quantitative measurement. And I think that’s part of why I’m hung up on “what happens if this one piece of the theory isn’t applied?? Is there a quantifiable negative impact??” I suppose it’s just a bit of a thought exercise, though. If Developmental Eval is applied but the co-creation principle is missed, the repercussions are probably a lack of ownership, misalignment between the evaluation and the evaluand, etc. It’d be interesting to see some research into exactly what happens when certain criteria or principles of an approach are omitted, but a little imagination and thought can be telling too.


I immediately thought of Dr. Christie’s dissertation; glad you shared that, @danawanzer. I find @c_camman’s distinction about eval theory apt and like the idea of ‘alignment’, as I’m not even sure eval theorists or exemplars would consider 100% fidelity 100% of the time possible or even ideal. How else would we innovate and improve? Full fidelity assumes infallibility of prescriptions and a precise fit with context. Even the notion of contingency theories of evaluation (picking the right evaluation model for the evaluation context) suggests there need to be degrees of freedom for adaptability, as we don’t have eval theories that perfectly align with all the contingencies evaluators face in the real world.

That being said, there are some evaluation researchers and descriptive theories of evaluation that can inform how we apply and align or depart from prescriptive theories of eval, even those prescriptions we’re theorizing in this thread.

First, Nick Smith distinguishes between eval theories, models, and approaches. Theories are ideas about issues in eval (and not theories in the scientific sense), like the role of causal explanation in eval. Models are collections of resolutions to those issues, comprising prescriptions of what good eval practice looks like, like the collection of prescriptions known as Realist Evaluation (which resolves the issue of the role of explanation in eval). Approaches are collections of models that share elements in their broad application, like theory-based/driven eval (of which Realist Eval is a form).

With this distinction, recommendations that lean closer to approaches, like developmental eval, provide more room for ‘realignment’ or flexibility from initial prescriptions. Models provide less flexibility, although some more than others. I see models and approaches as a spectrum of specificity in prescriptions for good eval practice.

I find another eval theory/research framework sheds light on the question of fidelity here. Robin Miller offers five criteria for empirical examinations of eval theory: operational specificity, range of application, feasibility in practice, discernible impact, and reproducibility. Models with high degrees of operational specificity, claims of discernible impact, and reproducibility would seem to require higher alignment or fidelity. Models with a wide range of application and high feasibility would seem to lend themselves to more flexibility and realignment.

Realist Evaluation would demand a higher degree of alignment than many other models. In fact, Pawson spent much of his 2013 manifesto book discussing fidelity and alignment, while ceding room for flexibility within certain constraints.

As with much of eval, it would appear to depend, but I would agree with others that there’s less fidelity in practice than we might think.


Great, thanks for your thoughts. The need for flexibility that people are highlighting makes sense. To me, it mirrors evaluation debates around RCTs vs. more natural experimental designs. RCTs aim to remove external variables from measurement - good for basic research, but potentially harmful for eval by ignoring the ‘messiness’ of contexts we work within. Strict adherence to fidelity can also ignore this messiness; adapting evaluation approaches and theories helps to meet the demands of external/contextual variables.

I hadn’t heard of those distinctions by Smith. The thought of eval theory simply as ‘issues in eval’ is hard to accept because of that difference with scientific theories, but I can see how the categorization is helpful for thinking about fidelity. I’ll have to read into that some more!


Thank you for raising this interesting topic, Evan! The discussion around the meaning of “theory” even is reminding me how important it is to be conscious of the language we use and how we can mean quite different things from each other without realizing it, because we’re working across such diversity of backgrounds, perspectives, and experiences. That diversity is a strength of the evaluation field, partly because it requires us to be thoughtful and clear in our discussions and also partly because the clarification discussions can give rise to interesting new ideas!