For a number of reasons, the agencies responsible for evaluating social programs in both industrialized and developing countries have been much slower in adopting data science approaches than their colleagues who work in research and planning.
York, P. and Bamberger, M. (2020), p.1.
Big Data and Evaluation: a tale of two cities
Beyond the humanitarian sector, across the public policy and development communities, the conversation has been taking place for years. At evaluation conferences from Europe to Asia, from the Americas to Australasia, panels have been discussing the question of how to apply big data to the evaluation of public policy programmes for the best part of a decade.
There are plenty of reasons for evaluation offices to integrate big data tools in their work. Many of these tools promise to speed up the data collection process and increase its reach, thus freeing up the time of the evaluator to focus on the truly evaluative aspects of their work. Likewise, advanced data analytics can help evaluators to increase the depth of their analysis, evaluate complex programmes, and look for patterns that would otherwise not be possible to spot, let alone verify.
Yet at evaluation meetings on this topic, any mention of the potential gains is invariably followed by questions and concerns. Big data tends to arouse suspicion. It has a reputation as a noisy, messy, dynamic beast, unfit for the slow and careful analysis to which evaluation aspires. How can we ensure the validity of findings drawn from big data sources? What about the population-biasing effects of digital data collection? How can we ever hope to filter out the “noise” from the “signal” in the big data multiverse?
As York and Bamberger (2020) note, this suspicion is not so widespread among the programme and policy research communities. So why is it any different for evaluators? And how can we learn from programme teams, information management specialists and even in-house MEAL staff who are looking increasingly to adopt the tools provided by the data revolution to improve their work?
What’s holding us back?
York and Bamberger (2020) argue that the problem boils down to something like a cultural divide. Evaluators and data scientists simply aren’t talking enough. Like all cultural divides, this one is reinforced by a number of preconceived ideas about what life is like on the other side. These boil down to disagreements about theory (roughly, how does one reliably measure sociological phenomena and their causes?), data quality and validity (what constitutes good data? what constitutes valid reasoning?), and selection bias (how do we sample a population with minimum biasing effect?).
Picciotto (2020) adds that the problems arise equally from both sides of the divide. The evaluators are still so absorbed in “needless methodological conflict” between qualitative and quantitative approaches that they are unable to look up and see the new kid on the block. The big data enthusiasts, meanwhile, have been blinded by the sheer power of the numbers available to them, and remain unable to think about causal proof beyond simply correlating observations over vast datasets.
What can we do about it?
How can the divide be overcome? How can evaluators start making use of the potential that big data provides them, whilst mitigating the risks? York and Bamberger (2020) propose a fourfold approach:
- Establish new ways to build bridges between the two communities
- Integrate both evaluation and big data tools in capacity development programmes and learning
- Promote “landscaping research” to map the two ecosystems and look for synergies between them
- Provide seed funding for areas of collaborative research between evaluation and big data specialists
For humanitarian evaluators, closer collaboration with data science holds out the prospect of:
- Improving the ability of humanitarian agencies to measure their results in hard-to-reach communities
- Getting a better understanding of their impact when control trials are either unfeasible or inappropriate
- Focusing budgets on data analysis rather than manual data gathering
Each of these goals already absorbs considerable attention within the evaluation community. So any chance to tackle all three deserves time and investment to make it happen. But it will require engagement from all the major players in the humanitarian evaluation system: from donors and the UN system, to NGOs and the Red Cross / Red Crescent movement, researchers, evaluators and data scientists alike.
References:
- Picciotto, R. (2020). Evaluation and the Big Data Challenge. American Journal of Evaluation, Vol. 41, Issue 2.
- York, P. and Bamberger, M. (2020). Measuring Results and Impact in the Age of Big Data: the nexus of evaluation, analytics and digital technology. Rockefeller Foundation.
I think there is possibly a fourth contributor to ‘the cultural divide’. Many evaluators, myself included, have got used to thinking about theory-led evaluations, where you start with a theory of change, identify what propositions need to be tested, then seek out evidence for and against those propositions. In contrast, data scientists using machine learning algorithms are using inductive methods, searching through piles of data for regular associations. This approach may not sit comfortably with people who have been told time and again that correlation does not equal causation. Companies using machine learning to develop useful predictive models can get a lot of value out of a prediction-oriented approach, without worrying too much about underlying causes. But evaluators, because they are concerned with interventions that seek to change behaviours one way or another, are understandably more concerned with causal models.
So how to bridge such a cultural divide, if it does exist to some extent? I have a mixed bag of suggestions. One is the concept of ‘loose’ theories of change. A loose theory of change is a list of attributes of the context and intervention which we think, with some justification, are contributing to an outcome of interest, but whose combinations and sequences we may not yet be very clear about. Data on these attributes can then be analysed using one of a number of machine learning algorithms.

When a good predictive model is found, evaluators need to take a deep breath and not be put off by the mantra of ‘correlation does not mean causation’. They should then recognise that the absence of a correlation/association between two attributes X and Y does mean that there cannot be a direct causal link between them. So straight away a machine learning exercise tells us which of the possible candidate associations cannot have a causal role. This is one of the things that evaluators are supposed to do: look at alternative explanations and exclude those that don’t fit the facts. They then need to consider the proposition that ‘an association is necessary but not sufficient for a causal claim’. This suggests that the next step is to look for other sources of evidence as to whether there is an underlying causal mechanism at work, or not. One such source of evidence is detailed within-case inquiries of true-positive cases, false-positive cases, and false-negative cases, as suggested here: https://evalc3.net/how-it-works/within-case-analysis/ . Other related tools can be found on the EvalC3 website, which describes the workings of an Excel application that combines simple machine learning algorithms with an approach to the analysis of causal configurations taken from Qualitative Comparative Analysis (QCA) – https://evalc3.net
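To make that sequence a bit more concrete, here is a rough Python sketch of the same workflow. To be clear, this is not EvalC3 itself (which works in Excel), the attribute names are invented, and a shallow decision tree simply stands in for whichever algorithm is used to search for associations:

```python
# A minimal, hypothetical sketch of the workflow described above:
# start from a "loose" list of binary attributes, let a simple algorithm
# search for a predictive pattern, then flag the cases that deserve
# detailed within-case follow-up. All attribute names are invented.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Each row is a case; columns record the presence/absence of attributes
# from a loose theory of change, plus the outcome of interest.
cases = pd.DataFrame(
    {
        "community_mobilised": [1, 1, 0, 1, 0, 1, 0, 0],
        "cash_transfer":       [1, 0, 1, 1, 0, 1, 1, 0],
        "local_partner_led":   [1, 1, 0, 0, 1, 1, 0, 0],
        "outcome_achieved":    [1, 1, 0, 1, 0, 1, 1, 0],
    },
    index=[f"case_{i}" for i in range(1, 9)],
)

X = cases.drop(columns="outcome_achieved")
y = cases["outcome_achieved"]

# A shallow decision tree stands in for "a machine learning algorithm
# searching for regular associations".
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
cases["predicted"] = model.predict(X)

# The model only tells us about association. These lists tell us which
# cases to go back to for evidence of an underlying causal mechanism.
true_positives  = cases[(cases["predicted"] == 1) & (y == 1)]
false_positives = cases[(cases["predicted"] == 1) & (y == 0)]
false_negatives = cases[(cases["predicted"] == 0) & (y == 1)]

print("Candidates for within-case follow-up:")
print("  true positives: ", list(true_positives.index))
print("  false positives:", list(false_positives.index))
print("  false negatives:", list(false_negatives.index))
```

The point of the true-positive/false-positive/false-negative lists is not to prove anything; it is simply to identify which cases are worth the detailed within-case inquiry.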
Hi Rick,
Thanks so much for your comment! I think you’ve hit the nail on the head regarding the theory-based approach versus the inductive reasoning of ML approaches. Thanks for linking to the EvalC3 website too – I’ll add it to the Data Conscious directory so everyone can access it.
I like your suggestion of developing “loose” theories of change that can help us look for a wider range of associations than we otherwise would. I think this is useful even without the ML tools we’re discussing here, as it speaks to the discussion on systems thinking and evaluation that has been raging/progressing over recent years. In humanitarian evaluation in particular, I think there are good grounds for this type of approach because of the significant role of external actors and factors in humanitarian access, protection, and outcomes: these all beg for a wider appreciation of associations and change than strict TOC models have traditionally focused on. I wonder if there’s space for revisiting our understanding of TOCs in humanitarian action in the near future?
That said, I also wonder if the “straightforward” inductive ML models miss a point on causation. Judea Pearl’s work on an interventionist model of causation seems relevant here, and I wonder how it could be brought to bear on a complexity-sensitive analysis of change in humanitarian crises. Would love to hear your thoughts!
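In case it helps to show what I mean by “interventionist”, here’s a toy simulation (made-up numbers and variable names, nothing more) in which a hidden factor influences both who receives an intervention and how they fare. The association we observe in the data then overstates what we would see if we actually intervened, and if I understand Pearl correctly, that gap between P(outcome | cash) and P(outcome | do(cash)) is exactly what his interventionist account is about:

```python
# A toy, purely illustrative simulation of the observational vs
# interventionist distinction. All numbers and variable names are invented.

import random

random.seed(1)
N = 200_000

def p_good(cash, conflict):
    """Probability of a good outcome given cash assistance and conflict."""
    return 0.5 + (0.2 if cash else 0.0) - (0.3 if conflict else 0.0)

# --- Observation: cash is delivered less often where there is conflict ---
observed = []
for _ in range(N):
    conflict = random.random() < 0.5                      # hidden confounder
    cash = random.random() < (0.2 if conflict else 0.8)   # allocation depends on it
    good = random.random() < p_good(cash, conflict)
    observed.append((cash, good))

with_cash    = [good for cash, good in observed if cash]
without_cash = [good for cash, good in observed if not cash]

# --- Intervention: force cash (or no cash) for everyone, i.e. do(cash) ---
def do(cash_value):
    good = 0
    for _ in range(N):
        conflict = random.random() < 0.5                  # confounder unchanged
        good += random.random() < p_good(cash_value, conflict)
    return good / N

print(f"P(good | cash)        ~ {sum(with_cash) / len(with_cash):.2f}")
print(f"P(good | no cash)     ~ {sum(without_cash) / len(without_cash):.2f}")
print(f"P(good | do(cash))    ~ {do(True):.2f}")
print(f"P(good | do(no cash)) ~ {do(False):.2f}")
# The observed gap is larger than the interventional gap, because conflict
# both suppresses cash delivery and worsens outcomes.
```

Here a purely correlation-based comparison would credit cash with more of the difference in outcomes than it actually causes; the "do" runs recover the effect of the intervention itself.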
Cheers,
N
Could you expand on “Judea Pearl’s work on an interventionist model of causation” and its possible relevance? I have “The Book of Why” on my shelves but did not make much progress with it, probably because I was distracted by other, easier-to-read books.