Only about one in five new public programs shares any real outcome data in its first few years. Yet leaders still claim success. In this episode, we drop into the launch room of a shiny new jobs program—and follow what happens when nobody agrees how to measure “working.”
The awkward truth is this: most programs are built like impressive stages with no plan for where to aim the spotlight. Goals are announced, budgets approved, partners lined up—but the basic questions stay fuzzy. How many people should be better off, by when, and according to whose standard? Without those choices, “success” becomes whatever is most convenient to claim at the next press conference. In this episode we move from slogans to systems: specifying what we are trying to change, how we think that change will actually happen, and which signals will tell us—early—if we’re off track. We’ll look at how strong programs lock in clear objectives, a testable theory of change, and baseline numbers before the first dollar moves, so that learning isn’t an optional add-on but the core of the design.
The gap between ambition and impact usually opens in the first few weeks: hiring a team, signing partners, rushing to “get money out the door.” This is where many initiatives hard‑code vagueness into contracts, reporting templates, and IT systems. Later, when leaders finally ask, “What difference are we making?” the data they need literally doesn’t exist. In this episode we’ll slow down that early scramble and zoom in on three neglected moves: choosing the few results that truly matter, wiring simple data flows from day one, and agreeing upfront how decisions will change when the numbers do.
The pattern in the data is blunt: programs that wire in measurement from day one routinely beat those that don’t. McKinsey finds that projects with clear KPIs are three times more likely to hit their budgets and timelines. J‑PAL’s database shows that vocational training initiatives using iterative evaluation delivered a 16% median return on investment, compared with 3–5% where leaders flew mostly blind. This isn’t about academic neatness; it’s about who actually gets better jobs, skills, or incomes for each unit of effort.
So what does “wiring in measurement” really mean in practice?
First, turning broad ambitions into a small, sharp set of KPIs that different actors can actually control. A national ministry might track employment rates and earnings; a local provider might track course completion, job placement within six months, and retention at 12 months. The trick is to cascade indicators so each level sees its own contribution while staying linked to the bigger goal.
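To make that cascade concrete, here is a minimal sketch in Python. The indicator names, owners, and targets are all hypothetical, and a real system would live inside whatever reporting stack the program already uses; the point is only the shape of the structure.

```python
from dataclasses import dataclass, field

@dataclass
class Indicator:
    name: str      # what is measured
    owner: str     # who can actually move this number
    target: float  # the threshold agreed before launch
    feeds: list["Indicator"] = field(default_factory=list)  # higher-level KPIs this one supports

# National level: the outcome the ministry answers for.
employment_rate = Indicator("employment_rate_12m", "ministry", 0.60)

# Local level: indicators a provider controls, each linked upward
# so its contribution to the national goal stays visible.
completion = Indicator("course_completion", "provider", 0.80, feeds=[employment_rate])
placement = Indicator("job_placement_6m", "provider", 0.50, feeds=[employment_rate])
retention = Indicator("retention_12m", "provider", 0.70, feeds=[employment_rate])
```

The `feeds` link is the whole trick: every local number can be traced upward to the national goal it serves, so a provider sees its own contribution without losing sight of the bigger picture.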
Second, committing to publish and review results on a timetable, not when it’s politically convenient. Right now, only a minority of programs anywhere in the world release outcome data within two years. That delay isn’t neutral—it locks in weak designs, because no one can see what’s failing fast enough to fix it. Regular dashboards, even if imperfect at first, create a shared reality between finance officials, delivery teams, and citizens.
Third, building feedback loops that genuinely change decisions. Adaptive management sounds fashionable, but it’s concrete: a standing meeting where data trigger action. Applications from one group are stalling? Simplify forms there first. One training provider consistently underperforms? Shift participants or support. Learning happens when budgets, contracts, and political attention are allowed to move toward what works.
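As a sketch of what “data trigger action” can mean, the rule below flags any segment whose metric has sat below target for two consecutive review cycles, which is exactly the kind of item a standing meeting can act on. The function, the district names, and the thresholds are illustrative, not drawn from any specific program.

```python
def flag_for_review(history: dict[str, list[float]], target: float, periods: int = 2) -> list[str]:
    """Return the segments whose metric has been below target
    for the last `periods` consecutive review cycles."""
    flagged = []
    for segment, values in history.items():
        recent = values[-periods:]
        if len(recent) == periods and all(v < target for v in recent):
            flagged.append(segment)
    return flagged

# Illustrative data: job-placement rates per district over three review cycles.
placement_history = {
    "district_a": [0.55, 0.58, 0.61],
    "district_b": [0.42, 0.39, 0.37],  # stalling: this goes on the meeting agenda
}

print(flag_for_review(placement_history, target=0.50))  # -> ['district_b']
```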
The UK’s What Works Centres show what this looks like at scale: they review evidence across about £200 billion in spending each year and steer funds toward interventions with demonstrable impact, while de‑funding those that disappoint. That’s the opposite of “spend equals success.”
A common worry is that measurement will freeze innovation. In reality, timely metrics are closer to a sports coach’s video replay: a way to experiment more boldly, precisely because you can see, quickly, which plays are working and which need to be dropped.
A city runs a vocational training initiative for laid‑off factory workers. Instead of counting only how many people attend classes, they track three things from day one: how many finish, how many get offers within three months, and how those wages compare to previous jobs. Early data show high completion but low offers in one district. Digging in, they discover local employers need digital repair skills, not just mechanical ones. Within a quarter, the curriculum shifts and placement rates jump.
In another case, a regional government launches a small loan scheme for micro‑retailers. They log not just disbursements, but monthly sales, stock‑outs, and on‑time repayment for a random sample of shops. A simple dashboard flags that women‑owned stalls have strong repayment but weaker sales growth. Follow‑up interviews reveal safety concerns limiting evening hours. The next funding round bundles small grants for secure lighting and co‑located childcare; subsequent cohorts show both higher profits and lower default.
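Both stories rest on the same simple move: compare a metric across segments and flag divergences worth a follow-up conversation. A minimal sketch, with invented numbers standing in for the sampled shop data:

```python
# Invented sample: per-segment averages computed from the monitored shops.
segments = {
    "women_owned": {"repayment_on_time": 0.96, "sales_growth": 0.02},
    "men_owned":   {"repayment_on_time": 0.91, "sales_growth": 0.08},
}

def divergences(segments, strong_metric, weak_metric, strong_min=0.90, gap=0.02):
    """Flag segments that clear the bar on one metric yet trail the
    cross-segment average on another: a prompt for interviews, not a verdict."""
    avg = sum(s[weak_metric] for s in segments.values()) / len(segments)
    return [name for name, s in segments.items()
            if s[strong_metric] >= strong_min and s[weak_metric] < avg - gap]

print(divergences(segments, "repayment_on_time", "sales_growth"))
# -> ['women_owned']: strong repayment, weaker growth. The dashboard surfaces
#    the pattern; the follow-up interviews explain it.
```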
As tools mature, “measurement” stops being a report and becomes a living system. AI can surface weak spots the way spell‑check highlights typos, suggesting mid‑course fixes leaders might miss. Blockchain‑backed logs could make quiet data edits as visible as scratched‑out lines in a shared notebook. Citizens who can read these signals will press for funding rules that say: no credible learning loop, no large budget—pushing institutions to treat adaptation as core work, not decoration.
When you treat implementation like tuning an instrument instead of playing a fixed score, small adjustments compound. A tweak to targeting here, a redesigned form there, and suddenly uptake, trust, and durability all shift. Over time, programs stop being frozen monuments and start behaving more like living ecosystems that learn, shed dead branches, and grow toward real needs.
Before next week, ask yourself: 1) “If I had to define success for this initiative in one sentence that doesn’t mention spending levels or vanity metrics, what would it be, and how would a participant actually experience that success in their day-to-day?” 2) “Looking at the metrics I’m tracking right now, which ones are true leading indicators (e.g., completion rate, time to first job offer, early retention) and which are just ‘nice to know’, and what will I stop tracking so my dashboard only shows what drives decisions?” 3) “What is one experiment I can launch this week (with a clear hypothesis, owner, start and end dates, and a success metric) that would help me learn faster about whether our implementation steps are truly moving those leading indicators?”