The Machine That Decides Who Loses Their Benefits

The DWP profiles close to a million Universal Credit claimants a year using an algorithm whose predecessor was, by the department's own assessment, wrong 65 per cent of the time. There were "no immediate concerns." Then the department went to court to keep the technical details secret. Social credit scores arrived quietly.

The Department for Work and Pensions profiles close to a million benefit claimants a year using a machine-learning model whose internal logic it refuses to fully disclose, whose own assessments have found statistically significant disparities across age, disability, nationality, and marital status, and which the department went to court to prevent the public from examining.

Officials who follow the machine's recommendation need offer no justification. Officials who override it must explain themselves in writing. The machine, in other words, is presumed right. The human must account for disagreeing.

An algorithm is simply a set of instructions, like a cooking recipe. Work through each stage and produce a result at the end. Every time you bake a cake, you're running an algorithm: add this to that, stir, then add this, go around again, add that, end up with cake. If you can make a starter dish, you can understand what an algorithm does.
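
To make that concrete, here is a minimal sketch in Python. Every feature name and weight is invented for illustration and has nothing to do with the DWP's actual model; it simply shows what "follow the steps, end up with a score" looks like as code.

```python
# A toy scoring "recipe". Every feature name and weight below is
# invented for illustration and has no relation to any real DWP model.

def score_claim(claim: dict) -> float:
    """Follow fixed steps; produce a number at the end."""
    score = 0.0
    if claim.get("advance_requested"):          # step 1: add this
        score += 0.3
    if claim.get("claim_age_days", 0) < 30:     # step 2: add that
        score += 0.2
    score += 0.1 * claim.get("address_changes", 0)  # step 3: stir
    return min(score, 1.0)                      # cap the score and serve

print(score_claim({"advance_requested": True, "claim_age_days": 10}))  # 0.5
```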

Consolidated Platform = Impersonal Mess

Before the algorithm, there was the platform. And the platform is itself a case study in the administrative state's favourite delusion: centralise, unify, digitise, and everything will work better at scale.

Universal Credit was announced in 2010 as the flagship welfare reform of the Coalition government. The idea was somewhat elegant: six separate means-tested benefits, each with its own rules, its own administration, its own delivery agency, would be merged into a single monthly payment. The complexity of the old system, its overlapping entitlements and perverse incentives, would be swept away. One benefit. One system. One platform.

And here we go again: Sir Humphrey vacuums up local government powers and pulls them to the centre.

The original completion date was 2017. It has been postponed seven times. The full migration of legacy claimants will not now finish until at least 2028, eighteen years after the programme was announced. The DWP has spent £2.3 billion on implementation. An early IT system costing over £300 million was found to be so inflexible the department had to review how much of it could be salvaged; £34 million was written off outright. Fraud and error in Universal Credit runs at 9.4 per cent of payments, against a forecast of 6.4 per cent. In 2019, a million people were receiving less than their entitlement, often because they were repaying the emergency loans they had taken out during the five-week wait for their first payment. Forty-four per cent of claimants face monthly deductions averaging £78. The system designed to simplify welfare created a new category of structural debt among the people it was supposed to help.

The pattern will be familiar to readers of this series.

  • The SEND reforms promised every child's needs would be met, then created an unfunded liability now exceeding £6 billion.
  • The CQC's Single Assessment Framework promised data-driven regulation, then swallowed 500 inspection reports and couldn't retrieve them.

Universal Credit promised to replace six complex benefits with one simple one. It replaced six imperfect systems, each with its own institutional memory and local relationships, with a single digital platform whose rigidity, delays, and error rates have been documented by the National Audit Office in report after report. And it is on this platform, now processing over six million claimants, that the DWP has built its automated profiling machinery.

The algorithm did not arrive in a vacuum. It was layered onto a system already struggling with its own complexity.

The British Social Credit Score

The system at the centre of this story is the Universal Credit Advances model, one of a growing suite of automated tools the DWP uses to flag claims for investigation. It scans applications for hardship payment advances and assigns each a risk score. That score determines what happens next: whether your claim proceeds smoothly, or whether you are pulled aside for further scrutiny, asked to produce documents, subjected to the slow grind of an investigation you may never fully understand.

The tool is a supervised binary-classification machine-learning model. The DWP's published technical description of it is a work of art.

3.3 - Frequency and scale of usage
Withheld under FOI Act - S31

4.1.1 - System architecture
Withheld under FOI Act - S31

4.1.2 - System-level input
Real-time UC claim data.

4.1.3 - System-level output
Probability of UC Advance being fraudulent.

4.2.1 - Model name
Withheld under FOI Act - S31

4.2.2 - Model version
Withheld under FOI Act - S31

4.2.3 - Model task
The model is designed to classify UC Advances requests as high risk of fraud or not.

4.2.4 - Model input
Features using claim characteristics and target variable based on historic Advances outcome data.

4.3.1 - Development data description
Withheld under FOI Act - S31

4.3.2 - Data modality
Withheld under FOI Act - S31

4.3.3 - Data quantities
Withheld under FOI Act - S31

4.3.5 - Data completeness and representativeness
Withheld under FOI Act - S31

4.3.6 - Data cleaning
Withheld under FOI Act - S31

4.3.7 - Data collection
Withheld under FOI Act - S31

4.3.8 - Data access and storage
Withheld under FOI Act - S31

4.3.9 - Data sharing agreements
Withheld under FOI Act - S31

4.4.3 - Data processing methods
Withheld under FOI Act - S31

4.4.4 - Data access and storage
Withheld under FOI Act - S31

4.4.5 - Data sharing agreements
Withheld under FOI Act - S31

The "Freedom of Information" Act is being used here to ensure information is suppressed, and it appears legitimate.

💡
For non-AI or non-technical readers: this jargon is simpler than it sounds. This "recipe" classifies applications as fraud or not fraud (binary options: false or true, 0 or 1). "Supervised" means a human labels the examples of fraud the model learns from, rather than leaving it to guess by itself. Scoring generally means a confidence or probability between 0 and 1 (0.1, or 10%, is far from certain; 0.9, or 90%, is high certainty). There is no reason whatsoever to keep the model hidden from the public. The individual cases used in the training data, perhaps. The idea that understanding how the model works will allow people to game or fool it illustrates how laughably luddite our Boomer civil servants are.
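
For those who want to see the same idea in code, here is a minimal sketch. The synthetic data, the made-up features, and the choice of logistic regression are all assumptions for illustration; the DWP has not disclosed its actual model type, features, or training data.

```python
# Minimal supervised binary classification, as described in the callout.
# Synthetic data and logistic regression are stand-ins: the DWP has not
# disclosed its model type, features, or training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # 200 "claims", 3 made-up numeric features

# "Supervised": a human supplies the labels (1 = fraud, 0 = not fraud).
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)     # learn from the labelled examples
probs = model.predict_proba(X[:5])[:, 1]   # probability each claim is fraud
print(probs.round(2))  # values between 0 and 1: 0.9 = high certainty, 0.1 = low
```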

The DWP says the algorithm is only used to prioritise cases. Final decisions, it insists, are made by human officials. This is technically true and functionally misleading. When the machine flags you, your case is referred to an officer. The officer sees the flag. The officer knows the machine put you there. If the officer agrees with the machine, nothing further is required. If the officer disagrees, he must record his reasons. The architecture is not neutral. It is designed to make agreement frictionless and disagreement effortful. The machine does not decide. It merely makes one decision vastly easier than the other.

An earlier version of this kind of tool, used by the DWP from 2020 to 2024 to help determine disability benefit eligibility, achieved a correct match rate of just 35 per cent. Sixty-five per cent of cases had to be corrected by a human. The machine was wrong nearly two out of three times. It was used for four years.

What The DWP Tried To Hide

In February 2024, the DWP conducted a fairness analysis of the Advances model. The results were not reassuring. The assessment found statistically significant disparities in both referral rates and outcomes across every characteristic it examined: age, disability, marital status, and nationality. Older applicants and non-UK nationals were more likely to be flagged by the model but less likely to have their claims ultimately refused. The machine, in other words, was pulling people aside on characteristics correlated with who they were rather than what they had done.

Rather embarrassing under the Equality Act. The model rated fraud risk highest in groups defined by protected characteristics.
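
What "statistically significant disparity" means is easy to demonstrate. The sketch below uses invented referral counts and a standard chi-square test; the DWP has not published the method or the figures behind its own analysis.

```python
# Testing whether a gap in referral rates between two groups could be
# chance. The counts are invented; the DWP has not published its figures.
from scipy.stats import chi2_contingency

#          referred  not referred
table = [[300,       9700],    # group A: 3.0% referral rate
         [450,       9550]]    # group B: 4.5% referral rate

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p = {p_value:.1e}")  # p far below 0.05: the gap is unlikely to be chance
```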

The DWP's response was revealing. It concluded there were "no immediate concerns of unfair treatment." The disparities were real but, in the department's view, did not rise to the level of discrimination. Of the nine protected characteristics under equality law, it assessed only one: age, the sole characteristic for which it said it had sufficient data. For the rest, the data was deemed insufficient. It did not pause the model. It did not restrict its use. It continued profiling close to a million people a year with a tool its own analysis had found to produce statistically skewed results, and declared itself satisfied.

It then fought to keep the details secret.

Big Brother Watch took the DWP to court to obtain the technical documentation behind the model. The department resisted disclosure, arguing it would enable fraudsters to game the system. During the tribunal hearing, with its position under pressure, the DWP made a partial concession and released some additional material. But the core architecture of the model remains opaque. The public cannot see how the risk scores are calculated. Claimants flagged by the system are not told why. The department profiling a million people a year with a tool its own assessment found riddled with statistical disparities will not explain how the tool works.

Now they want to look inside people's bank accounts.

This is not an accountability gap. It is accountability withheld by design.

Government Profiling Is Just Getting Started

The Advances model is not an outlier. It is one node in a rapidly expanding network of automated decision-making across government.

The DWP's Targeted Case Review programme uses automation to select Universal Credit recipients for detailed investigation. More than two million claimants are expected to be reviewed by the end of the decade. Recipients report receiving demands for documents with minimal explanation, no clarity about why they were selected, and no indication an algorithm was involved. One man told Big Brother Watch the process placed him under serious psychological strain, made worse by the department's refusal to explain what had triggered the review.

The Housing Benefit Accuracy Initiative uses a separate algorithm to risk-score housing benefit claimants and flag the 400,000 "highest risk" cases to local authorities for full review. Since 2022, councils have been mandated to participate. After three years of real-world use, data obtained by Big Brother Watch showed only one in three people flagged by the algorithm was actually receiving the wrong amount. Two-thirds were innocent. The algorithm's false positive rate was double its hit rate.

At HMRC, the Connect system interrogates 22 billion data points across tax returns, bank records, property transactions, employer filings, and social media to assign every taxpayer a risk score. Launched in 2010, built by BAE Systems Applied Intelligence at a cost of at least £45 million, Connect cross-references your declared income against your lifestyle. If the system detects a gap between what you reported and how you appear to live, you receive what HMRC calls a "nudge letter," or a formal investigation. In the 2024–25 tax year, leads generated by Connect contributed to an additional £4.6 billion in revenue. The system profiles every self-assessment filer in the country. Its criteria, weightings, and risk thresholds are not published. No taxpayer has ever been shown their own risk score.

The Home Office operated a visa-streaming algorithm using a traffic-light system to sort every application into fast, slow, or intensive processing lanes. Nationality was a core input. The system maintained a secret list of "suspect nationalities" automatically assigned higher risk scores. In 2020, facing a legal challenge from the Joint Council for the Welfare of Immigrants, the Home Office scrapped the tool rather than defend it in court. Separately, the Home Office's immigration database has been found to contain over 76,000 records with incorrect information, with identities merged, photographs misassigned, and people unable to prove their lawful right to live and work in the country because the system attached someone else's details to their name.

The pattern repeats across departments. The tool changes. The opacity does not.

Australia's Robodebt Warning

This is not a theoretical risk. There is a precise precedent for what happens when a government replaces human caseworkers with an automated debt-recovery system, refuses to disclose how it works, shifts the burden of proof onto citizens, and presses ahead despite internal warnings it may be unlawful.

Australia's Robodebt scheme, launched in 2016, used automated data-matching to identify alleged welfare overpayments and issue debt notices to recipients. The algorithm compared fortnightly welfare records against annualised tax data and used income averaging to calculate discrepancies. The method was crude, the error rate enormous, and the burden of disproving the machine's calculations fell entirely on the citizen. At its peak, 20,000 debt notices were issued per week. Recipients were given no meaningful explanation of how their debt had been calculated.
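
The crudeness of income averaging is easy to show with arithmetic. The figures below are invented but structurally representative: someone who earned all their income in half the year, then legitimately claimed benefits in the other half, is made to look as if they under-declared income every fortnight.

```python
# Why income averaging manufactured debts. All figures are invented
# but structurally representative of the Robodebt method.

fortnights = 26
actual = [2_000] * 13 + [0] * 13    # worked half the year, then on benefits
annual_total = sum(actual)          # 26,000: what the tax office saw

averaged = annual_total / fortnights   # Robodebt assumed 1,000 every fortnight

# A hypothetical fortnightly income-free threshold for full benefits:
threshold = 500

# Fortnights where the person truthfully declared zero income, but the
# averaged figure implies undeclared earnings above the threshold:
phantom = sum(1 for income in actual if income == 0 and averaged > threshold)
print(f"{phantom} fortnights of phantom 'overpayment'")   # 13
```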

The scheme was declared unlawful. The government was forced to refund 751 million Australian dollars wrongly recovered from 381,000 people. A royal commission found ministers had failed to ensure the programme was legal. Senior officials had suppressed internal legal advice warning the scheme had no statutory basis. The commission linked the programme to welfare recipients' suicides. The net cost to the Australian taxpayer, after refunds, compensation, and administrative expenses, was 565 million Australian dollars. The scheme designed to save money cost half a billion.

The DWP's systems have not yet produced a scandal on Robodebt's scale. But the structural parallels are unmistakable. Automated profiling of millions. Opacity about how decisions are made. A department fighting to keep its own analysis secret. A system where following the machine is effortless and overriding it requires justification. A previous tool with a 65 per cent error rate used for four years.

And a government now seeking new powers, through the Public Authorities (Fraud, Error and Recovery) Bill, to scan benefit claimants' bank accounts, seize assets, and remove driving licences, with fraud and honest error treated identically under the same enforcement powers.

The machine is not slowing down. It is accelerating. And nobody outside the system can tell you how it works.

Computer Says No

Before the machine, there were caseworkers. They were not perfect. They were sometimes slow, sometimes wrong, sometimes indifferent. But a caseworker who sat across from a claimant could do something the algorithm cannot. He could ask a question the model had not anticipated. He could recognise that the man whose claim looked anomalous was anomalous because his wife had just died and his income had collapsed and his housing costs had changed in ways the system's averaging function could not accommodate. The caseworker could exercise judgment. Not perfectly. Not always fairly. But with a capacity for context no risk score can replicate.

The DWP replaced caseworkers with algorithms for the same reason every institution in this series replaced experienced people with centralised process: it is cheaper. A caseworker costs a salary, an office, training, time. An algorithm costs a development contract and a server. The algorithm processes a million cases a year. The caseworker processed dozens. The economics are unanswerable. The consequences are everywhere.

When the machine flags you, you do not know why. When you are asked to produce documents, you are not told what triggered the request. When your case is investigated, you cannot examine the model's reasoning because the department will not release it. When the model's own fairness assessment finds statistical disparities across age, disability, nationality, and marital status, the department concludes there are "no immediate concerns" and continues profiling. When campaigners go to court to obtain the technical documentation, the department fights disclosure.

This is what happens when the administrative state's drive to replace judgment with process reaches its logical terminus. The process becomes the judgment. The algorithm is not a tool used by humans. It is a system within which humans operate, whose architecture makes agreement the path of least resistance and whose logic is shielded from the people it governs.


Tomorrow: how the Ministry of Defence obtained a super-injunction suppressing not just a catastrophic security breach, but the existence of the court order itself.


What was satisfactory here?

  • Reality: A machine-learning model with documented statistical disparities across multiple characteristics profiled close to a million claimants a year, while a previous tool ran for four years with a 65 per cent error rate.
  • Administrative intervention: The DWP conducted a fairness analysis, found disparities across every characteristic examined, concluded there were "no immediate concerns," and fought in court to keep the technical details secret.
  • Reported statistic: The department saved £4.4 million over three years and declared the system a success in efficiency and fraud prevention.