Behind the Guardian’s analysis of 100 years of MPs’ language on immigration


The Guardian has revealed a significant rightward shift in sentiment relating to immigration among MPs speaking in the House of Commons over the past five years.

To do this analysis, the Guardian’s Data Science and Data Projects teams, in collaboration with University College London, developed an in-house machine learning model to measure linguistic sentiment in debates in the Commons over the course of a century.

Unlike off-the-shelf sentiment models, the Guardian’s version distinguishes sentiment directed specifically at immigration from general emotionally charged language about any topic.

The development of the model involved the following process:

The researchers first used a list of trigger terms manually designed and verified by experts on immigration history to identify speeches most likely to be about immigration. This process narrowed the data down to a manageable sample.
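In outline, that filtering step amounts to flagging any speech containing at least one trigger term. The sketch below is illustrative only: the term list and speeches here are hypothetical placeholders, not the expert-verified list the researchers used.

```python
# Hypothetical trigger terms -- the real list was designed and verified
# by experts on immigration history.
TRIGGER_TERMS = {"immigration", "immigrant", "migrant", "asylum"}

def mentions_immigration(speech: str) -> bool:
    """Flag a speech whose text contains any trigger term."""
    words = speech.lower().split()
    return any(term in words for term in TRIGGER_TERMS)

# Toy examples standing in for Hansard records.
speeches = [
    "The Bill before the House concerns agricultural tariffs.",
    "We must consider the rights of every immigrant arriving here.",
]
candidates = [s for s in speeches if mentions_immigration(s)]
# Only the second speech is kept for manual review and labelling.
```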

To ensure the results were not biased by the choice of keywords, the team stress-tested their findings, running the analysis many times with different combinations of words and finding similar results regardless of the specific combination of terms.
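A stress test of this kind can be sketched as re-running the filter over every subset of the term list and comparing the outcomes. Again, the terms and corpus below are hypothetical stand-ins, and the real pipeline would re-run the full filtering and scoring, not just the share of flagged speeches.

```python
from itertools import combinations

# Hypothetical term list and toy corpus for illustration.
TERMS = ["immigration", "immigrant", "migrant", "asylum"]
speeches = [
    "asylum claims rose",
    "the migrant crossings continue",
    "immigration policy debate",
    "a speech about fisheries",
]

def flagged_share(term_subset):
    """Fraction of speeches containing at least one term from the subset."""
    hits = sum(any(t in s for t in term_subset) for s in speeches)
    return hits / len(speeches)

# Re-run the filter for every three-term combination; if the shares are
# close to one another, the choice of keywords is not driving the result.
shares = [flagged_share(c) for c in combinations(TERMS, 3)]
```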

To build the dataset on which the sentiment model was trained, a team of 12 people manually labelled more than 1,250 fragments of parliamentary speeches and contributions, each up to five sentences long, drawn from across the century.

Where a fragment was about immigration it was identified as such, and then classified as positive, negative or neutral. Fragments that did not relate to immigration were classified as not about immigration.

The team also assessed the performance of several large language models – a form of generative AI – for the purpose of labelling more fragments; statistical testing showed their accuracy to be robust.

The use of AI in this project was limited to the annotation process, which expanded the training dataset used to develop the Guardian’s machine learning model to more than 22,600 annotated fragments of parliamentary contributions from the past century.

The bespoke model was then applied to a century’s worth of debates and speeches made in the Commons, capturing almost 238,000 fragments relating to immigration between 1925 and the end of 2025, with each of them given a “sentiment label”.

The overall sentiment score for each year was calculated using only the fragments relating to immigration (a full speech can combine fragments that relate to immigration with ones that don’t). The annual score was calculated by subtracting the number of negative fragments from the number of positive ones and dividing the result by the total number of fragments about immigration. This was also done individually for the main parties highlighted within the analysis.
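The annual score described above – positives minus negatives, divided by all immigration-related fragments – can be written as a short function. The label strings here are illustrative, not the Guardian’s internal labels.

```python
from collections import Counter

def annual_score(labels):
    """Compute (positive - negative) / total over one year's
    immigration-related fragment labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return 0.0
    return (counts["positive"] - counts["negative"]) / total

# One positive, two negatives and one neutral out of four fragments
# gives (1 - 2) / 4 = -0.25, i.e. a net-negative year.
annual_score(["positive", "negative", "negative", "neutral"])
```

A score of 1 would mean every fragment that year was positive, -1 that every fragment was negative, and 0 that positives and negatives balanced out.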

As the model was built to measure sentiment in parliamentary speeches in aggregate, it has not been used to report the sentiment of individual contributions. The analysis also excluded periods for specific parties when there were not enough contributions about immigration over a sustained period.