Is This Google’s Helpful Content Algorithm?

Google published a groundbreaking research paper about identifying page quality with AI. The details of the algorithm seem remarkably similar to what the helpful content algorithm is known to do.

Google Doesn’t Identify Algorithm Technologies

Nobody outside of Google can say with certainty that this research paper is the basis of the helpful content signal.

Google generally does not identify the underlying technology of its various algorithms such as the Penguin, Panda or SpamBrain algorithms.

So one can’t say with certainty that this algorithm is the helpful content algorithm; one can only speculate and offer an opinion about it.

But it’s worth a look because the similarities are eye-opening.

The Helpful Content Signal

1. It Improves a Classifier

Google has provided a number of clues about the helpful content signal but there is still a lot of speculation about what it really is.

The first clues were in a December 6, 2022 tweet announcing the December 2022 helpful content update.

The tweet said:

“It improves our classifier & works across content globally in all languages.”

A classifier, in machine learning, is something that categorizes data (is it this or is it that?).

2. It’s Not a Manual or Spam Action

The Helpful Content algorithm, according to Google’s explainer (What creators should know about Google’s August 2022 helpful content update), is not a spam action or a manual action.

“This classifier process is entirely automated, using a machine-learning model.

It is not a manual action nor a spam action.”

3. It’s a Ranking Related Signal

The helpful content update explainer says that the helpful content algorithm is a signal used to rank content.

“…it’s just a new signal and one of many signals Google evaluates to rank content.”

4. It Checks if Content is By People

The interesting thing is that the helpful content signal (apparently) checks if the content was created by people.

Google’s blog post on the Helpful Content Update (More content by people, for people in Search) stated that it’s a signal to identify content created by people and for people.

Danny Sullivan of Google wrote:

“…we’re rolling out a series of improvements to Search to make it easier for people to find helpful content made by, and for, people.

…We look forward to building on this work to make it even easier to find original content by and for real people in the months ahead.”

The concept of content being “by people” is repeated three times in the announcement, apparently indicating that it’s a quality of the helpful content signal.

And if it’s not written “by people” then it’s machine-generated, which is an important consideration because the algorithm discussed here is related to the detection of machine-generated content.

5. Is the Helpful Content Signal Multiple Things?

Lastly, Google’s blog announcement seems to indicate that the Helpful Content Update isn’t just one thing, like a single algorithm.

Danny Sullivan writes that it’s a “series of improvements,” which, if I’m not reading too much into it, means that it’s not just one algorithm or system but several that together accomplish the task of weeding out unhelpful content.

This is what he wrote:

“…we’re rolling out a series of improvements to Search to make it easier for people to find helpful content made by, and for, people.”

Text Generation Models Can Predict Page Quality

What this research paper discovers is that large language models (LLMs) like GPT-2 can accurately identify low quality content.

They used classifiers that were trained to identify machine-generated text and discovered that those same classifiers were able to identify low quality text, even though they were not trained to do that.

Large language models can learn how to do new things that they were not trained to do.

A Stanford University article about GPT-3 discusses how it independently learned the ability to translate text from English to French, simply because it was given more data to learn from, something that didn’t occur with GPT-2, which was trained on less data.

The article notes how adding more data causes new behaviors to emerge, a result of what’s called unsupervised training.

Unsupervised training is when a machine learns from data without labeled examples, so it can pick up how to do something that it was not explicitly trained to do.

That word “emerge” is important because it refers to when the machine learns to do something that it wasn’t trained to do.

The Stanford University article on GPT-3 explains:

“Workshop participants said they were surprised that such behavior emerges from simple scaling of data and computational resources and expressed curiosity about what further capabilities would emerge from further scale.”

A new ability emerging is exactly what the research paper describes. They discovered that a machine-generated text detector could also predict low quality content.

The researchers write:

“Our work is twofold: firstly we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of ‘page quality’, able to detect low quality content without any training.

This enables fast bootstrapping of quality indicators in a low-resource setting.

Secondly, curious to understand the prevalence and nature of low quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.”

The takeaway here is that they used a text generation model trained to spot machine-generated content and discovered that a new behavior emerged: the ability to identify low quality pages.

OpenAI GPT-2 Detector

The researchers tested two systems to see how well they worked for detecting low quality content.

One of the systems used RoBERTa, which is a pretraining method that is an improved version of BERT.

Of the two systems tested, they discovered that OpenAI’s GPT-2 detector was superior at detecting low quality content.

The description of the test results closely mirrors what we know about the helpful content signal.

AI Detects All Forms of Language Spam

The research paper states that there are many signals of quality but that this approach only focuses on linguistic or language quality.

For the purposes of this algorithm research paper, the phrases “page quality” and “language quality” mean the same thing.

The breakthrough in this research is that they successfully used the OpenAI GPT-2 detector’s prediction of whether something is machine-generated or not as a score for language quality.

They write:

“…documents with high P(machine-written) score tend to have low language quality.

…Machine authorship detection can thus be a powerful proxy for quality assessment.

It requires no labeled examples – only a corpus of text to train on in a self-discriminating fashion.

This is particularly valuable in applications where labeled data is scarce or where the distribution is too complex to sample well.

For example, it is challenging to curate a labeled dataset representative of all forms of low quality web content.”

What that means is that this system does not have to be trained to detect specific kinds of low quality content.

It learns to find all of the variations of low quality by itself.

This is a powerful approach to identifying pages that are not high quality.
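The idea of reusing a detector’s output as a quality score can be sketched in a few lines of Python. This is only an illustration, not the researchers’ implementation: `toy_p_machine_written` is a hypothetical stand-in for a real detector (it merely measures word repetition), and `quality_score` is an assumed name for the inversion 1 − P(machine-written).

```python
from typing import Callable, List, Tuple


def toy_p_machine_written(text: str) -> float:
    """Hypothetical stand-in for a real detector.

    Returns a value in [0, 1]. Here it is just the share of repeated
    words, which is NOT how the OpenAI GPT-2 detector works.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    return 1.0 - len(set(words)) / len(words)


def quality_score(text: str, detector: Callable[[str], float]) -> float:
    """Treat a high P(machine-written) as a sign of low language quality."""
    return 1.0 - detector(text)


def rank_by_quality(
    docs: List[str], detector: Callable[[str], float]
) -> List[Tuple[float, str]]:
    """Sort documents from highest to lowest estimated language quality."""
    scored = [(quality_score(doc, detector), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample documents for illustration only.
docs = [
    "buy cheap essays buy cheap essays buy cheap essays",
    "The court ruled that the statute applies to online retailers.",
]
ranked = rank_by_quality(docs, toy_p_machine_written)
```

Note that no labeled “low quality” examples appear anywhere in the sketch; the only input is the detector’s probability, which is the point the researchers make about not needing labeled data.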

Results Mirror Helpful Content Update

They tested this system on half a billion webpages, analyzing the pages using different attributes such as document length, age of the content and the topic.

The age of the content isn’t about marking new content as low quality.

They simply analyzed web content by time and discovered that there was a huge jump in low quality pages beginning in 2019, coinciding with the growing popularity of the use of machine-generated content.

Analysis by topic revealed that certain topic areas tended to have higher quality pages, like the legal and government topics.
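A breakdown like this, whether by year or by topic, amounts to grouping pages by an attribute and measuring the share that the detector flags. The sketch below is a rough illustration under stated assumptions: the 0.9 cutoff, the function name `low_quality_share` and the sample data are all hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical cutoff on P(machine-written); the paper's value is not assumed here.
LOW_QUALITY_THRESHOLD = 0.9


def low_quality_share(pages: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Map each group (for example a year or a topic) to its share of flagged pages."""
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for group, p_machine in pages:
        totals[group] += 1
        if p_machine >= LOW_QUALITY_THRESHOLD:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


# Made-up sample data: (topic, P(machine-written)) pairs.
sample = [
    ("legal", 0.10),
    ("legal", 0.20),
    ("education", 0.95),
    ("education", 0.30),
]
shares = low_quality_share(sample)
```

Swapping the topic label for a publication year would give the time-based view described above.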

Interestingly, they discovered a huge amount of low quality pages in the education space, which they said corresponded with sites that offered essays to students.

What makes that interesting is that education is a topic specifically mentioned by Google as being affected by the Helpful Content update.

Google’s blog post written by Danny Sullivan shares:

“…our testing has found it will especially improve results related to online education…”

Three Language Quality Scores

Google’s Quality Raters Guidelines (PDF) uses four quality scores, low, medium, high and very high.

The researchers used three quality scores for testing of the new system, plus one more named undefined.

Documents rated as undefined were those that couldn’t be assessed, for whatever reason, and were removed.

The scores are rated 0, 1, and 2, with two being the highest score.

These are the descriptions of the Language Quality (LQ) Scores:

“0: Low LQ.
Text is incomprehensible or logically inconsistent.

1: Medium LQ.
Text is comprehensible but poorly written (frequent grammatical / syntactical errors).

2: High LQ.
Text is comprehensible and reasonably well-written (infrequent grammatical / syntactical errors).”
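The three-point scale, plus the removal of undefined ratings, can be modeled roughly as follows. This is a sketch under assumptions: representing “undefined” as `None` and averaging the remaining ratings are illustrative choices, and `mean_lq` is a hypothetical helper rather than anything described in the paper.

```python
from enum import IntEnum
from statistics import mean
from typing import List, Optional


class LanguageQuality(IntEnum):
    """The three Language Quality (LQ) scores quoted above."""
    LOW = 0      # incomprehensible or logically inconsistent
    MEDIUM = 1   # comprehensible but poorly written
    HIGH = 2     # comprehensible and reasonably well-written


def mean_lq(ratings: List[Optional[LanguageQuality]]) -> Optional[float]:
    """Average the ratings, skipping None (used here for 'undefined')."""
    defined = [int(rating) for rating in ratings if rating is not None]
    return mean(defined) if defined else None


# Made-up ratings for one document; None marks an 'undefined' rating.
ratings = [LanguageQuality.HIGH, None, LanguageQuality.MEDIUM, LanguageQuality.HIGH]
average = mean_lq(ratings)
```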

Here are the Quality Raters Guidelines’ definitions of low quality:

Lowest Quality:

“MC is created without adequate effort, originality, talent, or skill necessary to achieve the purpose of the page in a satisfying way.

…little attention to important aspects such as clarity or organization.

…Some Low quality content is created with little effort in order to have content to support monetization rather than creating original or effortful content to help users.

“Filler” content may also be added, especially at the top of the page, forcing users to scroll down to reach the MC.

…The writing of this article is unprofessional, including many grammar and punctuation errors.”

The quality raters guidelines have a more detailed description of low quality than the algorithm.

What’s interesting is how the algorithm relies on grammatical and syntactical errors.

Syntax is a reference to the order of words.

Words in the wrong order sound incorrect, similar to how the Yoda character in Star Wars speaks (“Impossible to see the future is”).

Does the Helpful Content algorithm rely on grammar and syntax signals? If this is the algorithm, then those signals may play a role (but not the only role).

But I would like to think that the algorithm was improved with some of what’s in the quality raters guidelines between the publication of the research in 2021 and the rollout of the helpful content signal in 2022.

The Algorithm is “Powerful”

It’s good practice to read the conclusions to get an idea of whether the algorithm is good enough to use in the search results.

Many research papers end by saying that more research has to be done or conclude that the improvements are marginal.

The most interesting papers are those that claim new state of the art results.

The researchers remark that this algorithm is powerful and outperforms the baselines.

They write this about the new algorithm:

“Machine authorship detection can thus be a powerful proxy for quality assessment.

It requires no labeled examples – only a corpus of text to train on in a self-discriminating fashion.

This is particularly valuable in applications where labeled data is scarce or where the distribution is too complex to sample well.

For example, it is challenging to curate a labeled dataset representative of all forms of low quality web content.”

And in the conclusion they reaffirm the positive results:

“This paper posits that detectors trained to discriminate human vs. machine-written text are effective predictors of webpages’ language quality, outperforming a baseline supervised spam classifier.”

The conclusion of the research paper was positive about the breakthrough and expressed hope that the research will be used by others.

There is no mention of further research being necessary.

This research paper describes a breakthrough in the detection of low quality webpages.

The conclusion indicates that, in my opinion, there is a likelihood that it could make it into Google’s algorithm.

Because it’s described as a “web-scale” algorithm that can be deployed in a “low-resource setting,” this is the kind of algorithm that could go live and run on a continual basis, just like the helpful content signal is said to do.

We don’t know if this is related to the helpful content update, but it’s certainly a breakthrough in the science of detecting low quality content.


Google Research Page:

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

Download the Google Research Paper

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study (PDF)
