In a Google Search Central video, Google’s Gary Illyes described the part of the indexing process in which canonicals are selected, explained what a canonical is from Google’s point of view, and gave an overview of webpage signals. He also mentioned the centerpiece (main content) of a page and described what Google does with duplicates, offering a different perspective on them.
A Canonical Webpage: What Is It?
There are several ways to think about what “canonical” means: from Google’s perspective, and from the perspective of publishers and SEOs on our side of the search box.
Publishers tend to think of the canonical as the “original” webpage, while SEOs tend to view it as the version of a page that has been optimised to rank highest among competing versions.
Hearing about canonicalization from a Google employee like Gary Illyes is helpful, because canonicalization for Google is quite different from what publishers and SEOs tend to believe it is.
Google’s official documentation on canonicalization calls the process of selecting a canonical “deduplication” and lists five common reasons why a site may have duplicate pages.
Five Reasons For Duplicate Pages
- Region variants: for example, a piece of content for the USA and the UK, accessible from different URLs, but essentially the same content in the same language
- Device variants: for example, a page with both a mobile and a desktop version
- Protocol variants: for example, the HTTP and HTTPS versions of a site
- Site functions: for example, the results of sorting and filtering functions of a category page
- Accidental variants: for example, the demo version of the site is accidentally left accessible to crawlers
There are at least five reasons for duplicate pages, and canonicals can be viewed in at least three different ways. Gary offers yet another perspective on canonicals.
Canonicals Are Selected Using Signals
Illyes discusses the signals used to choose canonicals and offers one more definition of a canonical, this time from the perspective of indexing. Gary explains:
“Google determines if the page is a duplicate of another already known page and which version should be kept in the index, the canonical version.
But in this context, the canonical version is the page from a group of duplicate pages that best represents the group according to the signals we’ve collected about each version.”
Gary then pauses to explain duplicate clustering before returning to the topic of signals.
He continued:
“For the most part, only canonical pages appear in Search results. But how do we know which page is canonical?
So once Google has the content of your page, or more specifically the main content or centerpiece of a page, it will group it with one or more pages featuring similar content, if any. This is duplicate clustering.”
I want to pause here to point out that Gary calls the main content the “centerpiece of a page,” which is intriguing considering that Google’s Martin Splitt introduced the concept of the Centrepiece Annotation. Gary’s explanation is helpful, but he didn’t actually describe what the Centrepiece Annotation is.
Below is the part of the video where Gary explains what signals actually are.
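To make the idea of duplicate clustering concrete, here is a toy sketch in Python. It assumes the main content of each page has already been extracted, and it groups URLs only when their normalised main content is identical. Google clusters pages with *similar* content and has not published how it does so, so the exact-match hashing, the `fingerprint` and `cluster_duplicates` helpers, and the example URLs are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(main_content: str) -> str:
    """Hash the main content after collapsing whitespace and lowercasing it."""
    normalised = re.sub(r"\s+", " ", main_content).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def cluster_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalised main content is identical (a toy stand-in for similarity)."""
    clusters: defaultdict[str, list[str]] = defaultdict(list)
    for url, main_content in pages.items():
        clusters[fingerprint(main_content)].append(url)
    return list(clusters.values())


# Hypothetical crawl results: URL -> extracted main content (the "centerpiece").
pages = {
    "http://example.com/shoes": "Red running shoes.  Lightweight and fast.",
    "https://example.com/shoes": "Red running shoes. Lightweight and fast.",
    "https://example.com/shoes?sort=price": "red running shoes. lightweight and fast.",
    "https://example.com/about": "About our shop.",
}

for cluster in cluster_duplicates(pages):
    print(cluster)
```

Running it prints two clusters: the protocol variant and the sorted variant land with the HTTPS shoe page, while the about page sits in a cluster of its own.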
Illyes explains what “signals” are:
“Then it compares a handful of signals it has already calculated for each page to select a canonical version.
Signals are pieces of information that the search engine collects about pages and websites, which are used for further processing.
Some signals are very straightforward, such as site owner annotations in HTML like rel="canonical", while others, like the importance of an individual page on the internet, are less straightforward.”
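The rel="canonical" annotation Gary mentions is the most straightforward signal to inspect, because it sits in the page’s HTML as a `<link>` element. The sketch below shows one way a crawler could read it using Python’s standard-library `html.parser`. The `CanonicalFinder` class and `find_canonical` function are hypothetical names, and this is an illustration of reading the annotation, not a description of how Google parses pages or weighs the hint.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare them case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonical(html: str) -> Optional[str]:
    """Return the first rel="canonical" href in the document, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals[0] if finder.canonicals else None


page = """<html><head>
<link rel="canonical" href="https://example.com/shoes">
</head><body>Red running shoes.</body></html>"""

print(find_canonical(page))
```

Self-closing tags such as `<link ... />` are also handled, because `HTMLParser` routes them through `handle_starttag` by default.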
Each Duplicate Cluster Has Only One Canonical
Gary goes on to say that for every cluster of duplicate pages, a single page is selected to serve as the canonical.
He explains:
“Each of the duplicate clusters will have a single version of the content selected as canonical.
This version will represent the content in Search results for all the other versions.
The other versions in the cluster become alternate versions that may be served in different contexts, like if the user is searching for a very specific page from the cluster.”
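The “one canonical per cluster, the rest become alternates” rule can be sketched as a comparison of per-page signals. In this Python sketch the three signals (whether a page is the declared rel="canonical" target, whether it uses HTTPS, and an importance score) and their weights are hypothetical; Gary did not say which signals Google compares or how they are weighted. The point is only the shape of the process: score each page in the cluster, keep the best one as canonical, and keep the others as alternates.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    url: str
    is_declared_canonical: bool  # hypothetical: duplicates point here via rel="canonical"
    uses_https: bool             # hypothetical: page is served over HTTPS
    importance: float            # hypothetical: 0.0-1.0 estimate of the page's importance


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"is_declared_canonical": 2.0, "uses_https": 1.0, "importance": 3.0}


def score(page: PageSignals) -> float:
    """Weighted sum of the page's signals (booleans count as 0 or 1)."""
    return (
        WEIGHTS["is_declared_canonical"] * page.is_declared_canonical
        + WEIGHTS["uses_https"] * page.uses_https
        + WEIGHTS["importance"] * page.importance
    )


def select_canonical(cluster: list[PageSignals]) -> tuple[PageSignals, list[PageSignals]]:
    """Pick the highest-scoring page as canonical; the rest become alternates."""
    ranked = sorted(cluster, key=score, reverse=True)
    return ranked[0], ranked[1:]


cluster = [
    PageSignals("http://example.com/shoes", False, False, 0.4),
    PageSignals("https://example.com/shoes", True, True, 0.4),
    PageSignals("https://example.com/shoes?sort=price", False, True, 0.1),
]
canonical, alternates = select_canonical(cluster)
print("canonical:", canonical.url)
print("alternates:", [page.url for page in alternates])
```

Here the HTTPS page that the others declare as canonical wins, and the other two URLs remain available as alternates rather than being discarded.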
Alternate Versions of a Page
That last point is especially interesting and worth keeping in mind, because it can help a site rank for many keyword variations, particularly on e-commerce websites.
A content management system (CMS) sometimes generates multiple webpages for product variations, such as different sizes or colours, and these variations may change the product description. When a variant page more closely matches a search query, Google may choose to show that variant in the search results.
This is an important consideration because, in an attempt to avoid the (fictitious) keyword cannibalization problem, it can be tempting to redirect or noindex variant pages so they stay out of the search index.
Adding a noindex to variant pages can backfire, because there are situations in which a variant page is the best match for a more specific search query, one that differs from the canonical page in colour, size, or version number.
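A toy model helps show why noindexing variants can backfire. In this Python sketch, the hypothetical `choose_result` function returns whichever indexed variant shares the most terms with the query and falls back to the canonical otherwise. Simple term overlap is a crude stand-in for query matching, not how Google ranks pages, and the URLs and attribute sets are invented for the example. Removing a variant from the candidates, as a noindex would, means a specific query can only ever be answered with the generic canonical page.

```python
def choose_result(canonical_url: str, variants: dict[str, set[str]], query: str) -> str:
    """Return the variant whose attribute terms overlap most with the query,
    or the canonical URL if no variant shares any term with it."""
    query_terms = set(query.lower().split())
    best_url, best_overlap = canonical_url, 0
    for url, attributes in variants.items():
        overlap = len(query_terms & attributes)
        if overlap > best_overlap:  # strict ">" keeps the earlier match on ties
            best_url, best_overlap = url, overlap
    return best_url


canonical = "https://example.com/shoes"
indexed_variants = {
    "https://example.com/shoes/red": {"red"},
    "https://example.com/shoes/blue": {"blue"},
}

print(choose_result(canonical, indexed_variants, "blue running shoes"))
print(choose_result(canonical, indexed_variants, "running shoes"))

# If the blue variant were noindexed, it would no longer be a candidate.
without_blue = {u: a for u, a in indexed_variants.items() if not u.endswith("/blue")}
print(choose_result(canonical, without_blue, "blue running shoes"))
```

With the blue variant indexed, the query “blue running shoes” is answered by the blue page; once it is removed, the same query falls back to the generic canonical.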
The Most Important Things to Remember About Canonicals (And More)
Gary’s explanation of canonicals covers a lot of ground, including several topics tangential to the core idea.
Here are seven things to remember:
- The main content of a page is called the centerpiece.
- Google calculates a “handful of signals” for each page it finds.
- A signal is information the search engine collects about pages and websites for use in “further processing.”
- Some signals, such as hints (and presumably directives), are under the publisher’s control. Illyes pointed to the rel=canonical link element, a site owner annotation in HTML, as one such signal.
- Other signals, such as the importance of a page on the internet, are outside the publisher’s control.
- The non-canonical pages in a duplicate cluster become alternate versions that may be served in different contexts.
- Variant pages, such as different colours or sizes of a product, may be the best match for more specific queries, so noindexing them can backfire.