Google’s internal ranking systems have been kept under lock and key since the beginning, but that all changed earlier this month.
On May 5th, Moz founder Rand Fishkin received an email from an anonymous source (now confirmed) containing thousands of documents leaked from Google Search’s Content Warehouse API.
These documents spill the beans on more than 14,000 internal ranking features, which is a HUGE deal for the SEO world.
For decades, we could only speculate how Google’s algorithms work, mainly going off the results achieved through experimentation and our work with clients.
As any SEO can tell you, there’s always been a notable discrepancy between the results achieved through SEO strategy and Google’s public claims about what works.
For example, last year, we wrote about Gary Illyes’ claim that ‘links weren’t a top 3 ranking factor’ and how the statement directly conflicted with the results we generated for our clients.
What’s truly interesting about these leaked documents is how they vindicate what dedicated SEOs like us knew all along about things like backlinks, clicks, and domain authority.
There’s a lot to go over here, so stay tuned to learn the stuff Google doesn’t want you to know.
Crucial note: One thing missing from this leak is how each ranking system is weighted, so we don’t know how much importance Google places on each factor – just that they exist.
How Did the Leak Happen?
The leak is surprising considering how much emphasis Google placed on keeping its search ranking systems a secret.
So what happened?
The documents first appeared on GitHub, the developer platform, on March 13th. They were published there by an automated bot named yoshi-code-bot. It’s still not entirely clear how this happened, but the bot apparently uploaded the internal version of the deprecated Document AI Warehouse by mistake.
The documents were then emailed to Rand Fishkin on May 5th from an anonymous source, now revealed to be SEO practitioner Erfan Azimi.
Azimi reached out to Fishkin in the hopes that he would publish an article sharing the leak with the public to refute notable ‘lies’ spread by Google.
Naturally, Fishkin had his suspicions and went to great lengths to confirm the validity of the leaked documents. He shared them with Mike King, founder of iPullRank, who claimed the documents appeared to be legitimate and that they were from Google’s internal search division (a number of other Google insiders backed up this claim).
So, what do the documents have to say about Google Search?
While we can’t dive into all 14,000 ranking features, let’s take a look at the most notable findings.
Yes, Backlinks DO Matter
Google has been downplaying the importance of backlinks for a while now, likely in hopes of deterring would-be spammers (which is probably the reason behind all their untruthfulness).
Yet, as we at The HOTH knew from our work, backlinks move the SEO needle.
Documentation from the leak proves this, as Google’s PageRank algorithm is very much alive and well. In fact, every document has its homepage PageRank associated with it.
This likely serves as a proxy for web pages that don’t have a PageRank yet, which reinforces the importance of backlinks. In other words, backlinks are considered for the ranking of every single web page!
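To make the proxy idea concrete, here’s a minimal sketch of how a fallback like this could work. This is our own illustration, not Google’s code: the function name and the scores are hypothetical, and the leak doesn’t tell us how the two values are actually combined.

```python
from typing import Optional


def effective_pagerank(page_pagerank: Optional[float], homepage_pagerank: float) -> float:
    """Hypothetical fallback: use the page's own score when it has one,
    otherwise borrow the score of the site's homepage."""
    if page_pagerank is not None:
        return page_pagerank
    return homepage_pagerank


# A brand-new page with no score yet inherits the homepage value.
print(effective_pagerank(None, 0.4))
# An established page keeps its own score.
print(effective_pagerank(0.7, 0.4))
```

If something like this is happening, a strong homepage backlink profile would give every new page on the site a head start.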
Also, link quality and relevance matter, as the algorithm checks for relevance on both sides.
Once again, we don’t have any data on how backlinks are weighted in comparison to other ranking factors, but this is quite vindicating for us link-builders.
Here are some other interesting findings related to backlinks.
Indexing Tier is Related to Link Value
A metric called sourceType correlates the quality of a link to its indexing tier.
Google’s index is broken into high, medium, and low-quality tiers.
The highest tier holds high-quality, frequently visited, and regularly updated pages, which get stored in flash memory.
Quality sites that are deemed less important and aren’t updated as often get stored on solid-state drives (medium tier), and low-quality sites that are rarely updated are stored on standard hard drives.
This means that you want to target backlinks from websites in the highest index tier: trusted websites that publish accurate, up-to-date information (exactly the types of websites we’ve been telling marketers to target in our articles).
Link Spam Velocity Signals (Spammy Anchor Text)
Another thing we frequently preach is to have a balanced anchor text ratio, and the leak suggests we were correct.
According to the documents, Google has a whole host of metrics they use to measure spikes in anchor text spam.
One of them, phraseAnchorSpamCount, is described as “How many spam phrases are found in the anchors among unique domains.” This most likely refers to exact-match anchors, which are typically a company’s ‘money’ keywords.
These metrics also demonstrate how Google identifies and devalues negative SEO attacks.
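Here’s a rough sketch of what a count like phraseAnchorSpamCount could look like, based purely on its one-line description. Everything in it is an assumption on our part: the spam phrase list, the example URLs, and the choice to count each phrase at most once per linking domain. The leak doesn’t show how Google actually computes it.

```python
from urllib.parse import urlparse

# Hypothetical phrase list, for illustration only.
SPAM_PHRASES = {"cheap seo services", "buy backlinks"}


def phrase_anchor_spam_count(backlinks, spam_phrases=SPAM_PHRASES):
    """Count spam-phrase anchors, at most once per (domain, phrase) pair.

    `backlinks` is an iterable of (source_url, anchor_text) tuples.
    """
    seen = set()
    for source_url, anchor_text in backlinks:
        domain = urlparse(source_url).netloc.lower()
        anchor = anchor_text.lower()
        for phrase in spam_phrases:
            if phrase in anchor:
                seen.add((domain, phrase))
    return len(seen)


backlinks = [
    ("https://blog-a.example/post-1", "Buy backlinks today"),
    ("https://blog-a.example/post-2", "buy backlinks now"),   # same domain, same phrase
    ("https://blog-b.example/review", "cheap SEO services"),
    ("https://news.example/story", "The HOTH"),               # branded anchor, not spam
]
print(phrase_anchor_spam_count(backlinks))  # 2
```

Counting per unique domain means a hundred spammy anchors from one site move the number far less than the same anchor arriving from a hundred different sites, which is what a sudden negative SEO attack tends to look like.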
Google Measures ‘Domain Authority’
Moz’s Domain Authority is one of the most popular metrics for determining the ranking strength of a domain, and SEOs have used it religiously for years.
There’s also Ahrefs Domain Rating, which provides a similar metric.
However, they were always viewed as ‘third-party metrics’ that had nothing to do with Google’s actual algorithm. Also, Google employees always claimed they had no ‘domain authority’ metric (we’re looking at you, John Mueller).
The truth is Google has a metric called siteAuthority that it computes for every website, so we can finally put that issue to rest.
Font Size Matters for Keywords and Backlinks
This one was actually really surprising because it’s an ancient SEO technique that apparently has merit.
In the early days of SEO, optimizers would bold, underline, and increase the font size of keywords to make them stand out.
It definitely seems like something Google would have devalued by now, but evidently, this isn’t the case.
The avgTermWeight metric tracks the average weighted font size for keywords, and a similar font-size signal is recorded for backlink anchor text.
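As a simple illustration of the idea, the sketch below averages the font size of every occurrence of a keyword on a page. It’s a toy example with made-up names and numbers: the leak doesn’t reveal how the weighting behind avgTermWeight is done, so we use a plain average here.

```python
def avg_term_font_size(occurrences, term):
    """Average font size (in px) across all occurrences of `term`.

    `occurrences` is a list of (word, font_size) tuples. This is a plain
    average, which is an assumption; the real weighting is unknown.
    """
    sizes = [size for word, size in occurrences if word.lower() == term.lower()]
    return sum(sizes) / len(sizes) if sizes else 0.0


# Hypothetical page: "seo" appears in an H1, in body copy, and in a footer.
occurrences = [("seo", 32.0), ("guide", 16.0), ("SEO", 16.0), ("seo", 12.0)]
print(avg_term_font_size(occurrences, "seo"))  # 20.0
```

Under a scheme like this, a keyword that shows up in headings pulls its average up, while one buried in fine print pulls it down.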
Content Demotions
The documentation revealed numerous ways that Google’s algorithm can demote content in its search rankings, which include:
- When a link doesn’t match the site it links to (Google checks for relevance on both sides)
- If SERP signals indicate user dissatisfaction (it’s integral to provide a stellar user experience)
- Exact match domains (stuffing your target keyword into the domain name is no shortcut, and the same caution applies to exact match anchors)
- If you receive a large number of negative product reviews
- If there’s porn on your website
These demotions prove Google’s dedication to providing a great user experience, and they reflect the importance of creating content for humans first, and search engines second.
SEO Moving Forward
Now that the cat’s out of the bag, so to speak, it’ll be interesting to see how SEO strategies change.
The leak is one of the biggest stories in SEO history, so the buzz is bound to keep growing. Stay tuned for more updates on the topic as new developments come to light.
If you need expert help crafting a winning SEO strategy for your business, don’t wait to check out HOTH X, our fully managed service that’ll simplify your SEO success.