A Breakdown of HTML Usage Across ~8 Million Pages

moz.com · 3 years ago · in #Dev

Not long ago, my colleagues and I at Advanced Web Ranking ran an HTML study based on about 8 million index pages gathered from the top twenty Google results for more than 30 million keywords. We wrote about the markup results and how the top twenty Google results pages implement them, then went further and extracted HTML usage insights from the same data.

What does this have to do with SEO? The way HTML is written dictates what users see and how search engines interpret web pages. A valid, well-formatted HTML page also reduces the chance that search engines misinterpret its structured data, metadata, language, or encoding.

This is intended to be a technical SEO audit, something we wanted to do from the beginning: a breakdown of HTML usage and how the results relate to modern SEO techniques and best practices.

In this article, we’re going to address things like meta tags that Google understands, JSON-LD structured data, language detection, headings usage, social links & meta distribution, AMP, and more.

Meta tags that Google understands

When talking about the main search engines as traffic sources, sadly it’s just Google and the rest, with DuckDuckGo gaining traction lately and Bing almost nonexistent. Thus, in this section we’ll focus solely on the meta tags that Google lists in the Search Console Help Center.

[Figure: pie chart showing the total numbers for the meta tags that Google understands, described in detail in the sections below.]

<meta name="description" content="…">

The meta description is a ~150 character snippet that summarizes a page’s content. Search engines show the meta description in the search results when the searched phrase is contained in the description.
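As a rough illustration of how a single page's meta description could be inspected, here is a minimal Python sketch using only the standard library. It is not the tooling used for the study; the names `MetaDescriptionParser` and `classify_description` are hypothetical, and the 30 and 160 character thresholds are simply the "extremes" this article reports on, passed in as adjustable parameters.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content attribute of every <meta name="description">."""

    def __init__(self):
        super().__init__()
        # One entry per matching tag; None when the content attribute is absent.
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "description":
            self.descriptions.append(attr_map.get("content"))


def classify_description(html, short=30, long=160):
    """Bucket the first meta description found in an HTML string.

    The short/long thresholds are assumptions taken from the article's
    "extremes", not limits defined by any search engine.
    """
    parser = MetaDescriptionParser()
    parser.feed(html)
    if not parser.descriptions:
        return "missing"
    content = parser.descriptions[0]
    if content is None:
        return "no content attribute"
    text = content.strip()
    if not text:
        return "empty"
    if len(text) < short:
        return "short"
    if len(text) > long:
        return "long"
    return "ok"


if __name__ == "__main__":
    print(classify_description('<meta name="description" content="">'))
    print(classify_description('<meta name="description">'))
```

The "empty" and "no content attribute" buckets correspond to the `content=""` and bare `<meta name="description">` cases counted in this section.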
SELECTOR                                 COUNT
<meta name="description" content="*">    4,391,448
<meta name="description" content="">     374,649
<meta name="description">                13,831

On the extremes, we found 685,341 meta elements with content shorter than 30 characters and 1,293,842 elements with content longer than 160 characters.

<title>

The title is technically not a meta tag, but it’s used in conjunction with meta name="description". This is one of the two most important HTML tags when it comes to SEO. It’s also a must according to W3C, meaning no page is valid with a missing title tag.

Research suggests that if you keep your titles under a reasonable 60 characters, you can expect them to be rendered properly in the SERPs. In the past, there were signs that Google’s search results title length was extended, but it wasn’t a permanent change.

Considering all the above, of the 6,263,396 titles we found, 1,846,642 appear to be too long (more than 60 characters) and 1,985,020 had lengths considered too short (under 30 characters).

[Figure: pie chart showing the title tag length distribution, with lengths under 30 characters at 31.7% and lengths over 60 characters at about 29.5%.]

A title being too short shouldn’t be a problem in itself; after all, it’s subjective and depends on the website’s business. Meaning can be expressed with fewer words, but a very short title is still a sign of a wasted optimization opportunity.

SELECTOR               COUNT
<title>*</title>       6,263,396
missing <title> tag    1,285,738

Another interesting thing is that, among the sites ranking on page 1–2 of Google, 351,516 (~5% of the total 7.5M) use the same text for the title and h1 on their index pages.

Also, did you know that with HTML5 you only need to specify the HTML5 doctype and a title in order to have a perfectly valid page?

    <!DOCTYPE html>
    <title>red</title>

<meta name="robots|googlebot">

“These meta tags can control the behavior of search engine crawling and indexing. The robots meta tag applies to all search engines, while…
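Going back to the title figures above, the length buckets and the title/h1 comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the study: `TitleH1Parser`, `audit_title`, and `percent` are hypothetical names, the 30 and 60 character thresholds are the ones quoted in the article, and "same text" is assumed here to mean equal after collapsing whitespace and ignoring case.

```python
from html.parser import HTMLParser


class TitleH1Parser(HTMLParser):
    """Captures the text of the first <title> and the first <h1>."""

    def __init__(self):
        super().__init__()
        self.text = {"title": None, "h1": None}
        self._open = None  # tag currently being captured, if any

    def handle_starttag(self, tag, attrs):
        # Start capturing only the first occurrence of each tracked tag.
        if tag in self.text and self.text[tag] is None and self._open is None:
            self._open = tag
            self.text[tag] = ""

    def handle_data(self, data):
        if self._open:
            self.text[self._open] += data

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None


def audit_title(html, short=30, long=60):
    """Return the title's length bucket and whether it matches the h1."""
    parser = TitleH1Parser()
    parser.feed(html)
    parser.close()
    title, h1 = parser.text["title"], parser.text["h1"]
    if title is None:
        return {"title": "missing", "same_as_h1": False}
    normalized = " ".join(title.split())
    if len(normalized) < short:
        length = "short"
    elif len(normalized) > long:
        length = "long"
    else:
        length = "ok"
    # Assumption: "same text" means equal after whitespace and case folding.
    same = h1 is not None and " ".join(h1.split()).lower() == normalized.lower()
    return {"title": length, "same_as_h1": same}


def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    print(audit_title("<!DOCTYPE html> <title>red</title>"))
    print(percent(1_985_020, 6_263_396))  # 31.7, titles under 30 characters
    print(percent(1_846_642, 6_263_396))  # 29.5, titles over 60 characters
```

The two `percent` calls reproduce the shares quoted for the title length distribution from the raw counts in the article.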


This article first appeared on moz.com. If you'd like to keep reading, follow the white rabbit.
