Photo by Conny Schneider on Unsplash
How big is the Worldwide Web (and where do the numbers come from)?
A look under the hood
6 min read
How many websites are there in the world? This could be considered a good quiz show question, with contestants choosing from wildly different options to animated gasps from the audience and a growing (or diminishing) cash prize at the end. Or it could be a good bit of trivia to carry in your head and drop at the apposite moment in conversation, finishing with "not many people know that."
The reason I am interested however, is that having a sense of how big the Web is provides an important benchmark for estimating the environmental footprint of the Internet, and understand what priority to give to reducing its emissions.
Behind the numbers
There are a number of figures floating around, often without citations, and where backed by references, without any discussion or examination of the methodologies involved. Here I review the main sources of estimates, and how each arrives at a number.
Perhaps the most widely cited number comes from a company called Netcraft. Their exact methodology and tooling is not, as far as I can tell, fully public, but we can deduce that it relies on scanning of the IPv4 space, almost certainly using the standard approach of sending a TCP SYN packet to, for instance, port 80. If the port is closed, the host responds with an RST packet. If the port is open, the host responds with a TCP SYN/ACK packet indicating that a connection can be established. Put another way, Netcraft sends a signal to knock on the door of every Internet address, and logs every time someone shouts "coming!".
So according to Netcraft, out of all the IP addresses polled this way, 1.1 billion established a connection, indicating a website host in operation.
There are two problems with this.
1) Netcraft indicates that "The number of hosts found running internet web sites by the Web Server Survey is large (over 1.1 billion hostnames), but not exhaustive."
This suggests that it does not poll the entire IPv4 space, so the actual number of total open hosts is likely to be higher.
Internet Live Stats
This may explain why Internet Live Stats, an equally opaque (as regards specific methods and data sources), but equally authoritative Internet-wide counter (cited by W3C and by Tim Berners-Lee among others) has instead found as of August 2022, 1.9 billion websites, or nearly double the current Netcraft figure.
This higher figure is apparently arrived at by integrating a range of data sources (including Netcraft Server Surveys) and projecting second by second estimates using a proprietary [worldometers algorithm (worldometers.info/about/)]. So again, not an exact or comprehensive number, but rather a projection building using undisclosed assumptions and methods atop Netcraft's original numbers.
Hosts vs websites
The numbers above have to be caveated further by the fact that open named hosts do not translate into actually active websites. So the 1-2 billion hosts detected should certainly be counted when estimating the size of the Internet, but not when estimating the size of the Web, which is the corner of the Internet populated by websites.
Netcraft estimates that out of the 1.1 billion sites detected, only 200 million sites can be considered active.
Internet Live Stats likewise estimated the number of active websites at less that 200 million. The figure is likely derived from Netcraft itself, and suggests that 70-85% of the web is populated by dead or placeholder websites.
Builtwith.com offers a third estimation of the number of websites, again without full transparency as to actual methodology.
BuiltWith affirm that they have 100% coverage of the web, meaning "domains and subdomains" using "technology under patent", and arrive at a mysterious figure of 16.4 billion potential websites. This would be 6 times more than Netcraft's estimates.
Specifically, they assert their coverage included "100% .com/.net/.org & 100% active website coverage". This is not particularly clear or helpful. Instead of their headline claim to cover 100% of the web, in reality they appear to mean 100% of those 3 Top Level Domains plus all websites defined as active.
I can't tell how that could translate into a 16.4 billion figure, given there are 4.3 billion IPv4 addresses, 614 million registered domains, and 37 (characters allowed) to the power of 63 (maximum characters allowed) potential domain names per TLD, which comes to a 6 followed by 98 zeros, so a lot more than 16.4 billion. Alternatively, there are 189 million domains registered in the .com, .net and .org tlds in August 2022. Each domain can have up to 500 subdomains, but that makes the potential pool 90 billion, not 16.4. It could be that the figure estimates the potential number of virtual hosts for registered domains, or the maximum capacity of detected computers but it's impossible to know.
Similarly, they arrive at a hard to validate figure of 673 million active websites, defined as domains which are not "parked". This is more than 3 times the Netcraft and Internet Live Stats estimates. Given that it is estimated that around 70% of domains are parked, it seems questionable for BuiltWith to arrive at more active websites than registered domains.
A more reliable and transparent BuiltWith metric is their total number of websites detected, which stands at 557 million as of August 2022. This is closer in line with the 614 million registered domains. Of these, 34.5% (nearly 200 million websites), cannot resolve the domain, taking down the number to 350 million active websites.
Again, it's not immediately clear how these much more realistic figures, clearly based on actual IP scanning, correlate to the apparently untethered figures given earlier.
Probably the most reliable, and certainly the most transparent estimate of the size of the active web can be arrived at via Censys, which uses zmap to carry out Internet wide scans of the entire IPv4 space. Its methodology and code are fully open source, and backed by a large body of academic research and tooling extensions. Censys detects 1.9 billion services, 222.5 million hosts and 484.1 million virtual hosts as of August 2022.
This aligns much more closely to the 1.9 billion figure from Internet Live Stats quoted above, and the 225 million active hosts is likewise in range for both Internet Live Stats and Netcraft for what they call "active websites". The Netcraft/Internet Live Stats figure is likely an undercount ignoring virtual hosts: once you add these, the total rises to 706.6 million hosts which is in range of the 614 million registered domains. The extra 90 million likely reflects that Censys scans ports beyond those assigned to http and https, so may detect services beyond the surface web.
So how big is the Web?
Triangulating the datapoints from Netcraft, Internet Live Stats, BuiltWith, Domainnamesstat and Cesys, this is my own best estimate:
- 1.9 billion internet services running on IPv4 addresses
- 706 million hosts (200 million physical, 406 million virtual)
- 614 million registered domains
- 350 million resolvable domains (active websites)
So the active Worldwide Web can be said to contain 350 million websites. This is probably the best benchmark to use when estimating the size and impact of the web.
Watch this space for future posts building on this!