What’s the most popular site? Is the most popular site the highest-ranked in Alexa? (Not likely.) The site that lists first in the search engine? (Not necessarily.)
Unless you have the traffic logs, and you know how to read them, you’ll never know.
This post attempts to explain the differences among the types of web statistics out there. It’s important to understand which statistics, and which combinations of them, are most relevant, and which are least relevant. More importantly, it’s useful to know when a site is overestimating its audience (which is easy to do if the site’s owner doesn’t know how to configure or convey those stats correctly).
- Hits
Simultaneously the most popular statistic and the most misleading, a hit counter tracks every URL load, including accesses from spiders, bots, and reloads in a session. Depending on how your logs and/or statistics software are configured, it might even count every load of every CSS file, script, and image referenced by, and included in, the page. If it does, that could (falsely) give you a 10-to-1 reading on every site access. Although hits are a great gauge of bandwidth utilization, they are a very poor indicator of site popularity (especially if the site is the target of an overactive bot, a DoS attack, or a small group of loyal followers who like to reload it dozens of times a day to take part in chatter or gossip).
- Page Views
Probably the second most popular statistic. If used properly, this represents the total number of times a page was (re)loaded from the site. It’s a better statistic than hits because references and includes are not counted and spider traffic is partially excluded as well. However, like hits, it can significantly overestimate the unique traffic experienced by a site.
- Entry Views
Used mainly with blogs, this counts the number of specific post accesses, as opposed to the number of times the main page was accessed.
- Unique Sessions
One of the less popular statistics, and used mainly with portal and commerce sites that require login, it refers to the number of unique accesses of a site by a unique user identifier. It’s equal to the average number of unique visits times the number of unique visitors, and it’s a better indicator of site popularity than page views for a site whose visitors, on average, don’t visit more than a few times during the spanned time period.
- Unique Visits
Similar to sessions, except it refers to the number of unique accesses by IP. The difference is that if multiple visitors from the same IP access the site in the same time window (through a proxy server), the number of unique visitors could be under-represented.
- Unique IPs
Counts the number of unique IP addresses that accessed the site, and acts as a lower bound on the site’s popularity (since multiple individuals could access the site through the same IP address).
- Combination of Page Views and Unique IPs
Combined, one of the best measures of a site’s popularity. You know the site has at least as many unique visitors as IPs and you know, based on the page views, about how many pages a unique visitor accesses in a given time period.
- Combination of Unique Sessions/Visits and Unique IPs
Combined, the other best measure of a site’s popularity. You know the site has at least as many unique visitors as IPs and you know, based on the sessions, about how many times a unique visitor visits the site.
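To make the differences concrete, here is a minimal sketch that computes hits, page views, unique IPs, and unique visits from the same handful of log records. Everything in it is an assumption made for illustration: the log records are invented, the asset and bot filters are crude string matches, bots are excluded from the IP count, and a "visit" is taken to be a run of page views from one IP with no gap longer than 30 minutes. Real statistics tools use their own rules.

```python
from datetime import datetime, timedelta

# Hypothetical, hand-made log records: (ip, timestamp, path, user_agent).
LOG = [
    ("10.0.0.1", "2024-01-01 09:00:00", "/", "Mozilla"),
    ("10.0.0.1", "2024-01-01 09:00:01", "/style.css", "Mozilla"),
    ("10.0.0.1", "2024-01-01 09:00:01", "/logo.png", "Mozilla"),
    ("10.0.0.1", "2024-01-01 09:05:00", "/post/1", "Mozilla"),
    ("10.0.0.1", "2024-01-01 15:00:00", "/", "Mozilla"),
    ("10.0.0.2", "2024-01-01 10:00:00", "/post/1", "Mozilla"),
    ("10.0.0.3", "2024-01-01 11:00:00", "/", "ExampleBot"),
    ("10.0.0.3", "2024-01-01 11:00:02", "/post/1", "ExampleBot"),
]

# Assumed filters; a real tool would be far more thorough.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
BOT_MARKERS = ("bot", "spider", "crawler")
VISIT_GAP = timedelta(minutes=30)  # assumed inactivity window between visits


def summarize(log):
    """Return hits, page views, unique IPs, and unique visits for a log."""
    # Hits: every request counts, including assets and bots.
    hits = len(log)

    # Page views: drop included assets and obvious bot traffic.
    pages = [
        (ip, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
        for ip, ts, path, agent in log
        if not path.lower().endswith(ASSET_SUFFIXES)
        and not any(marker in agent.lower() for marker in BOT_MARKERS)
    ]
    page_views = len(pages)

    # Unique IPs: distinct addresses among the remaining page views.
    unique_ips = len({ip for ip, _ in pages})

    # Unique visits: a new visit starts when an IP has been idle too long.
    visits = 0
    last_seen = {}
    for ip, when in sorted(pages, key=lambda record: record[1]):
        if ip not in last_seen or when - last_seen[ip] > VISIT_GAP:
            visits += 1
        last_seen[ip] = when

    return {
        "hits": hits,
        "page_views": page_views,
        "unique_ips": unique_ips,
        "unique_visits": visits,
    }


print(summarize(LOG))
# {'hits': 8, 'page_views': 4, 'unique_ips': 2, 'unique_visits': 3}
```

Eight hits shrink to four page views, three visits, and two IPs, which is why a hit count on its own says so little about how many people actually read a site.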
So what are SI’s statistics? Over the past month:
What does this mean? It means that at least 10,825 people visited SI last month, an average of 2.3 times each, viewing 4.6 pages each. About 34% of traffic is search engine traffic, which mostly consists of accesses of a page or two, so we can exclude it. Revising our statistics, we can then estimate that 7,145 people visited SI an average of 3.5 times each, viewing an average of 7 pages each. Furthermore, about 39% of traffic comes from external referrals (SI has over 10,000 incoming links from numerous sites all over the internet that link directly to it and redistribute its feeds). This traffic displays irregular patterns and accesses SI approximately 50% as much as regular readers do. We can therefore estimate that, last month, there were:
- 2,925 regular readers who visited about 9.2 pages each in 4.2 visits
- 4,220 irregular readers who visited about 4.6 pages each in 2.1 visits
- 3,680 new readers who visited a page as a result of a search engine query
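The head counts in this breakdown can be reproduced from the percentages quoted above. The sketch below is only an illustration of that arithmetic: the function names are made up, and rounding each share to the nearest 5 is an assumption that happens to match the quoted figures (the percentages themselves are only "about" values).

```python
# Figures quoted in the post.
TOTAL_PEOPLE = 10_825
SEARCH_SHARE = 0.34    # "about 34%" search engine traffic
REFERRAL_SHARE = 0.39  # "about 39%" external referral traffic


def round_to_5(value):
    # Assumption: the quoted counts look rounded to the nearest 5.
    return 5 * round(value / 5)


def split_readers(total, search_share, referral_share):
    """Split the total into new (search), irregular (referral), and regular readers."""
    new = round_to_5(total * search_share)
    irregular = round_to_5(total * referral_share)
    regular = total - new - irregular  # whatever is left over
    return {
        "regular": regular,
        "irregular": irregular,
        "new": new,
        "non_search": total - new,
    }


print(split_readers(TOTAL_PEOPLE, SEARCH_SHARE, REFERRAL_SHARE))
# {'regular': 2925, 'irregular': 4220, 'new': 3680, 'non_search': 7145}
```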
Finally, it is very important to justify the numbers. They must all be consistent. If the numbers don’t make sense, or if they are internally inconsistent, you are dealing with a site that really has no clue at all as to its traffic. The above numbers make very good sense since, while some readers will visit almost every day, my representative reader (who has no time to leave comments) is too busy to visit every day, but makes a point of visiting two (to three) times a week (often on Monday and Friday, which are the peaks of SI activity).
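One way to run that kind of consistency check is to recombine the three reader groups and see whether they add back up to the headline figures (10,825 people, 2.3 visits each, 4.6 pages each). This is a rough sketch rather than a rigorous audit, and it assumes each search-engine reader made exactly one visit and viewed exactly one page, which the post only implies.

```python
# Headline figures quoted in the post.
HEADLINE = {"people": 10_825, "visits_each": 2.3, "pages_each": 4.6}

# (label, readers, visits per reader, pages per reader)
SEGMENTS = [
    ("regular", 2_925, 4.2, 9.2),
    ("irregular", 4_220, 2.1, 4.6),
    ("new (search)", 3_680, 1.0, 1.0),  # assumption: one visit, one page each
]


def relative_gap(estimate, reference):
    """Absolute difference as a fraction of the reference value."""
    return abs(estimate - reference) / reference


people = sum(readers for _, readers, _, _ in SEGMENTS)
visits = sum(readers * v for _, readers, v, _ in SEGMENTS)
pages = sum(readers * p for _, readers, _, p in SEGMENTS)

visit_gap = relative_gap(visits, HEADLINE["people"] * HEADLINE["visits_each"])
page_gap = relative_gap(pages, HEADLINE["people"] * HEADLINE["pages_each"])

print(f"people accounted for: {people} of {HEADLINE['people']}")
print(f"visit gap: {visit_gap:.2%}, page gap: {page_gap:.2%}")
```

Under that assumption the groups account for every one of the 10,825 people, and the recombined visit and page totals land within 1% of the headline totals, which is about as close as rounded estimates can be expected to get.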
* Lower bound. This is one statistic I’m not able to retrieve by time period from the native blog stats tool, so it was extracted from one of the three third-party stats tools I also use, which rely on (java)scripts that can be cached or blocked, and therefore cause some hits/IPs to be missed.