Statistics Frequently Asked Questions
Stats / Webalizer Information
- What is the difference between 'HITS' and 'FILES'?
- Hits
- Files
- Pages
- Sites
- Visits
- KBytes
- Top Entry and Exit Pages
- Why don't the daily visit totals add up to the monthly total?
Server Errors and What They Mean
- HTTP Error 304
- HTTP Error 403
- HTTP Error 404
- Analyzing 404 Errors
- 404 Errors - Summary
Stats / Webalizer Information
What is the difference between 'HITS' and 'FILES'?
HITS is the total number of HTTP requests that the server received during the reporting period. Any request made to the server is considered a hit. FILES is the number of hits that actually resulted in something being sent back to the user, such as an HTML page or image. 'Total Files' and '200 - OK' totals should be the same. If you add up the totals in the 'Hits by Response Code' section, it should be the same as the 'Total Hits' figure.
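The relationship between these totals can be sketched in a few lines of Python. This is only an illustration, not how Webalizer itself is implemented: the sample log lines, the `LOG_LINE` pattern and the `summarize` function are all made up for the example, and the log is assumed to be in Common Log Format.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

# Hypothetical sample log, invented for this example.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Mar/2024:13:00:00 +0000] "GET /index.html HTTP/1.0" 200 2048
10.0.0.1 - - [01/Mar/2024:13:00:01 +0000] "GET /logo.gif HTTP/1.0" 200 1024
10.0.0.1 - - [01/Mar/2024:13:05:00 +0000] "GET /logo.gif HTTP/1.0" 304 -
10.0.0.2 - - [01/Mar/2024:13:06:00 +0000] "GET /missing.html HTTP/1.0" 404 512
"""

def summarize(log_text):
    """Return (hits, files, hits_by_response_code) for a block of log text."""
    by_code = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match is None:
            continue  # only valid log lines count as hits
        by_code[match.group(4)] += 1
    hits = sum(by_code.values())  # every valid request is a hit
    files = by_code["200"]        # '200 - OK' responses are the files
    return hits, files, dict(by_code)

hits, files, by_code = summarize(SAMPLE_LOG)
print(hits, files, by_code)
```

In this sample there are four hits but only two files, and the per-code totals add up to the hit total.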
For a complete description of what all the numbers mean in the output:
Hits
Any request made to the server that is logged is considered a 'hit'.
The requests can be for anything: HTML pages, graphic images, audio
files, CGI scripts, etc. Each valid line in the server log is
counted as a hit. This number represents the total number of requests
that were made to the server during the specified report period.
Files
Some requests made to the server require that the server then send
something back to the requesting client, such as an HTML page or graphic
image. When this happens, it is considered a 'file' and the files
total is incremented. The relationship between 'hits' and 'files' can
be thought of as 'incoming requests' and 'outgoing responses'.
Pages
Pages are, well, pages! Generally, any HTML document, or anything
that generates an HTML document, would be considered a page. This
does not include the other stuff that goes into a document, such as
graphic images, audio clips, etc... This number represents the number
of 'pages' requested only, and does not include the other 'stuff' that
is in the page. What actually constitutes a 'page' can vary from
server to server. The default action is to treat anything with the
extension '.htm', '.html' or '.cgi' as a page. A lot of sites will
probably define other extensions, such as '.phtml', '.php3' and '.pl'
as pages as well. Some people consider this number as the number of
'pure' hits... I'm not sure if I totally agree with that viewpoint.
Some other programs (and people :) refer to this as 'Pageviews'.
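As a rough illustration of the extension rule described above, the check could look something like this. The `is_page` function and `PAGE_TYPES` tuple are hypothetical names for this sketch, and the exact-extension comparison is an assumption; the real matching rules may differ.

```python
import posixpath
from urllib.parse import urlsplit

# Default page extensions named in the text above.
PAGE_TYPES = ("htm", "html", "cgi")

def is_page(url, page_types=PAGE_TYPES):
    """Return True if the URL's file extension is one of the page types."""
    path = urlsplit(url).path  # drop any query string
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension in page_types

print(is_page("/index.html"))              # True
print(is_page("/images/logo.gif"))         # False
print(is_page("/cgi-bin/search.cgi?q=x"))  # True
# A site that also treats '.php3' as a page type:
print(is_page("/news.php3", PAGE_TYPES + ("php3",)))  # True
```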
Sites
Each request made to the server comes from a unique 'site', which can
be referenced by a name or ultimately, an IP address. The 'sites'
number shows how many unique IP addresses made requests to the server
during the reporting time period. This DOES NOT mean the number of
unique individual users (real people) that visited, which is impossible
to determine using just logs and the HTTP protocol (however, this
number might be about as close as you will get).
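Counting sites amounts to counting distinct addresses. A minimal sketch, using invented addresses and a hypothetical `count_sites` helper:

```python
# Hypothetical (address, URL) pairs taken from a log.
requests = [
    ("192.0.2.10", "/index.html"),
    ("192.0.2.10", "/logo.gif"),
    ("198.51.100.7", "/index.html"),
    ("192.0.2.10", "/about.html"),
]

def count_sites(requests):
    """Number of unique addresses that made at least one request."""
    return len({address for address, _url in requests})

print(count_sites(requests))  # 2
```

Four requests came from two addresses, so the sites total is 2, however many real people were sitting behind each address.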
Visits
Whenever a request is made to the server from a given IP address
(site), the amount of time since the previous request from that address
is calculated (if there was one). If the time difference is greater than a
pre-configured 'visit timeout' value, or if the address has never made a
request before, it is considered a 'new visit', and this total is incremented
(both for the site, and the IP address). The default timeout value is 30
minutes (can be changed), so if a user visits your site at 1:00 in
the afternoon, and then returns at 3:00, two visits would be registered.
Note: in the 'Top Sites' table, the visits total should be discounted
on 'Grouped' records, and thought of as the "Minimum number of visits"
that came from that grouping instead. Note: Visits only occur on
PageType requests, that is, for any request whose URL is one of the
'page' types defined with the PageType option. Due to the limitations
of the HTTP protocol, log rotations and other factors, this number
should not be taken as absolutely accurate; rather, it should be
considered a pretty close "guess".
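The visit timeout rule can be sketched as follows. This is an illustration of the rule as described above, not the actual implementation; `count_visits` is a hypothetical helper, and the input is assumed to contain page requests only, already sorted by time.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # default value given in the text

def count_visits(page_requests, timeout=VISIT_TIMEOUT):
    """Count visits in an iterable of (ip, datetime) page requests."""
    last_seen = {}  # ip -> time of that address's previous request
    visits = 0
    for ip, when in page_requests:
        previous = last_seen.get(ip)
        # New visit: first request from this address, or a gap
        # longer than the timeout since its previous request.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return visits

example = [
    ("192.0.2.10", datetime(2024, 3, 1, 13, 0)),   # 1:00 pm, first visit
    ("192.0.2.10", datetime(2024, 3, 1, 13, 10)),  # same visit
    ("192.0.2.10", datetime(2024, 3, 1, 15, 0)),   # 3:00 pm, second visit
]
print(count_visits(example))  # 2
```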
KBytes
The KBytes (kilobytes) value shows the amount of data, in KB, that
was sent out by the server during the specified reporting period. This
value is generated directly from the log file, so it is up to the
web server to produce accurate numbers in the logs (some web servers
do stupid things when it comes to reporting the number of bytes). In
general, this should be a fairly accurate representation of the amount
of outgoing traffic the server had, regardless of the web server's
reporting quirks.
Note: A kilobyte is 1024 bytes, not 1000 :)
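As a small worked example, the total is just the sum of the byte counts in the log divided by 1024. The `total_kbytes` helper is hypothetical, and treating a '-' entry as zero bytes is an assumption made for this sketch.

```python
def total_kbytes(byte_fields):
    """Sum the byte-count fields of a log and convert to KBytes."""
    total_bytes = 0
    for field in byte_fields:
        if field.isdigit():  # skip '-' and other non-numeric entries
            total_bytes += int(field)
    return total_bytes / 1024  # a kilobyte is 1024 bytes

print(total_kbytes(["2048", "1024", "-", "512"]))  # 3.5
```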
Top Entry and Exit Pages
The Top Entry and Exit tables give a rough estimate of which URLs
are used to enter your site, and which pages are viewed last.
Because of limitations in the HTTP protocol, log rotations, etc.,
these numbers should be considered a good "rough guess" of the actual
numbers, but they will give a good indication of the overall trend in
where users come into, and exit, your site.
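One way to picture this: the entry page is the first page of each visit, and the exit page is the last one. The sketch below assumes the same 30-minute visit rule described under Visits; the `entry_exit_pages` function and the sample data are invented for illustration and are not the actual implementation.

```python
from collections import Counter
from datetime import datetime, timedelta

def entry_exit_pages(page_requests, timeout=timedelta(minutes=30)):
    """Tally first and last pages per visit from (ip, datetime, url) tuples."""
    entries, exits = Counter(), Counter()
    last = {}  # ip -> (time, url) of that address's latest page request
    for ip, when, url in page_requests:
        previous = last.get(ip)
        if previous is None or when - previous[0] > timeout:
            if previous is not None:
                exits[previous[1]] += 1  # the earlier visit ended there
            entries[url] += 1            # a new visit starts here
        last[ip] = (when, url)
    for _when, url in last.values():
        exits[url] += 1  # visits still open when the log ends
    return entries, exits

sample = [
    ("192.0.2.10", datetime(2024, 3, 1, 13, 0), "/index.html"),
    ("192.0.2.10", datetime(2024, 3, 1, 13, 5), "/products.html"),
    ("192.0.2.10", datetime(2024, 3, 1, 15, 0), "/index.html"),
]
entries, exits = entry_exit_pages(sample)
print(dict(entries))
print(dict(exits))
```

Note that a visit still in progress when the log ends is counted as having exited on its latest page, which is one reason the figures are only estimates.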
Why don't the daily visit totals add up to the monthly total?
You cannot add up the daily visit totals and compare them to the monthly total, because they cover different reporting periods. For example, if someone visits your site at 11:45pm and stays until 12:15am, the monthly total would show one visit, while the daily totals will show two (one for each day).
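The midnight example can be reproduced with a short sketch. It assumes the 30-minute visit rule described under Visits, and that each day's total is worked out from that day's requests alone; the timestamps and the `count_visits` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def count_visits(times, timeout=TIMEOUT):
    """Count visits for one IP address from its sorted request times."""
    visits, previous = 0, None
    for when in times:
        if previous is None or when - previous > timeout:
            visits += 1
        previous = when
    return visits

# One visitor browsing across midnight (hypothetical timestamps).
requests = [
    datetime(2024, 3, 1, 23, 45),
    datetime(2024, 3, 1, 23, 58),
    datetime(2024, 3, 2, 0, 5),
    datetime(2024, 3, 2, 0, 15),
]

# Monthly view: all requests fall in one reporting period.
monthly_total = count_visits(requests)

# Daily view: each day is its own reporting period.
by_day = defaultdict(list)
for when in requests:
    by_day[when.date()].append(when)
daily_totals = {day: count_visits(times) for day, times in by_day.items()}

print(monthly_total)               # 1
print(sum(daily_totals.values()))  # 2
```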
Server Errors and What They Mean
HTTP error 304 (Not Modified)
This error is specifically defined in the HTTP protocol. It does not really indicate an error as such, but rather indicates that the resource for the requested URL has not changed since it was last accessed or cached. The 304 status code should only be returned if allowed by the client (e.g. your Web browser). The client specifies this in the HTTP data stream sent to your Web server, e.g. via an If-Modified-Since header in the request.
Systems that cache or index Web resources (such as search engines) often use the 304 response to determine if the information they previously gathered for a particular URL is now out-of-date.
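The decision a server makes here can be sketched roughly as below. This is a simplified illustration, not any particular server's code: `response_status` is a hypothetical function, and only the If-Modified-Since header is considered.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def response_status(request_headers, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    header = request_headers.get("If-Modified-Since")
    if header is None:
        return 200  # client did not ask for a conditional response
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return 200  # unreadable date: behave as if the header were absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

# Hypothetical resource last changed on 1 March 2024.
modified = datetime(2024, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
cached_on = format_datetime(
    datetime(2024, 3, 2, 12, 0, 0, tzinfo=timezone.utc), usegmt=True
)

print(response_status({}, modified))                                # 200
print(response_status({"If-Modified-Since": cached_on}, modified))  # 304
```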
HTTP error 403
This error is specifically defined in the HTTP protocol. Your Web server thinks that the HTTP data stream sent by the client (e.g. your Web browser) was correct, but access to the resource identified by the URL is forbidden for some reason.
This indicates a fundamental access problem, which may be difficult to resolve because the HTTP protocol allows the Web server to give this response without providing any reason at all. So the 403 error is equivalent to a blanket 'NO' by your Web server - with no further discussion allowed.
By far the most common reason for this error is that directory browsing is forbidden for the web site. Most Web sites want you to navigate using the URLs in the Web pages for that site. They do not often allow you to browse the file directory structure of the site.
A URL that names a directory rather than a page should fail with a 403 error saying "Directory browsing failed - access forbidden". This is true for most Web sites on the Internet - your Web server has "Allow directory browsing" set OFF.
HTTP error 404
This error is specifically defined in the HTTP protocol. Your Web server thinks that the HTTP data stream sent by the client (e.g. your Web browser) was correct, but simply cannot provide access to the resource specified by your URL. This is equivalent to the 'return to sender - address unknown' response for conventional postal mail services.
This error is easily shown in a Web browser if you try a URL with valid domain name but invalid page e.g. http://www.ibm.com/ggggggg.html.
Analyzing 404 errors
For top level URLs (such as www.isp.com), the first possibility is that the request for your site URL has been directed to a Web server that thinks it never had any pages for your Web site. This is possible if DNS entries are fundamentally corrupt, or if your Web server has corrupt internal records. The second possibility is that the Web server once hosted the Web site, but no longer does so and cannot or will not provide a redirection to another computer which now hosts the site. If your site is completely dead - now effectively nowhere to be found on the Internet - then the 404 message makes sense. However, if your site has recently moved, then a 404 message may also be triggered. This is also a DNS issue, because the old Web server should no longer be accessed at all - as soon as global DNS entries are updated, only your new Web server should be accessed.
For low-level URLs (such as www.isp.com/products/list.html), this error can indicate a broken link. You can see this easily by trying the URL in a Web browser. Most browsers give a very clear '404 - Not Found' message.
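To find which low-level URLs are producing 404s, one approach is to tally them from the server log. The sketch below is only an illustration: the log lines are invented, and the `REQUEST` pattern and `top_404s` function are hypothetical names that assume Common Log Format style request and status fields.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it.
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3}) ')

def top_404s(log_lines):
    """Return (url, count) pairs for 404 responses, most frequent first."""
    missing = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match and match.group(2) == "404":
            missing[match.group(1)] += 1
    return missing.most_common()

# Hypothetical log lines.
sample_lines = [
    '10.0.0.1 - - [01/Mar/2024:13:00:00 +0000] "GET /index.html HTTP/1.0" 200 2048',
    '10.0.0.1 - - [01/Mar/2024:13:00:05 +0000] "GET /products/list.html HTTP/1.0" 404 512',
    '10.0.0.2 - - [01/Mar/2024:13:02:00 +0000] "GET /products/list.html HTTP/1.0" 404 512',
    '10.0.0.3 - - [01/Mar/2024:13:03:00 +0000] "GET /old.html HTTP/1.0" 404 512',
]
print(top_404s(sample_lines))
# [('/products/list.html', 2), ('/old.html', 1)]
```

URLs near the top of the list are the most likely candidates for broken links somewhere in your pages.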
404 errors - summary
Provided that your Web site is still to be found somewhere on the Internet, 404 errors should be rare. For top level URLs, they typically occur only when there is some change to how your site is hosted and accessed, and even these typically disappear within a week or two once the Internet catches up with the changes you have made. For low-level URLs, the solution is almost always to fix your Web pages so that the broken hypertext link is corrected.