Introduction
Any client system (e.g. your Web browser or our CheckUpDown
robot) goes through the following cycle when it communicates with your Web
server:
- Obtain an IP address from the host name of your site (your site
URL without the leading 'http://'). This lookup (conversion of host name to IP
address) is provided by Domain Name System (DNS) servers.
- Open an IP socket connection to that IP address.
- Write an HTTP data stream through that socket.
- Receive an HTTP data stream back from your Web server in
response. This data stream contains status codes whose values are determined by
the HTTP protocol. Parse this data stream for status codes and other useful
information.
This error occurs in the last two steps above, i.e. our
CheckUpDown robot has obtained an IP address and opened a valid socket
connection to that IP address, but the subsequent exchange of HTTP data streams
between us and your Web server either fails altogether or times out. We report
this as a 007 error.
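The four-step cycle above can be sketched in a few lines of Python. This is only an illustration of the steps as described, not the code our CheckUpDown robot runs; the function names `check_site` and `parse_status_code` are made up for this example, and the sketch assumes plain HTTP on port 80.

```python
import socket

def parse_status_code(response: bytes) -> int:
    """Return the numeric status code from the first line of an HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # A status line looks like "HTTP/1.1 200 OK".
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])

def check_site(host: str, port: int = 80) -> int:
    """Hypothetical helper: run the four steps once and return the status code."""
    # Step 1: obtain an IP address from the host name (DNS lookup).
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    # Step 2: open a socket connection to that IP address.
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(address)
        # Step 3: write an HTTP data stream through the socket.
        request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        sock.sendall(request)
        # Step 4: read the response until the status line is complete.
        response = b""
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return parse_status_code(response)
```

A 007 error corresponds to a failure or timeout in steps 3 and 4 of `check_site`, after steps 1 and 2 have succeeded.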
Reasons for 007 errors
007 errors can be viewed as slow overall response time. The way
to approach this problem is to start with your Web server and work outwards
from there to the Internet. On all the computers involved along this path,
you need to look for high workloads, particularly transient spikes,
e.g. a system that has sudden bursts of 99% processor utilization.
Your Web server should be capable of dealing with multiple
reads/writes (of HTTP data) via the socket connection we have established. IP
connections over the Internet take time - there are inevitable delays simply
because of the size and complexity of the Internet itself. So we cannot wait
indefinitely for data to be exchanged with your Web server. In our internal
checks, we set a 20-second limit for any exchange of data. This 20 seconds is
fairly generous - a user accessing your Web site through a browser typically
gives up long before 20 seconds have elapsed.
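As a rough illustration of a per-exchange time limit, the Python sketch below applies a timeout to each read and write on an already-open socket. The 20-second figure comes from the text above; the function name `exchange_with_limit` and the convention of returning `None` for a failed exchange are assumptions made for this example, not a description of how our robot is implemented.

```python
import socket

LIMIT_SECONDS = 20.0  # the limit quoted in the text

def exchange_with_limit(sock, request, limit=LIMIT_SECONDS):
    """Send a request and read the first chunk of the reply.

    Returns the reply bytes, or None if the exchange fails or times out
    (the situation that would be reported as a 007 error).
    """
    # The timeout applies separately to each send and recv call.
    sock.settimeout(limit)
    try:
        sock.sendall(request)
        reply = sock.recv(4096)
    except OSError:
        # socket.timeout is a subclass of OSError, so this covers both
        # timeouts and other socket failures such as a reset connection.
        return None
    if not reply:
        # The peer closed the connection without sending any data.
        return None
    return reply
```

Because the limit is set on the socket rather than on the whole check, a server that answers each read promptly is not penalised for a long but steady transfer.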
Your Web server may not respond in time (within 20 seconds) to a
read request on our part. This slow response time from your Web server may be
very transient i.e. due to sharp 'spikes' in activity on the computer running
your Web server. The spikes may be Web related (the Web server may simply be
too busy dealing with other HTTP requests) or not (there may be an unrelated
CPU-intensive program running temporarily on your Web server).
There is also the possibility that the slow response time has
nothing to do with your Web server. There might be other equipment which is
temporarily slow e.g. an intervening firewall or router. Temporary delays might
also occur on any number of intervening systems on the Internet itself, most of
which you have no control over.
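One way to get a first hint about where the delay arises is to time each phase of a request separately. The Python sketch below is a minimal example under stated assumptions: the function name `time_phases` is hypothetical, it assumes plain HTTP, and the split into three phases is only a coarse guide. A long connect time points towards the network path, while a long wait for the first byte points towards the Web server itself, but neither is conclusive.

```python
import socket
import time

def time_phases(host, port=80, limit=20.0):
    """Return the seconds spent on DNS lookup, connect and first reply byte."""
    timings = {}

    # Phase 1: DNS lookup.
    start = time.monotonic()
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["dns"] = time.monotonic() - start

    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(limit)

        # Phase 2: opening the socket connection.
        start = time.monotonic()
        sock.connect(address)
        timings["connect"] = time.monotonic() - start

        # Phase 3: send a request and wait for the first byte of the reply.
        start = time.monotonic()
        sock.sendall(f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        sock.recv(1)
        timings["first_byte"] = time.monotonic() - start

    return timings
```

Running this at intervals from a machine close to the Web server and from one outside your network, then comparing the results, can help separate server delays from network delays.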
Resolving 007 errors
Finding the components which are not coping with their workloads
can be difficult, particularly if spikes are unpredictable. You need detailed
workload logs and accurate timing. If you do not have a decent audit trail to
analyse previous errors, you might be forced to create artificial heavy
workloads yourself to try to detect which parts of your configuration become
overloaded.
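If you do have workload logs with accurate timestamps, matching them against the times of 007 errors is straightforward. The Python sketch below assumes a made-up log format of `(timestamp, component, utilization percent)` tuples; the function name, the 60-second window and the 90% threshold are all illustrative choices, not recommendations.

```python
from datetime import datetime, timedelta

def busy_samples_near(error_time, samples,
                      window=timedelta(seconds=60), threshold=90.0):
    """Return the workload samples close to an error that show high load.

    samples is a list of (timestamp, component, utilization_percent) tuples,
    a hypothetical format chosen for this example.
    """
    return [
        sample for sample in samples
        if abs(sample[0] - error_time) <= window and sample[2] >= threshold
    ]
```

For example, if a 007 error was reported at 10:15:00, calling `busy_samples_near` with that time returns only the components that were at or above the threshold within a minute of the error.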
If the effort to analyse workload proves too difficult, you
might find that simply upgrading suspect pieces of equipment (e.g. the computer
which is most likely to be overloaded) is the cheapest alternative.
We are confident that 007 errors do indeed represent 'down' time
i.e. if we get a 007 error then it is highly likely that other Internet users
would have got some kind of error reported to their browser at that time, or
simply given up and surfed off somewhere else.
You should look at the pattern of 007 errors over time. If they
are becoming more frequent and you know that traffic to your Web server
is generally increasing, then some kind of upgrade (or better workload
balancing) is called for.
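A simple way to look at the pattern over time is to count 007 errors per week. The Python sketch below assumes you have the dates of past 007 errors in a list; the function names are hypothetical, and the "strictly increasing" test is a deliberately crude trend check.

```python
from collections import Counter
from datetime import date

def errors_per_week(error_dates):
    """Count errors per ISO (year, week), in chronological order.

    Weeks with no errors do not appear in the result.
    """
    counts = Counter()
    for d in error_dates:
        year, week, _ = d.isocalendar()
        counts[(year, week)] += 1
    return dict(sorted(counts.items()))

def is_increasing(counts):
    """Return True if every week has more errors than the week before it."""
    values = list(counts.values())
    return len(values) >= 2 and all(a < b for a, b in zip(values, values[1:]))
```

A rising weekly count alongside rising traffic is the situation described above; a flat count with occasional bursts points more towards transient spikes.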
We suggest you contact us for further discussion (email
preferred) if you see persistent 007 errors.