Introduction
Any client system (e.g. your Web browser or our
CheckUpDown robot) goes through the following cycle when
it communicates with your Web server:
- Obtain an IP address from the host name (IP name) of
your site, i.e. your site URL without the leading
'http://' and any trailing path. This lookup (conversion
of host name to IP address) is provided by Domain Name
System (DNS) servers.
- Open a TCP socket connection to that IP address (by
default on port 80 for an 'http://' URL).
- Write an HTTP request through that socket.
- Receive an HTTP response back from your Web server.
This response begins with a status code whose value is
defined by the HTTP protocol (for example 200 for
success). Parse the response for the status code and
other useful information.
This error occurs in the last two steps above. That is,
our CheckUpDown robot has obtained an IP address and
opened a valid socket connection to that IP address, but
the subsequent exchange of HTTP data between us and your
Web server fails altogether or times out. We report this
as a 007 error.
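The cycle above, and the point at which a failure counts as a 007 error, can be sketched in Python using only the standard library `socket` module. This is a minimal illustration, not the CheckUpDown robot itself: the names `check_site` and `parse_status_code` and the stage labels are hypothetical, and the socket timeout here applies to each individual socket operation rather than to the exchange as a whole.

```python
import socket


def parse_status_code(response: bytes) -> int:
    """Return the numeric status code from the first line of an HTTP response."""
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split()
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def check_site(host: str, port: int = 80, path: str = "/", timeout: float = 20.0):
    """Run the four-step cycle and return (stage, result).

    stage is "ok", "dns", "connect" or "exchange"; a failure in the
    "exchange" stage is the kind reported as a 007 error.
    """
    # Step 1: convert the host name into an IP address (DNS lookup).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return "dns", exc
    ip_address = infos[0][4][0]

    # Step 2: open a TCP socket connection to that IP address.
    try:
        sock = socket.create_connection((ip_address, port), timeout=timeout)
    except OSError as exc:
        return "connect", exc

    # Steps 3 and 4: write the request, read the whole response, parse it.
    # Any failure or timeout from here on is an "exchange" failure.
    with sock:
        try:
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Connection: close\r\n\r\n"
            )
            sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                chunk = sock.recv(4096)  # each recv may wait up to `timeout` seconds
                if not chunk:            # server closed the connection: response complete
                    break
                chunks.append(chunk)
            return "ok", parse_status_code(b"".join(chunks))
        except (OSError, ValueError) as exc:
            return "exchange", exc
```

Separating the stages this way shows why a 007 error points at the conversation with your Web server rather than at DNS or basic connectivity: by the time the "exchange" stage is reached, both of those have already succeeded.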
Reasons for 007 errors
A 007 error is, in effect, a symptom of slow overall
response time. The way to approach this problem is to
start with your Web server and work backwards from there
out to the Internet. On every computer along this path,
look for high workloads, particularly transient spikes,
for example a system that has sudden bursts of 99%
processor utilization.
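If your monitoring tools can export CPU samples, a short script can pick out transient spikes and show which hosts on the path were spiking near the time of a 007 error. The sketch below assumes a hypothetical log format of (timestamp in seconds, CPU percent) pairs per host; the 99% threshold echoes the example above, and the 60-second default window is an arbitrary choice.

```python
from typing import Dict, List, Tuple

# Hypothetical log format: (timestamp_seconds, cpu_percent) samples per host.
Sample = Tuple[float, float]


def find_spikes(samples: List[Sample], threshold: float = 99.0) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of runs of consecutive samples at or above threshold."""
    spikes = []
    start = end = None
    for ts, cpu in sorted(samples):
        if cpu >= threshold:
            if start is None:
                start = ts  # a new spike begins
            end = ts
        elif start is not None:
            spikes.append((start, end))  # the spike has ended
            start = end = None
    if start is not None:
        spikes.append((start, end))  # spike still running at the end of the log
    return spikes


def hosts_spiking_near(error_time: float, logs: Dict[str, List[Sample]],
                       window: float = 60.0, threshold: float = 99.0) -> List[str]:
    """Names of hosts with a spike overlapping [error_time - window, error_time + window]."""
    suspects = []
    for host, samples in logs.items():
        for start, end in find_spikes(samples, threshold):
            if start <= error_time + window and end >= error_time - window:
                suspects.append(host)
                break
    return suspects
```

Running `hosts_spiking_near` for each recorded 007 error gives a first list of suspect machines, starting from your Web server and working outwards.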
Your Web server should be capable of handling multiple
reads and writes (of HTTP data) over the socket
connection we have established. IP connections over the
Internet take time: there are inevitable delays simply
because of the size and complexity of the Internet
itself. So we cannot wait indefinitely for data to be
exchanged with your Web server. In our checks, we set a
20-second limit for any exchange of data. This limit is
fairly generous: a user accessing your Web site through
a browser typically gives up long before 20 seconds have
elapsed.
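A per-read socket timeout does not, by itself, bound the whole exchange. A server that sent a few bytes every 19 seconds would never trip it. The sketch below assumes the 20-second limit is meant to cover the whole exchange, although the text does not say exactly how CheckUpDown enforces it. It shrinks each read's timeout to the time left before an overall deadline. `read_response` and `EXCHANGE_LIMIT` are illustrative names.

```python
import socket
import time

EXCHANGE_LIMIT = 20.0  # seconds allowed for one whole exchange of data


def read_response(sock: socket.socket, limit: float = EXCHANGE_LIMIT) -> bytes:
    """Read until the peer closes the connection, failing if the whole
    exchange takes longer than `limit` seconds (not just any single read)."""
    deadline = time.monotonic() + limit
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"exchange exceeded {limit} s")
        sock.settimeout(remaining)  # never wait past the overall deadline
        # recv raises socket.timeout (an alias of TimeoutError since
        # Python 3.10) if no data arrives within `remaining` seconds.
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)  # peer closed: response complete
        chunks.append(chunk)
```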
Your Web server may not respond in time (within 20
seconds) when we try to read its reply. This slow
response may be very transient, for example due to sharp
'spikes' in activity on the computer running your Web
server. The spikes may be Web-related (the Web server may
simply be too busy dealing with other HTTP requests) or
not (an unrelated CPU-intensive program may be running
temporarily on the same computer).
It is also possible that the slow response time has
nothing to do with your Web server. Other equipment, such
as an intervening firewall or router, might be
temporarily slow. Temporary delays might also occur on
any number of intervening systems on the Internet
itself, most of which you have no control over.
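Timing each phase of a request separately can help tell a slow Web server apart from slow equipment in between. In the rough sketch below, the function `time_phases` is hypothetical. The operating system's network stack completes the TCP connect, so the connect time mostly reflects the network path, including any firewalls or routers. The wait for the first byte of the response also includes the time your Web server spends producing it. If only the first-byte time is large, the server is the more likely culprit. If the connect time is large as well, look at the network path too.

```python
import socket
import time


def time_phases(host: str, port: int = 80, path: str = "/", timeout: float = 20.0) -> dict:
    """Return the seconds spent on the DNS lookup, the TCP connect and the
    wait for the first byte of the HTTP response."""
    t0 = time.monotonic()
    ip_address = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    t1 = time.monotonic()
    with socket.create_connection((ip_address, port), timeout=timeout) as sock:
        t2 = time.monotonic()
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Returns once the first byte arrives (or the server closes the
        # connection); raises socket.timeout after `timeout` seconds.
        sock.recv(1)
        t3 = time.monotonic()
    return {"dns": t1 - t0, "connect": t2 - t1, "first_byte": t3 - t2}
```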
Resolving 007 errors
Finding the components which are not coping with their
workloads can be difficult, particularly if spikes are
unpredictable. You need detailed workload logs and
accurate timing. If you do not have a decent audit trail
to analyse previous errors, you might be forced to
create artificial heavy workloads yourself to try to
detect which parts of your configuration become
overloaded.
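An artificial workload can be as simple as sending many concurrent requests to your site and recording how long each one takes. The sketch below uses `urllib.request` and a thread pool. The function names, the default of 200 requests with 20 in flight, and the summary fields are assumptions for illustration. The fetch function is a parameter so that it can be swapped out. Run it only against your own servers, while you watch the workload on each computer along the path.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_fetch(url: str, timeout: float) -> float:
    """GET `url`, read the full body, and return the elapsed seconds."""
    start = time.monotonic()
    # Failures surface as OSError subclasses (URLError, HTTPError for 4xx/5xx,
    # timeouts) or, for malformed responses, http.client.HTTPException.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def generate_load(url: str, requests: int = 200, concurrency: int = 20,
                  limit: float = 20.0, fetch=timed_fetch) -> dict:
    """Send `requests` GETs with up to `concurrency` in flight and summarise them."""

    def one_request(_):
        try:
            return fetch(url, limit)
        except (OSError, http.client.HTTPException):
            return None  # failed or timed out

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(requests)))

    durations = [d for d in results if d is not None]
    return {
        "requests": requests,
        "failures": results.count(None),
        "slowest": max(durations) if durations else None,
        # urlopen's timeout applies per socket operation, so a request can
        # still succeed after more than `limit` seconds in total.
        "over_limit": sum(d > limit for d in durations),
    }
```

If you raise `concurrency` step by step until failures or over-limit requests appear, you get a rough idea of the load at which some part of your configuration stops coping.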
If the effort to analyse workload proves too difficult,
you might find that simply upgrading suspect pieces of
equipment (e.g. the computer which is most likely to be
overloaded) is the cheapest alternative.
We are confident that 007 errors do indeed represent
'down' time: if we get a 007 error, then it is highly
likely that other Internet users would also have had an
error reported by their browser at that time, or would
simply have given up and surfed off somewhere else.
You should look at the pattern of 007 errors over time.
If they are becoming more frequent and you know that
traffic generally to your Web server is increasing, then
some kind of upgrade - or better workload balancing - is
called for.
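One simple way to judge whether 007 errors are becoming more frequent is to count them in fixed periods and fit a trend line. The sketch below assumes you have the dates of past 007 errors as `datetime.date` values. The weekly period, the eight-period window and the least-squares slope are illustrative choices, and a positive slope only suggests a rising trend over that window.

```python
from datetime import date


def errors_per_period(error_dates, end: date, period_days: int = 7, periods: int = 8):
    """Count errors in `periods` consecutive periods ending on `end`, oldest first."""
    counts = [0] * periods
    for day in error_dates:
        age = (end - day).days  # 0 means the error happened on `end` itself
        if 0 <= age < period_days * periods:
            counts[periods - 1 - age // period_days] += 1
    return counts


def trend_slope(counts):
    """Least-squares slope of counts against period index (errors per period)."""
    n = len(counts)
    if n < 2:
        return 0.0  # a trend needs at least two periods
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator
```

Comparing this slope with the trend in your own traffic figures shows whether the rise in 007 errors simply tracks growing load.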
We suggest you contact us for further discussion (email
preferred) if you see persistent 007 errors.