CheckUpDown

007 - I/O Interrupt

Introduction

Any client (e.g. your Web browser or our CheckUpDown robot) goes through the following cycle when it communicates with the Web server:

  1. Obtain an IP address from the host name of the site (the site URL without the leading 'http://' and without any trailing path). This lookup (conversion of host name to IP address) is provided by domain name servers (DNS servers).
  2. Open an IP socket connection to that IP address.
  3. Write an HTTP data stream through that socket.
  4. Receive an HTTP data stream back from the Web server in response. This data stream contains status codes whose values are determined by the HTTP protocol. Parse this data stream for status codes and other useful information.
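The four steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the CheckUpDown robot's actual code: the function name `fetch_status`, the use of a HEAD request, and the default arguments are all assumptions made for the example.

```python
import socket


def fetch_status(host, port=80, path="/", timeout=20.0):
    """Walk through the four steps and return the HTTP status code as an int.

    Hypothetical helper for illustration only.
    """
    # Step 1: resolve the host name to an IP address via DNS.
    ip_address = socket.gethostbyname(host)

    # Step 2: open a socket connection to that IP address.
    with socket.create_connection((ip_address, port), timeout=timeout) as sock:
        # Step 3: write an HTTP request through the socket.
        request = (
            f"HEAD {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))

        # Step 4: read the response until the status line is complete.
        response = b""
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk

    # The status line looks like "HTTP/1.1 200 OK"; the code is the second field.
    fields = response.split(b"\r\n", 1)[0].decode("ascii", "replace").split()
    if len(fields) < 2:
        raise ValueError("no HTTP status line received")
    return int(fields[1])
```

A failure in step 1 or step 2 raises an exception before any HTTP data is exchanged. Only failures in steps 3 and 4 correspond to the situation described on this page.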

This error occurs in the last two steps above, i.e. our CheckUpDown robot has obtained an IP address and opened a valid socket connection to that IP address, but the subsequent exchange of HTTP data streams between us and your Web server either fails altogether or times out. We report this as a 007 error.

Reasons for 007 errors

007 errors can usually be viewed as a symptom of slow overall response time. The way to approach this problem is to start with your Web server and work backwards from there out to the Internet. On every computer along this path, you need to look for high workloads, particularly transient spikes, e.g. a system that has sudden bursts of 99% processor utilization.

Your Web server should be capable of dealing with multiple reads/writes (of HTTP data) via the socket connection we have established. IP connections over the Internet take time - there are inevitable delays simply because of the size and complexity of the Internet itself. Even so, we cannot wait indefinitely for data to be exchanged with your Web server. In our internal checks, we set a 20-second limit for any exchange of data. This 20 seconds is fairly generous - a user accessing your Web site through a browser typically gives up long before 20 seconds have elapsed.
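To make the time limit concrete, here is a minimal sketch of how a client might apply a 20-second limit to each read or write on an open socket and classify the outcome. The names `check_exchange` and `EXCHANGE_TIMEOUT`, and the 'ok'/'007' return values, are assumptions for this example and do not describe our internal implementation.

```python
import socket

EXCHANGE_TIMEOUT = 20.0  # seconds; the limit quoted in the text above


def check_exchange(ip_address, port, request, timeout=EXCHANGE_TIMEOUT):
    """Return 'ok' if the server answers in time, otherwise '007'.

    Hypothetical helper: connection failures are a different class of
    error, so they are left to propagate to the caller.
    """
    sock = socket.create_connection((ip_address, port), timeout=timeout)
    try:
        # The timeout applies to each individual send or recv call,
        # not to the exchange as a whole.
        sock.settimeout(timeout)
        sock.sendall(request)
        first_bytes = sock.recv(4096)
    except (socket.timeout, ConnectionError):
        # The exchange timed out or the connection broke mid-exchange.
        return "007"
    finally:
        sock.close()
    # An empty read means the server closed the socket without replying.
    return "ok" if first_bytes else "007"
```

Passing a smaller `timeout` when testing against your own server shows how close a typical response comes to the limit.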

Your Web server may not respond in time (within 20 seconds) to a read request on our part. This slow response time from your Web server may be very transient, i.e. due to sharp 'spikes' in activity on the computer running your Web server. The spikes may be Web related (the Web server may simply be too busy dealing with other HTTP requests) or not (there may be an unrelated CPU-intensive program running temporarily on the same computer).

There is also the possibility that the slow response time has nothing to do with your Web server. Other equipment might be temporarily slow, e.g. an intervening firewall or router. Temporary delays might also occur on any number of intervening systems on the Internet itself, most of which you have no control over.

Resolving 007 errors

Finding the components which are not coping with their workloads can be difficult, particularly if spikes are unpredictable. You need detailed workload logs and accurate timing. If you do not have a decent audit trail to analyse previous errors, you might be forced to create artificial heavy workloads yourself to try to detect which parts of your configuration become overloaded.
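As a rough illustration of what such an audit trail makes possible, the sketch below scans a series of timestamped processor utilization samples and groups consecutive high readings into spikes. The function name `find_spikes`, the sample format (timestamp, percent) and the default 99% threshold are assumptions for the example. How you collect the samples depends on your own systems.

```python
def find_spikes(samples, threshold=99.0):
    """Group consecutive samples at or above threshold into (start, end) spans.

    samples is an iterable of (timestamp, percent) pairs in time order.
    Hypothetical helper for illustration only.
    """
    spikes = []
    start = end = None
    for timestamp, percent in samples:
        if percent >= threshold:
            if start is None:
                start = timestamp  # a new spike begins here
            end = timestamp        # extend the current spike
        elif start is not None:
            spikes.append((start, end))
            start = None
    if start is not None:
        # The series ended while a spike was still in progress.
        spikes.append((start, end))
    return spikes


# Timestamps here are plain seconds; real logs would use wall-clock times.
samples = [(0, 10.0), (1, 99.0), (2, 100.0), (3, 20.0), (4, 99.5)]
print(find_spikes(samples))  # [(1, 2), (4, 4)]
```

Comparing the resulting spans with the times at which 007 errors were reported may show which system was overloaded at the moment of failure.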

If the effort to analyse workload proves too difficult, you might find that simply upgrading suspect pieces of equipment (e.g. the computer which is most likely to be overloaded) is the cheapest alternative.

We are confident that 007 errors do indeed represent 'down' time, i.e. if we get a 007 error, then it is highly likely that other Internet users would have seen some kind of error reported in their browser at that time, or simply given up and surfed off somewhere else.

You should look at the pattern of 007 errors over time. If they are becoming more frequent and you know that traffic to your Web server is generally increasing, then some kind of upgrade - or better workload balancing - is called for.
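One simple way to check whether errors are becoming more frequent is to count them per day and compare the later half of a period with the earlier half. The sketch below assumes you have a list of the dates on which 007 errors were reported. The function names `daily_counts` and `is_rising` are hypothetical, and the half-versus-half comparison is only one crude measure of a trend.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(error_dates, start, end):
    """Count errors for every day from start to end inclusive (zero-filled)."""
    counts = Counter(error_dates)
    days = (end - start).days + 1
    return [counts[start + timedelta(days=i)] for i in range(days)]


def is_rising(counts):
    """True if the later half of the period averages more errors per day
    than the earlier half."""
    half = len(counts) // 2
    if half == 0:
        return False
    earlier, later = counts[:half], counts[-half:]
    return sum(later) / half > sum(earlier) / half


# Made-up dates purely for illustration.
errors = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 3),
          date(2024, 1, 4), date(2024, 1, 4), date(2024, 1, 4)]
counts = daily_counts(errors, date(2024, 1, 1), date(2024, 1, 4))
print(counts)             # [1, 0, 2, 3]
print(is_rising(counts))  # True
```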

We suggest you contact us for further discussion (email preferred) if you see persistent 007 errors.