Generally, limits should be imposed on mod_perl processes to prevent mayhem if something goes wrong. There is no need to limit processes if the code does not have any bugs, or at least if there is sufficient confidence that the program will never overconsume resources. When there is a risk that a process might hang or start consuming a lot of memory, CPU, or other resources, it is wise to use the Apache::Resource module.

But what happens if a process is stuck waiting for some event to occur? Consider a process trying to acquire a file lock that can never be granted because of a deadlock. The process just hangs waiting, which means that neither extra CPU nor extra memory is used. We cannot detect and terminate this process using the resource-limiting techniques we just discussed. If there is one such process, it is likely that very soon there will be many more processes stuck waiting for the same or a different event to occur. Within a short time, all processes will be stuck and no new processes will be spawned, because the maximum number specified by the MaxClients directive has been reached. The service enters a state where it is up but not serving clients.

A watchdog that does not merely check that the process is up, but actually issues requests to make sure that the service responds, provides some protection against a complete service outage, because it will restart the server if its test request times out. This is a last-resort solution. Ideally, hanging processes that do not consume many resources (and therefore cannot be detected by the Apache::Resource module) should be detected and terminated as soon as possible, not when the service stops responding to requests, since by that point the quality of service to the users will have been severely degraded.

This is where the Apache::Watchdog::RunAway module comes in handy. This module samples all live child processes every $Apache::Watchdog::RunAway::POLLTIME seconds. If a process has been serving the same request for more than $Apache::Watchdog::RunAway::TIMEOUT seconds, it is killed.

To perform accounting, the Apache::Watchdog::RunAway module uses the Apache::Scoreboard module, which in turn delivers various items of information about live child processes. Therefore, the following configuration must be added to httpd.conf:

<Location /scoreboard>
    SetHandler perl-script
    PerlHandler Apache::Scoreboard::send
    order deny,allow
    deny from all
    allow from localhost
</Location>

Make sure to adapt the access permissions to the local environment. The above configuration allows access to this handler only from the local machine (localhost). This setting can be tested by issuing a request for http://localhost/scoreboard. However, the returned data cannot be read directly, since it uses a binary format.

We are now ready to configure Apache::Watchdog::RunAway. The module should be able to retrieve the information provided by Apache::Scoreboard, so we will tell it the URL to use:

$Apache::Watchdog::RunAway::SCOREBOARD_URL = "http://localhost/scoreboard";

We must decide how many seconds the process is allowed to be busy serving the same request before it is considered a runaway. Consider the slowest clients. Scripts that do file uploading and downloading might take a significantly longer time than normal mod_perl code.

$Apache::Watchdog::RunAway::TIMEOUT = 180; # 3 minutes

Setting the timeout to 0 will disable the Apache::Watchdog::RunAway module entirely.

The rate at which the module polls the server should be chosen carefully. Because of the overhead of fetching the scoreboard data, the monitor should not poll too frequently. If the timeout is set to a few minutes, sampling every one or two minutes is a good choice. The following directive specifies the polling interval:

$Apache::Watchdog::RunAway::POLLTIME = 60; # 1 minute

Just like the timeout value, polling time is measured in seconds.
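
The two settings interact: because the monitor looks at the scoreboard only once per polling interval, a hung request may be noticed up to one interval after it crosses the timeout. The following sketch works through that arithmetic with the values used above. The worst_case_kill_delay( ) helper is hypothetical and not part of the module, and the result assumes the behavior described in this section rather than measured timings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Apache::Watchdog::RunAway): upper bound,
# in seconds, on how long a hung request can live before the monitor kills it.
# Assumes the request crosses the timeout just after a sample was taken,
# so it is noticed only at the next sample.
sub worst_case_kill_delay {
    my ($timeout, $polltime) = @_;
    return $timeout + $polltime;
}

my $timeout  = 180;   # value of $Apache::Watchdog::RunAway::TIMEOUT
my $polltime = 60;    # value of $Apache::Watchdog::RunAway::POLLTIME

printf "A hung request is killed after %d to %d seconds\n",
    $timeout, worst_case_kill_delay($timeout, $polltime);
```

With a 3-minute timeout and a 1-minute poll, a runaway can therefore survive for up to about 4 minutes. Lowering the polling interval tightens that bound at the cost of more frequent scoreboard fetches.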

To see what the module does, enable debug mode:

$Apache::Watchdog::RunAway::DEBUG = 1;

and watch its log file using the tail command.

The following statement allows us to specify the log file's location:

$Apache::Watchdog::RunAway::LOG_FILE = "/tmp/safehang.log";

This log file is also used for logging information about killed processes, regardless of the value of the $DEBUG variable.

The module uses a lock file in order to prevent starting more than one instance of itself. The default location of this file may be changed using the $LOCK_FILE variable.

$Apache::Watchdog::RunAway::LOCK_FILE = "/tmp/safehang.lock";

There are two ways to invoke this process: using the Perl functions, or using the bundled utility called amprapmon (mnemonic: ApacheModPerlRunAwayProcessMonitor).

The following functions are available:

stop_monitor( )
Stops the monitor based on the PID contained in the lock file. Removes the lock file.

start_monitor( )
Starts the monitor in the current process. Creates the lock file.

start_detached_monitor( )
Starts the monitor as a forked process (used by amprapmon). Creates the lock file.

In order for mod_perl to invoke this process, all that is needed is the start_detached_monitor( ) function. Add the following code to startup.pl:

use Apache::Watchdog::RunAway( );
Apache::Watchdog::RunAway::start_detached_monitor( );
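
Putting the pieces together, a startup.pl fragment might look like the sketch below. It uses only the variables and the function described in this section, with the example values shown earlier. Setting the variables before starting the monitor is an assumption about ordering, so adapt it to the local installation.

```perl
use Apache::Watchdog::RunAway ();

# Example values from this section; adapt them to the local environment.
$Apache::Watchdog::RunAway::SCOREBOARD_URL = "http://localhost/scoreboard";
$Apache::Watchdog::RunAway::TIMEOUT   = 180;  # seconds; 0 disables the module
$Apache::Watchdog::RunAway::POLLTIME  = 60;   # seconds between samples
$Apache::Watchdog::RunAway::DEBUG     = 0;    # set to 1 to log what the monitor does
$Apache::Watchdog::RunAway::LOG_FILE  = "/tmp/safehang.log";
$Apache::Watchdog::RunAway::LOCK_FILE = "/tmp/safehang.lock";

# Assumption: configure first, then fork the monitor.
Apache::Watchdog::RunAway::start_detached_monitor();
```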

Another approach is to use the amprapmon utility. This can be started from the startup.pl file:

system "amprapmon start";

This will fork a new process. If the monitor is already running, it will just continue to run.

The amprapmon utility could instead be started from cron or from the command line.

No matter which approach is used, the process will fork itself and run as a daemon process. To stop the daemon, use the following command:

panic% amprapmon stop

If we want to test this module but have no code that makes processes hang (or we do, but the behavior is not reproducible on demand), the following code can be used to make the process hang in an infinite loop when executed as a script or handler. The code writes "\0" characters to the browser every second, so the request will never time out. The code is shown in Example 5-12.

Example 5-12. hangnow.pl

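my $r = shift;                        # Apache request object
$r->send_http_header('text/plain');
print "PID = $$\n";                   # report which child is about to hang
$r->rflush;
# loop forever, sending a NUL byte every second so the request never times out
while(1) {
    $r->print("\0");
    $r->rflush;
    sleep 1;
}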

The code prints the PID of the process running it before it goes into an infinite loop, so that we know which process hangs and whether it gets killed by the Apache::Watchdog::RunAway daemon as it should.
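
To confirm from the command line that the watchdog has killed the process, the PID printed by the script can be probed with Perl's built-in kill function and signal 0, which checks for a process without sending it anything. This is a minimal sketch: the process_alive( ) helper and the checkpid.pl filename are hypothetical, and nothing here is part of Apache::Watchdog::RunAway.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a process with this PID currently exists.
sub process_alive {
    my ($pid) = @_;
    # Signal 0 sends nothing; it only asks whether the process can be signaled.
    return 1 if kill 0, $pid;
    # EPERM means the process exists but belongs to another user
    # (for example, an Apache child running under the server's account).
    return $!{EPERM} ? 1 : 0;
}

# Usage: perl checkpid.pl <PID printed by hangnow.pl>
if (defined(my $pid = shift @ARGV)) {
    print "Process $pid is ",
        (process_alive($pid) ? "still running" : "gone"), "\n";
}
```

Running it once right after requesting hangnow.pl, and again after the timeout plus one polling interval has passed, shows whether the child was terminated.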

Of course, the watchdog is used only for prevention. If you have a serious problem with hanging processes, you have to debug your code, find the reason for the problem, and resolve it, as discussed in Chapter 21.