Unlike other Apache handlers, filter handlers may be invoked more than once during the same request. A filter is invoked once for each bucket brigade sent from the upstream filter or content provider.

For example, if a content-generation handler sends a string, and then forces a flush, following with more data:

# assuming buffered STDOUT ($| == 0)
$r->print("foo");
$r->rflush;
$r->print("bar");

Apache will generate one bucket brigade with two buckets (several bucket types can carry data, and transient is one of them):

bucket  type       data
-----------------------
1st     transient  foo
2nd     flush

and send it to the filter chain. Then, assuming that no more data is sent after print("bar"), it will create the last bucket brigade that contains data:

bucket  type       data
-----------------------
1st     transient  bar

and send it to the filter chain. Finally, it will send yet another bucket brigade with the EOS bucket, indicating that no more data will be sent:

bucket  type       data
-----------------------
1st     eos

In our example the filter will be invoked three times. Notice that sometimes the EOS bucket comes attached to the last bucket brigade with data and sometimes in its own bucket brigade. This should be transparent to the filter logic, as we will see shortly.
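
The sequence above can be mimicked outside Apache with a short simulation. This is only a sketch under stated assumptions: the brigades are modeled as plain Perl arrays of [type, data] pairs, and the names @stream and run_filter are hypothetical helpers, not part of the mod_perl API. It shows the handler being called once per brigade:

```perl
use strict;
use warnings;

# Hypothetical model of the three brigades from the example above.
# Each brigade is an array reference of [type, data] pairs.
my @stream = (
    [ [ transient => 'foo' ], [ flush => '' ] ],
    [ [ transient => 'bar' ] ],
    [ [ eos => '' ] ],
);

# Call the handler once per brigade and return the number of invocations.
sub run_filter {
    my ($handler, @brigades) = @_;
    my $invocations = 0;
    for my $brigade (@brigades) {
        $invocations++;
        $handler->($brigade);
    }
    return $invocations;
}

# A trivial handler that collects the payload of the data buckets.
my $output = '';
my $count  = run_filter(
    sub {
        my $brigade = shift;
        $output .= $_->[1] for grep { $_->[0] eq 'transient' } @$brigade;
    },
    @stream,
);

print "invoked $count times, saw '$output'\n";
# prints: invoked 3 times, saw 'foobar'
```

If the EOS bucket were moved into the second brigade, the same handler would run only twice and still collect the same data.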

A user may install an upstream filter, and that filter may decide to insert extra bucket brigades or collect all the data in all bucket brigades passing through it and send it all down in one brigade. What's important to remember when coding a filter is to never assume that the filter is always going to be invoked once, or a fixed number of times. You can't make assumptions about the way the data is going to come in. Therefore, a typical filter handler may need to split its logic into three parts, as depicted in Figure 25-4.

Figure 25-4. mod_perl 2.0 filter logic

Jumping ahead, we will show some pseudocode that represents all three parts. This is what a typical filter looks like:

sub handler {
    my $filter = shift;

    # runs on first invocation
    unless ($filter->ctx) {
        init($filter);
        $filter->ctx(1);
    }

    # runs on all invocations
    process($filter);

    # runs on the last invocation
    if ($filter->seen_eos) {
        finalize($filter);
    }

    return Apache::OK;
}
sub init     { ... }
sub process  { ... }
sub finalize { ... }

Let's examine the parts of this pseudofilter:

  1. Initialization

    During the initialization, the filter runs all the code that should be performed only once across multiple invocations of the filter (during a single request). The filter context is used to accomplish this task. For each new request, the filter context is created before the filter is called for the first time, and it's destroyed at the end of the request. When the filter is invoked for the first time, $filter->ctx returns undef and the custom function init( ) is called:

    unless ($filter->ctx) {
        init($filter);
        $filter->ctx(1);
    }

    This function can, for example, retrieve some configuration data set in httpd.conf or initialize some data structure to its default value. To make sure that init( ) won't be called on the following invocations, we must set the filter context before the first invocation is completed:

    $filter->ctx(1);

    In practice, the context does not serve only as a flag; it is also used to store real data. For example, the following filter handler counts the number of times it was invoked during a single request:

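    sub handler {
        my $filter = shift;

        # the context is undef on the first invocation
        my $ctx = $filter->ctx || {};
        $ctx->{invoked}++;
        $filter->ctx($ctx);   # store it for the next invocation
        warn "filter was invoked $ctx->{invoked} times\n";

        # the data wasn't consumed, so let mod_perl pass it on
        return Apache::DECLINED;
    }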

    Since this filter handler doesn't consume the data from the upstream filter, it's important that this handler returns Apache::DECLINED, so that mod_perl will pass the bucket brigades to the next filter. If this handler returns Apache::OK, the data will simply be lost.

  2. Processing

    The next part:

    process($filter);

    is unconditionally invoked on every filter invocation. This is where the incoming data is read, modified, and sent out to the next filter in the filter chain. Here is an example that converts the characters passing through to lowercase:

    use constant READ_SIZE  => 1024;
    sub process {
        my $filter = shift;
        while ($filter->read(my $data, READ_SIZE)) {
            $filter->print(lc $data);
        }
    }

    Here each call to process( ) handles only the bucket brigade of the current invocation. Since the filter transforms every character independently, nothing has to be carried over between invocations and the logic is really simple.

    In more complicated filters, the data may need to be buffered before the transformation can be applied. For example, if the filter operates on HTML tokens (e.g., <img src="me.jpg">), it's possible that one brigade will include the beginning of the token (<img ) and the remainder of the token (src="me.jpg">) will come in the next bucket brigade (on the next filter invocation). In certain cases it may take more than two bucket brigades to get the whole token, so the filter has to store the unprocessed remainder in the filter context and reuse it in the next invocation. Another good example is a filter that performs data compression, which is usually effective only when applied to relatively big chunks of data: if a single bucket brigade doesn't contain enough data, the filter may need to buffer the data in the filter context until it collects enough of it.

  3. Finalization

    Finally, some filters need to know when they are invoked for the last time, in order to perform various cleanups and/or flush any remaining data. As mentioned earlier, Apache indicates this event by a special end-of-stream token, represented by a bucket of type EOS. If the filter is using the streaming interface, rather than manipulating the bucket brigades directly, it can check whether this is the last time it's invoked using the $filter->seen_eos method:

    if ($filter->seen_eos) {
        finalize($filter);
    }

    This check should be done at the end of the filter handler, because sometimes the EOS token comes attached to the tail of data (the last invocation gets both the data and the EOS token) and sometimes it comes all alone (the last invocation gets only the EOS token). So if this test is performed at the beginning of the handler and the EOS bucket was sent in together with the data, the EOS event may be missed and the filter won't function properly.

    Filters that directly manipulate bucket brigades have to look for a bucket whose type is EOS for the same reason.
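
The buffering described in the processing step, together with the final flush, can be sketched as follows. This is a minimal illustration that runs outside Apache and is not a definitive implementation: MockFilter is a hypothetical stand-in whose ctx, read, print, and seen_eos methods only imitate the streaming interface, feed and output are invented helpers, OK stands in for Apache::OK, and replacing <img> tags with "[image]" is an arbitrary transformation chosen for the example:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filter object (not the mod_perl class).
package MockFilter;
sub new      { return bless { ctx => undef, in => '', eos => 0, out => '' }, shift }
sub ctx      { my $self = shift; $self->{ctx} = shift if @_; return $self->{ctx} }
sub seen_eos { return $_[0]{eos} }
sub output   { return $_[0]{out} }
sub print    { my $self = shift; $self->{out} .= join '', @_; return 1 }
sub feed     { my ($self, $data, $eos) = @_; @{$self}{qw(in eos)} = ($data, $eos) }
sub read {
    my $self = shift;
    my $len  = $_[1];
    $_[0] = substr $self->{in}, 0, $len, '';   # fill the caller's buffer
    return length $_[0];
}

package main;
use constant READ_SIZE => 1024;
use constant OK        => 0;    # stand-in for Apache::OK

sub handler {
    my $filter = shift;

    # initialization: create the context on the first invocation
    my $ctx = $filter->ctx;
    unless ($ctx) {
        $ctx = { leftover => '' };
        $filter->ctx($ctx);
    }

    # processing: start with whatever was held back last time
    my $data = $ctx->{leftover};
    while ($filter->read(my $chunk, READ_SIZE)) {
        $data .= $chunk;
    }
    # hold back an incomplete trailing tag ('<' with no '>' after it)
    $ctx->{leftover} = $data =~ s/(<[^>]*)\z// ? $1 : '';
    $data =~ s/<img\b[^>]*>/[image]/gi;
    $filter->print($data) if length $data;

    # finalization: flush the unprocessed remainder at end of stream
    if ($filter->seen_eos) {
        $filter->print($ctx->{leftover}) if length $ctx->{leftover};
        $ctx->{leftover} = '';
    }
    return OK;
}

# One feed/handler pair per simulated brigade.
my $f = MockFilter->new;
$f->feed('Hi <img ', 0);          handler($f);
$f->feed('src="me.jpg"> bye', 1); handler($f);
print $f->output, "\n";           # prints: Hi [image] bye
```

Because the EOS check comes after the processing step, the handler behaves the same whether the end-of-stream marker arrives with the last data or on its own.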

Some filters may need to deploy all three parts of the described logic. Others will need to do only initialization and processing, or processing and finalization, while the simplest filters might perform only the normal processing (as we saw in the example of the filter handler that lowers the case of the characters going through it).
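
To check that the three-part structure does not depend on how the stream is split, the skeleton can be driven with different brigade layouts. The following is a rough sketch outside Apache: FakeFilter, run, and %calls are hypothetical names introduced for the illustration, each brigade is modeled as a [data, has_eos] pair, and the return value 0 stands in for Apache::OK:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filter object; only the methods used by
# the skeleton are modeled.
package FakeFilter;
sub new      { return bless { ctx => undef, data => '', eos => 0 }, shift }
sub ctx      { my $self = shift; $self->{ctx} = shift if @_; return $self->{ctx} }
sub seen_eos { return $_[0]{eos} }

package main;

my %calls;    # how many times each part of the handler ran

sub handler {
    my $filter = shift;

    unless ($filter->ctx) {                      # first invocation only
        $calls{init}++;
        $filter->ctx(1);
    }
    $calls{process}++;                           # every invocation
    $calls{finalize}++ if $filter->seen_eos;     # last invocation only

    return 0;                                    # stand-in for Apache::OK
}

# Invoke the handler once per brigade and return a copy of the counters.
sub run {
    my @brigades = @_;
    %calls = (init => 0, process => 0, finalize => 0);
    my $filter = FakeFilter->new;
    for my $brigade (@brigades) {
        @{$filter}{qw(data eos)} = @$brigade;
        handler($filter);
    }
    return { %calls };
}

my $eos_alone    = run([ 'foo', 0 ], [ 'bar', 0 ], [ '', 1 ]);
my $eos_attached = run([ 'foo', 0 ], [ 'bar', 1 ]);

for my $case ([ 'EOS alone' => $eos_alone ], [ 'EOS attached' => $eos_attached ]) {
    my ($label, $c) = @$case;
    print "$label: init=$c->{init} process=$c->{process} finalize=$c->{finalize}\n";
}
# prints:
# EOS alone: init=1 process=3 finalize=1
# EOS attached: init=1 process=2 finalize=1
```

In both layouts the initialization and finalization parts run exactly once, and only the number of processing calls changes.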