Speeding up a Site: Quicker 404 Errors for files in Drupal

On the Archdiocese of Saint Louis website, we moved thousands of files around as part of our site migration from 49 separate Joomla sites to Drupal. Internally, all our file links were updated. However, there are thousands of hotlinks from different websites to the Archdiocesan website (for instance, the blog American Papist hits a missing file of a Church interior about 80 times a day).

This was creating a lot of overhead for the server, as Drupal would do a full bootstrap, sending out a fully-rendered 404 page on each missing file request.

Looking through the Drupal forums, I found some help from kbahey, founder/owner of 2bits, a Drupal shop that specializes in speeding up large Drupal sites. The advice in the issue linked above has the proper code for speeding up requests on a Drupal 7 site, but the code is slightly different for Drupal 6.x. Here's the code for Drupal 6.x:

<?php
/**
 * 404 handling, to conserve server resources upon missing image/text/non-html file.
 */
if (preg_match("/\.(txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/", $_SERVER['QUERY_STRING'])) {
  // Send a minimal 404 response and stop before Drupal does a full bootstrap.
  header('HTTP/1.0 404 Not Found');
  print '<html><head><title>404 Not Found</title></head><body><h1>404 Not Found</h1><p>The requested URL was not found on this server. If you think you reached this page in error, please visit the <a href="http://archstl.org/">Archdiocese of Saint Louis home page</a> and search for the page or file for which you are looking.</p></body></html>';
  exit();
}
?>

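You can simply paste this at the bottom of the settings.php file. Leave out the opening <?php and closing ?> tags when you do, since settings.php is already a PHP file.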

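In a nutshell, it does the following: if the request ends in one of the file extensions listed in the regular expression, and the file is not found, it will output 404 headers, along with a simple HTML string. The snippet itself does not check whether the file exists; it relies on Drupal's default rewrite rules, which only hand a request to Drupal when the web server cannot find a real file at that path.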
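To see which requests the pattern catches, here is a minimal sketch that wraps the same regular expression in a function. The function name and the sample paths are made up for illustration; the only assumption carried over from the snippet is that clean URLs pass the requested path to Drupal in the query string as q=<path>.

```php
<?php
/**
 * Returns TRUE when a query string ends in one of the static-file extensions.
 * The function name is hypothetical and used only for illustration.
 */
function example_is_static_file_request($query_string) {
  $pattern = "/\.(txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/";
  return preg_match($pattern, $query_string) === 1;
}

// Sample query strings (hypothetical paths).
var_dump(example_is_static_file_request('q=files/church-interior.jpg'));  // bool(true)
var_dump(example_is_static_file_request('q=node/123'));                   // bool(false)
var_dump(example_is_static_file_request('q=files/photo.jpg&size=large')); // bool(false)
```

The last case shows a limit of matching against the whole query string: because the pattern is anchored to the end, a missing file requested with extra query parameters would not match, and would still go through a full Drupal bootstrap.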

This is a heck of a lot faster than allowing a full Drupal bootstrap on each missing file request, an operation which, on our server, takes up a nice 30 MB of RAM, plus a lot of extra CPU usage per request, when compared with using the code above.


Comments

oscatholic's picture

Just FYI, this caused some problems with ImageCache. See: http://drupal.org/node/76824#comment-2925834

Advancing the faith.

KevinHerrington's picture

As a quick fix for the imagecache issue, you could just remove the image types from the regular expression, or add another check to allow Drupal to run if 'imagecache' is in the URL.

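<?php
// Option 1: drop the image extensions from the pattern, so that all image
// requests still reach Drupal (in case of Imagecache).
if (preg_match("/\.(txt|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/", $_SERVER['QUERY_STRING'])) {
  // Send the 404 header and HTML, then exit(), as in the original snippet.
}

// Option 2: keep the image extensions, but allow URLs containing
// 'imagecache' to run. strpos() is compared against FALSE explicitly,
// because it returns 0 (not FALSE) for a match at the start of the string.
if (preg_match("/\.(txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/", $_SERVER['QUERY_STRING']) && strpos($_SERVER['QUERY_STRING'], 'imagecache') === FALSE) {
  // Send the 404 header and HTML, then exit(), as in the original snippet.
}
?>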

oscatholic's picture

Perfect! Didn't even think about adding the imagecache check for that; however, this could still cause problems if the URLs for your imagecache'd files don't always have 'imagecache' in them...


KevinHerrington's picture

True, but imagecached files will have 'imagecache' in the URL out of the box. They all go to /imagecache//. You'd have to go to a lot of work to keep imagecache out of the URL.
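For sites where some generated files live under other paths, the same idea can be sketched as a small helper that takes a list of path fragments to let through to Drupal. This is only an illustration: the function name, the 'thumb' preset, and the 'gallery' fragment are hypothetical, and the helper is not part of Drupal or ImageCache.

```php
<?php
/**
 * Decides whether a request should get the fast 404 response.
 * Hypothetical helper for illustration; not part of Drupal or ImageCache.
 */
function example_should_fast_404($query_string, $bypass_fragments = array('imagecache')) {
  $pattern = "/\.(txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/";
  if (!preg_match($pattern, $query_string)) {
    // Not a static-file extension: let Drupal handle the request.
    return FALSE;
  }
  foreach ($bypass_fragments as $fragment) {
    // strpos() returns 0 for a match at the start of the string, so compare
    // against FALSE explicitly instead of negating the result.
    if (strpos($query_string, $fragment) !== FALSE) {
      return FALSE;
    }
  }
  return TRUE;
}

// Sample query strings (hypothetical paths).
var_dump(example_should_fast_404('q=sites/default/files/imagecache/thumb/photo.jpg'));    // bool(false)
var_dump(example_should_fast_404('q=sites/default/files/missing.jpg'));                   // bool(true)
var_dump(example_should_fast_404('q=gallery/photo.jpg', array('imagecache', 'gallery'))); // bool(false)
```

In settings.php, the condition would then read if (example_should_fast_404($_SERVER['QUERY_STRING'])), followed by the same header(), print, and exit() calls as in the original snippet.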

oscatholic's picture

In my use case, I have a view that displays an imagecache preset, without 'imagecache' in the path, so it's not quite working like it should :(

In most cases, though, this would be great!


ajayg's picture

Does this work with imagecache? It is supposed to create a cached image if one is not found, and I am wondering whether this setting conflicts with how imagecache works.

oscatholic's picture

See the above comment - it can work with imagecache using the code above.


Khalid's picture
oscatholic's picture

Nice addition to the code; I hope it gets in for Drupal 7.x!

