optimization

Speeding up a Site: Quicker 404 Errors for files in Drupal

On the Archdiocese of Saint Louis website, we moved thousands of files around as part of our site migration from 49 separate Joomla sites to Drupal. Internally, all our file links were updated. However, there are thousands of hotlinks from different websites to the Archdiocesan website (for instance, the blog American Papist hits a missing file of a Church interior about 80 times a day).

This was creating a lot of overhead for the server, as Drupal would do a full bootstrap, sending out a fully-rendered 404 page on each missing file request.

Looking to the Drupal forums for help, I found advice from kbahey, founder/owner of 2bits, a Drupal shop that specializes in speeding up large Drupal sites. The issue linked above has the proper code for speeding up these requests on a Drupal 7 site, but the code is slightly different for Drupal 6.x. Here's the code:

<?php
/**
 * 404 handling, to conserve server resources when a missing image, text, or
 * other non-HTML file is requested.
 */
if (preg_match('/\.(txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/', $_SERVER['QUERY_STRING'])) {
  // Send a minimal 404 response and stop before Drupal bootstraps any further.
  header('HTTP/1.0 404 Not Found');
  print '<html><head><title>404 Not Found</title></head><body><h1>404 Not Found</h1><p>The requested URL was not found on this server. If you think you reached this page in error, please visit the <a href="http://archstl.org/">Archdiocese of Saint Louis home page</a> and search for the page or file for which you are looking.</p></body></html>';
  exit();
}
?>

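You can simply paste this at the bottom of the settings.php file. Leave out the opening <?php and closing ?> tags when you do, since settings.php is already PHP code.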

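In a nutshell, it does the following: if the request is for a file whose extension is in its list, and the file is not found, it outputs a 404 header along with a simple HTML string. The code doesn't check for the file itself; with Drupal's standard rewrite rules, the web server only passes a request to Drupal when no matching file exists on disk.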
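To see which requests the pattern would short-circuit, the same check can be pulled into a small function and tried against sample values. This is only a sketch: the function name is_missing_static_file_request() and the sample paths are made up for illustration, and it assumes Drupal's default clean-URL rewrite, which passes the requested path in the query string as q=path.

```php
<?php
/**
 * Returns TRUE when the query string ends in one of the static-file
 * extensions from the settings.php snippet.
 *
 * Hypothetical helper for illustration; not part of Drupal.
 */
function is_missing_static_file_request($query_string) {
  $pattern = '/\.(txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/';
  // preg_match() returns 1 on a match, 0 on no match.
  return preg_match($pattern, $query_string) === 1;
}

// Sample query strings (invented for this example).
var_dump(is_missing_static_file_request('q=files/church-interior.jpg'));  // bool(true)
var_dump(is_missing_static_file_request('q=node/123'));                   // bool(false)
var_dump(is_missing_static_file_request('q=files/photo.jpg&size=large')); // bool(false)
```

The last case is worth noticing: the pattern is anchored to the end of the query string, so a hotlink with extra parameters after the file name would still fall through to a full Drupal 404. The match is also case-sensitive, so an uppercase extension such as .JPG is not caught.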

This is a heck of a lot faster than allowing a full Drupal bootstrap on each missing file request, an operation which, on our server, takes up a nice 30 MB of RAM, plus a lot of extra CPU usage per request, when compared with using the code above.
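One rough way to check numbers like these on your own server is to log peak memory and elapsed time at the end of each request. The sketch below uses only built-in PHP functions (it needs PHP 5.3 or later for the closure); the helper name format_request_cost() and the log format are assumptions for illustration, not something from our actual setup. To cover both cases it would have to run before the 404 check.

```php
<?php
/**
 * Formats peak memory (in bytes) and elapsed time (in seconds) as one log line.
 *
 * Hypothetical helper for illustration.
 */
function format_request_cost($peak_bytes, $seconds) {
  return sprintf('peak memory: %.1f MB, time: %.0f ms', $peak_bytes / 1048576, $seconds * 1000);
}

$request_start = microtime(TRUE);

// Shutdown functions also run after exit(), so this logs the cost of a
// request whether it ended in the quick 404 or in a full Drupal page build.
register_shutdown_function(function () use ($request_start) {
  error_log(format_request_cost(memory_get_peak_usage(TRUE), microtime(TRUE) - $request_start));
});
```

Comparing the logged lines for a missing-file request with and without the 404 snippet in place should show roughly how much each full bootstrap costs on a given server.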

Caching a Page; Saving a Server

A couple months ago, the Archdiocese of Saint Louis announced that a new Archbishop had been chosen (then-Archbishop-elect Robert J. Carlson). For the announcement, the Archdiocese streamed the press conference online, then posted pictures on the St. Louis Review website of the day's events (updated every hour or two).

Pageviews for April 21, 2009 on archstl.org – note that from 8-10 a.m., the server was practically down from the thousands of hits/requests it was getting. Just before 10 a.m., I enabled the caching described below. We announced everything via Twitter, SMS, Press Releases, and the web, just after 5 a.m.

During this period of time, the Archdiocesan website had over 2,000 visitors per hour, and almost all the visitors were hitting the home page. The website (run on Joomla 1.0.x) didn't have many caching mechanisms in place, and for almost a complete hour, the website was returning server errors as the processor was pegged at 100% utilization. Something had to be done!
