A couple months ago, the Archdiocese of Saint Louis announced that a new Archbishop had been chosen (then-Archbishop-elect Robert J. Carlson). For the announcement, the Archdiocese streamed the press conference online, then posted pictures on the St. Louis Review website of the day's events (updated every hour or two).

Pageviews on April 21, 2009 - Archstl.org
Pageviews for April 21, 2009 on archstl.org. Note that from 8 to 10 a.m., the server was practically down under the thousands of requests it was getting. Just before 10 a.m., I enabled the caching described below. We had announced everything via Twitter, SMS, press releases, and the web just after 5 a.m.

During this period of time, the Archdiocesan website had over 2,000 visitors per hour, and almost all the visitors were hitting the home page. The website (run on Joomla 1.0.x) didn't have many caching mechanisms in place, and for almost a complete hour, the website was returning server errors as the processor was pegged at 100% utilization. Something had to be done!

Quick-Fix Mode

Quickly, I opened up the .htaccess file and denied all traffic except from the Archdiocesan offices (so I could access the site to make a fix!). Then I saved the home page as an .html file using Firefox (just the source) and uploaded it to the server. Finally, I put the rule DirectoryIndex index.html index.php into the .htaccess file so visitors going to www.archstl.org would be served the .html file (which the server can send out much more quickly than any PHP page, which takes time and memory for the server to build).
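
As a rough sketch, those .htaccess changes might look something like the following. It assumes Apache 2.2-style access directives, and 203.0.113.0/24 is only a placeholder range, not the actual address of the Archdiocesan offices:

```apache
# Temporary lockdown: refuse every request except those from the office network.
# 203.0.113.0/24 is a placeholder (a documentation-only range); substitute your own.
Order deny,allow
Deny from all
Allow from 203.0.113.0/24

# Look for the static snapshot first; fall back to the PHP page if it is missing.
DirectoryIndex index.html index.php
```

On Apache 2.4, the access part would normally be written as Require ip 203.0.113.0/24 instead; the Order/Deny/Allow directives only work there through mod_access_compat.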

I then took out the deny rule, and the server was pretty happy for the rest of the day (although still hovering around 50% processor usage, due to the number of page requests). The St. Louis Review site, running on its own dedicated server through SoftLayer, and running with abundant page caching and file aggregation (through Drupal's built-in means), received about half as much traffic but ran a lot faster due to (a) a brand-new, very efficient server, and (b) lots of caching.

A Better Fix (for Now) - Home Page .html Caching

With the pressure of the Archbishop announcement past, I knew there would be another day where traffic would reach some pretty lofty heights: the day of the Installation Mass. We heavily promoted the Mass through all means possible, and advertised that it would be streamed live on our site (with up-to-the-minute pictures on the St. Louis Review site).

Since I didn't have time to fix all the bugs keeping page caching from working efficiently on the Joomla site (www.archstl.org), I instead went the route of having a PHP script automatically save the site's home page as an .html file every five minutes. The home page gets somewhere around 5,000-10,000 views every day, so this fix would help in the long run as well.

I found a great script online (can't find the link anymore) which I adapted (with the help of a few Twitter friends) to our site's peculiarities, and then set up a cron job on the server to run the script every five minutes. The script basically grabs the front page using PHP's file_get_contents() function, then saves it to the location you specify.

In addition to this, I made sure to redirect visitors to the cached .html file. For the home page, I just set the DirectoryIndex directive properly; for other pages (or for a script that will cache any page, anywhere), you'll need to write some fancy redirect rules for Apache to handle.
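
For pages other than the home page, the redirect rules could look roughly like this mod_rewrite sketch. The /cache/ directory and its layout (for example, /about/staff stored as /cache/about/staff.html) are hypothetical, and this is not what is running on archstl.org:

```apache
RewriteEngine On

# Only plain GET requests without a query string are candidates for a static copy.
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{QUERY_STRING} ^$
# Serve the snapshot only if it actually exists on disk (hypothetical /cache/ layout).
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html -f
RewriteRule ^(.+?)/?$ /cache/$1.html [L]
```

Requests without a matching snapshot fall through to PHP as usual. A real setup would also want to skip logged-in users, which this sketch does not attempt.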

Here's the script:

<?php

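// $string holds the HTML of the page you want to cache, fetched from its URL.

$string = file_get_contents("http://www.archstl.org/");

// Stop if the fetch failed, so a bad response never overwrites the cached file

if ($string === FALSE) {
       echo "Cannot fetch the page.";
       exit;
   }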

// Add a line showing when the page was last cached

$string .= "\r\n<!-- Cached on " . date("Y-m-d H:i:s") . " -->";

// Write the file to the cached .html file

$handle = fopen("/home/archstl/public_html/index.html", "w");

if ($handle === FALSE) {
       echo "Cannot open file for writing.";
       exit;
   }

if (fwrite($handle, $string) === FALSE) {
       echo "Cannot write to file.";
       exit;
   }

fclose($handle);

echo "Success, wrote content to file!";

// Allow anyone (web visitors) to read the file

chmod("/home/archstl/public_html/index.html", 0644);

?>

The site handled the Installation Mass day pretty well, never peaking above 50% CPU, even with archstl.org and stlouisreview.com both hosted on the same server (unlike on the day of the announcement):

Pageviews Graph for June 10 - Day of the Archbishop's Installation Mass

Long-Term Solution

The above script is helping the site run pretty efficiently right now (average CPU utilization for archstl.org went down from 14-15% a day to 7-9% a day), but it would take a lot of work to get it integrated into our current Joomla system (especially since every section of the website is its own Joomla install, and would require some unique additions).

I'm a huge fan of the Drupal Boost module, as it has performed flawlessly on the Drupal-based St. Louis Review site, serving up thousands of pages a day and keeping the processor utilization and memory footprint quite tidy (case study here). I know there's a similar solution for WordPress (WP Super Cache), which allows html caching of everything for anonymous users. Plus, Drupal's built-in caching engines and CSS/JS aggregation options are quite amazing. The question in my mind is, where is this secret sauce for Joomla? I haven't found much in the way of high-performance options for Joomla sites, but I'm keeping my eyes open.