apache

Setting up an Apache Solr Search Server (for many sites/hosts)

In the Archdiocese of St. Louis, I manage more than 15 separate Drupal websites (plus a few others), and I have often wanted to use Apache Solr for search across all of them. I finally had some time to tackle this, and I now have a pretty good (and very fast) Solr server set up. It is shared across all these sites, which live on two (so far) different web servers at two different hosting companies.

The main Archdiocesan sites (archstl.org, archstldev.com, and stlouisreview.com) are all hosted via SoftLayer in Dallas, while Catholic Youth Apostolate sites (like stlyouth.org and cycstl.net) are hosted via Hot Drupal in North Carolina.

I was able to set up a Linode (linode.com) for less than $20 to run Apache Solr via Jetty, and that server is accessible to all our other servers, which send it index data and receive search results from it. This keeps our main web servers free from expensive MySQL search queries and from the large databases that result from storing 20k+ nodes' search data in the main site DB.
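To make the "send and receive" part concrete, here is a minimal sketch of how any of the web servers could ask a shared Solr instance for results over HTTP. It is only an illustration, not the Drupal module's code: the host name, the `site_key` field used to separate sites in one index, and the function names are all assumptions. The `/select` handler and the `q`, `fq`, `rows`, and `wt` parameters are standard Solr.

```python
import json
from urllib.parse import urlencode

# Hypothetical address of the shared search server (8983 is Jetty's usual Solr port).
SOLR_BASE = "http://solr.example.org:8983/solr"


def build_select_url(keywords, site_key=None, rows=10, base=SOLR_BASE):
    """Build a URL for Solr's standard /select search handler."""
    params = [("q", keywords), ("rows", rows), ("wt", "json")]
    if site_key:
        # Filter query limiting results to one site in a shared index.
        # The field name "site_key" is an assumption for illustration.
        params.append(("fq", f"site_key:{site_key}"))
    return f"{base}/select?{urlencode(params)}"


def extract_titles(response_text):
    """Pull document titles out of a Solr JSON response body."""
    body = json.loads(response_text)
    return [doc.get("title", "") for doc in body["response"]["docs"]]
```

A site would fetch the URL (for example with `urllib.request.urlopen`) and pass the response text to `extract_titles`. The point is that the web server only makes one small HTTP request; the heavy lifting happens on the search box.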

You can find the process by which I set up the search server in this issue on the Development website. The best thing about this system is that I can really make the search server fly; ping takes about 30-40ms between the search server and our other servers, and queries only take about 150-250ms to reach the websites.

Any large organization looking to vastly improve search performance and usability should look into setting up a dedicated search VPS or server, depending on search traffic. This is especially true on a Drupal site, where the Apache Solr Search Integration module makes it easy right out of the box.

Our Linode Solr server typically sits close to idle, even at peak hours (right now its load averages are 0.00, 0.00, 0.00), and I'll probably set it up to do some other tasks off-site as well, since it has spare CPU, memory and disk space available (and a really fat pipe to the Internet!).

Caching a Page; Saving a Server

A couple months ago, the Archdiocese of Saint Louis announced that a new Archbishop had been chosen (then-Archbishop-elect Robert J. Carlson). For the announcement, the Archdiocese streamed the press conference online, then posted pictures on the St. Louis Review website of the day's events (updated every hour or two).

Pageviews for April 21, 2009 on archstl.org. We announced everything via Twitter, SMS, press releases, and the web just after 5 a.m. Note that from 8 to 10 a.m., the server was practically down from the thousands of requests it was getting. Just before 10 a.m., I enabled the caching described below.

During this period, the Archdiocesan website had over 2,000 visitors per hour, and almost all of them were hitting the home page. The website (run on Joomla 1.0.x) didn't have many caching mechanisms in place, and for almost a full hour it returned server errors while the processor was pegged at 100% utilization. Something had to be done!
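The specific caching I enabled isn't reproduced in this excerpt, so the following is only a generic sketch of why caching one hot page helps, not the Joomla mechanism itself. It assumes a simple time-based cache; the `PageCache` class, its five-minute default, and the `render` callback are hypothetical names chosen for illustration.

```python
import time


class PageCache:
    """Keep rendered pages in memory for a fixed number of seconds."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the cache can be tested without waiting
        self._store = {}        # path -> (expires_at, html)

    def get_or_render(self, path, render):
        """Return the cached page if still fresh; otherwise render and store it."""
        now = self.clock()
        entry = self._store.get(path)
        if entry and entry[0] > now:
            return entry[1]
        html = render(path)     # the expensive step: database queries, templates
        self._store[path] = (now + self.ttl, html)
        return html
```

With a five-minute lifetime, a home page taking thousands of requests an hour is rebuilt at most 12 times in that hour; every other request is served from memory.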
