search

Setting up an Apache Solr Search Server (for many sites/hosts)

In the Archdiocese of St. Louis, I manage more than 15 separate Drupal websites (plus a few others), and I have often wanted to use Apache Solr for search across all of them. I finally had some time to tackle this, and I now have a pretty good (and very fast) Solr server set up. It is shared across all these sites, which live on two (so far) different web servers at two different hosting companies.

The main Archdiocesan sites (archstl.org, archstldev.com, and stlouisreview.com) are all hosted via SoftLayer in Dallas, while Catholic Youth Apostolate sites (like stlyouth.org and cycstl.net) are hosted via Hot Drupal in North Carolina.

I was able to set up a Linode (linode.com) for less than $20 to run Apache Solr via Jetty, and that server is accessible to all our other servers, which send search index data to it and receive results from it. This frees our main web servers from expensive MySQL search queries and from the large databases that result from storing search data for 20k+ nodes in the main site DB.

You can find the process by which I set up the search server in this issue on the Development website. The best thing about this system is that I can really make the search server fly; ping takes about 30-40ms between the search server and our other servers, and queries only take about 150-250ms to reach the websites.
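To check figures like these from one of the web servers, a rough Python sketch along the following lines could time a single round trip to the search server. The host name `search.example.com` is a placeholder, and the port and `/solr/select` path assume a classic single-core Solr running under Jetty; none of these details come from the setup notes.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a Solr select URL that asks for a JSON response."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url.rstrip('/')}/select?{params}"


def timed_query(base_url: str, query: str, timeout: float = 5.0):
    """Run one query and return (parsed response, elapsed milliseconds)."""
    url = build_select_url(base_url, query)
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return payload, elapsed_ms


# Hypothetical base URL; replace with the real search server address.
SOLR_BASE = "http://search.example.com:8983/solr"
```

Calling `timed_query(SOLR_BASE, "parish")` from a web server would then report how long one search takes end to end, network latency included.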

Any large organization looking to vastly improve search performance (and usability) should look into setting up a dedicated search VPS or server, depending on its search traffic. This is especially easy on a Drupal site, where the Apache Solr Search Integration module plugs in right out of the box.

Our Linode Solr server typically sits close to idle, even at peak hours (right now its load averages show 0.00, 0.00, 0.00), and I'll probably set it up to do some other tasks off-site as well, since it has spare CPU, memory and disk space available (and a really fat pipe to the Internet!).

One-Page Quick SEO Optimization

Today I had to make some updates to the Archdiocese of Saint Louis' Leadership page. While I was making the updates, I noticed a pattern on the page that did a poor job of giving Google meaningful keywords for the page's links.

For each leader in the Archdiocese, there was a link to "Read more..." at the end of the leader's description. Google and other spiders take that 'Read more' text and expect it to mean something, so they give a little weight (but not much) to the words 'read' and 'more' when searched in tandem with content on the page the words link to.

However, to give Google more context, and to let our pages get a tiny bit of extra link juice, I linked the names of the leaders directly to their pages. Instead of 'Read more' referring to Archbishop Robert J. Carlson, the words 'Archbishop', 'Robert', 'J' and 'Carlson' now refer to him:

<a href="/archstl/page/archbishop-robert-j-carlson">Most Reverend Robert J. Carlson</a>

Then I set all the 'Read more...' links to rel="nofollow":

<a href="/archstl/page/archbishop-robert-j-carlson" class="readon" rel="nofollow">Read more...</a>

This tells Google that it can disregard the 'Read more...' link, and lets Google instead use the more contextually-sound link (with SEO terms built in).
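As a rough sketch of how the same change could be applied to many pages at once, the Python below adds rel="nofollow" to every anchor whose text is exactly 'Read more...'. The function name is hypothetical, the edit described above was made by hand, and a regular expression like this is only adequate for simple, well-formed markup such as the links shown here.

```python
import re

# Matches <a ...>Read more...</a>, capturing the attributes and the link text.
READ_MORE = re.compile(r"<a\b([^>]*)>(\s*Read more\.\.\.\s*)</a>", re.IGNORECASE)


def nofollow_read_more(html: str) -> str:
    """Add rel="nofollow" to 'Read more...' links that have no rel attribute."""

    def fix(match: re.Match) -> str:
        attrs, text = match.group(1), match.group(2)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # leave links with an existing rel untouched
        return f'<a{attrs} rel="nofollow">{text}</a>'

    return READ_MORE.sub(fix, html)
```

Links with any other text, such as the leader's name, pass through unchanged, so they keep passing their keywords along.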

Drupal Views Filters: Making Exposed Searches User-Friendly

One of the main new features of the Archdiocese of Saint Louis' website (to launch on February 22!) is its much-improved parish and school search. There are many facets to these sections of the site; everything is built from a combination of CCK-built nodes, Views, and Mapstraction (for Google Map interfaces).

Parish Search by Name

One of the main annoyances with most implementations of parish and school searching that I've found (and I've tested almost every U.S. Archdiocese's website for this functionality) is the fact that searches are extremely rigid - if you don't type in the exact terms for the title of the parish in the parish database, you won't get any results.

For instance, type in "St. Luke," and you might get a result for St. Luke parish. However, type in "Saint Luke," and you get nothing. Or, what if you type in "Saints Joachim and Anne," but the parish is in the database as "Sts. Joachim & Anne"?
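One general way around this rigidity is to normalize both the search text and the stored title before comparing them. The Python below is only an illustration of that idea, not the Views-based setup used on the site; the abbreviation table and function names are assumptions, and a real list would need more entries.

```python
import re

# Hypothetical abbreviation table covering the examples above.
ABBREVIATIONS = {
    "st": "saint",
    "sts": "saints",
    "&": "and",
}


def normalize_name(name: str) -> str:
    """Lowercase a name, drop periods and commas, and expand abbreviations."""
    cleaned = name.lower().replace("&", " & ")  # make '&' its own token
    cleaned = re.sub(r"[.,]", " ", cleaned)
    tokens = cleaned.split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def matches(query: str, parish_title: str) -> bool:
    """True when the normalized query appears inside the normalized title."""
    return normalize_name(query) in normalize_name(parish_title)
```

With this in place, "St. Luke" and "Saint Luke" normalize to the same string, and "Saints Joachim and Anne" matches a title stored as "Sts. Joachim & Anne".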

Mining the Catechism with Perl

There are a few copies of the Catechism of the Catholic Church online, and they all have a very simple search interface.  While this might be helpful when looking up words like "Incarnation" or "Purgatory", these search interfaces are not very robust.  What's more, they don't enable readers to identify paragraphs of the Catechism which make reference to a particular passage of Scripture.

Enter the Catechism Search Tool made available at The Cross Reference.  This utility, approved for use by the USCCB Subcommittee on the Catechism, enables Catholics (and others) to search and view the entire text of the Catechism -- paragraphs and footnotes -- in a variety of ways:

  • By reference to Scripture verses
  • By text (exact phrase, all words, any words, partial words, even support for regular expressions)
  • By paragraph number
  • By Catechism section

The full content of the Catechism is made available, including the cross-references between paragraphs.
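To make the text search modes listed above concrete, here is a small Python sketch of how they differ. It is not the tool's own Perl code; the function name, mode names, and the dictionary of paragraph numbers to text are all assumptions made for illustration.

```python
import re

MODES = {"phrase", "all", "any", "partial", "regex"}


def search(paragraphs: dict, query: str, mode: str = "phrase") -> list:
    """Return the sorted paragraph numbers whose text matches the query."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    words = query.lower().split()
    pattern = re.compile(query, re.IGNORECASE) if mode == "regex" else None
    hits = []
    for number in sorted(paragraphs):
        text = paragraphs[number]
        lowered = text.lower()
        tokens = set(re.findall(r"[a-z']+", lowered))  # whole words only
        if mode == "phrase":
            found = query.lower() in lowered
        elif mode == "all":
            found = all(word in tokens for word in words)
        elif mode == "any":
            found = any(word in tokens for word in words)
        elif mode == "partial":
            found = any(word in lowered for word in words)  # substring match
        else:  # regex
            found = pattern.search(text) is not None
        if found:
            hits.append(number)
    return hits
```

Given a dictionary such as `{1: "...", 2: "..."}` keyed by paragraph number, `search(paragraphs, "hope faith", "all")` returns only the paragraphs containing both words, while the "any" mode returns those containing either.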
