A search service consists of an index database, a search daemon, and a front end. First, we must choose a name for the index, for example mysearch.
Note: the locust search daemon cannot be run under a privileged (root) account for security reasons. To avoid problems with file access permissions, we advise running all locust executables under a single account.
Before we start with the configuration, we must create a MySQL user account for locust. This account should not be confused with the UNIX user account under which we run locust. Several index databases can be created under one MySQL user account, and you may choose to use more than one MySQL user account. To create an account for the MySQL user locust with the password "asdf", log in to the MySQL interactive client mysql as root and execute the command:
grant usage on *.* to locust@localhost identified by 'asdf';
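To verify that the new account can connect, you can run a trivial query with the standard mysql client (you will be prompted for the password chosen above):

mysql -u locust -p -h localhost -e 'select 1;'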
The first step is to create an index configuration subdirectory, named after the index, in the locust configuration directory /etc/locust. To do this, go to the /etc/locust directory and copy the sample configuration subdirectory sample-config, with all the default configuration files in it, into the subdirectory mysearch using the following command:
cp -r sample-config mysearch
Then, go into the subdirectory mysearch and edit the storage configuration file storage.cnf. Specify the MySQL user account that was created in the previous step and its password as values for the options "User" and "Passwd" respectively.
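With the MySQL account created above, the relevant settings would be along the following lines (a sketch only; follow the exact option syntax shown in the sample storage.cnf):

User locust
Passwd asdf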
Next, specify the site(s) to spider in the file sets.cnf by modifying the sample "Server" statement provided in this file.
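For a single site, the statement might look roughly like the following (the URL is a placeholder and the exact "Server" syntax is described in the sample sets.cnf):

Server http://www.example.com/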
Now, select a port number to be used for communication between the search daemon and the frontends connecting to it. Every instance of the search daemon must use a distinct port. Select a free port in a high range, for example starting from 12332 and up, and record it in the file port_assignments in the /etc/locust/frontend directory (the file port_assignments is for human reference only).
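Since port_assignments is kept purely as a human-readable record, any convenient format will do, for example:

12332    mysearch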
Edit the search daemon configuration file searchd.cnf. Specify the same MySQL user account used in storage.cnf and its password as the values for the options "DBUser" and "DBPass" respectively. Use the port number selected for the search daemon as the value for the option "Port".
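With the account and port from our example, the relevant settings would look roughly as follows (again only a sketch; defer to the sample searchd.cnf for the exact option syntax):

DBUser locust
DBPass asdf
Port 12332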
(Once your basic search is running, you can study the User Guide and edit the configuration files to specify the desired spidering, storage, search, and result presentation parameters.)
Next, create an empty index database with the following command:
lcreatedb mysearch
The command will ask you for the MySQL root password.
If a template-based HTML frontend is desired, create a hard link named, for example, mysearch.cgi to the file shtml.cgi in the directory /var/www/cgi-bin. For an XML frontend, create a hard link named, for example, xmysearch.cgi to sxml.cgi in the same directory.
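For example, using the standard ln command:

cd /var/www/cgi-bin
ln shtml.cgi mysearch.cgi
ln sxml.cgi xmysearch.cgi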
For an HTML frontend CGI, create the correspondingly named configuration file mysearch.cnf by making a copy of the file hsample.cnf in the /etc/locust/frontend directory. Similarly, for an XML frontend, copy the file xsample.cnf to xmysearch.cnf. In the HTML frontend configuration file, edit the value of the parameter templateFile to point to a template file, normally located in the index configuration directory. The path can be specified relative to the /etc/locust directory or given as an absolute path. In our example, set the value to mysearch/fe.tmpl.
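For example:

cd /etc/locust/frontend
cp hsample.cnf mysearch.cnf
cp xsample.cnf xmysearch.cnf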
Then, for both the HTML and XML frontends, set the value of the option "port" in the frontend configuration file to the same port number used in the search daemon configuration file.
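Put together, the frontend configuration for our example would contain values along these lines (a sketch only; the exact option syntax follows the sample files):

templateFile mysearch/fe.tmpl
port 12332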
Finally, for the convenience of having all the configuration files for a particular index accessible from a single directory, we recommend making a soft link from the index configuration directory to the frontend configuration file located in the /etc/locust/frontend directory.
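For example, assuming we give the link the same name as the frontend configuration file:

ln -s /etc/locust/frontend/mysearch.cnf /etc/locust/mysearch/mysearch.cnf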
To spider your site(s), first clean the existing index database using the command
stortool -C mysearch
Then start spidering by
spider -N 32 mysearch
where the number following -N is the number of worker threads; choose it based on the number of sites to spider and the performance of your server.
After spidering is finished, check the index database statistics by
stortool -S mysearch
Now, convert the spider journal into a searchable reverse index by
deltamerge mysearch
And, finally, start the search daemon by
/usr/local/sbin/asearchd -RD mysearch
To open the search form, type the search frontend CGI URL, something like
myserver.mydomain/cgi-bin/mysearch.cgi
in your browser.
To ensure that search services are automatically restored upon rebooting, add an entry to the initialization script /etc/init.d/locust for each instance of the search daemon. To do this, just copy the sample entry, uncomment it, and replace the index name with the one you desire.
If you wish your index to be periodically respidered, create an entry in the crontab table associated with the account used to run locust. To edit the table, use the command crontab -e. Inside the crontab editor, copy the sample entry, uncomment it, and replace the index name with the one you desire. Set the desired respidering times using the crontab syntax.
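As an illustration only, a weekly respidering of mysearch at 03:00 every Sunday could look roughly like this (the actual command to run comes from the sample entry in your crontab, not from this sketch):

0 3 * * 0 stortool -C mysearch && spider -N 32 mysearch && deltamerge mysearch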