BioMart 0.7 Quick Install Guide

1. Installing BioMart

The BioMart components are distributed in two separate packages.

biomart-perl contains the Perl API and all the BioMart applications that depend on it, such as MartView, MartService, MartURLAccess and DAS Annotation Server (if configured).

martj contains the Java API and all the BioMart applications that are written in Java, such as MartEditor, MartShell, MartExplorer and MartBuilder.

1.1 Downloading martj

1.1.1 Binary distribution

The best way to obtain martj is to download one of the pre-compiled binary distributions from http://www.biomart.org/.

The martj-bin.zip distribution is for Windows users, and contains the Java API and all the Java-based BioMart applications. The bin folder contains a number of .bat scripts for launching the applications. You can unpack it using WinZip or a similar application.

The martj-bin.tgz distribution is identical to the martj-bin.zip distribution, but is intended for Unix and Linux users. Instead of .bat scripts, it contains a number of .sh scripts which perform equivalent tasks. You can unpack it using:

tar -zxvf martj-bin.tgz

The martj-bin.dmg distribution is for Mac OS X users, and contains bundles in the bin folder for each of the various Java-based BioMart applications. The bin folder also contains the MartShell application .command script, and the lib and data folders contain the files that MartShell depends on. The .dmg image file will automatically unpack itself when double-clicked.

1.1.2 Source distribution

The source code for martj is available for download via CVS, but you will need the ant tool installed if you subsequently wish to compile it. ant is available from http://ant.apache.org/. Download and install it as per the instructions on that website.

To check martj out from CVS, you need to type the following commands on the command line prompt. The password you need to enter when prompted is CVSUSER.

cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login

cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart \
co -r release-0_7 martj



To compile martj using ant, change into the martj directory created by the commands above and type:

ant jar

This will create the martj.jar file inside the build folder. All other JAR files which the Java-based BioMart applications depend on can be found in the lib folder.
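
Before attempting the checkout and build, it can save time to confirm that the tools mentioned above are on your PATH. The following is a minimal sketch only; the function name missing_tools is hypothetical and not part of martj.

```shell
# Print each named tool that cannot be found on the PATH, one per line.
# Prints nothing if every tool is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used by the source checkout and build steps in this section.
missing=$(missing_tools cvs ant java)
if [ -n "$missing" ]; then
    echo "Install these before building martj:"
    echo "$missing"
fi
```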

1.2 Downloading biomart-perl

There is no binary distribution for biomart-perl. It is only available as a source distribution from CVS.

To check biomart-perl out from CVS, you need to type the following commands on the command line prompt. The password you need to enter when asked is CVSUSER.

cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login

cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart \
co -r release-0_7 biomart-perl

No compilation is required, but it will need to be configured before you can use it. For details on how to configure it, please refer to the section on installing biomart-perl.

1.3 Installing martj

You need to have Java 1.3 or later installed. You can get Java from http://java.sun.com/.

martj has been tested with Java 1.3, 1.4 and 1.5.

If you attempt to run any martj component on Java 1.6 or higher, and experience problems, please try the same series of actions with Java 1.5 before reporting a bug.
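
To see which Java release you are running, you can inspect the output of java -version. The sketch below assumes the first line of that output has the form java version "1.5.0_22", which is how Sun JVMs of this era report themselves; other vendors may differ. The function names are hypothetical.

```shell
# Extract "major.minor" from the first line of `java -version` output,
# e.g. 'java version "1.5.0_22"' gives 1.5.
java_major_minor() {
    printf '%s\n' "$1" |
        sed -n '1s/.*version "\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only for the releases this guide says martj was tested with.
is_tested_java() {
    case "$1" in
        1.3|1.4|1.5) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to standard error, hence 2>&1.
    ver=$(java_major_minor "$(java -version 2>&1)")
    if is_tested_java "$ver"; then
        echo "Java $ver is a tested release"
    else
        echo "Java ${ver:-unknown} is untested; retry on 1.5 before reporting a bug"
    fi
fi
```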

martj requires no further installation, except if you wish to use marts other than the ones defined by default. To do so requires you to modify the Registry file, which is discussed elsewhere in this document.

It is important that you do not modify the directory structure inside the martj folder, or move or copy any of the scripts from the bin folders to other locations. Running these scripts from other locations will not work.

1.4 Installing biomart-perl

1.4.1 Prerequisites

You need to have Perl version 5.6.0 or later installed first. You can get the latest version of Perl from http://www.perl.org/.

It is important that you do not modify the directory structure inside the biomart-perl folder, or move or copy any of the files within it to other locations.

biomart-perl depends on a number of Perl modules. The configuration script reports any that are missing, so the best plan is to run configure.pl straight away from your biomart-perl directory and then install whatever it lists:

perl bin/configure.pl -r conf/registryURLPointer.xml

The easiest way to install these missing modules is to use the CPAN shell. Unless you are knowledgeable about CPAN and know how to do otherwise, modules should be installed by the root user on Linux/Unix systems, or by the Administrator on Mac OS X/Windows systems.

For each module reported as missing by the configuration step, type:

cpan -i Module::Name

You should replace Module::Name with the name of the module you are attempting to install, ideally by copying and pasting it from the output of configure.pl. Read the questions CPAN asks during installation thoroughly, and answer yes when it asks you if you want to install missing dependencies. It is usually fine to accept the default responses for almost all questions it asks.
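
If you would like to check for particular modules yourself, independently of configure.pl, one approach is to ask Perl to load each one. This is a sketch only: the function name missing_perl_modules is hypothetical, and the module names in the commented example (XML::DOM, Exception::Class, Log::Log4perl, DBI) are an assumed reading of some of the distribution names used in this guide.

```shell
# Print each named Perl module that the current perl cannot load.
# Prints nothing if all of them load successfully.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || printf '%s\n' "$module"
    done
}

# Example (run as a user allowed to install modules):
# for m in $(missing_perl_modules XML::DOM Exception::Class Log::Log4perl DBI); do
#     cpan -i "$m"
# done
```

The output of configure.pl remains the authoritative list, since it knows exactly which modules your chosen configuration needs.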

For reference, the list of CPAN modules required at the time of writing is shown below. Version numbers are those that have been tested against, but more recent versions may also work.

Dependency  Module name                    Module version
----------  -----------------------------  --------------
API         XML-DOM                        1.44
API         OLE-Storage_Lite               0.14
API         Exception-Class                1.23
API         libwww-perl                    5.8
API         Log-Log4perl                   1.05
API         Test-Exception                 0.24
API         DBI and relevant DBD drivers   1.53
API         Digest::SHA                    5.44
Website     IO::Compress::Gzip             2
Website     Number-Format                  1.51
Website     Template-Toolkit               2.14
Website     Template-Plugin-Number-Format  1.01
Website     CGI-Session                    4.14
Website     Readonly                       1.13
Website     List-MoreUtils                 0.22
Website     SpreadSheet-WriteExcel         2.17
Website     IO-Compress-Zlib               2.003
Website     SOAP-Lite                      0.710.08

1.4.1.1 Apache installation for MartView

All references to MartView in this document that relate to configuration and maintenance equally apply to and affect MartService, as the two are part of the same application.

If you are going to be running a MartView/MartService/DAS server, you will also need to have an Apache web-server installed. This can be downloaded from http://httpd.apache.org/, and should be installed as per the guidelines on that website. You do not need to configure Apache to be used with BioMart, as the BioMart configuration scripts will handle that for you.

MartView works fine with all versions of Apache 1.3 or higher, including Apache 2.0 or higher.

MartView requires a few Apache extension modules to be installed. It does not matter if they are compiled into Apache or provided as dynamic modules. If you are missing any of them, the website where you can download them is listed beside each one.

Apache version  Module name                                  Module website
--------------  -------------------------------------------  -----------------------------------------
1.3, 1.4        mod_gzip (optional, improves performance)    http://sourceforge.net/projects/mod-gzip/
1.3, 1.4        mod_perl                                     http://perl.apache.org/
2.0 or higher   mod_deflate                                  Part of the Apache distribution.
2.0 or higher   mod_perl                                     http://perl.apache.org/

MartView has been designed to work best with Apache 2.0 or higher but Apache 2.0 is not a prerequisite.

It is highly recommended that you install the appropriate compression module for Apache before running MartView. If you do not, then MartView is likely to be very slow.

For Apache 2.0 or higher, use mod_deflate. For Apache versions before 2.0, use mod_gzip. Details are in the table above.

The Perl API and MartView configuration scripts check which Apache modules are available by using apxs (Apache 1.3/1.4) or apxs2 (Apache 2.0+). The scripts expect this tool to be in the same directory as the Apache binary (usually called apache, apache2, or httpd). If you have installed a binary distribution of Apache, you may also need to install the Apache development tools to make apxs/apxs2 available.

Other Apache modules are also used by MartView but these are all available with the default Apache installation and so you should not need to worry about having to install them.
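
If you want to check by hand which modules are compiled into your Apache binary, the -l option of the binary lists them. The sketch below filters that listing for a given module. It is not how the BioMart configuration scripts perform their check, the function name listing_has_module is hypothetical, and modules loaded dynamically will not appear in this listing.

```shell
# Succeed if the module listing on standard input contains a line
# such as "  mod_deflate.c" for the module named in $1.
listing_has_module() {
    grep -q "^[[:space:]]*$1\.c[[:space:]]*$"
}

# Example, substituting the path to your own Apache binary:
# /home/biomart/apache/bin/httpd -l | listing_has_module mod_deflate \
#     && echo "mod_deflate is compiled in"
```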

1.4.1.2 Apache and ModPerl quick setup

If you do not already have Apache and ModPerl installed on your system then you can follow these steps to set them up.

These steps assume a Unix/Linux-based system. For other operating systems, please refer to the Apache and ModPerl websites for instructions.

For the purposes of these instructions it is assumed that you will be installing the latest versions of Apache and ModPerl that were available at the time of writing (Apache 2.2.14 and ModPerl 2.0.4).

BioMart software does not depend on specific versions of Apache or ModPerl. It can use other versions if required but these instructions are only valid for the versions specified.

First you will need to create a directory where you can work. In our example we will install Apache in /home/biomart/apache.

mkdir /home/biomart/apache

You will need to substitute your directory for this example location in all the commands and explanations in this section.

Next you need to download, unpack, and build Apache inside this directory.

cd /home/biomart/apache
mkdir source
cd source
wget http://www.apache.org/dist/httpd/httpd-2.2.14.tar.gz
tar zxvf httpd-2.2.14.tar.gz
cd httpd-2.2.14
./configure \
--enable-deflate \
--prefix=/home/biomart/apache
make install

Apache has now been built and configured. Your Apache installation path is /home/biomart/apache/bin. You will need this when configuring biomart-perl to use this copy of Apache.

The last step is to download and install ModPerl.

Note that for ModPerl you may need to upgrade your Perl CGI module to the latest version (the module name is CGI). You can use the same technique to upgrade this as you used to install the other Perl module dependencies for biomart-perl.

This only applies when using ModPerl 2.0 or higher, as per this example. You will know if you need to upgrade if errors show in the Apache error log that refer to Apache/Response.

cd /home/biomart/apache/source
wget http://perl.apache.org/dist/mod_perl-2.0-current.tar.gz
tar zxvf mod_perl-2.0-current.tar.gz
cd mod_perl-2.0.4
perl Makefile.PL \
PREFIX=/home/biomart/apache \
MP_APXS=/home/biomart/apache/bin/apxs
make install

Now ModPerl has been installed, the setup of Apache and ModPerl is complete.
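
Before moving on, you may want to confirm that the build produced the files you will need later. The following is a minimal sketch, not part of the BioMart distribution. The function name check_apache_install is hypothetical, and the modules/mod_perl.so location assumes the default Apache 2.2 directory layout under the prefix chosen above.

```shell
# Report any expected file that is absent under the given Apache prefix.
# Returns 0 if everything is present, 1 otherwise.
check_apache_install() {
    prefix=$1
    status=0
    for f in bin/httpd bin/apxs modules/mod_perl.so; do
        if [ ! -e "$prefix/$f" ]; then
            echo "missing: $prefix/$f"
            status=1
        fi
    done
    return $status
}

# Example, using the directory from this section:
# check_apache_install /home/biomart/apache && echo "Apache and ModPerl look complete"
```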

1.4.2 Setting the Registry

The BioMart Perl API and MartView allow users to make queries against a predetermined list of marts, defined in a Registry file. When using the Perl API, this registry file can be located anywhere the user requires.

However, when using MartView, the registry file to be used must be located in the conf folder of the biomart-perl installation.

Registry files are in XML format. You will find a number of example registry files already in the conf folder after you download and unpack biomart-perl. They can be extended to widen the selection of marts to include others available publicly, or they can be adapted to serve your own local marts.

The structure of the registry file is discussed elsewhere in this document.

1.4.3 Configuring

1.4.3.1 Configuring the BioMart Perl API

Configuration of the Perl API requires a single step. Change into the biomart-perl directory, then type:

perl bin/configure.pl -r conf/registryURLPointer.xml

where registryURLPointer.xml is the registry file you wish to use from the conf folder.

The first question the configure script will ask is:

Do you want to install in API only mode [y/n] [n]:

Type y to install the API only.

During configuration it may point out that required Perl modules are missing. If this happens, follow the steps detailed in the prerequisites section above to install these missing Perl modules.

When it has completed successfully, you will see this final message:

Looks good.... you are done.

1.4.3.2 Configuring MartView

This section describes how to set up Apache for use with MartView.

You will need a Registry file defined – the default one is in the conf folder and is called registryURLPointer.xml. See section 3.3 for details on how to create your own registry file.

MartView can use only one single Registry file at a time.

Before running the configuration script for MartView, some settings need to be defined in the settings.conf file in the conf directory.

Other optional settings can be configured from within settings.conf, allowing you to specify amongst others the colour schemes and wording to use on the site, and to specify an alternative web server to use for relative URLs from query results. It is possible to enable background result jobs where the results are stored in a server-side directory and the user emailed when they are ready. The directory settings and mail options are all configured here as well. Other configurable options include how long session related data is stored on the server and how webservices logging is managed.

You may also like to edit the site_header.tt document in the conf/templates/default directory in order to embed the MartView interface into a custom setting, for instance by adding the logo and navigation bars from your website. Instructions for both settings.conf and site_header.tt are embedded within those files.

When you are done customising the settings, the same script is used to configure MartView as for the Perl API.

From the biomart-perl directory type

perl bin/configure.pl -r conf/registryURLPointer.xml

changing registryURLPointer.xml to the registry file in the conf directory you wish to use.

It will ask:

Do you want to install in API only mode [y/n] [n]:

Type n to install the Perl API and MartView together.

During configuration it may point out that required Perl modules are missing. If this happens, follow the steps detailed in the prerequisites section above to install these missing Perl modules.

MartView will now proceed to process the Registry file and download the various dataset configurations defined therein. It will build templates for, and then compile, all the pages of the MartView website. This may take quite some time. The final message you see before completion should be:

Compiling templates for visible datasets

1.4.4 Starting and stopping MartView

Change to the biomart-perl directory and type (substituting /my/chosen/Apache/binary for the correct Apache location chosen during configuration):

/my/chosen/Apache/binary -d $PWD -f $PWD/conf/httpd.conf

Test MartView by pointing your web browser to the following URL, substituting <host>, <port> and <location> for the values you configured earlier:

http://<host>:<port>/<location>/martview
e.g.
http://localhost:5555/biomart/martview
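
The same check can be made from the command line. This sketch builds the URL from its three parts and, in the commented example, asks curl for the HTTP status code. The function name martview_url is hypothetical, and the values localhost, 5555 and biomart are simply those from the example URL above.

```shell
# Build the MartView URL from host, port and location.
martview_url() {
    printf 'http://%s:%s/%s/martview\n' "$1" "$2" "$3"
}

# Example: print only the HTTP status code (curl prints 000 if the
# server cannot be reached at all).
# curl -s -o /dev/null -w '%{http_code}\n' "$(martview_url localhost 5555 biomart)"
```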

To stop it again, change to the biomart-perl directory and type:

kill `cat logs/httpd.pid`

If the httpd.pid file is damaged or missing, you will have to identify and kill the Apache processes manually. This is potentially dangerous if you have more than one Apache instance running on your machine and so should be done with care. Therefore you should always be careful not to damage httpd.pid.
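
The stop command above assumes that httpd.pid is present and refers to a live process. A slightly more defensive variant is sketched below; the function name stop_martview is hypothetical, and the only path it relies on is the logs/httpd.pid location used in the command above.

```shell
# Stop the Apache instance whose PID is recorded in <dir>/logs/httpd.pid.
# Returns 1 without killing anything if the PID file is missing, empty,
# not a plain number, or refers to a process that is not running.
stop_martview() {
    pidfile=$1/logs/httpd.pid
    if [ ! -s "$pidfile" ]; then
        echo "no usable PID file at $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*)
            echo "unexpected content in $pidfile" >&2
            return 1 ;;
    esac
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running" >&2
        return 1
    fi
    kill "$pid"
}

# Example, run from the biomart-perl directory:
# stop_martview "$PWD"
```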

1.4.5 Troubleshooting MartView startup

The following information may help if you find that MartView will not start up correctly. Thanks to Eric Ross for providing most of it.

1.4.5.1 Useless use of AllowOverride (in console)

This is an Apache version issue and can be safely resolved by deleting the offending lines from httpd.conf.

1.4.5.2 Couldn't determine Username (in console)

Your operating system may not have a definition set up for the Apache web server process owner. It can be resolved by adding User www to the top of the httpd.conf file. Also make sure the ownership of the biomart directory is www.

If Apache runs under a different user than www on your system, you should use that instead in the fix above.

1.4.5.3 No such file or directory (in the log)

The martview, martresults and martservice scripts can't find Perl.

Modify the first line of each script in the bin directory to point to the correct location of your Perl installation (which is often /usr/bin/perl).
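
Editing the first line can be done by hand or scripted. The sketch below replaces only the interpreter path on the first line and leaves any options after it (such as -w) in place. The function name set_perl_shebang is hypothetical; the script names in the commented example are the three named above.

```shell
# Point the "#!" line of a script at the given Perl interpreter.
# The result is written to a temporary file first and then copied back,
# so the original file keeps its permissions.
set_perl_shebang() {
    script=$1
    perl_path=$2
    tmp=$script.tmp.$$
    sed "1s|^#![[:space:]]*[^[:space:]]*|#!$perl_path|" "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&
        rm -f "$tmp"
}

# Example, run from the biomart-perl directory:
# for s in bin/martview bin/martresults bin/martservice; do
#     set_perl_shebang "$s" "$(command -v perl)"
# done
```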

1.4.5.4 Can't call method "settingsParams" on an undefined value (in browser)

Your mod_perl installation is broken or incompatible. Reinstall mod_perl from source and reconfigure MartView using the --clean option.

Note that on Mac OS X, installing mod_perl with fink may not be sufficient.

1.4.5.5 Exception::Class::Base::new (in browser)

Your mod_perl installation is broken or incompatible. Reinstall mod_perl from source and reconfigure MartView using the --clean option.

Note that on Mac OS X, installing mod_perl with fink may not be sufficient.

1.4.6 MartView maintenance tasks

1.4.6.1 Updating the existing Registry

If the datasets that your Registry points to have been updated or upgraded, but the list of datasets itself has not changed, then follow the instructions in this section.

If you modify your Registry in order to add or remove datasets, or need to change to a Registry file with a different name, you should refer to the section elsewhere in this document on switching to a different Registry.

Note that if you rerun configure.pl without either the --update or the --clean option, it uses the default (cached) option, which configures from the cached copy of the existing registry if one exists. This is only useful if you want to modify the server settings.

If the datasets that your Registry file points to have been updated to newer versions, you will need to reconfigure MartView. To do this you must first stop MartView, then change to the biomart-perl directory and type the following (replacing myRegistry.xml with your actual Registry file, just as you did when first configuring):

perl bin/configure.pl --update -r conf/myRegistry.xml

Answer n to the first prompt about configuring for the API, and answer y to the second prompt about keeping your existing server configuration. The dataset configurations that were downloaded previously will be checked and any that have changed will be downloaded anew. Finally, the various templates that define the MartView pages will be rebuilt and recompiled to match any changes found.

You can now safely start MartView up again.

1.4.6.2 Switching to a different Registry

If you alter the Registry file so that it points to different datasets, or decide to use a completely different Registry file, then MartView needs to be reconfigured from scratch using the new Registry.

If you have renamed the registry file then you can just rerun configure.pl using the new name:

perl bin/configure.pl -r conf/myNewRegistry.xml

If the altered registry still has the same name you must use the clean option to overwrite the cached copy:

perl bin/configure.pl --clean -r conf/myRegistry.xml

1.4.6.3 Changing memory usage

The default behaviour, which suits a server with a large amount of memory, is to keep all the configuration data in memory (--memory option). If memory is an issue, you can run configure.pl with the --lazyload option, which stores all configuration data locally on disk and loads only what is currently required into memory. These options can be combined with any of the configure.pl options above. For example, to reconfigure the server in lazyload mode when none of the underlying configuration has changed, you would run:

perl bin/configure.pl --lazyload -r conf/myRegistry.xml

1.4.6.4 Clearing download files

An optional setting in the settings.conf file in the conf folder allows users of MartView to request that their queries be run in the background and the results saved to file for them to download later. The location of the files is also defined in settings.conf.

The MartView system administrator needs to decide on a policy as to how long these files are kept on disk before being cleared out. Files can be removed safely just by deleting them.
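
One possible policy is to delete result files that have not been modified for a given number of days. The following is a sketch only: the function name purge_old_results is hypothetical, and the directory shown in the commented example is a placeholder for whatever location you have set in settings.conf.

```shell
# Delete regular files under <dir> whose last modification is more than
# <days> days in the past. Directories themselves are left in place.
purge_old_results() {
    dir=$1
    days=$2
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# Example: remove results older than a week (placeholder path).
# purge_old_results /path/to/results 7
```

A command like this can be run from cron so that the chosen policy is applied without manual intervention.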

1.4.6.5 Clearing log files

Log files are kept in the logs directory. The logs are written by Apache, and can be maintained in the same way as any other Apache logs. In other words, you can pretty much do what you like to them.

Make sure you do not accidentally delete the httpd.pid file whilst clearing logs. If you do so, it becomes harder to stop MartView safely.
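
One way to clear the logs without risking the PID file is to skip it explicitly. The sketch below empties each log file in place rather than deleting it, on the assumption that a running Apache keeps its log files open and would otherwise continue writing to a file that is no longer visible. The function name clear_martview_logs is hypothetical.

```shell
# Empty every regular file in <dir>/logs except httpd.pid.
clear_martview_logs() {
    logdir=$1/logs
    for f in "$logdir"/*; do
        if [ ! -f "$f" ]; then
            continue
        fi
        if [ "$(basename "$f")" = "httpd.pid" ]; then
            continue
        fi
        : > "$f"    # truncate in place; the file itself is kept
    done
}

# Example, run from the biomart-perl directory:
# clear_martview_logs "$PWD"
```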