The BioMart components are distributed in two separate packages.
biomart-perl contains the Perl API and all the BioMart applications that depend on it, such as MartView, MartService, MartURLAccess and DAS Annotation Server (if configured).
martj contains the Java API and all the BioMart applications that are written in Java, such as MartEditor, MartShell, MartExplorer and MartBuilder.
The best way to obtain martj is to download one of the pre-compiled binary distributions from http://www.biomart.org/.
The martj-bin.zip distribution is for Windows users, and contains the Java API and all the Java-based BioMart applications. The bin folder contains a number of .bat scripts for launching the applications. You can unpack it using WinZip or a similar application.
The martj-bin.tgz distribution is identical to the martj-bin.zip distribution, but is intended for Unix and Linux users. Instead of .bat scripts, it contains a number of .sh scripts which perform equivalent tasks. You can unpack it using:
tar -zxvf martj-bin.tgz
The martj-bin.dmg distribution is for MacOSX users, and contains bundles in the bin folder for each of the various Java-based BioMart applications. The bin folder also contains the MartShell application .command script, and the lib and data folders contain the files that MartShell depends on. The .dmg image file will automatically unpack itself when double-clicked.
The source code for martj is available for download via Git, but you will need the ant tool installed if you wish to compile it. ant is available from http://ant.apache.org/. Download and install it as per the instructions on that website.
To check martj out from Git, type the following command at the command-line prompt.
git clone https://github.com/biomart/biomart-perl.git --branch cvs/release-0_7
To compile martj using ant, change into the martj directory created by the commands above and type:
ant jar
This will create the martj.jar file inside the build folder. All other JAR files which the Java-based BioMart applications depend on can be found in the lib folder.
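Before starting a source build it can help to confirm that the tools mentioned above are actually on the PATH. The following is a minimal sketch only; the function name missing_tools is hypothetical and is not part of martj.

```shell
#!/bin/sh
# Print the name of every tool that cannot be found on the PATH.
# Returns non-zero if at least one tool is missing.
# (missing_tools is an illustrative name, not part of BioMart.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Example:
#   missing_tools git ant java || echo "Install the tools listed above first"
```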
There is no binary distribution for biomart-perl. It is only available as a source distribution from CVS.
To check biomart-perl out from CVS, you need to type the following commands on the command line prompt. The password you need to enter when asked is CVSUSER.
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart \
co -r release-0_7 biomart-perl
No compilation is required, but it will need to be configured before you can use it. For details on how to configure it, please refer to the section on installing biomart-perl.
You need to have Java 1.3 or later installed. You can get Java from http://java.sun.com/.
martj has been tested with Java 1.3, 1.4 and 1.5.
If you attempt to run any martj component on Java 1.6 or higher, and experience problems, please try the same series of actions with Java 1.5 before reporting a bug.
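Since martj has only been tested with Java 1.3, 1.4 and 1.5, it may be worth checking the installed version before reporting a problem. The sketch below assumes the old "1.x" version numbering; the function names java_minor and java_is_tested are hypothetical.

```shell
#!/bin/sh
# Extract the minor number from an old-style Java version string,
# e.g. "1.5.0_22" gives 5. Prints nothing for other numbering schemes.
# (java_minor and java_is_tested are illustrative names.)
java_minor() {
    echo "$1" | sed -n 's/^1\.\([0-9][0-9]*\).*/\1/p'
}

# Succeed only if the version is in the tested range 1.3 to 1.5.
java_is_tested() {
    minor=$(java_minor "$1")
    [ -n "$minor" ] && [ "$minor" -ge 3 ] && [ "$minor" -le 5 ]
}

# Example (assumes "java" is on the PATH; java -version writes to stderr):
#   v=$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')
#   java_is_tested "$v" || echo "Untested Java version: $v"
```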
martj requires no further installation, except if you wish to use marts other than the ones defined by default. To do so requires you to modify the Registry file, which is discussed elsewhere in this document.
It is important that you do not modify the directory structure inside the martj folder, or move or copy any of the scripts from the bin folders to other locations. Running these scripts from other locations will not work.
You need to have Perl version 5.6.0 or later installed first. You can get the latest version of Perl from http://www.perl.org/.
It is important that you do not modify the directory structure inside the biomart-perl folder, or move or copy any of the files within it to other locations.
biomart-perl depends on a number of Perl modules. The configuration script, described elsewhere in this document, reports any that are missing, so the best plan is to run configure.pl straight away from your biomart-perl directory and then install any missing modules:
perl bin/configure.pl -r conf/registryURLPointer.xml
The easiest way to install these missing modules is to use the CPAN shell. Unless you are knowledgeable about CPAN and know how to do otherwise, modules should be installed by the root user on Linux/Unix systems, or by the Administrator on MacOSX/Windows systems.
For each module reported as missing by the configuration step, type:
cpan -i Module::Name
You should replace Module::Name with the name of the module you are attempting to install, ideally by copying and pasting it from the output of configure.pl. Read the questions CPAN asks during installation thoroughly, and answer yes when it asks whether you want to install missing dependencies. It is usually fine to accept the default responses for almost all the questions it asks.
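When several modules are reported as missing, the cpan command can be repeated for each one. The sketch below only prints the commands so that they can be reviewed first; the function name cpan_install_commands is hypothetical, and the module names in the example are placeholders for whatever configure.pl reports.

```shell
#!/bin/sh
# Print one "cpan -i" command per module name given as an argument.
# Nothing is installed; the output is meant to be reviewed first.
# (cpan_install_commands is an illustrative name.)
cpan_install_commands() {
    for module in "$@"; do
        echo "cpan -i $module"
    done
}

# Example: review the commands, then pipe them to sh to run them.
#   cpan_install_commands XML::DOM Log::Log4perl
#   cpan_install_commands XML::DOM Log::Log4perl | sh
```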
For reference, the list of CPAN modules required at the time of writing is shown below. The version numbers are those that have been tested, but more recent versions may also work.
Dependency | Module name | Module version |
API | XML-DOM | 1.44 |
API | OLE-Storage_Lite | 0.14 |
API | Exception-Class | 1.23 |
API | libwww-perl | 5.8 |
API | Log-Log4perl | 1.05 |
API | Test-Exception | 0.24 |
API | DBI and relevant DBD drivers | 1.53 |
API | Digest::SHA | 5.44 |
Website | IO::Compress::Gzip | 2 |
Website | Number-Format | 1.51 |
Website | Template-Toolkit | 2.14 |
Website | Template-Plugin-Number-Format | 1.01 |
Website | CGI-Session | 4.14 |
Website | Readonly | 1.13 |
Website | List-MoreUtils | 0.22 |
Website | SpreadSheet-WriteExcel | 2.17 |
Website | IO-Compress-Zlib | 2.003 |
Website | SOAP-Lite | 0.710.08 |
All references to MartView in this document that relate to configuration and maintenance equally apply to and affect MartService, as the two are part of the same application.
If you are going to be running a MartView/MartService/DAS server, you will also need to have an Apache web-server installed. This can be downloaded from http://httpd.apache.org/, and should be installed as per the guidelines on that website. You do not need to configure Apache to be used with BioMart, as the BioMart configuration scripts will handle that for you.
MartView works fine with all versions of Apache 1.3 or higher, including Apache 2.0 or higher.
MartView requires a few Apache extension modules to be installed. It does not matter if they are compiled into Apache or provided as dynamic modules. If you are missing any of them, the website where you can download them is listed beside each one.
Apache version | Module name | Module website |
1.3, 1.4 | mod_gzip (optional, improves performance) | |
1.3, 1.4 | mod_perl | |
2.0 or higher | mod_deflate | Part of the Apache distribution. |
2.0 or higher | mod_perl | |
MartView has been designed to work best with Apache 2.0 or higher but Apache 2.0 is not a prerequisite.
It is highly recommended that you install the appropriate compression module for Apache before running MartView. If you do not, then MartView is likely to be very slow.
For Apache 2.0 or higher, use mod_deflate. For Apache versions before 2.0, use mod_gzip. Details are in the table above.
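The rule above can be expressed as a small helper. This is only a sketch of the rule as stated in the text; the function name compression_module is hypothetical.

```shell
#!/bin/sh
# Map an Apache version string to the compression module recommended
# in the text: mod_deflate for 2.0 or higher, mod_gzip before that.
# (compression_module is an illustrative name.)
compression_module() {
    major=${1%%.*}    # keep everything before the first dot
    if [ "$major" -ge 2 ]; then
        echo mod_deflate
    else
        echo mod_gzip
    fi
}

# Example:
#   compression_module 2.2.14
```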
The Perl API and MartView configuration scripts check which Apache modules are available by using apxs (Apache 1.3/1.4) or apxs2 (Apache 2.0+). It requires this tool to live in the same location as the Apache binary (usually called apache, apache2, or httpd). If you have installed a binary distribution of Apache, you may also need to install the Apache development tools to make apxs/apxs2 available.
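To see in advance whether the configuration scripts are likely to find apxs or apxs2, you can look for them beside your Apache binary. The following is a rough sketch of that check, not the logic used by configure.pl itself; the function name find_apxs is hypothetical.

```shell
#!/bin/sh
# Look for an executable apxs or apxs2 in the same directory as the
# given Apache binary and print the first one found.
# Returns non-zero if neither is present.
# (find_apxs is an illustrative name.)
find_apxs() {
    bindir=$(dirname "$1")
    for tool in apxs apxs2; do
        if [ -x "$bindir/$tool" ]; then
            echo "$bindir/$tool"
            return 0
        fi
    done
    return 1
}

# Example:
#   find_apxs /home/biomart/apache/bin/httpd || echo "Install the Apache development tools"
```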
Other Apache modules are also used by MartView but these are all available with the default Apache installation and so you should not need to worry about having to install them.
If you do not already have Apache and ModPerl installed on your system then you can follow these steps to set them up.
These steps assume a Unix/Linux-based system. For other operating systems, please refer to the Apache and ModPerl websites for instructions.
For the purposes of these instructions it is assumed that you will be installing the latest versions of Apache and ModPerl that were available at the time of writing (Apache 2.2.14 and ModPerl 2.0.4).
BioMart software does not depend on specific versions of Apache or ModPerl. It can use other versions if required but these instructions are only valid for the versions specified.
First you will need to create a directory where you can work. In our example we will install Apache in /home/biomart/apache.
mkdir /home/biomart/apache
You will need to substitute your directory for this example location in all the commands and explanations in this section.
Next you need to download, unpack, and build Apache inside this directory.
cd /home/biomart/apache
mkdir source
cd source
wget http://www.apache.org/dist/httpd/httpd-2.2.14.tar.gz
tar zxvf httpd-2.2.14.tar.gz
cd httpd-2.2.14
./configure \
--enable-deflate \
--prefix=/home/biomart/apache
make install
Apache has now been built and configured. Your Apache installation path is /home/biomart/apache/bin. You will need this when configuring biomart-perl to use this copy of Apache.
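One way to confirm that the compression module was built in is to inspect the output of httpd -l, which lists the modules compiled into the binary. The sketch below assumes the module was compiled statically, as with the configure options in this example; the function name has_static_module is hypothetical.

```shell
#!/bin/sh
# Read "httpd -l" output on standard input and succeed if the named
# module appears in the list of compiled-in modules.
# (has_static_module is an illustrative name.)
has_static_module() {
    grep -q "^ *$1\.c\$"
}

# Example:
#   /home/biomart/apache/bin/httpd -l | has_static_module mod_deflate \
#       && echo "mod_deflate is compiled in"
```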
The last step is to download and install ModPerl.
Note that for ModPerl you may need to upgrade your Perl CGI module to the latest version (the module name is CGI). You can upgrade it using the same technique you used to install the other Perl module dependencies for biomart-perl.
This only applies when using ModPerl 2.0 or higher, as in this example. You will know that you need to upgrade if errors referring to Apache/Response appear in the Apache error log.
cd /home/biomart/apache/source
wget http://perl.apache.org/dist/mod_perl-2.0-current.tar.gz
tar zxvf mod_perl-2.0-current.tar.gz
cd mod_perl-2.0.4
perl Makefile.PL \
PREFIX=/home/biomart/apache \
MP_APXS=/home/biomart/apache/bin/apxs
make install
Now ModPerl has been installed, the setup of Apache and ModPerl is complete.
The BioMart Perl API and MartView allow users to make queries against a predetermined list of marts, defined in a Registry file. When using the Perl API, this registry file can be located anywhere the user requires.
However, when using MartView, the registry file to be used must be located in the conf folder of the biomart-perl installation.
Registry files are in XML format. You will find a number of example registry files already in the conf folder after you download and unpack biomart-perl. They can be extended to widen the selection of marts to include others available publicly, or they can be adapted to serve your own local marts.
The structure of the registry file is discussed elsewhere in this document.
Configuration of the Perl API requires a single step. Change into the biomart-perl directory, then type:
perl bin/configure.pl -r conf/registryURLPointer.xml
where registryURLPointer.xml is the registry file you wish to use from the conf folder.
The first question the configure script will ask is:
Do you want to install in API only mode [y/n] [n]:
Type y to install the API only.
During configuration it may point out that required Perl modules are missing. If this happens, follow the steps detailed in the prerequisites section above to install these missing Perl modules.
When it has completed successfully, you will see this final message:
Looks good.... you are done.
This section describes how to set up Apache for use with MartView.
You will need a Registry file defined – the default one is in the conf folder and is called registryURLPointer.xml. See section 3.3 for details on how to create your own registry file.
MartView can use only one single Registry file at a time.
Before running the configuration script for MartView some settings need to be defined in the settings.conf file in the conf directory:
apacheBinary – you should set the path to your apache httpd binary.
serverHost – the hostname of the server to include in the apache configuration. Leaving it as localhost is fine for most cases.
port – the port MartView should listen on for requests. The default one should work fine. If MartView will be the only application served on this machine you can change it to 80 so users do not have to enter a port number when communicating with MartView via a web browser.
proxy – If your server is receiving port forwarded requests from a server other than the one it is running on then you should enter this server hostname here. As MartView needs to encode the hostname in some responses in order to redirect future requests, it needs to be told the hostname of the machine that the port-forwarded connections are coming from. For the normal scenario of no forwarding just leave blank.
location – this setting affects the URL that will be used to access MartView. The URL is formed of the server name, followed by the value of this setting, followed by the name of the script required. The default setting of biomart will be used in all URL examples in this documentation.
Other optional settings can be configured from within settings.conf, allowing you to specify amongst others the colour schemes and wording to use on the site, and to specify an alternative web server to use for relative URLs from query results. It is possible to enable background result jobs where the results are stored in a server-side directory and the user emailed when they are ready. The directory settings and mail options are all configured here as well. Other configurable options include how long session related data is stored on the server and how webservices logging is managed.
You may also like to edit the site_header.tt document in the conf/templates/default directory in order to embed the MartView interface into a custom setting, for instance by adding the logo and navigation bars from your website. Instructions for both settings.conf and site_header.tt are embedded within those files.
When you are done customising the settings, the same script is used to configure MartView as for the Perl API.
From the biomart-perl directory, type:
perl bin/configure.pl -r conf/registryURLPointer.xml
changing registryURLPointer.xml to the registry file in the conf directory you wish to use.
It will ask:
Do you want to install in API only mode [y/n] [n]:
Type n to install the Perl API and MartView together.
During configuration it may point out that required Perl modules are missing. If this happens, follow the steps detailed in the prerequisites section above to install these missing Perl modules.
MartView will now proceed to process the Registry file and download the various dataset configurations defined therein. It will build templates for, and then compile, all the pages of the MartView website. This may take quite some time. The final message you see before completion should be:
Compiling templates for visible datasets
Change to the biomart-perl directory and type (substituting /my/chosen/Apache/binary for the correct Apache location chosen during configuration):
/my/chosen/Apache/binary -d $PWD -f $PWD/conf/httpd.conf
Test MartView by pointing your web browser to the following URL, substituting <host>, <port> and <location> for the values you configured earlier:
http://<host>:<port>/<location>/martview
e.g.
http://localhost:5555/biomart/martview
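The URL pattern above can be assembled from the three configured values. This is a small sketch; the function name martview_url is hypothetical, and the curl command in the usage comment assumes curl is installed.

```shell
#!/bin/sh
# Build the MartView URL from the host, port and location settings.
# (martview_url is an illustrative name.)
martview_url() {
    echo "http://$1:$2/$3/martview"
}

# Example: fetch only the response headers to see whether the server answers.
#   curl -s -I "$(martview_url localhost 5555 biomart)"
```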
To stop it again, change to the biomart-perl directory and type:
kill `cat logs/httpd.pid`
If the httpd.pid file is damaged or missing, you will have to identify and kill the Apache processes manually. This is potentially dangerous if you have more than one Apache instance running on your machine and so should be done with care. Therefore you should always be careful not to damage httpd.pid.
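A slightly more defensive version of the stop command checks the pid file before sending the signal. This is a sketch under the assumption that the pid file lives at logs/httpd.pid, as in the command above; the function name stop_martview is hypothetical.

```shell
#!/bin/sh
# Stop MartView only if the pid file exists, is non-empty and names a
# process that is currently running. Run from the biomart-perl directory,
# or pass the path to the pid file as the first argument.
# (stop_martview is an illustrative name.)
stop_martview() {
    pidfile=${1:-logs/httpd.pid}
    if [ ! -s "$pidfile" ]; then
        echo "No usable pid file at $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
    else
        echo "Process $pid is not running" >&2
        return 1
    fi
}

# Example:
#   stop_martview
```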
The following information may help if you find that MartView will not start up correctly. Thanks to Eric Ross for providing most of it.
This is an Apache version issue and can be safely resolved by deleting the offending lines from httpd.conf.
Your operating system may not have a definition set up for the Apache web server process owner. This can be resolved by adding User www to the top of the httpd.conf file. Also make sure that the biomart directory is owned by www.
If Apache runs under a different user than www on your system, you should use that instead in the fix above.
The martview, martresults and martservice scripts can't find Perl.
Modify the first line of each script in the bin directory to point to the correct location of your Perl installation (which is often /usr/bin/perl).
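If several scripts need the same change, the first line can be rewritten with a small helper instead of editing each file by hand. This is a sketch only; the function name set_perl_shebang is hypothetical, and it discards any options that followed the interpreter path on the original first line.

```shell
#!/bin/sh
# Replace the first line of a script with "#!<perl path>".
# A temporary file is used instead of "sed -i" for portability, and the
# result is written back with cat so the script keeps its permissions.
# (set_perl_shebang is an illustrative name.)
set_perl_shebang() {
    perl_path=$1
    script=$2
    tmp=$(mktemp) || return 1
    { echo "#!$perl_path"; sed '1d' "$script"; } > "$tmp" &&
        cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (paths follow the text above; adjust them to your installation):
#   for s in martview martresults martservice; do
#       set_perl_shebang /usr/bin/perl "bin/$s"
#   done
```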
Your mod_perl installation is broken or incompatible. Reinstall mod_perl from source and reconfigure MartView using the --clean option.
Note that on Mac OS X, installing mod_perl with fink may not be sufficient.
If the datasets that your Registry points to have been updated or upgraded, but the list of datasets itself has not changed, then follow the instructions in this section.
If you modify your Registry in order to add or remove datasets, or need to change to a Registry file with a different name, you should refer to the section elsewhere in this document on switching to a different Registry.
Note that if you rerun configure.pl without either the --update or the --clean option, it uses the default (cached) behaviour and configures from the cached copy of the existing registry, if one exists. This is only useful if you want to modify the server settings.
If the datasets that your Registry file points to have been updated to newer versions, you will need to reconfigure MartView. To do this you must first stop MartView, then change to the biomart-perl directory and type the following (replacing myRegistry.xml with your actual Registry file, just as you did when first configuring):
perl bin/configure.pl --update -r conf/myRegistry.xml
Answer n to the first prompt about configuring for the API, and answer y to the second prompt about keeping your existing server configuration. The dataset configurations that were downloaded previously will be checked and any that have changed will be downloaded anew. Finally, the various templates that define the MartView pages will be rebuilt and recompiled to match any changes found.
You can now safely start MartView up again.
If you alter the Registry file so that it points to different datasets, or decide to use a completely different Registry file, then MartView needs to be reconfigured from scratch using the new Registry.
If you have renamed the registry file then you can just rerun configure.pl using the new name:
perl bin/configure.pl -r conf/myNewRegistry.xml
If the altered registry still has the same name you must use the clean option to overwrite the cached copy:
perl bin/configure.pl --clean -r conf/myRegistry.xml
The default behaviour, suitable for a server with a large amount of memory, is to keep all the configuration data in memory (--memory option). If memory is an issue, you can run configure.pl with the --lazyload option, which stores all configuration data locally on disk and loads only what is currently required into memory. These options can be combined with any of the configure.pl options above. For example, to reconfigure the server in lazyload mode when none of the underlying configuration has changed, run:
perl bin/configure.pl --lazyload -r conf/myRegistry.xml
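The reconfiguration scenarios described in this section differ only in the options passed to configure.pl. The sketch below assembles the command line for each case so the differences are easy to compare; the function name configure_command and the scenario names are hypothetical, and only the options mentioned in the text are used.

```shell
#!/bin/sh
# Print the configure.pl command for a given scenario.
#   $1  scenario: update (datasets changed), clean (registry contents
#       changed under the same name) or cached (server settings only)
#   $2  registry file name inside the conf folder
#   $3  optional memory mode: memory or lazyload
# (configure_command and the scenario names are illustrative.)
configure_command() {
    scenario=$1
    registry=$2
    mode=${3:+--$3 }
    case $scenario in
        update) flag="--update " ;;
        clean)  flag="--clean " ;;
        cached) flag="" ;;
        *) echo "unknown scenario: $scenario" >&2; return 1 ;;
    esac
    echo "perl bin/configure.pl ${flag}${mode}-r conf/$registry"
}

# Example:
#   configure_command clean myRegistry.xml lazyload
```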
An optional setting in the settings.conf file in the conf folder allows users of MartView to request that their queries be run in the background and the results saved to file for them to download later. The location of the files is also defined in settings.conf.
The MartView system administrator needs to decide on a policy as to how long these files are kept on disk before being cleared out. Files can be removed safely just by deleting them.
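As an illustration of such a policy, the sketch below deletes result files that have not been modified for more than a given number of days. The function name purge_old_results is hypothetical, and the directory in the example is a placeholder for whatever location you set in settings.conf.

```shell
#!/bin/sh
# Delete regular files under a directory that were last modified more
# than the given number of days ago.
# (purge_old_results is an illustrative name.)
purge_old_results() {
    dir=$1
    days=$2
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# Example (placeholder path, 7-day retention):
#   purge_old_results /path/to/background/results 7
```

A command like this could be run periodically, for instance from cron, once the retention period has been agreed.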
Log files are kept in the logs directory. The logs are written by Apache, and can be maintained in the same way as any other Apache logs. In other words, you can pretty much do what you like to them.
Make sure you do not accidentally delete the httpd.pid file whilst clearing logs. If you do so, it becomes harder to stop MartView safely.
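One way to reduce the risk of removing httpd.pid is to clear the logs with a helper that skips it explicitly. The sketch below empties the files rather than deleting them, on the assumption that Apache may still have them open; the function name truncate_logs is hypothetical.

```shell
#!/bin/sh
# Empty every regular file in the logs directory except httpd.pid.
# Run from the biomart-perl directory, or pass the directory as an argument.
# (truncate_logs is an illustrative name.)
truncate_logs() {
    logdir=${1:-logs}
    for f in "$logdir"/*; do
        if [ -f "$f" ] && [ "$(basename "$f")" != "httpd.pid" ]; then
            : > "$f"    # truncate the file in place
        fi
    done
}

# Example:
#   truncate_logs
```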