Installation of SIMBAD

Note

SIMBAD has been distributed with CCP4 since v7.0.46.


Installation of SIMBAD databases

This section covers installing and updating the SIMBAD-specific databases. We advise you to update your databases regularly, because your chances of success increase with every newly deposited structure.

SIMBAD currently requires four databases, of which only three can be installed or updated manually. The contaminant database is provided “as is”; for updated versions, please contact us directly at ccp4[at]stfc.ac.uk. Note, however, that this database is updated automatically with SIMBAD updates in CCP4, so manual updates should not be necessary.

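The remaining databases required by SIMBAD are the lattice parameter database, the MoRDa-like database with domain coordinates, and the custom database.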

The following sections explain how to install or update each of these databases. Note that all of them require an active internet connection.

Lattice parameter database

The lattice parameter database ships by default with SIMBAD and should contain a relatively recent snapshot of the PDB. However, to screen the very latest PDB entries, you will need to update either this database or CCP4 itself regularly.

To do so, use the simbad-database command with the lattice subcommand. If you are in a Unix terminal, use the following command to update:

$ simbad-database lattice

Hit the <Enter> key and your default database will be updated automatically.

If your CCP4 installation is located in a write-protected directory, you can use the following command:

$ sudo -E simbad-database lattice

Alternatively, if this doesn’t work, you can write the database to any user-specific directory and then move it into your CCP4 installation using the following commands:

$ simbad-database lattice -latt_db niggli_database.npz
$ sudo -E mv niggli_database.npz $CCP4/share/simbad/static

Warning

Please consider that other users might require access to this database. The permission settings of the moved database might therefore need to be adapted.
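One way to adapt the permissions is to grant read access to all users with chmod. The following is a minimal sketch, assuming that read-only access is sufficient for other users. The helper name share_database is hypothetical, and elevated privileges are needed if the file is owned by another user.

```shell
# Hypothetical helper: grant read access to everyone, then list the file
# so the resulting permissions can be inspected.
share_database() {
    chmod a+r "$1" && ls -l "$1"
}

# Example call for the moved database (run with sufficient privileges):
# share_database "$CCP4/share/simbad/static/niggli_database.npz"
```

If other users should also be able to update the database themselves, group write access may be needed instead; that decision depends on your site's policy.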

If you do not have write permissions to the CCP4 installation directory or would prefer to keep a separate copy of the updated lattice database, you can use the -latt_db flag with a path to your preferred location. For example, your command could instead look like this:

$ simbad-database lattice -latt_db $HOME/Documents/simbad_lattice_db.npz
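Which variant applies depends on whether your user can write to the directory that holds the default database. The sketch below prints a suitable command rather than running it. It assumes the default database lives in $CCP4/share/simbad/static, the directory used in the mv command in this section; the helper name lattice_update_command is hypothetical.

```shell
# Hypothetical helper: print a suitable lattice update command.
#   $1: directory holding the default database (assumed location)
#   $2: fallback path for a user-owned copy of the database
lattice_update_command() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        # Directory exists and is writable: update the default database.
        echo "simbad-database lattice"
    else
        # Otherwise write a separate copy to the fallback path.
        echo "simbad-database lattice -latt_db $2"
    fi
}

lattice_update_command "${CCP4:-}/share/simbad/static" \
    "$HOME/Documents/simbad_lattice_db.npz"
```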

Note

If you create a custom copy of the lattice parameter database, make sure to point SIMBAD to it by providing the -latt_db flag when invoking the relevant scripts.

Contaminant database

The contaminant database is shipped with SIMBAD by default. A script to update the contaminant database will be made available in a future update.

MoRDa-like database with domain coordinates

The MoRDa-like database is required for running the MoRDa database search in SIMBAD. As the database is quite large (~3 GB), it does not ship with SIMBAD by default. The MoRDa database is derived from the PDB and contains a compact description of non-redundant protein chains, domains, and homo- and hetero-oligomers. Instructions to install MoRDa through CCP4 are available here. To use the MoRDa database in SIMBAD, it must be reformatted, which creates our MoRDa-like database.

Note

MoRDa is not currently available on Windows; this database and the associated features in SIMBAD are therefore limited to Unix systems.

The most basic command to generate the MoRDa-like database is:

$ simbad-database morda $HOME/Documents/simbad_db

The previous command will install the MoRDa-like database to the directory simbad_db. It will create a PDB-like substructure of folders to group relevant entries. Each file is encoded to save disk space. For instructions on how to decode the files, please contact us at ccp4[at]stfc.ac.uk.

If MoRDa is installed, SIMBAD will use the associated MoRDa database to generate the MoRDa-like database. Otherwise, the MoRDa package will be temporarily downloaded.

The creation of the MoRDa-like SIMBAD database scales with the number of processors available. If more than one is available, provide the -nproc command line argument and SIMBAD will use as many processors as you specify.

Note

If you are installing SIMBAD on a computing cluster, make use of the -submit_qtype option.

$ simbad-database morda -nproc 10 $HOME/Documents/simbad_db
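If you do not know how many processors a machine has, the value for -nproc can be looked up instead of hard-coded. The following sketch uses getconf _NPROCESSORS_ONLN, which is available on Linux and macOS, and falls back to 1 if the lookup fails. The helper name detect_nproc is hypothetical, and the final line only prints the resulting command.

```shell
# Hypothetical helper: report the number of online processors,
# falling back to 1 if the value cannot be determined.
detect_nproc() {
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" || n=""
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer
    esac
    echo "$n"
}

# Print the command that would use every available processor.
echo "simbad-database morda -nproc $(detect_nproc) $HOME/Documents/simbad_db"
```

On a shared machine you may prefer a smaller value than the one detected, so that other users are not starved of resources.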

After the first installation of this database, consecutive runs do not need to process every domain again. To update your database in the future, simply run the same command as before and point the script to the root of the SIMBAD database. For example, if the database was created with the command simbad-database morda $HOME/Documents/simbad_db, it can be updated with the following:

$ simbad-database morda $HOME/Documents/simbad_db

The installation procedure will identify any new files in the MoRDa database and process only those.
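Because the update uses the same command as the initial installation, a routine refresh of the PDB-derived databases can be wrapped in a small script. The following is a sketch only: the function name update_simbad_databases and its optional second argument are hypothetical. Passing echo as the second argument gives a dry run that prints the commands instead of executing them.

```shell
# Hypothetical wrapper: refresh the lattice and MoRDa-like databases.
#   $1: root directory of the MoRDa-like database
#   $2: optional command prefix; pass "echo" for a dry run
update_simbad_databases() {
    morda_db="$1"
    run="${2:-}"
    $run simbad-database lattice &&
    $run simbad-database morda "$morda_db"
}

# Dry run: print the two commands without executing them.
update_simbad_databases "$HOME/Documents/simbad_db" echo
```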

Custom database

Alternatively, you may wish to run SIMBAD using a custom database. To do so, the database must first be converted into a SIMBAD-compatible format.

SIMBAD provides a script to create a SIMBAD-compatible database from a database of PDB files. The command to run this is:

$ simbad-database custom $HOME/Documents/custom_db $HOME/Documents/input_db
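Before converting, it can help to confirm that the input directory actually contains structure files. The following sketch counts them, assuming that input_db is the directory of PDB files and that those files carry a .pdb extension; both are assumptions, as is the helper name count_pdb_files.

```shell
# Hypothetical helper: count files ending in .pdb below a directory.
count_pdb_files() {
    find "$1" -type f -name '*.pdb' | wc -l | tr -d ' '
}

input_db="$HOME/Documents/input_db"
if [ -d "$input_db" ]; then
    echo "Found $(count_pdb_files "$input_db") PDB files in $input_db"
else
    echo "Input directory not found: $input_db"
fi
```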

Note

If you create a custom database, make sure to point SIMBAD to it by providing the -cont_db or -morda_db flag when invoking the relevant scripts.