GOMoDo Manual


This is a practical manual to get started with GOMoDo. It's not meant to be a deep description of the methods and algorithms used in the pipeline. For that, refer to the paper: [manuscript in preparation] and references therein.

Flowchart showing the GOMoDo pipeline.

Index

0. Retrieving jobs

I. Starting homology modeling.

STEPS: Start modeling of a human GPCR in our database
STEPS: Start modeling of a custom GPCR
STEPS: Alignment strategies
STEPS: Modeling options

II. Understanding modeling results.

STEPS: Choosing the best models
Custom models

III. Docking with AutoDock VINA

STEPS: Docking an olfactory ligand with VINA
STEPS: Docking a custom ligand with VINA

IV. Docking with HADDOCK

STEPS: Obtaining the necessary files to dock a ligand with HADDOCK
STEPS: Predicting a binding site with FPOCKET
STEPS: Docking a ligand with HADDOCK after having obtained files
STEPS: Choosing the best HADDOCK docking(s)

0. Retrieving jobs.

Before starting, remember that you can retrieve your job results at any time. Use the box at the top-right corner of the page, enter one of the following, and click "Submit":
- your job code
- your email address (this lists all the jobs associated with that address)
Every job can also be reached directly through its URL, e.g. http://molsim.sci.univr.it/cgi-bin/cona/checkjob.php?jobid=my_job_code
Jobs are usually accessible for 60 days after their run has started.
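
If you keep track of many jobs, it can be handy to build these links in a script. The following is only a minimal sketch: the base address and the jobid parameter are copied from the example URL above, while the function name job_url is our own invention and not part of GOMoDo.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in this manual.
CHECKJOB_URL = "http://molsim.sci.univr.it/cgi-bin/cona/checkjob.php"


def job_url(job_code: str) -> str:
    """Return the direct link to a job, URL-encoding the job code."""
    return CHECKJOB_URL + "?" + urlencode({"jobid": job_code})


print(job_url("my_job_code"))
```

Encoding the job code with urlencode keeps the link valid even if the code contains characters that are not allowed in a URL.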

Worried about privacy?

If you need absolute privacy, we advise you not to use GOMoDo. However, we understand that you may not want everybody who knows your email address to be able to look at your jobs. GOMoDo has no login or password system (so far), so how do you prevent this?
GOMoDo treats every string containing a "@" as an email address. So, if you want a little more privacy, enter a "password" with a @ in the middle instead of a real email address, and use it to retrieve your jobs.

I. Starting homology modeling.

The first aim of GOMoDo is to obtain a series of homology models of your protein from its amino acid sequence. To do this, you provide the protein sequence in FASTA format. The sequence is aligned to a family of similar proteins to build a relevant hidden Markov model (HMM), which is then used to find and align a list of suitable templates on which your protein can be modeled.

STEPS: Give a name to your job

  • 1. Give a name to your modeling job and put it in the Job Label text box.
  • 2. Insert your email address (optional). You will then be able to retrieve your jobs by entering your email address in GOMoDo, and you will receive an email when the job is completed.

On GOMoDo, you can either choose a human GPCR from our database (which also lets you use our pre-built alignments) or provide your GPCR of choice.

STEPS: Start modeling of a human GPCR in our database

  • 1. Begin typing the name of the GPCR in the search box. While typing, suggestions will automatically be fetched.
  • 2. Click on the correct one from the suggestions.
  • 3. Press "Submit"

STEPS: Start modeling of a custom GPCR

  • 1. Paste the protein sequence into the main Sequence box.
    Remember to keep the first line, also called the defline: you can recognize it because it starts with a > character. Do not put comments or other extraneous text in the box.
  • 2. Press "Submit"
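
Before pasting, you may want to check your sequence locally. The sketch below is an illustration only: it is not the validation GOMoDo performs, the function name check_fasta is hypothetical, and it assumes that a single sequence made of letters only is expected.

```python
def check_fasta(text: str) -> str:
    """Check a single-sequence FASTA entry and return the bare sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must be a defline starting with '>'")
    # Assumption: exactly one sequence is pasted in the box.
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only one sequence is expected")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found after the defline")
    # Letters only: digits, spaces or comments suggest extraneous text.
    if not sequence.isalpha():
        raise ValueError("sequence contains non-letter characters")
    return sequence


# Made-up example entry, for illustration only.
example = ">my_receptor example entry\nMNGTEGPNFY\nVPFSNKTGVV\n"
print(len(check_fasta(example)), "residues")
```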

This was easy. The next page will ask you to choose or provide alignments for your protein. This is critical: the quality of the alignment you provide will influence the quality of the template/target alignment, and in turn the quality of your models.

STEPS: Alignment strategies

  • 1. Choose between these alternatives:
  •   1(A). Upload a FASTA sequence alignment. If you have a good alignment of the GPCR family of your target, you can upload it. It must be in FASTA format, and your target sequence must be the first sequence.
    A good alignment allows a better hidden Markov model to be built for your target and will likely result in a better target-template alignment, which means a better model.
  •   1(B). Choose an automated alignment algorithm. If you don't have an alignment to provide, GOMoDo will build one with an automated search-and-align algorithm. You can choose BLAST or HHblits. Both algorithms rely on heuristics for performance reasons and therefore do not guarantee optimal alignments, but they can be good enough for many applications.
      You can also set the number of search rounds that BLAST or HHblits performs. More search rounds usually mean that the algorithm harvests more sequences; note that this can bring in many poorly related sequences and degrade alignment quality. If you do not know what this means, leave the default and go to step 2.
  •   1(C). Use GOMoDo's own database of alignments. This is available only if you chose a human GPCR on the previous screen. We provide reasonable alignments for most human GPCR families.
  • 2. Press "Submit"
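
If you upload your own alignment (option 1(A)), the two requirements above can be checked beforehand with a few lines of code. This is a sketch under assumptions: the function names are hypothetical, gaps are assumed to be written as "-" or ".", and the check that all aligned sequences have the same length reflects the usual aligned-FASTA convention, not a documented GOMoDo rule.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (defline, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def check_alignment(alignment_text, target_sequence):
    """Return a list of problems found in a FASTA alignment (empty if none)."""
    records = read_fasta(alignment_text)
    if not records:
        return ["no sequences found"]
    problems = []
    # Remove gap characters before comparing with the ungapped target.
    first = records[0][1].replace("-", "").replace(".", "").upper()
    if first != target_sequence.upper():
        problems.append("target is not the first sequence")
    if len({len(seq) for _, seq in records}) != 1:
        problems.append("aligned sequences have different lengths")
    return problems


# Toy alignment, for illustration only.
alignment = ">target\nMK-LV\n>relative\nMKALV\n"
print(check_alignment(alignment, "MKLV"))
```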

The next step asks you for options related to the modeling (if you requested an automatic alignment, this may take a couple of minutes). It also shows you the uploaded alignment with JALVIEW, for a last check.

STEPS: Modeling options

  • 1. Choose HHPRED options. If you don't know what these options are, leave them as they are.
  • 2. Choose a suitable number of models. This is the number of models built for each template, so the total number of models is Nmodels × Ntemplates (loop modeling notwithstanding). The default (2) is generally too small for production use. In theory, the more the merrier, but too many models could also clog our server. A decent compromise is 20-30 models.
  • 3. Choose the number of loop-optimized models (optional). MODELLER can attempt automatic loop optimization of the parts of your target that cannot be aligned to the template, for each model. This can improve model quality a bit. Note, however, that the optimized loops are by no means guaranteed to be realistic predictions of the real loop geometry, and that loop optimization slows down the modeling a lot. The total number of models will be Nmodels × Ntemplates × Nloop optimized + Nmodels × Ntemplates. Use at your own risk.
  • 4. Provide your MODELLER key. For licensing reasons, you need a MODELLER license key (free for academics), which can be requested from the MODELLER website.
  • 5. Press "Submit" and wait for modeling to finish.
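
To get a feeling for how quickly the number of models grows, the two formulas above can be written as a small function. This is just the arithmetic from steps 2 and 3; the function and parameter names are ours, and the number of templates in the example is made up.

```python
def total_models(n_models: int, n_templates: int, n_loop: int = 0) -> int:
    """Total models: Nmodels x Ntemplates x Nloop + Nmodels x Ntemplates."""
    base = n_models * n_templates
    return base * n_loop + base


# 20 models per template on 5 templates, without loop optimization.
print(total_models(20, 5))
# Same job with 2 loop-optimized models for each model.
print(total_models(20, 5, n_loop=2))
```

With loop optimization switched on, the total grows by a factor of (Nloop optimized + 1), which is why it slows the job down so much.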

Now is a good moment to copy the job code generated by GOMoDo and keep it somewhere safe. If you entered your email address you should be able to retrieve your job anyway, but it is better to be safe. At this point GOMoDo runs the whole pipeline up to modeling: it generates the HMM from the (uploaded, database or automatic) alignment with HHPRED, generates structural alignments, looks for suitable templates and launches MODELLER jobs for each template. Getting the result can take some time (from twenty minutes to several hours, depending on the number of models and the server load).

You can close the browser during the modeling.

II. Understanding modeling results.

For an example of modeling results, enter modeling_example in the "Check result" box

To retrieve your modeling results, you can either wait or submit the job code, in the "Check result" form at the top of every page (including this one). Wait a few seconds (sometimes one or two minutes, depending on the job size) for the output to appear. The output shows:

  • - A link to the log file of the modeling. This contains the output of all programs used, up to MODELLER. It's a good thing to check it in case something goes wrong.
  • - An alignment file, containing all alignments between the target sequence and the templates.
  • - A link to a score map of your modeling job (explained below)
  • - A table containing all the information about the modeling job.

The table is the main modeling output. Each row is a model. Each column is a parameter or link. All column headers can be clicked to sort the table according to the chosen parameter (just try!). From left to right each column shows:

  • - Link to the PDB entry of the GPCR template structure
  • - The resolution of the template original structure (which is not to be confused with the resolution or accuracy of the model!)
  • - The organism from which the template comes
  • - The state (active/inactive/unknown) of the GPCR template
  • - A link to download the model PDB

After the link, four different model quality scores are shown. These are essential for choosing the appropriate model(s) for the next step. They are:

  • - DOPE score (the lower, the better)
  • - MOLPDF score (the lower, the better)
  • - GA341 score (between 0 and 1; the higher, the better; ideally should be >0.6)
  • - Normalized DOPE score (the lower, the better)

Scores are based on statistical potentials and are rough indicators of the overall quality of the model structure. The only two scores that can be safely compared between models built on different templates are GA341 and the normalized DOPE (the other scores can only be safely compared between structures obtained from the same template). To make it easy to see whether the two scores agree, a "map" at the bottom of the table shows each model as a dot in the GA341/normalized DOPE space. In general, if the modeling job has been successful, the two scores are roughly correlated, and the best model(s) will consistently lie towards the lower right corner. The correlation is never excellent, but a completely scattered map of scores or a reverse correlation usually indicates that the modeling job is not of sufficient quality.

You can click on any dot to be brought back to the corresponding table row (highlighted). In the table you can also click on the normalized DOPE to see a DOPE profile along the sequence, which indicates which parts of the model are better (minima) or worse (maxima).

The next column contains links that can be used to further analyse the model: a view link to visualize the model PDB with JMOL and a link to the external VADAR server. The VADAR server (hosted by the David Wishart research group at the University of Alberta, Canada) lets you check several quality parameters in detail, such as the Ramachandran plot (highlighting residues in forbidden regions), the stereochemical and packing quality index plot (residues with score <7 are problematic) and a threading/3D profile score plot (residues with score <5 are problematic). You should expect a small number of potentially problematic residues even in the best models.

STEPS: Choosing the best models:

This is not a list of mandatory steps, but of extremely basic "good practice" steps, suggested more by common sense than anything else. The detailed steps will depend on your own experience with homology modeling, your biological problem, etc.

  • 1. Sort the table by ascending normalized DOPE score (lowest first) by clicking on the "Normalized DOPE" header. Take note of the top structure.
  • 2. Sort the table by descending GA341 score (highest first) by clicking on the "GA341" header. If the top structure is the same, you have a structure that is considered good by both statistical potentials, and it is probably your best model.
  • 3. If not, or if you want to be sure, check the normalized DOPE/GA341 map at the bottom of the page. The best models are at the bottom right of the graph. If the map is totally scattered, or not correlated with a downward slope, check your modeling job: it is possible that your sequence, your alignment or both aren't right.
  • 4. Once you have selected the top, say, 5 candidates, click on their normalized DOPE score and check how the score changes in the part of the molecule you're interested in (e.g. binding site).
  • 5. Do the same by clicking on the VADAR link and check the Ramachandran plot, the stereochemical/packing index, and the threading/3D profile index. Take note of the residues which are badly modeled in each structure according to each index.
  • 6. Once you have good candidate models, it is a good idea to download them and open them together, to check if they look similar to each other and if the side chains of the binding site are modeled similarly or not. From this, you can judge which model(s) you want to dock.
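
Steps 1 and 2 above can also be reproduced offline once you have copied the scores out of the table. The sketch below is only an illustration: the model names and score values are made up, and the idea of keeping the models that rank in the top n for both scores is our simplification of the procedure, not something GOMoDo computes for you.

```python
# Hypothetical rows copied by hand from the results table:
# (model name, normalized DOPE, GA341). All values are made up.
models = [
    ("model_A", -0.35, 0.98),
    ("model_B", -0.10, 0.71),
    ("model_C", -0.42, 0.95),
    ("model_D", 0.20, 0.40),
]


def top_candidates(rows, n=2):
    """Return the models that rank in the top n by both scores."""
    # Normalized DOPE: the lower, the better.
    by_dope = sorted(rows, key=lambda r: r[1])[:n]
    # GA341: the higher, the better.
    by_ga341 = sorted(rows, key=lambda r: r[2], reverse=True)[:n]
    best_ga341 = {r[0] for r in by_ga341}
    return [r[0] for r in by_dope if r[0] in best_ga341]


print(top_candidates(models))
```

An empty result means that the two scores disagree on the best models, which is a good reason to look at the map and at the per-residue profiles before choosing.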

Tip: It is a good idea to check whether your models have long unstructured "tails" at the N- or C-terminal end. This happens when your sequence contains terminal segments that are not present in any of the template structures. These tails are hopelessly badly modeled, and their poor quality masks the overall score of the structured part of the protein, thus distorting the model quality assessment. In this case you can:
- cut the unmodeled tail from the sequence
- re-align, if you want to use custom alignments
- re-run the modeling job
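
Cutting the tails is a simple slicing operation once you have decided which residues to keep. The sketch below is a hypothetical helper (the name trim_tails and the 1-based, inclusive residue numbering are our assumptions); the choice of where to cut remains yours.

```python
def trim_tails(sequence: str, first: int, last: int) -> str:
    """Keep residues first..last (1-based, inclusive) and drop both tails."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range is outside the sequence")
    return sequence[first - 1:last]


# Toy sequence: keep residues 3 to 8 only.
print(trim_tails("ABCDEFGHIJ", 3, 8))
```

Remember that residue numbers in the new models will be shifted with respect to the original sequence after trimming.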

Custom models.

If you have your own way to optimize your models, or you prefer a modeling protocol different from that of GOMoDo, you can upload a custom model PDB. To do that, just click on the link and upload the PDB file. You also need the MODELLER key (we use MODELLER to evaluate the model with its scoring functions, so that you can compare it with the GOMoDo ones). The results page will refresh and your model will appear with the template name "custom".

Warning: Since we have no way to know the template(s) of a custom model, or the geometry modifications you made, we cannot use our precomputed VINA binding site parameters. Therefore, for now, VINA docking will not work for custom models. HADDOCK docking, however, will work as before.

Warning: Custom models will only have DOPE scores available: other scoring functions are not available for models not directly created by MODELLER. On the GA341/normalized DOPE plot, custom models are shown with an arbitrary GA341 score of 0.

III. Docking with AutoDock VINA

For an example of VINA docking results, enter dock_vina_example in the "Check result" box

AutoDock VINA is a reasonably accurate, state-of-the-art docking program. It is also extremely fast (a job rarely takes more than 20 seconds!), so you can quickly obtain reasonable dockings for each structure and ligand you are interested in.

GOMoDo already "knows", from its database, where the GPCR binding site region is, and this location is used to guide the docking. GOMoDo was also originally conceived for the modeling/docking of olfactory receptors, so we also offer a collection of notable olfactory molecules to use as ligands.

STEPS: Docking an olfactory ligand with VINA:

  • 1. Click on the VINA link in your preferred model row in the modeling job results table.
  • 2. Take note of the job code for further reference
  • 3. Click on the "Odor ligands" tab
  • 4. Select a ligand from the drop-down menu
  • 5. Click "Submit".

STEPS: Docking a custom ligand with VINA:

In this example, we download structures from the public repository PubChem.

  • 1. Download a ligand structure from PubChem, in PDB or SDF format. Be careful to download the 3D structure file, not the 2D one.
  • 2. Click on the VINA link in your preferred model row in the modeling job results table.
  • 3. Take note of the docking job code for further reference
  • 4. Upload the structure file
  • 5. Click "Submit".

GOMoDo automatically converts SDF or PDB files to the PDBQT format required by AutoDock VINA. After a few tens of seconds, the output should appear.

Details on how AutoDock VINA works and on its results can be found in the online VINA manual. Here we only note the essential point: dockings are listed by their binding affinity in kcal/mol, and lower (more negative) is better.
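
If you also work with raw AutoDock VINA output files, the same ranking can be read directly from them. The sketch below assumes a PDBQT output in which each pose carries a "REMARK VINA RESULT:" line whose first number is the affinity in kcal/mol, as written by AutoDock VINA itself; whether GOMoDo lets you download this exact file is not stated in this manual, and the sample text and function name are made up.

```python
def vina_affinities(pdbqt_text: str):
    """Return (pose number, affinity in kcal/mol) pairs, best pose first."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # The first number after the colon is the binding affinity.
            affinities.append(float(line.split(":", 1)[1].split()[0]))
    poses = list(enumerate(affinities, start=1))
    # Lower (more negative) affinity is better.
    return sorted(poses, key=lambda pose: pose[1])


# Made-up fragment of an output file, for illustration only.
sample = (
    "MODEL 1\n"
    "REMARK VINA RESULT:      -7.2      0.000      0.000\n"
    "ENDMDL\n"
    "MODEL 2\n"
    "REMARK VINA RESULT:      -8.1      1.532      2.114\n"
    "ENDMDL\n"
)
print(vina_affinities(sample))
```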

IV. Docking with HADDOCK

For an example of HADDOCK docking results, enter dock_haddock_example in the "Check result" box

If you have experimental information about your ligand binding site (for example, mutagenesis or NMR data on which residues are involved in binding), you can use HADDOCK for the docking. HADDOCK was conceived for experimentally driven protein-protein docking, but it also works for protein-small ligand docking. It is much slower than VINA, but it is unique in combining the use of experimental data to restrain the docking with iterative refinement of the docked structures, the final step being a molecular dynamics refinement in explicit water.

On the other hand, the old "garbage in, garbage out" saying still holds: if the restraints given to HADDOCK are wrong or misleading, the HADDOCK results can be suboptimal too.

The docking process with HADDOCK is a bit more convoluted and requires obtaining a number of files from external servers before uploading them to GOMoDo. However, it is much easier than using HADDOCK on the command line yourself.

STEPS: Obtaining the necessary files to dock a ligand with HADDOCK

In this example, we download structures from the public repository PubChem.

  • 1. Download a ligand structure from PubChem, in PDB format. Be careful to download the 3D structure file, not the 2D one.
  • 2. Go to the PRODRG server, and follow the instructions there to obtain the files.
  • 3. When the PRODRG job has finished, keep the following files: DRGFIN.PDB (all hydrogen PDB file) ; DRGCNS.PAR (CNS param file) ; DRGCNS.TOP (CNS top file). You can rename them, provided you keep the .pdb , .par and .top extensions.
  • 4. Now go to the HADDOCK Generate Ambiguous Interaction Restraints page to obtain a restraint file. If you do not have experimental restraints, see the box below.
  • 5. Indicate the amino acid residue numbers (separated by spaces or commas) that you know are interacting with your ligand as active restraints for molecule 1. At least two restraints must be indicated (if you only enter one, HADDOCK will fail). Note that the residue numbers must refer to your chosen GPCR model, and as such they can be shifted with respect to the original input sequence (sometimes MODELLER truncates a few residues because it can't align them to the template). Be sure to check this in the PDB of your model.
  • 6. Indicate any neighbouring residues that are plausibly in the binding pocket as passive restraints for molecule 1. The same recommendations as above apply. You can also leave this blank.
  • 7. For the second molecule (which is your ligand), just indicate 1 as the active residue, and leave the passive residues blank.
  • 8. Scroll down and click on Generate AIR restraints. Save the resulting text file with the extension .tbl (e.g. restraints.tbl).
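
The check recommended in step 5 (do the residue numbers you chose actually exist in the model PDB?) is easy to automate. The sketch below is a minimal illustration that assumes the standard fixed-column PDB format, where the residue number of an ATOM record sits in columns 23-26; the function names and the toy residues are hypothetical.

```python
def residue_numbers(pdb_text: str) -> set:
    """Collect the residue numbers found in the ATOM records of a PDB file."""
    numbers = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # Columns 23-26 of an ATOM record hold the residue number.
            numbers.add(int(line[22:26]))
    return numbers


def missing_residues(pdb_text: str, wanted) -> list:
    """Return the requested residue numbers that are absent from the model."""
    present = residue_numbers(pdb_text)
    return sorted(n for n in wanted if n not in present)


# Toy model with residues 25-27 only (one CA atom each, coordinates omitted).
toy_pdb = "\n".join(
    f"ATOM  {i:5d}  CA  ALA A{n:4d}" for i, n in enumerate((25, 26, 27), start=1)
)
print(missing_residues(toy_pdb, [26, 27, 112]))
```

Any number reported as missing should be corrected before generating the restraint file.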

If you do not have experimental restraints for your protein, you can obtain plausible binding pockets with FPOCKET, fully automatically. You can then use the prediction as both active and passive restraints for HADDOCK. In our tests, this protocol gives surprisingly good results in reproducing GPCR/small molecule complex crystal structures.

STEPS: Predicting a binding site with FPOCKET

  • 1. Click on the FPOCKET link on the HADDOCK docking GOMoDo page. A prediction will appear in a few seconds.
  • 2. Check your prediction. You can download the full FPOCKET output and visualize ligand binding pockets. Usually the first one in the table is the GPCR "classical" ligand binding pocket, but this is not always true.
  • 3. Copy the amino acids of the pocket listed in the table, and use them as both active and passive restraints for HADDOCK.

Once you have followed these steps, you have collected the files you need for the docking.

STEPS: Docking a ligand with HADDOCK after having obtained files

  • 1. Click on the HADDOCK link in your preferred model row in the modeling job results table.
  • 2. Take note of your job code immediately, so that you don't forget.
  • 3. Upload the files obtained in the previous steps: the three PRODRG files and the restraints file. You can leave the other options as they are.
  • 4. Insert your email address to be notified when the job completes, if you wish.
  • 5. Click "Submit"

The job will take a long while: from a few hours to more than 24 hours. If you entered your email address, GOMoDo will automatically alert you when the job has finished; otherwise you can check the job status by entering your job code, as usual. Once it has finished, enter your job code to retrieve the results. Note that the first time you look at a finished job, GOMoDo has to generate the result presentation, and this can take a while (several tens of minutes); you will see the job progressing.

STEPS: Choosing the best HADDOCK docking(s)

  • 1. Insert your HADDOCK job code and wait for the results to finish loading.
  • 2. If you had >1 structure in the water refinement step (as you should have), skip the whole it1 step and scroll directly to the "Water" section of the page.
  • 3. You should find a plot of cluster population versus cluster energy. Each cluster contains a set of structurally similar docking solutions. If all is well, you should see one or two clusters that are distinctly the most populated, with an average energy that is among the lowest, or at least average, compared with all clusters.
  • 4. Click on the dot in the plot corresponding to the best cluster; it should bring you to a table where you can download the cluster as a gzipped file.
  • 5. Now you can either download the whole cluster or go below the plot and download individual structures from a cluster.
  • 6. Open your docking PDB with a molecular visualizer such as VMD or Chimera, and check that the docking makes sense! We recommend opening structures from more than one cluster, and several structures from the same cluster too, to get a better idea of what is going on.
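
The rule of thumb in step 3 (prefer the most populated clusters, provided their average energy is among the lowest or at least average) can be sketched as follows. This is only an illustration: the cluster names, populations and energies are made up, "at least average" is interpreted here as "not above the median energy", and the function is not part of GOMoDo or HADDOCK.

```python
from statistics import median

# Hypothetical values read off the "Water" plot:
# (cluster name, population, average energy). All values are made up.
clusters = [
    ("cluster1", 42, -35.0),
    ("cluster2", 18, -38.5),
    ("cluster3", 5, -20.1),
    ("cluster4", 3, -12.7),
]


def promising_clusters(rows, n=2):
    """Return the n most populated clusters whose energy is not above the median."""
    cutoff = median(energy for _, _, energy in rows)
    by_population = sorted(rows, key=lambda r: r[1], reverse=True)[:n]
    return [name for name, _, energy in by_population if energy <= cutoff]


print(promising_clusters(clusters))
```

A shortlist like this is only a starting point: visual inspection of the structures, as described in step 6, remains essential.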