What's New in MEROPS?
Release 9.8 17-December-2012
Move to EMBL-EBI
The MEROPS database will be moving from the Wellcome Trust Sanger Institute to the EMBL-European Bioinformatics Institute very soon. The MEROPS team is part of the group run by Alex Bateman, who took up the position of Head of Protein Sequence Resources at EMBL-EBI in November and took his team with him. Please note that in the near future the URL and E-mail addresses for the MEROPS database will change.
The first peptidase family with mixed catalytic types
The recent crystal structure of the precursor of the pantetheinyl hydrolase ThnT from Streptomyces cattleya (Buller et al., 2012) has shown that auto-activation exposes a threonine at the new N-terminus, occupying the same position as a serine in the homologous aminopeptidase DmpA from Ochrobactrum anthropi. This means that the nucleophile in peptidases in this family can be either threonine or serine. In all other known families of peptidases, the nucleophile is absolutely conserved. This means that the family cannot be named according to the convention used so far in MEROPS in which the first letter of the family name represents the nature of the nucleophile. This family has been named P1, which is the first in a new category of families with mixed nucleophiles.
More peptidases from model organisms
The number of model organisms has been increased to eleven with the addition of a Gram-positive bacterium (Bacillus subtilis), an archaean (Pyrococcus furiosus), a protozoan (Dictyostelium discoideum) and another yeast (Schizosaccharomyces pombe). A special MEROPS identifier has been created for each putative peptidase from each of these organisms.
Links to Europe PubMed Central
The literature pages for peptidases, inhibitors, families and clans now include links to Europe PubMed Central.
Release 9.7 1-August-2012
We invite users with specialist knowledge to help improve the peptidase summaries in MEROPS. We hope to be able to present a succinct but detailed summary for every peptidase. The writer will receive full acknowledgement on the peptidase summary page. Please contact us at (email@example.com) if you would like to help, and you will be sent a username and password, and details of how to use the online editor.
Selection of strain on species page
It has become common practice to sequence the genomes of several different strains of bacteria. The list of strains with completely sequenced genomes can now be displayed on the species page. By selecting one of these strains, the species page filters the results for the selected strain only, and presents only those peptidases or inhibitors detected in that strain. Please be aware that the genome analysis at the foot of the page will display results for the selected strain, NOT the species.
Release 9.6 29-February-2012
Completely sequenced genomes
We consider a genome completely sequenced if an estimate of the number of protein coding genes can be obtained. For genomes from microbes, NCBI's genome database previously provided this information. However, restructuring at NCBI has meant that these data are no longer being provided. Instead, for microbial proteomes that can be downloaded from NCBI's FTP site, we now calculate the number of protein coding genes. In this release of MEROPS, an additional 500 completely sequenced microbial genomes have been identified. It is much more difficult to count the number of protein coding genes from completely sequenced eukaryote genomes. Identifying all of the exons and introns is computationally challenging, and misassemblies of genes are frequent. Some genes are missed and others are concatenated. Proteomes include alternatively spliced forms of coding sequences and thus inflate the number of protein-coding genes. For these reasons, the number of eukaryote proteomes annotated as complete in MEROPS lags behind the number of completely sequenced eukaryote genomes.
New reference search
A new item has been added to the search menu. This allows a user to retrieve references by submitting a simple text search. A user can enter an author name, a term from a title, or a journal name. The retrieved list will display the full reference with, where available, links to PubMed, PubMed Central, the full text of the article and clan, family, peptidase or inhibitor summaries in MEROPS.
Release 9.5 4-July-2011
Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli peptidases
We have added Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli to our list of model organisms, and have created a MEROPS identifier for each peptidase.
New substrate cleavage status
In addition to "physiological", "non-physiological" and "synthetic", the new status "pathological" has been added to the substrate pages of peptidase summaries.
Second family of glutamate peptidases
A second family of glutamate peptidases has been discovered, that of the pre-neck appendage protein from Bacillus phage phi29, which is a self-cleaving protein. The family is now included in MEROPS as family G2.
Asparagine catalytic type
Self-cleaving proteins that utilize asparagine as a nucleophile perform a novel form of proteolysis that is not hydrolysis. The asparagine attacks its own carbonyl carbon atom, forming a succinimide ring with simultaneous cleavage of the peptide bond. This activity best fits the enzyme description of a lyase - "an enzyme cleaving a C-C, C-O or C-N bond by other means than hydrolysis or oxidation" (Enzyme Nomenclature, 1992). These self-cleaving proteins therefore belong to subclass EC 4.3, whereas peptidases belong to subclass EC 3.4. In MEROPS, the term "asparagine peptide lyase" will be used to describe these enzymes.
Release 9.4 31-January-2011
The inclusion of asparagine peptidases in MEROPS has forced us to reconsider inteins, which had not been considered peptidases. These are self-splicing proteins which are structurally related to the hedgehog protein precursor C-terminal domain (C46). Proteins which contain an intein undergo two cleavage events to release the intein and a splicing event which joins the two portions of the extein. Neither cleavage involves hydrolysis. One of the two cleavages occurs at an asparagine residue and the mechanism is thought to be similar to that of asparaginyl peptidases. Although these proteins are at the limits of what can be considered peptidases, it is now sensible to include them in MEROPS. Three families of inteins have been assembled, N9, N10 and N11, with the majority of the sequences in N10. The structural relationship between these families and C46 means that the clan to which these families belong contains peptidases of mixed catalytic type, and so the existing clan has been renamed to PD, divided into the subclans PD(C) and PD(N) for families of cysteine and asparagine peptidases, respectively.
Marcin Poreba and Marcin Drag have been collecting peptidase specificities from combinatorial peptide screening studies in the published literature (Poreba & Drag, 2010) and have made this collection available to MEROPS. Where applicable, a peptidase summary now includes a table showing subsite preferences derived from such studies.
This release includes the first batch of cleavage sites in protein substrates that are being collected for MEROPS by Molecular Connections, Bangalore, India.
Searches of the MEROPS reference collection
The MEROPS literature collection includes over 42,000 references to articles, book chapters and books on peptidases and their inhibitors, to which MEROPS identifiers are assigned. We will be implementing methods to search this collection, and the first is now included in the Searches page; this is to search by PubMed identifier. Nearly 40,000 of the articles in our collection are included in PubMed. By entering a PubMed identifier, a table showing the full reference, with access to an on-line version (if available) and a list of MEROPS identifiers is returned. The identifiers are linked to the peptidase or family summary.
Release 9.3 07-September-2010
A new catalytic type
Over the past few months it has become apparent that there are several families of proteins that cleave themselves, that cleavage occurs at an asparaginyl bond, and that the side-chain amino group of the asparagine is acting as a nucleophile. All of these cleavages happen in cis. At present seven families have been recognized, including two that had previously been included in MEROPS as aspartic-type endopeptidases. The peptidases included in the families are viral coat proteins and bacterial autotransporters. All of these asparagine-type peptidase family names start with the letter 'N'.
Rotating Richardson diagrams
A structure page at the peptidase level now shows a rotating Richardson image in addition to the static image. There is also an option to display a surface on top of the Richardson cartoon. By holding down the left-hand mouse button, the image can be manually rotated. By clicking the right-hand mouse button, the user gains access to the full range of the Astex viewer commands, allowing further manipulation of the image.
Changes to displays of cleavages in a protein
The display of cleavages in a protein now shows the known cleavages in a table as well as showing the susceptible bonds in the sequence. The rows in the table are arranged by cleavage position in the substrate. Each row shows residue number, the name of the peptidase performing the cleavage (which links to the substrates page for that peptidase), the residue range of the protein portion used in the experiment (for example, the mature protein), whether the cleavage is believed to be physiological or not, how the cleavage position was identified, and a reference.
Release 9.2 30-April-2010
Changes to Small Molecule Inhibitor displays
A unique identifier for each small molecule inhibitor (SMI) has been introduced. The identifier is the letter J followed by a five-digit number. These are displayed on the SMI summaries, the SMI index and the inhibitors pages in the peptidase summaries. Cross-references to the University of Alberta's DrugBank database are now included on the SMI summary pages. Interactions between peptidases and SMIs are now included on the inhibitors pages. This includes not only SMIs for which we have summaries and identifiers, but also other SMIs.
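As a rough illustration of the identifier format described above (the letter J followed by a five-digit number), a format check could be sketched in Python as follows. The function name is_smi_identifier and the sample values are made up for illustration and are not part of MEROPS.

```python
import re

# Pattern assumed from the description: "J" followed by exactly five digits.
SMI_ID_PATTERN = re.compile(r"J[0-9]{5}")


def is_smi_identifier(text: str) -> bool:
    """Return True if text has the shape of an SMI identifier (J + five digits)."""
    return SMI_ID_PATTERN.fullmatch(text) is not None
```

A call such as is_smi_identifier("J00001") checks only the shape of the string; it does not confirm that the identifier exists in the database.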
A new page has been added to each peptidase and inhibitor summary to display our predictions of active site residues (peptidases only), metal ligands (metallopeptidases only), extent of the peptidase or inhibitor unit, sequence length, and the source of the sequence used in MEROPS with a link to the relevant primary sequence database.
Changes to substrate displays
It is now possible to filter the list of substrates cleaved by a peptidase to limit the display to protein and peptide substrates of physiological relevance, protein and peptide substrates that are not physiologically relevant, or synthetic substrates.
Links to Wikipedia entries
There are now links from peptidase and inhibitor summaries to Wikipedia entries. There is considerable extra data on a peptidase or an inhibitor in Wikipedia because of the effort Wikipedia users have put in to the Molecular and Cellular Biology WikiProject. We at MEROPS encourage our users to register at Wikipedia and to contribute to existing pages and create new pages for peptidases and their inhibitors.
Release 9.1 26-January-2010
New data release
This is the second half of the release that began in December 2009 when all the software was updated. In this release the data have been updated. In future, software and data will be updated simultaneously.
Links to ChEMBL
Links to the ChEMBL database have been added. These are links to drug targets in ChEMBL and can be found on the pharma pages at the peptidase level.
Problems with the new codebase
We apologize that the software update in December did not go smoothly, and several aspects of the MEROPS website were either not working, not working properly or missing. Most of these problems have been fixed and we would like to thank the users who pointed out problems of which we were unaware. There are still issues to be resolved with the MEROPS batch Blast and the dynamic alignments and trees.
Release 9.0 15-December-2009
Release 9 of MEROPS will be in two parts. The reason for this is that the software that produces the web pages has been re-written by Matthew Waller and Jody Clements from the Wellcome Trust Sanger Institute web team. In January 2010 the new data will be published. The release has been done in this way so that any programming bugs can be detected and reliably separated from data errors. The FTP site will also be updated in January.
The new software has been designed to resemble the old website, and the changes have mainly been for our benefit, to simplify the maintenance and make it easier to add new features. Users will notice that frames have disappeared, and that the left-hand green menu now scrolls with the rest of the page. Links to external databases now open in the same window so you will have to use the back button to return to MEROPS. References and sequences are now displayed in small windows in the middle of the screen.
Feedback and reporting errors
The MEROPS website now has a ticketing system for reporting errors and making comments. Each page now has a feedback link in the footer. The user will be asked to enter his or her name and an E-mail address when the comment is posted. The user will receive an automated E-mail which should not be replied to. A member of the MEROPS team will then contact the user and when the problem has been fixed the ticket will be closed. The user will receive a second automated E-mail. This should only be replied to if the issue has not been resolved to the user's satisfaction: the ticket will then be automatically re-opened. Please use this system to report programming errors, broken links and any errors or omissions in the data.
Links to PubMed Central from Literature pages
There are now links from the clan, family, peptidase and inhibitor literature pages to the full text of papers stored in PubMed Central.
Release 8.5 21-August-2009
Domain images and architecture pages
We have redesigned the domain images which appear on the peptidase summaries. Each image is scaled according to the sequence length, shown as a blue line. The peptidase unit is shown as a green box, with the active site residues and metal ligands (if any) shown as red and blue "lollipops", respectively, along the bottom edge of the box. The top edge shows disulfide bridges, and known carbohydrate binding sites (as orange lollipops). An inhibitor unit is shown as a large, grey box, with reactive site residues shown on the bottom edge as red lollipops. Other domains that have been annotated by SwissProt or Pfam are shown as smaller boxes. Domains derived from Pfam are shown as red boxes and links to the Pfam database can be accessed by clicking on the domain. Signal peptides and transmembrane domains are shown as small, black boxes. Propeptides are shown as small, grey boxes. Mouse-over text gives details for each feature displayed.
Because these simpler domain images are quicker to generate, we now include at the family level a page showing the different protein architectures known in the family or subfamily, ordered by MEROPS identifier.
Comparisons of peptidase specificity
The MEROPS collection of substrate cleavages now exceeds 38,500. There are over three hundred peptidases for which ten or more substrates are known. In addition to the displays on a peptidase summary, MEROPS now includes displays to compare preferences in binding pockets S4 to S4'. These are items on the substrate index and show preference in terms of all amino acids, amino acid properties and individual amino acids. The first of these shows, for each peptidase, an amino acid if it occurs in the same binding pocket in 40% or more of the substrates, so no more than two amino acids are shown for any one binding pocket. The amino acid is shown with a green background, and the brighter the green the greater the percentage of substrates with the amino acid in that binding pocket. The second display is similar but instead of showing individual amino acids, these are collected into "aliphatic", "aromatic", "acidic", "basic" or "small" groups. In the third option the user is prompted to select an amino acid from a pull-down menu and the display shows the number of substrates with the selected amino acid in each binding pocket for each peptidase. Where an amino acid has not been observed in a binding pocket, this is highlighted in black. In all three displays where no amino acid is possible (for example P4, P3 and P2 for an aminopeptidase, or P2', P3' or P4' for a carboxypeptidase) the binding pocket is highlighted in grey.
If known, the substrate alignments now show protein secondary structure at the foot of the alignment. A helix is shown as a string of "a's" and is highlighted in red, a beta strand is shown as a string of "b's" and is highlighted in green.
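The 40% rule used by the first comparison display can be sketched in a few lines of Python. This is an illustrative sketch only, not the MEROPS implementation: the function name frequent_residues and the input layout (one single-letter residue per substrate for a given binding pocket) are assumptions.

```python
from collections import Counter


def frequent_residues(residues, threshold_percent=40):
    """Return the amino acids found in at least threshold_percent of the
    substrates at one binding pocket.

    residues holds one single-letter code per substrate (assumed layout).
    """
    total = len(residues)
    if total == 0:
        return []
    counts = Counter(residues)
    # Integer arithmetic avoids floating-point edge cases at exactly 40%.
    return sorted(aa for aa, n in counts.items()
                  if n * 100 >= threshold_percent * total)
```

Because each reported amino acid must account for at least 40% of the substrates, three of them would need 120%, which is why at most two can be returned for a pocket at this threshold.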
MEROPS identifiers for another model organism
We recently expanded MEROPS identifiers to Arabidopsis thaliana, as well as human, mouse and rat, so that every gene product that is likely to be a peptidase has a unique identifier. We have now added identifiers for all probable peptidases in Saccharomyces cerevisiae. Identifiers for peptidases for this organism have the first character after the dot replaced by the letter A. When a homologue is characterized biochemically, we will replace the identifier with one in the standard format (three digits after the dot).
The number of Richardson diagrams showing cartoons of structures has substantially increased, thanks to the hard work of Matthew Jenner, who has been working with us this summer. There is now a Richardson diagram for every peptidase or inhibitor for which a tertiary structure has been solved.
Predicted sequences from the chimpanzee genome
Summer student Matthew Jenner has also been predicting protein sequences from the chimpanzee genome. Protein sequences from eukaryote genomes are collected from the Ensembl database. Although Ensembl has a sophisticated, automated pipeline for predicting protein sequences, some predictions require a further manual stage. These are predictions where exons are missed, introns are mistranslated as exons, or genes are run together. Predicted protein sequences derived from orthologous genes which show the greatest difference between human and chimpanzee have been recalculated using the GeneWise software, with the human sequence as a template and the nucleotide sequence found in the chimpanzee genome by using the Ensembl Blast search service.
Release 8.4 3-April-2009
Two indexes for peptidase substrates have been added. These are accessible from the left-hand green menu. The first index shows the count of known substrate cleavages per peptidase, ordered by and linked to the MEROPS identifier. Three counts are shown: the total number in the MEROPS collection, the total number of physiological substrates (peptides and proteins) and the total number of non-physiological substrates (peptides and synthetic substrates). The second index is ordered by substrate name and shows the MEROPS identifier of the peptidases known to cleave each substrate and the total number of cleavages performed by each peptidase. For substrates that can be mapped to the UniProt protein sequence database the UniProt identifier is shown with a link to the MEROPS utility which shows all cleavages of this protein in the MEROPS collection.
We have introduced "flags" on the substrate pages to indicate the method used to identify the cleavage position. The flags are as follows: NT shows that the cleavage position was determined by N-Terminal sequencing, MS shows that the peptide composition was determined by mass spectrometry (MS) and the cleavage position computed, MU shows that the cleavage position was determined by site-directed MUtagenesis, CS indicates that the cleavage position was postulated from a consensus motif (CS) within the protein sequence.
Release of signal and transit peptides and initiating methionine
Cleavages that result in the release of signal and transit peptides and the initiating methionine have been automatically collected from the annotations in the SwissProt protein sequence database. However, cleavages were not previously assigned to a specific MEROPS identifier. This has now changed, and assignments made where possible. Such assignments can only be made for organisms where the genome has been completely sequenced and only one homologue is known from the family in question. Data for the cleavage position is usually determined by N-terminal sequencing, but readers should be aware that at least for chloroplast transit peptides, aminopeptidases are also transported into the chloroplast which may remove amino acids from proteins subsequent to the removal of the transit peptides.
For peptidases that are drug targets we are now collecting links to databases of interest to the pharmaceutical industry. These are collected together on the new Pharma pages accessible for the peptidase summary. We are currently making links to the PubChem BioAssay database and the Binding Database.
Several new flags have been added to the Literature pages. The full list of flags is:
Structure pages now include a link to Proteopedia.
In keeping with many other publicly available databases MEROPS now has a blog. This can be found at http://meropsdb.wordpress.com and is where information about how the database is assembled and its developments will be posted. Users are encouraged to visit this page where they can comment on items posted.
Release 8.3 21-December-2008
The organism pages now include an analysis of homologues if the genome has been completely sequenced. This analysis is done family by family indicating an unusual presence (where the family is absent in 90% of closest relatives), unusual absence (where the family is present in 90% of closest relatives), or where the number of family members is most or fewest compared to the organism's closest relatives. Closest relatives are identified by walking up the taxonomic tree until the number of organisms with completely sequenced genomes is five or more.
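The walk up the taxonomic tree and the 90% test can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the MEROPS code: the taxonomy is given as a child-to-parent dictionary, the organism itself is not counted among its relatives, "90%" is read as "90% or more", and all function and taxon names are hypothetical.

```python
def lineage(taxon, parent):
    """Return the taxon followed by its ancestors, up to the root."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def closest_relatives(organism, parent, sequenced, minimum=5):
    """Walk up from organism until a taxon holds at least `minimum` other
    organisms with completely sequenced genomes.

    Returns (taxon, relatives), or None if the root is reached first.
    """
    others = [o for o in sequenced if o != organism]
    for ancestor in lineage(organism, parent)[1:]:
        group = [o for o in others if ancestor in lineage(o, parent)]
        if len(group) >= minimum:
            return ancestor, sorted(group)
    return None


def unusual_status(present_in_organism, relatives_with_family, n_relatives):
    """Classify a family as unusually present or absent (90% rule)."""
    absent = n_relatives - relatives_with_family
    if present_in_organism and absent * 10 >= 9 * n_relatives:
        return "unusual presence"
    if not present_in_organism and relatives_with_family * 10 >= 9 * n_relatives:
        return "unusual absence"
    return None


# Hypothetical toy taxonomy: two genera (G1, G2) within one family (F).
PARENT = {"sp_a": "G1", "sp_b": "G1", "sp_c": "G1",
          "sp_d": "G2", "sp_e": "G2", "sp_f": "G2",
          "G1": "F", "G2": "F"}
SEQUENCED = ["sp_a", "sp_b", "sp_c", "sp_d", "sp_e", "sp_f"]
```

With this toy data, closest_relatives("sp_a", PARENT, SEQUENCED) stops at the family level, because genus G1 contains only two other sequenced organisms.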
Expansion of MEROPS identifiers for model organisms
Every human or mouse peptidase homologue has a unique MEROPS identifier, and we have recently begun expanding identifiers for other model organisms with completely sequenced genomes. The first is Arabidopsis thaliana and identifiers for peptidases for this organism have the first character after the dot replaced by the letter A. When a homologue is characterized biochemically, we will replace the identifier with one in the standard format (three digits after the dot).
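The two identifier shapes described above can be told apart with a simple check. The Python sketch below is illustrative only: the function name, the label "provisional" for identifiers carrying the letter A, and the sample identifiers in the usage note are assumptions made for this example.

```python
def identifier_status(identifier: str) -> str:
    """Classify a MEROPS identifier by the three characters after the dot.

    "standard"     three digits after the dot
    "provisional"  the letter A followed by two digits (label is an
                   assumption; the text only describes the letter A)
    "unrecognised" anything else
    """
    _family, dot, suffix = identifier.partition(".")
    if not dot or len(suffix) != 3 or not suffix.isascii():
        return "unrecognised"
    if suffix.isdigit():
        return "standard"
    if suffix[0] == "A" and suffix[1:].isdigit():
        return "provisional"
    return "unrecognised"
```

For example, a string shaped like C01.A01 would be reported as "provisional" and one shaped like C01.060 as "standard"; the check looks only at the format and says nothing about whether the identifier exists.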
A new item has been added to the Searches menu. The MEROPS database includes many cross-references to other databases and bioinformatics resources. To make it easier for others to map their database entries to MEROPS there is a new CGI that presents the cross-references between MEROPS and any database selected from a pull-down menu. There are a considerable number of cross-references between MEROPS and primary sequence databases, so these are returned in batches of 50,000.
Modifications to distribution trees
The displays of peptidase or inhibitor distribution among organisms have been enhanced. There is now mouse-over text at every node which gives the name of the taxon.
Modifications to protein substrate cleavage display
The display showing known cleavages in a selected protein substrate depended on the user knowing from which species the substrate was derived. Now if no cleavages are known for the selected protein but are known for the same protein from a different species, there is an option to display the sequence alignment with cleavages highlighted.
New peptidases and inhibitors
The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.
Release 8.2 4-August-2008
Alignments of protein variants
The sequence page of the peptidase (or inhibitor) summary now includes an ALIGN VARIANTS button. Many peptidases and inhibitors are sequenced many times and variants exist, either strain-specific or the result of alternative initiation, alternative splicing of exons, allelic variation or single nucleotide polymorphisms (SNPs). Clicking on the ALIGN VARIANTS button will generate a dynamic alignment of all the variants we have collected from the primary sequence databases. Residues that differ from the sequence we have selected for inclusion in our protein sequence collection are highlighted as white text on a black background.
Gene name index
A new index of gene names has been added to the main index page (the left-hand menu). You can now search for any peptidase or protein inhibitor homologue knowing the name of its gene or its gene locus.
Protein substrate annotation
All protein substrates of peptidases are mapped to sequences in the UniProt protein sequence database. This database contains translations of full coding sequences, including the initiating methionine, signal and other targeting peptides. However, the substrates as used by researchers are usually mature proteins and peptides. The substrates page for each peptidase summary now includes an extra column in the table to show the residue range of the protein or peptide used in each respective study. The display of protein substrate alignments also shows the residue ranges graphically in the header lines that include the scissile bond symbols in the form <--+-+--->. A question mark instead of an angled bracket indicates that the terminus has not been determined.
MEROPS identifiers have been added to the tables of peptidase-inhibitor interactions, and it is now possible to order the tables according to the identifier or the protein name.
Chromosome locations added to Organism pages
For eukaryotes with completely sequenced genomes, the chromosomal location (in megabases) of the peptidase or protein inhibitor homologue gene is now shown on the organism page. These locations are derived from the EnSEMBL database by searching for entries with a cross-reference to the UniProt protein sequence database; therefore a location will not be shown for a gene from any genome where the copy number is low. However, the locations for all homologues from human and mouse should be shown. For human and mouse these locations are also shown in the Genetics table of the peptidase or protein inhibitor summary. Here the locations are linked to the contig view in EnSEMBL, which shows the exon and intron structure of the gene. The name of the chromosome (or genomic scaffold) precedes the location and the strand is indicated by a plus or minus sign in parentheses after the location. Users should be aware that EnSEMBL is automatically generated and is not a curated database.
New peptidases and inhibitors
The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.
Release 8.01 05-May-2008
We have been aware that as more data are collected some of our alignments are becoming very large. Not only may there be hundreds (even thousands) of sequences, but aligning so many diverse sequences means that more gap characters are inserted and the alignments get wider. These are difficult to view on a computer screen, and on scrolling the screen the residue numbers or sequence identifiers disappear off screen. To help to alleviate these problems, we have made our dendrograms ("trees") more interactive. The nodes of the tree are now active links and on clicking on the node an alignment of all the sequences derived from that node will be displayed. This alignment also includes the family type example and the sequence numbering derived from the type example sequence. The alignment displayed is not dynamic, but is derived from the full alignment by removing any insert characters common to all the sequences. In order to make this happen, we are now including the aligned peptidase or inhibitor unit sequences and the dendrograms (in New Hampshire format) in the MySQL database. Users who download the MySQL database from our FTP site should be aware that two new tables, aligned_sequence and tree, have been added.
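Deriving a sub-alignment by removing insert characters common to all the selected sequences amounts to dropping every column that contains only gaps. The Python sketch below illustrates the idea; it assumes that the aligned sequences are held in a dictionary keyed by name, that all rows have the same length, and that "-" is the gap character. None of these details is taken from the MEROPS code.

```python
def strip_common_gaps(aligned, gap="-"):
    """Remove alignment columns that are gaps in every sequence.

    aligned maps a sequence name to its aligned row (equal lengths assumed).
    """
    if not aligned:
        return {}
    names = list(aligned)
    rows = [aligned[name] for name in names]
    # Keep a column if at least one sequence has a residue there.
    keep = [i for i, column in enumerate(zip(*rows))
            if any(ch != gap for ch in column)]
    return {name: "".join(row[i] for i in keep)
            for name, row in zip(names, rows)}
```

Applied to the rows selected for one node of the tree, this yields a narrower alignment while leaving the relative alignment of the remaining residues unchanged.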
New peptidases and inhibitors
The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.
Release 8.00A 14-Feb-2008
More regular updates
This is the first of what are intended to be monthly data updates to MEROPS. An update is not a full release, so there are no new features and only a handful of alignments and trees will have changed from the last release. All the sequence data have been updated, however, and this update includes the analysis of several new prokaryote genomes as well as several new families and family summaries.
New peptidases and inhibitors
Release 8.00 8-Jan-2008
A major change to MERNUMs
Every sequence in the MEROPS database has been assigned a unique accession which we call the MERNUM. The MERNUM consists of the letters 'MER' followed by a number. When we set up the system we thought that a five-digit number would be sufficient, but sequencing has become easier, and during our most recent collection of new peptidase and protein inhibitor homologues we collected our 100,000th sequence. The number in a MERNUM therefore now has six digits. Users who have individual sequences bookmarked as favourites will need to refresh the links in their Web browser.
MEROPS DAS server
A distributed annotation system (DAS) server has been set up for MEROPS. This allows others to extract data directly from the MEROPS MySQL database for inclusion in their own Internet service. The user enters an accession as a parameter on the URL (usually this will be a UniProt accession, but an EMBL/GenBank ProtID will work for MEROPS) and data relating to the sequence stored in our collection will be returned. For a peptidase or protein inhibitor, this will include the MEROPS identifier, family and clan, the extent of the peptidase or inhibitor unit, active site residues (and metal ligands for metallopeptidases), the amino acid sequence and a link to a page in MEROPS for each feature. For a protein substrate, positions of known cleavages and the MEROPS identifiers of the peptidases responsible are returned. Example URLs are:
http://das.sanger.ac.uk/das/merops/features?segment=P07858 (features for human cathepsin B)
http://das.sanger.ac.uk/das/merops/sequence?segment=P07858 (sequence for human cathepsin B)
http://das.sanger.ac.uk/das/merops/features?segment=P05067 (known cleavages for human amyloid beta A4 protein precursor)
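The example URLs above follow a simple pattern (base address, then "features" or "sequence", then a segment parameter holding the accession), which can be assembled programmatically. The Python sketch below only builds the URL string and makes no network request; the helper name das_url is hypothetical, and whether the server still answers at this address is not assumed.

```python
from urllib.parse import urlencode

# Base address copied from the example URLs in this release note.
DAS_BASE = "http://das.sanger.ac.uk/das/merops"


def das_url(command: str, accession: str) -> str:
    """Build a DAS request URL for the given command and accession.

    Only the two commands shown in the examples are accepted.
    """
    if command not in ("features", "sequence"):
        raise ValueError("command must be 'features' or 'sequence'")
    return f"{DAS_BASE}/{command}?{urlencode({'segment': accession})}"
```

Calling das_url("features", "P07858") reproduces the first example URL for human cathepsin B.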
New specificity display
The summary page for any peptidase with well-characterised specificity now contains an additional display to the logos, with the advantage that amino acids not known to occur within a cleavage site are now shown. This simpler tabular display shows how frequently an amino acid occurs in each substrate binding pocket. There is a column for each substrate residue P4 to P4', and a row for each amino acid (in alphabetical order of the single letter code). If only one amino acid occurs in any position, then the background of the table cell is shown in red. If an amino acid occurs in 75% of all substrates then the cell background is orange. If an amino acid occurs in less than 25% of all substrates, then the cell background is white. If an amino acid is not known to occur, then the cell is shown with a black background.
Facilities are being set up for our users to contribute to annotation in MEROPS. We are very grateful to all our users who have provided information, pointed out errors, or made helpful suggestions, and intend a series of forms to make user contribution easier. The left-hand green menu now includes a "Submissions" button. At present there are only two submission items, both for advising us of any known protein cleavage sites that we are unaware of. The first is a form for the submission of a single cleavage, the second allows the user to upload a file of known cleavage sites. The latter has been designed with proteomics experiments in mind. The information requested will allow us to map the cleavage to an entry in the UniProt database. We look forward to receiving your submissions.
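The colouring rules for the table cells can be written down as a small decision function. The Python sketch below is one possible reading of the rules, with assumptions stated plainly: "75% of all substrates" is taken to mean 75% or more, "only one amino acid occurs" is taken to mean that the amino acid accounts for every substrate at that position, and frequencies from 25% up to 75% return None because the text above does not name a colour for them. The function name cell_colour is hypothetical.

```python
def cell_colour(count: int, total: int):
    """Return the background colour for one table cell.

    count: substrates with this amino acid at this position
    total: all substrates with a residue at this position
    """
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need 0 <= count <= total and total > 0")
    if count == 0:
        return "black"    # amino acid not known to occur
    if count == total:
        return "red"      # the only amino acid seen at this position
    if count * 4 >= 3 * total:
        return "orange"   # assumed reading: 75% or more
    if count * 4 < total:
        return "white"    # less than 25%
    return None           # 25% up to 75%: colour not stated in the text
```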
New peptidases and inhibitors
Release 7.90 17-Sep-2007
A dynamic alignment of protein substrate sequences to show conservation around a user-selected cleavage site
An additional option has been added to the "What are the known cleavage sites in this protein?" page. On this page the user is invited to enter the UniProt sequence database accession of any protein and the display shows known cleavages and the peptidases that perform them. The new option is to make a dynamic alignment of close homologues of the chosen protein substrate sequence. The sequences used for this alignment are taken from the relevant UniRef50 database entry, which lists all sequences from the UniProt and UniParc databases that share at least 50% sequence identity, and the alignment is generated by MUSCLE. The substrate in which the cleavage is known to occur is highlighted with a green background. Peptidases known to cleave the selected protein substrate are listed above the aligned sequences, and known cleavage positions are marked by a scissile bond symbol above the P1 positions in the substrate. Clicking on one of these symbols will highlight residues P4-P4' in the aligned sequences. Residues identical to those in the known substrate are shown with a pink background; replacements that are known to occur in the same position in other substrates for the selected peptidase are shown with an orange background. Replacements that are unknown in any substrate in the same position for the selected peptidase are shown as white text on a black background. This facility enables the user to assess the evolutionary conservation, and therefore probably the physiological relevance, of the known cleavage.
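The three-way highlighting rule for residues P4-P4' can be expressed compactly. The following Python sketch assumes the aligned residues have already been extracted for one homologue and that the set of residues known at each position (from other substrates of the selected peptidase) is available; the function names and the sample data are hypothetical.

```python
def highlight_class(homologue_res, substrate_res, known_residues):
    """Classify one aligned residue according to the rules described above."""
    if homologue_res == substrate_res:
        return "pink"    # identical to the known substrate
    if homologue_res in known_residues:
        return "orange"  # replacement seen at this position in other substrates
    return "black"       # replacement unknown at this position


def highlight_site(homologue_site, substrate_site, known_by_position):
    """Classify every residue of a P4-P4' window in a homologue."""
    return [
        highlight_class(h, s, known)
        for h, s, known in zip(homologue_site, substrate_site, known_by_position)
    ]


if __name__ == "__main__":
    # Made-up data for illustration only.
    substrate = "AAGRSLVK"
    homologue = "AVGRTLVQ"
    known = [{"A"}, {"A", "V"}, {"G"}, {"R"}, {"S"}, {"L"}, {"V", "I"}, {"K", "E"}]
    print(highlight_site(homologue, substrate, known))
```

A window in which most positions come out pink or orange suggests that the cleavage site is conserved in that homologue.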
Label/key files for sequence alignments and trees
Label/key files for sequence alignments and trees have been enriched by:
a) Inclusion of protein architecture subheadings that make use of the architecture "strings" taken from the Pfam database
b) Inclusion of a link from each organism name to the organism card. This allows the user to find out what sort of organism it is, and what other peptidases or inhibitors it is known to express.
Release 7.80 23-Apr-2007
Release 7.70 22-Jan-2007
The summary page for any peptidase with well-characterised specificity now contains a 'logo' that is a diagrammatic representation of the specificity preference in each of the subsites P4 - P4'. To generate the logos, sequences around cleavage sites (ten or more) are aligned, and a hidden Markov model is generated. This is converted to a logo by use of the WebLogo package (Crooks et al., 2004). This feature owes much to the skills of our two rotation students, Jun Kong (firstname.lastname@example.org) and Matias Piipari (email@example.com).
Comparative genomics of prokaryote strains
For many species of bacteria and archaea, genome sequences are available for multiple strains. It can be of great interest to know how the peptidases and inhibitors in the strains compare, for example when some strains are pathogenic and others not. The Comparative Genomics section at the foot of the Searches page now contains the option 'What are the common peptidases in different strains of bacteria or archaea?'. This leads to a page on which a species of bacterium or archaean can be selected to show a side-by-side comparison of the peptidases in the various strains. For each MEROPS identifier, an alignment of the sequences from the different strains is available. An example can be seen here.
Alignments show holotype sequences
Now that numbers of known sequences are becoming so large, full alignments of the homologous sequences in a family, which may amount to hundreds, can become confusing. As a response to this, MEROPS now provides an alignment of just the holotypes in each family and subfamily. It may be remembered that a holotype is the single 'type' representative of its MEROPS identifier.
The new Inhibitors button above some peptidase summary pages gives access to data on peptidase/inhibitor interactions. For a peptidase, the user is presented with a list of at least some of the known protein inhibitors, with Ki, conditions and a reference where known. The table can be sorted according to any of the column headings. For an inhibitor, the user is presented with an alphabetical list of peptidases inhibited, again with supplementary data.
Better keys to sequences
The 'Key to sequences' file associated with each alignment and tree now has a format in which each line contains a link to the sequence in MEROPS where there might previously have been a UniProt accession number.
Release 7.60 23-Oct-2006
Changes to clan CA, and a new clan CN
Prior to Release 7.6, MEROPS included many families of viral cysteine peptidases in clan CA with little supporting evidence. These were families for which crystal structures were not available, but catalytic residues were known to be cysteine and histidine, occurring in this order in the sequence as they do in papain. The correctness of this policy has now been brought into question by the description of the structure of the nsP2 peptidase of Venezuelan equine encephalitis alphavirus (peptidase C09.002). The nsP2 peptidase has a structure significantly different from those of other known cysteine peptidases, and has caused MEROPS to remove family C9 from clan CA and place it in a new clan, CN. At the same time, several other families of viral cysteine peptidases were removed from clan CA pending further evidence.
Genome statistics show strains
The pages "Peptidases in Whole Genome Sequences" and "Inhibitors in Whole Genome Sequences", reached from the Genomes entry on the menu bar, now show data by strain as well as species.
Display of all known cleavage sites in a given protein
The Substrates page that can be reached from each peptidase summary has now been enhanced. Clicking on the UniProt accession of a protein substrate opens a window in which the complete sequence of the protein is shown together with all the cleavage sites known to MEROPS. Mouse-over text for each cleavage site gives further information about it.
Work continues on small-molecule inhibitors
Summary pages for important small-molecule inhibitors (SMI) of peptidases were introduced into MEROPS in Release 7.4, and with the current release the number of SMI included is increased to 158.
Comparison of genomes
We at MEROPS find that we often want to ask a question like "What peptidases are in the mouse genome but not in the human?". When the sequencing and analysis of the two genomes have been completed, it should be a simple matter to make this sort of comparison, and we now include a new "Searches" page to allow the user to do this. Please open the Searches menu from the left-hand green menu and click the option "Comparative Genomics" at the bottom. Select two species from the drop-down menus at the top, and choose to compare them at the level of Family or MEROPS Identifier. Full details of exactly how the comparisons are made are to be found in the help text.
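At its simplest, a comparison of this kind is a set operation. The Python sketch below is an illustration only, not the procedure documented in the help text: the function name is hypothetical and the identifier lists are invented and do not reflect the real content of any genome.

```python
def compare_genomes(first, second):
    """Compare two collections of family names or MEROPS identifiers."""
    a, b = set(first), set(second)
    return {
        "only_first": sorted(a - b),   # present in the first species only
        "only_second": sorted(b - a),  # present in the second species only
        "shared": sorted(a & b),       # present in both
    }


if __name__ == "__main__":
    # Invented identifier lists for illustration only.
    species_one = ["A01.007", "S09.018", "S09.973"]
    species_two = ["A01.007", "S09.018", "C09.002"]
    for key, ids in compare_genomes(species_one, species_two).items():
        print(key, ids)
```

The same function works at either level of comparison: pass family names (for example "S1", "C1") instead of MEROPS identifiers.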
Merging of subfamilies in family S1
We have re-organized the subfamilies within family S1. As more data became available, and alignment methods improved, intermediate sequences were found, and it became clear that several of the former subfamilies should merge. Family S1 is now divided into just two subfamilies, S1A and S1B. The former subfamilies S1D (which included lysyl endopeptidase from Achromobacter lyticus and arginyl endopeptidase from Lysobacter enzymogenes) and S1E (which included streptogrisins) are now included in subfamily S1A (the trypsin subfamily). The former subfamilies S1C (protease Do) and S1F (astrovirus serine peptidase) are now included in subfamily S1B (the glutamyl endopeptidase I subfamily).
Better BLAST searches
The sequence library that is used for the BLAST searches in MEROPS now contains more sequences, i.e. all the holotype sequences and also the "linker" sequences that are required to make the transitive links within families. This makes the e-values that are returned more meaningful, and decreases the number of false-positive hits.
We are progressively enhancing the usability of the tables in MEROPS by making the columns sortable. An example is the table of Images that can be reached from the green side-menu.
Release 7.40 15-Mar-2006
Appearance of small-molecule inhibitors (SMI) in MEROPS
Perhaps the major single reason for the intense research activity on peptidases is their involvement in many disease processes. This makes many of them drug targets, and scientists in hundreds of companies and academic laboratories are working to develop new, small-molecule inhibitors of peptidases for possible use as drugs. MEROPS already contains thousands of literature references detailing the results of the work on SMI (flagged I in the reference lists), but in the present release we add a whole new section to MEROPS that deals with the small-molecule inhibitors in greater depth. The size of the field is vast, and it will be some time before we can do it full justice, but we already include many of the 'big name' inhibitors that are either useful as reagents in the laboratory or are significant drugs.
The sidebar menu on the Inhibitors side of MEROPS contains a link to the index of SMI by name, and thence to summary pages about the individual inhibitors. The summary pages for individual peptidases may also show a new section "Relevant inhibitors" that contains links to some of the SMI that have been described for this particular enzyme.
Completion of clan summaries
In 1993, we introduced the term 'clan' to refer to an evolutionarily-related set of families of peptidases, and this level of classification has been important in MEROPS since the beginning. The detection of distant relationships between proteins is often difficult, and the results can be controversial. We therefore need to be clear about the criteria we have used in assembling each clan so that others can assess the validity of the grouping. With this in mind we have expanded all of the text summaries that describe the clans.
More consistent numbering of residues in Structure images
The numbers of amino acid residues (typically active site and metal-ligand residues) cited in the legends for the Richardson-style molecular images have now been made consistent with the numbering schemes used elsewhere in MEROPS for the given peptidases, and may differ from the numbering that was used in the original Protein Data Bank record.
Display of known cleavage sites in proteins
There is an additional search option from the Searches button on the green menu bar. This allows a user to enter the UniProt accession of any protein to see the sequence displayed with any known cleavage sites. When the peptidase responsible is known there is a link to the peptidase summary page in MEROPS. A mouse-over displays the name and MEROPS identifier of the peptidase, or, if more than one peptidase cleaves at the same site, shows a pull-down menu of the peptidases.
More flexible display of EST data
Greater use of CGI techniques in MEROPS has made it possible for the human and mouse EST data table for each peptidase to be sorted by the table headings according to the user's preference, for example by disease state.
Richer markup of BLAST results
The markup of the results of BLAST searches against the MEROPS data has now been further enhanced to include the potential and known glycosylation sites and disulfide bonds that are shown in UniProt.
Release 7.30 22-Dec-2005
Batchwise scanning of MEROPS data
We now provide a facility to BLAST a set of up to 5000 amino acid sequences against the MEROPS sequence collection, to obtain a report that shows the peptidase family and conservation of active site residues for each of the hits. (Find full details under "Batch BLAST" in the About pages.)
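Users with more than 5000 sequences will need to split their input before submission. A minimal Python sketch of that preparation step follows; it assumes the input is in FASTA format, and the function names are hypothetical rather than part of the MEROPS facility.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, size=5000):
    """Group records into lists of at most `size` (5000 is the stated limit)."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    fasta = ">a\nMK\nLL\n>b\nGG\n>c\nAA\n".splitlines()
    for i, batch in enumerate(batches(read_fasta(fasta), size=2), start=1):
        print(i, [header for header, _ in batch])
```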
Protein domain images
The summary page for each peptidase and inhibitor now includes a linear diagram of the whole protein in which the locations of the peptidase unit and other recognised domains are shown. The diagrams include markers for the active site residues, disulfide bonds and other features. These diagrams have been made possible by the availability of data from Pfam, our sister database at the Wellcome Trust Sanger Institute. (Find full details under "Domain images" in the About pages.)
Clan summaries begin to appear
Clans form the top level of the hierarchical classification of peptidases and their inhibitors that MEROPS has pioneered. Having completed the summary pages for the families, we have now started work on those for the clans, and the first of these are included in the present release.
Literature pages: new format, new data
We have changed the format of the references to a three-line one that we believe is clearer. We have also added new links that use the Digital Object Identifier (DOI) system. The DOI system is designed to provide stable identifiers for journal articles and much else besides. By use of DOIs we make links to the resources on the respective publishers' web sites. For journals that provide free content, or to which the user of MEROPS has a subscription, this link is likely to lead directly to the full text of the article.
New format for 'About' pages
The 'About' pages that try to answer a variety of questions about MEROPS, and are our equivalent of FAQ, have been re-organised with their own menu. We trust that users will find the information more accessible.
Release 7.20 14-Oct-2005
Completion of summaries for inhibitor families
Each of the 56 families of proteins that inhibit peptidases now has a text summary in a format similar to that we used for the peptidase families.
Links from BLAST results
The markup of results on the BLAST page now includes a direct link to each matching peptidase or inhibitor and its family.
Second clan of N-terminal nucleophile hydrolases recognised
DmpA peptidase (S58.001) is a self-processing, serine-type, N-terminal nucleophile (Ntn) hydrolase, but its structure, in the DOM-fold, shows that it is not homologous to the other Ntn hydrolases, which are in clan PB of MEROPS. Accordingly, family S58 has been placed in a new clan, SQ.
Family U61 moves to S66, in clan SS
It has been shown that LD-carboxypeptidase, formerly of unknown catalytic type in family U61, is in fact a serine peptidase with a distinctive protein fold. Full details can be found under family S66.
Links to the HUGO Gene Nomenclature Committee database
On the peptidase and inhibitor summary pages, each human gene symbol is now linked to the HUGO Gene Nomenclature Committee database, which will provide additional information about it.
Links to IPfam for peptidase-inhibitor interactions
Links to the IPfam protein interactions database are made when a structure is available for a peptidase-inhibitor complex.
Release 7.10 22-Jul-2005
New peptidases and inhibitors
Additional MEROPS identifiers have been assigned for all of the known human and mouse peptidase inhibitors. The list of identifiers that appear for the first time in the present release of MEROPS can be found here.
Release 7.00 4-Apr-2005
Markup of BLAST results
The results of BLAST searches are now highlighted to show the presence or absence of catalytic residues. This gives an indication of whether a novel sequence is that of an active peptidase or a non-peptidase homologue.
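The underlying check can be sketched as follows in Python. The sketch assumes the hit has already been aligned to the type peptidase, so that index i-1 of the string holds the hit's residue at reference position i (with '-' for a gap); the function names and the toy sequences are hypothetical and are not the MEROPS implementation.

```python
def check_catalytic_residues(aligned_hit, catalytic):
    """Report whether each expected catalytic residue is present in a hit.

    aligned_hit -- hit residues in reference numbering, gaps written as '-'
    catalytic   -- {reference position (1-based): expected residue}
    Returns {position: (expected, found, matches)}.
    """
    report = {}
    for pos, expected in sorted(catalytic.items()):
        found = aligned_hit[pos - 1] if 0 < pos <= len(aligned_hit) else "-"
        report[pos] = (expected, found, found == expected)
    return report


def looks_active(report):
    """True only if every expected catalytic residue is conserved."""
    return all(matches for _, _, matches in report.values())


if __name__ == "__main__":
    # Toy example: a cysteine/histidine pair at reference positions 3 and 6.
    catalytic = {3: "C", 6: "H"}
    for hit in ("MKCAAHGD", "MKSAAHGD"):
        report = check_catalytic_residues(hit, catalytic)
        print(hit, report, looks_active(report))
```

A hit in which any expected residue is replaced or missing would be flagged as a possible non-peptidase homologue rather than an active peptidase.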
Lists of peptidases and inhibitors at higher taxonomic levels
Having consulted an Organism page to see the list of peptidases known from (for example) Plasmodium falciparum, a user may be interested to see the set for all Plasmodium species, or even all Protozoa. That is now possible: just click the higher level at the top of the Organism page. Please be patient if the pages with these larger sets of peptidases take a little time to appear.
Yet more family summaries
The series of expanded Summary pages for peptidase families is complete in Release 7.00. Work on fuller summaries for the inhibitor families is now in progress, and already new data are visible for many of them.
Alignments and trees for individual peptidases and inhibitors
With the new Alignment button at the top of each peptidase page MEROPS now provides an alignment of sequences (either full-length or peptidase units only) for each individual peptidase and inhibitor. The Tree button leads to a Neighbor-Joining tree derived from the alignment of peptidase units.
More informative sequence displays
We have added more markup to the displays of individual "MER" sequences that are reached from the sequence pages. When a peptidase unit is interrupted by unrelated sequence, that is shown. An example is a plant peptidase in family A1 that contains an inserted saposin-like sequence. Non-catalytic residues that replace catalytic ones are now marked in black (view).
FAQ become About pages
In the process of revising the FAQ pages we have re-arranged much of the information they contained, and also have changed their name to "About".
Release 6.90 16-Dec-2004
Display of active site residues in family summaries
Our work on the expansion of the data on the peptidase family summary pages continues, and about three-quarters of them have been re-written during 2004. As an adjunct to this, we have added a display of the active site residues in the family (numbered as in the type peptidase). Following our usual convention, the catalytic residues are shown on a red background, and the metal ligands on blue.
Alignments of sequences for individual peptidases and inhibitors
We now include alignments of the full-length amino acid sequences of individual peptidases and inhibitors, reached by the pale blue button. Being full-length alignments, these show the peptidase or inhibitor units in context. Colour-coding marks the extent of the peptidase and inhibitor units: in the sequence of the holotype, the unit is coloured green, and the remainder of the sequence brown. There is highlighting of the catalytic residues (red background) and the metal ligands (blue background). The link to the MEROPS reference sequence shows the species of origin when the cursor is held over it. The alignments are generated dynamically on demand, so please allow a second or two for them to appear.
Activity status of human and mouse peptidases
MEROPS Release 6.9 lists just under 700 peptidases and homologues encoded in the human genome, and slightly more from the mouse genome. But in each species only a minority of these potential peptidases are yet known to be catalytically active peptidases on the basis of direct experimental evidence. Others are presumed to be active because of their structural similarity to active peptidases from other species such as rat. Or they can be described as 'putative' because, although there is no closely relevant experimental evidence, they do contain all the residues that are known to be of functional importance in the family. Other homologues are probably not active peptidases, either because the genes are pseudogenes or the expressed proteins lack residues that are believed to be essential to peptidase activity in the family. A new feature of MEROPS introduced in Release 6.9 is the field 'Activity Status' on the PepCard for each peptidase homologue from the human or mouse genomes. This shows whether we currently regard this as an 'active', 'putative' or 'inactive' peptidase homologue. If we are aware that a peptidase has been shown experimentally to be active, we try to give a reference, and if on the other hand we believe it to be inactive because one or more expected active site residues are replaced, we show that too. A few examples:
|A01.007|Renin|Human: active (Suzuki et al., 2004); Mouse: active (Hansen et al., 2004)|
|S09.018|Dipeptidyl-peptidase 8|Human: active (Chen et al., 2004); Mouse: active (by similarity to human)|
|S09.973|Dipeptidylpeptidase homologue DPP6|Human: inactive; S D H; Mouse: inactive; S D H|
Flagging of topics in Literature pages
The literature on peptidases is large, and the Literature pages in MEROPS contain well over 20,000 references. So that it may be easier to spot a paper on a particular topic in a Literature page, we have added "flags" for seven important topics. Thus E indicates that the paper contains information on the recombinant Expression of a peptidase, I shows that we found the article to be relevant to the design of Inhibitors for the enzyme, K means that the paper deals with a gene Knockout or other artificial genetic manipulation, M shows that the paper deals with a natural Mutation, allelic variant or polymorphism, R indicates that the article includes information about an RNA splicing variant, S means that the article deals with three-dimensional Structure, and V shows that the article is a Review.
No doubt some articles deserving of flags do not yet have them, and we are working to make the assignments more complete.
Release 6.80 27-Aug-2004
Progress with expanded family summaries
We have been busy producing more of the expanded summaries of peptidase families that were introduced in Release 6.7, and 90 families (just over half of the total peptidase families) are now included.
Clan cards for all clans
In the past, MEROPS has not provided summary cards for clans that are divided into subclans; instead there was a separate summary for each subclan. This has now been changed so that we treat clans and subclans very much as we have done families and subfamilies for some time. That is to say, there is a summary page for every clan, and the cards for the clans that have subclans contain the data for each subclan in a subsection. We feel that this is logical and hope that it will make the clan-level data more accessible.
Release 6.70 30-Jun-2004
A start on expanded family summaries
A new format has been adopted for the family summaries that allows the inclusion of much new information. The additional information has been added to about one quarter of the summaries in the present release, and we plan to complete the work during the coming year.
Better access to information on peptidases and inhibitors by Organism
Three improvements have been made.
- The indexes of Organisms now include English common names as well as the scientific binomial names of the organisms.
- The individual Organism pages that list all peptidases or inhibitors known from a given species have been made interactive in regard to sort order. The default order in the lists is by family, but re-sorting by clan, peptidase or inhibitor, or gene name can be achieved simply by clicking the top of the appropriate column.
- A link at the top of each Organism card gives access to a full list of the sequences of peptidase or inhibitor units known from the species.
Marking of "holotypes"
Many users of MEROPS will be aware that each family or subfamily is built around a "type peptidase" to which all other members of the family must be shown to be related. Similarly, we nominate a type form for each peptidase and inhibitor, and by analogy with the taxonomy of organisms, this is called the "holotype" (formerly "type example"). The identity of the holotype for each peptidase and inhibitor is shown on the relevant Summary page, and as of Release 6.7, the names of the holotypes are also highlighted in the label file below each multiple sequence alignment and tree.
New book relevant to MEROPS
The long-awaited new edition of the Handbook of Proteolytic Enzymes edited by Alan J. Barrett, Neil D. Rawlings and J. Fred Woessner is now available. Please see http://books.elsevier.com/proteo. The CD-ROM that accompanies the two-volume book is closely linked to MEROPS.
Release 6.60 29-Mar-2004
A new catalytic type of peptidases
As a result of the exciting paper of Fujinaga, Cherney, Oyama, Oda & James (2004) The molecular structure and catalytic mechanism of a novel carboxyl peptidase from Scytalidium lignicolum. PubMed, we now recognise a sixth catalytic type of peptidases: the glutamic peptidases. The known glutamic peptidases are all contained in the family that was formerly A4, and now becomes G1.
In the light of new crystal structures, three new clans have been established: clan MO (containing family M23), clan MP (containing family M67) and clan SJ (containing families S16 and S50).
New peptidases, inhibitors and families
New families in this release are G1 (renamed from A4, see above) and M73 (containing only camelysin from Bacillus species). There is a full list of new identifiers here.
Enhanced Download page
If you are interested in working with MEROPS data on your own local system, please click Downloads on the menu at left, and see what we are now offering.
New papers about MEROPS
- Rawlings,N.D., Tolle,D.P. & Barrett,A.J. (2004) Evolutionary families of peptidase inhibitors. Biochem J 378, 705-716. PubMed
- Rawlings,N.D., Tolle,D.P. & Barrett,A.J. (2004) MEROPS: the peptidase database. Nucleic Acids Res 32 Database issue, D160-D164. PubMed
And much else besides...
As usual, MEROPS has undergone a comprehensive update for the new release. The total of sequences listed is now 19,689 (up by 820) for 2,243 peptidases (up by 63) from 2,056 species of organism (up by 49). The alignments of human and mouse ESTs have been completely regenerated, and now contain over 166,000 individual ESTs. There are 2,267 literature pages, containing 20,174 references.
The amino acid sequences from our own collection (the "MER-sequences") are now returned by use of a CGI script with enhancements such as residue numbering, range of peptidase unit and numbers of catalytic residues. We hope you will find all this useful.
Release 6.50 22-Dec-2003
An additional series of identifiers for clans of inhibitors
We have named the clans of peptidase inhibitors with identifiers from the series IA to IZ, but this has not proved sufficient for the very numerous clans of inhibitors. We have therefore moved on to using the additional series JA - JZ, of which only JA has so far been assigned (for the family of the thrombin inhibitor, triabin).
Withdrawal of a few families of inhibitors
The status of all the families of peptidases and inhibitors is kept constantly under review. We have recently taken the view that the inhibitors in three of the families that we have previously recognised do not meet our criteria for retention in MEROPS. Two of the families affected are I22 (BsuPI protease inhibitor) and I23 (BbrPI protease inhibitor), about which we feel too little is yet known, although we shall be watching for new developments that will justify their re-instatement. Family I30 containing the cathepsin B propeptide and its homologues has also been withdrawn, on the grounds that we do not know of any member of the family that is expressed other than as part of a cysteine peptidase. Again, the situation will remain under review.
New peptidases, inhibitors and families
The process of collecting data for MEROPS continues and a number of peptidases, inhibitors and families are appearing for the first time in this Release. Amongst the new families are C69 (Lactobacillus dipeptidase A), S62 (PA endopeptidase of influenza A virus), I57 (staphostatin B), I58 (staphostatin A) and I59 (triabin).
Release 6.40 16-Sep-2003
Still more peptidases, inhibitors and families
The process of collecting data for MEROPS continues and a number of peptidases, inhibitors and families are appearing for the first time in this Release.
Release 6.30 16-Jun-2003
A facility for better links to MEROPS
The counts of known peptidases and inhibitors that are shown on each Organism card are now broken down to show active or putative peptidases separately from their catalytically inactive homologues, and active inhibitors separately from homologues without known inhibitory activity. For example, for Homo sapiens we show "Count of known and putative peptidases: 464, inactive homologues: 88" and "Count of inhibitors: 98, homologues: 156".
Links to Ensembl
The Summaries for human and mouse peptidases and inhibitors now contain links to gene reports in the Ensembl database. Ensembl is a joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on human, mouse and other eukaryotic genomes.
Still more peptidases, inhibitors and families
The process of collecting data for MEROPS continues and a number of peptidases, inhibitors and families are appearing for the first time in this Release.
Release 6.20 24-Mar-2003
BLAST server for MEROPS
A user of MEROPS may have a protein or nucleic acid sequence that is possibly that of a peptidase. It is useful to be able to search the MEROPS data with such a sequence to find homologues and see how they are classified in MEROPS. With the help of the Institute's Web team we have now implemented a server to BLAST the MEROPS sequence data in this way. A library has been compiled containing amino acid sequences of peptidase units from our entire collection for peptidases and peptidase homologues. The library also contains inhibitor units from our collection of protein inhibitor sequences. The searches available are BLASTP (protein sequence query against a protein sequence database) and TBLASTX (nucleic acid sequence query against a protein sequence database). Please look for the "BLAST MEROPS" item on the sidebar menu and give it a try.
Details of distributions of families
Recently a colleague was reviewing one of the families of peptidases, and asked us for all we knew about its distribution throughout various kinds of organisms. We had a wealth of data, but found that we were not presenting it in a readily-accessible form in MEROPS. Now you will find that the Distribution table at the foot of each family summary card contains "details" links, and these will open up new windows containing the names of all the species in that kingdom of organisms from which the family has been reported.
Many of the Pepcards now show a new "Substrates" button that will reveal a page of specificity data mainly for the action of the peptidase on other proteins. Of course the three Searches that provide access to specificity data are still there too.
New and merged families
The MEROPS classification of peptidases is constantly evolving. Major changes that have occurred since the last release of MEROPS include the merging of family M46 (pappalysin) into M43 (cytophagalysin). New families recognised in the present release include C61 (small protease of Sulfolobus solfataricus), C62 (gill-associated nidovirus 3C-like proteinase) and C63 (African swine fever virus processing peptidase).
The Peptidase List
There are a number of databases that provide various kinds of information about enzymes in general. We encourage them to give their own distinctive treatments to as many peptidases as possible, and to this end we are providing a list of the well-characterised peptidases that they might wish to include. We call this the Peptidase List (or PepList for short), and it can be reached from the PepList item on the sidebar menu. If the curator of any other database would like to have the List in any other format we shall try to help.
Molecular images of inhibitors
Twenty-five molecular images of inhibitors have been added to MEROPS in the present release - please see the Images index on the Inhibitors side of MEROPS for the details.
Release 6.10 10-Jan-2003
New Location for MEROPS
The MEROPS team moved to the Wellcome Trust Sanger Institute on the Genome Campus at Hinxton near Cambridge on October 1, 2002. There we are working alongside the Pfam Protein Families database, and are in an ideal environment for database work. Access to the database is now unrestricted under the terms of a GNU library licence. We are grateful for continued financial support from the Medical Research Council and the warm welcome that we have received from Alex Bateman and his Pfam team.
Addition of inhibitors to MEROPS
The proteins that inhibit peptidases are arguably as important as the peptidases themselves to any understanding of the balances of proteolysis in any biological system. The compilers of the Database have a long-standing interest in these proteins, and with Release 6.1 have made a start on including them in the database. It is a major challenge to provide proper coverage of this new aspect, and we do not claim to have completed it in one release, but we feel that we have made a useful start. As always, we shall welcome suggestions for further improvements.
New families of peptidases appearing in this release of MEROPS are C60 (type example: sortase A) and M67 (type example: Poh1 peptidase).
Merging of families
MEROPS recognises transitive relationships when forming families. This means that when a new sequence appears that shows significant relationships to proteins in two existing families the two families are merged. Since Release 6.0 we have merged family C29 into C16, families M25 and M40 into M20, and M37 into M23. More details can be found here.
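As an illustration only, the transitive merging described above can be sketched as a union-find procedure in Python. This is not the code used to build MEROPS: the function name merge_families, the input layout and the rule that the alphabetically first family name survives are all assumptions made for the example. The linking pairs are chosen merely to reproduce the mergers listed above, and the M40-M25 link is hypothetical, included to show a merger that happens only through transitivity.

```python
def merge_families(families, links):
    """Group families that are connected by linking sequences (union-find)."""
    parent = {f: f for f in families}

    def find(f):
        # Follow parent pointers to the representative, shortening the path.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Assumption of this sketch: the alphabetically first name survives.
            keep, drop = sorted((ra, rb))
            parent[drop] = keep

    return {f: find(f) for f in families}


families = ["C16", "C29", "M20", "M25", "M40", "M23", "M37"]
# Each pair stands for a new sequence related to members of both families.
links = [("C29", "C16"), ("M25", "M20"), ("M40", "M25"), ("M37", "M23")]
merged = merge_families(families, links)
```

In this sketch M40 ends up in M20 even though its only link is to M25, which is the sense in which the relationships are treated as transitive.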
More informative 'MER' sequences
Colleagues familiar with the database will know that each Sequences card contains (in the left-hand 'MERNUM' column) links to our copies of the full-length 'MER' sequences in FASTA format. In the present release we have added further information to each header line and, for greater clarity, have highlighted in red the part of the sequence that forms the peptidase or inhibitor unit.
Release 6.0 30-Aug-2002
Clan-level sequence alignments
In the absence of three-dimensional molecular structures, the evidence that supports the assignment of a peptidase family to a clan can come from the order of catalytic residues in the polypeptide chain and the similarities in amino acid sequences around them. We now show this kind of evidence in MEROPS in the form of a limited sequence alignment for any clan that contains multiple families, e.g. for the metzincins, clan MA(M). (The numbering of residues is according to that of the type example of the clan.)
Extended family trees
In the past, MEROPS has contained alignments and trees only for the subfamilies in those families that are divided into subfamilies, not for the complete families. We feel that this has been a deficiency, because we nowhere showed the deep divergences that demand the separation of the subfamilies. With the present release we have started to put this right, by providing trees for the complete families also.
New families and subfamilies
Family A22 of presenilin has been divided into two families, following the discovery of many more putative peptidase homologues. Two new families of bacterial peptidases have been created: C58 and M64.
Type examples for peptidases
The concept of a "type example" for a taxon is well recognised - this is the individual example to which all other members of the group can be shown to be related, and the definition of type examples enhances the stability of a taxonomic system. Type examples for clans and families have been shown in MEROPS previously, but we have now added the type examples for individual peptidases as well.
Sequence libraries for human and mouse peptidase units added to the FamCards
There are now buttons "H-seq" and "M-seq" at the top of the card for each family of peptidases that is known from mammals. These link to FASTA libraries of the sequences of the peptidase units for the family, which can be copied, and are then useful for making one's own alignments and hidden Markov models. The header line for each sequence now shows the residue numbers of the range of amino acids included in the peptidase unit, even when the peptidase unit is one from which an intervening sequence has been removed.
Release 5.9 21-Jun-2002
Learning more about peptidases from the complete genomes
It is fairly obvious that once the sequencing of the genome of an organism is truly complete the data can show which families of proteins are absent from the genome as well as which are present. There are data for about 75 completed genomes in the present release of MEROPS, and we are now making fuller use of these to obtain information about the evolution of peptidase families. The set of buttons linking to information on each family now includes a "Genomes" button (except for the families that are confined to viruses). The distribution data take the form of a tree in which each twig is an organism with a sequenced genome, and the colours are blue if the family is present or black if it is not. In an example we can see that family A1 containing pepsin and its homologues is present in all the eukaryotes, but not in the archaea or bacteria. Conversely, subfamily M24A is present in all the genomes available to date. The genome tree for family M22 is the only other one that is entirely blue at this stage. Many of the other families have produced trees that are much more complex, and the interpretation of these will be food for thought for our users as well as ourselves at MEROPS.
Please note that the blue (and capitalized) organism names have links to their organism cards, so that the user can click on any one to look up exactly which members of the family occur there. We have added similar links to the Distribution trees for individual peptidases, too.
New plug-in to view molecules
Since the early days MEROPS has provided for the use of the excellent RASMOL viewer, but we now also provide links on the Structure pages that make use of CHIME. The user has first to install CHIME from here.
Release 5.8 19-Mar-2002
Removal of intervening sequences that interrupt peptidase units
Most users of MEROPS are probably familiar with the concept of a peptidase unit, and they may have noticed that in some peptidases the peptidase unit is interrupted by the insertion of an unrelated domain. A very clear example of this is gelatinase A (M10.003) in which three copies of a fibronectin-like sequence are found immediately N-terminal to the HEXXH zinc-binding consensus sequence. Most other members of the family (e.g. matrilysin, M10.008) show no such insertion. We think it best to remove these intervening domains from peptidase units for display in MEROPS, because this improves the alignment of the sequences of the peptidase units proper, and makes it easier to compare one peptidase unit with another throughout the family. Removal of such intervening sequences has been done where necessary in subfamilies M10A, M50B and S8A. The sequence segments to be removed were recognisable by the lack of any matching segment in a pairwise alignment with the type example of the family. The residue numbers of the two remaining segments forming the strict peptidase unit are indicated in the label file for the subfamily alignment and tree. In the alignment itself, the point at which the intervening sequence has been removed has been indicated by use of colouring as shown here (for human gelatinase A):
This would indicate that an intervening segment has been removed between the residues T and C. The details of this are now included in the key to the sequence alignment, e.g. "(residues: 108-214, 390-461)" for this particular sequence.
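For readers who want to reconstruct a strict peptidase unit from a full-length sequence, the following Python sketch shows one way to use a key of the kind quoted above. It is a minimal illustration, not the MEROPS procedure: the function names are hypothetical, and the residue numbers are assumed to be 1-based and inclusive.

```python
import re


def parse_ranges(key):
    """Parse a key such as '(residues: 108-214, 390-461)' into (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", key)]


def peptidase_unit(sequence, ranges):
    """Join the listed segments, dropping whatever lies between them.

    Assumes 1-based, inclusive residue numbering.
    """
    return "".join(sequence[start - 1:end] for start, end in ranges)


ranges = parse_ranges("(residues: 108-214, 390-461)")
# Number of residues kept once the intervening segment is removed.
unit_length = sum(end - start + 1 for start, end in ranges)

# Toy sequence (not a real protein) to show the slicing.
toy_unit = peptidase_unit("ABCDEFGHIJ", [(2, 4), (8, 9)])
```

With the ranges from the key above, the two retained segments contain 107 and 72 residues respectively.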
A start on larger alignments and trees
Problems of computation and display have until now prevented us from including alignments and trees containing more than 200 sequences in MEROPS, but with the help of our friends at Pfam we have now been able to handle 740 sequences in subfamily S1A. We find this enormous tree helpful in checking the coding assignments in the subfamily, and anticipate that our users will find it useful too. We plan to add more large alignments in the next release.
New genomes, and another human chromosome
MEROPS now contains analyses of the peptidase content of the complete genomes of Brucella melitensis, Nostoc sp. PCC 7120, Clostridium perfringens and Pyrobaculum aerophilum. In addition, the completed human chromosome 20 has been found to contain 9 peptidases and homologues amongst 727 total predicted protein-encoding genes. Incidentally, about 30 new images of the three-dimensional molecular structures of peptidases have been added to MEROPS since the last release.
Changing families in MEROPS
Family M51 of intramembrane metallopeptidases has been merged with family M50 following the appearance of linking sequences related to members of both families. We note that family M50 is an exceptionally interesting one, represented in almost all of the genomes so far completely sequenced. Families U61 and U62 are new to this release of MEROPS.
MEROPS names revisited
As we noted in Release 5.5, it seems obvious that any one peptidase is almost certain to occur in more than one organism. This means that any species-specific name is not going to work for long, because we shall not know what to call the same enzyme when it turns up in a different organism. Although this seems obvious, many peptidases are still named in just this way, most notably by use of gene symbols as names. A peptidase that is the product of the xyz gene in Escherichia coli might well be called "XYZ protease". But gene names are species-specific, and the gene that encodes this peptidase in another organism will almost inevitably have a different name. So "XYZ protease" will make little sense outside the original species. We know of many peptidases that have no satisfactory name for reasons such as this, and we have found it necessary to devise a system of temporary names for use until the scientific community, or perhaps the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology, produces a satisfactory one. Such a temporary name originally took the form "MEROPS-AA001 peptidase", but we have now replaced this with the form "Mername-AA001", which makes the meaning clearer. As before, we simply increment the letters and numbers to generate new names. We shall look forward to retiring the Mernames as soon as acceptable new names appear.
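The incrementing scheme can be pictured with a short Python sketch. The exact roll-over rule is not stated above, so this is an assumption made for illustration: the number is taken to run from 001 to 999 before the letter pair advances (AA, AB, ... AZ, BA, ...). The function name mername is hypothetical.

```python
import string


def mername(index):
    """Return a temporary name for a 0-based index.

    Assumption: numbers run 001-999, then the two-letter prefix advances.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    letters_index, number = divmod(index, 999)
    first, second = divmod(letters_index, 26)
    if first >= 26:
        raise ValueError("index exceeds the range of two-letter prefixes")
    alphabet = string.ascii_uppercase
    return f"Mername-{alphabet[first]}{alphabet[second]}{number + 1:03d}"


first_name = mername(0)
last_in_block = mername(998)
next_block = mername(999)
```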
Data for EST libraries
The alignments in MEROPS now contain 82,000 ESTs for peptidases and their homologues (for the three species, human, mouse and rat), and we believe that useful information can be obtained by considering the EST libraries in which they were found. So there is a new item in the sidebar menu "EST cell lines". This provides access to a table for each of the 2557 EST libraries (identified by EMBL Library number) that shows what peptidases were detected in the library. Four separate indexes to the libraries in each species are sorted alphabetically by (1) Library number, (2) Tissue, (3) Developmental stage and (4) Disease. So it is easy, for example, to identify the three libraries from mouse adipose tissue, and to see the lists of peptidase ESTs that they contained by clicking the links to the library numbers on the left. This can be seen here.
New MEROPS publications
In recent months, two new publications have described aspects of MEROPS. A free download of the PDF file for the Nucleic Acids Research paper is available at the PubMed link.
Barrett, A.J., Rawlings, N.D. & O'Brien, E.A. (2001) The MEROPS database as a protease information system. J. Structural Biol. 134, 95-102. PubMed
Rawlings, N.D., O'Brien, E.A. & Barrett, A.J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343-346. PubMed
Release 5.7 17-Dec-2001
New families in MEROPS
The families M60 of enhancin, S46 of dipeptidyl-peptidase 7 from Porphyromonas gingivalis and S54 for the Rhomboid protein have been added to the database. S18 (for omptin) has been moved to A26 (see below).
First structures for three families
It is always a landmark when the first three-dimensional structure is published from a family of peptidases. One reason is that it commonly allows the assignment of the family to a clan, or places the existing assignment on a firmer footing. In this release we are happy to be able to show the first structures of peptidases from three families: omptin (A26, clan AF), mitochondrial processing peptidase (M16, clan ME) and anthrax lethal factor (M34, clan MA(E)). The structure of omptin was particularly influential, since it indicated that this is a family of aspartic peptidases, not serine peptidases as had previously been thought, and founded a new clan.
A "Community" page
MEROPS is accessed from nearly 10,000 computers every month, so it has the potential to act as a medium for the exchange of the kinds of information that can help to bring the community of scientists who are interested in proteolytic enzymes closer together. We are happy for it to do this, and the new Community page lists some of the societies and conferences that we think our users may like to be aware of.
Release 5.61 29-Oct-2001
Statistics of peptidases in completed genomes
The Statistics page now starts with a table that indicates roughly how much of their coding potential different organisms use to encode proteolytic enzymes. What is shown is the total number of members of peptidase families that we have found in each of the complete genomes we have analysed, as a percentage of the reported total number of genes. The numbers range from 0.68% to 3.57%, but many are close to 1.8%. We see no striking correlation with type of organism, e.g. archaeon, bacterium or eukaryote.
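The calculation behind each entry in the table is simple, and can be sketched in Python as follows. The function name is hypothetical and the counts in the example are invented purely to show the arithmetic; they do not describe any real genome.

```python
def peptidase_percentage(family_members, total_genes):
    """Members of peptidase families as a percentage of all reported genes."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * family_members / total_genes


# Invented counts: 45 family members in a genome reported to have 2,500 genes.
example = round(peptidase_percentage(45, 2500), 2)
```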
We have added to MEROPS a feature previously seen only in MEROPS-PRO: a "Distribution" diagram that shows the organisms from which each peptidase is known in comparison to the distribution of the whole family. The diagram is reached by use of a new button on the PepCard.
More and better EST analyses
We have added results from analysis of the rat EST collection alongside those for human and mouse ESTs in PepCards. The format is the same: alignments and data tables. The "EST" columns in the Peptidase Identifier index now show "0" when a search was made and found no hits. Also, we have added Comments on many of the EST alignments; these can be found at the top of EST data cards.
Peptidase gene knockout data
Because of the great importance of gene knockouts in the understanding of biological functions of peptidases, an additional "Knockout" section has been added under "ACTIVITY" in the PepCards where we have such information. Incidentally, the links for the references cited are to the appropriate year in the Literature file.
Explanations of assignment of families to clans
The relationships between the families that are grouped in a single clan are in the "twilight zone" of sequence similarity, and the evidence of their relationship depends upon a variety of less rigorous criteria. We have therefore added a line to the FamCard to explain the reasons for the assignment of the family to the clan.
Trimming of trees for large subfamilies
Subfamilies C1A and S1A contain more peptidases than we can reasonably display in the sequence alignments and trees. The method of selection of peptidases for the S1A tree is again as was described for the last Release, but as an experiment the C1A alignment and tree contain one sequence from each code, human or mammalian if possible, plus all of the peptidases in the subfamily that are not yet assigned to codes.
Families coming and going
A new family M61 has been established; the type example is the glycyl aminopeptidase of Sphingomonas capsulata.
Previous releases of MEROPS have contained a family A13 in which we placed only the retrotransposon peptidase of Drosophila buzzatii. Homologues of the retrotransposon have now appeared that provide statistically significant links to members of family A2, and family A13 has therefore been closed. The divergent member that was A13.001 is now A02.054.
Release 5.5 15-Jun-2001
Introduction of "MEROPS-" names
It seems obvious that any one peptidase is almost certain to occur in more than one organism. This means that a species-specific name like "zebra fish protease A" is not going to work for long, because we shall not know what to call the same enzyme when it turns up in a different fish, or wherever. Although this seems obvious, many peptidases are still named in just this way. As a result, we often find ourselves in the ludicrous position of listing things like "zebra fish protease A from tuna" in MEROPS. At last we decided that this could not go on, so we have started a system of temporary names to use until the scientific community, or perhaps the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology, produces more satisfactory names. Such a temporary name takes the form "MEROPS-AA001 peptidase" in which we simply increment the letters and numbers. We shall look forward to retiring each MEROPS name as soon as an acceptable new name appears (and then we shall not re-use it for anything else). You may notice such names in MEROPS.
Introduction of subclans
All the peptidases that are grouped together in a clan are believed to be derived from a single evolutionary ancestor. Nevertheless, some clans contain distinct groups that are so divergent that there is a clear need to recognize them, and we have therefore introduced the concept of "subclan". The identifier for a subclan is formed by adding a letter in parentheses to the clan identifier. One clan split in this way is MA, in which the gluzincin and metzincin groups of HEXXH-containing metallopeptidases are MA(E) and MA(M), respectively. A second example is clan PA, in which the families of serine peptidases are placed in PA(S) and the cysteine peptidases in PA(C).
Selection of sequences for alignments and trees
A few families and even subfamilies are now so large that it is not practicable to include all the sequences we have in the alignment and tree. In the present release we have had to cut down the number of sequences in subfamily S1A, and what we have done is to use only the human sequences. This will probably be attractive to those who care only about the human peptidases anyway, but will be frustrating to anyone wanting to identify a protein from another species as an orthologue of a human peptidase. Perhaps we will use another set next time.
A new search of the human and mouse ESTs
The quantity of expressed sequence tags in the databases is increasing at an impressive rate, and we have made another set of searches of the human and mouse ESTs. We searched a total of about five million ESTs to retrieve about 60,000 that match peptidases, and these are presented in our new set of alignments. In a change from Release 5.4, the existence of EST alignments is now indicated by the number of ESTs found in a column in the "Peptidase by Identifier" index table.
More informative EST data cards
If you find the EST alignments useful, don't miss the Data cards (linked from the top of each alignment). Amongst the new things in this release are the live links to Unigene clusters. You will find that we do not always agree with the Unigene assignments, but the links will make it easy to compare.
Clearer presentation of "unassigned" peptidases and homologues
We know many peptidases only by their deduced amino acid sequences. The sequence allows us to assign a putative peptidase to a family and often a subfamily, but unless it is closely similar to that of a peptidase that has been characterised biochemically, we commonly cannot assign the peptidase to a specific MEROPS identifier. This means that we are left with numbers of unassigned peptidases, as well as unassigned non-peptidase homologues, in many families. We have now introduced a new style of data card better suited to presenting the information about unassigned peptidases. Suggestions for further improvements in this or any other aspect of MEROPS will always be most welcome.
Inclusion of LocusLinks
We have started to include links to the valuable LocusLink resource at NCBI. These appear in the Human Genetics section of the peptidase Summary cards, and in the Species card for Homo sapiens.
Better Downloads table
The Downloads table that you can use to fetch peptidase sequences for your own work now contains comments (in GCG format) giving the family and organism (NCBI Taxonomy code) for each accession number. This makes it easy to filter the lines (perhaps using a few lines of Perl) so that you could import only the sequences for family S1, say, or only human sequences. The Taxonomy identifier can be found at the top of the card for each organism, e.g. "9606" for Homo sapiens.
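As a rough illustration of such filtering, here is a short sketch in Python rather than Perl. The line layout is an assumption: we do not reproduce the real Downloads format here, so the sketch supposes that each line holds an accession number followed by a "!" comment containing hypothetical family= and taxid= fields. The accession numbers are placeholders, and the function name filter_downloads is invented for the example.

```python
def filter_downloads(lines, family=None, taxid=None):
    """Return accession numbers whose comment matches the family and/or taxid.

    Assumed (hypothetical) layout: 'ACCESSION ! family=S1 taxid=9606'.
    """
    selected = []
    for line in lines:
        accession, _, comment = line.partition("!")
        # Turn 'family=S1 taxid=9606' into {'family': 'S1', 'taxid': '9606'}.
        fields = dict(
            item.split("=", 1) for item in comment.split() if "=" in item
        )
        if family is not None and fields.get("family") != family:
            continue
        if taxid is not None and fields.get("taxid") != str(taxid):
            continue
        accession = accession.strip()
        if accession:
            selected.append(accession)
    return selected


sample = [
    "ACC00001 ! family=S1 taxid=9606",
    "ACC00002 ! family=S1 taxid=10090",
    "ACC00003 ! family=M10 taxid=9606",
]
human_s1 = filter_downloads(sample, family="S1", taxid=9606)
all_s1 = filter_downloads(sample, family="S1")
```

Passing only family or only taxid selects on that one criterion, so the same function covers both of the cases mentioned above.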
Release 5.4 23-Mar-2001
New format for Sequence pages
We have reorganized the sequence pages to make them smaller, faster to load and easier to use. The links are no longer divided between protein sequence and nucleic acid sequence tables, but combined in one table. This has allowed us to associate each TrEMBL database entry with its corresponding nucleic acid sequence database entry. The re-organisation has yet to be implemented with the PIR database entries, so the PIR database links are temporarily suspended from this release of MEROPS, but will return soon.
Inclusion of mouse EST alignments
In Release 5.3 we introduced alignments of human ESTs for peptidases, and we now have pleasure in adding alignments of mouse ESTs in the same format. We anticipate that users will find these helpful in the identification of novel homologues of known peptidases, and in recognising polymorphisms and splice variants. We are working hard to keep the alignments up to date as new ESTs are added to the databases at a huge rate, so please take a look at the alignments for your favourite peptidases in case important new information has appeared since the last Release.
Again, many new peptidases are recognised, and some additional families and clans
New families: C55 (clan CE), type example YopJ protease (Yersinia pseudotuberculosis); C56 (clan CJ), moved from U46, type example PfpI endopeptidase (Pyrococcus furiosus); C57 (clan CE), type example I7 processing peptidase (Vaccinia virus); and M55, type example D-aminopeptidase DppA (Bacillus subtilis).
New clans: CJ, containing family C56, type example PfpI endopeptidase (Pyrococcus furiosus); CK, containing family C26, type example gamma-glutamyl hydrolase (Rattus norvegicus); and SN, containing family S51, type example dipeptidase E (Escherichia coli).
New format for Literature files
We hope that you will find the Literature files easier to scan, with the inclusion of Year headings and titles in blue.
Addition of two-dimensional structures
We feel that the essence of a molecular structure can sometimes be captured most easily in a two-dimensional representation. In particular, two-dimensional representations reflect in a very simple way the similarities in protein fold that are so valuable in detecting the distant relationships at the clan level. The order of catalytic residues can also be shown well in this format. So we now provide a depiction of the two-dimensional structure with each three-dimensional structure image, and have assembled new pages of two-dimensional structures at the Family and Clan levels.
Release 5.3 4-Dec-2000
Inclusion of a "specificity" search
We have collected a good deal of data for the specificity of peptidases, and made it available to several kinds of search functions. The searches are now on the "Searches" menu. Of course, one could never have enough specificity data, and if anyone would like to contribute some of their own published data to be included, we should be happy to hear from them.
Addition of alignments of human ESTs
We developed a system for screening the human EST collection, initially in order to find novel homologues to follow up in our wet lab. As a result, we now have more novel peptidases to clone and sequence than we can handle, so rather than just sitting on the data we decided to share it. The existence of an alignment of human ESTs for a particular peptidase in MEROPS is indicated by a red "EST" tag in the Identifier index. Then, access from the PepCard is via the red button. Each alignment offers a link to the table of EST data, which includes the Unigene cluster assignments of the ESTs. Coming soon: the same for mouse ESTs!
Release 5.2 31-Aug-2000
Identifiers assigned to "multipeptidases"
Our classification is essentially one of "peptidase units", and when a peptidase molecule contains several different kinds of peptidase units it creates problems for us: no one location in the classification is right for the whole of such a multipeptidase molecule. The proteasome is an obvious example, but there are half a dozen others. We now use a MEROPS identifier starting with "X" for each of the multipeptidases, and reserve the standard codes for the individual peptidase units. For example, the somatic form of peptidyl-dipeptidase A (angiotensin-converting enzyme) is X06.001, and its two peptidase units are M02.001 and M02.004. There is a PepCard for X06.001 including a Literature button, in addition to the standard PepCards for M02.001 and M02.004. If you care to know which other multipeptidases we recognize, look under "X" in the alphabetical index of MEROPS identifiers.
PepCards provided for unsequenced peptidases
Until a peptidase is sequenced, we cannot classify it as we would wish, but nevertheless, several of the peptidases that are yet to be sequenced are of real interest, and as a way to display more information about them, we now include PepCards for them.
Release 5.1 15-Jun-2000
Anyone reading this has obviously obtained access to the database, but if you know of anyone who is not able to get access as they wish following the recent changes, please encourage them to contact firstname.lastname@example.org. We shall do what we can to help.
New alignments for families of peptidase units
We now have a new system for generating the sequence alignments and evolutionary trees. We trust that you will find the larger number of these pages, and their new style, helpful.
Release 5.0 3-Apr-2000
MEROPS receives approval of ISI!
We were happy to receive the message:
"You are publishing important, high-quality material on the Web. For this reason, ISI has selected your site for inclusion in Current Web Contents, a new section of Current Contents Connect™ (CC Connect™). ISI editors -- following carefully structured evaluation criteria -- have visited your site, reviewed it, developed a standardized descriptive record, written an abstract and created a link from CC Connect to your site."
A start on search facilities
We have wanted to add search facilities to MEROPS for some time, and with Release 5 we have made a start on this. Let us know what other searches you think we might usefully provide. (Please do not ask for a specificity search, though! This is not feasible at the present time, and we can only refer you to the search function on the CD-ROM of the Handbook of Proteolytic Enzymes.)
Lots of new data!
Amongst the new data are those for the 450 or so peptidases in the genome of Drosophila melanogaster just completed. There are now over 4000 names in the index of protease names.
With Release 5 we have made a start on including concise reference lists for all the proteases, families and clans. Many of the references are linked to Medline. This is a large job, but we aim to finish it soon.
Release 4.0 27-Jan-2000
Peptidase/Protease - what's in a name?
"Protease" and "peptidase" are synonymous terms applying to all enzymes that hydrolyze peptide bonds, i.e. proteolytic enzymes. In previous releases of the MEROPS database we have used the term "peptidase" rather than "protease" to describe what it is about. This was because "peptidase" is the term recommended by IUBMB and is familiar as the basis of all the names of subgroups of these enzymes: endopeptidase, exopeptidase, aminopeptidase, etc. However, it has become clear that the majority of scientists working on these enzymes most naturally think of them as proteases, so we have decided to use that more familiar term here also.
The new URL
Things are changing at MEROPS! It would be unrealistic for us to expect the public funding of the database that we have enjoyed for several years to continue indefinitely, so we are asking the many commercial users of the database to pay a modest license fee. This will allow it to remain free for academic users, and will enable us to expand our resources somewhat to do justice to the exciting genomic data on proteases that are going to be with us very soon. At this stage, we would only ask you to note the new URL: "merops.co.uk/merops/merops.htm". The old URL will continue to work for a time, but it is due to be phased out in the not too distant future, so it would be smart to change the bookmark in your browser now. Stay tuned for further developments.
Disappearance of clan MB
During 1999, clan MB disappeared from the MEROPS classification with no explanation being provided in the database. This confused a number of people, and we are sorry about that. The explanation is that we decided that we should merge clan MB into clan MA. According to our definition of a clan, all families that we suspect of having a common evolutionary origin should be contained within a single clan. We do not feel that the presence of the "HEXXH" sequence or even the more extended motif in which it normally occurs is in itself strong enough evidence of common ancestry to justify putting all of the HEXXH families into a single clan. But as more three-dimensional structures became available, we saw that there is a characteristic arrangement of beta-strands located N-terminally to the zinc-binding site in both "gluzincin" and "metzincin" families. This can be seen in many images provided in the database, and examples would be pseudolysin, astacin and snapalysin. The existence of this common structural motif left us in no significant doubt that these families did indeed have a common origin, despite other differences including the third zinc ligands. Accordingly, the families of clan MB were moved into clan MA, and clan MB disappeared, never to be seen again.
We have improved the display of peptidases grouped by source organism. There is now a document that we term a "SpecCard" for each of the over 1000 organisms from which a peptidase has been sequenced. Each SpecCard shows an abbreviated taxonomy for the organism and a list of its known peptidases. The taxonomy is derived from the Taxonomy database at NIH, but with some minor modifications of our own, particularly for the higher taxa in the bacteria and viruses. The table of peptidases gives the clan, family, MEROPS identifier, recommended name and gene symbol for each peptidase. Click on a MEROPS identifier to find the PepCard listing sequences and more.