
==SPRINT==> PRINTS View




Identifier
INTEIN
Accession
PR00379
No. of Motifs
6
Creation Date
11-OCT-1995  (UPDATE 28-JUL-1999)
Title
Intein signature
Database References

PROSITE; PS00881 PROTEIN_SPLICING
BLOCKS; BL00881
INTERPRO; IPR002203
Literature References
1. PIETROKOVSKI, S.
Conserved sequence features of inteins (protein introns) and their use
in identifying new inteins and related proteins.
PROTEIN SCI. 3 2340-2350 (1994).
 
2. PERLER, F.B., DAVIS, E.O., DEAN, G.E., GIMBLE, F.S., JACK, W.E.,
NEFF, N., NOREN, C.J., THORNER, J. AND BELFORT, M.
Protein splicing elements: inteins and exteins - a definition of
terms and recommended nomenclature.
NUCLEIC ACIDS RES. 22(7) 1125-1127 (1994).

Documentation
Inteins, or protein introns, are parts of protein sequences that are
post-translationally excised, their flanking regions (exteins) being
spliced together to yield an additional protein product [1,2]. This
process is believed to be self-catalysed, apparently initiating at
the C-terminal splice junction, where a conserved asparagine residue
mediates the nucleophilic attack on the peptide bond between it and its
neighbouring residue.
 
Inteins are difficult to identify from sequence data because they lie in
the same reading frame as the spliced protein and they are characterised
by only a few short conserved motifs [1]: two of these are similar to
the nonapeptide LAGLIDADG, which is diagnostic of certain homing
endonucleases (mutation of one such motif causes loss of endonuclease
activity, but not of the protein splicing function); another includes the
C' splice site, mutations in which disable protein function.
 
INTEIN is a 6-element fingerprint that provides a signature for inteins.
The fingerprint was derived from an initial alignment of 6 sequences: the
motifs were drawn from short conserved regions within the central portion
of the alignment (after Pietrokovski [1]) - motifs 2 and 3 include the
nonapeptide characteristic of endonuclease activity; motif 6 includes the C'
splice site containing the nucleophilic asparagine, and corresponds to
the region encoded by PROSITE pattern PROTEIN_SPLICING (PS00881). Two
iterations on OWL26.0 were required to reach convergence, at which point a
true set comprising 8 sequences was identified. A single partial match was
also found, the yeast HO endonuclease, which fails to make significant
matches with motifs 4 and 6 (note, although the fingerprint makes only a
single identification of entries DPOL_THELI and PYWKODPOL, each in fact
contains 2 independent protein sequences).
 
An update on SPTR37_9f identified a true set of 16 sequences, and 18
partial matches.
Summary Information
16 codes involving 6 elements
3 codes involving 5 elements
3 codes involving 4 elements
4 codes involving 3 elements
8 codes involving 2 elements
Composite Feature Index
6| 16 16 16 16 16 16
5|  1  3  3  3  2  3
4|  3  2  1  1  3  2
3|  4  0  3  0  4  1
2|  6  0  3  2  4  1
-+------------------
 |  1  2  3  4  5  6
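
The index above can be cross-checked against Summary Information. The following is a minimal sketch, assuming each row is read as an element count followed by six per-motif match counts (motifs 1 to 6); under that reading, each row should sum to the number of codes in that row multiplied by the number of elements matched. The names `cfi`, `codes` and `row_is_consistent` are illustrative and not part of PRINTS.

```python
# Composite Feature Index rows as read from this entry (an interpretation):
# key = number of elements matched, value = match counts for motifs 1..6.
cfi = {
    6: [16, 16, 16, 16, 16, 16],
    5: [1, 3, 3, 3, 2, 3],
    4: [3, 2, 1, 1, 3, 2],
    3: [4, 0, 3, 0, 4, 1],
    2: [6, 0, 3, 2, 4, 1],
}

# Number of codes per row, taken from Summary Information.
codes = {6: 16, 5: 3, 4: 3, 3: 4, 2: 8}


def row_is_consistent(n_elements, counts, n_codes):
    """A row is consistent if its motif matches total n_elements per code."""
    return sum(counts) == n_elements * n_codes


checks = {n: row_is_consistent(n, cfi[n], codes[n]) for n in cfi}
print(checks)
```

Every row passes this check for the values in this entry, which supports the row-by-row reading of the index.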
True Positives
DNAB_SYNY3    DPOL_PYRSD    DPOL_THEFM    DPOL_THELI    
DPOL_THEST O30477 O33141 O50478
O58101 O58384 P77933 P95484
Q58817 RECA_MYCTU Y189_MYCLE Y832_METJA
True Positive Partials
Codes involving 5 elements
Q58524 VATA_CANTR VATA_YEAST
Codes involving 4 elements
DNAB_MYCTU DPOL_PYRHO O57852
Codes involving 3 elements
DPOL_METJA O64173 Q58907 RPA2_METJA
Codes involving 2 elements
GYRB_SYNY3 O31874 O58001 O58310
O58837 Y682_METJA YA54_METJA YE20_METJA
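
The partial-match listing above can be tallied programmatically to confirm that it agrees with the 18 partial matches reported in the documentation. This is a minimal sketch that assumes the plain-text layout shown here (a "Codes involving N elements" line followed by whitespace-separated identifiers); `tally_partials` is a hypothetical helper, not a PRINTS tool.

```python
import re

# Partial-match listing copied from this entry.
listing = """\
Codes involving 5 elements
Q58524 VATA_CANTR VATA_YEAST
Codes involving 4 elements
DNAB_MYCTU DPOL_PYRHO O57852
Codes involving 3 elements
DPOL_METJA O64173 Q58907 RPA2_METJA
Codes involving 2 elements
GYRB_SYNY3 O31874 O58001 O58310
O58837 Y682_METJA YA54_METJA YE20_METJA
"""


def tally_partials(text):
    """Group sequence identifiers by the number of elements they match."""
    groups = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"Codes involving\s+(\d+)\s+elements", line.strip())
        if header:
            current = int(header.group(1))
            groups[current] = []
        elif current is not None and line.strip():
            groups[current].extend(line.split())
    return groups


groups = tally_partials(listing)
counts = {n: len(ids) for n, ids in groups.items()}
print(counts, sum(counts.values()))
```

For this entry the per-group counts are 3, 3, 4 and 8, giving 18 partial matches in total, in line with Summary Information.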
Sequence Titles
DNAB_SYNY3  REPLICATIVE DNA HELICASE (EC 3.6.1.-) - SYNECHOCYSTIS SP. (STRAIN PCC 6803). 
DPOL_PYRSD DNA POLYMERASE (EC 2.7.7.7) (DEEP VENT DNA POLYMERASE) [CONTAINS: PI-PSP I ENDONUCLEASE] - PYROCOCCUS SP. (STRAIN GB-D).
DPOL_THEFM DNA POLYMERASE (EC 2.7.7.7) (POL TFU) - THERMOCOCCUS FUMICOLANS.
DPOL_THELI DNA POLYMERASE (EC 2.7.7.7) (VENT DNA POLYMERASE) [CONTAINS: PI-TLI I ENDONUCLEASE; PI-TLI II ENDONUCLEASE] - THERMOCOCCUS LITORALIS.
DPOL_THEST DNA POLYMERASE (EC 2.7.7.7) - THERMOCOCCUS SP. (STRAIN TY).
O30477 DNA HELICASE - RHODOTHERMUS MARINUS.
O33141 HYPOTHETICAL 95.6 KD PROTEIN (INTEIN IN MLCL536.28C) - MYCOBACTERIUM LEPRAE.
O50478 HELICASE - RHODOTHERMUS MARINUS.
O58101 1291AA LONG HYPOTHETICAL RIBONUCLEOSIDE-DIPHOSPHATE REDUCTASE - PYROCOCCUS HORIK
O58384 1136AA LONG HYPOTHETICAL PROTEIN - PYROCOCCUS HORIKOSHII.
P77933 DNA-DEPENDENT DNA POLYMERASE (EC 2.7.7.7) (DNA-DIRECTED DNA POLYMERASE) (DNA NUCLEOTIDYLTRANSFERASE (DNA-DIRECTED)) - PYROCOCCUS SP.
P95484 RIBONUCLEOTIDE REDUCTASE - PYROCOCCUS FURIOSUS.
Q58817 HYPOTHETICAL PROTEIN MJ1422 - METHANOCOCCUS JANNASCHII.
RECA_MYCTU PROTEIN RECA - MYCOBACTERIUM TUBERCULOSIS.
Y189_MYCLE HYPOTHETICAL 95.7 KD PROTEIN B1496_C2_189 - MYCOBACTERIUM LEPRAE.
Y832_METJA HYPOTHETICAL PROTEIN MJ0832 - METHANOCOCCUS JANNASCHII.

Q58524 HYPOTHETICAL PROTEIN MJ1124 - METHANOCOCCUS JANNASCHII.
VATA_CANTR VACUOLAR ATP SYNTHASE CATALYTIC SUBUNIT A (EC 3.6.1.34) [CONTAINS: VMA1-DERIVED ENDONUCLEASE (VDE) (PI-CTR I ENDONUCLEASE)] - CANDIDA TROPICALIS (YEAST).
VATA_YEAST VACUOLAR ATP SYNTHASE CATALYTIC SUBUNIT A (EC 3.6.1.34) [CONTAINS: VMA1-DERIVED ENDONUCLEASE (VDE) (PI-SCE I ENDONUCLEASE)] - SACCHAROMYCES CEREVISIAE (BAKER'S YEAST).

DNAB_MYCTU REPLICATIVE DNA HELICASE (EC 3.6.1.-) - MYCOBACTERIUM TUBERCULOSIS.
DPOL_PYRHO DNA POLYMERASE (EC 2.7.7.7) - PYROCOCCUS HORIKOSHII.
O57852 855AA LONG HYPOTHETICAL REPLICATION FACTOR C SUBUNIT - PYROCOCCUS HORIKOSHII.

DPOL_METJA DNA POLYMERASE (EC 2.7.7.7) - METHANOCOCCUS JANNASCHII.
O64173 RIBONUCLEOTIDE REDUCTASE LARGE SUBUNIT - BACTERIOPHAGE SPBC2.
Q58907 PUTATIVE REVERSE GYRASE [CONTAINS: HELICASE; TOPOISOMERASE (EC 5.99.1.3)] - METHANOCOCCUS JANNASCHII.
RPA2_METJA DNA-DIRECTED RNA POLYMERASE SUBUNIT A" (EC 2.7.7.6) - METHANOCOCCUS JANNASCHII.

GYRB_SYNY3 DNA GYRASE SUBUNIT B (EC 5.99.1.3) - SYNECHOCYSTIS SP. (STRAIN PCC 6803).
O31874 YOSO PROTEIN (RIBONUCLEOTIDE REDUCTASE HOMOLOG) - BACILLUS SUBTILIS.
O58001 529AA LONG HYPOTHETICAL DNA REPAIR PROTEIN - PYROCOCCUS HORIKOSHII.
O58310 1108AA LONG HYPOTHETICAL CELL DIVISION CONTROL PROTEIN - PYROCOCCUS HORIKOSHII.
O58837 1352AA LONG HYPOTHETICAL ATP-DEPENDENT HELICASE LHR - PYROCOCCUS HORIKOSHII.
Y682_METJA HYPOTHETICAL PROTEIN MJ0682 - METHANOCOCCUS JANNASCHII.
YA54_METJA HYPOTHETICAL PROTEIN MJ1054 (EC 1.1.1.-) - METHANOCOCCUS JANNASCHII.
YE20_METJA HYPOTHETICAL PROTEIN MJ1420 - METHANOCOCCUS JANNASCHII.
Scan History
OWL26_0    2  100  NSINGLE    
SPTR37_9f 5 50 NSINGLE
Initial Motifs
Motif 1  width=14
Element Seqn Id St Int Rpt
KSQFAATPNHLIRT RECA_MYCLE 275 275 -
LMDFTVSADHKLIL VATA_CANTR 348 348 -
GRKINITAGHSLFT DPOL_THELI 581 581 -
LLKFTCNATHELVV VATA_YEAST 353 353 -
GAIVWATPDHKVLT RECA_MYCTU 315 315 -

Motif 2 width=13
Element Seqn Id St Int Rpt
LGYLLGTWAGIGN VATA_CANTR 482 120 -
FQVVLGSLMGDGN RECA_MYCLE 317 28 -
LAYLLGLWIGDGL VATA_YEAST 491 124 -
HARLLGYLIGDGR RECA_MYCTU 363 34 -
FGKLLGYYVSEGY DPOL_THELI 773 178 -

Motif 3 width=13
Element Seqn Id St Int Rpt
AFLRGLFSADGTV DPOL_THELI 1325 539 -
TFLAGLIDSDGYV VATA_YEAST 600 96 -
LVLAIWYMDDGSF RECA_MYCLE 414 84 -
SLIAGLVDAAGNV VATA_CANTR 618 123 -
NLLFGLFESDGWV RECA_MYCTU 464 88 -

Motif 4 width=16
Element Seqn Id St Int Rpt
LAHQIHWLLLRFGVGS RECA_MYCTU 495 18 -
VRDGLVSLARSLGLVV VATA_YEAST 629 16 -
GRIEICVDAMTEGTRV RECA_MYCLE 442 15 -
VARGLVKIAHSLGIES VATA_CANTR 649 18 -
FLREVRKLLWIVGISN DPOL_THELI 1355 17 -

Motif 5 width=18
Element Seqn Id St Int Rpt
SMNRFDIEVEGNHNYFVD RECA_MYCLE 547 89 -
DYYGITLSDDSDHQFLLA VATA_YEAST 713 68 -
RARTFDLEVEELHTLVAE RECA_MYCTU 668 157 -
EGYVYDIEVEETHRFFAN DPOL_THELI 1448 77 -
NYYGITLAEETDHQFLLS VATA_CANTR 730 65 -

Motif 6 width=10
Element Seqn Id St Int Rpt
GVVVHNCSPP RECA_MYCTU 686 0 -
QVVVHNCGER VATA_YEAST 732 1 -
GVMVHNSPET RECA_MYCLE 565 0 -
MALVHNCGER VATA_CANTR 749 1 -
NILVHNTDGF DPOL_THELI 1466 0 -
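
In the motif tables, the Int column appears to record the gap between the end of the previous motif and the start (St) of the current one, with the first motif's interval equal to its start position. That reading is an inference from the numbers in this entry rather than a quoted definition. The sketch below checks it for RECA_MYCTU using the initial motif start positions and widths; `intervals` is an illustrative name.

```python
# (start, width) for motifs 1..6 of RECA_MYCTU, from the Initial Motifs tables.
recA_mtu = [(315, 14), (363, 13), (464, 13), (495, 16), (668, 18), (686, 10)]


def intervals(motifs):
    """Return the gap before each motif.

    Assumption: the interval is start minus (previous start + previous width),
    and the first interval is simply the first start position.
    """
    result = []
    previous_end = 0
    for start, width in motifs:
        result.append(start - previous_end)
        previous_end = start + width
    return result


print(intervals(recA_mtu))
```

The computed values are 315, 34, 88, 18, 157 and 0, which match the Int column for RECA_MYCTU in the six tables above.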
Final Motifs
Motif 1  width=14
Element Seqn Id St Int Rpt
GRTIKATANHRFLT DNAB_SYNY3 444 444 -
GRSIRATANHRFLT O30477 484 484 -
GRSIRATANHRFLT O50478 488 488 -
GRKINITAGHSLFT DPOL_THELI 581 581 -
GYEIIATLDHKIMT O58101 567 567 -
GYEITATLDHKLMT P95484 1011 1011 -
GRKIHITRGHSLFT DPOL_THEST 941 941 -
INGLKCTPNHKLPV DPOL_THEFM 457 457 -
GRRIKITSGHSLFS P77933 938 938 -
GRKITITEGHSLFV DPOL_PYRSD 579 579 -
GRALEATGNHQFLV O33141 260 260 -
GRALEATGNHQFLV Y189_MYCLE 260 260 -
GRELKVTTYHPLLI Q58817 125 125 -
GTSIIVTEDHSLFN Y832_METJA 427 427 -
GNEVILTRSHPLFA O58384 539 539 -
GAIVWATPDHKVLT RECA_MYCTU 315 315 -

Motif 2 width=13
Element Seqn Id St Int Rpt
ELGLLGHLIGDGC DNAB_SYNY3 491 33 -
ELALLGHLIGDGC O30477 531 33 -
ELALLGHLIGDGC O50478 535 33 -
FGKLLGYYVSEGY DPOL_THELI 773 178 -
LAFVLGWLIGDGY O58101 616 35 -
LAFVLGWFIGDGY P95484 1060 35 -
FGKFLGYYVSEGY DPOL_THEST 1133 178 -
LWELIGLLVGDGN DPOL_THEFM 1044 573 -
LAKLLGYYVSEGY P77933 1129 177 -
FAKLLGYYVSEGS DPOL_PYRSD 770 177 -
LLWLLGLWLGDGH O33141 341 67 -
LLWLLGLWLGDGH Y189_MYCLE 341 67 -
LAEWLGYFIGDGH Q58817 176 37 -
LCQFLGLFVAEGS Y832_METJA 504 63 -
LSYLAGVILGDGY O58384 642 89 -
HARLLGYLIGDGR RECA_MYCTU 363 34 -

Motif 3 width=13
Element Seqn Id St Int Rpt
IFLRHLWSTDGCV DNAB_SYNY3 597 93 -
TFLRHLWATDGCI O30477 637 93 -
TFLRHLWATDGCI O50478 641 93 -
AFLRGLFSADGTV DPOL_THELI 1325 539 -
AFLRGLFTADGYV O58101 710 81 -
AFLRGLFSADGYV P95484 1154 81 -
AFLRAYFVGDGDI DPOL_THEST 1229 83 -
AFLRGLFSADGTV DPOL_THEFM 1143 86 -
AFLEGYFIGDGDV P77933 1225 83 -
AFLEGYFIGDGDV DPOL_PYRSD 866 83 -
ALIGGLVDADGWT O33141 436 82 -
ALIGGLVDADGWT Y189_MYCLE 436 82 -
AFLRAYFDCDGGI Q58817 272 83 -
AFLGGLISGDGYV Y832_METJA 594 77 -
YFIAGLFDADGYV O58384 739 84 -
NLLFGLFESDGWV RECA_MYCTU 464 88 -

Motif 4 width=16
Element Seqn Id St Int Rpt
LAKDVQSLLLKLGINA DNAB_SYNY3 630 20 -
LARDVQSLLLRLGINA O30477 670 20 -
LARDVQSLLLRLGINA O50478 674 20 -
FLREVRKLLWIVGISN DPOL_THELI 1355 17 -
LLRDVQDLLLLFGIIS O58101 737 14 -
LLREVQDLLLLFGILS P95484 1181 14 -
LANQLVFLLNSLGVSS DPOL_THEST 1256 14 -
LSDAVRKLLWLVGVSN DPOL_THEFM 1173 17 -
LANQLVLLLNSVGVSA P77933 1252 14 -
LVNGLVLLLNSLGVSA DPOL_PYRSD 893 14 -
LLEDVRQLAIGCGLYP O33141 464 15 -
LLEDVRQLAIGCGLYP Y189_MYCLE 464 15 -
LLIDTVWLARISGIES Q58817 914 629 -
LRDTLCLALKILGINY Y832_METJA 1413 806 -
VARKIWYALQRLGIIS O58384 765 13 -
LAHQIHWLLLRFGVGS RECA_MYCTU 495 18 -

Motif 5 width=18
Element Seqn Id St Int Rpt
VEEVFDLTVPGPHNFVAN DNAB_SYNY3 786 140 -
VEEVFDLTVPGPHNFVAN O30477 826 140 -
VEEVFDLTVPGPHNFVAN O50478 830 140 -
EGYVYDIEVEETHRFFAN DPOL_THELI 1448 77 -
EEIVYDLTVPGIHSYISN O58101 829 76 -
EEIVYDFTVPNYHMYISN P95484 1273 76 -
EGYVYDLSVEDNENFLVG DPOL_THEST 1367 95 -
DGYVYDIEVEGTHRFFAN DPOL_THEFM 1266 77 -
DGYVYDLSVEDNENFLVG P77933 1363 95 -
DGYVYDLSVDEDENFLAG DPOL_PYRSD 1004 95 -
EKPTYDIQVVGLENFVAN O33141 564 84 -
EKPTYDIQVVGLENFVAN Y189_MYCLE 564 84 -
NDFVYDVSVPNNEMFFAG Q58817 1037 107 -
PEYVYDISVEGTENFIGG Y832_METJA 1566 137 -
IEYVYDLTVEDDHNYVAN O58384 948 167 -
RARTFDLEVEELHTLVAE RECA_MYCTU 668 157 -

Motif 6 width=10
Element Seqn Id St Int Rpt
DIIVHNSIEQ DNAB_SYNY3 804 0 -
DIIAHNSIEQ O30477 844 0 -
DIIAHNSIEQ O50478 848 0 -
NILVHNTDGF DPOL_THELI 1466 0 -
GFISHNCGEE O58101 847 0 -
GFMSHNCGEE P95484 1291 0 -
GILVHNTDGF DPOL_THEST 1593 208 -
GILVHNTDGF DPOL_THEFM 1284 0 -
LVYAHNSYYG P77933 1383 2 -
FLYAHNSYYG DPOL_PYRSD 1024 2 -
GIVAHNSFIY O33141 582 0 -
GIVAHNSFIY Y189_MYCLE 582 0 -
PILLHNSDER Q58817 1057 2 -
FICLHNTAGR Y832_METJA 1586 2 -
GILVSNCMGT O58384 966 0 -
GVVVHNCSPP RECA_MYCTU 686 0 -
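
The documentation states that motif 6 spans the C' splice site with its conserved asparagine. A simple per-column tally of the final motif 6 elements shows which positions are most conserved. This is a minimal sketch; `column_consensus` is a hypothetical helper, and the most common residue in a column is only a rough proxy for conservation.

```python
from collections import Counter

# Final motif 6 elements, copied from the table above.
motif6 = [
    "DIIVHNSIEQ", "DIIAHNSIEQ", "DIIAHNSIEQ", "NILVHNTDGF",
    "GFISHNCGEE", "GFMSHNCGEE", "GILVHNTDGF", "GILVHNTDGF",
    "LVYAHNSYYG", "FLYAHNSYYG", "GIVAHNSFIY", "GIVAHNSFIY",
    "PILLHNSDER", "FICLHNTAGR", "GILVSNCMGT", "GVVVHNCSPP",
]


def column_consensus(seqs):
    """Return (most common residue, count) for each alignment column."""
    consensus = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append((residue, count))
    return consensus


consensus = column_consensus(motif6)
for position, (residue, count) in enumerate(consensus, start=1):
    print(position, residue, f"{count}/{len(motif6)}")
```

In this set, position 6 is asparagine (N) in all 16 elements and position 5 is histidine (H) in 15 of 16 (O58384 has serine), which is consistent with the conserved asparagine described in the documentation.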