Expression compendia in M3D are versioned to indicate the underlying MySQL schema of the database and to denote the particular set of expression data (e.g. E_coli_v4_Build_6 uses MySQL schema version 4 and is the sixth compendium built for E. coli). Builds are maintained in perpetuity, allowing researchers to specify the exact dataset used for a particular analysis.
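As an illustration of the naming scheme, a build identifier can be split into its parts. This is a minimal sketch that assumes every identifier follows the `<organism>_v<schema>_Build_<n>` pattern of the one example given above; the function and class names are hypothetical and not part of M3D.

```python
import re
from typing import NamedTuple


class CompendiumBuild(NamedTuple):
    organism: str        # e.g. "E_coli"
    schema_version: int  # MySQL schema version
    build_number: int    # sequential build for this organism


# Assumed pattern, inferred from the single example "E_coli_v4_Build_6".
_BUILD_RE = re.compile(r"^(?P<organism>.+)_v(?P<schema>\d+)_Build_(?P<build>\d+)$")


def parse_build_name(name: str) -> CompendiumBuild:
    """Split a compendium identifier into organism, schema version and build."""
    match = _BUILD_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised compendium name: {name!r}")
    return CompendiumBuild(
        organism=match["organism"],
        schema_version=int(match["schema"]),
        build_number=int(match["build"]),
    )


print(parse_build_name("E_coli_v4_Build_6"))
```

Recording the full identifier alongside an analysis is what makes the exact dataset recoverable later.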
A set of 43337 splice junction pairs was extracted from mammalian GenBank annotated genes. Of these, 22489 are supported by EST sequences. 98.71% of the EST-supported pairs contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively, and 0.56% hold non-canonical GC-AG splice site pairs. The remaining 0.73% fall into many small groups (the largest accounting for 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases, we present a new classification consisting of only 8 observed types of splice site pairs (out of 256 a priori possible combinations).
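The figure of 256 follows from simple counting: each site is a dinucleotide (4 x 4 = 16 possibilities), and a pair combines a donor with an acceptor (16 x 16 = 256). The sketch below enumerates the pairs and labels the two classes named above. The function name and the catch-all "other" label are assumptions for illustration; the full 8-type classification is not reproduced here.

```python
from itertools import product

BASES = "ACGT"

# All 16 possible dinucleotides at a single splice site.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]

# All donor/acceptor combinations that are possible a priori.
ALL_PAIRS = [(donor, acceptor) for donor in DINUCLEOTIDES for acceptor in DINUCLEOTIDES]


def classify_pair(donor: str, acceptor: str) -> str:
    """Label a donor/acceptor dinucleotide pair (only two classes are named)."""
    pair = (donor.upper(), acceptor.upper())
    if pair == ("GT", "AG"):
        return "canonical GT-AG"
    if pair == ("GC", "AG"):
        return "non-canonical GC-AG"
    return "other"


print(len(DINUCLEOTIDES), len(ALL_PAIRS))
print(classify_pair("gt", "ag"))
```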
This web portal is dedicated to putatively active LINE-1 elements.
LINE-1 (Long Interspersed Nuclear Element 1, L1) elements are the only autonomous retrotransposons in the group of non-LTR retrotransposons. They comprise about 18%-20% of mammalian genomes. Click here for background information and literature on LINE-1 elements.
In the post-genome era, proteomic technology has rapidly developed into a powerful platform for research on human physiology and for identifying potential novel biomarkers for prognosis, diagnosis and therapy. In recent years, body fluids have become one of the important targets of proteomic research. Analysis of the protein composition of body fluids could help to better understand human physiology and disease proteomics. A growing number of proteomics studies have produced body fluid related data, and a database is needed to collect and analyze these data.
Our body fluid protein database, Sys-BodyFluid, contains 11 body fluid proteomes: plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk, and amniotic fluid. Over 10,000 proteins are included in Sys-BodyFluid. These body fluid proteome data come from 50 peer-reviewed publications from different laboratories all over the world. Protein annotations are provided, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. Users can access the proteome data by protein name, protein accession number or sequence similarity. In addition, users can query across different body fluids to gain a more comprehensive understanding. The differences and similarities between these 11 body fluids are also analyzed. Thus, the Sys-BodyFluid database could serve as a reference database for body fluid research and disease proteomics.
Listeria species are ubiquitous in the environment and often contaminate foods because they grow under conditions used for food preservation. Listeria monocytogenes, the human and animal pathogen, causes listeriosis, an infection with a high mortality rate in risk groups such as immune-compromised individuals. Furthermore, L. monocytogenes is a model organism for the study of intracellular bacterial pathogens. The publication of its genome sequence and that of the non-pathogenic species Listeria innocua initiated numerous comparative studies and efforts to sequence all species comprising the genus. The proteome database LEGER (http://leger2.gbf.de/cgi-bin/expLeger.pl) was developed to support functional genome analyses by combining information obtained by applying bioinformatics methods and from public databases to improve the original annotations. LEGER offers three unique key features: (i) it is the first comprehensive information system focusing on the functional assignment of genes and proteins; (ii) integrated visualization tools, KEGG pathway and Genome Viewer, ease the functional exploration of complex data; and (iii) LEGER presents results of systematic post-genome studies, thus facilitating analyses combining computational and experimental results. Moreover, LEGER provides an unpublished membrane proteome analysis of L. innocua and in total visualizes experimentally validated information about the subcellular localizations of 789 different listerial proteins.
KineticDB is a thoroughly curated database of protein folding kinetics that currently contains experiments on 87 unique proteins and about a hundred mutants. The main goal of KineticDB is to provide users with a diverse set of experimentally determined protein folding rates. Currently the database contains kinetic data on single-domain proteins, separate protein domains and short polypeptides without S-S bonds in the native structure.
We hope that you will find the database useful. Help us make the database as broad and up-to-date as possible: send us references containing new kinetic data! We will be grateful for any contribution from the community, including bug reports, new kinetic data and questions.
With the completion of the sequencing of the genomes of human and other organisms, attention has focused on the characterization and function of proteins, the products of genes. The availability of sequence data and the growing impact of structural biology on biomedical research have prompted scientific groups from several countries to undertake projects in the emerging field of structural genomics.
The objective is to make these structures widely available for clinical and basic studies that will expand the knowledge of the role of proteins both in normal biological processes and in disease. The National Institute of General Medical Sciences (NIGMS) played a major role in the early planning for structural genomics and in 1999 organized a national program, the Protein Structure Initiative.
The PSI program officially concluded in 2015, having determined nearly 7000 protein structures, developed ~450 new or improved technologies, and produced over 2200 publications. This work has benefited the broader biological and biomedical communities.
The Fungal Program scales up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Encoded in the genomes of the organisms of the kingdom Fungi are biological processes with high relevance to the Department of Energy missions in bioenergy production, carbon cycling and biogeochemistry. Combining new sequencing technologies and comparative genomics analysis, we work on large and complex sequencing projects such as surveying the broad phylogenetic and ecological diversity of fungi, and capturing genomic variation in natural populations and engineered strains. This approach allows us to build a foundation for translating the genomic potential of fungi into practical applications.
Optimal global pairwise alignments from about 270,000 ribosomal RNA (rRNA) internal transcribed spacer 2 (ITS2) sequences - all against all - have been generated in order to model ITS2 secondary structures based on sequences with known structures. Via 60,000 known ITS2 sequences that fit a common core of the ITS2 secondary structure described for the eukaryotes (Schultz et al. 2005), homology based modeling (Wolf et al. 2005) and reannotation procedures revealed in addition more than 150,000 homologous structures that could not be predicted by standard RNA folding programs.
LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set. The membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing peer-reviewed publications.
PlantProm DB was initially developed as an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release of the DB, 2002.01, developed by the Department of Computer Science at Royal Holloway, University of London, in collaboration with Softberry Inc. (USA), is available at http://mendel.cs.rhul.ac.uk/mendel.php. It contained 305 entries from monocot, dicot and other plants.
The new release of PlantProm DB contains 578 unrelated entries, including 151, 396 and 31 promoters with experimentally verified TSS from monocot, dicot and other plants, respectively. In contrast to promoter sets in which the TSSs, identified by full-length cDNA/5' EST mapping, CAGE and SAGE approaches, remain to be confirmed by direct experimental evidence, this DB and The Eukaryotic Promoter Database (134 unrelated plant promoters; see: http://www.epd.isb-sib.ch/) present published promoter sequences with TSS(s) determined by direct experimental approaches, and therefore serve as the most accurate sources for the development of computational promoter prediction tools (for example, see: TSSP-TCM, TSSP, FPROM, CONPRO). The following criteria were used for collecting experimentally verified plant gene promoters.
HIT is a comprehensive and fully curated database that complements available resources on protein targets for FDA-approved drugs as well as promising precursors. It currently contains about 1,301 known protein targets (221 proteins are described as direct targets) derived from more than 3,250 publications, covering about 586 active compounds from more than 1,300 reputable Chinese herbs. The molecular target information covers proteins that are directly or indirectly activated or inhibited, protein binders, and enzymes whose substrates or products are those compounds. Detailed interaction values such as IC50 and Kd/Ki are collected where available. Genes that are up- or down-regulated under treatment with individual ingredients are also included.
dbDEPC has been updated to version 2.0 and contains more information about differentially expressed proteins (DEPs) in human cancers. Currently dbDEPC contains 4029 DEPs, 20 cancers and 331 MS experiments. A more advanced search function makes querying easier. A modified profile view and a new network function provide a systematic view of DEPs in cancers.
The Small Subunit rRNA Modification Database provides a listing of reported post-transcriptionally modified nucleosides and sequence sites in small subunit rRNAs from bacteria, archaea and eukarya. Data are compiled from reports of full or partial rRNA sequences, including RNase T1 oligonucleotide catalogs reported in earlier literature in studies of phylogenetic relatedness. Options for data presentation include full sequence maps, some of which have been assembled by database curators with the aid of contemporary gene sequence data, and tabular forms organized by source organism or chemical identity of the modification. A total of 32 rRNA sequence alignments are provided, annotated with sites of modification and chemical identities of modifications if known, with provision for scrolling full sequences or user-dictated subsequences for comparative viewing for organisms of interest. The database can be accessed through the World Wide Web at http://medlib.med.utah.edu/SSUmods.
In bacteria, small (~30-500 nt) non-coding RNAs (sRNAs) are the most abundant class of post-transcriptional regulators that are involved in diverse processes including quorum sensing, stress response, virulence and carbon metabolism. Based on the target molecules, sRNAs can be divided into two major groups: (i) mRNA-binding antisense sRNAs and (ii) protein-binding sRNAs. The antisense RNAs can further be categorized as cis-encoded antisense sRNAs, which are completely complementary to their targets, and trans-encoded antisense sRNAs, which are only partially complementary to their targets. In any case, the interaction between antisense RNAs and target mRNAs could direct a plethora of biological regulatory circuits. Recent developments in high-throughput techniques, such as genomic tiling arrays and RNA-Seq have provided invaluable insights into the detection and characterization of bacterial sRNAs. However, a comprehensive bacterial sRNA database is not yet available, especially for integrating and analyzing high-throughput sequencing data.
Here, we have designed and constructed BSRD (Bacterial Small regulatory RNA Database) which hosts sRNAs collected from over 783 bacterial species and 957 strains.
The distinctive features of BSRD are:
(1) BSRD hosts sRNAs retrieved from online databases including Rfam, sRNAMap, GenBank, RegulonDB and EcoCyc, as well as from manual curation. In addition, we have integrated 20,115 regulatory elements in BSRD.
(2) BSRD collects sRNA targets predicted by the computational algorithms IntaRNA and RNAplex, as well as experimentally validated sRNA targets from sRNATarBase and the related literature.
(3) BSRD includes information on regulatory relationships between transcription factors (TFs) and their target genes, which could provide insights into the combinatorial regulation of common targets by sRNAs and TFs.
(4) BSRD has integrated expression data from NCBI GEO (Gene Expression Omnibus), which provides detailed evidence for sRNA expression profiling and re-annotation.
(5) BSRD includes multiple new sRNA annotations from manually curated literature mining, including growth phase, Hfq binding, dual function and Rho-independent terminators.
(6) BSRD harbors a novel RNA-Seq analysis platform, sRNADeep, that allows users to perform comprehensive sRNA expression profiling and differential expression analysis in large-scale transcriptome sequencing projects. With the aid of sRNADeep, users can (i) filter low-quality reads and adaptors from raw sequencing data, and (ii) align large numbers of short reads to BSRD for identification of known sRNAs.
(7) BSRD has implemented a Wikipedia-based community annotation function.
(8) BSRD has a user-friendly interface, a flexible search option and a BLAST server for sequence homology searching.
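To make step (i) of feature (6) concrete, the sketch below shows one common way to trim a 3' adaptor and discard low-quality reads. It is not sRNADeep's implementation, which is not described here. The function names, the exact-match adaptor search, the Phred+33 encoding, and the thresholds are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def filter_reads(reads: List[Read], adapter: str,
                 min_mean_q: float = 20.0, min_len: int = 15) -> List[Read]:
    """Trim adaptors, then drop reads that are too short or of low quality."""
    kept = []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_len:
            continue  # too short after trimming
        if mean_phred(qual) < min_mean_q:
            continue  # low average base quality
        kept.append((name, seq, qual))
    return kept
```

Real pipelines usually tolerate mismatches in the adaptor and trim by sliding-window quality; exact matching keeps the example short.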
To download all sRNA sequences in BSRD, use the quick download link.
Before getting started, we suggest reading the full help document for a quick guide to the database.
KBDOCK is a 3D database system that defines and spatially clusters protein binding sites for knowledge-based protein docking. KBDOCK extracts protein domain-domain interaction (DDI) and domain-peptide interaction (DPI) information from the PDB using the Pfam domain classification in order to analyse the spatial arrangements of DDIs and DPIs by Pfam family, and to propose structural templates for protein docking.
Given a query Pfam domain, KBDOCK shows:
a non-redundant list of DDIs involving the query domain, grouped by their binding site.
a Jmol view showing the DDIs placed in the coordinate frame of the query domain. You can choose to view one DDI at a time or a group of DDIs which share a similar binding site according to our binding site direction vector algorithm (see references).
a consensus sequence alignment of the domains. Each sequence is annotated with core and rim interacting residues, and also the binding site centre residue.
Given a query Pfam domain, KBDOCK can also show DDIs involving different but structurally similar Pfam domains. For each Pfam family, KBDOCK stores a list of "Pfam neighbours" which is calculated using the Kpax structure alignment algorithm. Given two query domain structures, if full-homology (FH)† DDI templates exist in KBDOCK, then for each distinct interface, KBDOCK provides:
the best FH template (based on overall sequence identity) and the corresponding docking model.
a Jmol view of the query domains superposed onto the FH template.
a pairwise sequence alignment of the query domains and their corresponding template domain, showing the core, rim, and centre binding site residues.
If FH DDIs do not exist or if only one query domain structure is given, KBDOCK finds semi-homology (SH)† DDI templates. For each query domain and for each distinct binding site, it provides:
the best SH template (based on domain sequence identity).
a Jmol view of the query domain superposed onto the SH template. A centre binding site residue is proposed.
a pairwise sequence alignment of the query domain and its corresponding template domain, showing the core, rim, and centre binding site residues.
KBDOCK can also find DDIs involving structurally similar Pfam domains to the query domains using pre-calculated Pfam neighbour lists.
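The template selection described above can be summarised as a small decision rule: use full-homology (FH) templates when two query domains are given and FH templates exist, otherwise fall back to semi-homology (SH) templates, keeping the template with the highest sequence identity for each distinct site. The sketch below is an illustration only, not KBDOCK code. The `Template` class, its fields, and the function names are hypothetical, and for simplicity the SH case is not split per query domain.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Template:
    kind: str        # "FH" (full-homology) or "SH" (semi-homology)
    site_id: str     # hypothetical label for a distinct interface / binding site
    identity: float  # sequence identity to the query, in percent
    label: str       # hypothetical identifier of the template structure


def best_per_site(templates: List[Template]) -> Dict[str, Template]:
    """Keep the highest-identity template for each distinct site."""
    best: Dict[str, Template] = {}
    for t in templates:
        current = best.get(t.site_id)
        if current is None or t.identity > current.identity:
            best[t.site_id] = t
    return best


def select_templates(n_query_domains: int,
                     templates: List[Template]) -> Dict[str, Template]:
    """Prefer FH templates for two query domains; otherwise use SH templates."""
    fh = [t for t in templates if t.kind == "FH"]
    if n_query_domains == 2 and fh:
        return best_per_site(fh)
    return best_per_site([t for t in templates if t.kind == "SH"])
```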
NRED integrates annotated expression data from various sources. Use this form to filter expression results data based on probe characteristics and/or the values of the expression data. If no experimental result set is selected, the form can also be used to search the probe table. All search fields are optional. For help and descriptions of the different fields, simply hover your mouse over the form labels. To reference NRED, please cite Dinger et al., 2008, Nucleic Acids Res.
Welcome to dbPSHP (http://jjwanglab.org/dbpshp) (hg19/GRCh37). The database contains curated publications about positive selection in different human populations, covering over 15,000 loci. These come either from publications studying positively selected genomic loci and genes related to specific functions, traits or diseases, or from publications detecting genome-wide selective signals with different statistical methods.
It also includes 15 statistical terms for each single nucleotide polymorphism site from the HapMap III and 1000 Genomes Project genotyping data. These attributes include variant allele frequency, variant heterozygosity, within population diversity, haplotype homozygosity, long-range haplotypes, pairwise population differentiation and evolutionary conservation. We also provide interactive pages for visualization and annotation of different selective signals.
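Two of the listed attributes, variant allele frequency and heterozygosity, have simple textbook definitions for a biallelic site. The sketch below uses those standard definitions, with expected heterozygosity under Hardy-Weinberg proportions taken as 2p(1-p). Whether the database computes its statistics in exactly this way is an assumption, and the function names are hypothetical.

```python
def variant_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Frequency of the variant allele from genotype counts at a biallelic site."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return (n_het + 2 * n_alt_hom) / total_alleles


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) under Hardy-Weinberg proportions."""
    return 2.0 * p * (1.0 - p)


def observed_heterozygosity(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Fraction of sampled individuals that are heterozygous."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    return n_het / n


# Example with made-up genotype counts: 50 ref/ref, 40 ref/alt, 10 alt/alt.
p = variant_allele_frequency(50, 40, 10)
print(p, expected_heterozygosity(p), observed_heterozygosity(50, 40, 10))
```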
The JASPAR CORE database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. The prime differences from similar resources (TRANSFAC, etc.) are open data access, non-redundancy and quality.
When should it be used? When seeking models for specific factors or structural classes, or if experimental evidence is paramount.