bioinformatics-databases
Bioinformatics - Databases
Scooped by Dr. Stefan Gruenwald

M3D: A Many Microbe Microarrays Database


Expression compendia in M3D are versioned to indicate the underlying MySQL schema of the database and the particular set of expression data (e.g., E_coli_v4_Build_6 uses MySQL schema version 4 and is the sixth compendium built for E. coli). Builds are maintained in perpetuity, allowing researchers to specify the exact dataset used for a particular analysis.
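
Because the build name encodes both version numbers, it can be parsed mechanically, for instance to record the exact compendium in an analysis log. The sketch below assumes every identifier follows the organism_vN_Build_M pattern of the single example above (E_coli_v4_Build_6); parse_build and CompendiumBuild are hypothetical helpers, not part of any M3D interface.

```python
import re
from typing import NamedTuple


class CompendiumBuild(NamedTuple):
    organism: str
    schema_version: int  # underlying MySQL schema version
    build: int           # n-th compendium built for this organism


# Assumed pattern, inferred from the one example name "E_coli_v4_Build_6".
_BUILD_RE = re.compile(r"^(?P<organism>.+)_v(?P<schema>\d+)_Build_(?P<build>\d+)$")


def parse_build(name: str) -> CompendiumBuild:
    """Split an M3D-style compendium identifier into its three parts."""
    match = _BUILD_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised compendium name: {name!r}")
    return CompendiumBuild(match["organism"], int(match["schema"]), int(match["build"]))


print(parse_build("E_coli_v4_Build_6"))
# CompendiumBuild(organism='E_coli', schema_version=4, build=6)
```

Storing the parsed tuple alongside results is one simple way to make the "exact dataset" explicit when an analysis is rerun later.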

 
 
Citing M3D

Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, and Gardner TS. Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Research 2008


SpliceDB - canonical and non-canonical splice site sequences in mammalian genes

A set of 43,337 splice junction pairs was extracted from GenBank-annotated mammalian genes, 22,489 of which are supported by EST sequences. Of those, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively, and 0.56% hold non-canonical GC-AG splice site pairs. The remaining 0.73% occur in many small groups (each at most 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted by one position from the annotated splice junction. After close examination of such cases, we present a new classification consisting of only 8 observed types of splice site pairs (out of 256 a priori possible combinations).
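
The "256 a priori possible combinations" follow from a donor dinucleotide plus an acceptor dinucleotide, i.e. four positions with four bases each: 4^4 = 256. The sketch below enumerates them and applies a deliberately simplified three-way labelling (GT-AG, GC-AG, other); it does not reproduce SpliceDB's 8-type classification, and classify_pair is a hypothetical helper.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def classify_pair(donor: str, acceptor: str) -> str:
    """Label a donor/acceptor dinucleotide pair (simplified, three classes)."""
    pair = (donor.upper(), acceptor.upper())
    if pair == ("GT", "AG"):
        return "canonical GT-AG"
    if pair == ("GC", "AG"):
        return "non-canonical GC-AG"
    return "other"


# Two dinucleotides = four positions, each one of four bases: 4**4 = 256 pairs.
all_pairs = [("".join(p[:2]), "".join(p[2:])) for p in product(NUCLEOTIDES, repeat=4)]
print(len(all_pairs))  # 256

# The three reported fractions together cover all EST-supported junctions.
print(round(98.71 + 0.56 + 0.73, 2))  # 100.0

print(Counter(classify_pair(d, a) for d, a in all_pairs))
```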

L1Resources - Database for Full-Length, Intact L1 Elements


L1Resources Database

 

This web portal is dedicated to putatively active LINE-1 elements.

 

LINE-1 (Long Interspersed Nucleotide Element 1, L1) elements are the only autonomous retrotransposons in the group of non-LTR retrotransposons. They make up about 18-20% of mammalian genomes.
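
A full-length L1 is commonly considered intact when both of its open reading frames, ORF1 and ORF2, are uninterrupted. The sketch below screens the forward strand of a sequence for two such ORFs. It assumes the human L1 protein lengths (338 aa for ORF1p, 1,275 aa for ORF2p) as minimum thresholds, which is a simplification and not necessarily the criterion L1Resources applies; orf_lengths and looks_intact are hypothetical helpers.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_lengths(seq: str) -> List[int]:
    """Lengths in codons (stop excluded) of ATG-initiated ORFs on the forward
    strand, over all three frames, longest first. ORFs without a stop are ignored."""
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append((i - start) // 3)
                start = None
    return sorted(lengths, reverse=True)


def looks_intact(seq: str, orf1_aa: int = 338, orf2_aa: int = 1275) -> bool:
    """True if the two longest ORFs are long enough to encode ORF2p and ORF1p."""
    lengths = orf_lengths(seq)
    return len(lengths) >= 2 and lengths[0] >= orf2_aa and lengths[1] >= orf1_aa


# Synthetic test sequence (not a real L1): two back-to-back ORFs of 1,275 and 338 codons.
toy = "ATG" + "GCT" * 1274 + "TAA" + "ATG" + "GCT" * 337 + "TAA"
print(orf_lengths(toy), looks_intact(toy))  # [1275, 338] True
```

A real screen would also check the reverse strand and the 5' UTR; those steps are omitted here for brevity.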

 


Sys-BodyFluid - a body fluid database


In the post-genome era, proteomic technology has rapidly developed into a powerful platform for studying human physiology and identifying potential novel biomarkers for prognosis, diagnosis and therapy. In recent years, body fluids have become one of the important targets of proteomic research. Analysis of the protein composition of body fluids could help us better understand human physiology and disease proteomics. As more and more proteomics studies produce body fluid data, a database is needed to collect and analyze these data.

 

Our body fluid protein database, Sys-BodyFluid, contains 11 body fluid proteomes: plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk, and amniotic fluid. Over 10,000 proteins are included in Sys-BodyFluid. These body fluid proteome data come from 50 peer-reviewed publications from laboratories around the world. Protein annotations are provided, including protein description, Gene Ontology terms, domain information, protein sequence and involved pathways. Users can access the proteome data by protein name, protein accession number or sequence similarity. In addition, users can query across different body fluids to gain a more comprehensive understanding. The differences and similarities among these 11 body fluids are also analyzed. Thus, Sys-BodyFluid could serve as a reference database for body fluid research and disease proteomics.
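
A cross-fluid query of this kind reduces to set operations on protein identifiers. In the sketch below the fluid names echo the list above, but the identifiers P1 to P6 are placeholders rather than real Sys-BodyFluid content, and the helper functions are hypothetical.

```python
def shared_by_all(fluids, names):
    """Proteins detected in every one of the named fluids."""
    return set.intersection(*(fluids[n] for n in names))


def specific_to(fluids, name):
    """Proteins detected in one fluid and in none of the others."""
    others = set().union(*(s for n, s in fluids.items() if n != name))
    return fluids[name] - others


def jaccard(a, b):
    """Overlap between two proteomes: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder data for illustration only.
fluids = {
    "plasma": {"P1", "P2", "P3", "P4"},
    "urine": {"P2", "P3", "P5"},
    "saliva": {"P3", "P6"},
}
print(shared_by_all(fluids, ["plasma", "urine", "saliva"]))  # {'P3'}
print(sorted(specific_to(fluids, "plasma")))                 # ['P1', 'P4']
print(jaccard(fluids["plasma"], fluids["urine"]))            # 0.4
```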



LEGER: knowledge database and visualization tool for comparative genomics of pathogenic and non-pathogenic Listeria species


Listeria species are ubiquitous in the environment and often contaminate foods because they grow under conditions used for food preservation. Listeria monocytogenes, a human and animal pathogen, causes listeriosis, an infection with a high mortality rate in risk groups such as immunocompromised individuals. Furthermore, L. monocytogenes is a model organism for the study of intracellular bacterial pathogens. The publication of its genome sequence and that of the non-pathogenic species Listeria innocua initiated numerous comparative studies and efforts to sequence all species of the genus. The proteome database LEGER (http://leger2.gbf.de/cgi-bin/expLeger.pl) was developed to support functional genome analyses by combining information from bioinformatics methods and public databases to improve the original annotations. LEGER offers three unique key features: (i) it is the first comprehensive information system focusing on the functional assignment of genes and proteins; (ii) integrated visualization tools, KEGG pathway and Genome Viewer, ease the functional exploration of complex data; and (iii) LEGER presents results of systematic post-genome studies, thus facilitating analyses that combine computational and experimental results. Moreover, LEGER provides an unpublished membrane proteome analysis of L. innocua and, in total, visualizes experimentally validated information about the subcellular localizations of 789 different listerial proteins.


mESAdb: An miRNA Sequence and Expression Database


mESAdb is a regularly updated database for the multivariate analysis of microRNA sequences and expression across multiple taxa. mESAdb is modular; its user interface is implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written in R. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression datasets. mESAdb analysis modules allow (a) mining of selected microRNA expression datasets for a list of microRNAs; (b) pairwise multivariate analysis of expression datasets within and between taxa; and (c) association of microRNA lists, or of microRNAs carrying a given motif, with the annotation databases HUGE Navigator [1], KEGG [2], and GO [3]. The use of existing and customized R packages facilitates future addition of datasets and analysis tools. Furthermore, the ability to upload and analyze user-specified datasets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.
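
As a much-simplified stand-in for pairwise comparison of two expression datasets, the sketch below correlates them over the microRNAs they share, using Pearson's r. It is not mESAdb's R-based multivariate analysis; the function name, miRNA labels and values are made-up placeholders.

```python
import math


def shared_mirna_correlation(a, b):
    """Pearson correlation of two {miRNA: expression} maps over shared miRNAs."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 3:
        raise ValueError("need at least three shared miRNAs")
    x = [a[m] for m in shared]
    y = [b[m] for m in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile; correlation undefined")
    return shared, cov / (sx * sy)


# Placeholder datasets: only miR-a, miR-b and miR-c appear in both.
dataset_1 = {"miR-a": 1.0, "miR-b": 2.0, "miR-c": 3.0, "miR-x": 5.0}
dataset_2 = {"miR-a": 2.0, "miR-b": 4.0, "miR-c": 6.0, "miR-y": 1.0}
shared, r = shared_mirna_correlation(dataset_1, dataset_2)
print(shared, round(r, 3))  # ['miR-a', 'miR-b', 'miR-c'] 1.0
```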


KineticDB - A Protein Folding Kinetics Database


KineticDB is a thoroughly curated database of protein folding kinetics that currently contains experiments on 87 unique proteins and about a hundred mutants. Its main goal is to provide users with a diverse set of experimentally measured protein folding rates. The database currently contains kinetic data on single-domain proteins, separate protein domains and short polypeptides without S-S bonds in the native structure.
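
Folding rates in compilations like this are reported as a rate constant kf (in 1/s), often on a natural-log scale. Assuming simple two-state folding with a single exponential phase and negligible unfolding (an assumption; KineticDB's exact conventions may differ), the sketch below converts between ln kf, kf, the folding half-time and the fraction folded. The example value is invented.

```python
import math


def kf_from_ln(ln_kf):
    """Rate constant (1/s) from its natural logarithm."""
    return math.exp(ln_kf)


def half_time(kf):
    """Half-time (s) of a single-exponential phase with rate constant kf (1/s)."""
    return math.log(2) / kf


def fraction_folded(kf, t):
    """Fraction folded after t seconds, assuming two-state folding, no unfolding."""
    return 1.0 - math.exp(-kf * t)


kf = kf_from_ln(5.0)  # invented example value, ln kf = 5
print(f"kf = {kf:.1f} 1/s, t1/2 = {half_time(kf) * 1e3:.2f} ms")
# kf = 148.4 1/s, t1/2 = 4.67 ms
```

Under near-equilibrium conditions the observed relaxation rate is kf + ku, so the no-unfolding assumption only holds under strongly native conditions.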

We hope that the database will be useful to you. Help us make it as comprehensive and up-to-date as possible: send us references containing new kinetic data! We are grateful for any contribution from the community, whether bug reports, new kinetic data or questions.

How to cite:
Bogatyreva N.S., Osypov A.A., Ivankov D.N. (2009). KineticDB: a database of protein folding kinetics. NAR, 37:D342-D346.


PSI - A Structural Biology Knowledgebase


With the completion of the sequencing of the genomes of human and other organisms, attention has focused on the characterization and function of proteins, the products of genes. The availability of sequence data and the growing impact of structural biology on biomedical research have prompted scientific groups from several countries to undertake projects in the emerging field of structural genomics.

 

The objective is to make these structures widely available for clinical and basic studies that will expand the knowledge of the role of proteins both in normal biological processes and in disease. The National Institute of General Medical Sciences (NIGMS) played a major role in the early planning for structural genomics and in 1999 organized a national program, the Protein Structure Initiative.

 

The PSI program officially concluded in 2015, having determined nearly 7,000 protein structures, developed ~450 new or improved technologies, and produced over 2,200 publications. This work benefited the broader biological and biomedical research communities.


GWASdb2 - A database for human genetic variants identified by genome-wide association studies


JGI Fungi Portal - 1000 fungal genome project


The Fungal Program scales up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies at the systems level. Encoded in the genomes of the organisms of the kingdom Fungi are biological processes highly relevant to the Department of Energy's missions in bioenergy production, carbon cycling and biogeochemistry. Combining new sequencing technologies and comparative genomics analysis, we work on large and complex sequencing projects such as surveying the broad phylogenetic and ecological diversity of fungi, and capturing genomic variation in natural populations and engineered strains. This approach allows us to build a foundation for translating the genomic potential of fungi into practical applications.


ITS2 - an internal transcribed spacer 2 ribosomal RNA database


Optimal global pairwise alignments of about 270,000 ribosomal RNA (rRNA) internal transcribed spacer 2 (ITS2) sequences, all against all, were generated in order to model ITS2 secondary structures from sequences with known structures. Starting from 60,000 known ITS2 sequences that fit a common core of the ITS2 secondary structure described for eukaryotes (Schultz et al. 2005), homology-based modeling (Wolf et al. 2005) and reannotation procedures revealed more than 150,000 additional homologous structures that could not be predicted by standard RNA folding programs.
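
Two pieces of this procedure can be made concrete: the size of an all-against-all comparison of about 270,000 sequences, and the Needleman-Wunsch dynamic programme that computes an optimal global pairwise alignment score. The scoring values below (match +1, mismatch -1, linear gap -2) are arbitrary assumptions, not the parameters used to build the ITS2 database, and the function names are hypothetical.

```python
def n_pairs(n):
    """Number of unordered pairs in an all-against-all comparison."""
    return n * (n - 1) // 2


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


print(n_pairs(270_000))         # 36449865000 pairwise alignments
print(nw_score("ACGU", "ACU"))  # 1: three matches and one gap
```

Keeping only two rows of the score matrix keeps memory linear in sequence length; recovering the alignment itself (not just its score) would require a traceback over the full matrix.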


Lynx: A Systems Biology Platform for Integrative Medicine


Gene Annotations

Various classes of gene annotations (e.g., functions, pathways, disease associations, tissue expression) from the Lynx Knowledge base, which integrates information from over 40 public databases.

 

Enrichment Analysis
Identification of gene categories over-represented in the user-submitted gene list, based on associations with pathways, Gene Ontology terms, diseases, tissue expression and other features.
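
Over-representation of this kind is commonly assessed with a one-sided hypergeometric test. The sketch assumes that test (Lynx's exact statistics and any multiple-testing correction are not described here); the function names are hypothetical and the gene counts in the example are invented.

```python
from math import comb


def enrichment_p_value(k, N, K, n):
    """One-sided hypergeometric P(X >= k).

    N: genes in the background, K: background genes in the category,
    n: genes in the user list, k: list genes that fall in the category.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k, N, K, n):
    """Observed fraction of list genes in the category over the background fraction."""
    return (k / n) / (K / N)


# Invented example: 20,000 background genes, 200 in a pathway,
# a 100-gene list of which 10 are in the pathway (about 1 expected by chance).
print(fold_enrichment(10, 20_000, 200, 100))
print(enrichment_p_value(10, 20_000, 200, 100))
```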
 
Gene Prioritization
Network-based gene prioritization using the PINTA algorithms (Heat Kernel Ranking, Simple Random Walk, PageRank with Priors, HITS with Priors and K-Step Markov), with STRING-9 as the underlying network.
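
PageRank with Priors ranks network nodes by a random walk that, at each step, jumps back to the seed (prior) genes with probability beta. A minimal power-iteration sketch on a toy four-gene path network follows; the gene names, beta = 0.3 and the handling of dangling nodes are assumptions, and this is neither the PINTA implementation nor STRING data.

```python
def pagerank_with_priors(adj, priors, beta=0.3, max_iter=200, tol=1e-12):
    """Random walk with restart to the prior distribution (power iteration).

    adj: {node: [neighbours]} for an undirected, unweighted network.
    priors: {node: weight}; weights are normalised to sum to 1.
    """
    nodes = list(adj)
    total = sum(priors.get(v, 0.0) for v in nodes)
    prior = {v: priors.get(v, 0.0) / total for v in nodes}
    score = dict(prior)
    for _ in range(max_iter):
        new = {v: beta * prior[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - beta) * score[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
            else:  # dangling node: return its mass to the priors
                for v in nodes:
                    new[v] += (1 - beta) * score[u] * prior[v]
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = pagerank_with_priors(network, {"A": 1.0})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

Genes close to the seed in the network score highest, which is the intuition behind using such walks to prioritize candidate genes.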

LOCATE - A Mammalian Protein Localization Database


LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set. The membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing peer-reviewed publications.
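
MemO itself is not described here, so as a generic illustration of how membrane segments can be flagged computationally, the sketch below uses a different, classic technique: a Kyte-Doolittle hydropathy sliding window. The 19-residue window and the 1.6 threshold are the commonly quoted defaults for that method, not MemO parameters, and the helper names and toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; raises KeyError on non-standard residues."""
    seq = seq.upper()
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


toy = "K" * 10 + "L" * 19 + "K" * 10  # made-up sequence with one hydrophobic stretch
hits = candidate_tm_windows(toy)
print(hits[0], hits[-1])  # 5 15
```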

 

D Wot's curator insight, May 9, 10:02 AM

This could be a useful reference.


PlantProm - Plant Promoter Database


PlantProm DB was initially developed as an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s) (TSS) from various plant species. The first release of the database, 2002.01, developed by the Department of Computer Science at Royal Holloway, University of London, in collaboration with Softberry Inc. (USA), is available at http://mendel.cs.rhul.ac.uk/mendel.php. It contained 305 entries from monocot, dicot and other plants.

The new release of PlantProm DB contains 578 unrelated entries, including 151, 396 and 31 promoters with experimentally verified TSS from monocot, dicot and other plants, respectively. Unlike promoter sets whose TSSs, identified by mapping full-length cDNAs/5'-ESTs or by CAGE and SAGE approaches, remain to be confirmed by direct experimental evidence, this database and The Eukaryotic Promoter Database (134 unrelated plant promoters; see: http://www.epd.isb-sib.ch/) present published promoter sequences with TSS(s) determined by direct experimental approaches. They therefore serve as the most accurate sources for developing computational promoter prediction tools (for example, see: TSSP-TCM, TSSP, FPROM, CONPRO). The following criteria were applied in collecting experimentally verified plant gene promoters.


HIT - a Herbal Ingredients' Targets Database


Herbal Ingredients' Targets Database

 

HIT is a comprehensive and fully curated database that complements available resources on protein targets for FDA-approved drugs and promising precursors. It currently contains about 1,301 known protein targets (221 of which are described as direct targets) derived from more than 3,250 publications, covering about 586 active compounds from more than 1,300 reputable Chinese herbs. The molecular target information covers proteins that are directly or indirectly activated or inhibited, protein binders, and enzymes whose substrates or products are these compounds. Detailed interaction values such as IC50 and Kd/Ki are collected where possible. Genes that are up- or down-regulated under treatment with individual ingredients are also included.
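
IC50 and Ki values are not interchangeable. For a competitive inhibitor, the Cheng-Prusoff relation Ki = IC50 / (1 + [S]/Km) converts one into the other, provided the assay's substrate concentration [S] and the enzyme's Km are known; whether a given HIT record supplies these is not stated here, and the numbers below are invented. Such values are also often compared on a negative-log scale (pIC50, pKi). The function names are hypothetical.

```python
import math


def ki_from_ic50(ic50, substrate_conc, km):
    """Cheng-Prusoff relation for a competitive inhibitor.

    ic50, substrate_conc and km must share the same concentration units.
    """
    return ic50 / (1.0 + substrate_conc / km)


def neg_log10_molar(conc_molar):
    """pIC50 / pKi style value from a concentration in mol/L."""
    return -math.log10(conc_molar)


ki_um = ki_from_ic50(2.0, 10.0, 10.0)  # invented: IC50 = 2 uM, [S] = Km = 10 uM
print(ki_um)                                     # 1.0 (uM)
print(round(neg_log10_molar(ki_um * 1e-6), 2))   # 6.0
```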

D Wot's curator insight, May 9, 10:04 AM

This could be a useful reference for herbs.


dbDEPC - a database of differentially expressed proteins in human cancer


dbDEPC has been updated to version 2.0 and contains more information about differentially expressed proteins (DEPs) in human cancers. dbDEPC currently contains 4,029 DEPs, 20 cancers and 331 MS experiments. A more advanced search function makes querying easier, and a modified profile and a new network function provide a systematic view of DEPs in cancers.


SSU - The Small Subunit rRNA Modification Database


The Small Subunit rRNA Modification Database provides a listing of reported post-transcriptionally modified nucleosides and sequence sites in small subunit rRNAs from bacteria, archaea and eukarya. Data are compiled from reports of full or partial rRNA sequences, including RNase T1 oligonucleotide catalogs reported in earlier literature in studies of phylogenetic relatedness. Options for data presentation include full sequence maps, some of which have been assembled by database curators with the aid of contemporary gene sequence data, and tabular forms organized by source organism or chemical identity of the modification. A total of 32 rRNA sequence alignments are provided, annotated with sites of modification and chemical identities of modifications if known, with provision for scrolling full sequences or user-dictated subsequences for comparative viewing for organisms of interest. The database can be accessed through the World Wide Web at http://medlib.med.utah.edu/SSUmods.
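
RNase T1 cleaves RNA on the 3' side of guanosine residues, so an oligonucleotide catalog of the kind mentioned above is essentially the multiset of G-terminated fragments. The sketch treats every G as cleavable and ignores nucleoside modifications, which a real catalog must account for; the helper names are hypothetical.

```python
from collections import Counter


def rnase_t1_fragments(rna):
    """Split an RNA sequence after every G, mimicking complete RNase T1 digestion."""
    fragments, current = [], []
    for nt in rna.upper():
        current.append(nt)
        if nt == "G":
            fragments.append("".join(current))
            current = []
    if current:  # 3'-terminal fragment that does not end in G
        fragments.append("".join(current))
    return fragments


def oligo_catalog(rna, min_len=1):
    """Count distinct T1 fragments of at least min_len nucleotides."""
    return Counter(f for f in rnase_t1_fragments(rna) if len(f) >= min_len)


print(rnase_t1_fragments("AUGCCGAGU"))  # ['AUG', 'CCG', 'AG', 'U']
print(oligo_catalog("GAGAG"))
```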


BSRD: A Bacterial Small Regulatory RNA Database


In bacteria, small (~30-500 nt) non-coding RNAs (sRNAs) are the most abundant class of post-transcriptional regulators and are involved in diverse processes including quorum sensing, stress response, virulence and carbon metabolism. Based on their target molecules, sRNAs can be divided into two major groups: (i) mRNA-binding antisense sRNAs and (ii) protein-binding sRNAs. Antisense sRNAs can be further categorized as cis-encoded antisense sRNAs, which are completely complementary to their targets, and trans-encoded antisense sRNAs, which are only partially complementary to their targets. In either case, the interaction between antisense RNAs and target mRNAs can direct a plethora of biological regulatory circuits. Recent developments in high-throughput techniques, such as genomic tiling arrays and RNA-Seq, have provided invaluable insights into the detection and characterization of bacterial sRNAs. However, a comprehensive bacterial sRNA database, especially one that integrates and analyzes high-throughput sequencing data, is not yet available.
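
The cis/trans distinction above can be expressed as the fraction of complementary positions when an sRNA lies antiparallel on its target: 1.0 for perfect (cis-encoded) complementarity, lower for partial (trans-encoded) pairing. This ungapped Watson-Crick count is a deliberate simplification; target predictors such as IntaRNA and RNAplex instead model gapped, energy-based hybridization. The helper names and sequences are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(PAIR[nt] for nt in reversed(rna.upper()))


def antiparallel_complementarity(srna, target):
    """Fraction of Watson-Crick pairs when srna (5'->3') lies antiparallel,
    without gaps, on an equal-length target segment (5'->3').
    G-U wobble pairs count as mismatches in this simplification."""
    if len(srna) != len(target):
        raise ValueError("sequences must have equal length")
    rc = reverse_complement(target)
    return sum(a == b for a, b in zip(srna.upper(), rc)) / len(srna)


print(antiparallel_complementarity("GCAU", "AUGC"))  # 1.0, fully complementary
print(antiparallel_complementarity("GCAA", "AUGC"))  # 0.75, partially complementary
```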

 

Here, we have designed and constructed BSRD (Bacterial Small regulatory RNA Database) which hosts sRNAs collected from over 783 bacterial species and 957 strains.

 

The distinctive features of BSRD are:

    (1) BSRD hosts sRNAs retrieved from online databases including Rfam, sRNAMap, GenBank, RegulonDB and EcoCyc, as well as from manual curation. In addition, we have integrated 20,115 regulatory elements into BSRD.

    (2) BSRD collects sRNA targets predicted by the computational algorithms IntaRNA and RNAplex, as well as experimentally validated sRNA targets from sRNATarBase and the related literature.

    (3) BSRD includes information on regulatory relationships between transcription factors (TFs) and their target genes, which could provide insights into the combinatorial regulation of common targets by sRNAs and TFs.

    (4) BSRD has integrated expression data from NCBI GEO (Gene Expression Omnibus), which provides detailed evidence for sRNA expression profiling and re-annotation.

    (5) BSRD includes multiple new sRNA annotations from manually curated literature mining, including growth phase, Hfq binding, dual function and Rho-independent terminators.

    (6) BSRD harbors a novel RNA-Seq analysis platform, sRNADeep, that performs comprehensive sRNA expression profiling and differential expression analysis for large-scale transcriptome sequencing projects. With sRNADeep, users can (i) filter low-quality reads and adapters from raw sequencing data, and (ii) align large numbers of short reads to BSRD to identify known sRNAs.

    (7) BSRD has implemented a Wikipedia-based community annotation function.

    (8) BSRD has a user-friendly interface, a flexible search option and a BLAST server for sequence homology searching.
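
The read-cleaning step described in (6) can be sketched generically: trim a 3' adapter (full or partial) and drop reads that end up too short or of low mean quality. This is not sRNADeep's code; the adapter sequence, the Phred+33 quality encoding, the 15-nt minimum length and the mean-quality cutoff of 20 are all assumptions, and the function names are hypothetical.

```python
def trim_adapter(read, qual, adapter, min_overlap=3):
    """Remove a 3' adapter, either complete or as a partial prefix at the read end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx], qual[:idx]
    # Partial adapter: longest adapter prefix (>= min_overlap) that ends the read.
    # Short overlaps can match by chance, so this may occasionally over-trim.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k], qual[:-k]
    return read, qual


def mean_phred33(qual):
    """Mean Phred score of a Phred+33 encoded quality string."""
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_read(read, qual, adapter, min_len=15, min_mean_q=20):
    """Return the trimmed read if it passes length and quality filters, else None."""
    read, qual = trim_adapter(read, qual, adapter)
    if len(read) < min_len or mean_phred33(qual) < min_mean_q:
        return None
    return read


ADAPTER = "TGGAATTCTCGG"  # hypothetical example adapter
raw = "ACGTACGTACGTACGTAC" + "TGGAATTC"  # read ending in a partial adapter
print(filter_read(raw, "I" * len(raw), ADAPTER))  # ACGTACGTACGTACGTAC
print(filter_read(raw, "#" * len(raw), ADAPTER))  # None (mean Q = 2)
```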

 

If you want to download all sRNA sequences in BSRD, a quick-download link is provided.

 

Before getting started, we suggest reading the full help document for a quick guide to the database.

KNOTTIN: A database for intriguing miniproteins with strong potential in drug design


The KNOTTIN database provides standardized data on the knottin structural family (also referred to as the "Inhibitor Cystine Knot (ICK) motif/family/fold").




Comments, suggestions or supplementary data are welcome.
When using this website and database, please cite: J. Gracy et al. Nucleic Acids Res. 2008, 36:D314-9, and J.-C. Gelly et al. Nucleic Acids Res. 2004, 32:D156-9.


KBDOCK - a protein domain-domain interaction database


KBDOCK is a 3D database system that defines and spatially clusters protein binding sites for knowledge-based protein docking. KBDOCK extracts protein domain-domain interaction (DDI) and domain-peptide interaction (DPI) information from the PDB using the Pfam domain classification in order to analyse the spatial arrangements of DDIs and DPIs by Pfam family, and to propose structural templates for protein docking.

 

Given a query Pfam domain, KBDOCK shows:

  • a non-redundant list of DDIs involving the query domain, grouped by their binding site.
  • a Jmol view showing the DDIs placed in the coordinate frame of the query domain. You can choose to view one DDI at a time or a group of DDIs which share a similar binding site according to our binding site direction vector algorithm (see references).
  • a consensus sequence alignment of the domains. Each sequence is annotated with core and rim interacting residues, and also the binding site centre residue.

 

Given a query Pfam domain, KBDOCK can also show DDIs involving different but structurally similar Pfam domains. For each Pfam family, KBDOCK stores a list of "Pfam neighbours", calculated using the Kpax structure alignment algorithm. Given two query domain structures, if full-homology (FH) DDI templates exist in KBDOCK, then for each distinct interface KBDOCK provides:

  • the best FH template (based on overall sequence identity) and the corresponding docking model.
  • a Jmol view of the query domains superposed onto the FH template.
  • a pairwise sequence alignment of the query domains and their corresponding template domain, showing the core, rim, and centre binding site residues.

 

If FH DDIs do not exist, or if only one query domain structure is given, KBDOCK finds semi-homology (SH) DDI templates. For each query domain and for each distinct binding site, it provides:

  • the best SH template (based on domain sequence identity).
  • a Jmol view of the query domain superposed onto the SH template. A centre binding site residue is proposed.
  • a pairwise sequence alignment of the query domain and its corresponding template domain, showing the core, rim, and centre binding site residues.

 

KBDOCK can also find DDIs involving structurally similar Pfam domains to the query domains using pre-calculated Pfam neighbour lists.
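
Choosing the best template by sequence identity, as in the FH and SH steps above, can be sketched as follows. Identity is computed here over aligned columns where neither sequence has a gap, which is only one of several common definitions and not necessarily KBDOCK's; the template IDs, alignments and function names are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def best_template(alignments):
    """alignments: {template_id: (aligned_query, aligned_template)}."""
    return max(alignments, key=lambda t: percent_identity(*alignments[t]))


candidates = {
    "template_1": ("ACDE-G", "ACDF-G"),  # 4 of 5 ungapped columns identical
    "template_2": ("ACDEG", "ACDEG"),    # identical
}
print(percent_identity(*candidates["template_1"]))  # 80.0
print(best_template(candidates))                    # template_2
```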


NRED - an ncRNA Expression Database


NRED integrates annotated expression data from various sources. Its search form filters expression results by probe characteristics and/or expression values; if no experimental result set is selected, the form can also be used to search the probe table. All search fields are optional, and hovering the mouse over a form label shows help and a description of that field. To reference NRED, please cite Dinger et al., 2008, Nucleic Acids Res.


dbPSHP: a database of recent positive selection across human populations


Welcome to dbPSHP (http://jjwanglab.org/dbpshp) (hg19/GRCh37). The database contains curated publications on positive selection in different human populations, covering over 15,000 loci drawn either from publications that study positively selected genomic loci and genes related to specific functions, traits or diseases, or from publications that detect genome-wide selection signals with various statistical methods.

It also includes 15 statistics for each single nucleotide polymorphism (SNP) site from the HapMap III and 1000 Genomes Project genotyping data. These attributes include variant allele frequency, variant heterozygosity, within-population diversity, haplotype homozygosity, long-range haplotypes, pairwise population differentiation and evolutionary conservation. We also provide interactive pages for visualizing and annotating different selection signals.
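
Two of the statistics listed above have simple textbook forms: expected heterozygosity 2p(1 - p) for a biallelic SNP with allele frequency p, and pairwise population differentiation, estimated here as Nei's G_ST = (H_T - H_S) / H_T with equal population weights. dbPSHP may use other estimators (for example Weir and Cockerham's F_ST), so this is an illustration only; the function names are hypothetical and the frequencies made up.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic site."""
    return 2.0 * p * (1.0 - p)


def pairwise_gst(p1, p2):
    """Nei's G_ST for two equally weighted populations with allele frequencies p1, p2."""
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(pairwise_gst(0.5, 0.5))            # 0.0, no differentiation
print(pairwise_gst(0.0, 1.0))            # 1.0, fixed for different alleles
print(round(pairwise_gst(0.2, 0.8), 3))  # 0.36
```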


JASPAR - a database for defined transcription factor binding sites for eukaryotes


The JASPAR CORE database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. The main differences from similar resources (TRANSFAC, etc.) are its open data access, non-redundancy and quality.

 

When should it be used? When seeking models for specific factors or structural classes, or when experimental evidence is paramount.
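
JASPAR profiles are distributed as position frequency matrices (base counts per position). A common way to use one is to convert it into a log-odds position weight matrix and scan a sequence with it; the sketch below does this with a uniform background and a pseudocount of 1, both arbitrary choices, on an invented three-position matrix rather than a real JASPAR profile. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=None):
    """Convert a position frequency matrix {base: [counts per position]} into
    log2-odds weights against a background base distribution."""
    background = background or {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        column_total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            prob = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            pwm[b].append(math.log2(prob / background[b]))
    return pwm


def best_hit(pwm, seq):
    """(start, score) of the best window on the forward strand only.
    Raises KeyError on bases other than A, C, G, T."""
    width = len(pwm["A"])
    seq = seq.upper()
    scores = [
        (i, sum(pwm[seq[i + j]][j] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]
    return max(scores, key=lambda item: item[1])


toy_pfm = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}  # invented counts
print(best_hit(pfm_to_pwm(toy_pfm), "TTACGTT")[0])  # 2
```

Scanning the reverse complement and converting scores to a relative scale are common next steps, omitted here.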
