Scooped by Dr. Stefan Gruenwald
onto bioinformatics-databases!

MGI: The international database resource for the laboratory mouse

MGI: the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data for researching human health and disease.

Software packages for next gen sequence analysis



Integrated solutions
* CLCbio Genomics Workbench - de novo and reference assembly of Sanger, Roche FLX, Illumina, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, ChIP-seq, browser and other features. Windows, Mac OS X and Linux.
* Galaxy - Interactive and reproducible genomics. A web portal for running analysis jobs.
* Genomatix - Integrated Solutions for Next Generation Sequencing data analysis.
* JMP Genomics - Next gen visualization and statistics tool from SAS. They are working with NCGR to refine this tool and produce others.
* NextGENe - de novo and reference assembly of Illumina, SOLiD and Roche FLX data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Win or MacOS.
* SeqMan Genome Analyser - Software for Next Generation sequence assembly of Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Commercial. Win or Mac OS X.
* SHORE - SHORE, for Short Read, is a mapping and analysis pipeline for short DNA sequences produced on an Illumina Genome Analyzer. A suite created by the 1001 Genomes project. Source for POSIX.
* SlimSearch - Fledgling commercial product.

Align/Assemble to a reference
* BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
* Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
* BWA - Heng Li's BWT Alignment program - a progression from Maq. BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C source.
* ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
* Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
* GenomeMapper - GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
* GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
* gnumap - The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
* MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source
* MOSAIK - MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX
* MrFAST and MrsFAST - mrFAST & mrsFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies; in a fast and memory-efficient manner. Robust to INDELs and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C as source.
* MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
* Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
* PASS - Supports Illumina, SOLiD and Roche-FLX data formats and allows the user to tune the sensitivity of the alignments very finely. Spaced seed initial filter, then NW dynamic algorithm to a SW(like) local alignment. Authors are from CRIBI in Italy. Win/Linux.
* RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
* SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's.
* SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystem's colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
* Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC.
* SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
* SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
* SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
* SWIFT - The SWIFT suite is a software collection for fast index-based sequence comparison. It contains: SWIFT, a fast local alignment search guaranteed to find epsilon-matches between two sequences; and SWIFT BALSAM, a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
* SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
* Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
* Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, produced by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
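
Several aligners in the list above (Bowtie, BWA, the updated SOAP) are described as using a Burrows-Wheeler transform (BWT) index. The sketch below illustrates the general idea only: it builds the BWT by naively sorting rotations and counts exact pattern matches with backward search. It is not the implementation of any of those tools; the function names and the `$` sentinel are assumptions made for this example, and real aligners use far more compact index structures.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact matches of pattern in the original text via backward search."""
    # first_row[c] = number of characters in the text that sort before c
    first_row = {}
    total = 0
    for c in sorted(set(bwt_text)):
        first_row[c] = total
        total += bwt_text.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_text[:i]; linear here, precomputed in real indexes
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences(bwt("banana"), "ana")` counts the two overlapping occurrences of `ana` without scanning the original text.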

De novo Align/Assemble
* ABySS - Assembly By Short Sequences. ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. By Simpson JT and others at Canada's Michael Smith Genome Sciences Centre. C++ as source.
* ALLPATHS - ALLPATHS: De novo assembly of whole-genome shotgun microreads. ALLPATHS is a whole genome shotgun assembler that can generate high quality assemblies from short reads. Assemblies are presented in a graph form that retains ambiguities, such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. Broad Institute.
* Edena - Edena (Exact DE Novo Assembler) is an assembler dedicated to process the millions of very short reads produced by the Illumina Genome Analyzer. Edena is based on the traditional overlap layout paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel. Linux/Win.
* EULER-SR - Short read de novo assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research). Uses a de Bruijn graph approach.
* MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
* SEQAN - A Consistency-based Consensus Algorithm for De Novo and Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch and others. C++, Linux/Win.
* SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
* SSAKE - The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
* SOAPdenovo - Part of the SOAP suite. See above.
* VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
* Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Need about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).
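
EULER-SR is described above as using a de Bruijn graph approach. As a rough illustration of that idea (not the code of EULER-SR or any other assembler listed), the sketch below builds a de Bruijn graph from reads and extends a contig while the path is unambiguous. The function names are hypothetical, and the walk checks only the number of successors, ignoring sequencing errors, reverse complements and in-degree.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop on cycles
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig
```

With the overlapping toy reads `["ATGGC", "GGCGT", "CGTGA"]` and `k=4`, walking from `"ATG"` reconstructs `ATGGCGTGA`.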

SNP/Indel Discovery
* ssahaSNP - ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac
* PolyBayesShort - A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.
* PyroBayes - PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.
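
The ssahaSNP entry above mentions filtering out highly repetitive elements by ignoring k-mer words with high occurrence numbers. A minimal sketch of that kind of filter follows; the function names and the `max_occurrences` threshold are assumptions for illustration, not ssahaSNP's actual parameters.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def informative_kmers(sequence, k, max_occurrences):
    """Return the k-mers that occur at most max_occurrences times."""
    counts = kmer_counts(sequence, k)
    return {kmer for kmer, n in counts.items() if n <= max_occurrences}
```

Only the k-mers that survive the filter would then be used as seeds for alignment, so repeats do not generate large numbers of spurious hits.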

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
* EagleView - An information-rich genome assembler viewer. EagleView can display a dozen different types of information including base quality and flowgram signal. Developers at Boston College.
* LookSeq - LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data. LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualization of putative single nucleotide and structural variation. From the Sanger Centre.
* MapView - MapView: visualization of short reads alignment on desktop computer. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China. Linux.
* SAM - Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.
* STADEN - Includes GAP4. GAP5, once completed, will handle next-gen sequencing data. A partially implemented test version is available.
* XMatchView - A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.

Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq
* BS-Seq - The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.
* CHiPSeq - Program used by Johnson et al. (2007) in their Science publication
* CNV-Seq - CNV-seq, a new method to detect copy number variation using high-throughput sequencing. Chao Xie and Martti T Tammi at the National University of Singapore. Perl/R.
* FindPeaks - Performs analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. Original algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and implementation by Anthony Fejes. Authors are from Canada's Michael Smith Genome Sciences Centre. Java/OS independent. Latest versions are available as part of the Vancouver Short Read Analysis Package.
* MACS - Model-based Analysis for ChIP-Seq. MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. Written by Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.
* PeakSeq - PeakSeq: Systematic Scoring of ChIP-Seq Experiments Relative to Controls. A two-pass approach for scoring ChIP-Seq data relative to controls. The first pass identifies putative binding sites and compensates for variation in the mappability of sequences across the genome. The second pass filters out sites that are not significantly enriched compared to the normalized input DNA and computes a precise enrichment and significance. By Rozowsky J et al. C/Perl.
* QuEST - Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. (C++)
* SISSRs - Site Identification from Short Sequence Reads. BED file input. Raja Jothi @ NIH. Perl.
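
FindPeaks is described above as using a naive algorithm that identifies regions of high coverage. The sketch below shows one simple way to do that: compute per-base read depth, then report intervals where depth meets a threshold. It is an illustration under stated assumptions (fixed read length, 0-based non-negative start positions, hypothetical function names and `min_depth` parameter), not the FindPeaks implementation.

```python
def coverage(read_starts, read_length, genome_length):
    """Per-base read depth for reads of a fixed length at 0-based start positions."""
    depth = [0] * genome_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth


def call_peaks(depth, min_depth):
    """Return half-open (start, end) intervals where depth >= min_depth."""
    peaks = []
    start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:  # peak runs to the end of the sequence
        peaks.append((start, len(depth)))
    return peaks
```

Tools such as MACS and PeakSeq go further by modelling background and controls; this sketch covers only the thresholding step.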

Alternate Base Calling
* Rolexa - R-based framework for base calling of Solexa data.
* Alta-cyclic - "a novel Illumina Genome-Analyzer (Solexa) base caller"

Transcriptomics
* ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports Bowtie, BLAT and ELAND. From the Wold lab.
* G-Mo.R-Se - G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads. From CNS in France.
* MapNext - MapNext: A software tool for spliced and unspliced alignments and SNP detection of short sequence reads. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China.
* QPalma - Optimal Spliced Alignments of Short Sequence Reads. Authors are Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar Rätsch. A paper is available.
* RSAT - RSAT: RNA-Seq Analysis Tools. RNASAT is developed and maintained by Hui Jiang at Stanford University.
* TopHat - TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland and the University of California, Berkeley


uORFdb — a comprehensive literature database on eukaryotic uORF biology


Translation initiation sites that precede the main coding sequence (CDS) of a transcript give rise to upstream open reading frames (uORFs). The presence of uORFs affects initiation rates at the CDS by interfering with unrestrained progression of ribosomes across the 5´-transcript leader sequence.
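
To make the definition concrete, the sketch below scans a transcript for ATG codons that lie entirely upstream of the main CDS start and reports each one that has an in-frame stop codon. It is a simplified illustration, not uORFdb's method: the function name and the 0-based `cds_start` argument are assumptions, only AUG/ATG starts are considered, and start codons without an in-frame stop are skipped.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript, cds_start):
    """Return (start, end) pairs for ORFs whose ATG lies upstream of cds_start.

    Coordinates are 0-based; end is the index just past the stop codon.
    """
    seq = transcript.upper()
    uorfs = []
    # The ATG must end before the CDS begins, so the last valid start is cds_start - 3
    for start in range(0, cds_start - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

For a toy transcript `GGATGAAATAGCCATGGCTTAA` with the main CDS starting at index 13, this returns a single uORF spanning positions 2 to 11.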


FungiDB: The Fungal and Oomycete Genomics Resource


FungiDB provides functional genomics of fungi.


Lynx : A Systems Biology Platform for Integrative Medicine


Gene Annotations

Various classes of gene annotations (e.g. functions, pathways, disease associations, tissue expression, etc.) from the Lynx Knowledge base, which integrates information from over 40 public databases.


Enrichment Analysis

Identification of gene categories over-represented in the user-submitted gene list, based on associations with pathways, Gene Ontology terms, diseases, tissue expression and other features.

Gene Prioritization

Network-based gene prioritization using PINTA algorithms: Heat Kernel Ranking, Simple Random Walk, PageRank with Priors, HITS with Priors and K-Step Markov, with STRING-9 as the underlying network.
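
The source does not say which statistic Lynx uses for enrichment analysis. A common choice for testing whether a category is over-represented in a gene list is the one-sided hypergeometric test, sketched below as an assumption rather than a description of Lynx itself; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` belong to the category."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)
```

A small p-value suggests the category appears in the submitted list more often than expected by chance; in practice the values would also be corrected for testing many categories at once.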

LOCATE - A Mammalian Protein Localization Database


LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set. The membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing peer-reviewed publications.


D Wot's curator insight, May 9, 2016 10:02 AM

This could be a useful reference.


PlantProm - Plant Promoter Database


PlantProm DB was initially developed as an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release of DB, 2002.01, developed by the Department of Computer Science at Royal Holloway, University of London, in collaboration with Softberry Inc. (USA), is available at . It contained 305 entries from monocot, dicot and other plants.

The new release of PlantProm DB contains 578 unrelated entries including 151, 396 and 31 promoters with experimentally verified TSS from monocot, dicot and other plants, respectively. In comparison with promoter sets where TSSs identified by full-length cDNA/5'-EST mapping, CAGE and SAGE approaches remain to be confirmed by direct experimental evidence, this DB and The Eukaryotic Promoter Database (134 unrelated plant promoters; see: ) present the published promoter sequences with TSS(s) determined by direct experimental approaches and therefore serve as the most accurate sources for development of computational promoter prediction tools (for example, see: TSSP-TCM, TSSP, FPROM, CONPRO). For collecting experimentally verified plant gene promoters the following criteria were followed.


HIT - a Herbal Ingredients' Targets Database


Herbal Ingredients' Targets Database


HIT is a comprehensive and fully curated database to complement available resources on protein targets for FDA-approved drugs as well as the promising precursors. It currently contains about 1,301 known protein targets (221 proteins are described as direct targets) derived from more than 3,250 publications, covering about 586 active compounds from more than 1,300 reputable Chinese herbs. The molecular target information involves those proteins being directly/indirectly activated or inhibited, protein binders, and enzymes whose substrates or products are those compounds. Detailed interaction values such as IC50 and Kd/Ki are collected where available. Genes that are up- or down-regulated under treatment with individual ingredients are also included.

D Wot's curator insight, May 9, 2016 10:04 AM

This could be a useful reference for herbs.


dbDEPC - a database of differentially expressed proteins in human cancer


dbDEPC is updated to version 2.0 and contains more information about differentially expressed proteins (DEPs) in human cancers. Currently dbDEPC contains 4029 DEPs, 20 cancers and 331 MS experiments. More advanced search function makes your query easier. Modified profile and new network function provide you a systematic view of DEPs in cancers.


SSU - The Small Subunit rRNA Modification Database


The Small Subunit rRNA Modification Database provides a listing of reported post-transcriptionally modified nucleosides and sequence sites in small subunit rRNAs from bacteria, archaea and eukarya. Data are compiled from reports of full or partial rRNA sequences, including RNase T1 oligonucleotide catalogs reported in earlier literature in studies of phylogenetic relatedness. Options for data presentation include full sequence maps, some of which have been assembled by database curators with the aid of contemporary gene sequence data, and tabular forms organized by source organism or chemical identity of the modification. A total of 32 rRNA sequence alignments are provided, annotated with sites of modification and chemical identities of modifications if known, with provision for scrolling full sequences or user-dictated subsequences for comparative viewing for organisms of interest. The database can be accessed through the World Wide Web at


BSRD: A Bacterial Small Regulatory RNA Database


In bacteria, small (~30-500 nt) non-coding RNAs (sRNAs) are the most abundant class of post-transcriptional regulators that are involved in diverse processes including quorum sensing, stress response, virulence and carbon metabolism. Based on the target molecules, sRNAs can be divided into two major groups: (i) mRNA-binding antisense sRNAs and (ii) protein-binding sRNAs. The antisense RNAs can further be categorized as cis-encoded antisense sRNAs, which are completely complementary to their targets, and trans-encoded antisense sRNAs, which are only partially complementary to their targets. In any case, the interaction between antisense RNAs and target mRNAs could direct a plethora of biological regulatory circuits. Recent developments in high-throughput techniques, such as genomic tiling arrays and RNA-Seq have provided invaluable insights into the detection and characterization of bacterial sRNAs. However, a comprehensive bacterial sRNA database is not yet available, especially for integrating and analyzing high-throughput sequencing data.


Here, we have designed and constructed BSRD (Bacterial Small regulatory RNA Database) which hosts sRNAs collected from over 783 bacterial species and 957 strains.


The distinctive features of BSRD are:

    (1) BSRD hosts sRNAs retrieved from online databases including Rfam, sRNAMap, GenBank, RegulonDB and EcoCyc, as well as manual curation. In addition, we have also integrated 20,115 regulatory elements in BSRD.

    (2) BSRD collects sRNA targets predicted by the computational algorithms IntaRNA and RNAplex, as well as experimentally validated sRNA targets from sRNATarBase and related literature.

    (3) BSRD includes information on regulatory relationships between transcription factors (TF) and their target genes, which could provide insights into the combinatorial regulations of sRNAs and TF to their common targets.

    (4) BSRD has integrated expression data from NCBI GEO (Gene Expression Omnibus), which provides detailed evidence for sRNA expression profiling and re-annotation.

    (5) BSRD includes multiple new sRNA annotations from manually curated literature mining, including growth phase, Hfq binding, dual function and Rho-independent terminators.

    (6) BSRD harbors a novel RNA-Seq analysis platform, sRNADeep, that allows users to perform comprehensive sRNA expression profiling and differential expression analysis in large-scale transcriptome sequencing projects. With the aid of sRNADeep, users can (i) filter low-quality reads and adaptors from raw sequencing data, and (ii) align large numbers of short reads to BSRD for identification of known sRNAs.

    (7) BSRD has implemented a Wikipedia-based community annotation function.

    (8) BSRD has a user-friendly interface, a flexible search option and a BLAST server for sequence homology searching.
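
Feature (6) mentions filtering low-quality reads and adaptors from raw sequencing data. The sketch below shows what such a preprocessing step can look like in general. It is not sRNADeep's code: the function names, the exact-match adapter search, the Phred+33 quality encoding and the default thresholds are all assumptions chosen for illustration.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def filter_reads(reads, adapter, min_mean_quality=20, min_length=15):
    """Trim adapters, then drop reads that are too short or too low in quality.

    reads is an iterable of (sequence, quality_string) pairs;
    min_length should be at least 1 so empty reads are never scored.
    """
    kept = []
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_length:
            continue
        if mean_quality(qual) < min_mean_quality:
            continue
        kept.append((seq, qual))
    return kept
```

Production tools usually tolerate mismatches and partial adapters at the read end; the exact-match search here keeps the example short.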


If you want to download all sRNA sequences in BSRD, you can access this link to get a quick download.


Before getting started, we suggest reading the full help document for a quick guide to the database.

KNOTTIN: A database for intriguing miniproteins with strong potential in drug design


The KNOTTIN database provides standardized data on the knottin structural family (also referred to as the "Inhibitor Cystine Knot (ICK) motif/family/fold").


Comments, suggestions or supplementary data are welcome.
When using this website and database, please cite: J. Gracy et al. Nucleic Acids Res. 2008, 36:D314-9, and J.-C. Gelly et al. Nucleic Acids Res. 2004, 32:D156-9.


KBDOCK - a protein domain-domain interaction database


KBDOCK is a 3D database system that defines and spatially clusters protein binding sites for knowledge-based protein docking. KBDOCK extracts protein domain-domain interaction (DDI) and domain-peptide interaction (DPI) information from the PDB using the PFAM domain classification in order to analyse the spatial arrangements of DDIs and DPIs by Pfam family, and to propose structural templates for protein docking.


Given a query Pfam domain, KBDOCK shows:

  • a non-redundant list of DDIs involving the query domain, grouped by their binding site.
  • a Jmol view showing the DDIs placed in the coordinate frame of the query domain. You can choose to view one DDI at a time or a group of DDIs which share a similar binding site according to our binding site direction vector algorithm (see references).
  • a consensus sequence alignment of the domains. Each sequence is annotated with core and rim interacting residues, and also the binding site centre residue.


Given a query Pfam domain, KBDOCK can also show DDIs involving different but structurally similar Pfam domains. For each Pfam family, KBDOCK stores a list of "Pfam neighbours" which is calculated using the Kpax structure alignment algorithm. Given two query domain structures, if full-homology (FH)† DDI templates exist in KBDOCK, then for each distinct interface, KBDOCK provides:

  • the best FH template (based on overall sequence identity) and the corresponding docking model.
  • a Jmol view of the query domains superposed onto the FH template.
  • a pairwise sequence alignment of the query domains and their corresponding template domain, showing the core, rim, and centre binding site residues.


If FH DDIs do not exist or if only one query domain structure is given, KBDOCK finds semi-homology (SH)† DDI templates. For each query domain and for each distinct binding site, it provides:

  • the best SH template (based on domain sequence identity).
  • a Jmol view of the query domain superposed onto the SH template. A centre binding site residue is proposed.
  • a pairwise sequence alignment of the query domain and its corresponding template domain, showing the core, rim, and centre binding site residues.


KBDOCK can also find DDIs involving structurally similar Pfam domains to the query domains using pre-calculated Pfam neighbour lists.


NRED - an ncRNA Expression Database


NRED integrates annotated expression data from various sources. Use this form to filter expression results data based on probe characteristics and/or the values of the expression data. If no experimental result set is selected, the form can also be used to search the probe table. All search fields are optional. For help and descriptions of the different fields, simply hover your mouse over the form labels. To reference NRED, please cite Dinger et al., 2008, Nucleic Acids Res.


Bioinformatics Links Directory


The Bioinformatics Links Directory features curated links to molecular resources, tools and databases. The links listed in this directory are selected on the basis of recommendations from bioinformatics experts in the field. Starting in 2003, all links contained in the NAR Webserver issue are included.


Oomycetes Transcriptomics Database


Oomycetes Transcriptomics Database (OTD) is an integrated transcriptome and EST data resource for oomycete pathogens. The database currently stores processed ABI SOLiD transcript sequences from Phytophthora sojae mycelial and plant infection libraries, as well as Illumina transcript sequences from five Hyaloperonospora arabidopsidis libraries. It also holds a complete set of Sanger EST sequences from P. sojae, P. infestans and H. arabidopsidis grown under various conditions. A new web-based transcriptome browser visualizes assembled transcripts, their mapping to the reference genome, expression profiles and the depth of read coverage at a particular location on the genome. The browser merges EST-derived contigs with NGS-derived assembled transcripts on the fly and displays the consensus. OTD offers strong query features and interacts with the VBI Microbial Database as well as the Phytophthora Transcriptomics Database.

The legacy EST data of P. sojae come from 10 libraries (sHA, sHB, sMA, sML, sMY, sMC, sZG, sZO, sZS and iMY) with a total of 33,350 raw sequences. In addition, there are 99,320 P. infestans EST sequences from NCBI, for which sequence-cleaning information is not available. We clustered and assembled these sequences and performed the data analysis ourselves. We separated soybean ESTs from the P. sojae libraries using in silico methods. The soybean libraries are named gHA and gHB according to their origin in sHA or sHB.


M3D: A Many Microbe Microarrays Database


Expression compendia in M3D are versioned to indicate the underlying MySQL schema of the database and to denote the particular set of expression data (e.g. E_coli_v4_Build_6 uses MySQL schema version 4 and is the sixth compendium built for E. coli). Builds are maintained in perpetuity, allowing researchers to specify the exact dataset used for a particular analysis.
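
The naming scheme can be unpacked programmatically when an analysis script needs to record which dataset it used. The sketch below is an illustration only: it assumes every compendium name follows the `<organism>_v<schema>_Build_<build>` pattern seen in the single example above, and the names `CompendiumVersion` and `parse_compendium_name` are hypothetical, not part of any M3D tool.

```python
import re
from typing import NamedTuple


class CompendiumVersion(NamedTuple):
    organism: str        # e.g. "E_coli"
    schema_version: int  # database schema version
    build: int           # sequential build number for this organism


# Assumed pattern, inferred from the one example "E_coli_v4_Build_6".
_NAME_PATTERN = re.compile(
    r"^(?P<organism>.+)_v(?P<schema>\d+)_Build_(?P<build>\d+)$"
)


def parse_compendium_name(name: str) -> CompendiumVersion:
    """Split a compendium name into organism, schema version and build."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized compendium name: {name!r}")
    return CompendiumVersion(
        organism=match["organism"],
        schema_version=int(match["schema"]),
        build=int(match["build"]),
    )


version = parse_compendium_name("E_coli_v4_Build_6")
print(version.organism, version.schema_version, version.build)
```

Storing the parsed fields next to analysis results makes it easy to state the exact build a result was computed from.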

Citing M3D

Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, and Gardner TS. Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Research 2008



SpliceDB - canonical and non-canonical splice site sequences in mammalian genes

A set of 43,337 splice junction pairs was extracted from mammalian GenBank annotated genes; 22,489 of them are supported by EST sequences. Of the EST-supported pairs, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively, and 0.56% hold non-canonical GC-AG splice site pairs. The remaining 0.73% fall into many small groups (each with a maximum size of 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases, we present a new classification consisting of only 8 observed types of splice site pairs (out of 256 a priori possible combinations).
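
The figures above can be checked with a little arithmetic. The sketch below is not SpliceDB code: it enumerates the 16 x 16 = 256 possible donor/acceptor dinucleotide pairs and converts the reported percentages into approximate counts out of the 22,489 EST-supported pairs. Because the percentages are rounded in the source, the derived counts are estimates. All names (`classify_pair`, `approx_counts`, and so on) are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# All 16 dinucleotides, then all donor/acceptor combinations.
dinucleotides = ["".join(p) for p in product(BASES, repeat=2)]
all_pairs = [(donor, acceptor) for donor in dinucleotides for acceptor in dinucleotides]


def classify_pair(donor: str, acceptor: str) -> str:
    """Label a donor/acceptor dinucleotide pair using the groups named above."""
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-canonical GC-AG"
    return "other"


EST_SUPPORTED = 22489  # EST-supported junction pairs reported in the text

# Percentages as reported in the text (rounded to two decimals there).
reported_percent = {"GT-AG": 98.71, "GC-AG": 0.56, "other": 0.73}

# Approximate counts implied by the rounded percentages.
approx_counts = {
    group: round(EST_SUPPORTED * pct / 100) for group, pct in reported_percent.items()
}

print(len(all_pairs), "possible donor/acceptor combinations")
for group, count in approx_counts.items():
    print(f"{group}: about {count} pairs")
```
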

L1Resources - Database for Full-Length, Intact L1 Elements


L1Resources Database


This web portal is dedicated to putatively active LINE-1 elements.


LINE-1 elements (Long Interspersed Nucleotide Element 1, L1) are the only autonomous retrotransposons in the group of non-LTR retrotransposons. They comprise about 18%-20% of mammalian genomes.



Sys-BodyFluid - a body fluid database


In the post-genome era, proteomic technology has rapidly developed into a powerful platform for research on human physiology and for identifying potential novel biomarkers for prognosis, diagnosis and therapy. In recent years, body fluids have become one of the important targets of proteomic research. Analysis of the protein composition of body fluids could help to better understand human physiology and disease proteomics. A growing number of proteomics studies produce body fluid related data, and a database is needed to collect and analyze these data.


Our body fluid protein database, Sys-BodyFluid, contains 11 body fluid proteomes: plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk, and amniotic fluid. Over 10,000 proteins are included in Sys-BodyFluid. These proteome data come from 50 peer-reviewed publications from different laboratories around the world. Protein annotations are provided, including protein description, Gene Ontology terms, domain information, protein sequence and involved pathways. Users can access the proteome data by protein name, protein accession number or sequence similarity. In addition, users can query across different body fluids to gain a more comprehensive understanding. The differences and similarities between these 11 body fluids are also analyzed. Thus, the Sys-BodyFluid database could serve as a reference database for body fluid research and disease proteomics.


LEGER: knowledge database and visualization tool for comparative genomics of pathogenic and non-pathogenic Listeria species


Listeria species are ubiquitous in the environment and often contaminate foods because they grow under conditions used for food preservation. Listeria monocytogenes, the human and animal pathogen, causes listeriosis, an infection with a high mortality rate in risk groups such as immunocompromised individuals. Furthermore, L. monocytogenes is a model organism for the study of intracellular bacterial pathogens. The publication of its genome sequence and that of the non-pathogenic species Listeria innocua initiated numerous comparative studies and efforts to sequence all species comprising the genus. The proteome database LEGER was developed to support functional genome analyses by combining information obtained from bioinformatics methods and from public databases to improve the original annotations. LEGER offers three unique key features: (i) it is the first comprehensive information system focusing on the functional assignment of genes and proteins; (ii) integrated visualization tools, the KEGG pathway viewer and the Genome Viewer, ease the functional exploration of complex data; and (iii) LEGER presents results of systematic post-genome studies, thus facilitating analyses that combine computational and experimental results. Moreover, LEGER provides an unpublished membrane proteome analysis of L. innocua and, in total, visualizes experimentally validated information about the subcellular localizations of 789 different listerial proteins.


mESAdb: An miRNA Sequence and Expression Database


mESAdb is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular, with a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression datasets. mESAdb analysis modules allow a) mining of selected microRNA expression datasets for a list of microRNAs; b) pair-wise multivariate analysis of expression datasets within and between taxa; and c) association of microRNA lists, or of microRNAs with a given motif, with the annotation databases HUGE Navigator [1], KEGG [2], and GO [3]. The use of existing and customized R packages facilitates future addition of datasets and analysis tools. Furthermore, the ability to upload and analyze user-specified datasets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.


KineticDB - A Protein Folding Kinetics Database


KineticDB is a thoroughly curated database of protein folding kinetics that currently contains experiments on 87 unique proteins and about a hundred mutants. The main goal of KineticDB is to provide users with a diverse set of protein folding rates known from experiment. Currently the database contains kinetic data on single-domain proteins, separate protein domains and short polypeptides without S-S bonds in the native structure.

We hope that the database will be useful for you. Help us make the database as wide and up-to-date as possible: send us references containing new kinetic data! We will be grateful for any contribution to the database from the community, including bug reports, new kinetic data and questions.

How to cite:
Bogatyreva N.S., Osypov A.A., Ivankov D.N. (2009). KineticDB: a database of protein folding kinetics. NAR, 37:D342-D346.


psi - A Structural Biology Knowledgebase


With the completion of the sequencing of the genomes of human and other organisms, attention has focused on the characterization and function of proteins, the products of genes. The availability of sequence data and the growing impact of structural biology on biomedical research have prompted scientific groups from several countries to undertake projects in the emerging field of structural genomics.


The objective is to make these structures widely available for clinical and basic studies that will expand the knowledge of the role of proteins both in normal biological processes and in disease. The National Institute of General Medical Sciences (NIGMS) played a major role in the early planning for structural genomics and in 1999 organized a national program, the Protein Structure Initiative.


The PSI program officially concluded in 2015, having determined nearly 7,000 protein structures, developed ~450 new or improved technologies, and produced over 2,200 publications. This work has benefited the broader biological and biomedical communities.
