Amazing Science

Tag: bioinformatics

Scooped by Dr. Stefan Gruenwald

Broad Institute, Google Genomics combine bioinformatics and computing expertise

Broad Institute of MIT and Harvard is teaming up with Google Genomics to explore how to break down major technical barriers that increasingly hinder biomedical research by addressing the need for computing infrastructure to store and process enormous datasets, and by creating tools to analyze such data and unravel long-standing mysteries about human health.

As a first step, Broad Institute’s Genome Analysis Toolkit, or GATK, will be offered as a service on the Google Cloud Platform, as part of Google Genomics. The goal is to enable any genomic researcher to upload, store, and analyze data in a cloud-based environment that combines the Broad Institute’s best-in-class genomic analysis tools with the scale and computing power of Google.

GATK is a software package developed at the Broad Institute to analyze high-throughput genomic sequencing data. GATK offers a wide variety of analysis tools, with a primary focus on genetic variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine, and high-performance computing features make it capable of taking on projects of any size.

GATK is already available for download at no cost to academic and non-profit users. In addition, business users can license GATK from the Broad. To date, more than 20,000 users have processed genomic data using GATK.

The Google Genomics service will provide researchers with a powerful, additional way to use GATK. Researchers will be able to upload genetic data and run GATK-powered analyses on Google Cloud Platform, and may use GATK to analyze genetic data already available for research via Google Genomics. GATK as a service will make best-practice genomic analysis readily available to researchers who don’t have access to the dedicated compute infrastructure and engineering teams required for analyzing genomic data at scale. An initial alpha release of the GATK service will be made available to a limited set of users.
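
To make this concrete, here is a minimal sketch of a single GATK-style variant-calling step in the command-line form GATK 3.x used at the time, wrapped in Python. The file names are placeholders, and the Google Cloud service interface itself was only in a limited alpha, so it is not shown.

```python
# Illustrative only: a local GATK 3.x-style HaplotypeCaller run.
# File names are hypothetical; the cloud-hosted service described
# above exposes the same tools without local infrastructure.
import subprocess

def call_variants(reference, bam, out_vcf):
    """Run GATK's HaplotypeCaller on one aligned sample (GATK 3.x CLI)."""
    cmd = [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "HaplotypeCaller",   # variant-discovery tool
        "-R", reference,           # reference genome FASTA
        "-I", bam,                 # aligned sequencing reads
        "-o", out_vcf,             # output VCF of variant calls
    ]
    subprocess.run(cmd, check=True)

call_variants("hg19.fasta", "sample.bam", "sample.vcf")
```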

“Large-scale genomic information is accelerating scientific progress in cancer, diabetes, psychiatric disorders, and many other diseases,” said Eric Lander, President and Director of Broad Institute. “Storing, analyzing, and managing these data is becoming a critical challenge for biomedical researchers. We are excited to work with Google’s talented and experienced engineers to develop ways to empower researchers around the world by making it easier to access and use genomic information.”

Scooped by Dr. Stefan Gruenwald

Data Scientist on a Quest to Turn Computers Into Doctors

Some of the world’s most brilliant minds are working as data scientists at places like Google, Facebook, and Twitter—analyzing the enormous troves of online information generated by these tech giants—and for hacker and entrepreneur Jeremy Howard, that’s a bit depressing. Howard, a data scientist himself, spent a few years as the president of Kaggle, a kind of online community for data scientists that sought to feed the growing thirst for information analysis. He came to realize that while many of Kaggle’s online data analysis competitions helped scientists make new breakthroughs, the potential of these new techniques wasn’t being fully realized. “Data science is a very sexy job at the moment,” he says. “But when I look at what a lot of data scientists are actually doing, the vast majority of work out there is on product recommendations and advertising technology and so forth.”


So, after leaving Kaggle last year, Howard decided he would find a better use for data science. Eventually, he settled on medicine. And he even did a kind of end run around the data scientists, leveraging not so much the power of the human brain but the rapidly evolving talents of artificial brains. His new company is called Enlitic, and it wants to use state-of-the-art machine learning algorithms—what’s known as “deep learning”—to diagnose illness and disease.


Publicly revealed for the first time today, the project is only just getting off the ground—“the big opportunities are going to take years to develop,” Howard says—but it’s yet another step forward for deep learning, a form of artificial intelligence that more closely mimics the way our brains work. Facebook is exploring deep learning as a way of recognizing faces in photos. Google uses it for image tagging and voice recognition. Microsoft uses it for real-time translation in Skype. And the list goes on.


But Howard hopes to use deep learning for something more meaningful. His basic idea is to create a system akin to the Star Trek Tricorder, though perhaps not as portable. Enlitic will gather data about a particular patient—from medical images to lab test results to doctors’ notes—and its deep learning algorithms will analyze this data in an effort to reach a diagnosis and suggest treatments. The point, Howard says, isn’t to replace doctors, but to give them the tools they need to work more effectively. With this in mind, the company will share its algorithms with clinics, hospitals, and other medical outfits, hoping they can help refine its techniques. Howard says that the health care industry has been slow to pick up on the deep-learning trend because it was rather expensive to build the computing clusters needed to run deep learning algorithms. But that’s changing.


The real challenge, Howard says, isn’t writing algorithms but getting enough data to train those algorithms. He says Enlitic is working with a number of organizations that specialize in gathering anonymized medical data for this type of research, but he declines to reveal the names of the organizations he’s working with. And while he’s tight-lipped about the company’s technique now, he says that much of the work the company does will eventually be published in research papers.

Mike Dele's curator insight, March 20, 10:00 PM

Why don't we look at the possibility of creating and manufacturing human spare parts, just like for cars, to replace any part that develops a problem?

Benjamin Mzhari's curator insight, March 27, 8:37 AM

I foresee this type of profession becoming dynamic, in the sense that it will look not only at business data but at other statistical figures that will aid businesses.

Scooped by Dr. Stefan Gruenwald

UCSC Ebola genome browser now online to aid researchers' response to crisis

The UC Santa Cruz Genomics Institute late Tuesday (September 30) released a new Ebola genome browser to assist global efforts to develop a vaccine and antiserum to help stop the spread of the Ebola virus.

The team led by University of California, Santa Cruz researcher Jim Kent worked around the clock for the past week, communicating with international partners to gather and present the most current data. The Ebola virus browser aligns five strains of Ebola with two strains of the related Marburg virus. Within these strains, Kent and other members of the UC Santa Cruz Genome Browser team have aligned 148 individual viral genomes, including 102 from the current West Africa outbreak.

UC Santa Cruz has established the UCSC Ebola Genome Portal, with links to the new Ebola genome browser as well as links to all the relevant scientific literature on the virus. 

“Ebola has been one of my biggest fears ever since I learned about it in my first microbiology class in 1997," said Kent, who 14 years ago created the first working draft of the human genome.  "We need a heroic worldwide effort to contain Ebola. Making an informatics resource like the genome browser for Ebola researchers is the least we could do.”

Scientists around the world can access the open-source browser to compare genetic changes in the virus genome and areas where it remains the same. The browser allows scientists and researchers from drug companies, other universities, and governments to study the virus and its genomic changes as they seek a solution to halt the epidemic. 

Scooped by Dr. Stefan Gruenwald

First comprehensive atlas of human gene activity released

A large international consortium of researchers has produced the first comprehensive, detailed map of the way genes work across the major cells and tissues of the human body. The findings describe the complex networks that govern gene activity, and the new information could play a crucial role in identifying the genes involved with disease.


“Now, for the first time, we are able to pinpoint the regions of the genome that can be active in a disease and in normal activity, whether it’s in a brain cell, the skin, in blood stem cells or in hair follicles,” said Winston Hide, associate professor of bioinformatics and computational biology at Harvard School of Public Health (HSPH) and one of the core authors of the main paper in Nature.


“This is a major advance that will greatly increase our ability to understand the causes of disease across the body.”


The research is outlined in a series of papers published March 27, 2014, two in the journal Nature and 16 in other scholarly journals. The work is the result of years of concerted effort among 250 experts from more than 20 countries as part of FANTOM 5 (Functional Annotation of the Mammalian Genome). The FANTOM project, led by the Japanese institution RIKEN, is aimed at building a complete library of human genes.


Researchers studied human and mouse cells using a new technology called Cap Analysis of Gene Expression (CAGE), developed at RIKEN, to discover how 95% of all human genes are switched on and off. These “switches” — called “promoters” and “enhancers” — are the regions of DNA that manage gene activity. The researchers mapped the activity of 180,000 promoters and 44,000 enhancers across a wide range of human cell types and tissues and, in most cases, found they were linked with specific cell types.
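
As a rough illustration of what “linked with specific cell types” means in practice, the sketch below flags promoters whose activity is concentrated in a single cell type. The CAGE-style counts and the 0.8 cutoff are invented, not FANTOM5 values.

```python
# Toy sketch (not FANTOM5 code): flag promoters whose CAGE activity
# is concentrated in one cell type. All numbers are made up.
import pandas as pd

# rows = promoters, columns = cell types, values = CAGE tag counts
activity = pd.DataFrame(
    {"brain": [950, 10, 40], "liver": [20, 870, 35], "blood": [30, 15, 900]},
    index=["p1", "p2", "p3"],
)

share = activity.div(activity.sum(axis=1), axis=0)  # fraction of tags per cell type
top_type = share.idxmax(axis=1)                     # dominant cell type per promoter
is_specific = share.max(axis=1) > 0.8               # arbitrary specificity cutoff

for promoter in activity.index[is_specific]:
    print(promoter, "is specific to", top_type[promoter])
```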


“We now have the ability to narrow down the genes involved in particular diseases based on the tissue cell or organ in which they work,” said Hide. “This new atlas points us to the exact locations to look for the key genetic variants that might map to a disease.”

Eli Levine's curator insight, March 28, 2014 7:27 PM
There it is. As it is in our genes, so too is it in our individual psyches and societies. Check it out!
Martin Daumiller's curator insight, March 29, 2014 12:27 PM

original article: http://www.nature.com/nature/journal/v507/n7493/full/nature13182.html

 

 

Scooped by Dr. Stefan Gruenwald

Computational simulations of photosystem II expose secret pathways behind photosynthesis

New insights into the behavior of photosynthetic proteins from atomic simulations could hasten the development of artificial light-gathering machines.


The protein complex known as photosystem II splits water molecules to release oxygen using sunlight and relatively simple biological building blocks. Although water can also be split artificially using an electrical voltage and a precious metal catalyst, researchers continue to strive to mimic the efficient natural process. So far, however, these efforts have been hampered by an incomplete understanding of the water oxidation mechanism of photosystem II. Shinichiro Nakamura from the RIKEN Innovation Center and colleagues have now used simulations to reveal the hidden pathways of water molecules inside photosystem II.


At the heart of photosystem II is a cluster of manganese, calcium and oxygen atoms, known as the oxygen-evolving complex (OEC), that catalyzes the water-splitting reaction. Recent high-resolution x-ray crystallography studies have revealed the precise positions of the atoms in the OEC and of the protein residues that contact the site. While this information has yielded important structural clues into photosynthetic water oxidation, the movements of water, oxygen and protons within the protein complex are still the subject of much speculation.


To resolve this problem, researchers have turned to molecular dynamics (MD) simulation, a technique that models the time-dependent behavior of biomolecules using thermodynamics and the physics of motion. While previous MD simulations of photosystem II have involved the use of approximate models that focus only on protein monomers or the main OEC components, Nakamura's team took a different approach. "Our hypothesis was that we cannot understand the mechanism of oxygen evolution just by looking at the manganese-based reaction center," he says. "Therefore, we carried out a total MD simulation, without any truncation of the protein or simplification."
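
The core of any MD simulation is a numerical integrator for Newton’s equations of motion. The toy velocity-Verlet loop below shows the idea on a single harmonic “atom”; a real photosystem II run propagates hundreds of thousands of atoms under an empirical force field.

```python
# Toy molecular-dynamics step (velocity Verlet), for illustration only.
# A real simulation replaces the harmonic force with a full force field.
import numpy as np

def force(x, k=1.0):
    """Harmonic restoring force, standing in for a real force field."""
    return -k * x

def velocity_verlet(x, v, dt=0.001, steps=1000, m=1.0):
    """Integrate Newton's equations of motion for one particle."""
    f = force(x)
    traj = []
    for _ in range(steps):
        x = x + v * dt + 0.5 * (f / m) * dt**2   # position update
        f_new = force(x)
        v = v + 0.5 * (f + f_new) / m * dt       # velocity update
        f = f_new
        traj.append(x)
    return np.array(traj)

trajectory = velocity_verlet(x=1.0, v=0.0)
print(trajectory[-5:])
```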


In their simulation, the team embedded an exact model of photosystem II inside a thylakoid—a lipid and fatty-acid membrane-bound compartment found in the chloroplasts of plant cells. After initial computations confirmed the reliability of their model, the researchers performed a rigorous MD simulation of the protein-membrane system in the presence of more than 300,000 water molecules (Fig. 1). "The results indicated that water, oxygen and protons move through photosystem II not randomly but via distinct pathways that are not obviously visible," says Nakamura.


The pathways revealed by the simulations are delicately coupled to the dynamic motions of the photosystem II protein residues. While such intricate activity is currently impossible to reproduce artificially, the researchers suspect that combining quantum-chemical calculations with MD simulations could help to unlock the mysterious principles behind the highly efficient oxygen-evolution reactions of this remarkable biological factory.

Scooped by Dr. Stefan Gruenwald

World's largest disease database will use artificial intelligence to find new cancer treatments

A new cancer database containing 1.7 billion experimental results will utilise artificial intelligence similar to the technology used to predict the weather to discover the cancer treatments of the future.

 

The system, called CanSAR, is the biggest disease database of its kind anywhere in the world and condenses more data than would be generated by 1 million years of use of the Hubble space telescope.

It launches today (Monday 11 November 2013) and has been developed by researchers at The Institute of Cancer Research, London, using funding from Cancer Research UK.

 

The new CanSAR database is more than double the size of a previous version and has been designed to cope with a huge expansion of data on cancer brought about by advances in DNA sequencing and other technologies.

 

The resource is being made freely available by The Institute of Cancer Research (ICR) and Cancer Research UK, and will help researchers worldwide make use of vast quantities of data, including data from patients, clinical trials and genetic, biochemical and pharmacological research.

 

Although the prototype of CanSAR was on a much smaller scale, it attracted 26,000 unique users in more than 70 countries around the world, and earlier this year was used to identify 46 potentially 'druggable' cancer proteins that had previously been overlooked.

 

Peer-reviewed scientific paper (NAR) about CanSAR:

http://nar.oxfordjournals.org/content/40/D1/D947.full

Scooped by Dr. Stefan Gruenwald

Amazing Science: Bioinformatics Postings

Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge. Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics and genomics, it aids in annotating genomes and their observed mutations.

Scooped by Dr. Stefan Gruenwald

Stunning interactive model of the HIV Virus from the RCSB Protein Data Bank

Scooped by Dr. Stefan Gruenwald

UCI team targets p53 for treating wide spectrum of cancers

UC Irvine biologists, chemists and computer scientists have identified an elusive pocket on the surface of the p53 protein that can be targeted by cancer-fighting drugs. The finding heralds a new treatment approach, as mutant forms of this protein are implicated in nearly 40 percent of diagnosed cases of cancer, which kills more than half a million Americans each year.

 

In an open-source study published online this week in Nature Communications, the UC Irvine researchers describe how they employed a computational method to capture the various shapes of the p53 protein. In its regular form, p53 helps repair damaged DNA in cells or triggers cell death if the damage is too great; it has been called the “guardian of the genome.”

Mutant p53, however, does not function properly, allowing the cancer cells it normally would target to slip through control mechanisms and proliferate. For this reason, the protein is a key target of research on cancer therapeutics.

Within cells, p53 proteins undulate constantly, much like a seaweed bed in the ocean, making binding sites for potential drug compounds difficult to locate. But through a computational method called molecular dynamics, the UC Irvine team created a computer simulation of these physical movements and identified an elusive binding pocket that’s open only 5 percent of the time.
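
A hedged sketch of how an “open only 5 percent of the time” figure can be extracted from a trajectory: measure a pocket-gating distance in every simulation frame and count the frames above a cutoff. The distances and the 0.4 nm threshold below are invented, not values from the UCI study.

```python
# Illustrative only: estimating how often a cryptic binding pocket is
# "open" across MD snapshots. Frame data and cutoff are invented.
import numpy as np

# distance (nm) between two residues that gate the pocket, one per frame
gate_distance = np.array([0.31, 0.52, 0.29, 0.45, 0.33, 0.61, 0.30, 0.28])

OPEN_CUTOFF = 0.4                               # pocket counts as open above this
open_fraction = np.mean(gate_distance > OPEN_CUTOFF)
print(f"pocket open in {open_fraction:.0%} of frames")
```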

 

After using a computer to screen a library of 2,298 small molecules, the researchers selected the 45 most promising to undergo biological assays. Among these 45 compounds, they found one, called stictic acid, that fits into the protein pocket and triggers tumor-suppressing abilities in mutant p53s.

While stictic acid cannot be developed into a viable drug, noted study co-leader Peter Kaiser, professor of biological chemistry, the work suggests that a comprehensive screening of small molecules with similar traits may uncover a usable compound that binds to this specific p53 pocket.

 

“The discovery and pharmaceutical development of such a compound could have a profound impact on cancer treatments,” Kaiser said. “Instead of focusing on a specific form of the disease, oncologists could treat a wide spectrum of cancers, including those of the lung and breast.” He added that there is currently one group of experimental drugs – called Nutlins – that stop p53 degradation, but they don’t target protein mutations as would a drug binding to the newly discovered pocket.

 

The results are the culmination of years of labor by researchers with UC Irvine’s Institute for Genomics & Bioinformatics and the Chao Family Comprehensive Cancer Center.

Scooped by Dr. Stefan Gruenwald

Solving puzzles without a picture: New algorithm assembles chromosomes from next generation sequencing data

One of the most difficult problems in the field of genomics is assembling relatively short "reads" of DNA into complete chromosomes. In a new paper published in Proceedings of the National Academy of Sciences, an interdisciplinary group of genome and computer scientists has solved this problem, creating an algorithm that can rapidly create "virtual chromosomes" with no prior information about how the genome is organized.

 

The powerful DNA sequencing methods developed about 15 years ago, known as next generation sequencing (NGS) technologies, create thousands of short fragments. In species whose genetics has already been extensively studied, existing information can be used to organize and order the NGS fragments, rather like using a sketch of the complete picture as a guide to a jigsaw puzzle. But as genome scientists push into less-studied species, it becomes more difficult to finish the puzzle.

 

To solve this problem, a team led by Harris Lewin, distinguished professor of evolution and ecology and vice chancellor for research at the University of California, Davis, and Jian Ma, assistant professor at the University of Illinois at Urbana-Champaign, created a computer algorithm that uses the known chromosome organization of one or more known species and NGS information from a newly sequenced genome to create virtual chromosomes.

 

"We show for the first time that chromosomes can be assembled from NGS data without the aid of a preexisting genetic or physical map of the genome," Lewin said. The new algorithm will be very useful for large-scale sequencing projects such as G10K, an effort to sequence 10,000 vertebrate genomes of which very few have a map, Lewin said.

 

"As we have shown previously, there is much to learn about phenotypic evolution from understanding how chromosomes are organized in one species relative to other species," he said. The algorithm is called RACA (for reference-assisted chromosome assembly), co-developed by Jaebum Kim, now at Konkuk University, South Korea, and Denis Larkin of Aberystwyth University, Wales. Kim wrote the software tool which was evaluated using simulated data, standardized reference genome datasets as well as a primary NGS assembly of the newly sequenced Tibetan antelope genome generated by BGI (Shenzhen, China) in collaboration with Professor Ri-Li Ge at Qinghai University, China.

 

Larkin led the experimental validation, in collaboration with scientists at BGI, proving that predictions of chromosome organization were highly accurate. Ma said that the new RACA algorithm will perform even better as developing NGS technologies produce longer reads of DNA sequence. "Even with what is expected from the newest generation of sequencers, complete chromosome assemblies will always be a difficult technical issue, especially for complex genomes. RACA predictions address this problem and can be incorporated into current NGS assembly pipelines," Ma said.

Scooped by Dr. Stefan Gruenwald

Toward a new biocomputational model of the living cell

Turning vast amounts of genomic data into meaningful information about the cell is the great challenge of bioinformatics, with major implications for human biology and medicine. Researchers at the University of California, San Diego School of Medicine and colleagues have proposed a new method that creates a computational model of the cell from large networks of gene and protein interactions, discovering how genes and proteins connect to form higher-level cellular machinery.

"Our method creates ontology, or a specification of all the major players in the cell and the relationships between them," said first author Janusz Dutkowski, PhD, postdoctoral researcher in the UC San Diego Department of Medicine. It uses knowledge about how genes and proteins interact with each other and automatically organizes this information to form a comprehensive catalog of gene functions, cellular components, and processes.

"What's new about our ontology is that it is created automatically from large datasets. In this way, we see not only what is already known, but also potentially new biological components and processes – the bases for new hypotheses," said Dutkowski.

Originally devised by philosophers attempting to explain the nature of existence, ontologies are now broadly used to encapsulate everything known about a subject in a hierarchy of terms and relationships. Intelligent information systems, such as iPhone's Siri, are built on ontologies to enable reasoning about the real world. Ontologies are also used by scientists to structure knowledge about subjects like taxonomy, anatomy and development, bioactive compounds, disease and clinical diagnosis.
Scooped by Dr. Stefan Gruenwald

A Combinatorial Amino Acid Code for Sequence-Specific RNA Recognition by Pentatricopeptide Repeat Proteins

The pentatricopeptide repeat (PPR) is a helical repeat motif found in an exceptionally large family of RNA-binding proteins that functions in mitochondrial and chloroplast gene expression. PPR proteins harbor between 2 and 30 repeats and typically bind single-stranded RNA in a sequence-specific fashion. However, the basis for sequence-specific RNA recognition by PPR tracts has been unknown until now. A multinational research group recently used computational methods to infer a code for nucleotide recognition involving two amino acids in each repeat, and validated this model by recoding a PPR protein to bind novel RNA sequences in vitro. Their results show that PPR tracts bind RNA via a modular recognition mechanism that differs from previously described RNA-protein recognition modes and that underpins a natural library of specific protein/RNA partners of unprecedented size and diversity. These findings provide a significant step toward the prediction of native binding sites of the enormous number of PPR proteins found in nature. Furthermore, the extraordinary evolutionary plasticity of the PPR family suggests that the PPR scaffold will be particularly amenable to redesign for new sequence specificities and functions and ultimately lead to a new way to do genome corrections and gene therapy.
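
In code, a combinatorial recognition code of this kind amounts to a lookup table from two residues per repeat to a preferred base. The mapping below is a simplified, partial rendering from memory; treat it as illustrative only and consult the paper for the validated combinations.

```python
# Hedged sketch of "decoding" a PPR tract into a predicted RNA site.
# The residue-pair table is illustrative, not the paper's full code.
CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U",
}

def predict_binding_site(repeat_residues):
    """repeat_residues: one (residue, residue) pair per PPR repeat."""
    return "".join(CODE.get(pair, "N") for pair in repeat_residues)  # N = no call

print(predict_binding_site([("T", "N"), ("T", "D"), ("N", "D"), ("S", "S")]))
# -> "AGUN"
```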

Scooped by Dr. Stefan Gruenwald

Development of an Epigenome Browser to Help Understand the Functional Complexity of the Genome

Advances in next-generation sequencing platforms have reshaped the landscape of functional genomic and epigenomic research as well as human genetics studies. Annotation of noncoding regions in the genome with genomic and epigenomic data has facilitated the generation of new, testable hypotheses regarding the functional consequences of genetic variants associated with human complex traits. Large consortia, such as the US National Institutes of Health (NIH) Roadmap Epigenomics Consortium and ENCODE, have generated tens of thousands of sequencing-based genome-wide data sets, creating a useful resource for the scientific community. The WashU Epigenome Browser continues to provide a platform for investigators to effectively engage with this resource in the context of analyzing their own data.


The newly developed Roadmap Epigenome Browser (http://epigenomegateway.wustl.edu/browser/roadmap/) is based on the WashU Epigenome Browser and integrates data from both the NIH Roadmap Epigenomics Consortium and ENCODE in a visualization and bioinformatics tool that enables researchers to explore the tissue-specific regulatory roles of genetic variants in the context of disease. The browser takes advantage of the over 10,000 epigenomic data sets it currently hosts, including 346 'complete epigenomes', defined as tissues and cell types for which we have collected a complete set of DNA methylation, histone modification, open chromatin and other genomic data sets.


Data from both the NIH Roadmap Epigenomics and ENCODE resources are seamlessly integrated in the browser using a new Data Hub Cluster framework. Investigators can specify any number of single nucleotide polymorphism (SNP)-associated regions and any type of epigenomic data, for which the browser automatically creates virtual data hubs through a shared hierarchical metadata annotation, retrieves the data and performs real-time clustering analysis. Investigators interact with the browser to determine the tissue specificity of the epigenetic state encompassing genetic variants in physiologically or pathogenically relevant cell types from normal or diseased samples.


The epigenomic annotation of two noncoding SNPs identified in genome-wide association studies of people with multiple sclerosis can be explored by clustering the histone H3K4me1 profile of SNP-harboring regions and the RNA-seq signal of their closest genes across multiple primary tissues and cells. Thus, reference epigenomes provide important clues into the functional relevance of these genetic variants in the context of the pathophysiology of multiple sclerosis, including inflammation.
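
The clustering step can be pictured with a toy example: a matrix of H3K4me1 signal for SNP-harboring regions across tissues, grouped hierarchically. All numbers below are invented.

```python
# Illustrative sketch (invented numbers): cluster the H3K4me1 signal of
# SNP-harboring regions across tissues to see where the enhancer mark
# is tissue-specific, loosely mirroring the browser's clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

tissues = ["T_cell", "B_cell", "brain", "liver"]
# rows = regions, columns = tissues, values = normalized H3K4me1 signal
signal = np.array([
    [9.1, 8.7, 0.4, 0.2],   # SNP region 1: blood-lineage specific
    [8.8, 9.3, 0.5, 0.3],   # SNP region 2: blood-lineage specific
    [0.3, 0.2, 7.9, 0.4],   # control region: brain specific
])
clusters = fcluster(linkage(signal, method="ward"), t=2, criterion="maxclust")
print(dict(zip(["snp1", "snp2", "ctrl"], clusters)))
```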

Scooped by Dr. Stefan Gruenwald

First Human Protein Atlas and major protein analysis published

The Human Protein Atlas, a major multinational research project supported by the Knut and Alice Wallenberg Foundation, recently launched (November 6, 2014) an open-source, tissue-based interactive map of the human proteome. Based on 13 million annotated images, the database maps the distribution of proteins in all major tissues and organs in the human body, showing both proteins restricted to certain tissues, such as the brain, heart, or liver, and those present in all. As an open access resource, it is expected to help drive the development of new diagnostics and drugs, but also to provide basic insights into normal human biology.


In the Science article, "Tissue-based Atlas of the Human Proteome", the approximately 20,000 protein coding genes in humans have been analysed and classified using a combination of genomics, transcriptomics, proteomics, and antibody-based profiling, says the article's lead author, Mathias Uhlén, Professor of Microbiology at Stockholm's KTH Royal Institute of Technology and the director of the Human Protein Atlas program. The analysis shows that almost half of the protein-coding genes are expressed in a ubiquitous manner and thus found in all analysed tissues.


Approximately 15% of the genes show an enriched expression in one or several tissues or organs, including well-known tissue-specific proteins, such as insulin and troponin. The testes, or testicles, have the most tissue-enriched proteins, followed by the brain and the liver. The analysis suggests that approximately 3,000 proteins are secreted from the cells and an additional 5,500 proteins are localized to the membrane systems of the cells.
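
A sketch of what a “tissue-enriched” call can look like. The Atlas’s published definition is, roughly, at least five-fold higher mRNA in one tissue than in any other; the expression values below are invented.

```python
# Toy tissue-enrichment call in the spirit of the Atlas analysis.
# Expression values are invented; the ~5x rule approximates the
# published "tissue enriched" definition.
import pandas as pd

expr = pd.DataFrame(
    {"brain": [1.0, 55.0], "liver": [1.2, 2.0], "testis": [0.9, 4.0]},
    index=["GENE_UBIQ", "GENE_BRAIN"],
)

def classify(row, fold=5.0):
    top = row.sort_values(ascending=False)
    if top.iloc[0] >= fold * top.iloc[1]:       # dominant tissue >= 5x runner-up
        return f"enriched in {top.index[0]}"
    return "ubiquitous/mixed"

print(expr.apply(classify, axis=1))
```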


"This is important information for the pharmaceutical industry. We show that 70% of the current targets for approved pharmaceutical drugs are either secreted or membrane-bound proteins," Uhlén says. "Interestingly, 30% of these protein targets are found in all analysed tissues and organs. This could help explain some side effects of drugs and thus might have consequences for future drug development." The analysis also contains a study of the metabolic reactions occurring in different parts of the human body. The most specialised organ is the liver with a large number of chemical reactions not found in other parts of the human body.

Scooped by Dr. Stefan Gruenwald

Bioinformatics: Big data from DNA Sequencing is giving new Insights into Cancer Development and Treatment Options

The torrents of data flowing out of cancer research and treatment are yielding fresh insight into the disease.


In 2013, geneticist Stephen Elledge answered a question that had puzzled cancer researchers for nearly 100 years. In 1914, German biologist Theodor Boveri suggested that the abnormal number of chromosomes — called aneuploidy — seen in cancers might drive the growth of tumors. For most of the next century, researchers made little progress on the matter. They knew that cancers often have extra or missing chromosomes or pieces of chromosomes, but they did not know whether this was important or simply a by-product of tumor growth — and they had no way of finding out.


Elledge found that where aneuploidy had resulted in missing tumor-suppressor genes, or extra copies of the oncogenes that promote cancer, tumors grow more aggressively (T. Davoli et al. Cell 155, 948–962; 2013). His insight — that aneuploidy is not merely an odd feature of tumors, but an engine of their growth — came from mining voluminous amounts of cellular data. And, says Elledge, it shows how the ability of computers to sift through ever-growing troves of information can help us to deepen our understanding of cancer and open the door to discoveries.


Modern cancer care has the potential to generate huge amounts of data. When a patient is diagnosed, the tumor's genome might be sequenced to see if it is likely to respond to a particular drug. The sequencing might be repeated as treatment progresses to detect changes. The patient might have his or her normal tissue sequenced as well, a practice that is likely to grow as costs come down. The doctor will record the patient's test results and medical history, including dietary and smoking habits, in an electronic health record. The patient may also have computed tomography (CT) and magnetic resonance imaging (MRI) scans to determine the stage of the disease. Multiply all that by the nearly 1.7 million people diagnosed with cancer in 2013 in the United States alone and it becomes clear that oncology is going to generate even more data than it does now. Computers can mine the data for patterns that may advance the understanding of cancer biology and suggest targets for therapy.


Elledge's discovery was the result of a computational method that he and his colleagues developed, called the Tumor Suppressor and Oncogene Explorer. They used it to mine large data sets, including the Cancer Genome Atlas, maintained by the US National Cancer Institute, based in Bethesda, Maryland, and the Catalogue of Somatic Mutations in Cancer, run by the Wellcome Trust Sanger Institute in Hinxton, UK. The databases contained roughly 1.2 million mutations from 8,207 tissue samples of more than 20 types of tumor.


Analyzing the genomes of 8,200 tumors is just a start. Researchers are “trying to figure out how we can bring together and analyze, over the next few years, a million genomes”, says Robert Grossman, who directs the Initiative in Data Intensive Science at the University of Chicago in Illinois. This is an immense undertaking; the combined cancer genome and normal genome from a single patient constitutes about 1 terabyte (10^12 bytes) of data, so a million genomes would generate an exabyte (10^18 bytes). Storing and analysing this much data could cost US$100 million a year, Grossman says.
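
The arithmetic behind those figures is straightforward:

```python
# Quick check of the storage estimate quoted above: ~1 TB of combined
# tumor + normal genome data per patient, scaled to a million patients.
per_patient_bytes = 10**12            # 1 terabyte
patients = 10**6
total = per_patient_bytes * patients  # 10**18 bytes = 1 exabyte
print(f"{total:.1e} bytes = {total / 10**18:.0f} exabyte")
```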


But it is the new technologies that are creating an information boom. “We can collect data faster than we can physically do anything with them,” says Manish Parashar, a computer scientist and head of the Rutgers Discovery Informatics Institute in Piscataway, New Jersey, who collaborates with Foran to find ways of handling the information. “There are some fundamental challenges being caused by our ability to capture so much data,” he says.


A major problem with data sets at the terabyte-and-beyond level is figuring out how to manipulate all the data. A single high-resolution medical image can take up tens of gigabytes, and a researcher might want the computer to compare tens of thousands of such images. Breaking down just one image in the Rutgers project into sets of pixels that the computer can identify takes about 15 minutes, and moving that much information from where it is stored to where it can be processed is difficult. “Already we have people walking around with disk drives because you can't effectively use the network,” Parashar says.


Informatics researchers are developing algorithms to split data into smaller packets for parallel processing on separate processors, and to compress files without omitting any relevant information. And they are relying on advances in computer science to speed up processing and communications in general.
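
The split-into-packets pattern looks like this in miniature; the chunk count and the stand-in work function are placeholders.

```python
# Sketch of splitting data into packets for parallel processing.
import numpy as np
from multiprocessing import Pool

def process(chunk):
    """Stand-in for real image/genome analysis on one packet of data."""
    return chunk.sum()

if __name__ == "__main__":
    data = np.arange(10_000_000, dtype=np.int64)
    chunks = np.array_split(data, 8)          # 8 packets
    with Pool(processes=4) as pool:
        partials = pool.map(process, chunks)  # workers run in parallel
    print(sum(partials) == data.sum())        # results recombine losslessly
```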


Foran emphasizes that the understanding and treatment of cancer has undergone a dramatic shift as oncology has moved from one-size-fits-all attacks on tumours towards personalized medicine. But cancers are complex diseases controlled by many genes and other factors. “It's not as if you're going to solve cancer,” he says. But big data can provide new, better-targeted ways of grappling with the disease. “You're going to come up with probably a whole new set of blueprints for how to treat patients.”


Rescooped by Dr. Stefan Gruenwald from Data visualization

Circle of Life: The Beautiful New Way to Visualize Biological Data

Via Claudia Mihai
Eli Levine's curator insight, February 20, 2014 12:24 PM

Way cool.

 

This just emphasizes the fact that we're all of one basic, biological species origin and not of a kind of "Creation orchard" as Mr. Ken Ham put it in his debate with Bill Nye.

 

"We are one, and one is all." - Gorillaz.

 

Think about it.

Scooped by Dr. Stefan Gruenwald

Expanding the olfactory code by in silico decoding of odor-receptor chemical space

The peripheral olfactory system is unparalleled in its ability to detect and discriminate amongst an extremely large number of volatile compounds in the environment. To detect this wide variety of volatiles, most organisms have evolved large families of receptor genes that typically encode 7-transmembrane proteins expressed in the olfactory neurons. Coding of information in the peripheral olfactory system depends on two fundamental factors: interaction of individual odors with subsets of the odorant receptor repertoire and mode of signaling that an individual receptor-odor interaction elicits, activation or inhibition. Each volatile chemical in the environment is thought to interact with a specific subset of odorant receptors depending upon odor structure and binding sites on the receptor. This precise detection and coding of odors by the peripheral olfactory neurons are subsequently processed, transformed and integrated in the central nervous system to generate specific behavioral responses that are critical for survival such as finding food, finding mates, avoiding predators etc.

 

A group of researchers has now developed a cheminformatics pipeline that predicts receptor–odorant interactions from a large collection of chemical structures (>240,000) for receptors that have been tested against a smaller panel of odorants (∼100). Using a computational approach, they first identified shared structural features from known ligands of individual receptors. They then used these features to screen in silico new candidate ligands from >240,000 potential volatiles for several odorant receptors (Ors) in the Drosophila (fruit fly) antenna. Functional experiments on 9 Ors support a high success rate (∼71%) for the screen, resulting in the identification of numerous new activators and inhibitors. Such computational prediction of receptor–odor interactions has the potential to enable systems-level analysis of olfactory receptor repertoires in organisms.
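
A hedged sketch of a similarity-driven in silico screen using RDKit. The published pipeline derives shared structural features from known ligands; Morgan-fingerprint Tanimoto similarity stands in for that here, and the SMILES strings and cutoff are illustrative.

```python
# Illustrative similarity screen, not the published pipeline.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

known_activators = ["CCO", "CCCC=O"]          # toy ligands for one receptor
candidates = {"ethyl acetate": "CCOC(C)=O", "benzene": "c1ccccc1"}

def fp(smiles):
    """Morgan fingerprint (radius 2) for one molecule."""
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

known_fps = [fp(s) for s in known_activators]
for name, smi in candidates.items():
    score = max(DataStructs.TanimotoSimilarity(fp(smi), k) for k in known_fps)
    if score > 0.3:                            # arbitrary screening cutoff
        print(f"{name}: candidate ligand (Tanimoto {score:.2f})")
```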

Rescooped by Dr. Stefan Gruenwald from Science And Wonder

Is ‘massive open online research’ (MOOR) the next frontier for education?

UC San Diego is launching the first major online course that prominently features “massive open online research” (MOOR).

 

For Bioinformatics, UC San Diego computer science and engineering professor Pavel Pevzner and his graduate students are offering a course on Coursera that combines research with a MOOC (massive open online course) for the first time.

 

“All students who sign up for the course will be given an opportunity to work on specific research projects under the leadership of prominent bioinformatics scientists from different countries, who have agreed to interact and mentor their respective teams.”

 

“The natural progression of education is for people to make a transition from learning to research, which is a huge jump for many students, and essentially impossible for students in isolated areas,” said Ph.D. student Phillip Compeau, who helped develop the online course. “By integrating the research with an interactive text and a MOOC, it creates a pipeline to streamline this transition.”

 

Bioinformatics Algorithms (Part I) will run for eight weeks starting October 21, 2013, and students are now able to sign up and download some of the course materials. It is offered free of charge to everyone.

 

Another unique feature of the online course: Pevzner and Compeau have developed Bioinformatics Algorithms: An Active-Learning Approach, an e-book supporting the course, while Pevzner’s colleagues in Russia developed a content delivery system that integrates the e-book with hundreds of quizzes and dozens of homework problems.

 

The U.S.-Russian team, led by Pevzner’s foreign student Nikolay Vyahhi, also implemented the online course using the beta version of Stepic, a new, fully integrated educational platform and startup developed by Vyahhi. Stepic derives its name from the “step-by-step, epic” solution its developers delivered for electronic publishing.

 

The course also provides access to Rosalind, a free online resource for learning bioinformatics through problem solving. Rosalind was developed by Pevzner’s students and colleagues in San Diego and St. Petersburg with funding from the Howard Hughes Medical Institute, the Russian Ministry of Education, and Russian Internet billionaires Yuri Milner and Pavel Durov through their “Start Fellows” award. Rosalind already has over 10,000 active users worldwide.

 

Rosalind — named in honor of British scientist Rosalind Franklin, whose X-ray crystallography with Raymond Gosling facilitated the discovery of the DNA double helix by Watson and Crick — will grade the programming assignments. They come in the form of bioinformatics problems of growing complexity as the course progresses.
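
For flavor, an early Rosalind-style exercise asks for the reverse complement of a DNA string; a minimal graded submission might look like this.

```python
# Minimal solution to a classic introductory bioinformatics exercise:
# the reverse complement of a DNA string.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

assert reverse_complement("AAAACCCGGT") == "ACCGGGTTTT"
print(reverse_complement("GATTACA"))  # -> TGTAATC
```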

 

“We developed Rosalind to inspire both biologists and computer science students,” said Rosalind principal developer Vyahhi, who worked with Pevzner during the latter’s sabbatical in Russia. “The platform allows biologists to develop vital programming skills for bioinformatics at their own pace, and Rosalind can also appeal to programmers who have never been exposed to some of the exciting computational problems generated by molecular biology.”


Via LilyGiraud
Scooped by Dr. Stefan Gruenwald

Genokey: Mining the cancer genome -- six billion data points linking drugs with the changes in the genome

Scientists at the National Cancer Institute (NCI) have created a dataset of cancer-specific genetic coding variants that includes six billion data points linking drugs with the changes in the genome.

 

This source of big data, reported to be the most extensive cancer pharmacology database in the world, provides an opportunity for data mining in cancer drug discovery, and could help researchers understand why patients’ responses to cancer drugs vary and how tumours become resistant. It also has potential to help move forward the quest for personalised medicine.

 

The team carried out whole-exome sequencing of the 60 cell lines in the NCI-60 human cancer cell line panel and catalogued the genetic coding variants, including type I variants (found in the normal population) and type II variants (cancer-specific). The NCI-60 cell lines have been studied extensively, and include cells from nine tissues of origin, including breast, ovary, prostate, colon, lung, kidney, brain, blood, and skin. They then used algorithms to relate the cells' type II variants to their sensitivity to 103 approved anticancer drugs and 207 drugs in development, to see whether the variants could be used to predict response.
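
A toy version of the variant-to-response question: compare a drug-response score between cell lines that carry a given type II variant and those that do not. The numbers are invented, and the study's actual predictive models are more elaborate.

```python
# Invented data: does carrying a cancer-specific variant track with
# sensitivity to one drug across cell lines?
import numpy as np
from scipy import stats

# -log(GI50)-style response scores (higher = more sensitive)
response_with_variant = np.array([7.9, 8.3, 8.1, 7.6])
response_without = np.array([6.2, 6.5, 5.9, 6.8, 6.1])

t, p = stats.ttest_ind(response_with_variant, response_without)
print(f"t = {t:.2f}, p = {p:.4f}")  # small p suggests the variant tracks response
```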

 

According to James Doroshow, the director of the NCI’s Division of Cancer Treatment and Diagnosis, in an interview in Shots, thousands of drugs have been screened using the NCI-60 panel, to see their impact on the cells. The researchers have also analysed 5000 different combinations of approved drugs to see if they can find drugs that work well together.

 

In an interview with the American Association of Cancer Research, Yves Pommier, chief of the Laboratory of Molecular Pharmacology at the NCI in Bethesda, explained that the team is making the data set public.

“Opening this extensive data set to researchers will expand our knowledge and understanding of tumorigenesis, as more and more cancer-related gene aberrations are discovered,” Pommier said in the interview. “This comes at a great time, because genomic medicine is becoming a reality, and I am very hopeful this valuable information will change the way we use drugs for precision medicine.”

 

The research was published in Cancer Research, and the genomic data is available through the CellMiner website, amongst others.

 

While not involved in this study, GenoKey focuses on data mining and analytics, and has used a combination of genetic and clinical data, along with combinatorial analysis, to find links between genetic changes and bipolar disorder. The NCI’s dataset of cancer-specific genetic coding variants, and GenoKey’s work, show the power of using big data and healthcare analytics in medical and biopharma research, particularly when combining genetic and clinical data.

Scooped by Dr. Stefan Gruenwald

Irreversible Evolution? Dust Mites Show Parasites Can Violate Dollo’s Law

Our world is quite literally lousy with parasites. We are hosts to hundreds of them, and they are so common that in some ecosystems, their total mass can outweigh that of top predators 20-fold. Even parasites have parasites. It’s such a good strategy that over 40% of all known species are parasitic. They steal genes from their hosts, take over other animals’ bodies, and generally screw with their hosts’ heads. But there’s one thing that we believed they couldn’t do: stop being parasites. Once the genetic machinery set the lifestyle choice in motion, there’s supposed to be no going back to living freely. Once a parasite, always a parasite - unless you’re a mite.

 

In evolutionary biology, the notion of irreversibility is known as Dollo’s Law, after the Belgian paleontologist who first hypothesized it in 1893. He stated that once a lineage had lost or modified organs or structures, it couldn’t turn back the clock and un-evolve those changes. Or, as he put it, “an organism is unable to return, even partially, to a previous stage already realized in the ranks of its ancestors.”

 

While some animals seem to challenge Dollo’s Law, it has long been a deeply held belief in the field of parasitology. Parasitism is, in general, a process of reduction. Adjusting to survival on or in another animal is a severe evolutionary undertaking, and many parasites lose entire organs or even body systems, becoming entirely dependent on their hosts to perform biological tasks like breaking down food or locomotion. Parasitology textbooks often talk about the irreversibility of becoming a parasite in very final terms. “Parasites as a whole are worthy examples of the inexorable march of evolution into blind alleys” says Noble & Noble’s 1976 Parasitology: the Biology of Animal Parasites.

 

Robert Poulin is even more direct: “Once they are dependent on the host there is no going back. In other words, early specialisation for a parasitic life commits a lineage forever.” Now, parasites are proving that not only can they evade immune systems, trick other animals, and use their hosts’ bodies in hundreds of nefarious ways, some can go back to living on their own. This is exactly what scientists now believe happened in the Pyroglyphidae — the dust mites.

 

Mites, as a whole, are a frighteningly successful but often overlooked group of organisms. More than 48,000 species have been described. These minuscule relatives of spiders can be found worldwide in just about every habitat you can imagine. Many are free-living, but there are also a number of parasitic species, including all-too-familiar pests like Sarcoptes scabiei, the mite which causes scabies. Exactly how the different groups of mites are related to each other, however, has been a hot topic of debate amongst mite biologists. Though the closest relatives of dust mites are the Psoroptidia, a large and diverse parasitic group of mites, many have argued that dust mites came from free-living ancestors — ‘living fossils’ of a sort, the only surviving line of ancestral free-living mites that later gave rise to parasites. In fact, Pavel Klimov and Barry O’Connor from the University of Michigan were able to find 62 different hypotheses as to how the free-living dust mites fit into the mite family tree. Sixty-two, the team decided, was simply too many. So, they turned to the mites’ genes.

 

To test which of the hypotheses had the most merit, Klimov and O’Connor conscripted a team of 64 biologists in 19 countries to obtain over 700 mite specimens, which they then used to construct a mite family tree. They sequenced five nuclear genes from each species, then applied statistical analyses to construct a tree of relationships called a phylogeny. And that’s when they saw it: deeply nested inside a large group of parasites were our everyday, non-parasitic, allergy-causing dust mites.
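
In miniature, the tree-building step looks like the Biopython sketch below, which computes pairwise distances from a toy alignment and joins neighbors. The sequences are invented, and the study used model-based phylogenetics on five genes rather than a simple distance tree.

```python
# Toy phylogeny from a tiny invented alignment (not the study's data
# or method): identity distances plus neighbor joining, via Biopython.
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio import Phylo

alignment = MultipleSeqAlignment([
    SeqRecord(Seq("ACTGCTAGCTAG"), id="dust_mite"),
    SeqRecord(Seq("ACTGCTAGATGG"), id="psoroptid_parasite"),
    SeqRecord(Seq("GCTTCTAGGTAC"), id="free_living_mite"),
])

distances = DistanceCalculator("identity").get_distance(alignment)
tree = DistanceTreeConstructor().nj(distances)   # neighbor joining
Phylo.draw_ascii(tree)
```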

 

“This result was so surprising that we decided to contact our colleagues to obtain their feedback prior to sending these data for publication,” said lead author Pavel Klimov. “Parasites can quickly evolve highly sophisticated mechanisms for host exploitation and can lose their ability to function away from the host body,” he explained. “Many researchers in the field perceive such specialization as evolutionarily irreversible.” But, their data were clear. “All our analyses conclusively demonstrated that house dust mites have abandoned a parasitic lifestyle, secondarily becoming free-living.”

Scooped by Dr. Stefan Gruenwald

Not an easy task: Finding cancer-specific genomic fingerprints

Researchers from the Wellcome Trust Sanger Institute's cancer genome project have developed a computer model to identify the fingerprints of DNA-damaging processes that drive cancer development. Armed with these signatures, scientists will be able to search for the chemicals, biological pathways and environmental agents responsible.

 

"For a long time we have known that mutational signatures exist in cancer," says Dr Peter Campbell, Head of the cancer genome project and co-senior author of the paper. "For example UV light and tobacco smoke both produce very specific signatures in a person's genome. Using our computational framework, we expect to uncover and identify further mutational signatures that are diagnostic for specific DNA-damaging processes, shedding greater light on how cancer develops."

 

The computer model will help to overcome a fundamental problem in studying cancer genomes: that the DNA contains not only the mutations that have contributed to cancer development, but also an entire lifetime's worth of other mutations that have also been acquired. These mutations are layered on top of each other and trying to unpick the individual mutations, when they appeared, and the processes that caused them is a daunting task.

 

"The problem we have solved can be compared to the well-known cocktail party problem," explains Ludmil Alexandrov, first author of the paper from Sanger Institute. "At a party there are lots of people talking simultaneously and, if you place microphones all over the room, each one will record a mixture of all the conversations. To understand what is going on you need to be able to separate out the individual discussions. The same is true in cancer genomics. We have catalogues of mutations from cancer genomes and each catalogue contains the signatures of all the mutational processes that have acted on that patient's genome since birth. Our model allows us to identify the signatures produced by different mutation-causing processes within these catalogues."


To identify individual sets of mutations produced by a particular DNA-damaging agent, the cancer genome project at the Sanger Institute simulated cancer genomes and developed a technique to search for these mutational signatures. This approach proved to be very successful. The research team then explored the genomes of 21 breast cancer patients and identified five mutational signatures of cancer-causing processes in the real world.

Scooped by Dr. Stefan Gruenwald

Imitation of Life: Can a Computer Program simulate everything inside a living cell?

Almost 30 years ago, Harold J. Morowitz, who was then at Yale, set forth a bold plan for molecular biology. He outlined a campaign to study one of the smallest single-celled organisms, a bacterium of the genus Mycoplasma. The first step would be to decipher its complete genetic sequence, which in turn would reveal the amino acid sequences of all the proteins in the cell. In the 1980s reading an entire genome was not the routine task it is today, but Morowitz argued that the analysis should be possible if the genome was small enough. He calculated the information content of mycoplasma DNA to be about 160,000 bits, then added: “Alternatively, this much DNA will code for about 600 proteins—which suggests that the logic of life can be written in 600 steps. Completely understanding the operations of a prokaryotic cell is a visualizable concept, one that is within the range of the possible.”

 

There was one more intriguing element to Morowitz’s plan: “At 600 steps, a computer model is feasible, and every experiment that can be carried out in the laboratory can also be carried out on the computer. The extent to which these match measures the completeness of the paradigm of molecular biology.”

 

Looking back on these proposals from the modern era of industrial-scale genomics and proteomics, there’s no doubt that Morowitz was right about the feasibility of collecting sequence data. On the other hand, the challenges of writing down “the logic of life” in 600 steps and “completely understanding” a living cell still look fairly daunting. And what about the computer program that would simulate a living cell well enough to match experiments carried out on real organisms?

 

As it happens, a computer program with exactly that goal was published last summer by Markus W. Covert of Stanford University and eight coworkers. The program, called the WholeCell simulation, describes the full life cycle of Mycoplasma genitalium, a bacterium from the genus that Morowitz had suggested. Included in the model are all the major processes of life: transcription of DNA into RNA, translation of RNA into protein, metabolism of nutrients to produce energy and structural constituents, replication of the genome, and ultimately reproduction by cell fission. The outputs of the simulation do seem to match experimental results. So the question has to be faced: Are we on the threshold of “completing” molecular biology?
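
Architecturally, a whole-cell model of this kind advances a shared cell state through short time steps, with each biological process implemented as a submodel that reads and updates that state. The two placeholder submodels and rates below are invented; the real model has dozens of process submodels.

```python
# Toy sketch of the submodel-loop architecture (invented rates).
state = {"atp": 1000.0, "protein": 0.0, "rna": 10.0}

def metabolism(s, dt):
    s["atp"] += 50.0 * dt                  # nutrient uptake makes ATP

def translation(s, dt):
    made = min(s["rna"] * 0.2 * dt, s["atp"] / 4)
    s["protein"] += made                   # ribosomes build protein
    s["atp"] -= 4 * made                   # ...and consume ATP

for step in range(3600):                   # one simulated hour, dt = 1 s
    for submodel in (metabolism, translation):
        submodel(state, dt=1.0)

print({k: round(v, 1) for k, v in state.items()})
```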

Scooped by Dr. Stefan Gruenwald

First predictive computational model of gene networks that control the development of sea-urchin embryos

As an animal develops from an embryo, its cells take diverse paths, eventually forming different body parts—muscles, bones, heart. In order for each cell to know what to do during development, it follows a genetic blueprint, which consists of complex webs of interacting genes called gene regulatory networks.

 

Biologists at the California Institute of Technology (Caltech) have spent the last decade or so detailing how these gene networks control development in sea-urchin embryos. Now, for the first time, they have built a computational model of one of these networks. This model, the scientists say, does a remarkably good job of calculating what these networks do to control the fates of different cells in the early stages of sea-urchin development—confirming that the interactions among a few dozen genes suffice to tell an embryo how to start the development of different body parts in their respective spatial locations. The model is also a powerful tool for understanding gene regulatory networks in a way not previously possible, allowing scientists to better study the genetic bases of both development and evolution.
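
In the spirit of the Caltech model, which computes the regulatory logic of the network, here is a toy Boolean update loop for three invented genes; the published model wires together dozens of genes across embryonic territories.

```python
# Toy Boolean gene-regulatory-network update (invented wiring).
state = {"maternal_signal": True, "geneA": False, "geneB": False, "geneC": False}

rules = {
    "geneA": lambda s: s["maternal_signal"],           # activated by input
    "geneB": lambda s: s["geneA"] and not s["geneC"],  # repressed by C
    "geneC": lambda s: s["geneB"],                     # feedback loop
}

for hour in range(4):                                  # synchronous updates
    state.update({g: rule(state) for g, rule in rules.items()})
    print(hour, {g: int(state[g]) for g in rules})
```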

Scooped by Dr. Stefan Gruenwald

3D model of the Ebola virus

The Ebola virus and its close relative the Marburg virus are members of the Filoviridae family. These viruses are the causative agents of severe hemorrhagic fever, a disease with a fatality rate of up to 90%. The Ebola virus infects mainly the capillary endothelium and several types of immune cells. The symptoms of Ebola infection include maculopapular rash, petechiae, purpura, ecchymoses, dehydration and hematomas.

 

Since Ebola was first described in 1976, there have been several epidemics of this disease. Hundreds of people have died because of Ebola infections, mainly in Zaire, Sudan, Congo and Uganda. In addition, several fatalities have occurred because of accidents in laboratories working with the virus. Currently, a number of scientists claim that terrorists may use Ebola as a biological weapon.

 

In the 3D model presented in this study, Ebola-encoded structures are shown in maroon, and structures from human cells are shown in grey. The Ebola model is based on X-ray analysis, NMR spectroscopy, and general virology data published in the last two decades. Some protein structures were predicted using computational biology techniques, such as molecular modeling.
