Amazing Science
Amazing science facts - 3D_printing • aging • AI • anthropology • art • astronomy • bigdata • bioinformatics • biology • biotech • chemistry • computers • cosmology • education • environment • evolution • future • genetics • genomics • geosciences • green_energy • history • language • map • material_science • math • med • medicine • microscopy • nanotech • neuroscience • paleontology • photography • photonics • physics • postings • robotics • science • technology • video
Scooped by Dr. Stefan Gruenwald!

A rapid comparative analysis of large amounts of metagenomes


Scientists from ITMO University in St. Petersburg, Russia, the Federal Research and Clinical Centre of Physical-Chemical Medicine and MIPT have developed a software program enabling them to quickly compare sets of DNA of microorganisms living in different environments. The researchers have already suggested exactly how the new program could be applied in practice. Using the algorithm to compare the microflora of a healthy person with the microflora of a patient, specialists would be able to detect previously unknown pathogens and their strains, which can aid the development of personalized medicine. The results of the study have been published in Bioinformatics.


Every person has a genome - a specific sequence of genes according to which an individual develops. But every person also carries another gene collection, called the metagenome: the total DNA content of the many different microorganisms - bacteria, fungi, and viruses - that inhabit the same environment. The metagenome is often indicative of various diseases or of predispositions to them. Studying the microbiota, i.e. the full range of microorganisms inhabiting different parts of the human body, therefore plays a critical role in metagenomic research.


The software tool developed by the scientists and called MetaFast is able to conduct a rapid comparative analysis of large amounts of metagenomes. "In studying the intestinal microflora of patients, we may be able to detect microorganisms associated with a particular disease, such as diabetes, or a predisposition to the disease. This already forms a basis for applying personalized medicine techniques and developing new drugs. Using the results obtained with the software, biologists will be able to draw conclusions on how to further develop their research, because the algorithm enables them to study environments that we currently know nothing about," says Vladimir Ulyantsev, lead developer of the algorithm and researcher at the Computer Technologies Laboratory at ITMO University.
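
MetaFast itself is a Java command-line tool, but the core idea behind reference-free comparison (characterizing each sample by its sequence content and computing pairwise distances) can be illustrated with a minimal Python sketch. Everything below is a hypothetical toy: the sample names and reads are invented, and the k-mer profile with Bray-Curtis dissimilarity is a simplified stand-in for MetaFast's actual graph-based algorithm.

```python
from collections import Counter
from itertools import combinations

def kmer_profile(reads, k=5):
    """Count all k-mers across a sample's reads (a toy stand-in for the
    reference-free sequence features MetaFast builds from read data)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def bray_curtis(p, q):
    """Dissimilarity between two k-mer profiles (0 = identical)."""
    shared = sum(min(p[kmer], q[kmer]) for kmer in p.keys() & q.keys())
    return 1 - 2 * shared / (sum(p.values()) + sum(q.values()))

# Hypothetical mini-samples; real input would be millions of FASTQ reads.
samples = {
    "healthy_gut": ["ACGTACGTGACTGACTACGT", "TTGACGTACGTGACTGAAGG"],
    "patient_gut": ["ACGTTCGTGTCTGACTACCT", "TTGACCTACGAGACTGTAGG"],
}
profiles = {name: kmer_profile(reads) for name, reads in samples.items()}
for a, b in combinations(profiles, 2):
    print(f"{a} vs {b}: dissimilarity = {bray_curtis(profiles[a], profiles[b]):.3f}")
```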

Scooped by Dr. Stefan Gruenwald!

Deep learning applied to drug discovery and repurposing


Deep neural networks for drug discovery (credit: Insilico Medicine, Inc.)


Scientists from Insilico Medicine, Inc. have trained deep neural networks (DNNs) to predict the potential therapeutic uses of 678 drugs, using gene-expression data obtained from high-throughput experiments on human cell lines from Broad Institute’s LINCS databases and NIH MeSH databases. The supervised deep-learning drug-discovery engine used the properties of small molecules, transcriptional data, and literature to predict efficacy, toxicity, tissue-specificity, and heterogeneity of response.


“We used LINCS data from Broad Institute to determine the effects on cell lines before and after incubation with compounds,” co-author and research scientist Polina Mamoshina explained. “We used gene expression data of total mRNA from cell lines extracted and measured before incubation with compound X and after incubation with compound X to identify the response on a molecular level. The goal is to understand how gene expression (the transcriptome) will change after drug uptake. It is a differential value, so we need a reference (molecular state before incubation) to compare.” The research is described in a paper in the upcoming issue of the journal Molecular Pharmaceutics.


Alex Zhavoronkov, PhD, Insilico Medicine CEO, who coordinated the study, said the initial goal of their research was to help pharmaceutical companies significantly accelerate their R&D and increase the number of approved drugs. “In the process we came up with more than 800 strong hypotheses in oncology, cardiovascular, metabolic, and CNS spaces and started basic validation,” he said.


The team measured the “differential signaling pathway activation score for a large number of pathways to reduce the dimensionality of the data while retaining biological relevance.” They then used those scores to train the deep neural networks.
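
As a rough illustration of this setup (not the paper's actual architecture or data), the following sketch trains a small fully connected network on a drug-by-pathway matrix of activation scores. The 678 drugs match the study; the pathway count, category count, and random data are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical inputs: one row per drug, one column per signaling pathway,
# values = differential pathway activation scores (after vs. before incubation).
X = rng.normal(size=(678, 271))        # 678 drugs x 271 pathway scores (placeholder width)
y = rng.integers(0, 12, size=678)      # placeholder therapeutic-use categories

# A small fully connected network standing in for the paper's DNN.
dnn = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
print(cross_val_score(dnn, X, y, cv=3).mean())   # random data, so chance-level accuracy
```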


“This study is a proof of concept that DNNs can be used to annotate drugs using transcriptional response signatures, but we took this concept to the next level,” said Alex Aliper, president of research, Insilico Medicine, Inc., lead author of the study.


Via Pharma.AI, a newly formed subsidiary of Insilico Medicine, “we developed a pipeline for in silico drug discovery — which has the potential to substantially accelerate the preclinical stage for almost any therapeutic — and came up with a broad list of predictions, with multiple in silico validation steps that, if validated in vitro and in vivo, can almost double the number of drugs in clinical practice.”


Despite the commercial orientation of the companies, the authors agreed not to file for intellectual property on these methods and to publish the proof of concept. According to Mamoshina, earlier this month Insilico Medicine scientists published the first deep-learned biomarker of human age — aiming to predict the health status of the patient — in a paper titled “Deep biomarkers of human aging: Application of deep neural networks to biomarker development” by Putin et al. in Aging, as well as an overview of recent advances in deep learning in a paper titled “Applications of Deep Learning in Biomedicine” by Mamoshina et al., also in Molecular Pharmaceutics.

Rescooped by Dr. Stefan Gruenwald from Next Generation Sequencing (NGS)!

Roles of DNA methylation in prokaryotes


Researchers sequenced 230 diverse microbial genomes to learn more about the roles DNA methylation plays and about the planet’s microbial epigenomic diversity.


DNA methylation, the most common epigenetic change, is a process eukaryotes use to regulate gene expression, for example, keeping certain genes from turning on. Though prokaryotes (bacteria and archaea) are also known to have methylated DNA, the roles this process might play in these single-celled organisms are less well understood.


To learn more, a team including researchers at the U.S. Department of Energy Joint Genome Institute (DOE JGI), a DOE Office of Science User Facility, relied on single-molecule, real-time (SMRT) sequencing at the DOE JGI and Pacific Biosciences to reveal DNA methylation patterns in 230 bacterial and archaeal genomes. They found evidence of DNA methylation in 215 microbes (93 percent of those sequenced). These data enabled the annotation of 600 enzymes that methylate DNA (MTases), a massive increase over known annotations. While many DNA methylating enzymes are part of restriction modification systems, consistent with their known role in defense against phages and viruses, the findings suggest that a substantial number of others may be involved in genome regulation, and have a more crucial role in prokaryotic physiology and biology than had been previously suspected.


By mapping and characterizing the epigenetic changes, scientists can associate those targeted genes with environmental adaptations and metabolic activities. Better understanding of such controls on gene expression and under what circumstances they are observed will improve the ability to predict when and where such microbes are detected. In addition, this will inform how microbes interact with plants and other microbes involved in DOE mission interests such as plant bioenergy feedstock growth, advanced biofuel generation, and soil carbon processing. The report was published February 12, 2016 in PLOS Genetics and was highlighted February 29, 2016 in Nature Reviews Genetics.

Via Integrated DNA Technologies
Scooped by Dr. Stefan Gruenwald!

Scientists publish the first RNA interactome of the human nucleus


Studying the sequence and function of DNA has been a focus of the life sciences for decades, but the interest of many researchers has now turned to RNA. Today, many scientists believe that RNA molecules, together with a variety of different proteins, play a regulatory or structural role in virtually all cellular processes. However, the mechanisms underlying these RNA-protein interactions are still largely unknown. A team of scientists from the Max Planck Institute for Molecular Genetics in Berlin has now successfully identified hundreds of proteins that interact with RNA molecules in the nucleus of human cells. The researchers present the first RNA interactome of a human nucleus and describe how they have identified the bulk of RNA-binding proteins in the nucleus of human cells, using their newly developed method of "serial RNA interactome capturing".


For decades, proteins have been regarded as the main functional components in living cells. In recent years, however, their paramount importance for cellular processes has been rivaled by the growing knowledge about the involvement of RNA molecules. RNA, in the form of messenger RNA (mRNA) and transfer RNA (tRNA), was long believed to act as a mere mediator between DNA, which carries the genetic information, and proteins, the building blocks of the cell. But it has now been shown that, in addition to being a messenger for the genomic information, RNA mediates several other functions. These non-coding functions of RNA include tasks in the regulation of gene transcription and protein production as well as the determination of the positions of other molecules within the cell.


Elucidating the interplay between proteins and RNA has therefore become crucial for understanding the molecular mechanisms of the development of organisms and the emergence of diseases.


A research group headed by Ulf Andersson Ørom at the Max Planck Institute for Molecular Genetics in Berlin has now for the first time created an overview of the numerous interactions between RNA molecules and proteins in the human nucleus. To do so, the scientists first had to modify the "RNA interactome capture" technique for analyzing RNA-protein interactions, so that they could use it to identify RNA-protein interactions inside specific compartments of the cell, e.g., the nucleus. With the modified method, the researchers investigated the nuclei of a total of one billion cells in order to capture and catalogue all possible RNA-protein interactions in the nucleus.


"It has been particularly interesting for us that many of the discovered RNA-binding proteins do not only control the activity of genes and the fate of the resulting RNA molecules, but are also involved in the detection and repair of damaged DNA", Ørom explains. DNA damage such as false or missing bases or breakage of one or both strands of the DNA double helix can occur as a consequence of reactive oxygen species, UV-radiation or other external or internal stimuli. DNA-damage occurs thousands of times each day within every cell in the body. The cell responds with a complex repair process involving numerous proteins and RNA molecules that specifically repair damaged DNA, thus maintaining the functionality of the cell.


"A role for RNA in the repair of damaged DNA has been suspected for some time, but how RNA can impact on this process has remained unknown", says Ørom. "By identifying the protein factors that link RNA to the DNA damage response, this study contributes to a better understanding of these mechanisms." The scientists hope that their findings will contribute to a better understanding of the emergence of human diseases and the development of new therapies against cancer.

Scooped by Dr. Stefan Gruenwald!

Mogrify, a predictive computational system for direct reprogramming between human cell types


Transdifferentiation, the process of converting from one cell type to another without going through a pluripotent state, has great promise for regenerative medicine. The identification of key transcription factors for reprogramming is currently limited by the cost of exhaustive experimental testing of plausible sets of factors, an approach that is inefficient and unscalable. Now, scientists present a predictive system (Mogrify) that combines gene expression data with regulatory network information to predict the reprogramming factors necessary to induce cell conversion. They have applied Mogrify to 173 human cell types and 134 tissues, defining an atlas of cellular reprogramming. Mogrify correctly predicts the transcription factors used in known transdifferentiations. Furthermore, they validated two new transdifferentiations predicted by Mogrify. The researchers provide a practical and efficient mechanism for systematically implementing novel cell conversions, facilitating the generalization of reprogramming of human cells. Predictions are made available to help rapidly further the field of cell conversion.

To achieve this game-changing result, Professor Gough worked with then-PhD student Dr Owen Rackham (who now works at Duke-NUS Medical School in Singapore) for five years to develop a computational algorithm to predict the cellular factors needed for cell conversions. The algorithm was conceived from data collected as part of the FANTOM international consortium (based at RIKEN, Japan), of which Professor Gough is a long-time member. The algorithm, called Mogrify, has been made available online for other researchers and scientists, so that the field may advance rapidly.
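
Mogrify's actual scoring combines differential expression with network influence over several regulatory steps; the following toy Python sketch captures only the flavor of that idea. The network, genes, and scores below are invented for illustration.

```python
import networkx as nx

def rank_factors(diff_expr, regnet, top=3):
    """Score each transcription factor (TF) by its own differential
    expression in the target cell type plus the average differential
    expression of its direct regulatory targets. A crude stand-in for
    Mogrify's distance-weighted network score."""
    scores = {}
    for tf in regnet.nodes:
        targets = list(regnet.successors(tf))
        if not targets:
            continue  # not a regulator in this toy network
        scores[tf] = diff_expr.get(tf, 0) + sum(
            diff_expr.get(t, 0) for t in targets) / len(targets)
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical mini regulatory network and target-cell-type expression changes.
net = nx.DiGraph([("ASCL1", "DLL1"), ("ASCL1", "HES6"), ("MYOD1", "DES")])
dx = {"ASCL1": 4.2, "DLL1": 2.9, "HES6": 3.1, "MYOD1": -1.0, "DES": -2.5}
print(rank_factors(dx, net))   # ranks ASCL1 above MYOD1 for this conversion
```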

Scooped by Dr. Stefan Gruenwald!

Scientists create world's largest protein map to reveal which proteins work together in a cell

A multinational team of scientists have sifted through cells of vastly different organisms, from amoebae to worms to mice to humans, to reveal how proteins fit together to build different cells and bodies.

This tour de force of protein science, a result of a collaboration between seven research groups from three countries, led by Professor Andrew Emili from the University of Toronto's Donnelly Centre and Professor Edward Marcotte from the University of Texas at Austin, uncovered tens of thousands of new protein interactions, accounting for about a quarter of all estimated protein contacts in a cell. When even a single one of these interactions is lost it can lead to disease, and the map is already helping scientists spot individual proteins that could be at the root of complex human disorders. The data will be available to researchers across the world through open access databases. The study comes out in Nature on September 7, 2015.

While the sequencing of the human genome more than a decade ago was undoubtedly one of the greatest discoveries in biology, it was only the beginning of our in-depth understanding of how cells work. Genes are just blueprints, and it is the genes' products, the proteins, that do much of the work in a cell. Proteins work in teams by sticking to each other to carry out their jobs. Many proteins come together to form so-called molecular machines that play key roles, such as building new proteins or recycling those no longer needed by literally grinding them into reusable parts. But for the vast majority of proteins, and there are tens of thousands of them in human cells, we still don't know what they do.

This is where Emili and Marcotte's map comes in. Using a state-of-the-art method developed by the groups, the researchers were able to fish thousands of protein machineries out of cells and count the individual proteins they are made of. They then built a network that, similar to social networks, offers clues to protein function based on which other proteins they hang out with. For example, a new and unstudied protein, whose role we don't yet know, is likely to be involved in fixing damage in a cell if it sticks to the cell's known "handymen" proteins. Today's landmark study gathered information on protein machineries from nine species that represent the tree of life: baker's yeast, amoeba, sea anemones, flies, worms, sea urchins, frogs, mice and humans. The new map expands the number of known protein associations over 10-fold, and gives insights into how they evolved over time.
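
The method used here is biochemical co-fractionation followed by mass spectrometry; computationally, proteins whose abundance profiles rise and fall together across fractions are candidate members of the same complex. Below is a minimal sketch of that inference step, with invented protein names and profiles.

```python
import numpy as np
from itertools import combinations

# Hypothetical co-fractionation profiles: one array per protein, one value
# per biochemical fraction (e.g. spectral counts in each column fraction).
profiles = {
    "PSMA1": np.array([0, 3, 9, 14, 6, 1, 0, 0]),
    "PSMA2": np.array([0, 2, 8, 15, 7, 0, 0, 0]),
    "ACTB":  np.array([9, 7, 1, 0, 0, 2, 8, 10]),
}

# Proteins that elute together (highly correlated profiles) are candidate
# members of the same molecular machine, the core signal behind the map.
for a, b in combinations(profiles, 2):
    r = np.corrcoef(profiles[a], profiles[b])[0, 1]
    if r > 0.9:
        print(f"{a} -- {b}: co-elution r = {r:.2f}")
```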

"For me the highlight of the study is its sheer scale. We have tripled the number of protein interactions for every species. So across all the animals, we can now predict, with high confidence, more than 1 million protein interactions - a fundamentally 'big step' moving the goal posts forward in terms of protein interactions networks," says Emili, who is also Ontario Research Chair in Biomarkers in Disease Management and a professor in the Department of Molecular Genetics.

The researchers discovered that tens of thousands of protein associations remained unchanged since the first ancestral cell appeared one billion years ago, preceding all of animal life on Earth. "Protein assemblies in humans were often identical to those in other species. This not only reinforces what we already know about our common evolutionary ancestry, it also has practical implications, providing the ability to study the genetic basis for a wide variety of diseases and how they present in different species," says Marcotte.

Scooped by Dr. Stefan Gruenwald!

Development of an Epigenome Browser to Help Understand the Functional Complexity of the Genome


Advances in next-generation sequencing platforms have reshaped the landscape of functional genomic and epigenomic research as well as human genetics studies. Annotation of noncoding regions in the genome with genomic and epigenomic data has facilitated the generation of new, testable hypotheses regarding the functional consequences of genetic variants associated with human complex traits [1]. Large consortia, such as the US National Institutes of Health (NIH) Roadmap Epigenomics Consortium [3] and ENCODE [4], have generated tens of thousands of sequencing-based genome-wide data sets, creating a useful resource for the scientific community [5]. The WashU Epigenome Browser [6-8] continues to provide a platform for investigators to effectively engage with this resource in the context of analyzing their own data.

The newly developed Roadmap Epigenome Browser is based on the WashU Epigenome Browser and integrates data from both the NIH Roadmap Epigenomics Consortium and ENCODE in a visualization and bioinformatics tool that enables researchers to explore the tissue-specific regulatory roles of genetic variants in the context of disease. The browser takes advantage of the over 10,000 epigenomic data sets it currently hosts, including 346 'complete epigenomes', defined as tissues and cell types for which a complete set of DNA methylation, histone modification, open chromatin and other genomic data sets has been collected [9].

Data from both the NIH Roadmap Epigenomics and ENCODE resources are seamlessly integrated in the browser using a new Data Hub Cluster framework. Investigators can specify any number of single nucleotide polymorphism (SNP)-associated regions and any type of epigenomic data, for which the browser automatically creates virtual data hubs through a shared hierarchical metadata annotation, retrieves the data and performs real-time clustering analysis. Investigators interact with the browser to determine the tissue specificity of the epigenetic state encompassing genetic variants in physiologically or pathogenically relevant cell types from normal or diseased samples.

As an example, the epigenomic annotation of two noncoding SNPs identified in genome-wide association studies of people with multiple sclerosis [10] can be explored by clustering the histone H3K4me1 profile of the SNP-harboring regions and the RNA-seq signal of their closest genes across multiple primary tissues and cells. Thus, reference epigenomes provide important clues to the functional relevance of these genetic variants in the context of the pathophysiology of multiple sclerosis, including inflammation [11].
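
Conceptually, the browser's real-time clustering reduces to grouping genomic regions by their cross-tissue epigenetic profiles. Here is a minimal sketch with an invented signal matrix; the region and tissue names, values, and cluster count are all placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Hypothetical matrix: rows = SNP-harboring regions, columns = primary
# tissues/cell types, values = H3K4me1 enrichment over each region.
tissues = ["T-cell", "B-cell", "monocyte", "brain", "liver"]
signal = rng.random((6, len(tissues)))
signal[:2, :3] += 2.0   # two regions behave like immune-specific enhancers

# Group regions with similar cross-tissue profiles, as the browser does
# conceptually when it clusters SNP-associated regions in real time.
clusters = fcluster(linkage(signal, method="average"), t=2, criterion="maxclust")
for row, c in enumerate(clusters):
    print(f"region_{row}: cluster {c}")
```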

Scooped by Dr. Stefan Gruenwald!

First Human Protein Atlas and major protein analysis published


The Human Protein Atlas, a major multinational research project supported by the Knut and Alice Wallenberg Foundation, recently launched (November 6, 2014) an open source, tissue-based interactive map of the human proteome. Based on 13 million annotated images, the database maps the distribution of proteins in all major tissues and organs in the human body, showing both proteins restricted to certain tissues, such as the brain, heart, or liver, and those present in all. As an open access resource, it is expected to help drive the development of new diagnostics and drugs, but also to provide basic insights into normal human biology.

In the Science article, "Tissue-based Atlas of the Human Proteome", the approximately 20,000 protein coding genes in humans have been analysed and classified using a combination of genomics, transcriptomics, proteomics, and antibody-based profiling, says the article's lead author, Mathias Uhlén, Professor of Microbiology at Stockholm's KTH Royal Institute of Technology and the director of the Human Protein Atlas program. The analysis shows that almost half of the protein-coding genes are expressed in a ubiquitous manner and thus found in all analysed tissues.

Approximately 15% of the genes show an enriched expression in one or several tissues or organs, including well-known tissue-specific proteins, such as insulin and troponin. The testes, or testicles, have the most tissue-enriched proteins, followed by the brain and the liver. The analysis suggests that approximately 3,000 proteins are secreted from the cells and an additional 5,500 proteins are localized to the membrane systems of the cells.

"This is important information for the pharmaceutical industry. We show that 70% of the current targets for approved pharmaceutical drugs are either secreted or membrane-bound proteins," Uhlén says. "Interestingly, 30% of these protein targets are found in all analysed tissues and organs. This could help explain some side effects of drugs and thus might have consequences for future drug development." The analysis also contains a study of the metabolic reactions occurring in different parts of the human body. The most specialised organ is the liver with a large number of chemical reactions not found in other parts of the human body.

Scooped by Dr. Stefan Gruenwald!

Bioinformatics: Big data from DNA Sequencing is giving new Insights into Cancer Development and Treatment Options

The torrents of data flowing out of cancer research and treatment are yielding fresh insight into the disease.

In 2013, geneticist Stephen Elledge answered a question that had puzzled cancer researchers for nearly 100 years. In 1914, German biologist Theodor Boveri suggested that the abnormal number of chromosomes — called aneuploidy — seen in cancers might drive the growth of tumors. For most of the next century, researchers made little progress on the matter. They knew that cancers often have extra or missing chromosomes or pieces of chromosomes, but they did not know whether this was important or simply a by-product of tumor growth — and they had no way of finding out.

Elledge found that where aneuploidy had resulted in missing tumor-suppressor genes, or extra copies of the oncogenes that promote cancer, tumors grow more aggressively (T. Davoli et al., Cell 155, 948-962; 2013). His insight — that aneuploidy is not merely an odd feature of tumors, but an engine of their growth — came from mining voluminous amounts of cellular data. And, says Elledge, it shows how the ability of computers to sift through ever-growing troves of information can help us to deepen our understanding of cancer and open the door to discoveries.

Modern cancer care has the potential to generate huge amounts of data. When a patient is diagnosed, the tumor's genome might be sequenced to see if it is likely to respond to a particular drug. The sequencing might be repeated as treatment progresses to detect changes. The patient might have his or her normal tissue sequenced as well, a practice that is likely to grow as costs come down. The doctor will record the patient's test results and medical history, including dietary and smoking habits, in an electronic health record. The patient may also have computed tomography (CT) and magnetic resonance imaging (MRI) scans to determine the stage of the disease. Multiply all that by the nearly 1.7 million people diagnosed with cancer in 2013 in the United States alone and it becomes clear that oncology is going to generate even more data than it does now. Computers can mine the data for patterns that may advance the understanding of cancer biology and suggest targets for therapy.

Elledge's discovery was the result of a computational method that he and his colleagues developed, called the Tumor Suppressor and Oncogene Explorer. They used it to mine large data sets, including the Cancer Genome Atlas, maintained by the US National Cancer Institute, based in Bethesda, Maryland, and the Catalogue of Somatic Mutations in Cancer, run by the Wellcome Trust Sanger Institute in Hinxton, UK. The databases contained roughly 1.2 million mutations from 8,207 tissue samples of more than 20 types of tumor.

Analyzing the genomes of 8,200 tumors is just a start. Researchers are “trying to figure out how we can bring together and analyze, over the next few years, a million genomes”, says Robert Grossman, who directs the Initiative in Data Intensive Science at the University of Chicago in Illinois. This is an immense undertaking; the combined cancer genome and normal genome from a single patient constitutes about 1 terabyte (10^12 bytes) of data, so a million genomes would generate an exabyte (10^18 bytes). Storing and analysing this much data could cost US$100 million a year, Grossman says.
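
The arithmetic behind those figures is worth making explicit; a two-line check:

```python
# Back-of-envelope check of the storage figures quoted above.
per_patient = 1e12                  # tumor genome + normal genome, about 1 terabyte
total = per_patient * 1_000_000     # the million-genome goal
print(f"{total:.0e} bytes")         # 1e+18 bytes, i.e. one exabyte
```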

But it is the new technologies that are creating an information boom. “We can collect data faster than we can physically do anything with them,” says Manish Parashar, a computer scientist and head of the Rutgers Discovery Informatics Institute in Piscataway, New Jersey, who collaborates with Foran to find ways of handling the information. “There are some fundamental challenges being caused by our ability to capture so much data,” he says.

A major problem with data sets at the terabyte-and-beyond level is figuring out how to manipulate all the data. A single high-resolution medical image can take up tens of gigabytes, and a researcher might want the computer to compare tens of thousands of such images. Breaking down just one image in the Rutgers project into sets of pixels that the computer can identify takes about 15 minutes, and moving that much information from where it is stored to where it can be processed is difficult. “Already we have people walking around with disk drives because you can't effectively use the network,” Parashar says.

Informatics researchers are developing algorithms to split data into smaller packets for parallel processing on separate processors, and to compress files without omitting any relevant information. And they are relying on advances in computer science to speed up processing and communications in general.
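
A minimal sketch of the tiling idea for image data, using Python's standard multiprocessing module; the tile size, random image, and "analysis" step are placeholders, not the Rutgers pipeline.

```python
from multiprocessing import Pool
import numpy as np

def process_tile(tile):
    """Stand-in analysis step, e.g. nucleus segmentation on one tile."""
    return (tile > tile.mean()).sum()

def split_into_tiles(image, size=1024):
    """Split a huge pathology image into tiles small enough to move
    between storage and compute nodes and to process in parallel."""
    h, w = image.shape
    return [image[i:i + size, j:j + size]
            for i in range(0, h, size) for j in range(0, w, size)]

if __name__ == "__main__":
    image = np.random.rand(4096, 4096)      # hypothetical slide region
    with Pool() as pool:
        results = pool.map(process_tile, split_into_tiles(image))
    print(len(results), "tiles processed")
```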

Foran emphasizes that the understanding and treatment of cancer has undergone a dramatic shift as oncology has moved from one-size-fits-all attacks on tumours towards personalized medicine. But cancers are complex diseases controlled by many genes and other factors. “It's not as if you're going to solve cancer,” he says. But big data can provide new, better-targeted ways of grappling with the disease. “You're going to come up with probably a whole new set of blueprints for how to treat patients.”

Rescooped by Dr. Stefan Gruenwald from Data visualization!

Circle of Life: The Beautiful New Way to Visualize Biological Data


Via Claudia Mihai
Eli Levine's curator insight, February 20, 2014 12:24 PM

Way cool.


This just emphasizes the fact that we're all of one basic, biological species origin and not of a kind of "Creation orchard" as Mr. Ken Ham put it in his debate with Bill Nye.


"We are one, and one is all." - Gorillaz.


Think about it.

Scooped by Dr. Stefan Gruenwald!

Expanding the olfactory code by in silico decoding of odor-receptor chemical space


The peripheral olfactory system is unparalleled in its ability to detect and discriminate amongst an extremely large number of volatile compounds in the environment. To detect this wide variety of volatiles, most organisms have evolved large families of receptor genes that typically encode 7-transmembrane proteins expressed in the olfactory neurons. Coding of information in the peripheral olfactory system depends on two fundamental factors: interaction of individual odors with subsets of the odorant receptor repertoire and mode of signaling that an individual receptor-odor interaction elicits, activation or inhibition. Each volatile chemical in the environment is thought to interact with a specific subset of odorant receptors depending upon odor structure and binding sites on the receptor. This precise detection and coding of odors by the peripheral olfactory neurons are subsequently processed, transformed and integrated in the central nervous system to generate specific behavioral responses that are critical for survival such as finding food, finding mates, avoiding predators etc.


A group of researchers has now developed a cheminformatics pipeline that predicts receptor-odorant interactions from a large collection of chemical structures (>240,000) for receptors that have been tested against a smaller panel of odorants (∼100). Using a computational approach, they first identify shared structural features from the known ligands of individual receptors. They then used these features to screen in silico new candidate ligands from >240,000 potential volatiles for several odorant receptors (Ors) in the Drosophila (fruit fly) antenna. Functional experiments on 9 Ors support a high success rate (∼71%) for the screen, resulting in the identification of numerous new activators and inhibitors. Such computational prediction of receptor-odor interactions has the potential to enable systems-level analysis of olfactory receptor repertoires in organisms.
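
The published pipeline uses its own structural-feature selection; a common, simpler way to sketch ligand-based virtual screening in Python is fingerprint similarity with RDKit. The receptor, its "known ligands", and the candidate list below are invented for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

# Hypothetical known activators of one receptor (toy SMILES strings).
known_ligands = ["CCCCCC=O", "CCCCC(=O)C", "CCCCCCO"]
candidates = {"hexanol": "CCCCCCO", "benzene": "c1ccccc1", "heptanal": "CCCCCCC=O"}

def fp(smiles):
    """Morgan (circular) fingerprint as a bit vector for one molecule."""
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=2048)

known_fps = [fp(s) for s in known_ligands]
for name, smi in candidates.items():
    # Score each library compound by similarity to the receptor's known ligands.
    score = max(DataStructs.TanimotoSimilarity(fp(smi), k) for k in known_fps)
    print(f"{name}: max Tanimoto to known ligands = {score:.2f}")
```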

Rescooped by Dr. Stefan Gruenwald from Science And Wonder!

Is ‘massive open online research’ (MOOR) the next frontier for education?


UC San Diego is launching the first major online course that prominently features “massive open online research” (MOOR).


For the course Bioinformatics Algorithms, UC San Diego computer science and engineering professor Pavel Pevzner and his graduate students are offering a class on Coursera that combines research with a MOOC (massive open online course) for the first time.


“All students who sign up for the course will be given an opportunity to work on specific research projects under the leadership of prominent bioinformatics scientists from different countries, who have agreed to interact and mentor their respective teams.”


“The natural progression of education is for people to make a transition from learning to research, which is a huge jump for many students, and essentially impossible for students in isolated areas,” said Ph.D. student Phillip Compeau, who helped develop the online course. “By integrating the research with an interactive text and a MOOC, it creates a pipeline to streamline this transition.”


Bioinformatics Algorithms (Part I) will run for eight weeks starting October 21, 2013, and students are now able to sign up and download some of the course materials. It is offered free of charge to everyone.


Another unique feature of the online course: Pevzner and Compeau have developed Bioinformatics Algorithms: An Active-Learning Approach, an e-book supporting the course, while Pevzner's colleagues in Russia developed a content delivery system that integrates the e-book with hundreds of quizzes and dozens of homework problems.


The U.S.-Russian team, led by Pevzner's former student Nikolay Vyahhi, also implemented the online course using the beta version of Stepic, a new, fully integrated educational platform and startup developed by Vyahhi. Stepic derives its name from the "step-by-step, epic" solution its developers delivered for electronic publishing.


The course also provides access to Rosalind, a free online resource for learning bioinformatics through problem solving. Rosalind was developed by Pevzner’s students and colleagues in San Diego and St. Petersburg with funding from the Howard Hughes Medical Institute, the Russian Ministry of Education, and Russian Internet billionaires Yuri Milner and Pavel Durov through their “Start Fellows” award. Rosalind already has over 10,000 active users worldwide.


Rosalind — named in honor of British scientist Rosalind Franklin, whose X-ray crystallography with Raymond Gosling facilitated the discovery of the DNA double helix by Watson and Crick — will grade the programming assignments. They come in the form of bioinformatics problems of growing complexity as the course progresses.


“We developed Rosalind to inspire both biologists and computer science students,” said Rosalind principal developer Vyahhi, who worked with Pevzner during the latter’s sabbatical in Russia. “The platform allows biologists to develop vital programming skills for bioinformatics at their own pace, and Rosalind can also appeal to programmers who have never been exposed to some of the exciting computational problems generated by molecular biology.”
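
Rosalind's graded exercises are of exactly this flavor. The opening problem of the Bioinformatics Algorithms course, counting how often a pattern occurs in a DNA string, fits in a few lines of Python (the sample input is a toy):

```python
def pattern_count(text, pattern):
    """Count (overlapping) occurrences of pattern in text."""
    return sum(text[i:i + len(pattern)] == pattern
               for i in range(len(text) - len(pattern) + 1))

# Rosalind grades by checking the program's output on a generated dataset.
print(pattern_count("GCGCG", "GCG"))  # -> 2
```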

Via LilyGiraud
Scooped by Dr. Stefan Gruenwald!

Genokey: Mining the cancer genome -- six billion data points linking drugs with the changes in the genome


Scientists at the National Cancer Institute (NCI) have created a dataset of cancer-specific genetic coding variants that includes six billion data points linking drugs with the changes in the genome.


This source of big data, reported to be the most extensive cancer pharmacology database in the world, provides an opportunity for data mining in cancer drug discovery, and could help researchers understand why patients’ responses to cancer drugs vary and how tumours become resistant. It also has potential to help move forward the quest for personalised medicine.


The team carried out whole-exome sequencing of the 60 cell lines in the NCI-60 human cancer cell line panel and catalogued the genetic coding variants, including type I variants (found in the normal population) and type II variants (cancer-specific). The NCI-60 cell lines have been studied extensively and include cells from nine tissues of origin, including breast, ovary, prostate, colon, lung, kidney, brain, blood, and skin. The researchers then used algorithms to relate the type II variants to the cells' sensitivity to 103 approved anticancer drugs and 207 drugs in development, to see whether the variants could be used to predict drug response.
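
One elementary form of such an analysis is simply asking whether cell lines carrying a given cancer-specific variant respond differently to a drug. The sketch below does this with simulated data; the variant calls, GI50-like response values, and effect size are all invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical inputs across the 60 cell lines:
has_variant = rng.integers(0, 2, size=60).astype(bool)   # type II variant in gene X
gi50 = rng.normal(loc=5.0, scale=1.0, size=60)           # drug sensitivity per line
gi50[has_variant] += 1.2                                 # simulated association

# Simplest possible predictor check: do variant-bearing lines respond differently?
t, p = stats.ttest_ind(gi50[has_variant], gi50[~has_variant])
print(f"t = {t:.2f}, p = {p:.3g}")
```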


According to James Doroshow, the director of the NCI’s Division of Cancer Treatment and Diagnosis, in an interview in Shots, thousands of drugs have been screened using the NCI-60 panel, to see their impact on the cells. The researchers have also analysed 5000 different combinations of approved drugs to see if they can find drugs that work well together.


In an interview with the American Association of Cancer Research, Yves Pommier, chief of the Laboratory of Molecular Pharmacology at the NCI in Bethesda, explained that the team is making the data set public.

“Opening this extensive data set to researchers will expand our knowledge and understanding of tumorigenesis, as more and more cancer-related gene aberrations are discovered,” Pommier said in the interview. “This comes at a great time, because genomic medicine is becoming a reality, and I am very hopeful this valuable information will change the way we use drugs for precision medicine.”


The research was published in Cancer Research, and the genomic data is available through the CellMiner website, amongst others.


While not involved in this study, GenoKey focuses on data mining and analytics, and has used a combination of genetic and clinical data, along with combinatorial analysis, to find links between genetic changes and bipolar disorder. The NCI’s dataset of cancer-specific genetic coding variants, and GenoKey’s work, show the power of using big data and healthcare analytics in medical and biopharma research, particularly when combining genetic and clinical data.

Rescooped by Dr. Stefan Gruenwald from Limitless learning Universe!

Researchers develop technique allowing them to map important regulatory DNA regions


Long dismissed as "junk DNA", the regions between the genes are now known to fulfil vital functions. Mutations in those DNA regions can severely impair development in humans and may lead to serious diseases later in life. Until now, however, regulatory DNA regions have been hard to find. Scientists around Prof. Julien Gagneur, Professor for Computational Biology at the Technical University of Munich (TUM), and Prof. Patrick Cramer at the Max Planck Institute (MPI) for Biophysical Chemistry in Göttingen have now developed a method to find regulatory DNA regions that are active and controlling genes.


Björn Schwalb and Margaux Michel, members of Cramer’s team, as well as Benedikt Zacher, scientist in Gagneur’s group, have now succeeded in developing a highly sensitive method to catch and identify even very short-lived RNA molecules – the so-called TT-Seq (transient transcriptome sequencing) method. The results are reported in the latest issue of the renowned scientific journal Science on June 3rd. In order to catch the RNA molecules, the three junior researchers used a trick: They supplied cells with a molecule acting as a kind of anchor for a couple of minutes. The cells subsequently incorporated the anchor into each RNA they made during the course of the experiment. With the help of the anchor, the scientists were eventually able to fish the short-lived RNA molecules out of the cell and examine them.


"The RNA molecules we caught with the TT-Seq method provide a snapshot of all DNA regions that were active in the cell at a certain time – the genes as well as the regulatory regions between genes that were so hard to find until now," Cramer explains. "With TT-Seq we now have a suitable tool to learn more about how genes are controlled in different cell types and how gene regulatory programs work," Gagneur adds.


In many cases, researchers have a pretty good idea which genes play a role in a certain disease, but do not know which molecular switches are involved. The scientists around Cramer and Gagneur hope to be able to use the new method to uncover key mechanisms that play a role during the emergence or course of a disease. In a next step, they want to apply their technique to blood cells to better understand the progression of an HIV infection in patients suffering from AIDS.

Via Integrated DNA Technologies, CineversityTV
Rescooped by Dr. Stefan Gruenwald from DNA and RNA Research!

Tools to annotate genes and genetic variants 


In a paper recently published in Genome Biology, Ginger Tsueng and colleagues discuss two high-performance web services for querying gene and variant annotation. In this blog post, Ginger explains the ideas behind the software and how the services advance the field.


Welcome to the big data landscape of gene and variant information! Vast swaths of gene and variant annotation information are spread across many different resources, making it challenging for researchers to integrate up-to-date information into their bioinformatics pipelines.


Researchers typically address this challenge with data warehousing or data federation. By downloading and storing data from different resources (data warehousing), researchers ensure fast access to the data of interest to them; however, effort must be spent on writing parsers and keeping the data up-to-date.


In contrast, by accessing the data directly from the resource when it is needed (data federation), researchers ensure that they obtain the most up-to-date information available from these resources, but may find their queries to be time consuming due to server and network limitations.


In a recently published paper in Genome Biology, Jiwen Xin et al. describe an alternative solution for obtaining up-to-date gene and variant annotation data from multiple resources: annotation as a service. Like a hardware superstore for annotation data, MyGene.info and MyVariant.info are one-stop shops (i.e. centralized repositories) that serve up-to-date annotation data from key resources via cloud-based web-service endpoints. MyGene.info stores up-to-date data from NCBI Entrez, Ensembl, UniProt, NetAffx, PharmGKB, UCSC, and CPDB.


Instead of dealing with data warehousing or data federation issues, in addition to data format conversion from multiple data sources, researchers or bioinformaticians can utilize any of MyGene.info's clients (Python, R) or its browser-based API to access up-to-date gene annotation data in a single machine-readable format (JSON). For example, MyGene.info can easily be used to batch convert gene IDs or obtain pertinent gene ontology info—two tasks for which researchers commonly use and cite DAVID. Providing easy access to gene annotation information like gene IDs and gene ontology is so valuable that researchers continue to use DAVID for this purpose even though DAVID has not been updated for a long time!
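
For instance, the service's documented Python client reduces the DAVID-style batch ID conversion to a few lines. This assumes the mygene package (pip install mygene); field names follow the service's public documentation.

```python
import mygene

mg = mygene.MyGeneInfo()

# Batch-convert gene symbols to Entrez and Ensembl IDs in one call.
hits = mg.querymany(["CDK2", "TP53"], scopes="symbol",
                    fields="entrezgene,ensembl.gene", species="human")
for hit in hits:
    print(hit["query"], "->", hit.get("entrezgene"), hit.get("ensembl", {}))
```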


“With over 50 different annotation types covering over 13 million genes for 15,000 species, MyGene.info has already accumulated over 160 million requests, and serves an average of 3.5 million requests per month!” revealed Dr. Chunlei Wu, the associate professor at the Scripps Research Institute in charge of developing these services.


Elaborating on the development of MyVariant.info, he added, “After confirming that researchers would find this resource valuable and seeing the volume of requests we get monthly, we wanted to find a similar solution for gene variant annotation data. That was the idea behind MyVariant.info.” MyVariant.info currently incorporates up-to-date variant annotation data from fourteen valuable resources including dbNSFP, dbSNP, ClinVar, EVS, CADD, MutDB, the GWAS Catalog, COSMIC, DOCM, SNPedia, EMVClass, Scripps Wellderly, ExAC, and GRASP.

Via Integrated DNA Technologies
Scooped by Dr. Stefan Gruenwald!

HPV Genomes Show Greater Diversity Than Expected Among Cervical Cancer Patients

Human papillomaviruses are associated with invasive cervical cancer as well as more benign disorders such as skin warts. Although more than 180 HPV genomes have been sequenced, there has been little research on the diversity of HPV genomes within the same patient, primarily because the virus is thought to have a low mutation rate.

Of the 13 HPV genotypes thought to be carcinogenic, HPV16 is responsible for about half of all invasive cervical cancer cases worldwide. In the study, the researchers sequenced HPV16 genomes from 10 patients with cervical cancer and one with non-malignant genital warts.

To date, most genomic studies of papillomaviruses have used Sanger sequencing to look at the "most prevalent, consensus sequence" during chronic infection, but Sanger sequencing may "not be appropriate to capture the dynamics of slowly evolving viruses, such as PVs," the authors wrote.

So, they decided to turn to next-generation sequencing. The authors extracted DNA from 10 clinical samples of invasive cervical cancer and one case of genital warts caused by HPV16. They used long PCR to generate 8-kb long amplicons — the size of the HPV genome — and sequenced them using Thermo Fisher's Ion Torrent PGM.

The authors generated a consensus genome for each sample and also performed de novo assembly of each sample using CLC software.

Comparing the clinical samples to the reference sequence, the researchers observed 190 changes, with the E2 gene containing the largest number of changes. Two samples had duplication events in the L1 gene and L2 gene, respectively.

The team also performed a phylogenetic analysis using consensus sequences from the PGM data as well as 20 HPV16 genomes from GenBank. From the eleven clinical samples, the researchers identified three HPV16 variant lineages: HPV16_A1, HPV16_A2, and HPV16_D. In addition, these types correlated with specific tumor types, with squamous cell carcinomas associated with the A types and adenocarcinomas associated with the D type.

To analyze intra-host variation, the researchers performed de novo assembly. They were able to generate one contiguous sequence for four samples, with the remaining seven samples in three to eight contigs.

The researchers identified between three and 125 polymorphic sites per genome. In the most diverse sample, 31 of the 125 polymorphic sites represented more than 10 percent of the reads in that position. In the least diverse sample, only one polymorphic site represented more than 5 percent of the reads at that position.

Next, the team calculated a "diversity index" for each sample, defined as the "probability of a randomly chosen genome to be identical to the consensus genome." The median value for the samples was just 40 percent.
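
Under that definition, the index is just the product of the consensus-allele frequencies over the polymorphic sites (monomorphic sites contribute a factor of 1). A worked toy example with invented frequencies:

```python
import math

def diversity_index(consensus_freqs):
    """Probability that a randomly drawn genome matches the consensus at
    every site: the product of the consensus-allele frequencies across
    polymorphic sites (monomorphic sites contribute a factor of 1)."""
    return math.prod(consensus_freqs)

# Hypothetical sample with three polymorphic sites:
freqs = [0.90, 0.75, 0.60]
print(f"identity with consensus: {diversity_index(freqs):.0%}")   # about 40%
```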

The authors suggest a number of factors could contribute to the diversity observed, including both innate and adaptive immune responses. For instance, the APOBEC3G family of proteins has been shown to target papillomavirus DNA, "which may partially account for the broad diversity of human PVs." In addition, "polymorphisms observed in the E6 gene could be a result of an immune selective pressure," the authors wrote.

In the future, more research will need to be done on HPV infection to monitor viral diversity in asymptomatic, productive, benign, premalignant and malignant infections. "The possible role of oncovirus intralesion diversity generated during chronic infections should be explored as a differential factor for increased oncogenic potential," they wrote.
Scooped by Dr. Stefan Gruenwald!

Biology software Cello promises easier way to program living cells


Synthetic biologists have created software that automates the design of DNA circuits for living cells. The aim is to help people who are not skilled biologists to quickly design working biological systems, says synthetic biologist Christopher Voigt at the Massachusetts Institute of Technology in Cambridge, who led the work. “This is the first example where we’ve literally created a programming language for cells,” he says.


In the new software — called Cello — a user first specifies the kind of cell they are using and what they want it to do: for example, sense metabolic conditions in the gut and produce a drug in response. They type in commands to explain how these inputs and outputs should be logically connected, using a computing language called Verilog that electrical engineers have long relied on to design silicon circuits. Finally, Cello translates this information to design a DNA sequence that, when put into a cell, will execute the demands.
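
To see what "specify logic, get DNA" means, consider the simplest case: a two-input NOR, the gate family Cello's genetic circuits are built from. In Verilog a user might write something like assign out = ~(a | b); the Python sketch below only shows the truth table the designed DNA circuit would have to reproduce, with placeholder input and output names.

```python
from itertools import product

def nor_gate(a, b):
    """Behavior a Cello user might specify in Verilog as: assign out = ~(a | b);
    in vivo, the inputs could be small-molecule sensors and the output a
    fluorescent reporter."""
    return not (a or b)

# Truth table the designed DNA circuit would have to reproduce in the cell.
for a, b in product([False, True], repeat=2):
    print(int(a), int(b), "->", int(nor_gate(a, b)))
```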


Voigt says his team is writing user interfaces that would allow biologists to write a single program and be returned different DNA sequences for different organisms. Anyone can access Cello through a Web-based interface, or by downloading its open-source code from the online repository GitHub.


“This paper solves the problem of the automated design, construction and testing of logic circuits in living cells,” says bioengineer Herbert Sauro at the University of Washington in Seattle, who was not involved in the study. The work is published in Science.

Scooped by Dr. Stefan Gruenwald!

A powerful new ‘tool’ for assembling biomolecules


Proposed new simplified reaction for assembling biomolecules in a single chemical reaction. It replaces the existing expensive and complex multi-step process needed when synthesizing new chemicals, and could revolutionize pharmaceutical and biomaterials manufacturing.

Colorado State University chemists have invented a single chemical reaction that couples two constituent chemicals into a carbon-carbon bond, while simultaneously introducing a nitrogen component. The process promises to replace a multi-step, expensive, and complex process needed when synthesizing new chemicals — for drug creation and testing, for example.

The researchers were able to control this reaction to make the nitrogen atoms go exactly where they want them to, making for precision chemistry that they believe could revolutionize pharmaceutical and biomaterials manufacturing.

The researchers explain in a statement that “almost every significant carbon-based biomolecule contains a nitrogen compound, or amine. Achieving this carbon-nitrogen bond in the lab, though, is tricky business. Drug companies know it well…. They must first create the carbon-carbon bonds, and then introduce the nitrogen to make a molecule that will do something useful.”

The chemists’ starting materials were simply oil refinery byproducts called olefins, or alkenes. They mixed in a specially engineered reagent, then used a complex based on the precious metal rhodium to reliably and specifically trigger the elusive carbon-nitrogen bonds.

The innovation also controls molecular isomers (an isomer is a molecule with the same chemical formula as another molecule, but with a different chemical structure). Some isomers are mirror images, like right and left gloves, and although they’re chemically identical, their functionalities are strikingly different. Being able to select for a single isomer is critical to safety and efficacy — so much so that the FDA mandates that only single-isomer drugs be marketed for human use.

Take thalidomide, infamous for causing severe birth defects when taken by pregnant women in the 1950s. Chemically, thalidomide comes in two mirror-image isomeric forms. One caused the defects, one didn’t.

“For this reason, spatial display of groups in molecules is incredibly important,” said organic chemist Tomislav Rovis, professor of chemistry in the College of Natural Sciences at CSU. Rovis led the research with postdoctoral researcher Tiffany Piou, who designed all the chemical building blocks and ran the experiments.

“Tiffany’s finding gives us a leg up to do this in a carboamination reaction, by making the carbon-carbon bond and delivering the nitrogen selectively,” Rovis said.

The achievement is detailed in the journal Nature, published today (Oct. 21).

Scooped by Dr. Stefan Gruenwald!

Broad Institute, Google Genomics combine bioinformatics and computing expertise


Broad Institute of MIT and Harvard is teaming up with Google Genomics to explore how to break down major technical barriers that increasingly hinder biomedical research by addressing the need for computing infrastructure to store and process enormous datasets, and by creating tools to analyze such data and unravel long-standing mysteries about human health.

As a first step, Broad Institute’s Genome Analysis Toolkit, or GATK, will be offered as a service on the Google Cloud Platform, as part of Google Genomics. The goal is to enable any genomic researcher to upload, store, and analyze data in a cloud-based environment that combines the Broad Institute’s best-in-class genomic analysis tools with the scale and computing power of Google.

GATK is a software package developed at the Broad Institute to analyze high-throughput genomic sequencing data. GATK offers a wide variety of analysis tools, with a primary focus on genetic variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine, and high-performance computing features make it capable of taking on projects of any size.

GATK is already available for download at no cost to academic and non-profit users. In addition, business users can license GATK from the Broad. To date, more than 20,000 users have processed genomic data using GATK.

The Google Genomics service will provide researchers with a powerful, additional way to use GATK. Researchers will be able to upload genetic data and run GATK-powered analyses on Google Cloud Platform, and may use GATK to analyze genetic data already available for research via Google Genomics. GATK as a service will make best-practice genomic analysis readily available to researchers who don’t have access to the dedicated compute infrastructure and engineering teams required for analyzing genomic data at scale. An initial alpha release of the GATK service will be made available to a limited set of users.

“Large-scale genomic information is accelerating scientific progress in cancer, diabetes, psychiatric disorders, and many other diseases,” said Eric Lander, President and Director of Broad Institute. “Storing, analyzing, and managing these data is becoming a critical challenge for biomedical researchers. We are excited to work with Google’s talented and experienced engineers to develop ways to empower researchers around the world by making it easier to access and use genomic information.”

Scooped by Dr. Stefan Gruenwald!

Data Scientist on a Quest to Turn Computers Into Doctors


Some of the world’s most brilliant minds are working as data scientists at places like Google, Facebook, and Twitter—analyzing the enormous troves of online information generated by these tech giants—and for hacker and entrepreneur Jeremy Howard, that’s a bit depressing. Howard, a data scientist himself, spent a few years as the president of Kaggle, a kind of online data science community that sought to feed the growing thirst for information analysis. He came to realize that while many of Kaggle’s online data analysis competitions helped scientists make new breakthroughs, the potential of these new techniques wasn’t being fully realized. “Data science is a very sexy job at the moment,” he says. “But when I look at what a lot of data scientists are actually doing, the vast majority of work out there is on product recommendations and advertising technology and so forth.”

So, after leaving Kaggle last year, Howard decided he would find a better use for data science. Eventually, he settled on medicine. And he even did a kind of end run around the data scientists, leveraging not so much the power of the human brain but the rapidly evolving talents of artificial brains. His new company is called Enlitic, and it wants to use state-of-the-art machine learning algorithms—what’s known as “deep learning”—to diagnose illness and disease.

Publicly revealed for the first time today, the project is only just getting off the ground—“the big opportunities are going to take years to develop,” Howard says—but it’s yet another step forward for deep learning, a form of artificial intelligence that more closely mimics the way our brains work. Facebook is exploring deep learning as a way of recognizing faces in photos. Google uses it for image tagging and voice recognition. Microsoft uses it for real-time translation in Skype. And the list goes on.

But Howard hopes to use deep learning for something more meaningful. His basic idea is to create a system akin to the Star Trek Tricorder, though perhaps not as portable. Enlitic will gather data about a particular patient—from medical images to lab test results to doctors’ notes—and its deep learning algorithms will analyze this data in an effort to reach a diagnosis and suggest treatments. The point, Howard says, isn’t to replace doctors, but to give them the tools they need to work more effectively. With this in mind, the company will share its algorithms with clinics, hospitals, and other medical outfits, hoping they can help refine its techniques. Howard says that the health care industry has been slow to pick up on the deep-learning trend because it was rather expensive to build the computing clusters needed to run deep learning algorithms. But that’s changing.

The real challenge, Howard says, isn’t writing algorithms but getting enough data to train those algorithms. He says Enlitic is working with a number of organizations that specialize in gathering anonymized medical data for this type of research, but he declines to reveal the names of the organizations he’s working with. And while he’s tight-lipped about the company’s technique now, he says that much of the work the company does will eventually be published in research papers.
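
Enlitic has not published its models, but the broad recipe Howard describes, training a deep neural network on labeled medical data, can be sketched in miniature. The toy example below uses the Keras API to fit a small convolutional network that labels scans as normal or abnormal; the architecture, image size, and random placeholder data are illustrative assumptions, not Enlitic's system.

```python
# Illustrative sketch only: a small convolutional network for binary
# classification of medical images (normal vs. abnormal). This is NOT
# Enlitic's model; architecture and shapes are assumptions for the example.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),       # grayscale scan, 128x128 pixels
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability of "abnormal"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random placeholder data; in practice this would be anonymized, labeled scans.
x_train = np.random.rand(32, 128, 128, 1).astype("float32")
y_train = np.random.randint(0, 2, size=(32, 1))
model.fit(x_train, y_train, epochs=1, batch_size=8)
```

As the article notes, the hard part is not this code but assembling enough high-quality, anonymized training data for the network to learn clinically useful patterns.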

Mike Dele's curator insight, March 20, 2015 10:00 PM

Why don't we look at the possibility of creating and manufacturing human spare parts, just as for cars, to replace any failing part?

Benjamin Mzhari's curator insight, March 27, 2015 8:37 AM

I foresee this type of profession becoming dynamic, in the sense that it will look not only at business data but also at other statistical figures that will aid businesses.

Scooped by Dr. Stefan Gruenwald!

UCSC Ebola genome browser now online to aid researchers' response to crisis

UCSC Ebola genome browser now online to aid researchers' response to crisis | Amazing Science |

The UC Santa Cruz Genomics Institute late Tuesday (September 30) released a new Ebola genome browser to assist global efforts to develop a vaccine and antiserum to help stop the spread of the Ebola virus.

The team led by University of California, Santa Cruz researcher Jim Kent worked around the clock for the past week, communicating with international partners to gather and present the most current data. The Ebola virus browser aligns five strains of Ebola with two strains of the related Marburg virus. Within these strains, Kent and other members of the UC Santa Cruz Genome Browser team have aligned 148 individual viral genomes, including 102 from the current West Africa outbreak.

UC Santa Cruz has established the UCSC Ebola Genome Portal, with links to the new Ebola genome browser as well as to the relevant scientific literature on the virus.

“Ebola has been one of my biggest fears ever since I learned about it in my first microbiology class in 1997," said Kent, who 14 years ago created the first working draft of the human genome. "We need a heroic worldwide effort to contain Ebola. Making an informatics resource like the genome browser for Ebola researchers is the least we could do.”

Scientists around the world can access the open-source browser to compare regions of the viral genome that have changed with those that remain the same. The browser allows scientists and researchers from drug companies, other universities, and governments to study the virus and its genomic changes as they seek a solution to halt the epidemic.
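
The kind of comparison the browser supports can be illustrated in a few lines of Python: given two aligned genome sequences, report the positions where they differ. The fragments below are invented for the example; real Ebola genomes run to roughly 19,000 bases and would come from a resource like the browser.

```python
# Toy illustration of strain comparison: report positions where two
# aligned viral sequences differ. The fragments below are invented;
# real Ebola genomes are ~19,000 bases long.
def diff_positions(seq_a: str, seq_b: str):
    """Yield (position, base_a, base_b) for every mismatched site."""
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b and a != "-" and b != "-":   # skip alignment gaps
            yield i, a, b

strain_1976 = "ATGGACAGTCGA"
strain_2014 = "ATGGATAGTCGG"
for pos, a, b in diff_positions(strain_1976, strain_2014):
    print(f"position {pos}: {a} -> {b}")
```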

No comment yet.
Scooped by Dr. Stefan Gruenwald!

First comprehensive atlas of human gene activity released

First comprehensive atlas of human gene activity released | Amazing Science |

A large international consortium of researchers has produced the first comprehensive, detailed map of the way genes work across the major cells and tissues of the human body. The findings describe the complex networks that govern gene activity, and the new information could play a crucial role in identifying the genes involved with disease.

“Now, for the first time, we are able to pinpoint the regions of the genome that can be active in a disease and in normal activity, whether it’s in a brain cell, the skin, in blood stem cells or in hair follicles,” said Winston Hide, associate professor of bioinformatics and computational biology at Harvard School of Public Health (HSPH) and one of the core authors of the main paper in Nature.

“This is a major advance that will greatly increase our ability to understand the causes of disease across the body.”

The research is outlined in a series of papers published March 27, 2014, two in the journal Nature and 16 in other scholarly journals. The work is the result of years of concerted effort among 250 experts from more than 20 countries as part of FANTOM 5 (Functional Annotation of the Mammalian Genome). The FANTOM project, led by the Japanese institution RIKEN, is aimed at building a complete library of human genes.

Researchers studied human and mouse cells using a new technology called Cap Analysis of Gene Expression (CAGE), developed at RIKEN, to discover how 95% of all human genes are switched on and off. These “switches” — called “promoters” and “enhancers” — are the regions of DNA that manage gene activity. The researchers mapped the activity of 180,000 promoters and 44,000 enhancers across a wide range of human cell types and tissues and, in most cases, found they were linked with specific cell types.

“We now have the ability to narrow down the genes involved in particular diseases based on the tissue, cell, or organ in which they work,” said Hide. “This new atlas points us to the exact locations to look for the key genetic variants that might map to a disease.”
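
The idea of a cell-type-specific promoter can be made concrete with a toy activity matrix. The sketch below, written with pandas, flags promoters whose measured activity is concentrated in a single cell type; the numbers and the 0.8 threshold are invented for illustration, and the real CAGE data and FANTOM5 statistics are far richer.

```python
# Toy sketch of spotting cell-type-specific promoters from an activity
# matrix (rows: promoters, columns: cell types). Values and the 0.8
# specificity threshold are invented for illustration only.
import pandas as pd

activity = pd.DataFrame(
    {"brain": [95, 2, 10], "skin": [3, 88, 12], "blood": [2, 4, 70]},
    index=["promoter_A", "promoter_B", "promoter_C"],
)

# Fraction of each promoter's total activity contributed by its top cell type.
specificity = activity.max(axis=1) / activity.sum(axis=1)
specific = activity[specificity > 0.8]
print(specific.idxmax(axis=1))  # most active cell type per specific promoter
```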

Eli Levine's curator insight, March 28, 2014 7:27 PM
There it is. As it is in our genes, so too is it in our individual psyches and societies. Check it out!
Martin Daumiller's curator insight, March 29, 2014 12:27 PM

original article:

Scooped by Dr. Stefan Gruenwald!

Computational simulations of photosystem II expose secret pathways behind photosynthesis

Computational simulations of photosystem II expose secret pathways behind photosynthesis | Amazing Science |
New insights into the behavior of photosynthetic proteins from atomic simulations could hasten the development of artificial light-gathering machines.

The protein complex known as photosystem II splits water molecules to release oxygen using sunlight and relatively simple biological building blocks. Although water can also be split artificially using an electrical voltage and a precious metal catalyst, researchers continue to strive to mimic the efficient natural process. So far, however, these efforts have been hampered by an incomplete understanding of the water oxidation mechanism of photosystem II. Shinichiro Nakamura from the RIKEN Innovation Center and colleagues have now used simulations to reveal the hidden pathways of water molecules inside photosystem II.

At the heart of photosystem II is a cluster of manganese, calcium and oxygen atoms, known as the oxygen-evolving complex (OEC), that catalyzes the water-splitting reaction. Recent high-resolution X-ray crystallography studies have revealed the precise positions of the atoms in the OEC and of the protein residues that contact the site. While this information has yielded important structural clues into photosynthetic water oxidation, the movements of water, oxygen and protons within the protein complex are still the subject of much speculation.

To resolve this problem, researchers have turned to molecular dynamics (MD) simulation, a technique that models the time-dependent behavior of biomolecules using thermodynamics and the physics of motion. While previous MD simulations of photosystem II have involved the use of approximate models that focus only on protein monomers or the main OEC components, Nakamura's team took a different approach. "Our hypothesis was that we cannot understand the mechanism of oxygen evolution just by looking at the manganese-based reaction center," he says. "Therefore, we carried out a total MD simulation, without any truncation of the protein or simplification."

In their simulation, the team embedded an exact model of photosystem II inside a thylakoid—a lipid and fatty-acid membrane-bound compartment found in the chloroplasts of plant cells. After initial computations confirmed the reliability of their model, the researchers performed a rigorous MD simulation of the protein–membrane system in the presence of more than 300,000 water molecules. "The results indicated that water, oxygen and protons move through photosystem II not randomly but via distinct pathways that are not obviously visible," says Nakamura.

The pathways revealed by the simulations are delicately coupled to the dynamic motions of the photosystem II protein residues. While such intricate activity is currently impossible to reproduce artificially, the researchers suspect that combining quantum-chemical calculations with MD simulations could help to unlock the mysterious principles behind the highly efficient oxygen-evolution reactions of this remarkable biological factory.
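
The paper's full simulation of photosystem II in a membrane with hundreds of thousands of waters is far beyond a snippet, but the basic shape of a molecular dynamics run (load a structure, build a force field, integrate the equations of motion) can be sketched with the open-source OpenMM toolkit. This is not the authors' code, and "system.pdb" is a placeholder for a properly prepared, solvated structure.

```python
# Minimal MD sketch with OpenMM: load a solvated structure and integrate
# its motion for a few steps. "system.pdb" is a placeholder; the paper's
# photosystem II + membrane + ~300,000-water setup requires far more
# careful preparation than shown here.
import sys
from openmm.app import (PDBFile, ForceField, PME, HBonds,
                        Simulation, StateDataReporter)
from openmm import LangevinMiddleIntegrator
from openmm.unit import kelvin, picosecond, picoseconds, nanometer

pdb = PDBFile("system.pdb")                       # pre-solvated structure
forcefield = ForceField("amber14-all.xml", "amber14/tip3p.xml")
system = forcefield.createSystem(pdb.topology,
                                 nonbondedMethod=PME,
                                 nonbondedCutoff=1.0 * nanometer,
                                 constraints=HBonds)

integrator = LangevinMiddleIntegrator(300 * kelvin,       # temperature
                                      1 / picosecond,     # friction
                                      0.002 * picoseconds)  # 2 fs time step
sim = Simulation(pdb.topology, system, integrator)
sim.context.setPositions(pdb.positions)
sim.minimizeEnergy()
sim.reporters.append(StateDataReporter(sys.stdout, 100, step=True,
                                       potentialEnergy=True, temperature=True))
sim.step(1000)  # a very short trajectory; production runs are far longer
```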

No comment yet.
Scooped by Dr. Stefan Gruenwald!

World's largest disease database will use artificial intelligence to find new cancer treatments

World's largest disease database will use artificial intelligence to find new cancer treatments | Amazing Science |
A new cancer database containing 1.7 billion experimental results will use artificial intelligence, similar to the technology used to predict the weather, to discover the cancer treatments of the future.

The system, called CanSAR, is the biggest disease database of its kind anywhere in the world and condenses more data than would be generated by 1 million years of use of the Hubble Space Telescope.

It launched today (Monday, 11 November, 2013) and has been developed by researchers at The Institute of Cancer Research, London, using funding from Cancer Research UK.

The new CanSAR database is more than double the size of a previous version and has been designed to cope with a huge expansion of data on cancer brought about by advances in DNA sequencing and other technologies.

The resource is being made freely available by The Institute of Cancer Research (ICR) and Cancer Research UK, and will help researchers worldwide make use of vast quantities of data, including data from patients, clinical trials, and genetic, biochemical and pharmacological research.

Although the prototype of CanSAR was on a much smaller scale, it attracted 26,000 unique users in more than 70 countries around the world, and earlier this year it was used to identify 46 potentially 'druggable' cancer proteins that had previously been overlooked.

Peer-reviewed scientific paper (NAR) about CanSAR:

No comment yet.
Scooped by Dr. Stefan Gruenwald!

Amazing Science: Bioinformatics Postings

Amazing Science: Bioinformatics Postings | Amazing Science |

Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge. Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics and genomics, it aids in annotating genomes and their observed mutations.
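
As a small taste of what such software tools look like, the snippet below computes the GC content of a DNA fragment, a basic statistic used when characterizing and annotating genomes; the sequence is made up for illustration.

```python
# A minimal taste of bioinformatics tooling: GC content of a DNA
# sequence, a basic statistic used when characterizing genomes.
# The sequence is an invented fragment for illustration.
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

fragment = "ATGCGCGTATTCCGGAT"
print(f"GC content: {gc_content(fragment):.1%}")
```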

No comment yet.