Amazing Science
Amazing science facts - 3D_printing • aging • AI • anthropology • art • astronomy • bigdata • bioinformatics • biology • biotech • chemistry • computers • cosmology • education • environment • evolution • future • genetics • genomics • geosciences • green_energy • language • map • material_science • math • med • medicine • microscopy • nanotech • neuroscience • paleontology • photography • photonics • physics • postings • robotics • science • technology • video
Rescooped by Dr. Stefan Gruenwald from Genetic Engineering in the Press by GEG

New open-source software could accelerate genetic discoveries and lead to commercially viable biofuel crops


Commercially viable biofuel crops are vital to reducing greenhouse gas emissions, and a new tool developed by the Center for Advanced Bioenergy and Bioproducts Innovation should accelerate their development, as well as genetic editing advances overall.

 

CROPSR is the first open-source software tool for genome-wide design and evaluation of guide RNA (gRNA) sequences for CRISPR experiments, created by scientists at CABBI, a Department of Energy-funded Bioenergy Research Center (BRC). The genome-wide approach significantly shortens the time required to design a CRISPR experiment, reducing the challenge of working with crops and accelerating gRNA sequence design, evaluation, and validation, according to the study published in BMC Bioinformatics.

 

"CROPSR provides the scientific community with new methods and a new workflow for performing CRISPR/Cas9 knockout experiments," said CROPSR developer Hans Müller Paul, a molecular biologist and Ph.D. student with co-author Matthew Hudson, Professor of Crop Sciences at the University of Illinois Urbana-Champaign. "We hope that the new software will accelerate discovery and reduce the number of failed experiments."

 

To better meet the needs of crop geneticists, the team built software that lifts restrictions imposed by other packages on the design and evaluation of gRNA sequences, the guides used to locate targeted genetic material. Team members also developed a new machine learning model that does not avoid guides in the repetitive genomic regions often found in plants, a shortcoming of existing tools. The CROPSR scoring model provided much more accurate predictions, even in non-crop genomes, the authors said.
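
The general workflow a genome-wide gRNA designer automates can be pictured with a short sketch: scan a sequence on both strands for 20-nucleotide protospacers sitting next to a Cas9 NGG PAM, then score each candidate. The demo sequence, scoring heuristic and function names below are purely illustrative assumptions, not CROPSR's trained machine-learning model or its code.

```python
# Toy illustration of genome-wide gRNA candidate discovery -- NOT CROPSR's algorithm.
# Scans both strands for 20-nt protospacers followed by an NGG PAM and applies a
# simple heuristic score; CROPSR itself uses a trained machine-learning model.
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_guides(genome: str, guide_len: int = 20):
    """Yield (position, strand, guide, toy_score) for every NGG-adjacent protospacer."""
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for m in pattern.finditer(seq):          # lookahead keeps overlapping hits
            guide = m.group(1)
            gc = gc_fraction(guide)
            # toy heuristic: prefer moderate GC content, penalise poly-T stretches
            score = 1.0 - abs(gc - 0.5) - (0.3 if "TTTT" in guide else 0.0)
            pos = m.start() if strand == "+" else len(genome) - m.start() - guide_len
            yield pos, strand, guide, round(score, 3)

if __name__ == "__main__":
    demo = "ATGCGTACCGTTGGAGCTTACGATCGATCGGTACGTTAGGCCATGCAAGGTTTTACGGAGG"
    for hit in find_guides(demo):
        print(hit)
```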


Via BigField GEG Tech
BigField GEG Tech's curator insight, March 8, 2022 6:06 AM

CROPSR is the first open source software tool for genome-wide design and evaluation of guide RNA (gRNA) sequences for CRISPR experiments, created by scientists at CABBI, a Department of Energy-funded Bioenergy Research Center (BRC). The genome-wide approach significantly shortens the time needed to design a CRISPR experiment, reducing the challenge of working with crops and speeding up the design, evaluation and validation of gRNA sequences, according to the study published in BMC Bioinformatics. To better meet the needs of crop geneticists, the team built software that lifts restrictions imposed by other packages on the design and evaluation of gRNA sequences, the guides used to locate targeted genetic material. Team members also developed a new machine learning model that would not avoid guides for repetitive genomic regions often found in plants, a problem with existing tools. The CROPSR scoring model provided much more accurate predictions, even in non-crop genomes. In the future, he hopes researchers will record their failures as well as their successes to help generate the data needed to train a non-specific model.   

Scooped by Dr. Stefan Gruenwald

Meraculous: Full Genome Alignment With Supercomputers in Mere Minutes

A team of scientists from Berkeley Lab, JGI and UC Berkeley simplified and sped up genome assembly, reducing a months-long process to mere minutes. This was primarily achieved by “parallelizing” the code to harness the processing power of supercomputers, such as NERSC’s Edison system.

 

Genomes are like the biological owner’s manual for all living things. Cells read DNA instantaneously, getting instructions necessary for an organism to grow, function and reproduce. But for humans, deciphering this “book of life” is significantly more difficult.

 

Nowadays, researchers typically rely on next-generation sequencers to translate the unique sequences of DNA bases (there are only four) into letters: A, G, C and T. While DNA strands can be billions of bases long, these machines produce very short reads, about 50 to 300 characters at a time. To extract meaning from these letters, scientists need to reconstruct portions of the genome—a process akin to rebuilding the sentences and paragraphs of a book from snippets of text.

But this process can quickly become complicated and time-consuming, especially because some genomes are enormous. For example, while the human genome contains about 3 billion bases, the wheat genome contains nearly 17 billion bases and the pine genome contains about 23 billion bases. Sometimes the sequencers will also introduce errors into the dataset, which need to be filtered out. And most of the time, the genomes need to be assembled de novo, or from scratch. Think of it like putting together a ten billion-piece jigsaw puzzle without a complete picture to reference.
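
To make the jigsaw analogy concrete, here is a toy, error-free sketch of the k-mer idea that modern short-read assemblers build on: reads are chopped into short words, the words are linked by their overlaps, and contiguous sequence is walked back out of the graph. The genome, simulated reads and parameters below are invented for illustration; a real assembler such as Meraculous must also handle sequencing errors, repeats and enormous data volumes.

```python
# Toy de novo assembly sketch (not Meraculous): error-free reads from a short
# "genome" are decomposed into k-mers, linked by their overlaps, and walked back
# into a contig. Real assemblers add error filtering, repeat handling and scale.
from collections import defaultdict

def kmers(read, k):
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            graph[km[:-1]].add(km[1:])
    return graph

def extend_contig(graph, start):
    """Greedily walk unambiguous edges to rebuild a contig."""
    contig, node, seen = start, start, {start}
    while len(graph[node]) == 1:
        nxt = next(iter(graph[node]))
        if nxt in seen:            # stop if the walk loops back on itself
            break
        seen.add(nxt)
        contig += nxt[-1]
        node = nxt
    return contig

if __name__ == "__main__":
    genome = "ATGGCGTGCAATGGCTTACGGATCCGTTAAGC"
    reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 4)]  # simulated short reads
    graph = build_graph(reads, k=8)
    contig = extend_contig(graph, start=reads[0][:7])
    print("reassembled:", contig)
    print("original:   ", genome)
```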

 

By applying some novel algorithms, computational techniques and the innovative programming language Unified Parallel C (UPC) to the cutting-edge de novo genome assembly tool Meraculous, a team of scientists from the Lawrence Berkeley National Laboratory (Berkeley Lab)’s Computational Research Division (CRD), Joint Genome Institute (JGI) and UC Berkeley simplified and sped up genome assembly, reducing a months-long process to mere minutes. This was primarily achieved by “parallelizing” the code to harness the processing power of supercomputers, such as the National Energy Research Scientific Computing Center’s (NERSC’s) Edison system. Put simply, parallelizing code means taking tasks that were once executed one by one and modifying or rewriting the code so they run on the many nodes (processor clusters) of a supercomputer all at once.
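
As a rough, hypothetical illustration of what parallelizing the k-mer work means, the sketch below routes each k-mer to an "owner" bucket by hashing, so every worker counts only its own disjoint share and the partial results merge trivially. Meraculous's actual implementation uses a UPC distributed hash table spanning thousands of supercomputer cores; Python's multiprocessing here only stands in for that idea.

```python
# Hypothetical sketch of the partitioning idea behind parallel k-mer counting:
# each k-mer is routed to an "owner" bucket by hashing, so per-bucket counting
# runs on separate workers with no shared state and the results merge trivially.
# (Meraculous itself uses a UPC distributed hash table across supercomputer nodes.)
import hashlib
from collections import Counter
from multiprocessing import Pool

K = 8
N_WORKERS = 4

def owner(kmer: str, n_buckets: int = N_WORKERS) -> int:
    """Deterministically map a k-mer to the worker that owns it."""
    return int(hashlib.md5(kmer.encode()).hexdigest(), 16) % n_buckets

def count_bucket(kmer_list):
    """Work done independently by one worker: count its own k-mers."""
    return Counter(kmer_list)

if __name__ == "__main__":
    reads = ["ATGGCGTGCAATGG", "CGTGCAATGGCTTA", "AATGGCTTACGGAT"]

    # route every k-mer to its owner (in UPC this is a one-sided remote update)
    buckets = [[] for _ in range(N_WORKERS)]
    for read in reads:
        for i in range(len(read) - K + 1):
            kmer = read[i:i + K]
            buckets[owner(kmer)].append(kmer)

    with Pool(N_WORKERS) as pool:
        partial_counts = pool.map(count_bucket, buckets)

    # the buckets are disjoint, so merging is a simple union of the counters
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    print(total.most_common(5))
```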

 

“Using the parallelized version of Meraculous, we can now assemble the entire human genome in about eight minutes using 15,360 computer processor cores. With this tool, we estimate that the output from the world’s biomedical sequencing capacity could be assembled using just a portion of NERSC’s Edison supercomputer,” says Evangelos Georganas, a UC Berkeley graduate student who led the effort to parallelize Meraculous. He is also the lead author of a paper published and presented at the SC Conference in November 2014.  

 

“This work has dramatically improved the speed of genome assembly,” says Leonid Oliker, a computer scientist in CRD. “The new parallel algorithms enable assembly calculations to be performed rapidly, with near linear scaling over thousands of cores. Now genomics researchers can assemble large genomes like wheat and pine in minutes instead of months using several hundred nodes on NERSC’s Edison.”

Scooped by Dr. Stefan Gruenwald

EMBL: The genome in the cloud


Since the publication of the draft human genome in 2001, technological advances have made sequencing genomes much easier, quicker and cheaper, fueling an explosion in sequencing projects. Today, genomics is well into the era of ‘big data’, with genomics datasets often containing hundreds of terabytes (10^14 bytes) of information.


The rise of big genomic data offers many scientific opportunities, but also creates new problems, as Jan Korbel, Group Leader in the Genome Biology Unit at EMBL Heidelberg, describes in a new commentary paper authored with an international team of scientists and published today in Nature.


Korbel’s research focuses on genetic variation, especially genetic changes leading to cancer, and relies on computational and experimental techniques. While the majority of current cancer genetic studies assess the 1% of the genome comprising genes, a main research interest of the Korbel group is studying genetic alterations within ‘intergenic’ regions that drive cancer. As this approach looks at much more of the genome than gene-focused studies, it requires analysis of larger amounts of data. The challenge is exemplified by the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, co-led by Korbel, which brings together nearly 1 petabyte (10^15 bytes) of genome sequencing data from more than 2000 cancer patients.


The problem is not a shortage of data but accessing and analysing it. Genome datasets from cancer patients are typically stored in so-called ‘controlled access’ data archives, such as the European Genome-phenome Archive (EGA). These repositories, however, are ‘static’, says Korbel, meaning that the datasets need to be downloaded to a researcher’s institution before they can be further analysed or integrated with other types of data to address biomedically relevant research questions. “With massive datasets, this can take many months and may be unfeasible altogether depending on the institution’s network bandwidth and computational processing capacities,” says Korbel. “It’s a severe limitation for cancer research, blocking scientists from replicating and building on prior work.”


With data stored in one of the various commercial cloud services on offer from companies such as Amazon Web Services, or on academic community clouds, researchers can analyse vast datasets without first downloading them to their institutions, saving time and money that would otherwise need to be spent on maintaining them locally. Cloud computing also allows researchers to draw on the processing power of distributed computers to significantly speed up analysis without purchasing new equipment for computationally laborious tasks. A large portion of the data from PCAWG, for example, will be analysed through cloud computing using both academic community and commercial cloud providers, thanks to new computational frameworks currently being built.


One concern about using cloud computing revolves around the privacy of people who have supplied genetic samples for studies. However, cloud services are now typically as secure as regular institutional data centres, which has diminished this worry: earlier this year, the US National Institutes of Health lifted a 2007 ban on uploading their genomic data into cloud storage. Korbel predicts that the coming months and years will see a big upswing in the use of cloud computing for genomics research, with academic cloud services, such as the EMBL-EBI Embassy Cloud, and commercial cloud providers including Amazon becoming a crucial component of the infrastructure for pursuing research in human genetics.


Yet there remain issues to resolve. One is who should pay for cloud services. Korbel and colleagues urge funding agencies to take on this responsibility given the central role cloud services are predicted to play in future research. Another issue relates to the differing privacy, ethical and normative policies and regulations in Europe, the US, and elsewhere. Some European countries may prefer that patient data remain within their jurisdiction so that they fall under European privacy laws, and not US laws, which apply once a US-based cloud provider is used. Normative and bioethical aspects of patient genome analysis, including in the context of cloud computing, are another specific focus of Korbel’s research, which is being pursued via an inter-disciplinary collaboration with Fruzsina Molnár-Gábor from Heidelberg University faculty of law in a project funded by the Heidelberg Academy of Sciences and Humanities.


Scooped by Dr. Stefan Gruenwald

For $25 a year, Google will keep a copy of any genome in the cloud


Google is approaching hospitals and universities with a new pitch. Have genomes? Store them with us. The search giant’s first product for the DNA age is Google Genomics, a cloud computing service it launched last March that went mostly unnoticed amid a barrage of high-profile R&D announcements from Google, like one late last month about a far-fetched plan to battle cancer with nanoparticles (see “Can Google Use Nanoparticles to Search for Cancer?”).


Google Genomics could prove more significant than any of these moonshots. Connecting and comparing genomes by the thousands, and soon by the millions, is what’s going to propel medical discoveries for the next decade. The question of who will store the data is already a point of growing competition between Amazon, Google, IBM, and Microsoft.


Google began work on Google Genomics 18 months ago, meeting with scientists and building an interface, or API, that lets them move DNA data into its server farms and do experiments there using the same database technology that indexes the Web and tracks billions of Internet users.


“We saw biologists moving from studying one genome at a time to studying millions,” says David Glazer, the software engineer who led the effort and was previously head of platform engineering for Google+, the social network. “The opportunity is how to apply breakthroughs in data technology to help with this transition.”


Some scientists scoff that genome data remains too complex for Google to help with. But others see a big shift coming. When Atul Butte, a bioinformatics expert at Stanford, heard Google present its plans this year, he remarked that he now understood “how travel agents felt when they saw Expedia.”


The explosion of data is happening as labs adopt new, even faster equipment for decoding DNA. For instance, the Broad Institute in Cambridge, Massachusetts, said that during the month of October it decoded the equivalent of one human genome every 32 minutes. That translated to about 200 terabytes of raw data.


This flow of data is smaller than what is routinely handled by large Internet companies (over two months, Broad will produce the equivalent of what gets uploaded to YouTube in one day) but it exceeds anything biologists have dealt with. That’s now prompting a wide effort to store and access data at central locations, often commercial ones. The National Cancer Institute said last month that it would pay $19 million to move copies of the 2.6 petabyte Cancer Genome Atlas into the cloud. Copies of the data, from several thousand cancer patients, will reside both at Google Genomics and in Amazon’s data centers.


The idea is to create “cancer genome clouds” where scientists can share information and quickly run virtual experiments as easily as a Web search, says Sheila Reynolds, a research scientist at the Institute for Systems Biology in Seattle. “Not everyone has the ability to download a petabyte of data, or has the computing power to work on it,” she says.

corneja's curator insight, November 27, 2014 7:20 PM

"Our genome in the cloud"... it sounds like the title of a song. Google is offering to keep genome data in the cloud.

Scooped by Dr. Stefan Gruenwald

American Internet Services (AIS) Unveils BusinessCloud1 - Genome Cloud Collaboration with Diagnomics, Inc.


American Internet Services (AIS - http://tinyurl.com/d6nu895), a leading provider of enterprise-class data center, cloud and connectivity services, today announced AIS BusinessCloud1 (BC1) Infrastructure as a Service (IaaS) based on the widely used VMware software suite with state-of-the-art compute and storage technology featuring Cisco, Dell, Arista and NetApp. AIS removes cost barriers so companies can quickly and efficiently migrate to the cloud by gaining access to compute and storage on demand. AIS BusinessCloud1 is San Diego's first full-featured VMware-based cloud service. AIS selected VMware for a number of reasons, among them are its widespread adoption, inherent functionality, and its robust ecosystem of support and product development.

 

Diagnomics (http://www.diagnomics.com), a provider of complete personal genome sequencing and bioinformatics solutions to biomedical researchers, has been working closely with AIS to develop a cloud-based solution for large scale, data intensive genome annotation and storage to help resolve a major bottleneck in life science research.

 

"The cloud I/O of AIS BusinessCloud1has significantly exceeded our expectations compared to a standard server cluster that is both more expensive to deploy and maintain," said Min Lee, chief executive officer at Diagnomics. "By partnering with AIS, Diagnomics has achieved flexibility that is required so that genome annotation can take the next step towards delivering on the vision of personalized medicine."

 

AIS BC1 was designed with security, performance, and redundancy in mind:

 

• No single points of failure (SPoF).

• Integrated VMware network and storage Quality of Service (QoS) feature set to production service standards.

• Network storage is redundant, diversified, and optimized for performance.

• Self-healing systems architecture with automated failover.

• High-speed network access provided via the AIS regional optical and transit network.

 

Some of the ways AIS BusinessCloud1 can be used include:

• As an IT Extension where a direct connection can be added to the company's current VMware infrastructure and then be managed using vCenter.


• As a Hybrid Cloud Solution providing instant IT scalability with built-in firewalling, a single management console, and complete public / private network isolation.

• As a Zero CAPEX disaster recovery site that can mirror data between sites with full infrastructure duplication. Companies can leverage the AIS 10GigE transport to the AIS Van Buren Data Center (VBDC) in Phoenix, considered to be one of the safest locations in the nation in terms of protection from natural disasters.

Scooped by Dr. Stefan Gruenwald

RAMBO speeds searches on huge DNA databases


Rice University computer scientists are sending RAMBO to rescue genomic researchers who sometimes wait days or weeks for search results from enormous DNA databases.

 

DNA sequencing has become so popular that genomic datasets are doubling in size every two years, and the tools to search the data haven't kept pace. Researchers who compare DNA across genomes or study the evolution of organisms like the virus that causes COVID-19 often wait weeks for software to index large, "metagenomic" databases, which get bigger every month and are now measured in petabytes.

 

RAMBO, which is short for "repeated and merged Bloom filter," is a new method that can cut indexing times for such databases from weeks to hours and search times from hours to seconds. Rice University computer scientists presented RAMBO last week at SIGMOD 2021, the Association for Computing Machinery's data management conference.

 

"Querying millions of DNA sequences against a large database with traditional approaches can take several hours on a large compute cluster and can take several weeks on a single server," said RAMBO co-creator Todd Treangen, a Rice computer scientist whose lab specializes in metagenomics. "Reducing database indexing times, in addition to query times, is crucially important as the size of genomic databases are continuing to grow at an incredible pace."

 

To solve the problem, Treangen teamed with Rice computer scientist Anshumali Shrivastava, who specializes in creating algorithms that make big data and machine learning faster and more scalable, and graduate students Gaurav Gupta and Minghao Yan, co-lead authors of the peer-reviewed conference paper on RAMBO.

 

RAMBO uses a data structure that has a significantly faster query time than state-of-the-art genome indexing methods as well as other advantages like ease of parallelization, a zero false-negative rate and a low false-positive rate.
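
The core building block is the Bloom filter, and RAMBO's trick is to group many datasets into a small number of merged filters (B buckets per grouping), repeated over R independent random groupings, so a query probes only B×R filters instead of one per dataset and recovers candidate datasets by intersection. The sketch below is a simplified toy version of that arrangement, with invented class names, tiny parameters and cryptographic hashes chosen for convenience; it is not the authors' implementation or its tuned guarantees.

```python
# Toy sketch of a RAMBO-style index (simplified; not the authors' implementation).
# Datasets are randomly grouped into B buckets, repeated R times; each bucket keeps
# one Bloom filter holding the union of its members' k-mers. A query probes B*R
# filters instead of one per dataset, and candidate datasets are recovered by
# intersecting the matching buckets across repetitions (hence zero false negatives).
import hashlib
import random

class BloomFilter:
    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))

class RamboLikeIndex:
    def __init__(self, dataset_names, n_buckets=3, n_repetitions=2, seed=0):
        rng = random.Random(seed)
        self.assignments = []   # one {dataset: bucket} map per repetition
        self.filters = []       # one list of Bloom filters per repetition
        for _ in range(n_repetitions):
            self.assignments.append({d: rng.randrange(n_buckets) for d in dataset_names})
            self.filters.append([BloomFilter() for _ in range(n_buckets)])

    def insert(self, dataset, kmer):
        for assign, filters in zip(self.assignments, self.filters):
            filters[assign[dataset]].add(kmer)

    def query(self, kmer):
        """Return the datasets that may contain the k-mer (no false negatives)."""
        candidates = None
        for assign, filters in zip(self.assignments, self.filters):
            hits = {d for d, bucket in assign.items() if kmer in filters[bucket]}
            candidates = hits if candidates is None else candidates & hits
        return candidates

if __name__ == "__main__":
    index = RamboLikeIndex(["sampleA", "sampleB", "sampleC", "sampleD"])
    index.insert("sampleB", "ACGTACGT")
    index.insert("sampleD", "TTGGCCAA")
    print(index.query("ACGTACGT"))   # contains sampleB (plus any bucket-mates)
    print(index.query("TTGGCCAA"))   # contains sampleD
```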

 

"The search time of RAMBO is up to 35 times faster than existing methods," said Gupta, a doctoral student in electrical and computer engineering. In experiments using a 170-terabyte dataset of microbial genomes, Gupta said RAMBO reduced indexing times from "six weeks on a sophisticated, dedicated cluster to nine hours on a shared commodity cluster."

Rescooped by Dr. Stefan Gruenwald from Plant Pathogenomics

Web resource: 1000 Fungal Genomes Project (2016)


Sequencing unsampled fungal diversity: an effort to sequence 1000+ fungal genomes. Also see the project's Google+ site for more discussion opportunities.

 

The project is run in collaboration with the JGI, and the site links to the nomination page for submitting candidate species to the project.


Via Kamoun Lab @ TSL
Scooped by Dr. Stefan Gruenwald

Broad Institute, Google Genomics combine bioinformatics and computing expertise


Broad Institute of MIT and Harvard is teaming up with Google Genomics to explore how to break down major technical barriers that increasingly hinder biomedical research by addressing the need for computing infrastructure to store and process enormous datasets, and by creating tools to analyze such data and unravel long-standing mysteries about human health.

As a first step, Broad Institute’s Genome Analysis Toolkit, or GATK, will be offered as a service on the Google Cloud Platform, as part of Google Genomics. The goal is to enable any genomic researcher to upload, store, and analyze data in a cloud-based environment that combines the Broad Institute’s best-in-class genomic analysis tools with the scale and computing power of Google.

GATK is a software package developed at the Broad Institute to analyze high-throughput genomic sequencing data. GATK offers a wide variety of analysis tools, with a primary focus on genetic variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine, and high-performance computing features make it capable of taking on projects of any size.

GATK is already available for download at no cost to academic and non-profit users. In addition, business users can license GATK from the Broad. To date, more than 20,000 users have processed genomic data using GATK.

The Google Genomics service will provide researchers with a powerful, additional way to use GATK. Researchers will be able to upload genetic data and run GATK-powered analyses on Google Cloud Platform, and may use GATK to analyze genetic data already available for research via Google Genomics. GATK as a service will make best-practice genomic analysis readily available to researchers who don’t have access to the dedicated compute infrastructure and engineering teams required for analyzing genomic data at scale. An initial alpha release of the GATK service will be made available to a limited set of users.
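
For readers unfamiliar with what a "GATK-powered analysis" involves, the sketch below shows one germline variant-calling step run locally by wrapping the command line from Python. The file paths are placeholders, and the syntax is the later GATK4-style invocation rather than the GATK3 releases contemporary with this announcement; the hosted Google Genomics service would wrap equivalent steps behind a cloud API.

```python
# Hypothetical sketch of a single "GATK-powered" step run locally: germline variant
# calling with HaplotypeCaller. File paths are placeholders; the syntax is the later
# GATK4-style command line (GATK3, current at the time of this announcement, used
# `java -jar GenomeAnalysisTK.jar -T HaplotypeCaller ...` instead). The hosted
# Google Genomics service would wrap equivalent steps behind a cloud API.
import subprocess

def call_variants(reference: str, bam: str, out_vcf: str) -> None:
    """Run HaplotypeCaller on one aligned, indexed BAM file."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,   # indexed reference FASTA
        "-I", bam,         # aligned, duplicate-marked reads
        "-O", out_vcf,     # output VCF of called variants
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    call_variants("reference.fasta", "sample.bam", "sample.variants.vcf.gz")
```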

“Large-scale genomic information is accelerating scientific progress in cancer, diabetes, psychiatric disorders, and many other diseases,” said Eric Lander, President and Director of Broad Institute. “Storing, analyzing, and managing these data is becoming a critical challenge for biomedical researchers. We are excited to work with Google’s talented and experienced engineers to develop ways to empower researchers around the world by making it easier to access and use genomic information.”

Weronika's curator insight, April 3, 2022 3:25 AM
“Large-scale genomic information is accelerating scientific progress in cancer, diabetes, psychiatric disorders, and many other diseases,”
Scooped by Dr. Stefan Gruenwald

Face-To-Face: Crude Mugshots built from DNA data alone

Computer program crudely predicts a facial structure from genetic variations.


Researchers have now shown how 24 gene variants can be used to construct crude models of facial structure. Thus, leaving a hair at a crime scene could one day be as damning as leaving a photograph of your face. Researchers have developed a computer program that can create a crude three-dimensional (3D) model of a face from a DNA sample.


Using genes to predict eye and hair color is relatively easy. But the complex structure of the face makes it more valuable as a forensic tool — and more difficult to connect to genetic variation, says anthropologist Mark Shriver of Pennsylvania State University in University Park, who led the work, published today in PLOS Genetics.


Shriver and his colleagues took high-resolution images of the faces of 592 people of mixed European and West African ancestry living in the United States, Brazil and Cape Verde. They used these images to create 3D models, laying a grid of more than 7,000 data points on the surface of the digital face and determining by how much particular points on a given face varied from the average: whether the nose was flatter, for instance, or the cheekbones wider. They had volunteers rate the faces on a scale of masculinity and femininity, as well as on perceived ethnicity.


Next, the authors compared the volunteers’ genomes to identify points at which the DNA differed by a single base, called a single nucleotide polymorphism (SNP). To narrow down the search, they focused on genes thought to be involved in facial development, such as those that shape the head in early embryonic development, and those that are mutated in disorders associated with features such as cleft palate. Then, taking into account the person’s sex and ancestry, they calculated the statistical likelihood that a given SNP was involved in determining a particular facial feature.
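
The association step described here, testing whether each SNP explains variation in a facial measurement after accounting for sex and ancestry, can be illustrated with a simple ordinary least-squares model. The data in the sketch below are randomly simulated and every name and parameter is hypothetical; the actual study used 3D landmark grids from 592 subjects and a more elaborate modelling approach.

```python
# Purely illustrative: test each SNP for association with one facial-landmark
# coordinate while adjusting for sex and ancestry, using ordinary least squares.
# All data below are randomly simulated, not the study's measurements.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 200, 50

genotypes = rng.integers(0, 3, size=(n_people, n_snps))   # 0/1/2 minor-allele counts
sex = rng.integers(0, 2, size=n_people)                   # covariate
ancestry = rng.random(n_people)                           # e.g. admixture fraction
# simulate a landmark coordinate driven by SNP 7 plus covariates and noise
landmark = 0.8 * genotypes[:, 7] + 0.5 * sex + 1.2 * ancestry + rng.normal(0, 1, n_people)

def snp_effect(snp_column):
    """OLS effect of one SNP on the landmark, adjusting for sex and ancestry."""
    X = np.column_stack([np.ones(n_people), snp_column, sex, ancestry])
    beta, *_ = np.linalg.lstsq(X, landmark, rcond=None)
    return beta[1]                                         # coefficient for the SNP

effects = np.array([snp_effect(genotypes[:, j]) for j in range(n_snps)])
top = np.argsort(-np.abs(effects))[:3]
print("strongest associations (SNP index, effect size):",
      [(int(j), round(float(effects[j]), 2)) for j in top])
```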


This pinpointed 24 SNPs across 20 genes that were significantly associated with facial shape. A computer program the team developed using the data can turn a DNA sequence from an unknown individual into a predictive 3D facial model (see 'Face to face'). Shriver says that the group is now trying to integrate more people and genes, and look at additional traits, such as hair texture and sex-specific differences.

Rescooped by Dr. Stefan Gruenwald from Singularity Scoops

Create Stunning Circular Infographics: Circos


Circos is a software package conceived and created by Martin Krzywinski to visualize large amounts of data in a circular layout.  

 

Circos is ideal for creating publication-quality infographics and illustrations with a high data-to-ink ratio, richly layered data and pleasant symmetries. 

 

See some examples of Circos-generated infographics: http://circos.ca/images/ 

http://circos.ca/images/published/ 

 

See a tour of Circos features: http://circos.ca/guide/tables/ 

 

Download the software: http://circos.ca/software/ 

(You will need Perl to run Circos. Perl is an interpreted language, like Python or Ruby. It is available for nearly every operating system and if you're on UNIX or Mac OS X, you very likely already have it installed. Perl 5.8.x, or newer, is recommended.)

 

Check out the tutorials: http://circos.ca/tutorials/ 


Via Robin Good, Frederic Emam-Zade Gerardino