Amazing Science
Amazing science facts - 3D_printing • aging • AI • anthropology • art • astronomy • bigdata • bioinformatics • biology • biotech • chemistry • computers • cosmology • education • environment • evolution • future • genetics • genomics • geosciences • green_energy • history • language • map • material_science • math • med • medicine • microscopy • nanotech • neuroscience • paleontology • photography • photonics • physics • postings • robotics • science • technology • video
Rescooped by Dr. Stefan Gruenwald from DNA and RNA research!

Can data storage in DNA solve our massive data storage problem in the future?


The latest in high-density ultra-durable data storage has been perfected over billions of years by nature itself.


Now ‘Smoke on the Water’ is making history again. This September, it was one of the first items from the Memory of the World archive to be stored in the form of DNA and then played back with 100% accuracy. The project was a joint effort between the University of Washington, Microsoft and Twist Bioscience, a San Francisco-based DNA manufacturing company.


The demonstration was billed as a ‘proof of principle’ – which is shorthand for successful but too expensive to be practical. At least for now. Many pundits predict it’s just a matter of time till DNA pips magnetic tape as the ultimate way to store data. It’s compact, efficient and resilient. After all, it has been tweaked over billions of years into the perfect repository for genetic information. It will never become obsolete, because as long as there is life on Earth, we will be interested in decoding DNA. “Nature has optimised the format,” says Twist Bioscience’s chief technology officer Bill Peck.


Players like Microsoft, IBM and Intel are showing signs of interest. In April, they joined other industry, academic and government experts at an invitation-only workshop (cosponsored by the U.S. Intelligence Advanced Research Projects Activity (IARPA)) to discuss the practical potential for DNA to solve humanity’s looming data storage crisis.


It’s a big problem that’s getting bigger by the minute. According to a 2016 IBM Marketing Cloud report, 90% of the data that exists today was created in just the past two years. Every day, we generate another 2.5 quintillion (2.5 × 10^18) bytes of information. It pours in from high definition video and photos, Big Data from particle physics, genomic sequencing, space probes, satellites, and remote sensing; from think tanks, covert surveillance operations, and Internet tracking algorithms.


Right now all those bits and bytes flow into gigantic server farms, onto spinning hard drives or reels of state-of-the-art magnetic tape. These physical substrates occupy a lot of space. Compare this to DNA. The entire human genome, a code of three billion DNA base pairs, or in data speak, 3,000 megabytes, fits into a package that is invisible to the naked eye – the cell’s nucleus. A gram of DNA, the size of a drop of water on your fingertip, can store at least the equivalent of 233 computer hard drives weighing more than 150 kilograms. To store all the genetic information in a human body (150 zettabytes) on tape or hard drives, you’d need a facility covering thousands, if not millions, of square feet.
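
To make the density figures concrete, here is a minimal sketch of how binary data can be mapped onto DNA bases. It assumes the simplest possible scheme, two bits per base (A=00, C=01, G=10, T=11). This is an illustration only, not the encoding used in the University of Washington, Microsoft and Twist Bioscience project, and the function names are hypothetical.

```python
# Illustrative mapping only: 2 bits per base. Practical DNA storage schemes
# add addressing, redundancy and error correction on top of this.
BASES = "ACGT"  # A=00, C=01, G=10, T=11 (an assumed convention)


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Reverse of encode: every four bases become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = b"Smoke on the Water"
strand = encode(message)
assert decode(strand) == message
print(len(message), "bytes ->", len(strand), "bases")
```

Under this two-bit assumption, three billion bases would hold roughly 750 megabytes; the 3,000-megabyte figure quoted above corresponds to counting one byte per base.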


And then there’s durability. Of the current storage contenders, magnetic tape has the best lifespan, at about 10-20 years. Hard drives, CDs, DVDs and flash drives are less reliable, often failing within five to ten years. DNA has proven that it can survive thousands of years unscathed. In 2013, for example, the genome of an early horse relative was reconstructed from DNA from a 700,000-year-old bone fragment found in the permafrost of Canada’s Yukon Territory.

Via Integrated DNA Technologies
Scooped by Dr. Stefan Gruenwald!

Expect the unexpected from the big-data boom in radio astronomy


Radio astronomy is undergoing a major boost, with new technology gathering data on objects in our universe faster than astronomers can analyze.


A good review of the state of radio astronomy is published in Nature Astronomy. Over the next few years, we will see the universe in a very different light, and we are likely to make completely unexpected discoveries. Radio telescopes view the sky using radio waves and mainly see jets of electrons traveling at close to the speed of light, propelled by super-massive black holes. That gives a very different view from the one we see when observing a clear night sky in visible light, which mainly shows light from stars.


Black holes were only found in science fiction before radio astronomers discovered them in quasars. It now seems that most galaxies, including our own Milky Way, have a super-massive black hole at their center.


Rescooped by Dr. Stefan Gruenwald from Nostri Orbis!

Big Data Technology – What’s Next?


The world is expanding, and so is the data in it. The concept of big data has never looked more fascinating than now. Businesses are looking for patterns that let them build big data technology directly into their business applications and software. The term has moved from being just a buzzword to one of the most essential components of a company’s IT infrastructure.


Organizations are taking the next step to identify current as well as future developments in big data deployments. In the simplest terms, big data means massive data sets from an ever-expanding list of sources. Organizations are trying to create a culture where they can embed the technology in applications so that it can truly empower their business. Big data has taken the business world by storm, but what next? How big is it going to get? Will businesses using data see productivity benefits? Are there any security concerns? Questions abound, and so do their answers. However, telling apart what will last from what will pass will save you time and, most importantly, a wrong investment.


Big data solutions have long been built as massive data sets around centralized data lakes. The reasons were quite simple: data was less likely to be duplicated and management was easier. In 2016, though, organizations are thinking of moving to distributed big data processing, not to mention managing multiple data center locations and multiple devices. Further, the continued growth of the Internet of Things is increasingly going to affect the deployment of distributed data processing frameworks.


The ever-expanding list of sources continues to generate larger and larger volumes of data. There is a lot of data, and there is going to be more. Data-driven companies like Google are stressing data analysis and how it must be grounded in sound values and practices.


Terabytes, petabytes, or exabytes - big data means huge amounts of data being transferred between applications. The constant back and forth of information is creating huge security concerns. Intrusions are detected on a daily basis and, worse, organizations are treating security as a second or third priority. Hackers can breach any database whose security system is easily defeated.


Organizations with weak protection will end up as victims of the thousands of hackers out there. Get a top-notch anti-malware program that safeguards your perimeter so that it cannot be breached easily.


Experts also debate the quality of data. According to them, big is not strictly necessary, and businesses do not use even a fraction of the data they have access to. The idea is to move from big data to fast, actionable data that answers questions and can be put to effective use.


Big data is going to get bigger no matter what. Don’t get left behind; adopt it as soon as possible. For effective data management and protection from intrusion, make sure you are securing your enterprise with top-performance anti-malware solutions. As big data expands, so do the scale, speed, security and integration requirements of your organization.

Via Fernando Gil
Scooped by Dr. Stefan Gruenwald!

Birdsnap: Identifying a bird from a picture with artificial intelligence


Birdsnap is a free electronic field guide covering 500 of the most common North American bird species, available as a web site or an iPhone app. Researchers from Columbia University and the University of Maryland developed Birdsnap using computer vision and machine learning to explore new ways of identifying bird species. Birdsnap automatically discovers visually similar species and makes visual suggestions for how they can be distinguished. In addition, Birdsnap uses visual recognition technology to allow users who upload bird images to search for visually similar species. Birdsnap estimates the likelihood of seeing each species at any location and time of year based on sightings records, and uses this likelihood both to produce a custom guide to local birds for each user and to improve the accuracy of visual recognition.

The genesis of Birdsnap (and its predecessors Leafsnap and Dogsnap) was the realization that many techniques used for face recognition developed by Peter Belhumeur (Columbia University) and David Jacobs (University of Maryland) could also be applied to automatic species identification. State-of-the-art face recognition algorithms rely on methods that find correspondences between comparable parts of different faces, so that, for example, a nose is compared to a nose, and an eye to an eye. In the same way, Birdsnap detects the parts of a bird, so that it can examine the visual similarity of comparable parts of the bird. Our first electronic field guide Leafsnap, produced in collaboration with the Smithsonian Institution, was launched in May 2011.


This free iPhone app uses visual recognition software to help identify tree species from photographs of their leaves. Leafsnap currently includes the trees of the northeastern US and will soon grow to include the trees of the United Kingdom. Leafsnap has been downloaded by over a million users, and discussed extensively in the press (see, for more information). In 2012, we launched Dogsnap, an iPhone app that allows you to use visual recognition to help determine dog breeds. Dogsnap contains images and textual descriptions of over 150 breeds of dogs recognized by the American Kennel Club.

For their inspiration and advice on bird identification, we thank the UCSD Computer Vision group, especially Serge Belongie, Catherine Wah, and Grant Van Horn; the Caltech Computational Vision group, especially Pietro Perona, Peter Welinder, and Steve Branson; the alumni of these groups Ryan Farrell (now at BYU), Florian Schroff (at Google), and Takeshi Mita (at Toshiba); and the Visipedia effort.

Rescooped by Dr. Stefan Gruenwald from Popular Science!

Watch how the measles outbreak spreads when kids get vaccinated – and when they don't


If you take 10 communities and run a simulation, it’s easy to see why we need as many members of the ‘herd’ as possible to get vaccines – before it’s too late.


Measles is back in the US – and spreading. More than 100 cases across 14 states and Washington DC have been confirmed by US health officials since an outbreak began at Disneyland last December. With a majority of those infections in unvaccinated people, widespread blame – from Washington to the rest of the world – has fallen on parents who chose not to vaccinate their children.


Part of the problem, according to Dr Elizabeth Edwards, professor of pediatrics and director of the Vanderbilt Vaccine Research Program, is just that: vaccination is understood by many as an individual choice, when science makes clear that the choice – to vaccinate or not to vaccinate – can affect an entire community.


“When you immunize your child, you’re not only immunizing your child. That child’s immunization is contributing to the control of the disease in the population,” Edwards explained. That sheltering effect is called herd immunity: a population that is highly immunized makes for a virus that can’t spread easily, providing protection to the community – or the herd – as a whole.

Despite the high overall measles vaccination rate in the US, vaccine skeptics – and their unimmunized kids – often congregate in like-minded communities, creating pockets of under-immunization.
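
The herd immunity effect described above can be put into numbers with the standard textbook relationship between a disease's basic reproduction number R0 and the immune fraction needed to stop sustained spread. The sketch below is not the simulation referred to in the article; the R0 value for measles is an assumed figure, and immunized people are treated as fully immune.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune so that each
    case infects fewer than one other person on average: 1 - 1/R0."""
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


def effective_r(r0: float, immune_fraction: float) -> float:
    """Average number of new infections per case when part of the
    population is already immune."""
    return r0 * (1.0 - immune_fraction)


R0_MEASLES = 15.0  # assumed value; estimates commonly range from 12 to 18

print(f"threshold: {herd_immunity_threshold(R0_MEASLES):.1%}")
for coverage in (0.95, 0.80):
    r = effective_r(R0_MEASLES, coverage)
    trend = "dies out" if r < 1 else "spreads"
    print(f"immune fraction {coverage:.0%}: R_eff = {r:.2f} -> outbreak {trend}")
```

With these assumptions, a community at 95% immunity sits above the threshold, while a pocket at 80% does not, which is why local clusters of unvaccinated children matter even when the national rate is high.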


California, where the bulk of current measles cases can still be found, is a prime example. It's one of 20 states that allow parents to skip vaccination based on their personal, philosophical beliefs – even though legislators introduced a bill that would ban such an opt-out provision.

Via Neelima Sinha
Boris Limpopo's curator insight, February 17, 2017 9:55 AM
Get vaccinated and have others vaccinated
Andreas Maniatis's curator insight, February 18, 2017 5:50 AM
Watch how the measles outbreak spreads when kids get vaccinated – and when they don't
Scooped by Dr. Stefan Gruenwald!

With new algorithms, data scientists could accomplish in days what once took months


Last year, MIT researchers presented a system that automated a crucial step in big-data analysis: the selection of a "feature set," or aspects of the data that are useful for making predictions. The researchers entered the system in several data science contests, where it outperformed most of the human competitors and took only hours instead of months to perform its analyses.


Now, in a pair of papers at the IEEE International Conference on Data Science and Advanced Analytics, the team described an approach to automating most of the rest of the process of big-data analysis—the preparation of the data for analysis and even the specification of problems that the analysis might be able to solve.


The researchers believe that, again, their systems could perform in days tasks that used to take data scientists months.

"The goal of all this is to present the interesting stuff to the data scientists so that they can more quickly address all these new data sets that are coming in," says Max Kanter MEng '15, who is first author on last year's paper and one of this year's papers.


"Data scientists want to know, 'Why don't you show me the top 10 things that I can do the best, and then I'll dig down into those?' So, these methods are shrinking the time between getting a data set and actually producing value out of it."


Both papers focus on time-varying data, which reflects observations made over time, and they assume that the goal of analysis is to produce a probabilistic model that will predict future events on the basis of current observations.


The first paper describes a general framework for analyzing time-varying data. It splits the analytic process into three stages: labeling the data, or categorizing salient data points so they can be fed to a machine-learning system; segmenting the data, or determining which time sequences of data points are relevant to which problems; and "featurizing" the data, the step performed by the system the researchers presented last year.
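
The three stages can be sketched on a toy time series. This is only an illustration of the label, segment and featurize idea as described above, not the MIT system; the data, thresholds and function names are all hypothetical.

```python
from statistics import mean

# Toy time-varying data: (day, reading) pairs for one hypothetical sensor.
readings = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 30.0), (5, 31.0), (6, 29.0)]


def label(data, threshold):
    """Labeling: mark each day on which the reading exceeds a threshold."""
    return {day: value > threshold for day, value in data}


def segment(data, cutoff_day, window):
    """Segmenting: keep only the `window` days strictly before the cutoff,
    so features never see the outcome they are meant to predict."""
    return [(d, v) for d, v in data if cutoff_day - window <= d < cutoff_day]


def featurize(segment_rows):
    """Featurizing: summarize a segment as numbers a model can consume."""
    values = [v for _, v in segment_rows]
    return {"mean": mean(values), "max": max(values), "last": values[-1]}


labels = label(readings, threshold=20.0)
history = segment(readings, cutoff_day=4, window=3)
features = featurize(history)
print(features, "-> label on day 4:", labels[4])
```

Each (features, label) pair built this way would become one training example for a predictive model.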


The second paper describes a new language for describing data-analysis problems and a set of algorithms that automatically recombine data in different ways, to determine what types of prediction problems the data might be useful for solving.


According to Kalyan Veeramachaneni, a principal research scientist at MIT's Laboratory for Information and Decision Systems and senior author on all three papers, the work grew out of his team's experience with real data-analysis problems brought to it by industry researchers.

Scooped by Dr. Stefan Gruenwald!

Museums of the Web: Relationships between a museum and its environment


In 2014 at the Museums and the Web conferences in Baltimore, Maryland, and Florence, Italy, a group of data scientists presented a report on how two thousand museums were relating on Twitter. After their presentation, they decided to continue their investigation in two directions. The first is a set of case studies. They wanted to analyze some museums' Twitter environments: how museums relate to their followers, how they form different communities defined by underlying relationships, how the network around a museum grows, how influencers can be detected, and so on. They also wanted to study how to use this information to design better social-media strategies and to assess the results of work and actions on Twitter more realistically.


The case studies cover London's Victoria & Albert, Turin's Palazzo Madama, and Barcelona's Center for Contemporary Culture, museums chosen for their different Twitter strategies, countries, languages, and follower bases, which range from 7,000 at the time of writing this abstract (Palazzo Madama) to 450,000 (V&A). The goal was to represent different situations and produce research useful to museums of different sizes. The second direction is how museum professionals now interact with each other on Twitter through hashtags such as #musetech to share information, questions, and experiences.

Scooped by Dr. Stefan Gruenwald!

Maps show genetic diversity in mammals, amphibians around the world


Maps have long been used to show the animal kingdom’s range, regional mix, populations at risk and more. Now a new set of maps reveals the global distribution of genetic diversity.


“Without genetic diversity, species can’t evolve into new species,” says Andreia Miraldo, a population geneticist at the Natural History Museum of Denmark in Copenhagen. “It also plays a fundamental role in allowing species populations to adapt to changes in their environment.”


Miraldo and her colleagues gathered geographical coordinates for more than 92,000 records of mitochondrial DNA from 4,675 species of land mammals and amphibians. The researchers compared changes in cytochrome b, a gene often used to measure genetic diversity within a species, and then mapped the average genetic diversity for all species within roughly 150,000 square-kilometer areas.
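
One common way to quantify genetic diversity within a species is the average proportion of differing sites across all pairs of aligned sequences. The text above does not state the exact metric the researchers used, so the sketch below is an assumption for illustration, with made-up sequences and a hypothetical function name.

```python
from itertools import combinations


def pairwise_diversity(sequences):
    """Average proportion of differing sites over all pairs of aligned,
    equal-length sequences (a simple nucleotide-diversity estimate)."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs
    )
    return total / len(pairs)


# Made-up 10-base fragments standing in for cytochrome b records
# from one species in one grid cell.
cell_records = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(round(pairwise_diversity(cell_records), 4))
```

Averaging such per-species values over every species recorded in a grid cell would give one number per cell, which is the kind of quantity a diversity map can display.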

Scooped by Dr. Stefan Gruenwald!

DNAdigest and Repositive: Connecting the World of Genomic Data


There is no unified place where genomics researchers can search through all available raw genomic data in a way similar to OMIM for genes or Uniprot for proteins. With the recent increase in the amount of genomic data that is being produced and the ever-growing promises of precision medicine, this is becoming more and more of a problem. DNAdigest is a charity working to promote efficient sharing of human genomic data to improve the outcome of genomic research and diagnostics for the benefit of patients. Repositive, a social enterprise spin-out of DNAdigest, is building an online platform that indexes genomic data stored in repositories and thus enables researchers to search for and access a range of human genomic data sources through a single, easy-to-use interface, free of charge.

Scooped by Dr. Stefan Gruenwald!

Storage technologies struggle to keep up with big data – is there a biological alternative?


If DNA archives become a plausible method of data storage, it will be thanks to rapid advances in genetic technologies. The sequencing machines that “read out” DNA code have already become exponentially faster and cheaper; the National Institutes of Health shows costs for sequencing a 3-billion-letter genome plummeting from US $100 million in 2001 to a mere $1,000 today. However, DNA synthesis technologies required to “write” the code are much newer and less mature. Synthetic-biology companies like San Francisco’s Twist Bioscience have begun manufacturing DNA to customers’ specifications only in the last few years, primarily serving biotechnology companies that are tweaking the genomes of microbes to trick them into making some desirable product. Manufacturing DNA for data storage could be a profitable new market, says Twist CEO Emily Leproust.


Twist sent a representative to the April meeting, and the company is also working with Microsoft on a separate experiment in DNA storage, in which it synthesized 10 million strands of DNA to encode Microsoft’s test file. Leproust says Microsoft and the other tech companies are currently trying to determine “what kind of R&D has to be done to make a viable commercial product.” To make a product that’s competitive with magnetic tape for long-term storage, Leproust estimates that the cost of DNA synthesis must fall to 1/10,000 of today’s price. “That is hard,” she says mildly. But, she adds, her industry can take inspiration from semiconductor manufacturing, where costs have dropped far more dramatically. And just last month, an influential group of geneticists proposed an international effort to reduce the cost of DNA synthesis, suggesting that $100 million could launch the project nicely.
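
A quick way to compare the two cost curves is to express each reduction as a number of halvings. The short calculation below uses only the figures quoted in this article (a 1/10,000 target for synthesis, and the fall in sequencing cost from $100 million to $1,000); the function name is hypothetical.

```python
import math


def halvings_needed(reduction_factor: float) -> float:
    """How many times a cost must halve to shrink by `reduction_factor`."""
    return math.log2(reduction_factor)


# Figures quoted in the article.
synthesis_target = 10_000              # synthesis must fall to 1/10,000
sequencing_drop = 100_000_000 / 1_000  # $100 million -> $1,000

print(f"synthesis: {halvings_needed(synthesis_target):.1f} halvings")
print(f"sequencing so far: {halvings_needed(sequencing_drop):.1f} halvings")
```

On these numbers, the drop that synthesis still needs is smaller, in halvings, than the one sequencing has already gone through.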

Scooped by Dr. Stefan Gruenwald!

Is Fog Computing The Next Big Thing In Internet of Things?


One of the reasons why IoT has gained momentum in the recent past is the rise of cloud services. Though the concept of M2M existed for over a decade, organizations never tapped into the rich insights derived from the datasets generated by sensors and devices. Existing infrastructure was just not ready to deal with the massive scale demanded by the connected devices architecture. That’s where cloud becomes an invaluable resource for enterprises.


With abundant storage and ample computing power, cloud became an affordable extension to the enterprise data center. The adoption of cloud resulted in increased usage of Big Data platforms and analytics. Organizations are channeling every bit of data generated from a variety of sources and devices to the cloud, where it is stored, processed, and analyzed for deriving valuable insights. The combination of cloud and Big Data is the key enabler of the Internet of Things. IoT is all set to become the killer use case for distributed computing and analytics.


Cloud service providers such as Amazon, Google, IBM, Microsoft, Salesforce, and Oracle are offering managed IoT platforms that deliver the entire IoT stack as a service. Customers can on-board devices, ingest data, define data processing pipelines that analyze streams in real-time, and derive insights from the sensor data. Cloud-based IoT platforms are examples of verticalized PaaS offerings, which are designed for a specific use case.


While cloud is a perfect match for the Internet of Things, not every IoT scenario can take advantage of it. Industrial IoT solutions demand low-latency ingestion and immediate processing of data. Organizations cannot afford the delay caused by the roundtrip between the devices layer and cloud-based IoT platforms. The solution demands instant processing of data streams with quick turnaround. For example, it may be too late before the IoT cloud shuts down an LPG refilling machine after detecting an unusual combination of pressure and temperature thresholds. Instead, the anomaly should be detected locally within milliseconds, followed by an immediate action triggered by a rule. The other scenario that demands local processing is healthcare. Given the sensitivity of data, healthcare companies don’t want to stream critical data points generated by life-saving systems. That data needs to be processed locally not only for faster turnaround but also for anonymizing personally identifiable patient data.
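
A local rule of the kind described for the LPG refilling machine might look like the sketch below. The threshold values, class and function names are hypothetical and chosen only for illustration; a real deployment would take its limits from the equipment's specification.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    pressure_bar: float
    temperature_c: float


# Hypothetical limits for illustration; real values depend on the equipment.
MAX_PRESSURE_BAR = 17.0
MAX_TEMPERATURE_C = 45.0


def should_shut_down(reading: Reading) -> bool:
    """Local rule: trip only on the combination of high pressure and high
    temperature, evaluated on the edge device with no cloud round trip."""
    return (
        reading.pressure_bar > MAX_PRESSURE_BAR
        and reading.temperature_c > MAX_TEMPERATURE_C
    )


def handle(reading: Reading) -> str:
    """Act immediately on an anomaly; otherwise pass the reading on
    to the cloud for long-term analysis."""
    return "SHUTDOWN" if should_shut_down(reading) else "FORWARD_TO_CLOUD"


print(handle(Reading(pressure_bar=18.2, temperature_c=51.0)))
print(handle(Reading(pressure_bar=12.0, temperature_c=30.0)))
```

The point of the design is that the safety decision never waits on the network, while ordinary readings can still travel to the cloud for heavier analysis.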


The demand for distributing the IoT workloads between the local data center and cloud has resulted in an architectural pattern called Fog computing. Large enterprises dealing with industrial automation will have to deploy infrastructure within the data center that’s specifically designed for IoT. This infrastructure is a cluster of compute, storage, and networking resources delivering sufficient horsepower to deal with the IoT data locally. The cluster that lives on the edge is called the Fog layer. Fog computing mimics cloud capabilities within the edge location, while still taking advantage of the cloud for heavy lifting. Fog computing is to IoT what hybrid cloud is to enterprise IT. Both architectures deliver the best of both worlds.


Cisco is one of the early movers in the Fog computing market. The company is credited with coining the term even before IoT became a buzzword. Cisco positioned Fog as the layer to reduce the latency in hybrid cloud scenarios. With enterprises embracing converged infrastructure in data centers and cloud for distributed computing, Cisco had a vested interest in pushing Fog to stay relevant in the data center. After almost five years of evangelizing Fog computing with little success, Cisco finally found a legitimate use case in the form of IoT.

Scooped by Dr. Stefan Gruenwald!

The biggest Big Data project on Earth

The biggest amount of data ever gathered and processed passing through the UK, for scientists and SMBs to slice, dice, and turn into innovations and insights. When Big Data becomes Super-Massive Data.


Eventually there will be two SKA telescopes. The first, consisting of 130,000 2m dipole low-frequency antennae, is being built in the Shire of Murchison, a remote region about 800km north of Perth, Australia – an area the size of the Netherlands, but with a population of less than 100 people. Construction kicks off in 2018.


By Phase 2, said Diamond, the SKA will consist of half-a-million low and mid-frequency antennae, with arrays spread right across southern Africa as well as Australia, stretching all the way from South Africa to Ghana and Kenya – a multibillion-euro project on an engineering scale similar to the Large Hadron Collider. Which brings us to that supermassive data challenge for what, ultimately, will be an ICT-driven science facility. Diamond says: "The antennae will generate enormous volumes of data: even by the mid-2020s, Phase 1 of the project will be looking at 5,000 petabytes – five exabytes – a day of raw data. This will go to huge banks of digital signal processors, which we’re in the process of designing, and then into high-performance computers, and into an archive for scientists worldwide to access."


Our archive growth rate will be somewhere between 300 and 500 petabytes a year – science-quality data coming out of the supercomputer.
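
To put these volumes in perspective, the short calculation below converts the quoted figures into a per-second rate and compares the raw stream with the archive. It assumes decimal units (one petabyte = 10^15 bytes) and a raw data rate that stays constant all year; both are simplifications.

```python
PETABYTE = 10**15  # decimal units assumed
SECONDS_PER_DAY = 24 * 60 * 60

raw_per_day = 5_000 * PETABYTE           # 5,000 petabytes a day of raw data
raw_per_second = raw_per_day / SECONDS_PER_DAY

archive_low = 300 * PETABYTE             # archive growth per year, low end
archive_high = 500 * PETABYTE            # archive growth per year, high end
raw_per_year = raw_per_day * 365

print(f"raw rate: {raw_per_second / 10**12:.1f} terabytes per second")
print(f"archive keeps between 1 in {raw_per_year / archive_high:,.0f} "
      f"and 1 in {raw_per_year / archive_low:,.0f} of the raw volume")
```

The gap between the two numbers shows how much reduction the signal processors and supercomputers must perform before anything reaches the archive.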


Using the most common element in the universe, neutral hydrogen, as a tracer, the SKA will be able to follow the trail all the way back to the cosmic dawn, a few hundred thousand years after the Big Bang. But over billions of years (a beam of light travelling at 671 million miles an hour would take 46.5 billion years to reach the edge of the observable universe) the wavelength of those ancient hydrogen signatures becomes stretched via the Doppler effect, until it falls into the same range as the radiation emitted by mobile phones, aircraft, FM radio, and digital TV. This is why the SKA arrays are being built in remote, sparsely populated regions, says Diamond:

"The aim is to get away from people. It’s not because we’re antisocial – although some of my colleagues probably are a little! – but we need to get away from radio interference, phones, microwaves, and so on, which are like shining a torch in the business end of an optical telescope."



Scooped by Dr. Stefan Gruenwald!

Discover stories within data using SandDance, a new Microsoft Research project


Data can be daunting. But within those numbers and spreadsheets is a wealth of information. There are also stories that the data can tell, if you’re able to see them. SandDance, a new Microsoft Garage project from Microsoft Research, helps you visually explore data sets to find stories and extract insights. It uses a free Web and touch-based interface to help users dynamically navigate through complex data they upload into the tool.


While data science experts will find that SandDance is a powerful tool, its ease of use can give people who aren’t experts in data science or programming the ability to analyze information – and present it – in a way that is accessible to a wider audience.


“We had this notion that a lot of visualization summarized data, and that summary is great, but sometimes you need the individual elements of your data set too,” says Steven Drucker, a principal researcher who’s focused on information visualization and data collections.


“We don’t want to lose sight of the trees because of the forest, but we also want to see the forest and the overall shape of the data. With this, you’ll see information about individuals and how they’re relative to each other. Most tools show one thing or the other. With SandDance, you can look at data from many different angles.”

Rescooped by Dr. Stefan Gruenwald from Conformable Contacts!

Google Earth now includes crowdsourced photos


While Google Street View and Google Earth already give us a good view of almost any place in the world we want to look at, there is nothing like seeing those places through the lens of another person who is actually there or who has been there. Google Earth now includes a global map of crowdsourced photos which you can consult when making travel plans, doing research for school, or just dreaming about another place far away from your home.

Via YEC Geo
Scooped by Dr. Stefan Gruenwald!

Genomic Data Growing Faster Than Twitter, YouTube and Astronomy


In the age of Big Data, it turns out that the largest, fastest growing data source lies within your cells. Quantitative biologists at the University of Illinois Urbana-Champaign and Cold Spring Harbor Laboratory, in New York, found that genomics reigns as champion over three of the biggest data domains around: astronomy, Twitter, and YouTube.


The scientists determined which would expand the fastest by evaluating acquisition, storage, distribution, and analysis of each set of data. Genomes are quantified by their chemical constructs, or base pairs. Genomics trumps other data generators because the genome sequencing rate doubles every seven months. If it maintains this rate, by 2020 more than one billion billion bases will be sequenced and stored per year, or 1 exabase. By 2025, researchers estimate the rate will be almost one zettabase, one trillion billion bases, of sequence per year.


Ninety percent of the genome data analyzed in the study was human. The scientists estimate that 100 million to 2 billion human genomes will be sequenced by 2025. That’s four to five orders of magnitude of growth in ten years, which far exceeds the other three data generators they studied. “For human genomics, which is the biggest driver of the whole field, the hope is that by sequencing many, many individuals, that knowledge will be obtained to help predict and cure a variety of diseases,” says University of Illinois Urbana-Champaign co-author Gene Robinson. Before they can be useful for medicine, genomes must be coupled with other genomic data sets, including tissue information.
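The growth figures quoted here follow from simple compounding. As a rough sketch only (not the study's own method), the snippet below assumes the seven-month doubling time holds constant for a full decade; the names `growth_factor` and `orders_of_magnitude` are illustrative.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling time quoted in the article; assumed constant here


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Multiplicative growth after `months` of steady doubling."""
    return 2.0 ** (months / doubling_months)


def orders_of_magnitude(months, doubling_months=DOUBLING_MONTHS):
    """The same growth expressed as a power of ten."""
    return math.log10(growth_factor(months, doubling_months))


if __name__ == "__main__":
    decade = 120  # months
    print(f"growth over ten years: {growth_factor(decade):,.0f}x")
    print(f"orders of magnitude:   {orders_of_magnitude(decade):.2f}")
```

Under that assumption, ten years of doubling every seven months gives roughly five orders of magnitude of growth, which sits at the upper end of the "four to five" range cited for human genomes.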


One reason the rate is doubling so quickly is that scientists have begun sequencing individual cells. Single-cell genome sequencing technology for cancer research can reveal mutated sequences and aid in diagnosis. Patients may have multiple single cells sequenced, and there could end up being more than 7 billion genomes sequenced.


That “is more than the population of the Earth,” says Michael Schatz, associate professor at Cold Spring Harbor Laboratory, in New York. “What does it mean to have more genomes than people on the planet?” What it means is a mountain of information must be collected, filed, and analyzed.


“Other disciplines have been really successful at these scales, like YouTube,” says Schatz. Today, YouTube users upload 300 hours of video every minute, and the researchers expect that rate to grow up to 1,700 hours per minute, or 2 exabytes of video data per year, by 2025. Google set up a seamless data-flowing infrastructure for YouTube. They provided really fast Internet, huge hard drive space, algorithms that optimized results, and a team of experienced researchers.
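The YouTube projection can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an illustration, not a calculation from the study: it assumes an exabyte is 10^18 bytes and a 365-day year, and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year
EXABYTE = 10 ** 18                # SI definition, assumed here


def hours_uploaded_per_year(hours_per_minute):
    """Total hours of video uploaded in a year at a steady per-minute rate."""
    return hours_per_minute * MINUTES_PER_YEAR


def implied_bytes_per_hour(bytes_per_year, hours_per_minute):
    """Average storage per hour of video implied by the two yearly totals."""
    return bytes_per_year / hours_uploaded_per_year(hours_per_minute)


if __name__ == "__main__":
    hours = hours_uploaded_per_year(1700)
    per_hour = implied_bytes_per_hour(2 * EXABYTE, 1700)
    print(f"hours uploaded per year: {hours:,}")
    print(f"implied size per hour:   {per_hour / 1e9:.2f} GB")
```

With the projected 1,700 hours per minute and 2 exabytes per year, this works out to a little over 2 GB per hour of video, a plausible figure for compressed video.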

Scooped by Dr. Stefan Gruenwald!

A geographically-diverse collection of 418 human gut microbiome pathway genome databases


Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available.


However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, scientists developed GUTCYC, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using METAPATHWAYS, enabling reproducible functional metagenomic annotation. They also generated metabolic network reconstructions for each metagenome using the PATHWAY TOOLS software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GUTCYC provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GUTCYC data products are searchable online, or may be downloaded and explored locally using METAPATHWAYS and PATHWAY TOOLS.

Scooped by Dr. Stefan Gruenwald!

Cancer statistics, 2017


Each year, the American Cancer Society estimates the numbers of new cancer cases and deaths that will occur in the United States in the current year and compiles the most recent data on cancer incidence, mortality, and survival. Incidence data were collected by the Surveillance, Epidemiology, and End Results Program; the National Program of Cancer Registries; and the North American Association of Central Cancer Registries.


Mortality data were collected by the National Center for Health Statistics. In 2017, 1,688,780 new cancer cases and 600,920 cancer deaths are projected to occur in the United States. For all sites combined, the cancer incidence rate is 20% higher in men than in women, while the cancer death rate is 40% higher.


However, sex disparities vary by cancer type. For example, thyroid cancer incidence rates are 3-fold higher in women than in men (21 vs 7 per 100,000 population), despite equivalent death rates (0.5 per 100,000 population), largely reflecting sex differences in the “epidemic of diagnosis.” Over the past decade of available data, the overall cancer incidence rate (2004-2013) was stable in women and declined by approximately 2% annually in men, while the cancer death rate (2005-2014) declined by about 1.5% annually in both men and women. From 1991 to 2014, the overall cancer death rate dropped 25%, translating to approximately 2,143,200 fewer cancer deaths than would have been expected if death rates had remained at their peak.
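The 25% drop in the death rate between 1991 and 2014 can be converted into an average yearly rate. This is a minimal sketch that assumes a constant compound rate over the 23 years, which is a simplification and not the American Cancer Society's method; the function name is illustrative.

```python
def average_annual_decline(total_decline, years):
    """Constant yearly decline that compounds to `total_decline` over `years`."""
    remaining = 1.0 - total_decline
    return 1.0 - remaining ** (1.0 / years)


if __name__ == "__main__":
    rate = average_annual_decline(0.25, 2014 - 1991)
    print(f"average annual decline: {rate:.2%}")
```

Under that assumption the average comes to about 1.2% per year, somewhat below the roughly 1.5% annual decline reported for 2005-2014.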


Although the cancer death rate was 15% higher in blacks than in whites in 2014, increasing access to care as a result of the Patient Protection and Affordable Care Act may expedite the narrowing of the racial gap; from 2010 to 2015, the proportion of blacks who were uninsured halved, from 21% to 11%, as it did for Hispanics (31% to 16%). Gains in coverage for traditionally underserved Americans will facilitate the broader application of existing cancer control knowledge across every segment of the population.

Scooped by Dr. Stefan Gruenwald!

Scientists make huge dataset of nearby stars available to public

Members of the public can search a newly released database of 1,600 stars to find signs of undiscovered exoplanets. The dataset, taken over two decades by the W.M. Keck Observatory in Hawaii, comes with an open-source software package and an online tutorial.


The search for planets beyond our solar system is about to gain some new recruits. Just recently, a team that includes MIT and is led by the Carnegie Institution for Science has released the largest collection of observations made with a technique called radial velocity, to be used for hunting exoplanets. The huge dataset, taken over two decades by the W.M. Keck Observatory in Hawaii, is now available to the public, along with an open-source software package to process the data and an online tutorial.


By making the data public and user-friendly, the scientists hope to draw fresh eyes to the observations, which encompass almost 61,000 measurements of more than 1,600 nearby stars.

“This is an amazing catalog, and we realized there just aren’t enough of us on the team to be doing as much science as could come out of this dataset,” says Jennifer Burt, a Torres Postdoctoral Fellow in MIT’s Kavli Institute for Astrophysics and Space Research. “We’re trying to shift toward a more community-oriented idea of how we should do science, so that others can access the data and see something interesting.”


Burt and her colleagues have outlined some details of the newly available dataset in a paper to appear in The Astronomical Journal. After taking a look through the data themselves, the researchers have detected over 100 potential exoplanets, including one orbiting GJ 411, the fourth-closest star to our solar system. “There seems to be no shortage of exoplanets,” Burt says. “There are a ton of them out there, and there is a ton of science to be done.”

Scooped by Dr. Stefan Gruenwald!

Navigating an Ocean of Biological Data in the Modern Era


Modern experimental methods in biology can generate overwhelming amounts of raw data. To manage this, scientists must create entirely new workflows and systems capable of merging large, disparate data sets and presenting them intuitively. Arrowland is a combined -omics visualization tool that runs on a web browser, tablet, or cellphone. It shows functional genomics data (fluxomics, metabolomics, proteomics and transcriptomics) together on a zoomable, searchable map, similar to the street maps used for navigation. In addition to providing a coherent layout for -omics measurements, the maps in Arrowland are also an easy-to-use reference for the relationships between the reactions, genes, and proteins involved in metabolism.


“At JBEI we have leveraged the insights provided by metabolic fluxes measured through 13C Metabolic Flux Analysis to make quantitative predictions for E. coli, and we shared these fluxes on Arrowland,” said Hector Garcia Martin, Director of Quantitative Metabolic Modeling at JBEI. “By making this cloud-based interactive tool publicly available, our hope is to enable application of this tool in a wide range of fields, not only bioenergy. We are open to collaborations with the broader scientific community and industry.”


Arrowland sets itself apart from other available tools through its clarity and ease of use. With its unique interface and presentation method, Arrowland makes the exploration of -omics data as intuitive as possible, on a platform that does not require special hardware or configuration. Four different types of -omics data are integrated into the same map, an ability that no competing visualization platform shares, making important correlations and progressions easy to recognize and examine.


Arrowland is open-source and under active development, with plans to add more maps as well as editing and curation features. It is integrated with an open-source modeling package, the JBEI Quantitative Metabolic Modeling Library, which JBEI plans to release separately. For the time being, a sample data set from the flux profiles published in “A Method to Constrain Genome-Scale Models with 13C Labeling Data”, PLOS Computational Biology (Garcia Martin et al, 2015) can be viewed at

Scooped by Dr. Stefan Gruenwald!

New IBM Watson Data Platform and Data Science Experience


IBM recently announced a new IBM Watson Data Platform that combines the world’s fastest data ingestion engine touting speeds up to 100+GB/second with cloud data source, data science, and cognitive API services. IBM is also making IBM Watson Machine Learning Service more intuitive with a self-service interface.


According to Bob Picciano, Senior Vice President of IBM Analytics, “Watson Data Platform applies cognitive assistance for creating machine learning models, making it far faster to get from data to insight. It also provides one place to access machine learning services and languages, so that anyone, from an app developer to the Chief Data Officer, can collaborate seamlessly to make sense of data, ask better questions, and more effectively operationalize insight.”


For more information or a free trial of IBM Watson Data Platform, Data Science Experience, Watson APIs, or Bluemix, the following resources are useful:

Scooped by Dr. Stefan Gruenwald!

Mapping Every Single U.S. Road Fatality From 2004 to 2013

The death toll amounts to 373,377 lost lives.


Max Galka of Metrocosm has made some lovely maps in 2015—tracking everything from obesity trends to property values to UFO sightings—but his latest effort may be the most powerful yet. Using data from the federal Fatality Analysis Reporting System, Galka maps every single U.S. road fatality from 2004 to 2013. The death toll amounts to 373,377 lost lives.


At the national level, Galka’s map almost looks like an electricity grid stretching across America’s road network—with bright orange clusters in metro areas connected via dim red threads across remote regions. Here’s a wide view of the whole country:

Rescooped by Dr. Stefan Gruenwald from Nostri Orbis!

A map of where our food originates from holds some surprises


A new study reveals the full extent of globalization in our food supply. More than two-thirds of the crops that underpin national diets originally came from somewhere else — often far away.


Previous work by the same authors had shown that national diets have adopted new crops and become more and more globally alike in recent decades. The new study shows that those crops are mainly foreign.


The idea that crop plants have centers of origin, where they were originally domesticated, goes back to the 1920s and the great Russian plant explorer Nikolai Vavilov. He reasoned that the region where a crop had been domesticated would be marked by the greatest diversity of that crop, because farmers there would have been selecting different types for the longest time. Diversity, along with the presence of that crop's wild relatives, marked the center of origin.


The Fertile Crescent, with its profusion of wild grasses related to wheat and barley, is the primary center of diversity for those cereals. Thai chilies come originally from Central America and tropical South America, while Italian tomatoes come from the Andes.


Khoury and his colleagues extended Vavilov's methods to look for the origins of 151 different crops across 23 geographical regions. They then examined national statistics for diet and food production in 177 countries, covering 98.5 percent of the world's population.

Via DrDids, Neil Bombardier, Fernando Gil
Neil Bombardier's curator insight, July 18, 2016 6:34 AM
Fascinating map of where your food comes from
Eric Larson's curator insight, July 22, 2016 11:01 AM
Interesting maps. 
Scooped by Dr. Stefan Gruenwald!

Cancer-patient big data can save lives if shared globally


Sharing genetic information from millions of cancer patients around the world could revolutionize cancer prevention and care, according to a paper in Nature Medicine by the Cancer Task Team of the Global Alliance for Genomics and Health (GA4GH). Hospitals, laboratories and research facilities around the world hold huge amounts of this data from cancer patients, but it’s currently held in isolated “silos” that don’t talk to each other, according to GA4GH, a partnership between scientists, clinicians, patients, and the IT and Life Sciences industry, involving more than 400 organizations in over 40 countries. GA4GH intends to provide a common framework for the responsible, voluntary and secure sharing of patients’ clinical and genomic data.


“Imagine if we could create a searchable cancer database that allowed doctors to match patients from different parts of the world with suitable clinical trials,” said GA4GH co-chair professor Mark Lawler, a leading cancer expert from Queen’s University Belfast. “This genetic matchmaking approach would allow us to develop personalized treatments for each individual’s cancer, precisely targeting rogue cells and improving outcomes for patients.


“This data sharing presents logistical, technical, and ethical challenges. Our paper highlights these challenges and proposes potential solutions to allow the sharing of data in a timely, responsible and effective manner. We hope this blueprint will be adopted by researchers around the world and enable a unified global approach to unlocking the value of data for enhanced patient care.”


GA4GH acknowledges that there are security issues, and has created a Security Working Group and a policy paper that documents the standards and implementation practices for protecting the privacy and security of shared genomic and clinical data.


Examples of current initiatives for clinico-genomic data-sharing include the U.S.-based Precision Medicine Initiative and the UK’s 100,000 Genomes Project, both of which have cancer as a major focus.


Herve Moal's curator insight, May 26, 2016 4:47 AM

The challenge of data sharing

Rescooped by Dr. Stefan Gruenwald from Health and Biomedical Informatics!

The role of big data in medicine


Technology is revolutionizing our understanding and treatment of disease, says the founding director of the Icahn Institute for Genomics and Multiscale Biology at New York’s Mount Sinai Health System.


The role of big data in medicine is one where we can build better health profiles and better predictive models around individual patients so that we can better diagnose and treat disease.


One of the main limitations with medicine today and in the pharmaceutical industry is our understanding of the biology of disease. Big data comes into play around aggregating more and more information around multiple scales for what constitutes a disease—from the DNA, proteins, and metabolites to cells, tissues, organs, organisms, and ecosystems. Those are the scales of the biology that we need to be modeling by integrating big data. If we do that, the models will evolve, the models will build, and they will be more predictive for given individuals.


It’s not going to be a discrete event—that all of a sudden we go from not using big data in medicine to using big data in medicine. I view it as more of a continuum, more of an evolution. As we begin building these models, aggregating big data, we’re going to be testing and applying the models on individuals, assessing the outcomes, refining the models, and so on. Questions will become easier to answer. The modeling becomes more informed as we start pulling in all of this information. We are at the very beginning stages of this revolution, but I think it’s going to go very fast, because there’s great maturity in the information sciences beyond medicine.


The life sciences are not the first to encounter big data. We have information-power companies like Google and Amazon and Facebook, and a lot of the algorithms that are applied there—to predict what kind of movie you like to watch or what kind of foods you like to buy—use the same machine-learning techniques. Those same types of methods, the infrastructure for managing the data, can all be applied in medicine.

Via fjms
Scooped by Dr. Stefan Gruenwald!

Scientists create largest map of brain connections to date

Map of mouse visual cortex shows some striking functional connections


This tangle of wiry filaments is not a bird’s nest or a root system. Instead, it’s the largest map to date of the connections between brain cells—in this case, about 200 from a mouse’s visual cortex. To map the roughly 1300 connections, or synapses, between the cells, researchers used an electron microscope to take millions of nanoscopic pictures from a speck of tissue not much bigger than a dust mite, carved into nearly 3700 slices.


Then, teams of “annotators” traced the spindly projections of the synapses, digitally stitching stacked slices together to form the 3D map. The completed map reveals some interesting clues about how the mouse brain is wired: Neurons that respond to similar visual stimuli, such as vertical or horizontal bars, are more likely to be connected to one another than to neurons that carry out different functions, the scientists report online today in Nature.


In the image above, some neurons are color-coded according to their sensitivity to various line orientations. Ultimately, by speeding up and automating the process of mapping such networks in both mouse and human brain tissue, researchers hope to learn how the brain’s structure enables us to sense, remember, think, and feel.
