eBay has a new open-source, real-time analytics and stream-processing framework called Pulsar that the company claims is in production and is available for others to download, according to an eBay blog post on Monday. The online auction site is now using Pulsar to gather and process all the data pertaining to user interactions and their…
SYSTAP is very pleased to launch its new graph database platform Blazegraph™. It is built on the same open source GPLv2 platform and maintains 100% binary and API compatibility with Bigdata®. Blazegraph™ will take over as SYSTAP’s flagship graph database. It is specifically designed to support big graphs, offering both Semantic Web (RDF/SPARQL) and Graph Database (TinkerPop, Blueprints, vertex-centric) APIs. It features robust, scalable, fault-tolerant, enterprise-class storage and query and high-availability with online backup, failover and self-healing. It is in production use with enterprises such as Autodesk, EMC, Yahoo7!, and many others. Blazegraph™ provides both embedded and standalone modes of operation.
Blazegraph has a High Availability and Scale Out architecture. It provides robust support for Semantic Web (RDF/SPARQL) and Property Graph (TinkerPop) APIs. Highly scalable, Blazegraph can handle 50 billion edges on a single node.
Flink contributors talk Big Data processing, open-source community and the future of the newly minted TLP
Flink is an open-source Big Data system that fuses processing and analysis of both batch and streaming data. The data-processing engine, which offers APIs in Java and Scala as well as specialized APIs for graph processing, is presented as an alternative to Hadoop’s MapReduce component with its own runtime. Yet the system still provides access to Hadoop’s distributed file system and YARN resource manager.
Using a computer algorithm that can sift through mounds of genetic data, researchers from Brown University have identified several networks of genes that, when hit by a mutation, could play a role in the development of multiple types of cancer.
Skin cancer can be detected more quickly and accurately by using cognitive computing-based visual analytics, researchers at IBM Research have found, in collaboration with New York's Memorial Sloan Kettering Cancer Center.
This docker image is a great addition to Neo4j if you're looking to do easy PageRank or community detection on your graph data. Additionally, the results of the graph analysis are applied back to Neo4j.
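For readers unfamiliar with PageRank, the idea can be sketched in a few lines of plain Python. This is a minimal illustration only and is not how the docker image or Neo4j computes it; the function name `pagerank`, the `damping` and `iterations` parameters, and the toy edge list are all assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other node splits its damped rank equally among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical data): a -> b -> c -> a, plus d -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy graph, node `c` receives links from two nodes and ends up with the highest score, while `d`, which nothing links to, keeps only the baseline share.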
Advancing New Tools to Fill in the Microbial Tree of Life
To paraphrase a famous passage from Coleridge’s “The Rime of the Ancient Mariner”: microbes, microbes everywhere, though most we do not know. This is changing, though.
Yoshua Bengio, one of the most influential people in deep learning, will be hosting a live Q&A on Quora tomorrow (January 19) at 4 PM. You may need to have a Quora account to post a question for him to answer.
This is a huge opportunity if you are at all interested in deep learning, one of the most transformative methods in big data analytics today.
The following post was originally published in the Ibis project blog. (Ibis is a data analysis framework incubating in Cloudera Labs that brings Apache Hadoop scale to Python development.) The new Apache Kudu (incubating) columnar storage engine, together with the Apache Impala (incubating) interactive SQL engine, enables a new, fully open source big data architecture for data that is arriving and changing very quickly. By integrating Kudu and Impala with Ibis, …
StreamFlow™ is a stream processing tool designed to rapidly build and monitor processing workflows. The ultimate goal of StreamFlow is to make working with stream processing frameworks such as Apache Storm easier and faster, with "enterprise"-like management functionality.
StreamFlow provides a graphical user interface for non-developers such as data scientists, analysts, or operational users to rapidly build scalable data flows and analytics.
On December 15th, Kaggle started the National Data Science Bowl competition (which runs till the end of March 2015). The competition consists of classifying images of ocean plankton into 121 different classes, with a supplied training set of around 30,000 labeled images and a test set of 130,000 images for which you have to provide the classification. The images are black and white and come in different sizes and shapes, with widths and heights ranging roughly from 30 pixels to over 200 pixels. This is a real-world problem to tackle, and the leaderboard lets you track your progress and see how you do compared to others.
Dahl Winters's insight:
A good overview of getting started fast with deep learning on a real-world problem.