EEDSP
Digital Signal Processing, Data Analytics, Big Data, HPC, Deep Learning, GPGPU, Distributed and Parallel Computing
Curated by Shiwon Cho
Scooped by Shiwon Cho

Getting Started with Tachyon by Use Cases | Intel® Developer Zone

In-memory computing has become an irreversible trend in big data technology, as the wide popularity of Spark shows. Meanwhile, memory storage and management for large data sets still pose challenges. Among the many solutions, Tachyon, a memory-centric distributed storage system, addresses the problems faced in many application scenarios. For example, it avoids the severe GC issues caused by storing large in-memory data in the JVM heap, enables data sharing across applications and jobs through memory, and reduces the overhead of JVM heap memory allocation and on-heap data management, all of which help improve the computing engine’s stability and availability. This article uses application examples to demonstrate how to solve real-world big data issues with Tachyon, and to help readers understand what Tachyon can offer in various big data scenarios.

Real Time Analytics of Big Data

The data that we deal with can be analyzed in two ways:
i) While the data is in motion, that is, while it is still streaming and has not yet been inserted into a database.
ii) After the data has been inserted into a database.

Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard - Cloudera Engineering Blog

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works.
Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits:

A columnar memory layout permitting O(1) random access. The layout is highly cache-efficient in analytics workloads and permits SIMD optimizations with modern processors.
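
The O(1) random access claim can be illustrated with a small sketch. This is not Arrow's API: `Int32Column` is a hypothetical toy class, loosely modelled on the idea of a fixed-width column stored as one contiguous values buffer plus a validity bitmap, so that reading element `i` is a direct index into each buffer.

```python
from array import array


class Int32Column:
    """Toy fixed-width column (hypothetical, not Arrow's API):
    one contiguous values buffer plus a validity bitmap."""

    def __init__(self, values):
        self.length = len(values)
        # Contiguous C ints (4 bytes on common platforms); null slots hold 0.
        self.values = array("i", (0 if v is None else v for v in values))
        # One validity bit per slot, least-significant bit first in each byte.
        self.validity = bytearray((self.length + 7) // 8)
        for i, v in enumerate(values):
            if v is not None:
                self.validity[i >> 3] |= 1 << (i & 7)

    def is_valid(self, i):
        return bool(self.validity[i >> 3] & (1 << (i & 7)))

    def get(self, i):
        """O(1) lookup: index straight into the buffers, no scanning."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.values[i] if self.is_valid(i) else None


col = Int32Column([10, None, 30, 40])
print(col.get(2))
```

Because all values of the column sit next to each other in memory, a scan over the column touches consecutive cache lines, which is the property the excerpt credits for cache efficiency and SIMD-friendliness.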

Big Data Basics: Hadoop, MapReduce, Hive, Pig, & Spark

Learn about some of the most popular big data analysis frameworks in commercial use - and look at some real code! - Free Course

Design of a Modern Cache - High Scalability -

This is a guest post by Benjamin Manes, who did engineery things for Google and is now doing e...

A distributed approach to co-operative data | Tim's Blog

Principle 6 of the International Co-operative Alliance calls for ‘co-operation amongst co-operatives’. Yet, for many co-ops, finding other worker-owned businesses to work with can be challenging. Although there are over 7,000 co-operatives in the UK, and many more worldwide, it is hard to find out much about them.

Big Data Doesn’t Exist

My customers always lie to me. They don’t lie about what they can afford. They don’t lie about how much (or how little) customer service they’ll need...

Trends in Distributed System Processing Models (from MapReduce to Borg)

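In this post, I examine the Borg paper published by Google from the perspective of processing models for large-scale distributed systems. Put simply, recent distributed systems in cloud environments, Borg included, are undergoing an important paradigm shift.
From that processing-model perspective, I first survey recent trends in distributed systems for cloud environments and then, using Borg, consider where distributed systems are currently heading.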

How Stephen Wolfram’s image-recognition tool performs against 5 alternatives

To get a feel for the power of the new Wolfram technology, I put it up against other image-recognition systems.

Grappling with the growth of scientific data - Scientific Computing World

Metadata is key to mastering the volumes of data in science and engineering, argues Bob Murphy, and tools are available to make it easier


Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform (Part 1)

These days you hear a lot about "stream processing", "event data", and "real-time", often related to technologies like Kafka, Storm, Samza, or Spark's Streaming module. Though there is a lot of exc...

Start of a new era: Apache HBase 1.0 : Apache HBase

The Apache HBase community has released Apache HBase 1.0.0. Seven years in the making, it marks a major milestone in the Apache HBase project’s development, offers some exciting features and new APIs without sacrificing stability, and is both on-wire and on-disk compatible with HBase 0.98.x.

Google Launches Cloud Dataproc, A Managed Spark And Hadoop Big Data Service

Google is adding another product in its range of big data services on the Google Cloud Platform today. The new Google Cloud Dataproc service sits between..

Streaming analytics from the center to the edge

Stream processing systems are commonly used to process data from edge devices and there is a need to push some of the streaming analytics to the edge to reduce communication costs, react locally an...

The 5 V's of Big Data by Bernard Marr

Nice infographics produced by the well-known business management consultant and author Bernard Marr.

Yahoo Releases the Largest-ever Machine Learning Dataset for Researchers

yahoolabs: “By Suju Rajan. Data is the lifeblood of research in machine learning. However, access to truly large-scale datasets...

Introducing The Newly Redesigned Apache HAWQ | Pivotal P.O.V.

Pivotal announced that we donated the Pivotal HAWQ core to the Apache Software Foundation (ASF) and it is now an officially incubating project. Apache HAWQ (incubating) is a redesign of the HAWQ architecture to enable greater elasticity to meet the requirements of a growing user base. With the addition of YARN support and its acceptance as an Apache project, HAWQ is now more than ever a truly Hadoop Native SQL Engine. This blog is a technical primer on the background and architecture of Apache HAWQ.

GitHub Special: Data Scientists to Follow & Best Tutorials on GitHub

GitHub has some of the most awesome collections of data science resources. This article lists those resources and the people to follow on GitHub.

You’re Not a Data Scientist

Many of my friends, colleagues and contacts have started calling themselves Data Scientists. A number of resumes have cr…

Tachyon Overview - Tachyon 0.6.4 Documentation

Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, thereby avoiding going to disk to load datasets that are frequently read. This enables different jobs/queries and frameworks to access cached files at memory speed.
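
The caching behaviour described above can be sketched as a read-through cache. This is an illustration only, not Tachyon's API: `WorkingSetCache` and its `disk_reads` counter are hypothetical names, and a single in-process dictionary stands in for what is, in Tachyon, a distributed service shared across frameworks.

```python
import os
import tempfile


class WorkingSetCache:
    """Toy read-through cache (hypothetical, not Tachyon's API):
    file bytes stay in memory after the first read."""

    def __init__(self):
        self._blocks = {}    # path -> bytes held in memory
        self.disk_reads = 0  # how many reads fell through to disk

    def read(self, path):
        data = self._blocks.get(path)
        if data is None:
            # Cache miss: load from the slower backing store once.
            with open(path, "rb") as f:
                data = f.read()
            self._blocks[path] = data
            self.disk_reads += 1
        return data


# Two "jobs" read the same dataset; only the first one touches disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.bin")
    with open(path, "wb") as f:
        f.write(b"frequently read data")

    cache = WorkingSetCache()
    job_a = cache.read(path)
    job_b = cache.read(path)
    print(cache.disk_reads)
```

The second read is served from memory, which is the effect the overview attributes to caching working set files. Lineage-based recovery, which the overview also mentions, is not modelled here.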

New in the Wolfram Language: WikipediaData—Wolfram Blog

An integrated Wikipedia service has been added to the latest version of the Wolfram Language. Just feed in content for text processing and visualization.