EEDSP
Scooped by Shiwon Cho
onto EEDSP

High Scalability - MongoDB and GridFS for Inter and Intra Datacenter Data Replication

An inevitable part of disaster recovery planning is making sure customer data exists in multiple locations.  In the case of LogicMonitor, a SaaS-based monitoring solution for physical, virtual, and cloud environments, we wanted copies of customer data files both within a data center and outside of it.  The former was to protect against the loss of individual servers within a facility, and the latter for recovery in the event of the complete loss of a data center.

EEDSP
Digital Signal Processing, Data Analytics, Big Data, HPC, Deep Learning, GPGPU, Distributed and Parallel Computing
Curated by Shiwon Cho

Using Apache Spark for Massively Parallel NLP at TripAdvisor

We solve this problem using a semi-supervised form of logistic regression. A large portion of the model consists of “bag of words” type features from user submitted reviews on the properties. Since it is a semi-supervised technique, not only do we use the reviews on locations that we have tag votes on during training, we also use a large chunk of unlabeled data. Also, when applying the model to get the end results, we need to read and process all our reviews. On top of that, we have hundreds of different tags.
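As a rough illustration of what “bag of words” features look like, here is a minimal sketch in plain Python. It is not TripAdvisor's pipeline: the function `bag_of_words`, the vocabulary, and the sample review are all hypothetical, chosen only to show how a review becomes a count vector that a classifier such as logistic regression could consume.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector for `text` over a fixed, ordered vocabulary."""
    # Lowercase and split into simple word tokens (an assumed tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occurred.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and review, for illustration only.
vocabulary = ["beach", "quiet", "pool", "romantic"]
review = "Quiet hotel near the beach. The beach was clean and the pool was warm."
features = bag_of_words(review, vocabulary)
```

Word order is discarded; only the per-word counts survive, which is what makes the representation cheap to compute over a very large review corpus.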

Spark Analytics with SynerScope: On YARN and HDP

Learn about exploratory analytics using Spark on YARN and HDP and Synerscope. Jorik Blaas, chief technical officer at SynerScope, explores a use case.

Essentials of Machine Learning Algorithms (with Python and R Codes)

This article lists machine learning algorithms such as linear regression, logistic regression, k-means, and decision trees, along with Python and R code.

Labellio: Scalable Cloud Architecture for Efficient Multi-GPU Deep Learning

Labellio is the world’s easiest deep learning web service for computer vision. It aims to provide a deep learning environment for image data where non-experts in deep learning can experiment with t...

Fighting spam with Haskell

We recently completed a two-year redesign of Sigma, one of our spam-fighting systems. Check out how we integrated Haskell with our existing C++ code and the improvements we made to GHC.

Apache Kafka, Samza, and the Unix Philosophy of Distributed Data

There are interesting similarities between the design of Apache Kafka and Unix pipes. In this post we explore how this design allows us to build large, scalable applications by composing small stream processing tools.
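The pipe-style composition described here can be sketched with Python generators. This is only an analogy, not Kafka or Samza code: the stage names `read_events`, `grep`, and `count`, and the sample log lines, are hypothetical. Each stage consumes a stream and produces a stream, much like `cat log | grep 404 | wc -l`.

```python
def read_events(lines):
    """Source stage: yield each raw line with surrounding whitespace removed."""
    for line in lines:
        yield line.strip()


def grep(pattern, stream):
    """Filter stage: pass through only items containing `pattern`."""
    for item in stream:
        if pattern in item:
            yield item


def count(stream):
    """Sink stage: consume the stream and return the number of items."""
    return sum(1 for _ in stream)


# Hypothetical log lines, for illustration only.
log = [
    "GET /home 200\n",
    "GET /missing 404\n",
    "POST /login 200\n",
    "GET /gone 404\n",
]

# Stages are composed like a shell pipeline; items flow through one at a time.
not_found = count(grep("404", read_events(log)))
```

Because each stage only knows about the stream it reads and the stream it writes, stages can be rearranged or replaced independently, which is the property the post attributes to both Unix pipes and Kafka-based stream processing.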

The New York Times built a robot to help make article tagging easier

Developed by the Times R&D lab, the Editor tool scans text to suggest article tags in real time. But the automatic tagging system won't be moving into the newsroom soon.

Diving into Spark Streaming’s Execution Model

In this post, we outline Spark Streaming’s architecture and explain how it provides the above benefits. We also discuss some of the interesting ongoing work in the project that leverages the execution model.

Intel and Micron Produce Breakthrough Memory Technology

New Class of Memory Unleashes the Performance of PCs, Data Centers and More

News highlights: Intel and Micron begin production on new class of non-volatile memory, creating the first new memory category in more than 25 years. New 3D XPoint™ technology brings non-volatile memory speeds up to 1,000

Benchmarks: Intel Xeon Phi vs. NVIDIA Tesla GPU

We’ve seen that there is one processor that needs to be added to the picture: the commodity multi-core CPU. This is already a part of many server configurations, and for some applications, e.g., Monte-Carlo pricing of American options, it can give performance better than or comparable to that of an accelerator processor when optimized correctly. Between NVIDIA’s Kepler GPUs and Xeon Phi, the GPU wins for both of our test applications.

What's New in RNeo4j - Neo4j Graph Database

Written by Nicole White. RNeo4j is Neo4j’s R driver – it allows you to quickly and easily interact with a Neo4j database from your R environment. Some recent updates to RNeo4j include: My contributions Functionality for…

Introduction to Neural Machine Translation with GPUs (part 3)

Note: This is the final part of a detailed three-part series on machine translation with neural networks by Kyunghyun Cho. You may enjoy part 1 and part 2. In the previous post in this series, I in...

Deeplearning4j - Open-source, distributed deep learning for the JVM

Open-Source Deep-Learning Software for Java and Scala on Hadoop and Spark

Graph Databases for Beginners: Data Modeling Pitfalls to Avoid - Neo4j Graph Database

Discover how to avoid these data modeling mistakes and mess-ups made most often by beginners – especially when you’re using a graph database.

High-Performance MATLAB with GPU Acceleration

In this post, I will discuss techniques you can use to maximize the performance of your GPU-accelerated MATLAB® code. First I explain how to write MATLAB code which is inherently parallelizable. Th...

Get Ready for the Future — JavaScript Scene

A high-tech video time capsule from my future self

Elastically adding and removing nodes using Akka cluster - Glassbeam

Akka cluster: elastically adding and removing nodes dynamically, using a pull-based master/worker architecture.

The LMAX Architecture

LMAX is a retail financial trading system that can handle 6 million orders per second on a single JVM thread. The business logic runs in-memory surrounded by disruptors using event sourcing.
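To make the term “event sourcing” concrete, here is a minimal sketch, assuming a toy order book. It is not LMAX code: the event types `OrderPlaced` and `OrderCancelled`, the helpers `apply` and `replay`, and the sample journal are hypothetical. The idea shown is only that in-memory state is never stored directly; it is rebuilt by replaying an ordered journal of events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: int
    quantity: int


@dataclass(frozen=True)
class OrderCancelled:
    order_id: int


def apply(state, event):
    """Return a new open-orders dict with `event` applied; `state` is not mutated."""
    new_state = dict(state)
    if isinstance(event, OrderPlaced):
        new_state[event.order_id] = event.quantity
    elif isinstance(event, OrderCancelled):
        new_state.pop(event.order_id, None)
    return new_state


def replay(events):
    """Rebuild the in-memory state from scratch by applying events in order."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


# Hypothetical journal, for illustration only.
journal = [OrderPlaced(1, 100), OrderPlaced(2, 50), OrderCancelled(1)]
open_orders = replay(journal)
```

Because the journal is the source of truth, a process that crashes can recover its in-memory state by replaying the same events in the same order.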

Streaming Petabytes of Data in Realtime with Kinesis

By migrating AdRoll's real-time data pipeline to Kinesis we were able to reduce our end to end latency more than one hundredfold while simultaneously cutting costs and improving system stability. Here we'll follow architectural decisions, implementation details, and overall learnings from this process.

Improving Facebook's performance on Android with FlatBuffers

In the last six months, we have transitioned most of Facebook on Android to use FlatBuffers as the storage format.

New Features in Machine Learning Pipelines in Spark 1.4

A big part of any ML workflow is massaging the data into the right features for use in downstream processing. To simplify feature extraction, Spark provides many feature transformers out-of-the-box. The table below outlines most of the feature transformers available in Spark 1.4 along with descriptions of each one. Much of the API is inspired by scikit-learn; for reference, we provide names of similar scikit-learn transformers where available.

Installing and Starting SparkR Locally on Windows OS and RStudio

Introduction

With the recent release of Apache Spark 1.4.1 on July 15th, 2015, I wanted to write a step-by-step guide to help new users get up and running with SparkR locally on a Windows machine using the command shell and RStudio. SparkR provides an R frontend to Apache Spark and, by using Spark’s distributed computation engine, allows R users to run large-scale data analysis from the R shell. The steps listed here are also documented in my online book titled “Getting Started with SparkR for Big Data Analysis”, which can be accessed at: http://www.danielemaasit.com/getting-started-with-sparkr/. These steps will get you up and running in less than 5 minutes.

A Visual Introduction to Machine Learning

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions.

 