Scooped by
Shiwon Cho
|
CMS, G1, Young Gen, New Gen, Old Gen, Eden, and the hundreds of JVM start-up flags: does all this baffle you when you try to tune the garbage collector for the throughput and latency your Java application needs?
|
Apache Hadoop is the de facto standard for Big Data storage and batch processing, while Twitter Storm is quickly becoming a standard for large-scale event processing. Unfortunately, until recently Storm and Hadoop had to run on two physically separate clusters. Last week, Yahoo! announced that it is open-sourcing its work on running Storm on a Hadoop cluster.
|
What are the top 100 (most downloaded) R packages in 2013? Thanks to RStudio's recent release of the log files from its “0-cloud” CRAN mirror, we can now answer this question (at least for January through May)!
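A minimal sketch of the counting step, in Python. It assumes each log file is CSV text with one row per download and a column named `package`; that layout is an assumption, and reading or decompressing the real files is left out. The two tiny logs below are hypothetical placeholders, not real download data.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholder logs (not real download data): one row per
# download, in an assumed layout with a "package" column.
JAN_01 = "date,package\n2013-01-01,pkgA\n2013-01-01,pkgB\n2013-01-01,pkgA\n"
JAN_02 = "date,package\n2013-01-02,pkgA\n2013-01-02,pkgC\n2013-01-02,pkgB\n"


def top_packages(log_texts, n=100):
    """Count downloads per package across several CSV logs; return the top n."""
    counts = Counter()
    for text in log_texts:
        reader = csv.DictReader(io.StringIO(text))
        # Each row is one download, so counting package names counts downloads.
        counts.update(row["package"] for row in reader)
    return counts.most_common(n)


if __name__ == "__main__":
    for name, count in top_packages([JAN_01, JAN_02], n=3):
        print(f"{name}: {count}")
```

Run over the January through May files with the default `n=100`, this would produce the kind of ranking the post describes; `Counter.most_common` takes care of the sorting.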
|
Data analysis is only half the battle; getting the data into a Hadoop cluster is the first step in any Big Data deployment. Apache Flume uses an elegant design to make data loading easy and efficient.
|
Sometimes I just want to quickly make a simple D3 JavaScript directed network graph with data in R. Because D3 network graphs can be manipulated in the browser–i.e. nodes can be moved aroun...
|
MongoDB is a document-based NoSQL database. Unlike relational databases, which store data in tables with rigid schemas, MongoDB stores data in documents with dynamic schemas.
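To make "dynamic schema" concrete, here is a small Python sketch that keeps three documents of different shapes in one simulated collection: a plain list of dicts, so it runs without a MongoDB server. With the pymongo driver, dicts like these could be passed to a collection's `insert_many` as-is. The `find` and `with_field` helpers are hypothetical, simplified stand-ins, not MongoDB's actual query semantics.

```python
# Three documents in one simulated collection, each with a different shape.
users = [
    {"_id": 1, "name": "Ana", "email": "ana@example.com"},
    {"_id": 2, "name": "Ben", "tags": ["admin", "beta"]},
    {"_id": 3, "name": "Chen", "address": {"city": "Seoul"}},
]


def find(collection, query):
    """Simplified equality filter: every key in query must be present and equal."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


def with_field(collection, field):
    """Return the documents that carry the given top-level field at all."""
    return [doc for doc in collection if field in doc]


# Adding a field to one document needs no table-wide schema change.
users[0]["phone"] = "555-0100"
```

In a relational table, the `tags`, `address`, and `phone` columns would have to be declared up front (or added with a migration) for every row; here each document simply carries the fields it has.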
|
By Mike O’Brien, 10gen software engineer and maintainer of Mongo-Hadoop. With the release of MongoDB 2.4, it’s now pretty simple to take an existing application that already uses MongoDB and add new...
|
LivePerson Developers hosts Byron Ellis, Chief Data Scientist at LivePerson (@fdaapproved). In this meetup, Byron will demonstrate a real-time dashboard for strea...
Via AnalyticsInnovations
|
You’re walking home alone on a quiet street. You hear footsteps approaching quickly from behind. It’s nighttime. Your senses scramble to help your brain figure out what to do. You listen for signs of threat or glance backward.
|
Image processing is a computational task that lends itself very well to GPU compute scenarios. In many cases the most commonly used algorithms are inherently massively parallel, with each pixel in the image being processed independently from the others. As a result, image processing toolkits have been early adopters of the new GPGPU programming model.
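A small Python sketch of why per-pixel work parallelizes so well. The grayscale kernel (using the common BT.601 luma weights, chosen here only as an example) reads nothing but its own input pixel, so the result is identical no matter what order the pixels are visited in; on a GPU, each `(y, x)` could be handed to its own thread. The helper names are hypothetical.

```python
import random


def to_gray(pixel):
    """Per-pixel kernel: luma from an (r, g, b) tuple using BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def apply_kernel(image, kernel, order=None):
    """Apply kernel to every pixel of a 2-D image given as a list of rows.

    Each output pixel depends only on the matching input pixel, so the
    visiting order does not matter; `order` can permute the coordinates.
    """
    height, width = len(image), len(image[0])
    coords = [(y, x) for y in range(height) for x in range(width)]
    if order is not None:
        coords = order(coords)
    out = [[None] * width for _ in range(height)]
    for y, x in coords:
        out[y][x] = kernel(image[y][x])
    return out


def shuffled(coords):
    """Visit the pixels in a random order to show that order is irrelevant."""
    coords = list(coords)
    random.shuffle(coords)
    return coords


image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
```

Because no iteration reads another iteration's output, the loop body is exactly the kind of independent work that maps onto one GPU thread per pixel; filters that read a neighborhood (blurs, convolutions) are still independent per output pixel as long as they write to a separate output buffer.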
|
Nina Zumel and I (John Mount) have been working very hard on an exciting new book called “Practical Data Science with R.” The book has now entered Manning Early Access Progr...
|
Clang 3.3 includes full C++11 support, as well as a suite of run-time checkers to help find bugs in your programs. For more information, see the release notes for LLVM and Clang.
|
I have updated my package solaR. This package provides methods for calculating solar radiation and the performance of photovoltaic systems from …
|
yhat blog: how to use scikit-learn to classify images based on their content
|
Installed as a layer above Hadoop, the open-source Pydoop package enables Python scripts to do big data work easily.
|
Building R packages is not particularly hard, but it can be a bit of a daunting endeavour at the beginning, especially if you are more of a statistician than a computer scientist or programmer. Som...
|
Apache CloudStack is open source software designed to deploy and manage large networks of virtual machines, as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide an on-premises (private) cloud offering, or as part of a hybrid cloud solution. CloudStack is a turnkey solution that includes the entire "stack" of features most organizations want with an IaaS cloud: compute orchestration, Network-as-a-Service, user and account management, a full and open native API, resource accounting, and a first-class User Interface (UI).
|
Because R keeps its data in the computer's RAM, it can handle only relatively small data sets. Nevertheless, some packages allow R to process larger volumes, and the best solution is to connect R to a Big Data environment.
|
The flow graph feature available in Intel® Threading Building Blocks (Intel® TBB) allows users to easily create both dependence graphs and reactive, message-passing graphs that execute on top of Intel TBB tasks. Users programmatically create nodes and edges that express the computations performed by their application and the dependencies between these computations.
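The dependence-graph idea can be sketched in Python with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is not the TBB API (in C++ that role is played by `tbb::flow::graph`, with nodes connected through `tbb::flow::make_edge`); `run_graph` and the example nodes are hypothetical. Each node runs only after all of its predecessors have produced results, and nodes that become ready together are submitted to a thread pool at the same time.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_graph(bodies, edges, max_workers=4):
    """Run each node body once all of its predecessors have finished.

    bodies: maps a node name to a callable that takes a dict of
            {predecessor name: predecessor result}.
    edges:  (src, dst) pairs meaning "dst depends on src".
    """
    preds = {name: set() for name in bodies}
    for src, dst in edges:
        preds[dst].add(src)

    sorter = TopologicalSorter(preds)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every node whose predecessors are done may run concurrently.
            futures = {
                name: pool.submit(bodies[name], {p: results[p] for p in preds[name]})
                for name in sorter.get_ready()
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


bodies = {
    "load": lambda inputs: 3,
    "square": lambda inputs: inputs["load"] ** 2,
    "double": lambda inputs: inputs["load"] * 2,
    "sum": lambda inputs: inputs["square"] + inputs["double"],
}
edges = [("load", "square"), ("load", "double"), ("square", "sum"), ("double", "sum")]

if __name__ == "__main__":
    print(run_graph(bodies, edges))
```

Here `square` and `double` depend only on `load`, so they become ready together and may run in parallel, while `sum` waits for both. Waiting for a whole batch before asking for the next one is a simplification; a real task scheduler such as TBB's would release each successor as soon as its own predecessors finish.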
|
Lately I’ve seen quite a few requests for advice from younger programmers, asking me directly or in public forums about a career decision...
|
The idea: There has been growing interest in using MongoDB as an in-memory database, meaning that the data is not stored on disk at all. This can be super useful for applications like: • a...
|
What data structure is more sacred than the linked list? If we get rid of it, what silly interview...
|
The R core group has quickly followed up with a patch to R version 3.