Public Datasets - Open Data -
Rescooped by luiy from Digital Relationship and Value Generation

7 ways #BigData could revolutionize life | #health #hadoop #algorithms

7 ways Big Data could revolutionize life by 2020 #infographic

Via C. CHAMBET-FALQUET
Scooped by luiy

Apache #Mahout: Scalable #MachineLearning and #DataMining | #bigdata

luiy's insight:
Mahout currently has:
- Collaborative Filtering
- User- and item-based recommenders
- K-Means and Fuzzy K-Means clustering
- Mean Shift clustering
- Dirichlet process clustering
- Latent Dirichlet Allocation
- Singular value decomposition
- Parallel Frequent Pattern mining
- Complementary Naive Bayes classifier
- Random forest decision-tree-based classifier
- High-performance Java collections (previously Colt collections)
- A vibrant community
and much more cool stuff to come by this summer, thanks to Google Summer of Code.
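To give a feel for the first items on that list, here is a minimal, single-machine sketch of user-based collaborative filtering in plain Python. It is not Mahout's API (Mahout itself is a Java library built for Hadoop); the functions `cosine` and `recommend` and the sample ratings are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical data shape: user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: str, k: int = 2) -> List[Tuple[str, float]]:
    """Score unseen items by a similarity-weighted average of the k nearest users' ratings."""
    target = ratings[user]
    # Rank every other user by similarity to the target and keep the top k.
    neighbours = sorted(
        ((cosine(target, r), other) for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in ratings[other].items():
            if item in target:
                continue  # only recommend items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(item, scores[item] / weights[item]) for item in scores]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With `{"alice": {"hadoop": 5.0, "hive": 3.0}, "bob": {"hadoop": 4.0, "hive": 3.0, "mahout": 5.0}, "carol": {"hadoop": 1.0, "pig": 4.0}}`, calling `recommend(ratings, "alice")` ranks "mahout" ahead of "pig". Mahout's distributed recommenders address the same problem at a scale where the neighbour search and scoring can no longer run on one machine.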
Scooped by luiy

X-RIME: #Hadoop based large scale social network analysis | #bigdata #SNA

luiy's insight:

Today's Internet-based social network sites have huge user communities. They hold large amounts of data about their users and want to turn that data into a core competency. A key enabler for this is a cost-efficient solution for social data management and social network analysis (SNA).

Such a solution faces a few challenges. The most important is that it must be able to handle massive, heterogeneous data sets. Traditional data-warehouse-based solutions are usually not cost-efficient enough for this, while existing SNA tools mostly run on a single workstation and do not scale. Low-cost, highly scalable data management and processing technologies from the cloud computing community should therefore be brought in to help.

However, most existing cloud-based data analysis solutions provide SQL-like, general-purpose query languages and do not directly support social network analysis, which makes them hard to optimize and hard to use for SNA users. We came up with X-RIME to fill this gap.

Briefly, X-RIME aims to provide a few value-added layers on top of existing cloud infrastructure to support smart decision loops based on massive data sets and SNA. To end users, X-RIME is a library of MapReduce programs for raw-data pre-processing, transformation, calculation of SNA metrics and structures, and graph/network visualization. The library can be integrated with other Hadoop-based data warehouses (e.g., Hive) to build more comprehensive solutions.
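As an illustration of what "a library of MapReduce programs computing SNA metrics" means, the sketch below walks one simple metric, vertex degree over an undirected edge list, through the map, shuffle and reduce phases in a single Python process. It is not X-RIME code (X-RIME runs as Hadoop jobs); the function names and the edge list are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[str, str]


def map_phase(edges: Iterable[Edge]) -> Iterator[Tuple[str, int]]:
    """Map: emit (vertex, 1) for each endpoint of every undirected edge."""
    for u, v in edges:
        yield u, 1
        yield v, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the emitted counts for each vertex to obtain its degree."""
    return {vertex: sum(counts) for vertex, counts in groups.items()}


def degree(edges: Iterable[Edge]) -> Dict[str, int]:
    """Chain the three phases; on a cluster each phase would run in parallel."""
    return reduce_phase(shuffle(map_phase(edges)))
```

For the edges `[("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]`, `degree` returns `{"a": 2, "b": 2, "c": 3, "d": 1}`. Because the map step looks at each edge independently and the reduce step only needs the values grouped under one vertex, the same logic splits naturally across many machines, which is what makes MapReduce a fit for large social graphs.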

Rescooped by luiy from Big Data Technology, Semantics and Analytics

Updated Database Landscape map – June 2013 — Too much information


Via Tony Agresta
Tony Agresta's curator insight, June 10, 2013 12:47 PM


MarkLogic is uniquely positioned on this database landscape map. Here's what makes its position so different from that of other vendors:


1.  Search - MarkLogic is directly connected to all the major enterprise search vendors, and Gartner recently recognized this in its Enterprise Search Magic Quadrant. Notice that no other NoSQL technology is anywhere close to this connection point.


2.  General Purpose - MarkLogic provides an enterprise NoSQL database, Application Services and Search. With support for many development languages and with REST and Java APIs, MarkLogic has clear links to SAP, EnterpriseDB and a host of other database providers.


3.   Graph and Document - MarkLogic has long been recognized as a document store and is used widely all over the world for this purpose. Notice also the subtle connection to Graph, which links MarkLogic to other vendors in this space such as Neo4j. MarkLogic 7 promises to deliver a world-class triple store that indexes subjects, predicates and objects in XML documents, or loads other triples through the MarkLogic Content Pump. For the first time, the only enterprise NoSQL technology with search will include semantics. Updated APIs and support for SPARQL are part of this release.


4.  Big Tables - MarkLogic's ability to handle big data has long been known. The olive green line is designated for vendors like MarkLogic, Cassandra, HBase, etc. MarkLogic's partnership with Intel on the Intel Distribution for Apache Hadoop, and the fact that MarkLogic ships with Hadoop connectors, further confirm this position.


5.   Key Value Stores - Data can be stored by key, without the database schema that relational databases require. In MarkLogic's case, huge quantities of data can be indexed in real time, with data held in memory and on disk, making search results instant and complete. In a recent analysis of more than 50 MarkLogic customers, the ability to get up and running quickly and deliver new information products to market was a business driver they mentioned again and again.
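Item 3 mentions indexing subjects, predicates and objects as triples and querying them with SPARQL. As a rough illustration of the idea only, the sketch below is a toy in-memory triple index in Python, not MarkLogic's API or storage engine. It answers a single SPARQL-style triple pattern, with `None` standing in for an unbound variable, by intersecting per-position indexes. The class `TripleIndex` and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleIndex:
    """Toy triple store with one index per position (subject, predicate, object)."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        # index[pos][term] -> ids of triples whose term at position pos equals term
        # (pos 0 = subject, 1 = predicate, 2 = object)
        self.index: List[Dict[str, Set[int]]] = [defaultdict(set) for _ in range(3)]

    def add(self, s: str, p: str, o: str) -> None:
        """Store a triple and record its id under each of its three terms."""
        tid = len(self.triples)
        self.triples.append((s, p, o))
        for pos, term in enumerate((s, p, o)):
            self.index[pos][term].add(tid)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard variable."""
        candidates: Optional[Set[int]] = None
        for pos, term in enumerate((s, p, o)):
            if term is None:
                continue  # unbound position: no constraint
            ids = self.index[pos].get(term, set())
            candidates = ids if candidates is None else candidates & ids
        if candidates is None:  # fully unbound pattern matches every triple
            candidates = set(range(len(self.triples)))
        return [self.triples[i] for i in sorted(candidates)]
```

After adding `("MarkLogic", "isA", "Database")`, `("Neo4j", "isA", "Database")` and `("MarkLogic", "supports", "SPARQL")`, the pattern `match(p="isA", o="Database")` plays the role of the SPARQL pattern `?x :isA :Database` and returns the first two triples. A production triple store adds persistent indexes, joins across several patterns, and a full query language on top of this basic lookup.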


The fact is, no one else on the list has all of these qualities. Because of this unique position, MarkLogic sits visually apart from the other clusters and the long lists of technology vendors.


To learn more, you can go to MarkLogic Resources.






Henry Pan's curator insight, June 11, 2013 10:54 AM

Cool