Bits 'n Pieces on Big Data R&D
Information and insight into Big Data R&D
Curated by onur savas
Scooped by onur savas

Scientific method: Statistical errors

P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.
Rescooped by onur savas from Big Data and NoSQL Daily

Apache Spark for Big Analytics


Via Simon Hunanyan
Simon Hunanyan's curator insight, December 23, 2013 7:09 PM

Spark, an Apache Incubator project, is an open source distributed computing framework for advanced analytics on Hadoop. It is reported to run up to 100X faster than MapReduce. Spark includes a machine learning library (MLlib), a graph engine (GraphX), a streaming analytics engine (Spark Streaming) and more.

Currently, Spark supports programming interfaces for Scala, Java and Python. An R interface is under development and is expected to be released in the first half of 2014.

Rescooped by onur savas from Papers

Who is Dating Whom: Characterizing User Behaviors of a Large Online Dating Site

Online dating sites have become popular platforms for people to look for potential romantic partners. It is important to understand users' dating preferences in order to make better recommendations on potential dates. The message sending and replying actions of a user are strong indicators for what he/she is looking for in a potential date and reflect the user's actual dating preferences. We study how users' online dating behaviors correlate with various user attributes using a large real-world dataset from a major online dating site in China. Many of our results on user messaging behavior align with notions in social and evolutionary psychology: males tend to look for younger females while females put more emphasis on the socioeconomic status (e.g., income, education level) of a potential date. In addition, we observe that the geographic distance between two users and the photo count of users play an important role in their dating behaviors. Our results show that it is important to differentiate between users' true preferences and random selection. Some user behaviors in choosing attributes in a potential date may largely be a result of random selection. We also find that both males and females are more likely to reply to users whose attributes come closest to the stated preferences of the receivers, and there is significant discrepancy between a user's stated dating preference and his/her actual online dating behavior. These results can provide valuable guidelines to the design of a recommendation engine for potential dates.
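
The abstract's distinction between true preference and random selection can be illustrated with a small sketch. This is not the authors' method. It is a minimal baseline comparison written for illustration, and the function name `selection_ratio`, the attribute values, and the toy data are all hypothetical. The idea is to compare how often an attribute value appears among the people a user messaged with how often it appears in the candidate pool; a ratio near 1.0 is what picking at random would produce.

```python
from collections import Counter


def selection_ratio(messaged, pool):
    """Share of each attribute value among messaged users divided by its
    share in the candidate pool.

    A ratio near 1.0 is consistent with random selection; values well
    above or below 1.0 suggest a preference or an avoidance.
    """
    messaged_counts = Counter(messaged)
    pool_counts = Counter(pool)
    ratios = {}
    for value, pool_n in pool_counts.items():
        pool_share = pool_n / len(pool)
        messaged_share = messaged_counts.get(value, 0) / len(messaged)
        ratios[value] = messaged_share / pool_share
    return ratios


# Hypothetical toy data: education level of every candidate in the pool.
pool = ["college"] * 60 + ["high_school"] * 40

# User A messaged mostly college graduates, but only in proportion to the pool.
ratios_a = selection_ratio(["college"] * 6 + ["high_school"] * 4, pool)

# User B messaged college graduates far more often than the pool would predict.
ratios_b = selection_ratio(["college"] * 9 + ["high_school"] * 1, pool)

print(ratios_a)
print(ratios_b)
```

In this toy setup, user A's ratios sit at 1.0 for both values even though most of A's messages went to college graduates, because the pool itself is mostly college graduates. User B's ratios depart from 1.0, which is the kind of signal that would point to an actual preference.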

Who is Dating Whom: Characterizing User Behaviors of a Large Online Dating Site
Peng Xia, Kun Tu, Bruno Ribeiro, Hua Jiang, Xiaodong Wang, Cindy Chen, Benyuan Liu, Don Towsley

http://arxiv.org/abs/1401.5710


Via Complexity Digest