Public Datasets - Open Data -
Scooped by luiy

Collecting #Twitter Data: Storing Tweets in #MongoDB | #bigdata #NoSQL

luiy's insight:

In the first three sections of the Twitter data collection tutorial, I demonstrated how to collect tweets using both R and Python and how to store them, first as JSON files and then by having R parse them into a .csv file. The .csv file works well, but tweets don't always make good flat .csv files, since not every tweet contains the same fields or the same structure. Some of the data is deeply nested within the JSON object. It is possible to write a parser with a field for each possible subfield, but that would take a while to write and would produce a rather large .csv file or SQL database.
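As a minimal sketch of where this leads, the tweets can go straight into MongoDB with their nesting intact. This assumes the pymongo client and a local MongoDB server; the file, database, and collection names are placeholders, while `id_str` is the tweet's own Twitter ID field:

```python
import json

def tweet_to_document(raw_json):
    """Parse one tweet (a raw JSON string) into a dict ready for MongoDB."""
    tweet = json.loads(raw_json)
    # Keep the nested structure as-is; set _id from Twitter's id_str so
    # re-running the loader upserts instead of duplicating tweets.
    tweet["_id"] = tweet["id_str"]
    return tweet

def load_tweets(path, collection):
    """Load a file with one JSON tweet per line into a MongoDB collection."""
    with open(path) as f:
        for line in f:
            if line.strip():
                doc = tweet_to_document(line)
                collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)

def main():
    # Assumes MongoDB running on localhost and pymongo installed;
    # "tweets.json" stands in for the file collected earlier in the series.
    from pymongo import MongoClient
    client = MongoClient("localhost", 27017)
    load_tweets("tweets.json", client.twitter.tweets)
```

Because each document keeps its original shape, nested fields such as `user.screen_name` stay queryable without any flattening step.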

Rescooped by luiy from Big Data Technology, Semantics and Analytics

Visa Says Big Data Identifies Billions of Dollars in Fraud

Visa’s chief enterprise risk officer, Ellen Richey, says “you see the criminal capability evolving on the technology side.” She gives CIO Journal an inside look at how the company has used Big Data to make its network more secure...

Via Tony Agresta
luiy's insight:

“From the strategic point of view, we are achieving an amazing improvement, year over year, in our ability to detect fraud,” says Richey. “It’s not just our ability to analyze our transactions, but our ability to add new kinds of data, such as geo-location, to that analysis. With every new type of data, we increase the accuracy of our models. And from a strategic point of view we can think about taking an additional step change of fraud out of our system.”

In the future, Big Data will play a bigger role in authenticating users, reducing the need for the system to ask users for multiple proofs of their identity, according to Richey, and 90% or more of transactions will be processed without asking customers those extra questions, because algorithms that analyze their behavior and the context of the transaction will dispel doubts. “Data and authentication will come together,” Richey said.

The data-driven improvement in security accomplishes two strategic goals at once, according to Richey. It improves security itself, and it increases trust in the brand, which is critical for the growth and well-being of the business, because consumers won’t put up with a lot of credit-card fraud. “To my mind, that is the importance of the security improvements we are seeing,” she said. “Our investments in data and analysis are baseline to our ability to thrive and grow as a company.”

Tony Agresta's curator insight, April 25, 2013 2:21 PM



The approach Visa takes to identifying fraud is grounded in 16 different predictive models and allows new independent variables to be added to the models. This improves accuracy while allowing the models to be kept up to date. Here's an excerpt from the WSJ article:

 

"The new analytic engine can study as many as 500 aspects of a transaction at once. That’s a sharp improvement from 2005, when the company’s previous analytic engine could study only 40 aspects at once. And instead of using just one analytic model, as it did in 2005, Visa now operates 16 models, covering different segments of its market, such as geographic regions."
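As a toy illustration of that excerpt (not Visa's actual system): one scoring model per market segment, each weighting many aspects of a transaction. The segment names, aspect names, and weights below are all invented:

```python
# One hypothetical model per market segment; a real engine would cover
# hundreds of aspects and many more segments than these two.
REGION_MODELS = {
    "EU": {"amount_zscore": 0.6, "geo_mismatch": 1.4, "night_hour": 0.3},
    "US": {"amount_zscore": 0.8, "geo_mismatch": 1.1, "night_hour": 0.2},
}

def fraud_score(txn):
    """Route the transaction to its segment's model and sum weighted aspects."""
    model = REGION_MODELS[txn["region"]]
    # Aspects absent from the transaction contribute zero.
    return sum(weight * txn["aspects"].get(name, 0.0)
               for name, weight in model.items())

txn = {"region": "EU", "aspects": {"amount_zscore": 2.0, "geo_mismatch": 1.0}}
score = fraud_score(txn)  # 0.6*2.0 + 1.4*1.0 = 2.6
```

Adding a new independent variable (say, geo-location) is then just a new entry in each segment's weight table, which is one way to read the accuracy-improvement point above.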

 

The article also states that the analytics engine holds the card number but not the personal information about the transaction, which is likely stored in a different system. I wonder if Visa, at some point in the process, also takes the fraud transactions and analyzes them visually to identify connections and linkages based on address, other geographic identifiers, third-party data, employer data and more. Are two or more of the fraud cases in some way connected? Does this represent a ring of activity presenting higher risk to merchants, customers and Visa?

 

The tools on the market to do this work are expanding. The data used to analyze this activity (including unstructured data) is being stored in databases that allow for the visual analysis of big data. Graph databases, replete with underlying intelligence extracted from text that identifies people, places and events, can be used to extend the type of analysis Visa is doing and to prioritize investigations. Through more efficient allocation of investigative resources, fraud prevention can jump to a higher level.
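As a rough sketch of that ring-detection idea (plain Python standing in for a graph database, with invented records): link flagged cases that share an identifier, then surface connected groups as candidate rings.

```python
from collections import defaultdict

# Invented fraud cases; each carries identifiers that might link it to others.
fraud_cases = [
    {"id": "t1", "address": "12 Oak St",  "employer": "Acme"},
    {"id": "t2", "address": "12 Oak St",  "employer": "Globex"},
    {"id": "t3", "address": "9 Elm Rd",   "employer": "Globex"},
    {"id": "t4", "address": "1 Pine Ave", "employer": "Initech"},
]

# Union-find: cases sharing any identifier end up in the same group.
parent = {c["id"]: c["id"] for c in fraud_cases}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for key in ("address", "employer"):
    first_seen = {}
    for case in fraud_cases:
        value = case[key]
        if value in first_seen:
            union(first_seen[value], case["id"])
        else:
            first_seen[value] = case["id"]

groups = defaultdict(set)
for case in fraud_cases:
    groups[find(case["id"])].add(case["id"])

# Groups of two or more linked cases are candidate rings for investigators.
candidate_rings = [ids for ids in groups.values() if len(ids) > 1]
# t1-t2 share an address and t2-t3 an employer, so {t1, t2, t3} surfaces
# as one candidate ring, while t4 stays isolated.
```

A graph database does the same kind of traversal over far larger, richer data, but the principle of prioritizing connected clusters of cases is the same.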


Rescooped by luiy from Big Data Technology, Semantics and Analytics

Updated Database Landscape map – June 2013 — Too much information


Via Tony Agresta
Tony Agresta's curator insight, June 10, 2013 12:47 PM


MarkLogic is uniquely positioned on this database landscape map. Here's what makes its position very different from other vendors:


1. Search - MarkLogic is directly connected to all the major enterprise search vendors. Gartner recently confirmed this in its Enterprise Search Magic Quadrant. Notice that no other NoSQL technology is anywhere near this connection point.


2. General Purpose - MarkLogic provides an enterprise NoSQL database along with Application Services and Search. With support for many development languages and both REST and Java APIs, MarkLogic has clear links to SAP, EnterpriseDB and a host of other database providers.


3. Graph and Document - MarkLogic has long been recognized as a document store and is used widely around the world for this purpose. Notice the subtle connection to Graph as well, linking MarkLogic to other vendors in this space such as Neo4j. MarkLogic 7 promises to deliver a world-class triple store that indexes subjects, predicates and objects in XML documents, or loads other triples through the MarkLogic Content Pump. For the first time, the only Enterprise NoSQL technology with search will include semantics. Updated APIs and support for SPARQL are part of this release.


4. Big Tables - MarkLogic's ability to handle big data has long been known. The olive green line is designated for vendors like MarkLogic, Cassandra, HBase, etc. MarkLogic's partnership with Intel on its Distribution for Apache Hadoop, and the fact that MarkLogic ships with Hadoop connectors, provide additional confirmation of this position.


5. Key-Value Stores - Data can be stored as keys without the database schema required by relational databases. In MarkLogic's case, huge quantities of data can be indexed in real time, with data stored both in memory and on disk, making search results instant and complete. In a recent analysis of more than 50 MarkLogic customers, the ability to get up and running quickly and bring new information products to market was a business driver they mentioned over and over again.
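As a toy sketch of the triple-store idea from point 3 (plain Python, invented facts): store (subject, predicate, object) triples and match patterns against them, which is roughly what a SPARQL query does against an engine like MarkLogic 7 at scale.

```python
# Invented facts stored as (subject, predicate, object) triples.
triples = {
    ("MarkLogic", "isA", "Database"),
    ("MarkLogic", "supports", "SPARQL"),
    ("Neo4j", "isA", "Database"),
    ("Neo4j", "isA", "GraphStore"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a SPARQL variable."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Roughly "SELECT ?s WHERE { ?s :isA :Database }":
databases = {s for s, _, _ in match(p="isA", o="Database")}
```

A real triple store indexes all three positions so these pattern lookups stay fast over billions of triples; this linear scan only illustrates the data model.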


The fact is, no one else on the list has all of these qualities. Because of this unique position, you see MarkLogic visually distanced from the other clusters and long lists of technology vendors.


To learn more, you can go to MarkLogic Resources.






Henry Pan's curator insight, June 11, 2013 10:54 AM

Cool