Hadoop_Technology
Scooped by Miyul Park

The New Data Engineering Ecosystem: Trends and Rising Stars

Miyul Park's insight:

http://insightdataengineering.com/blog/pipeline_map.html

 

Network visualization in R with the igraph package

In this post I showed a visualization of the organizational network of my department. Since several people asked for details on how the plot was produced, I will provide the code and some extensions below. The plot was made entirely in R (2.14.01) with the help of the igraph package. It is a great…

An Example of Social Network Analysis with R using Package igraph

Miyul Park's insight:

Social Network Analysis in R

Here is Something !: Hive Partitioning

This post explains Hive partitioning, both static and dynamic. It addresses how data can be loaded into Hive whether the records reside in a single file or in different folders. It also contains tips for inserting data as a whole into different partitions.
Miyul Park's insight:

HQL for partitioning a table in Hive

MapReduce On Hive Tables Using HCatalog - DZone Big Data

In my last post, Introduction To Hive's Partitioning, I described how we can load CSV data into a partitioned Hive table. Today we shall see how we can use HCatalog...
Miyul Park's insight:

Through HCatalog, MapReduce can access data in Hive

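How to Build a Big Data Supply Chain

As big data grows, managing and analyzing it to obtain actionable business insight becomes harder. This is somewhat ironic, given that the main advantage of big data is that it enables better business decisions based on computation-oriented analysis of massive data sets.

The solution is to identify the business goals from the outset, build the supply chain around them, and deploy the agile infrastructure needed to achieve those goals.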

Paper: Parallel Graph Partitioning for Complex Networks

Authored by a team from Karlsruhe Institute of Technology, the paper “Parallel graph partitioning for complex networks” presents a parallelized and adapting label propagation technique for...

How to Understand Keywords in Searcher Context

Is it possible for SEO professionals to understand searcher context based purely on keyword research data?

Scientists may soon know when you're lying on Twitter

Here's a word of warning to any would-be Orson Welles who hopes to use social media to create War of the Worlds-like panic. A social media lie detector is in the works. A multi-national team of sci...
Miyul Park's insight:

"Pheme" will soon be released as open source

Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves - Nick's Blog

Miyul Park's insight:
How to index geographic data using geohashes, quadtrees, and Hilbert curves
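
Of the techniques named above, the Hilbert curve is perhaps the easiest to sketch in a few lines. The following is a minimal Python illustration, not code from the linked post: the function name `hilbert_index` and the restriction to a square grid whose side `n` is a power of two are assumptions made for this example. It maps a grid cell (x, y) to its position along the curve, so that cells that are consecutive on the curve are also neighbors on the grid.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve.

    Assumes an n-by-n grid where n is a power of two and
    0 <= x, y < n (an assumption of this sketch).
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current square does the cell fall in?
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-square is traversed in the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical usage: order the cells of a 4x4 grid along the curve.
cells = [(x, y) for x in range(4) for y in range(4)]
cells_in_curve_order = sorted(cells, key=lambda c: hilbert_index(4, *c))
```

Sorting points by such an index tends to store spatial neighbors near each other, which is what makes a one-dimensional index usable for two-dimensional data.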

Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance

The New York City Taxi & Limousine Commission has released a staggeringly detailed historical dataset covering over 1.1 billion individual taxi trips …

Applied Spatial Data Science with R

Introduction: I recently started working on my Ph.D. dissertation, which uses a large number of different spatial data types. During the process, I discovered that there were a lot of concepts about using...

A Summary of Outlier Detection Methods in R

/* http://sosal.kr/ made by so_Sal */ Outliers: in statistics, a value observed in a data sample is called an outlier when it lies far from the other observations. This may be due to variability in the measurements, or it may be an error caused by a genuinely flawed experiment. In the latter case, the outlier should clearly be removed before the data is analyzed. In this post, algorithms for detecting outliers, using R packages..

Tip: Using Joins in Hive - Safari Blog

Hive, like other SQL databases, allows users to join various tables. Joins, however, can be computationally expensive, especially on big tables, and that's
Miyul Park's insight:

Tips For Joins in Hive

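1) Largest Table Last

 Hive streams the last table in a join and buffers the others; to stream a different table, use the hint /*+ STREAMTABLE(t1) */

2) Make use of map joins where possible

 When one table (here t1) is small enough to fit in memory, use the hint /*+ MAPJOIN(t1) */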

Why Topological Data Analysis? | Ayasdi Blog

Topology within mathematics can be characterized as that part of the subject which studies notions of shape. It really consists of at least two separate threads: one in which one attempts to “measure” shape, and another in which one attempts to find compressed combinatorial representations of shape and analyze the degree to which these representations are faithful to the shape.

On Graph Computing | Java Code Geeks

The concept of a graph has been around since the dawn of mechanical computing and for many decades prior in the domain of pure mathematics. Due in large part to this golden age of databases, graphs are becoming increasingly popular in software engineering. Graph databases provide a way to persist and process graph data. However, the graph database is not the only way in which graphs can be stored and analyzed. Graph computing has a history prior to the use of graph databases and has a future that is not necessarily entangled with typical database concerns. There are numerous graph technologies that each have their respective benefits and drawbacks. Leveraging the right technology at the right time is required for effective graph computing.

Structure: Modeling Real-World Scenarios with Graphs

A graph (or network) is a data structure. It is composed of vertices (dots) and edges (lines). Many real-world scenarios can be modeled as a graph. This is not necessarily inherent to some
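
The vertices-and-edges description above can be made concrete with a small sketch. This is an illustrative Python example, not taken from the linked article: the vertex names and the helper `build_adjacency` are hypothetical, and an adjacency list is only one of several possible in-memory representations of a graph.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (vertex, vertex) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical example data: each pair is an edge between two vertices.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
graph = build_adjacency(edges)

# The degree of a vertex is the number of edges touching it.
degrees = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
```

Here `graph["carol"]` holds the three vertices adjacent to `carol`, and `degrees` gives a first, very simple structural measure for each vertex.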

Using RCFile, SequenceFile, or Text Files

Enabling Compression for RCFile and SequenceFile Tables

If your table is partitioned, you also have to set the Hive options for dynamic partitioning before inserting into it.

 

hive> create table TBL_SEQ (int_col int, string_col string) partitioned by (year int) stored as SEQUENCEFILE;
hive> SET hive.exec.compress.output=true;
hive> SET mapred.max.split.size=256000000;
hive> SET mapred.output.compression.type=BLOCK;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
hive> SET hive.exec.dynamic.partition.mode=nonstrict;
hive> SET hive.exec.dynamic.partition=true;
hive> insert overwrite table TBL_SEQ partition(year) select * from TBL;

The 5 Most Advanced Search Engines On The Web

Search engines are internet encyclopedias that allow us to find and filter out relevant information. With any given search engine, it takes some skill to find exactly what you are looking for. You must understand how the search engine works and how your search queries are interpreted. More advanced search engines will meet you halfway,…