In this post I showed a visualization of the organizational network of my department. Since several people asked for details on how the plot was produced, I will provide the code and some extensions below. The plot was made entirely in R (2.14.01) with the help of the igraph package. It is a great…
This post explains Hive partitioning, both static and dynamic. It addresses how data can be stored in Hive when the records reside in a single file or in different folders. It also contains tips for inserting data as a whole into different partitions.
Applied Spatial Data Science with R. Introduction: I recently started working on my Ph.D. dissertation, which uses a large number of different spatial data types. During the process, I discovered that there were a lot of concepts about using...
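/* * http://sosal.kr/ * made by so_Sal */ - Outliers: In statistics, a value observed in a data sample is called an outlier when it lies at a distance from the other observations. This may be due to the variability of the data being measured, or it may be an actual error caused by a faulty experiment. In the latter case, the outlier clearly has to be removed before the data is analyzed. This post takes algorithms for detecting outliers and, with R packages,..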
Topology within mathematics can be characterized as that part of the subject which studies notions of shape. It really consists of at least two separate threads: one in which one attempts to “measure” shape, and another in which one attempts to find compressed combinatorial representations of shape and analyze the degree to which these representations are faithful to the shape.
The concept of a graph has been around since the dawn of mechanical computing and for many decades prior in the domain of pure mathematics. Due in large part to this golden age of databases, graphs are becoming increasingly popular in software engineering. Graph databases provide a way to persist and process graph data. However, the graph database is not the only way in which graphs can be stored and analyzed. Graph computing has a history prior to the use of graph databases and has a future that is not necessarily entangled with typical database concerns. There are numerous graph technologies that each have their respective benefits and drawbacks. Leveraging the right technology at the right time is required for effective graph computing.

Structure: Modeling Real-World Scenarios with Graphs

A graph (or network) is a data structure. It is composed of vertices (dots) and edges (lines). Many real-world scenarios can be modeled as a graph. This is not necessarily inherent to some
Enabling Compression for RCFile and SequenceFile Tables
If your table has partitions, you also have to set the Hive dynamic-partition options before inserting into it.
hive> create table TBL_SEQ (int_col int, string_col string) partitioned by (year int) stored as SEQUENCEFILE;
hive> SET hive.exec.compress.output=true;
hive> SET mapred.max.split.size=256000000;
hive> SET mapred.output.compression.type=BLOCK;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
hive> SET hive.exec.dynamic.partition.mode=nonstrict;
hive> SET hive.exec.dynamic.partition=true;
hive> insert overwrite table TBL_SEQ partition(year) select * from TBL;
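For contrast with the dynamic-partition insert above, here is a minimal sketch of a static-partition insert into the same table. It assumes that the source table TBL has columns named int_col, string_col and year (the original only shows select *), and the value 2012 is purely illustrative. Because the partition value is written out literally, the two hive.exec.dynamic.partition settings should not be needed for this form.

```sql
-- Static partition: the target partition is named explicitly,
-- so the SELECT list must not include the partition column.
-- Column names and the year value are assumptions for illustration.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE TBL_SEQ PARTITION (year=2012)
SELECT int_col, string_col
FROM TBL
WHERE year = 2012;
```

The static form suits loading one known partition at a time. The dynamic form lets Hive derive the partition from the last column of the SELECT, which is why it needs the extra options.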
Search engines are internet encyclopedias that allow us to find and filter out relevant information. With any given search engine, it takes some skill to find exactly what you are looking for. You must understand how the search engine works and how your search queries are interpreted. More advanced search engines will meet you halfway,…