Innovation
Scooped by Jörn Franke

Collaborative Data Science: About Storing, Reusing, Composing and Deploying Machine Learning Models

Why is this important? Machine learning has re-emerged in recent years as new Big Data platforms make it possible to train models on more data, to make them more complex, and to combine several models for an even more intelligent predictive/prescriptive analysis. This requires storing as well as exchanging machine learning models to enable…

Big Data Analytics on Excel files using Hadoop/Hive/Flink/Spark –

Today we have released HadoopOffice v1.1.0 with major enhancements: based on the latest Apache POI 3.17; Apache Hive: query Excel files and write tables to Excel files using the Hive Serde; Apache Flink support for the Flink Table API and Flink DataSource/DataSink; signing and verification of signatures of Excel files; example to use the HadoopOffice library…

HadoopOffice – A Vision for the coming Years

HadoopOffice has been available for more than a year (first commit: 16.10.2016). Currently it supports Excel formats based on the Apache POI parsers/writers. Meanwhile, a lot of functionality has been added, such as: support for .xlsx and .xls formats (reading and writing); encryption/decryption support; support for the Hadoop mapred.* and mapreduce.* APIs; support for Spark…

Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark

HadoopCryptoLedger release 1.1.0 added support for another well-known cryptocurrency: Ethereum and its Altcoins. As with its Bitcoin & Altcoin support, you can use the library with many different frameworks in the Hadoop ecosystem: Hadoop MR, Apache Hive, Apache Flink, Apache Spark and the Apache Spark Datasource API. Furthermore, you can use it with various…

Big Data Analytics on Bitcoin's first Altcoin: NameCoin

This blog post is about analyzing the Namecoin Blockchain using different Big Data technologies based on the HadoopCryptoLedger library. Currently, this library enables you to analyze the Bitcoin blockchain and Altcoins based on Bitcoin (incl. segregated witness), such as Namecoin, Litecoin, Zcash etc., on Big Data platforms, such as Hadoop, Hive, Flink and Spark. In…

Templates, low footprint mode, improved integration with Spark for the HadoopOffice library for reading/writing Excel files on Big data platforms

Although it may seem like only a small improvement, version 1.0.4 of the HadoopOffice library has a lot of new features for reading/writing Excel files: templates, so you can define complex documents with diagrams or other features in MS Excel and fill them with data or formulas from your Big Data platform in…

Using Apache Spark to Analyze the Bitcoin Blockchain

The hadoopcryptoledger library now provides a simple example of how you can analyze the Bitcoin Blockchain with Apache Spark. Previously, I described how you can use Hadoop MR or any other Hadoop ecosystem-compatible application to analyze it. Basically, it leverages the HadoopRDD API to read the Hadoop File Format of the hadoopcryptoledger library. Afterwards you can…

Spark+Scala+Graphx: Analyzing the Bitcoin Transaction Graph

The hadoopcryptoledger library now provides an example of how you can generate a Bitcoin Transaction Graph using the Big Data graph analysis technologies Spark+Scala+Graphx. Basically, it demonstrates how to read the Bitcoin Blockchain from HDFS and transform it into a graph with Bitcoin addresses as vertices and transactions between them as edges. The example returns the 5…

Lambda, Kappa, Microservice and Enterprise Architecture for Big Data

A few years after the emergence of the Lambda architecture, several new architectures for Big Data have emerged. I will present them and illustrate their use case scenarios. These architectures describe IT architectures, but towards the end of this blog post I will describe the corresponding Enterprise Architecture artefacts, which are sometimes referred to as Zeta architecture. Lambda…

Hive Optimizations with Indexes, Bloom-Filters and Statistics

This blog post describes how Storage Indexes, Bitmap Indexes, Compact Indexes, Aggregate Indexes, Covering Indexes/Materialized Views, Bloom-Filters and statistics can increase performance with Apa...
Jörn Franke's insight:

Improve the performance of your Big Data warehouse in Hive on Hadoop

Instead of Fighting Bitcoin, the US Could Make Its Own Digital Currency - Wired

A Georgetown University professor thinks that the US Government should issue its own cryptocurrency.
Jörn Franke's insight:

This should be done in Europe by the European Central Bank as well.

Enabling WebRTC in modern Java Enterprise Web Applications

I recently started a small project to create a sample enterprise Big Data web application using Spring. You can find the source code here and a demonstration here. One feature in this application W...
Jörn Franke's insight:

Create your own Java Enterprise Applications leveraging W3C WebRTC video/voice and data functionality in the browser without plugins

clouddatalab2.pdf - Google Drive

Jörn Franke's insight:

create a #bigdata #lab in the #cloud on #Amazon #emr for your #datascientists #using a browser for R, Hadoop, SparkR

Automated Machine Learning (AutoML) and Big Data Platforms

Although machine learning has existed for decades, the typical data scientist, as you would call the role today, still has to go through a manual, labor-intensive process of extracting the data, cleaning, feature extraction, regularization, training, finding the right model, testing, selecting and deploying it. Furthermore, for most machine learning scenarios you do…

HadoopCryptoLedger library a vision for the coming Years

The first commit of HadoopCryptoLedger was on 26 March 2016. Since then, a lot of new functionality has been added, such as support for major Big Data platforms including Hive / Flink / Spark. Furthermore, besides Bitcoin, Altcoins based on Bitcoin (e.g. Namecoin, Litecoin or Bitcoin Cash) and Ethereum (including Altcoins) have…

Blockchain Consensus Algorithms – Proof of Anything?

Blockchains have proven over the last years to be stable distributed ledger technologies. Stable refers to the fact that they can recover from attacks and/or bugs without compromising their assets. They are most commonly known for enabling transactions with virtual cryptocurrencies not issued by a central authority. Popular examples are Bitcoin and Ethereum. However,…

Mapred vs MapReduce – The API question of Hadoop and impact on the Ecosystem

I will describe in this blog post the difference between the mapred.* and mapreduce.* API in Hadoop with respect to the custom InputFormats and OutputFormats. Additionally I will write on the impac…

Spending Time on Quality in Your Big Data Open Source Projects

Open source libraries are nowadays a critical part of the economy. They are used in commercial and non-commercial applications directly or indirectly affecting virtually any human being. Ensuring quality should be at the heart of each open source project. Verifying that an open source project ensures quality is mandatory for each stakeholder of this project,…

Reading/Writing Excel documents with the HadoopOffice library on Hadoop and Spark – First release

Reading/writing office documents, such as Excel, has always been challenging on Big Data platforms. Although many libraries exist for reading/writing office documents, they have never been really integrated into Hadoop or Spark, which leads to a lot of development effort. There are several use cases for using office documents jointly with Big Data technologies:…

Hive & Bitcoin: Analytics on Blockchain data with SQL

You can now analyze the Bitcoin Blockchain using Hive and the hadoopcryptoledger library with the new HiveSerde plugin. Basically you can link any data that you loaded in Hive with Bitcoin Blockchain data. For example, you can link Blockchain data with important events in history to determine what causes Bitcoin exchange rates to increase or…

Sneak Preview – HadoopOffice: Processing Office documents using the Hadoop Ecosystem – The example of Excel files

I present in this blog post the sneak preview of the hadoopoffice library that will enable you to process Office files, such as MS Excel, using the Hadoop Ecosystem including Hive/Spark. It currently contains only an ExcelInputFormat, which is based on Apache POI. Additionally, it contains an example that demonstrates how an Excel input file…

Batch-processing & Interactive Analytics for Big Data – the Role of in-Memory

In this blog post I will discuss various aspects of in-memory technologies and describe how various Big Data technologies fit into this context. In particular, I will focus on the difference between in-memory batch analytics and interactive in-memory analytics. Additionally, I will illustrate when in-memory technology is really beneficial. In-memory technology leverages the fast main memory…

Big Data – What is next? OLTP, OLAP, Predictive Analytics, Sampling and Probabilistic Databases

Big Data has matured over the last years and is becoming more and more a standard technology used in various industries. Coming from established concepts, such as OLAP or OLTP, in context of Big Da...
Jörn Franke's insight:

Thinking about near-term, medium-term and long-term future of Big Data analytics

Master Data Management and the Internet of Things

Master Data Management (MDM) has matured and grown significantly over the last years. The main motivation for master data management is to have a complete and accurate view on master data objects i...
Jörn Franke's insight:

Use NoSQL / Big Data technologies for next-generation Master Data Management systems enabled by the Internet of Things

clouddatalab2.pdf

Jörn Franke's insight:

create your own big data lab on Amazon EMR / Hadoop using Spark in-memory big data technology and R (SparkR / RMR)
