At Netflix, the Big Data Platform team is responsible for building a reliable data analytics platform shared across the whole company. Netflix product decisions are heavily data-driven, so we play a big role in helping teams gain product and consumer insights from a multi-petabyte data warehouse (DW). Their use cases range from analyzing A/B test results and users' streaming experience to training data models for our recommendation algorithms.
We shared our overall architecture in a previous blog post. The foundation of our big data platform is that we use AWS S3 as the storage layer for our DW. This architecture lets us separate the compute and storage layers: multiple clusters can share the same data on S3, and clusters can be long-running yet transient (for flexibility). Our users typically write Pig or Hive jobs for ETL and data analytics.
A small subset of the ETL output and some aggregated data are transferred to Teradata for interactive querying and reporting. However, we also need low-latency interactive data exploration on our broader data set on S3. These are the use cases that Presto serves exceptionally well. We first deployed Presto into production seven months ago, and it is now an integral part of our data ecosystem. In this blog post, we would like to share our experience with Presto and how we made it work for us!