#sclading

3 posts

6 Jan 2016 17 min read

Running Apache Flink on Amazon Elastic MapReduce

I really love Amazon EMR. Over the years it’s grown from being “Hadoop on-demand” to a full-fledged cluster management system for running OSS big-data apps (Hadoop MR of course, but also Spark, Hue, Hive, Pig, Oozie and more). While Hadoop supports reading from S3 out of the box, EMR has a proprietary implementation called EMRFS that has some nice features…

scala hadoop hdfs sclading flink

20 Dec 2015 8 min read

Running Scalding jobs on Apache Flink

Ian Hummel

My previous post showed a very simple Scalding workflow. Apache Flink is a very promising real-time streaming framework. It also supports running Cascading workflows with very little modification. Surely there must be some way to run a Scalding job on top of Flink? Turns out… YES! In a nutshell: here are the high-level things we need to solve…

scala hadoop hdfs sclading flink

20 Dec 2015 1 min read

Getting started with Scalding

Ian Hummel

I’ve been using Scalding for the last few years and really love how simple it makes writing scalable data processing jobs. I think many of the issues beginners have with Scalding relate to project setup. I hope this post simplifies things for people so they can get started with less hassle. Building your project with SBT: The official getting started guide…

scala hadoop hdfs sclading