cascading.flow.hadoop.planner
Class HadoopPlanner

java.lang.Object
  extended by cascading.flow.planner.FlowPlanner
      extended by cascading.flow.hadoop.planner.HadoopPlanner

public class HadoopPlanner
extends FlowPlanner

Class HadoopPlanner is the core Hadoop MapReduce planner.

Notes:

Custom JobConf properties
A custom JobConf instance can be passed to this planner by calling copyJobConf(java.util.Map, org.apache.hadoop.mapred.JobConf) with the properties map and the JobConf before constructing a new HadoopFlowConnector.

A better practice is to set Hadoop properties directly on the properties map handed to the FlowConnector. All values in the map are passed to a new default JobConf instance and used as defaults for all resulting Flow instances.

For example, properties.put("mapred.child.java.opts", "-Xmx512m"); tells Hadoop to spawn all child JVMs with a maximum heap of 512 MB.
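
As a minimal sketch of the recommended approach, the class below builds a properties map using only java.util. The class name HadoopPropertiesSketch, the method buildProperties, and the reduce-task value are illustrative assumptions and are not part of Cascading. The connector call appears only as a comment because it needs the Cascading and Hadoop libraries on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: collect Hadoop settings in the properties map that is later
// handed to the FlowConnector. Only java.util is used here.
class HadoopPropertiesSketch {

    // Returns a properties map pre-populated with Hadoop settings.
    static Map<Object, Object> buildProperties() {
        Map<Object, Object> properties = new HashMap<Object, Object>();
        // Child JVM options: cap the heap at 512 MB.
        properties.put("mapred.child.java.opts", "-Xmx512m");
        // Illustrative value (assumption): default number of reduce tasks.
        properties.put("mapred.reduce.tasks", "4");
        return properties;
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = buildProperties();
        System.out.println(properties);
        // With Cascading on the classpath, the map would then be passed on:
        // FlowConnector flowConnector = new HadoopFlowConnector(properties);
    }
}
```

Keeping the settings in one map means every Flow created by that connector would start from the same defaults.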


Field Summary
 
Fields inherited from class cascading.flow.planner.FlowPlanner
assertionLevel, debugLevel, properties
 
Constructor Summary
HadoopPlanner()
           
 
Method Summary
 Flow buildFlow(FlowDef flowDef)
          Method buildFlow renders the actual Flow instance.
static void copyJobConf(Map<Object,Object> properties, JobConf jobConf)
          Method copyJobConf adds the given JobConf values to the given properties object.
static JobConf createJobConf(Map<Object,Object> properties)
          Method createJobConf returns a new JobConf instance using the values in the given properties argument.
static boolean getNormalizeHeterogeneousSources(Map<Object,Object> properties)
          Method getNormalizeHeterogeneousSources returns whether this planner will normalize heterogeneous input sources.
 void initialize(FlowConnector flowConnector, Map<Object,Object> properties)
           
protected  Tap makeTempTap(String name)
          Method makeTemp ...
static void setNormalizeHeterogeneousSources(Map<Object,Object> properties, boolean doNormalize)
          Method setNormalizeHeterogeneousSources adds the given doNormalize boolean to the given properties object.
 
Methods inherited from class cascading.flow.planner.FlowPlanner
createElementGraph, failOnGroupEverySplit, failOnLoneGroupAssertion, failOnMissingGroup, failOnMisusedBuffer, handleExceptionDuringPlanning, handleJobPartitioning, handleJoins, handleNonSafeOperations, insertTempTapAfter, verifyAssembly, verifyCheckpoints, verifyPipeAssemblyEndPoints, verifySourceNotSinks, verifyTaps, verifyTraps
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HadoopPlanner

public HadoopPlanner()

Method Detail

copyJobConf

public static void copyJobConf(Map<Object,Object> properties,
                               JobConf jobConf)
Method copyJobConf adds the given JobConf values to the given properties object. Use this method to pass custom default Hadoop JobConf properties to Hadoop.

Parameters:
properties - of type Map
jobConf - of type JobConf
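
The copy step can be pictured with a stand-in that needs no Hadoop classes. This is a sketch under stated assumptions, not Cascading's implementation: copyEntries and sampleConf are hypothetical helpers, a list of entries stands in for a configured JobConf (which is iterable over its String key/value pairs), and overwriting an existing key is assumed rather than documented behavior.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: models "copy every entry of a JobConf-like source
// into a properties map". It is NOT Cascading's implementation.
class CopyJobConfSketch {

    // Hypothetical helper: copies each key/value pair into properties,
    // overwriting any existing value for the same key (an assumption).
    static void copyEntries(Map<Object, Object> properties,
                            Iterable<Map.Entry<String, String>> source) {
        for (Map.Entry<String, String> entry : source) {
            properties.put(entry.getKey(), entry.getValue());
        }
    }

    // Builds a small list of entries standing in for a configured JobConf.
    static List<Map.Entry<String, String>> sampleConf() {
        List<Map.Entry<String, String>> conf =
            new ArrayList<Map.Entry<String, String>>();
        conf.add(new AbstractMap.SimpleEntry<String, String>(
            "mapred.reduce.tasks", "4"));
        conf.add(new AbstractMap.SimpleEntry<String, String>(
            "mapred.child.java.opts", "-Xmx512m"));
        return conf;
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<Object, Object>();
        copyEntries(properties, sampleConf());
        System.out.println(properties);
        // With the real libraries on the classpath, the equivalent call is:
        // HadoopPlanner.copyJobConf(properties, jobConf);
    }
}
```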

createJobConf

public static JobConf createJobConf(Map<Object,Object> properties)
Method createJobConf returns a new JobConf instance using the values in the given properties argument.

Parameters:
properties - of type Map
Returns:
a JobConf instance

setNormalizeHeterogeneousSources

public static void setNormalizeHeterogeneousSources(Map<Object,Object> properties,
                                                    boolean doNormalize)
Method setNormalizeHeterogeneousSources adds the given doNormalize boolean to the given properties object. Use this method if additional jobs should be planned in to handle incompatible InputFormat classes.

Normalization is off by default and should only be enabled by advanced users. Enabling it typically decreases application performance.

Parameters:
properties - of type Map
doNormalize - of type boolean

getNormalizeHeterogeneousSources

public static boolean getNormalizeHeterogeneousSources(Map<Object,Object> properties)
Method getNormalizeHeterogeneousSources returns whether this planner will normalize heterogeneous input sources.

Parameters:
properties - of type Map
Returns:
a boolean

initialize

public void initialize(FlowConnector flowConnector,
                       Map<Object,Object> properties)
Overrides:
initialize in class FlowPlanner

buildFlow

public Flow buildFlow(FlowDef flowDef)
Description copied from class: FlowPlanner
Method buildFlow renders the actual Flow instance.

Specified by:
buildFlow in class FlowPlanner

makeTempTap

protected Tap makeTempTap(String name)
Description copied from class: FlowPlanner
Method makeTemp ...

Specified by:
makeTempTap in class FlowPlanner
Returns:
Tap


Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.