public class InputFormatInfo
extends Object
| Constructor and Description |
|---|
| InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path) |
| Modifier and Type | Method and Description |
|---|---|
| static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> | computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats) Computes the preferred locations based on input(s) and returns a location-to-block map. |
| org.apache.hadoop.conf.Configuration | configuration() |
| boolean | equals(Object other) |
| int | hashCode() |
| Class<?> | inputFormatClazz() |
| boolean | mapredInputFormat() |
| boolean | mapreduceInputFormat() |
| String | path() |
| String | toString() |
public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration,
Class<?> inputFormatClazz,
String path)
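A minimal construction sketch in Scala; the default-constructed Hadoop configuration, the HDFS path, and the choice of TextInputFormat are illustrative assumptions, not values from this page:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.scheduler.InputFormatInfo

val hadoopConf = new Configuration()

// Hypothetical input path; substitute a path that exists on your cluster.
val info = new InputFormatInfo(hadoopConf, classOf[TextInputFormat], "hdfs:///data/events")
```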
public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
Computes the preferred locations based on input(s) and returns a location-to-block map. Typical use of this method for allocation would follow an algorithm like this:

a) For each host, count the number of splits hosted on that host.
b) Decrement by the number of containers currently allocated on that host.
c) Compute rack info for each host and update the rack -> count map based on (b).
d) Allocate nodes based on (c).
e) On the allocation result, ensure that we don't allocate "too many" jobs on a single node (even if data locality on that node is very high): this prevents the job from becoming fragile if a single host (or a small set of hosts) goes down.

Repeat from (a) until the required number of nodes is allocated. If a node dies, follow the same procedure.
Parameters:
formats - (undocumented)
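A hedged usage sketch of the algorithm above: building a small Seq of InputFormatInfo instances and asking for the host-to-splits map. The paths and the TextInputFormat class are hypothetical stand-ins:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.scheduler.{InputFormatInfo, SplitInfo}

val hadoopConf = new Configuration()

// Hypothetical inputs; each path contributes its splits to the computation.
val formats = Seq(
  new InputFormatInfo(hadoopConf, classOf[TextInputFormat], "hdfs:///data/events"),
  new InputFormatInfo(hadoopConf, classOf[TextInputFormat], "hdfs:///data/users"))

// host -> set of splits whose data lives on that host; a resource manager
// can use the per-host counts to decide how many containers to request where.
val preferred: Map[String, Set[SplitInfo]] =
  InputFormatInfo.computePreferredLocations(formats)

preferred.foreach { case (host, splits) =>
  println(s"$host hosts ${splits.size} preferred splits")
}
```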
public org.apache.hadoop.conf.Configuration configuration()

public Class<?> inputFormatClazz()
public String path()
public boolean mapreduceInputFormat()
public boolean mapredInputFormat()
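These two flags report which Hadoop API the supplied input format class belongs to: the legacy org.apache.hadoop.mapred API or the newer org.apache.hadoop.mapreduce API. A sketch of the expected values under that reading (the paths are hypothetical and the expectations are inferred, not verified output):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.{TextInputFormat => OldTextInputFormat}
import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat}
import org.apache.spark.scheduler.InputFormatInfo

val conf = new Configuration()

// Class from the legacy "mapred" API; path is hypothetical.
val oldApi = new InputFormatInfo(conf, classOf[OldTextInputFormat], "hdfs:///data/events")
// Expected: oldApi.mapredInputFormat == true, oldApi.mapreduceInputFormat == false

// Class from the newer "mapreduce" API.
val newApi = new InputFormatInfo(conf, classOf[NewTextInputFormat], "hdfs:///data/events")
// Expected: newApi.mapreduceInputFormat == true, newApi.mapredInputFormat == false
```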
public String toString()
Overrides:
toString in class Object

public int hashCode()
Overrides:
hashCode in class Object

public boolean equals(Object other)
Overrides:
equals in class Object