public class InputFormatInfo
extends Object
| Constructor and Description |
| --- |
| `InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)` |
| Modifier and Type | Method and Description |
| --- | --- |
| `static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>>` | `computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)` Computes the preferred locations based on the input(s) and returns a location-to-block map. |
| `org.apache.hadoop.conf.Configuration` | `configuration()` |
| `boolean` | `equals(Object other)` |
| `int` | `hashCode()` |
| `Class<?>` | `inputFormatClazz()` |
| `boolean` | `mapredInputFormat()` |
| `boolean` | `mapreduceInputFormat()` |
| `String` | `path()` |
| `String` | `toString()` |
public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration, Class<?> inputFormatClazz, String path)
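A hedged construction sketch; the `TextInputFormat` class and HDFS path below are illustrative placeholders, not taken from this page:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.scheduler.InputFormatInfo

// Hypothetical values: the format class and path are examples only.
val conf = new Configuration()
val info = new InputFormatInfo(conf, classOf[TextInputFormat], "hdfs:///data/input")
```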
public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
Computes the preferred locations based on the input(s) and returns a location-to-block map. A typical allocation algorithm built on this method would proceed as follows (see the sketch after this list):

a) For each host, count the number of splits hosted on that host.
b) Subtract the containers currently allocated on that host.
c) Compute rack info for each host and update a rack -> count map based on (b).
d) Allocate nodes based on (c).
e) On the allocation result, ensure that we do not allocate "too many" tasks on a single node (even if data locality on that node is very high): this prevents the job from becoming fragile if a single host (or a small set of hosts) goes down.

Repeat from (a) until the required number of nodes is allocated.

If a node dies, follow the same procedure.

Parameters:
formats - (undocumented)
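A minimal sketch of that loop, assuming the helpers `rackOf`, `alreadyAllocated`, and `maxTasksPerNode` (all hypothetical, not part of the Spark API):

```scala
import org.apache.spark.scheduler.{InputFormatInfo, SplitInfo}
import scala.collection.mutable

// locs would typically come from:
//   val locs = InputFormatInfo.computePreferredLocations(Seq(info))
def allocateNodes(
    locs: Map[String, Set[SplitInfo]],
    required: Int,
    alreadyAllocated: Map[String, Int],   // hypothetical: containers per host
    rackOf: String => String,             // hypothetical: host -> rack lookup
    maxTasksPerNode: Int): Seq[String] = {
  val chosen = mutable.ArrayBuffer.empty[String]
  while (chosen.size < required) {
    // (a) splits per host, (b) minus the containers already allocated there
    val hostCount = locs.map { case (host, splits) =>
      host -> math.max(0, splits.size - alreadyAllocated.getOrElse(host, 0))
    }
    // (c) roll the host counts up into a rack -> count map
    val rackCount = hostCount.groupBy { case (host, _) => rackOf(host) }
      .map { case (rack, hosts) => rack -> hosts.values.sum }
    // (d) prefer hosts in data-dense racks, (e) capped per node so the job
    // does not become fragile if a single host goes down
    val candidates = hostCount.filter { case (host, count) =>
      count > 0 && chosen.count(_ == host) < maxTasksPerNode
    }
    if (candidates.isEmpty) return chosen.toSeq
    chosen += candidates.maxBy { case (host, count) =>
      (rackCount(rackOf(host)), count)
    }._1
  }
  chosen.toSeq
}
```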
public org.apache.hadoop.conf.Configuration configuration()
public Class<?> inputFormatClazz()
public String path()
public boolean mapreduceInputFormat()
public boolean mapredInputFormat()
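These two flags appear to report which Hadoop API family `inputFormatClazz` belongs to. A hedged dispatch sketch under that assumption:

```scala
// Assumption: mapreduceInputFormat/mapredInputFormat distinguish the new
// org.apache.hadoop.mapreduce API from the legacy org.apache.hadoop.mapred API.
if (info.mapreduceInputFormat) {
  // read via the org.apache.hadoop.mapreduce InputFormat path
} else if (info.mapredInputFormat) {
  // read via the legacy org.apache.hadoop.mapred InputFormat path
}
```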
public String toString()
Overrides: toString in class Object
public int hashCode()
Overrides: hashCode in class Object
public boolean equals(Object other)
Overrides: equals in class Object