java.lang.Object org.apache.hadoop.mapred.SkipBadRecords
public class SkipBadRecords
Utility class for the skip-bad-records feature. It contains the settings that control how bad records are skipped.
Hadoop provides an optional mode of execution in which bad records are detected and skipped on subsequent task attempts.
This feature can be used when map/reduce tasks crash deterministically on certain input, which happens because of bugs in the map/reduce function. The usual course would be to fix these bugs, but sometimes this is not possible; perhaps the bug is in a third-party library for which the source code is not available. In that case the task never completes, even after multiple attempts, and all the data for that task is lost.
With this feature, only a small portion of data surrounding the bad record is lost, which may be acceptable for some user applications. See setMapperMaxSkipRecords(Configuration, long).
Skipping mode starts after a certain number of failures. See setAttemptsToStartSkipping(Configuration, int).
In skipping mode, the map/reduce task keeps track at all times of the record range it is processing. Before passing input to the map/reduce function, it sends this record range to the TaskTracker. If the task crashes, the TaskTracker knows which range was reported last. On further attempts that range is skipped.
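The report-then-skip cycle described above can be illustrated with a small in-memory model. This is only a sketch and not Hadoop code: the class SkipModeSketch, its run method, and the Result type are hypothetical names, each "record range" is simplified to a single record index, and a local set stands in for what the TaskTracker remembers between attempts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Toy, in-memory model of skipping mode; not part of Hadoop. */
final class SkipModeSketch {

    /** Outcome of the simulated task. */
    static final class Result {
        final List<String> processed; // records the map function handled
        final Set<Integer> skipped;   // indices of records that were skipped
        final int attempts;           // attempts used, including the successful one

        Result(List<String> processed, Set<Integer> skipped, int attempts) {
            this.processed = processed;
            this.skipped = skipped;
            this.attempts = attempts;
        }
    }

    /**
     * Re-runs the task until it succeeds or maxAttempts is exhausted.
     * Once the number of failures reaches attemptsToStartSkipping, every
     * record index is reported before the map function is called, so a
     * crash identifies the record to skip on the next attempt.
     */
    static Result run(List<String> records, Consumer<String> mapFn,
                      int attemptsToStartSkipping, int maxAttempts) {
        Set<Integer> skipped = new TreeSet<>(); // stands in for the tracker's memory
        int failures = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean skippingMode = failures >= attemptsToStartSkipping;
            int lastReported = -1;
            List<String> processed = new ArrayList<>();
            try {
                for (int i = 0; i < records.size(); i++) {
                    if (skipped.contains(i)) {
                        continue; // known bad record: do not hand it to mapFn
                    }
                    if (skippingMode) {
                        lastReported = i; // report before invoking the map function
                    }
                    mapFn.accept(records.get(i));
                    processed.add(records.get(i));
                }
                return new Result(processed, skipped, attempt);
            } catch (RuntimeException e) {
                failures++;
                if (skippingMode && lastReported >= 0) {
                    skipped.add(lastReported); // the last reported record is the suspect
                }
            }
        }
        throw new IllegalStateException("task failed after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "bad", "c");
        Result r = run(records, rec -> {
            if (rec.equals("bad")) {
                throw new IllegalArgumentException("cannot parse " + rec);
            }
        }, 2, 4);
        System.out.println(r.processed + " skipped=" + r.skipped + " attempts=" + r.attempts);
    }
}
```

In this model the first two attempts fail without any bookkeeping, the third runs in skipping mode and records index 1 as the culprit, and the fourth completes with records "a" and "c". The real framework works with record ranges rather than single indices, so the amount of data lost around a bad record depends on the configured limits.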
Method Summary
static int
getAttemptsToStartSkipping(Configuration conf)
static boolean
getAutoIncrMapperProcCount(Configuration conf)
Returns the flag that, when true, causes COUNTER_MAP_PROCESSED_RECORDS to be incremented by MapRunner after invoking the map function.
static boolean
getAutoIncrReducerProcCount(Configuration conf)
Returns the flag that, when true, causes COUNTER_REDUCE_PROCESSED_GROUPS to be incremented by the framework after invoking the reduce function.
static long
getMapperMaxSkipRecords(Configuration conf)
static long
getReducerMaxSkipGroups(Configuration conf)
static Path
getSkipOutputPath(Configuration conf)
static void
setAttemptsToStartSkipping(Configuration conf, int attemptsToStartSkipping)
static void
setAutoIncrMapperProcCount(Configuration conf, boolean autoIncr)
Sets the flag that, when true, causes COUNTER_MAP_PROCESSED_RECORDS to be incremented by MapRunner after invoking the map function.
static void
setAutoIncrReducerProcCount(Configuration conf, boolean autoIncr)
Sets the flag that, when true, causes COUNTER_REDUCE_PROCESSED_GROUPS to be incremented by the framework after invoking the reduce function.
static void
setMapperMaxSkipRecords(Configuration conf, long maxSkipRecs)
static void
setReducerMaxSkipGroups(Configuration conf, long maxSkipGrps)
static void
setSkipOutputPath(JobConf conf, Path path)
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
COUNTER_GROUP
public static final String COUNTER_GROUP
public static final String COUNTER_MAP_PROCESSED_RECORDS
See also: getAutoIncrMapperProcCount(Configuration), Constant Field Values
public static final String COUNTER_REDUCE_PROCESSED_GROUPS
See also: getAutoIncrReducerProcCount(Configuration), Constant Field Values
public SkipBadRecords()
public static int getAttemptsToStartSkipping(Configuration conf)
Parameters:
conf - the configuration
public static void setAttemptsToStartSkipping(Configuration conf, int attemptsToStartSkipping)
Parameters:
conf - the configuration
attemptsToStartSkipping - number of task attempts
public static boolean getAutoIncrMapperProcCount(Configuration conf)
Returns the flag that, when true, causes COUNTER_MAP_PROCESSED_RECORDS to be incremented by MapRunner after invoking the map function. This value must be set to false for applications that process records asynchronously or buffer the input records, for example streaming. In such cases the application should increment this counter itself. The default value is true.
Parameters:
conf - the configuration
Returns:
true if COUNTER_MAP_PROCESSED_RECORDS is incremented automatically, false otherwise.
public static void setAutoIncrMapperProcCount(Configuration conf, boolean autoIncr)
Sets the flag that, when true, causes COUNTER_MAP_PROCESSED_RECORDS to be incremented by MapRunner after invoking the map function. This value must be set to false for applications that process records asynchronously or buffer the input records, for example streaming. In such cases the application should increment this counter itself. The default value is true.
Parameters:
conf - the configuration
autoIncr - whether to increment COUNTER_MAP_PROCESSED_RECORDS automatically
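For applications that buffer or process records asynchronously, turning the flag off and incrementing the counter by hand might look like the sketch below. It assumes the Hadoop `org.apache.hadoop.mapred` API is on the classpath and uses `Reporter.incrCounter(String, String, long)` from that API, which is not documented on this page. The class ManualMapCounterSketch and its method names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SkipBadRecords;

/** Illustrative helper; the class and method names are not part of Hadoop. */
final class ManualMapCounterSketch {

    /** Call while building the job so MapRunner stops incrementing the counter. */
    static void disableAutoIncrement(Configuration conf) {
        SkipBadRecords.setAutoIncrMapperProcCount(conf, false);
    }

    /**
     * Call from the map task once `count` buffered records have really been
     * processed, so the framework can tell how far the task got before a crash.
     */
    static void recordsProcessed(Reporter reporter, long count) {
        reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
                             SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS,
                             count);
    }
}
```

A buffering mapper would call recordsProcessed after each batch is fully handled rather than after each call to its map method.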
public static boolean getAutoIncrReducerProcCount(Configuration conf)
Returns the flag that, when true, causes COUNTER_REDUCE_PROCESSED_GROUPS to be incremented by the framework after invoking the reduce function. This value must be set to false for applications that process records asynchronously or buffer the input records, for example streaming. In such cases the application should increment this counter itself. The default value is true.
Parameters:
conf - the configuration
Returns:
true if COUNTER_REDUCE_PROCESSED_GROUPS is incremented automatically, false otherwise.
public static void setAutoIncrReducerProcCount(Configuration conf, boolean autoIncr)
Sets the flag that, when true, causes COUNTER_REDUCE_PROCESSED_GROUPS to be incremented by the framework after invoking the reduce function. This value must be set to false for applications that process records asynchronously or buffer the input records, for example streaming. In such cases the application should increment this counter itself. The default value is true.
Parameters:
conf - the configuration
autoIncr - whether to increment COUNTER_REDUCE_PROCESSED_GROUPS automatically
public static Path getSkipOutputPath(Configuration conf)
Parameters:
conf - the configuration
public static void setSkipOutputPath(JobConf conf, Path path)
Parameters:
conf - the configuration
path - skip output directory path
public static long getMapperMaxSkipRecords(Configuration conf)
Parameters:
conf - the configuration
public static void setMapperMaxSkipRecords(Configuration conf, long maxSkipRecs)
Parameters:
conf - the configuration
maxSkipRecs - acceptable number of skipped records
public static long getReducerMaxSkipGroups(Configuration conf)
Parameters:
conf - the configuration
public static void setReducerMaxSkipGroups(Configuration conf, long maxSkipGrps)
Parameters:
conf - the configuration
maxSkipGrps - acceptable number of skipped groups
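The setters documented on this page can be combined when preparing a job. The following is a minimal sketch, assuming the Hadoop `org.apache.hadoop.mapred` API is on the classpath. The class SkipConfigSketch, the chosen values, and the output directory are illustrative assumptions, not recommended settings.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

/** Illustrative job setup; the class name, values, and path are assumptions. */
final class SkipConfigSketch {

    static JobConf enableSkipping(JobConf conf) {
        // Enter skipping mode once a task has failed this many times.
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);

        // Accept losing at most one record around each bad map input record.
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1L);

        // Accept losing at most one key group around each bad reduce input.
        SkipBadRecords.setReducerMaxSkipGroups(conf, 1L);

        // Hypothetical skip output directory.
        SkipBadRecords.setSkipOutputPath(conf, new Path("/tmp/skipped-records"));
        return conf;
    }
}
```

Note that setSkipOutputPath takes a JobConf while the other setters accept any Configuration; since JobConf extends Configuration, a single JobConf instance can be passed to all of them.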