java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<K,V>
public abstract class FileInputFormat<K,V>
A base class for file-based InputFormat.

FileInputFormat is the base class for all file-based InputFormats. It provides a generic implementation of getSplits(JobConf, int). Subclasses of FileInputFormat can also override the isSplitable(FileSystem, Path) method to ensure that input files are not split up and are processed as a whole by Mappers.
static org.apache.commons.logging.Log LOG
static void addInputPath(JobConf conf, Path path)
    Add a Path to the list of inputs for the map-reduce job.
static void addInputPaths(JobConf conf, String commaSeparatedPaths)
protected long computeSplitSize(long goalSize, long minSize, long blockSize)
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
static PathFilter getInputPathFilter(JobConf conf)
static Path[] getInputPaths(JobConf conf)
    Get the list of input Paths for the map-reduce job.
abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
    Get the RecordReader for the given InputSplit.
protected String[] getSplitHosts(BlockLocation[] blkLocations, long offset, long splitSize, NetworkTopology clusterMap)
InputSplit[] getSplits(JobConf job, int numSplits)
    Splits files returned by listStatus(JobConf) when they're too big.
protected boolean isSplitable(FileSystem fs, Path filename)
protected FileStatus[] listStatus(JobConf job)
static void setInputPathFilter(JobConf conf, Class<? extends PathFilter> filter)
static void setInputPaths(JobConf conf, Path... inputPaths)
    Set the array of Paths as the list of inputs for the map-reduce job.
static void setInputPaths(JobConf conf, String commaSeparatedPaths)
protected void setMinSplitSize(long minSplitSize)
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
public FileInputFormat()
protected void setMinSplitSize(long minSplitSize)
protected boolean isSplitable(FileSystem fs, Path filename)
FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
Parameters:
fs - the file system that the file is on
filename - the file name to check
public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split, so that it presents a record-oriented view to the individual task.

Specified by:
getRecordReader in interface InputFormat<K,V>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
a RecordReader
Throws:
IOException
public static void setInputPathFilter(JobConf conf, Class<? extends PathFilter> filter)
Parameters:
filter - the PathFilter class to use for filtering the input paths.
public static PathFilter getInputPathFilter(JobConf conf)
protected FileStatus[] listStatus(JobConf job) throws IOException
Parameters:
job - the job to list input paths for
Throws:
IOException - if there are zero items.
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Splits files returned by listStatus(JobConf) when they're too big.
Specified by:
getSplits in interface InputFormat<K,V>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
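The splitting behaviour described for getSplits can be pictured with a small, Hadoop-free model. The sketch below is not the library's code: the class FileSplitSketch, the method splitFile, and the representation of a split as a {start, length} pair are all hypothetical, and the flag splitable stands in for the result of isSplitable(FileSystem, Path). It assumes that a splitable file larger than the chosen split size is cut into consecutive ranges of at most splitSize bytes, and that a non-splitable file yields one range covering the whole file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free model of "split files when they're too big".
class FileSplitSketch {

    /**
     * Cuts a file of fileLength bytes into {start, length} ranges.
     * Assumption: an empty file yields no ranges in this simplified model.
     */
    static List<long[]> splitFile(long fileLength, long splitSize, boolean splitable) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        List<long[]> splits = new ArrayList<>();
        if (fileLength == 0) {
            return splits;
        }
        if (!splitable) {
            // Whole file goes to a single consumer, whatever its size.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        long start = 0L;
        while (start < fileLength) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new long[] {start, length});
            start += length;
        }
        return splits;
    }
}
```

Under these assumptions, splitFile(250, 100, true) yields three ranges (starting at 0, 100 and 200, the last one 50 bytes long), while splitFile(250, 100, false) yields a single 250-byte range.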
protected long computeSplitSize(long goalSize, long minSize, long blockSize)
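This page gives no description for computeSplitSize. One plausible reading, assumed here rather than confirmed by the text above, is that the goal size is capped at the block size and then raised to at least the minimum size. The class name SplitSizeSketch is hypothetical, as is the helper goalSize, which derives a goal by dividing the total input size by the numSplits hint.

```java
// Hypothetical sketch; the formula is an assumption, not taken from this page.
class SplitSizeSketch {

    /** Assumed rule: cap the goal at the block size, then enforce the minimum. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Assumed derivation of the goal from the numSplits hint (guards against zero). */
    static long goalSize(long totalSize, int numSplits) {
        return totalSize / (numSplits <= 0 ? 1 : numSplits);
    }
}
```

With this rule, a goal larger than the block size collapses to the block size, and a minimum larger than both wins outright, which is how a configured minimum could force splits bigger than one block.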
protected int getBlockIndex(BlockLocation[] blkLocations, long offset)
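getBlockIndex is likewise listed without a description. Judging only by its name and parameters, it presumably finds which block contains a given byte offset; the sketch below models that idea without Hadoop. Blocks are represented by two parallel arrays (blockOffsets, blockLengths) in place of BlockLocation[], and the class name BlockIndexSketch and the choice to throw IllegalArgumentException for an out-of-range offset are assumptions.

```java
// Hypothetical, Hadoop-free model of locating the block that holds an offset.
class BlockIndexSketch {

    /**
     * Returns the index i such that
     * blockOffsets[i] <= offset < blockOffsets[i] + blockLengths[i].
     */
    static int getBlockIndex(long[] blockOffsets, long[] blockLengths, long offset) {
        if (blockOffsets.length != blockLengths.length) {
            throw new IllegalArgumentException("offset and length arrays differ in size");
        }
        for (int i = 0; i < blockOffsets.length; i++) {
            long start = blockOffsets[i];
            long end = start + blockLengths[i]; // exclusive end of block i
            if (start <= offset && offset < end) {
                return i;
            }
        }
        // Assumption: an offset outside every block is a caller error.
        throw new IllegalArgumentException("Offset " + offset + " is outside of all blocks");
    }
}
```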
public static void setInputPaths(JobConf conf, String commaSeparatedPaths)
Parameters:
conf - Configuration of the job
commaSeparatedPaths - Comma-separated paths to be set as the list of inputs for the map-reduce job.
public static void addInputPaths(JobConf conf, String commaSeparatedPaths)
Parameters:
conf - The configuration of the job
commaSeparatedPaths - Comma-separated paths to be added to the list of inputs for the map-reduce job.
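Both setInputPaths(JobConf, String) and addInputPaths(JobConf, String) take a single comma-separated string. A minimal sketch of how such a string could be broken into individual path strings follows. It is a plain-Java illustration, not Hadoop's parser: the class PathListSketch and the method splitPaths are hypothetical, and the rules chosen here (split on every comma, trim surrounding whitespace, drop empty entries) are assumptions. Paths that themselves contain commas are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the parsing rules are assumptions for illustration only.
class PathListSketch {

    /** Splits "a,b,c" into its non-empty, trimmed pieces, preserving order. */
    static List<String> splitPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        for (String piece : commaSeparatedPaths.split(",")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }
}
```

In this model, "set" would replace any existing list with the parsed pieces, while "add" would append them to the list already configured.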
public static void setInputPaths(JobConf conf, Path... inputPaths)
Set the array of Paths as the list of inputs for the map-reduce job.
Parameters:
conf - Configuration of the job.
inputPaths - the Paths of the input directories/files for the map-reduce job.
public static void addInputPath(JobConf conf, Path path)
Add a Path to the list of inputs for the map-reduce job.
Parameters:
conf - The configuration of the job
path - Path to be added to the list of inputs for the map-reduce job.
public static Path[] getInputPaths(JobConf conf)
Get the list of input Paths for the map-reduce job.
Parameters:
conf - The configuration of the job
Returns:
the list of input Paths for the map-reduce job.
protected String[] getSplitHosts(BlockLocation[] blkLocations, long offset, long splitSize, NetworkTopology clusterMap) throws IOException
Parameters:
blkLocations - The list of block locations
offset -
splitSize -
Throws:
IOException