java.lang.Object org.apache.hadoop.filecache.DistributedCache
public class DistributedCache
Distribute application-specific large, read-only files efficiently.
DistributedCache is a facility provided by the Map-Reduce framework to cache files (text, archives, jars etc.) needed by applications.
Applications specify the files to be cached via URLs (hdfs:// or http://) in the JobConf. The DistributedCache assumes that the files specified via hdfs:// URLs are already present on the FileSystem at the path specified by the URL.
The framework will copy the necessary files onto the slave node before any tasks for the job are executed on that node. Its efficiency stems from the fact that the files are copied only once per job, and from its ability to cache archives, which are un-archived on the slaves.
DistributedCache can be used to distribute simple, read-only data/text files and/or more complex types such as archives, jars etc. Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave nodes. Jars may optionally be added to the classpath of the tasks, which provides a rudimentary software distribution mechanism. Files have execution permissions. Optionally, users can also direct it to symlink the distributed cache file(s) into the working directory of the task.
DistributedCache tracks modification timestamps of the cache files. Clearly the cache files should not be modified by the application or externally while the job is executing.
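As a rough illustration of why cache files must stay unmodified while a job runs, the sketch below records a modification timestamp up front and later detects a change. It uses plain java.io.File on the local disk as a stand-in. The names CacheTimestampSketch, snapshot and modifiedSince are hypothetical and are not part of DistributedCache, which tracks timestamps of files on the FileSystem.

```java
import java.io.File;

class CacheTimestampSketch {
    /** Records the last-modified time (ms since epoch); 0 if the file does not exist. */
    static long snapshot(File file) {
        return file.lastModified();
    }

    /** Returns true if the file's current timestamp differs from the recorded one. */
    static boolean modifiedSince(File file, long recorded) {
        return file.lastModified() != recorded;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("lookup", ".dat");
        f.deleteOnExit();
        long recorded = snapshot(f);
        System.out.println("modified before touch? " + modifiedSince(f, recorded));
        // Simulate an external modification while the "job" is running.
        f.setLastModified(recorded + 10_000L);
        System.out.println("modified after touch?  " + modifiedSince(f, recorded));
    }
}
```

A framework that sees a changed timestamp cannot tell whether tasks on different nodes read the same bytes, which is one reason to treat cached files as immutable for the lifetime of the job.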
Here is an illustrative example on how to use the DistributedCache:

    // Setting up the cache for the application

    1. Copy the requisite files to the FileSystem:

    $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
    $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
    $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
    $ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
    $ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
    $ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz

    2. Set up the application's JobConf:

    JobConf job = new JobConf();
    DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
    DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
    DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
    DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
    DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
    DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);

    3. Use the cached files in the Mapper or Reducer:

    public static class MapClass extends MapReduceBase
        implements Mapper<K, V, K, V> {

      private Path[] localArchives;
      private Path[] localFiles;

      public void configure(JobConf job) {
        // Get the cached archives/files
        try {
          localArchives = DistributedCache.getLocalCacheArchives(job);
          localFiles = DistributedCache.getLocalCacheFiles(job);
        } catch (IOException e) {
          // configure() declares no checked exceptions, so rethrow unchecked
          throw new RuntimeException(e);
        }
      }

      public void map(K key, V value,
                      OutputCollector<K, V> output, Reporter reporter)
          throws IOException {
        // Use data from the cached archives/files here
        // ...
        // ...
        output.collect(key, value);
      }
    }

It is also very common to use the DistributedCache by using GenericOptionsParser.
This class includes methods that should be used by users (specifically those mentioned in the example above, as well as addArchiveToClassPath(Path, Configuration)), as well as methods intended for use by the MapReduce framework (e.g., JobClient). For implementation details, see TrackerDistributedCacheManager and TaskDistributedCacheManager.
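The example above passes the URI /myapp/lookup.dat#lookup.dat to addCacheFile. A common reading, stated here as an assumption because this page does not spell it out, is that the part before '#' locates the file on the FileSystem and the fragment after '#' names the symlink created in the task's working directory. The sketch below only shows how java.net.URI splits such a string; CacheUriParts is a hypothetical helper, not a Hadoop class.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CacheUriParts {
    /** Path portion of a cache URI, e.g. "/myapp/lookup.dat". */
    static String sourcePath(URI uri) {
        return uri.getPath();
    }

    /** Fragment portion of a cache URI, or null when the URI has no '#' part. */
    static String linkName(URI uri) {
        return uri.getFragment();
    }

    public static void main(String[] args) throws URISyntaxException {
        URI withFragment = new URI("/myapp/lookup.dat#lookup.dat");
        URI withoutFragment = new URI("/myapp/map.zip");
        System.out.println(sourcePath(withFragment) + " -> " + linkName(withFragment));
        System.out.println(sourcePath(withoutFragment) + " -> " + linkName(withoutFragment));
    }
}
```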
See Also: TrackerDistributedCacheManager, TaskDistributedCacheManager, JobConf, JobClient
static void addArchiveToClassPath(Path archive, Configuration conf)
    Deprecated. Use addArchiveToClassPath(Path, Configuration, FileSystem) instead. The FileSystem should be obtained within an appropriate doAs.
static void addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs)
static void addCacheArchive(URI uri, Configuration conf)
static void addCacheFile(URI uri, Configuration conf)
static void addFileToClassPath(Path file, Configuration conf)
    Deprecated. Use addFileToClassPath(Path, Configuration, FileSystem) instead. The FileSystem should be obtained within an appropriate doAs.
static void addFileToClassPath(Path file, Configuration conf, FileSystem fs)
static void addLocalArchives(Configuration conf, String str)
static void addLocalFiles(Configuration conf, String str)
static boolean checkURIs(URI[] uriFiles, URI[] uriArchives)
static void createAllSymlink(Configuration conf, File jobCacheDir, File workDir)
static void createSymlink(Configuration conf)
static Path[] getArchiveClassPaths(Configuration conf)
static long[] getArchiveTimestamps(Configuration conf)
static URI[] getCacheArchives(Configuration conf)
static URI[] getCacheFiles(Configuration conf)
static Path[] getFileClassPaths(Configuration conf)
static FileStatus getFileStatus(Configuration conf, URI cache)
    Returns the FileStatus of a given cache file on HDFS.
static long[] getFileTimestamps(Configuration conf)
static Path[] getLocalCacheArchives(Configuration conf)
static Path[] getLocalCacheFiles(Configuration conf)
static boolean getSymlink(Configuration conf)
static long getTimestamp(Configuration conf, URI cache)
static void setArchiveTimestamps(Configuration conf, String timestamps)
static void setCacheArchives(URI[] archives, Configuration conf)
static void setCacheFiles(URI[] files, Configuration conf)
static void setFileTimestamps(Configuration conf, String timestamps)
static void setLocalArchives(Configuration conf, String str)
static void setLocalFiles(Configuration conf, String str)
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static final String CACHE_FILES_SIZES
CACHE_FILES_SIZES is not a *public* constant.
public static final String CACHE_ARCHIVES_SIZES
CACHE_ARCHIVES_SIZES is not a *public* constant.
public static final String CACHE_ARCHIVES_TIMESTAMPS
CACHE_ARCHIVES_TIMESTAMPS is not a *public* constant.
public static final String CACHE_FILES_TIMESTAMPS
CACHE_FILES_TIMESTAMPS is not a *public* constant.
public static final String CACHE_ARCHIVES
CACHE_ARCHIVES is not a *public* constant.
public static final String CACHE_FILES
CACHE_FILES is not a *public* constant.
public static final String CACHE_LOCALARCHIVES
CACHE_LOCALARCHIVES is not a *public* constant.
public static final String CACHE_LOCALFILES
CACHE_LOCALFILES is not a *public* constant.
public static final String CACHE_SYMLINK
CACHE_SYMLINK is not a *public* constant.
public DistributedCache()
public static FileStatus getFileStatus(Configuration conf, URI cache) throws IOException
Returns the FileStatus of a given cache file on HDFS. Internal to MapReduce.
conf
- configuration
cache
- cache file
Returns: FileStatus of a given cache file on HDFS
IOException
public static long getTimestamp(Configuration conf, URI cache) throws IOException
conf
- configuration
cache
- cache file
IOException
public static void createAllSymlink(Configuration conf, File jobCacheDir, File workDir) throws IOException
conf
- the configuration
jobCacheDir
- the target directory for creating symlinks
workDir
- the directory in which the symlinks are created
IOException
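A minimal sketch of what the parameter descriptions above suggest: for every entry in jobCacheDir, create a link of the same name in workDir. It uses java.nio.file from the JDK rather than Hadoop's own code. SymlinkSketch and linkAll are hypothetical names, and the real method may additionally consult the Configuration (for example, whether symlinking has been requested at all).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

class SymlinkSketch {
    /**
     * Links each entry of jobCacheDir into workDir under the same name.
     * Existing entries in workDir are left alone. Returns the number of links created.
     */
    static int linkAll(File jobCacheDir, File workDir) {
        File[] entries = jobCacheDir.listFiles();
        if (entries == null) {
            return 0; // not a directory, or not readable
        }
        int created = 0;
        for (File target : entries) {
            Path link = workDir.toPath().resolve(target.getName());
            if (Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                continue; // do not overwrite anything already present
            }
            try {
                Files.createSymbolicLink(link, target.toPath().toAbsolutePath());
                created++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        File cacheDir = Files.createTempDirectory("jobcache").toFile();
        File workDir = Files.createTempDirectory("work").toFile();
        Files.createFile(cacheDir.toPath().resolve("lookup.dat"));
        System.out.println("links created: " + linkAll(cacheDir, workDir));
    }
}
```

Skipping names that already exist keeps the sketch safe to call more than once; whether the real method behaves this way is not stated on this page.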
public static void setCacheArchives(URI[] archives, Configuration conf)
archives
- The list of archives that need to be localized
conf
- Configuration which will be changed
public static void setCacheFiles(URI[] files, Configuration conf)
files
- The list of files that need to be localized
conf
- Configuration which will be changed
public static URI[] getCacheArchives(Configuration conf) throws IOException
conf
- The configuration which contains the archives
IOException
public static URI[] getCacheFiles(Configuration conf) throws IOException
conf
- The configuration which contains the files
IOException
public static Path[] getLocalCacheArchives(Configuration conf) throws IOException
conf
- Configuration that contains the localized archives
IOException
public static Path[] getLocalCacheFiles(Configuration conf) throws IOException
conf
- Configuration that contains the localized files
IOException
public static long[] getArchiveTimestamps(Configuration conf)
conf
- The configuration which stored the timestamps
IOException
public static long[] getFileTimestamps(Configuration conf)
conf
- The configuration which stored the timestamps
IOException
public static void setArchiveTimestamps(Configuration conf, String timestamps)
conf
- Configuration which stores the timestamps
timestamps
- comma separated list of timestamps of archives. The order should be the same as the order in which the archives are added.
public static void setFileTimestamps(Configuration conf, String timestamps)
conf
- Configuration which stores the timestamps
timestamps
- comma separated list of timestamps of files. The order should be the same as the order in which the files are added.
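The timestamps parameter of both setters is a single comma-separated string whose order must match the order in which the files or archives were added. The sketch below shows one way to build and parse such a string with the JDK alone; TimestampListSketch, join and parse are hypothetical helpers, not Hadoop methods.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class TimestampListSketch {
    /** Joins timestamps into "t1,t2,...", preserving array order. */
    static String join(long[] timestamps) {
        return Arrays.stream(timestamps)
                .mapToObj(t -> Long.toString(t))
                .collect(Collectors.joining(","));
    }

    /** Parses "t1,t2,..." back into an array, preserving order. */
    static long[] parse(String list) {
        if (list == null || list.isEmpty()) {
            return new long[0];
        }
        return Arrays.stream(list.split(","))
                .mapToLong(s -> Long.parseLong(s.trim()))
                .toArray();
    }

    public static void main(String[] args) {
        long[] stamps = {1300000000000L, 1300000005000L};
        String encoded = join(stamps);
        System.out.println(encoded);
        System.out.println(Arrays.toString(parse(encoded)));
    }
}
```

Because position is the only link between a timestamp and its file, entry i of the list should always describe the i-th file or archive that was added.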
public static void setLocalArchives(Configuration conf, String str)
conf
- The conf to modify to contain the localized caches
str
- a comma separated list of local archives
public static void setLocalFiles(Configuration conf, String str)
conf
- The conf to modify to contain the localized caches
str
- a comma separated list of local files
public static void addLocalArchives(Configuration conf, String str)
conf
- The conf to modify to contain the localized caches
str
- a comma separated list of local archives
public static void addLocalFiles(Configuration conf, String str)
conf
- The conf to modify to contain the localized caches
str
- a comma separated list of local files
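The set* and add* variants above take the same kind of argument, a comma-separated list. One plausible reading, stated here as an assumption rather than something this page says, is that set* replaces the stored list while add* appends to it. The sketch uses a java.util.Map as a stand-in for Configuration; the key name and the class LocalListSketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class LocalListSketch {
    // Hypothetical key; the real property name is not given on this page.
    static final String KEY = "example.cache.localFiles";

    /** Replaces the stored list with str. */
    static void setLocal(Map<String, String> conf, String str) {
        conf.put(KEY, str);
    }

    /** Appends str to the stored list, inserting a comma when a list already exists. */
    static void addLocal(Map<String, String> conf, String str) {
        String existing = conf.get(KEY);
        if (existing == null || existing.isEmpty()) {
            conf.put(KEY, str);
        } else {
            conf.put(KEY, existing + "," + str);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setLocal(conf, "/local/a.dat");
        addLocal(conf, "/local/b.dat,/local/c.dat");
        System.out.println(conf.get(KEY));
    }
}
```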
public static void addCacheArchive(URI uri, Configuration conf)
uri
- The uri of the cache to be localized
conf
- Configuration to add the cache to
public static void addCacheFile(URI uri, Configuration conf)
uri
- The uri of the cache to be localized
conf
- Configuration to add the cache to
@Deprecated public static void addFileToClassPath(Path file, Configuration conf) throws IOException
Deprecated. Use addFileToClassPath(Path, Configuration, FileSystem) instead. The FileSystem should be obtained within an appropriate doAs.
file
- Path of the file to be added
conf
- Configuration that contains the classpath setting
IOException
public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException
file
- Path of the file to be added
conf
- Configuration that contains the classpath setting
fs
- FileSystem with respect to which file should be interpreted.
IOException
public static Path[] getFileClassPaths(Configuration conf)
conf
- Configuration that contains the classpath setting
@Deprecated public static void addArchiveToClassPath(Path archive, Configuration conf) throws IOException
Deprecated. Use addArchiveToClassPath(Path, Configuration, FileSystem) instead. The FileSystem should be obtained within an appropriate doAs.
archive
- Path of the archive to be added
conf
- Configuration that contains the classpath setting
IOException
public static void addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs) throws IOException
archive
- Path of the archive to be added
conf
- Configuration that contains the classpath setting
fs
- FileSystem with respect to which archive should be interpreted.
IOException
public static Path[] getArchiveClassPaths(Configuration conf)
conf
- Configuration that contains the classpath setting
public static void createSymlink(Configuration conf)
conf
- the jobconf
public static boolean getSymlink(Configuration conf)
conf
- the jobconf
public static boolean checkURIs(URI[] uriFiles, URI[] uriArchives)
uriFiles
- The array of file URIs
uriArchives
- The array of archive URIs
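This page does not say what checkURIs verifies. In Hadoop releases of this era the method is generally described as checking that every URI carries a fragment and that no two fragments collide, since the fragment serves as the symlink name; treat that as background knowledge rather than something stated here. The sketch below is a JDK-only approximation with hypothetical names (UriCheckSketch, fragmentsAreUsable), and comparing fragments case-insensitively is a conservative choice made for this sketch.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UriCheckSketch {
    /**
     * Returns true when every URI in both arrays has a fragment and no
     * fragment is used twice. Null arrays are treated as empty.
     */
    static boolean fragmentsAreUsable(URI[] uriFiles, URI[] uriArchives) {
        Set<String> seen = new HashSet<>();
        return collect(uriFiles, seen) && collect(uriArchives, seen);
    }

    private static boolean collect(URI[] uris, Set<String> seen) {
        if (uris == null) {
            return true;
        }
        for (URI uri : uris) {
            String fragment = uri.getFragment();
            if (fragment == null) {
                return false; // nothing to name the link after
            }
            if (!seen.add(fragment.toLowerCase(Locale.ROOT))) {
                return false; // two entries would map to the same link name
            }
        }
        return true;
    }

    public static void main(String[] args) {
        URI[] files = {URI.create("/myapp/lookup.dat#lookup.dat")};
        URI[] archives = {URI.create("/myapp/map.zip#map")};
        System.out.println(fragmentsAreUsable(files, archives));
    }
}
```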