Spark API Reference | Deeplearning4j

Spark API Reference

This page provides the API reference for the key classes required for distributed training with DL4J on Spark. Make sure you have read the introduction guide to Deeplearning4j Spark training first.

SharedTrainingMaster

[source]

SharedTrainingMaster implements distributed training of neural networks using a compressed quantized gradient (update) sharing implementation based on the Strom 2015 paper “Scalable Distributed DNN Training Using Commodity GPU Cloud Computing”: https://s3-us-west-2.amazonaws.com/amazon.jobs-public-documents/strom_interspeech2015.pdf . The Deeplearning4j implementation makes a number of modifications, such as having the option to use a parameter-server based implementation for fault tolerance and execution where multicast networking support is not available.
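As a minimal sketch of how these pieces fit together (the network configuration, ports, addresses, and batch sizes below are placeholder assumptions, and package locations may vary slightly between DL4J versions), a SharedTrainingMaster is typically constructed via its Builder and passed to a SparkDl4jMultiLayer or SparkComputationGraph:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.spark.api.TrainingMaster;
    import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
    import org.deeplearning4j.spark.parameterserver.training.SharedTrainingMaster;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.lossfunctions.LossFunctions;
    import org.nd4j.parameterserver.distributed.conf.VoidConfiguration;

    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("dl4j-gradient-sharing"));

    // Communication config for the gradient-sharing implementation.
    // The port, network mask, and controller address are placeholders for your cluster.
    VoidConfiguration voidConfiguration = VoidConfiguration.builder()
            .unicastPort(40123)
            .networkMask("10.0.0.0/16")
            .controllerAddress("10.0.2.4")   // IP of the driver/master node
            .build();

    int examplesPerDataSet = 32;             // examples in each DataSet of the source RDD
    TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, examplesPerDataSet)
            .batchSizePerWorker(32)          // minibatch size used by each worker
            .workersPerNode(1)               // 1 worker per GPU is the usual choice
            .build();

    // Tiny placeholder network; use your real configuration here
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .nIn(784).nOut(10).activation(Activation.SOFTMAX).build())
            .build();

    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);

Training then proceeds by calling one of the fit methods documented below.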

fromJson

public static SharedTrainingMaster fromJson(String jsonStr)

Create a SharedTrainingMaster instance by deserializing a JSON string that has been serialized with toJson()

fromYaml

public static SharedTrainingMaster fromYaml(String yamlStr)

Create a SharedTrainingMaster instance by deserializing a YAML string that has been serialized with toYaml()
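Since both methods round-trip the corresponding serialization methods, a configured master can be persisted and restored as follows (assuming tm is a SharedTrainingMaster built as in the sketch above):

    String json = tm.toJson();
    SharedTrainingMaster restoredFromJson = SharedTrainingMaster.fromJson(json);

    String yaml = tm.toYaml();
    SharedTrainingMaster restoredFromYaml = SharedTrainingMaster.fromYaml(yaml);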

collectTrainingStats

public Builder collectTrainingStats(boolean enable)

Set whether training statistics should be collected for debugging purposes (disabled by default)

repartitionData

public Builder repartitionData(Repartition repartition)

Defines if and when repartitioning of the training data is applied.

repartitionStrategy

public Builder repartitionStrategy(RepartitionStrategy repartitionStrategy)

Used in conjunction with repartitionData(Repartition) (which defines when repartitioning should be conducted), repartitionStrategy defines how the repartitioning should be done. See RepartitionStrategy for details

storageLevel

public Builder storageLevel(StorageLevel storageLevel)

Set the storage level for RDD&lt;DataSet&gt;s. Default: StorageLevel.MEMORY_ONLY_SER() - i.e., store in memory, in serialized form. To use no RDD persistence, use null. Note that this only has an effect when RDDTrainingApproach.Direct is used (which is not the default), and when fitting from an RDD&lt;DataSet&gt;.

Note: Spark’s StorageLevel.MEMORY_ONLY() and StorageLevel.MEMORY_AND_DISK() can be problematic when it comes to off-heap data (which DL4J/ND4J uses extensively). Spark does not account for off-heap memory when deciding if/when to drop blocks to ensure enough free memory; consequently, for DataSet RDDs that are larger than the total amount of (off-heap) memory, this can lead to OOM issues. Put another way: Spark counts the on-heap size of DataSet and INDArray objects only (which is negligible) resulting in a significant underestimate of the true DataSet object sizes. More DataSets are thus kept in memory than we can really afford.

Note also that fitting directly from an RDD&lt;DataSet&gt; is discouraged - it is better to export your prepared data once and call (for example) SparkDl4jMultiLayer.fit(String savedDataDirectory). See DL4J's Spark website documentation for details.

rddTrainingApproach

public Builder rddTrainingApproach(RDDTrainingApproach rddTrainingApproach)

The approach to use when training on an RDD&lt;DataSet&gt; or RDD&lt;MultiDataSet&gt;. Default: RDDTrainingApproach.Export, which exports data to a temporary directory first. The default cluster temporary directory is used, though this can be configured using exportDirectory(String). Note also that fitting directly from an RDD&lt;DataSet&gt; is discouraged - it is better to export your prepared data once and call (for example) SparkDl4jMultiLayer.fit(String savedDataDirectory). See DL4J's Spark website documentation for details.

exportDirectory

public Builder exportDirectory(String exportDirectory)

When rddTrainingApproach(RDDTrainingApproach) is set to RDDTrainingApproach.Export (as it is by default) the data is exported to a temporary directory first.

Default: null, which means {hadoop.tmp.dir}/dl4j/ is used; in this case, data is exported to {hadoop.tmp.dir}/dl4j/SOME_UNIQUE_ID/. If you specify a directory, {exportDirectory}/SOME_UNIQUE_ID/ will be used instead.
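A hedged sketch of configuring the export-based approach explicitly, reusing the voidConfiguration from the sketch above (the HDFS path is a placeholder assumption):

    import org.deeplearning4j.spark.api.RDDTrainingApproach;

    TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, 32)
            .rddTrainingApproach(RDDTrainingApproach.Export)   // the default
            .exportDirectory("hdfs:///tmp/dl4j-export")        // data lands in {exportDirectory}/SOME_UNIQUE_ID/
            .build();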

rngSeed

public Builder rngSeed(long rngSeed)

Random number generator seed, used mainly for enforcing repeatable splitting/repartitioning on RDDs. Default: no seed set (i.e., random seed)

updatesThreshold

public Builder updatesThreshold(double updatesThreshold)

Set a fixed threshold for update encoding. In recent DL4J versions this is superseded by thresholdAlgorithm(ThresholdAlgorithm), which should be preferred.

thresholdAlgorithm

public Builder thresholdAlgorithm(ThresholdAlgorithm thresholdAlgorithm)

Algorithm used to determine the threshold for update encoding. Lower threshold values may improve convergence but increase the amount of network communication; values that are too low may also hurt convergence. If convergence problems are observed, try increasing or decreasing the threshold by a factor of 10 - say, to 1e-4 or 1e-2. For technical details, see the paper Scalable Distributed DNN Training Using Commodity GPU Cloud Computing. See also ThresholdAlgorithm

Default: AdaptiveThresholdAlgorithm with default parameters
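For example, to set the initial threshold explicitly while keeping the adaptive behaviour (reusing the voidConfiguration from the sketch above; treat the exact package path and constructor as version-dependent assumptions):

    import org.deeplearning4j.optimize.solvers.accumulation.encoding.threshold.AdaptiveThresholdAlgorithm;

    TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, 32)
            .thresholdAlgorithm(new AdaptiveThresholdAlgorithm(1e-3))  // start from the typical default of 1e-3
            .build();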

residualPostProcessor

public Builder residualPostProcessor(ResidualPostProcessor residualPostProcessor)

Residual post processor. See {- link ResidualPostProcessor} for details.

Default: {- code new ResidualClippingPostProcessor(5.0, 5)} - i.e., a {- link ResidualClippingPostProcessor} that clips the residual to +/- 5x current threshold, every 5 iterations.

batchSizePerWorker

public Builder batchSizePerWorker(int batchSize)

Minibatch size to use when training workers. In principle, the source data (i.e., an RDD&lt;DataSet&gt;, etc.) can have a different number of examples in each DataSet than we want to use when training; DataSets are split or combined as required.

workersPerNode

public Builder workersPerNode(int numWorkers)

Configures the number of network training threads per cluster node. Default: -1, which selects the number of workers automatically based on the hardware present in the system (e.g., the number of GPUs, if training on a GPU-enabled system). When training on GPUs, you should use 1 worker per GPU (the default). For CPUs, 1 worker per node is usually preferred, though systems with multiple physical CPUs or large core counts may achieve better throughput (i.e., more examples per second) with more workers, at the expense of higher memory consumption. Note that if you increase the number of workers on a CPU system, you should set the number of OpenMP threads using the OMP_NUM_THREADS property - see org.nd4j.config.ND4JEnvironmentVars#OMP_NUM_THREADS for more details. For example, a machine with 32 physical cores could use 4 workers with OMP_NUM_THREADS=8
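For instance, on a CPU cluster where each node has 32 physical cores, one plausible configuration (the values are assumptions to be tuned, reusing the voidConfiguration from the sketch above) is 4 workers with 8 OpenMP threads each:

    TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, 32)
            .workersPerNode(4)   // 4 training threads per node
            .build();

The OpenMP thread count is then set in the worker environment, for example via spark-submit:

    spark-submit --conf spark.executorEnv.OMP_NUM_THREADS=8 ...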

debugLongerIterations

public Builder debugLongerIterations(long timeMs)

Artificially extends each iteration by the given number of milliseconds, using Thread.sleep().

PLEASE NOTE: Never use this option in a production environment. It is suited for debugging purposes only.

transport

public Builder transport(Transport transport)

Optional method: the Transport implementation to be used as TransportType.CUSTOM for the VoidParameterAveraging method. Generally not used by users.

workerPrefetchNumBatches

public Builder workerPrefetchNumBatches(int prefetchNumBatches)

Number of minibatches to asynchronously prefetch on each worker when training. Default: 2, which is suitable in most cases. Increasing this may help when ETL (data loading) is the bottleneck, at the expense of greater memory consumption

repartitioner

public Builder repartitioner(Repartitioner repartitioner)

Repartitioner to use to repartition data before fitting. DL4J performs a MapPartitions operation for training, so how the data is partitioned can matter a lot for performance - too few partitions (or very imbalanced partitions) can result in poor cluster utilization, due to some workers being idle. A larger number of smaller partitions can help to avoid the so-called "end-of-epoch" effect, where training can only complete once the last/slowest worker finishes its partition. The default repartitioner is DefaultRepartitioner, which repartitions equally up to a maximum of 5000 partitions and is usually suitable for most purposes. In the worst case, the "end-of-epoch" effect when using this repartitioner should be limited to at most the time required to process a single partition.
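To set the repartitioner explicitly (here simply restating the default; treat the package path as a version-dependent assumption):

    import org.deeplearning4j.spark.impl.repartitioner.DefaultRepartitioner;

    TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, 32)
            .repartitioner(new DefaultRepartitioner())   // equal repartitioning, up to 5000 partitions
            .build();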

workerTogglePeriodicGC

public Builder workerTogglePeriodicGC(boolean workerTogglePeriodicGC)

Used to enable or disable the periodic garbage collection calls on the workers. Equivalent to Nd4j.getMemoryManager().togglePeriodicGc(workerTogglePeriodicGC). Pass false to disable periodic GC on the workers, or true (the default) to keep it enabled.

workerPeriodicGCFrequency

public Builder workerPeriodicGCFrequency(int workerPeriodicGCFrequency)

Used to set the periodic garbage collection frequency on the workers. Equivalent to calling Nd4j.getMemoryManager().setAutoGcWindow(workerPeriodicGCFrequency) on each worker. Has no effect if workerTogglePeriodicGC(boolean) is set to false

encodingDebugMode

public Builder encodingDebugMode(boolean enabled)

Enable debug mode for threshold encoding. When enabled, various statistics for the threshold and the residual will be calculated and logged on each worker (at INFO log level). This information can be used to check whether the encoding threshold is too big (for example, virtually all updates are much smaller than the threshold) or too small (the majority of updates are much larger than the threshold). encodingDebugMode is disabled by default. IMPORTANT: enabling this has a performance overhead, and it should not be enabled unless the debug information is actually required.

SparkComputationGraph

[source]

Main class for training ComputationGraph networks using Spark. Also used for performing distributed evaluation and inference on these networks.

getSparkContext

public JavaSparkContext getSparkContext()

Returns the JavaSparkContext associated with this instance.

getNetwork

public ComputationGraph getNetwork()

getTrainingMaster

public TrainingMaster getTrainingMaster()

setNetwork

public void setNetwork(ComputationGraph network)

getDefaultEvaluationWorkers

public int getDefaultEvaluationWorkers()

Returns the currently set default number of evaluation workers/threads. Note that when the number of workers is provided explicitly in an evaluation method, the default value is not used. In many cases, we may want this to be smaller than the number of Spark threads, to reduce memory requirements. For example, with 32 Spark threads and a large network, we don't want to spin up 32 instances of the network to perform evaluation; it is better (for memory requirements and reduced cache thrashing) to use, say, 4 workers. If it is not set explicitly, DEFAULT_EVAL_WORKERS will be used

setDefaultEvaluationWorkers

public void setDefaultEvaluationWorkers(int workers)

Set the default number of evaluation workers/threads. Note that when the number of workers is provided explicitly in an evaluation method, the default value is not used. In many cases, we may want this to be smaller than the number of Spark threads, to reduce memory requirements. For example, with 32 Spark threads and a large network, we don't want to spin up 32 instances of the network to perform evaluation; it is better (for memory requirements and reduced cache thrashing) to use, say, 4 workers. If it is not set explicitly, DEFAULT_EVAL_WORKERS will be used
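For example, with a large network and many Spark threads, evaluation memory can be capped as follows (4 workers is an illustrative value; sparkGraph is assumed to be a SparkComputationGraph and testData a JavaRDD&lt;DataSet&gt;):

    // In newer DL4J versions, Evaluation lives under org.nd4j.evaluation.classification
    import org.deeplearning4j.eval.Evaluation;

    sparkGraph.setDefaultEvaluationWorkers(4);   // at most 4 network instances during evaluation
    Evaluation evaluation = sparkGraph.evaluate(testData);
    System.out.println(evaluation.stats());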

fit

public ComputationGraph fit(RDD<DataSet> rdd)

Fit the ComputationGraph with the given data set

fit

public ComputationGraph fit(JavaRDD<DataSet> rdd)

Fit the ComputationGraph with the given data set

fit

public ComputationGraph fit(String path)

Fit the SparkComputationGraph network using a directory of serialized DataSet objects. The assumption here is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream)
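A hedged sketch of the expected layout and usage (paths and file names are placeholders; each file holds exactly one serialized DataSet, and sparkGraph is assumed to be a SparkComputationGraph):

    import java.io.File;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.factory.Nd4j;

    // Export step, run once (e.g., in an ETL job). Random data stands in for real features/labels.
    DataSet ds = new DataSet(Nd4j.rand(32, 784), Nd4j.rand(32, 10));
    ds.save(new File("/data/train/dataset_0.bin"));   // DataSet.save(File) wraps save(OutputStream)

    // Training step: point fit at the directory of saved DataSet files
    ComputationGraph trained = sparkGraph.fit("hdfs:///data/train");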

fit

public ComputationGraph fit(String path, int minPartitions)

fitPaths

public ComputationGraph fitPaths(JavaRDD<String> paths)

Fit the network using a list of paths for serialized DataSet objects.

fitPathsMultiDataSet

public ComputationGraph fitPathsMultiDataSet(JavaRDD<String> paths)

Fit the network using a list of paths for serialized MultiDataSet objects.

fitMultiDataSet

public ComputationGraph fitMultiDataSet(String path, int minPartitions)

getScore

public double getScore()

Gets the last (average) minibatch score from calling fit. This is the average score across all executors for the last minibatch executed in each worker

calculateScore

public double calculateScore(JavaRDD<DataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD&lt;DataSet&gt;, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods. Uses the default minibatch size in each worker, SparkComputationGraph.DEFAULT_EVAL_SCORE_BATCH_SIZE

calculateScore

public double calculateScore(JavaRDD<DataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD&lt;DataSet&gt;, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods
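The distinction between the aggregate and per-example scoring methods looks like this in use (sparkGraph is assumed to be a SparkComputationGraph and testData a JavaRDD&lt;DataSet&gt;):

    import org.apache.spark.api.java.JavaDoubleRDD;

    double averageScore = sparkGraph.calculateScore(testData, true);           // one averaged score for the whole RDD
    JavaDoubleRDD perExampleScores = sparkGraph.scoreExamples(testData, true); // one score per example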

calculateScoreMultiDataSet

public double calculateScoreMultiDataSet(JavaRDD<MultiDataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD&lt;MultiDataSet&gt;, either by summing or averaging over the entire data set. Uses the default minibatch size in each worker, SparkComputationGraph.DEFAULT_EVAL_SCORE_BATCH_SIZE

calculateScoreMultiDataSet

public double calculateScoreMultiDataSet(JavaRDD<MultiDataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD&lt;MultiDataSet&gt;, either by summing or averaging over the entire data set.

scoreExamples

public JavaDoubleRDD scoreExamples(JavaRDD<DataSet> data, boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaRDD, boolean)

scoreExamples

public JavaDoubleRDD scoreExamples(JavaRDD<DataSet> data, boolean includeRegularizationTerms, int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int)

scoreExamplesMultiDataSet

public JavaDoubleRDD scoreExamplesMultiDataSet(JavaRDD<MultiDataSet> data, boolean includeRegularizationTerms)

MultiDataSet version of scoreExamples(JavaRDD, boolean)

scoreExamplesMultiDataSet

public JavaDoubleRDD scoreExamplesMultiDataSet(JavaRDD<MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

evaluate

public Evaluation evaluate(String path, DataSetLoader loader)

Evaluate the single-output network on a directory containing a set of DataSet objects to be loaded with a DataSetLoader. Uses the default batch size of DEFAULT_EVAL_SCORE_BATCH_SIZE

evaluate

public Evaluation evaluate(String path, MultiDataSetLoader loader)

Evaluate the single-output network on a directory containing a set of MultiDataSet objects to be loaded with a MultiDataSetLoader. Uses the default batch size of DEFAULT_EVAL_SCORE_BATCH_SIZE

evaluateROCMDS

public ROC evaluateROCMDS(JavaRDD<MultiDataSet> data)

RDD&lt;MultiDataSet&gt; overload of evaluate(JavaRDD)

SparkDl4jMultiLayer

[source]

Main class for training MultiLayerNetwork networks using Spark. Also used for performing distributed evaluation and inference on these networks.

getSparkContext

public JavaSparkContext getSparkContext()

Returns the JavaSparkContext associated with this instance.

getNetwork

public MultiLayerNetwork getNetwork()

getTrainingMaster

public TrainingMaster getTrainingMaster()

setNetwork

public void setNetwork(MultiLayerNetwork network)

Set the network that underlies this SparkDl4jMultiLayer instance

getDefaultEvaluationWorkers

public int getDefaultEvaluationWorkers()

Returns the currently set default number of evaluation workers/threads. Note that when the number of workers is provided explicitly in an evaluation method, the default value is not used. In many cases, we may want this to be smaller than the number of Spark threads, to reduce memory requirements. For example, with 32 Spark threads and a large network, we don't want to spin up 32 instances of the network to perform evaluation; it is better (for memory requirements and reduced cache thrashing) to use, say, 4 workers. If it is not set explicitly, DEFAULT_EVAL_WORKERS will be used

setDefaultEvaluationWorkers

public void setDefaultEvaluationWorkers(int workers)

Set the default number of evaluation workers/threads. Note that when the number of workers is provided explicitly in an evaluation method, the default value is not used. In many cases, we may want this to be smaller than the number of Spark threads, to reduce memory requirements. For example, with 32 Spark threads and a large network, we don't want to spin up 32 instances of the network to perform evaluation; it is better (for memory requirements and reduced cache thrashing) to use, say, 4 workers. If it is not set explicitly, DEFAULT_EVAL_WORKERS will be used

setCollectTrainingStats

public void setCollectTrainingStats(boolean collectTrainingStats)

Set whether training statistics should be collected for debugging purposes. Statistics collection is disabled by default.

getSparkTrainingStats

public SparkTrainingStats getSparkTrainingStats()

Get the training statistics, after collection of stats has been enabled using setCollectTrainingStats(boolean)

predict

public Matrix predict(Matrix features)

Predict the given feature matrix

predict

public Vector predict(Vector point)

Predict the given vector

fit

public MultiLayerNetwork fit(RDD<DataSet> trainingData)

Fit the DataSet RDD. Equivalent to fit(trainingData.toJavaRDD())

fit

public MultiLayerNetwork fit(JavaRDD<DataSet> trainingData)

Fit the DataSet RDD

fit

public MultiLayerNetwork fit(String path)

Fit the SparkDl4jMultiLayer network using a directory of serialized DataSet objects. The assumption here is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream)

fit

public MultiLayerNetwork fit(String path, int minPartitions)

fitPaths

public MultiLayerNetwork fitPaths(JavaRDD<String> paths)

Fit the network using a list of paths for serialized DataSet objects.

fitLabeledPoint

public MultiLayerNetwork fitLabeledPoint(JavaRDD<LabeledPoint> rdd)

Fit a MultiLayerNetwork using Spark MLLib LabeledPoint instances. This will convert the labeled points to the internal DL4J data format and train the model on that

fitContinuousLabeledPoint

public MultiLayerNetwork fitContinuousLabeledPoint(JavaRDD<LabeledPoint> rdd)

Fits a MultiLayerNetwork using Spark MLLib LabeledPoint instances. This will convert labeled points that have continuous labels (as used for regression) to the internal DL4J data format and train the model on that
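A minimal sketch using MLLib's LabeledPoint (the feature values and label are arbitrary placeholders; sc is assumed to be the JavaSparkContext and sparkNet the SparkDl4jMultiLayer):

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

    LabeledPoint lp = new LabeledPoint(1.0, Vectors.dense(0.2, 0.7, 0.1));
    JavaRDD<LabeledPoint> rdd = sc.parallelize(Arrays.asList(lp));

    MultiLayerNetwork classifier = sparkNet.fitLabeledPoint(rdd);           // class-index labels
    MultiLayerNetwork regressor  = sparkNet.fitContinuousLabeledPoint(rdd); // continuous (regression) labels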

getScore

public double getScore()

Gets the last (average) minibatch score from calling fit. This is the average score across all executors for the last minibatch executed in each worker

calculateScore

public double calculateScore(RDD<DataSet> data, boolean average)

Overload of calculateScore(JavaRDD, boolean) for RDD&lt;DataSet&gt; instead of JavaRDD&lt;DataSet&gt;

calculateScore

public double calculateScore(JavaRDD<DataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD&lt;DataSet&gt;, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods. Uses the default minibatch size in each worker, SparkDl4jMultiLayer.DEFAULT_EVAL_SCORE_BATCH_SIZE

calculateScore

public double calculateScore(JavaRDD<DataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD&lt;DataSet&gt;, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods

scoreExamples

public JavaDoubleRDD scoreExamples(RDD<DataSet> data, boolean includeRegularizationTerms)

RDD&lt;DataSet&gt; overload of scoreExamples(JavaPairRDD, boolean)

scoreExamples

public JavaDoubleRDD scoreExamples(JavaRDD<DataSet> data, boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

scoreExamples

public JavaDoubleRDD scoreExamples(RDD<DataSet> data, boolean includeRegularizationTerms, int batchSize)

RDD&lt;DataSet&gt; overload of scoreExamples(JavaRDD, boolean, int)

scoreExamples

public JavaDoubleRDD scoreExamples(JavaRDD<DataSet> data, boolean includeRegularizationTerms, int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

ParameterAveragingTrainingMaster

[source]

An implementation of the TrainingMaster interface for training networks on Spark. This is standard parameter averaging with a configurable averaging period.
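A typical construction, following the examples in the DL4J Spark documentation (the batch size and averaging frequency are illustrative values to be tuned):

    import org.deeplearning4j.spark.api.TrainingMaster;
    import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

    int batchSizePerWorker = 32;
    TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(batchSizePerWorker)
            .averagingFrequency(5)           // average worker parameters every 5 minibatches
            .workerPrefetchNumBatches(2)     // asynchronously prefetch 2 minibatches per worker
            .batchSizePerWorker(batchSizePerWorker)
            .build();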

removeHook

public void removeHook(TrainingHook trainingHook)

Remove a training hook from the master

addHook

public void addHook(TrainingHook trainingHook)

Add a hook to the master for pre- and post-training

fromJson

public static ParameterAveragingTrainingMaster fromJson(String jsonStr)

Create a ParameterAveragingTrainingMaster instance by deserializing a JSON string that has been serialized with toJson()

fromYaml

public static ParameterAveragingTrainingMaster fromYaml(String yamlStr)

Create a ParameterAveragingTrainingMaster instance by deserializing a YAML string that has been serialized with toYaml()

trainingHooks

public Builder trainingHooks(Collection<TrainingHook> trainingHooks)

Adds training hooks to the master. The training master will set up the workers with the desired hooks for training. This can allow for things like parameter servers and async updates, as well as collecting statistics.

trainingHooks

public Builder trainingHooks(TrainingHook... hooks)

Adds training hooks to the master. The training master will set up the workers with the desired hooks for training. This can allow for things like parameter servers and async updates, as well as collecting statistics.

batchSizePerWorker

public Builder batchSizePerWorker(int batchSizePerWorker)

Same as Builder(Integer, int) but automatically sets the number of workers based on JavaSparkContext.defaultParallelism()

averagingFrequency

public Builder averagingFrequency(int averagingFrequency)

Frequency with which to average worker parameters. Note: values that are too high or too low can be problematic for different reasons - averaging too infrequently can allow worker parameters to diverge (hurting convergence), while averaging too frequently adds communication and synchronization overhead.

aggregationDepth

public Builder aggregationDepth(int aggregationDepth)

The number of levels in the aggregation tree for parameter synchronization (default: 2). Note: for large models trained with many partitions, increasing this number will reduce the load on the driver and help prevent it from becoming a bottleneck.

workerPrefetchNumBatches

public Builder workerPrefetchNumBatches(int prefetchNumBatches)

Set the number of minibatches to asynchronously prefetch in the worker.

Default: 0 (no prefetching)

saveUpdater

public Builder saveUpdater(boolean saveUpdater)

Set whether the updater (i.e., the historical state for momentum, AdaGrad, etc.) should be saved. NOTE: This can double (or more) the amount of network traffic in each direction, but might improve network training performance (and can be more stable for certain updaters such as AdaGrad).

This is enabled by default.

repartionData

public Builder repartionData(Repartition repartition)

Set if/when repartitioning should be conducted for the training data. Default value: always repartition (if required to guarantee correct number of partitions and correct number of examples in each partition).

repartitionStrategy

public Builder repartitionStrategy(RepartitionStrategy repartitionStrategy)

Used in conjunction with repartionData(Repartition) (which defines when repartitioning should be conducted), repartitionStrategy defines how the repartitioning should be done. See RepartitionStrategy for details

storageLevel

public Builder storageLevel(StorageLevel storageLevel)

Set the storage level for RDD&lt;DataSet&gt;s. Default: StorageLevel.MEMORY_ONLY_SER() - i.e., store in memory, in serialized form. To use no RDD persistence, use null

Note: Spark’s StorageLevel.MEMORY_ONLY() and StorageLevel.MEMORY_AND_DISK() can be problematic when it comes to off-heap data (which DL4J/ND4J uses extensively). Spark does not account for off-heap memory when deciding if/when to drop blocks to ensure enough free memory; consequently, for DataSet RDDs that are larger than the total amount of (off-heap) memory, this can lead to OOM issues. Put another way: Spark counts the on-heap size of DataSet and INDArray objects only (which is negligible) resulting in a significant underestimate of the true DataSet object sizes. More DataSets are thus kept in memory than we can really afford.

storageLevelStreams

public Builder storageLevelStreams(StorageLevel storageLevelStreams)

Set the storage level for RDDs used when fitting data from streams: either PortableDataStreams (sc.binaryFiles, via SparkDl4jMultiLayer.fit(String) and SparkComputationGraph.fit(String)) or String paths (via SparkDl4jMultiLayer.fitPaths(JavaRDD), SparkComputationGraph.fitPaths(JavaRDD) and SparkComputationGraph.fitPathsMultiDataSet(JavaRDD)).

Default storage level is StorageLevel.MEMORY_ONLY() which should be appropriate in most cases.

rddTrainingApproach

public Builder rddTrainingApproach(RDDTrainingApproach rddTrainingApproach)

The approach to use when training on an RDD&lt;DataSet&gt; or RDD&lt;MultiDataSet&gt;. Default: RDDTrainingApproach.Export, which exports data to a temporary directory first

exportDirectory

public Builder exportDirectory(String exportDirectory)

When rddTrainingApproach(RDDTrainingApproach) is set to RDDTrainingApproach.Export (as it is by default) the data is exported to a temporary directory first.

Default: null, which means {hadoop.tmp.dir}/dl4j/ is used; in this case, data is exported to {hadoop.tmp.dir}/dl4j/SOME_UNIQUE_ID/. If you specify a directory, {exportDirectory}/SOME_UNIQUE_ID/ will be used instead.

rngSeed

public Builder rngSeed(long rngSeed)

Random number generator seed, used mainly for enforcing repeatable splitting on RDDs. Default: no seed set (i.e., random seed)

collectTrainingStats

public Builder collectTrainingStats(boolean collectTrainingStats)

Whether training stats collection should be enabled (disabled by default).
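For example (a hedged sketch: sparkNet is assumed to be a SparkDl4jMultiLayer, trainingData a JavaRDD&lt;DataSet&gt;, and sc the JavaSparkContext; package locations for the stats classes may vary by version):

    import org.deeplearning4j.spark.api.stats.SparkTrainingStats;
    import org.deeplearning4j.spark.stats.StatsUtils;

    sparkNet.setCollectTrainingStats(true);      // or .collectTrainingStats(true) on the Builder
    sparkNet.fit(trainingData);

    SparkTrainingStats stats = sparkNet.getSparkTrainingStats();
    StatsUtils.exportStatsAsHtml(stats, "SparkStats.html", sc);   // writes an HTML timing/load report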

