All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, HasAggregationDepth, HasElasticNetParam, HasFeaturesCol, HasFitIntercept, HasLabelCol, HasLoss, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasRegParam, HasSolver, HasStandardization, HasTol, HasWeightCol, PredictorParams, LinearRegressionParams, DefaultParamsWritable, Identifiable, MLWritable
Linear regression.
The learning objective is to minimize the specified loss function, with regularization. Two kinds of loss are supported:
- squaredError (a.k.a. squared loss)
- huber (a hybrid of squared error for relatively small errors and absolute error for relatively large ones; the scale parameter is estimated from the training data)

Multiple types of regularization are supported:
- none (a.k.a. ordinary least squares)
- L2 (ridge regression)
- L1 (Lasso)
- L2 + L1 (elastic net)
The squared error objective function is:
$$ \begin{align} \min_{w}\frac{1}{2n}{\sum_{i=1}^n(X_{i}w - y_{i})^{2} + \lambda\left[\frac{1-\alpha}{2}{||w||_{2}}^{2} + \alpha{||w||_{1}}\right]} \end{align} $$
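For illustration, the squared-error objective above can be evaluated directly in pure Python. This is only a sketch of the formula, not Spark's implementation; it adds the elastic-net penalty outside the 1/(2n) data-fit term, the common elastic-net convention (the exact scaling Spark applies internally may differ).

```python
def squared_error_objective(X, y, w, lam, alpha):
    """Evaluate (1/2n) * sum_i (X_i . w - y_i)^2
    plus lam * ((1 - alpha)/2 * ||w||_2^2 + alpha * ||w||_1).

    Sketch only: mirrors the displayed formula, not Spark internals.
    """
    n = len(y)
    # Sum of squared residuals over all instances.
    residual_sq = sum(
        (sum(xj * wj for xj, wj in zip(xi, w)) - yi) ** 2
        for xi, yi in zip(X, y)
    )
    l2_sq = sum(wj * wj for wj in w)   # ||w||_2^2
    l1 = sum(abs(wj) for wj in w)      # ||w||_1
    return residual_sq / (2 * n) + lam * ((1 - alpha) / 2 * l2_sq + alpha * l1)
```

With a perfect fit the data-fit term vanishes and only the penalty remains, which makes the role of lambda and alpha easy to see.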
The huber objective function is:
$$ \begin{align} \min_{w, \sigma}\frac{1}{2n}{\sum_{i=1}^n\left(\sigma + H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \frac{1}{2}\lambda {||w||_2}^2} \end{align} $$
where
$$ \begin{align} H_m(z) = \begin{cases} z^2, & \text {if } |z| < \epsilon, \\ 2\epsilon|z| - \epsilon^2, & \text{otherwise} \end{cases} \end{align} $$
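The piecewise criterion $H_m$ above can be written as a small pure-Python helper (a sketch for illustration; the default epsilon of 1.35 matches the epsilon param documented below):

```python
def huber_criterion(z, epsilon=1.35):
    """H_m(z): quadratic for |z| < epsilon, linear otherwise.

    The linear branch 2*epsilon*|z| - epsilon^2 is chosen so the two
    pieces agree at |z| = epsilon, making H_m continuous.
    """
    if abs(z) < epsilon:
        return z * z
    return 2 * epsilon * abs(z) - epsilon * epsilon
```

Small residuals are penalized quadratically (like least squares), while large residuals grow only linearly, which is what gives the huber loss its robustness to outliers.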
Since 3.1.0, it supports stacking instances into blocks and using GEMV for better performance. If the param maxBlockSizeInMB is left at its default of 0.0, a block size of 1.0 MB is used.
Note: Fitting with huber loss only supports none and L2 regularization.
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructors
Param for suggested depth for treeAggregate (>= 2).
Creates a copy of this instance with the same UID and some extra params.
Param for the ElasticNet mixing parameter, in range [0, 1].
The shape parameter to control the amount of robustness.
Param for whether to fit an intercept term.
The loss function to be optimized.
static int MAX_FEATURES_FOR_NORMAL_SOLVER(): When using LinearRegression.solver == "normal", the solver must limit the number of features to at most this number.
Param for Maximum memory in MB for stacking input data into blocks.
Param for maximum number of iterations (>= 0).
Param for regularization parameter (>= 0).
Suggested depth for treeAggregate (greater than or equal to 2).
Set the ElasticNet mixing parameter.
Set if we should fit the intercept.
Sets the value of param loss().
Set the maximum number of iterations.
Set the regularization parameter.
Set the solver algorithm used for optimization.
Whether to standardize the training features before fitting the model.
Set the convergence tolerance of iterations.
Whether to over-/under-sample training instances according to the given weights in weightCol.
The solver algorithm for optimization.
Param for whether to standardize the training features before fitting the model.
Param for the convergence tolerance for iterative algorithms (>= 0).
An immutable unique ID for the object and its derivatives.
Param for weight column name.
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
public LinearRegression()
public static int MAX_FEATURES_FOR_NORMAL_SOLVER()
When using LinearRegression.solver == "normal", the solver must limit the number of features to at most this number. The entire covariance matrix X^T X will be collected to the driver. This limit helps prevent memory overflow errors.
solver in interface HasSolver
solver in interface LinearRegressionParams
loss in interface HasLoss
loss in interface LinearRegressionParams
The shape parameter to control the amount of robustness. Must be > 1.0. At larger values of epsilon, the huber criterion becomes more similar to least squares regression; for small values of epsilon, the criterion is more similar to L1 regression. Default is 1.35 to get as much robustness as possible while retaining 95% statistical efficiency for normally distributed data. It matches sklearn HuberRegressor and is "M" from "A robust hybrid of lasso and ridge regression". Only valid when "loss" is "huber".
epsilon in interface LinearRegressionParams
Param for maximum memory in MB for stacking input data into blocks. Data is stacked within partitions. If the block size is larger than the remaining data in a partition, it is adjusted to that data size. Default 0.0 means the optimal value is chosen automatically, which depends on the specific algorithm. Must be >= 0.
maxBlockSizeInMB in interface HasMaxBlockSizeInMB
Param for suggested depth for treeAggregate (>= 2).
aggregationDepth in interface HasAggregationDepth
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
weightCol in interface HasWeightCol
Param for whether to standardize the training features before fitting the model.
standardization in interface HasStandardization
Param for whether to fit an intercept term.
fitIntercept in interface HasFitIntercept
Param for the convergence tolerance for iterative algorithms (>= 0).
tol in interface HasTol
Param for maximum number of iterations (>= 0).
maxIter in interface HasMaxIter
Param for the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.
elasticNetParam in interface HasElasticNetParam
Param for regularization parameter (>= 0).
regParam in interface HasRegParam
An immutable unique ID for the object and its derivatives.
uid in interface Identifiable
value - (undocumented)
value - (undocumented)
value - (undocumented)
Set the ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty. For alpha in (0,1), the penalty is a combination of L1 and L2. Default is 0.0 which is an L2 penalty.
Note: Fitting with huber loss only supports none and L2 regularization, so an exception is thrown if this param is set to a non-zero value.
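To see how the mixing parameter interpolates between the two penalties, here is a pure-Python sketch of the elastic-net penalty term (illustrative only, not Spark code):

```python
def elastic_net_penalty(w, reg_param, alpha):
    """reg_param * ((1 - alpha)/2 * ||w||_2^2 + alpha * ||w||_1).

    alpha = 0 gives a pure (halved) L2 penalty; alpha = 1 gives pure L1;
    values in between blend the two.
    """
    l2_sq = sum(wj * wj for wj in w)   # ||w||_2^2
    l1 = sum(abs(wj) for wj in w)      # ||w||_1
    return reg_param * ((1 - alpha) / 2 * l2_sq + alpha * l1)
```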
value - (undocumented)
value - (undocumented)
value - (undocumented)
value - (undocumented)
Set the solver algorithm used for optimization. For linear regression, this can be "l-bfgs", "normal", or "auto".
- "l-bfgs" denotes Limited-memory BFGS, a limited-memory quasi-Newton optimization method.
- "normal" denotes using the Normal Equation as an analytical solution to the linear regression problem. This solver is limited to LinearRegression.MAX_FEATURES_FOR_NORMAL_SOLVER features.
- "auto" (default) means that the solver algorithm is selected automatically. The Normal Equations solver will be used when possible, but it will automatically fall back to iterative optimization methods when needed.
Note: Fitting with huber loss doesn't support the normal solver, so an exception is thrown if this param is set to "normal".
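The selection rules described above can be sketched as a small pure-Python helper. This is a hypothetical illustration of the documented behavior, not Spark's internals; the value 4096 is an assumption about MAX_FEATURES_FOR_NORMAL_SOLVER, not taken from this page.

```python
MAX_FEATURES_FOR_NORMAL_SOLVER = 4096  # assumed value of the Spark constant

def choose_solver(solver, loss, num_features):
    """Sketch of the documented solver-selection rules (not Spark code)."""
    if loss == "huber":
        # Huber loss does not support the normal solver.
        if solver == "normal":
            raise ValueError('huber loss does not support solver == "normal"')
        return "l-bfgs"
    if solver != "auto":
        return solver
    # "auto": prefer the analytic Normal Equations solver when applicable,
    # otherwise fall back to iterative optimization.
    if num_features <= MAX_FEATURES_FOR_NORMAL_SOLVER:
        return "normal"
    return "l-bfgs"
```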
value - (undocumented)
value - (undocumented)
Sets the value of param loss(). Default is "squaredError".
value - (undocumented)
Sets the value of param epsilon(). Default is 1.35.
value - (undocumented)
value - (undocumented)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
copy in interface Params
copy in class Predictor<Vector,LinearRegression,LinearRegressionModel>
extra - (undocumented)