Memory leak when using the Adam optimizer with the MXNet Scala bindings. Running the code below consumes more and more memory until the process runs out.
Steps to Reproduce

    import org.apache.mxnet._
    import org.apache.mxnet.io.NDArrayIter
    import org.apache.mxnet.optimizer.Adam

    // Simple MLP network
    def mlpNetwork(): Symbol = {
      val input = Symbol.Variable("data")
      val label = Symbol.Variable("label")
      val fc1 = Symbol.FullyConnected(name = "fc1")()(Map("data" -> input, "num_hidden" -> 128))
      val act1 = Symbol.Activation(name = "relu")()(Map("data" -> fc1, "act_type" -> "relu"))
      val fc2 = Symbol.FullyConnected(name = "fc2")()(Map("data" -> act1, "num_hidden" -> 1))
      val loss = Symbol.LinearRegressionOutput(name = "loss")()(Map("data" -> fc2, "label" -> label))
      loss
    }

    // Dummy all-zero data set: 100 samples, batch size 10
    def getNDArrayIter(): NDArrayIter = {
      val f = NDArray.zeros(100, 20, 20)
      val l = NDArray.zeros(100, 1)
      val data = Array(f)
      val labels = Array(l)
      val batchSize = 10
      val iter = new NDArrayIter(data, labels, batchSize)
      iter
    }

    val net = mlpNetwork()
    val iter = getNDArrayIter()
    val optimizer = new Adam(0.001f, 0.9f, 0.999f, 1e-8f, 1 - 1e-8f, 0f, 10f, null)
    val init = new Normal(0.01f)
    val model = FeedForward.newBuilder(net)
      .setContext(Array(Context.gpu(0)))
      .setInitializer(init)
      .setNumEpoch(100000)
      .setOptimizer(optimizer)
      .setTrainData(iter)
      .setEvalData(iter)
      .build()

Issue
I think the issue is that some temporary NDArrays are not being disposed in the Adam optimizer when disposeDepsExcept is used.
The leak occurs in the three places where disposeDepsExcept is called in Adam's update method.
As a workaround, replace the three lines that use disposeDepsExcept in the update method of Adam.scala with code that explicitly disposes the temporary NDArrays it creates.
Instead of the following three lines in Adam.scala:
    val meanT = (beta1t * mean + (1.0 - beta1t) * resdGrad)
      .disposeDepsExcept(mean, resdGrad)
    val varianceT = (beta2 * variance + (1.0f - beta2) * resdGrad * resdGrad)
      .disposeDepsExcept(variance, resdGrad)
    val step = (learningRate * meanT / (NDArray.sqrt(varianceT) + epsilon))
      .disposeDepsExcept(meanT, varianceT)
Replace them with:
    val beta1Mean = beta1t * mean
    val beta1ResGrad = (1.0 - beta1t) * resdGrad
    val meanT = beta1Mean + beta1ResGrad
    // dispose temp NDArrays
    beta1Mean.dispose()
    beta1ResGrad.dispose()

    val beta2Variance = beta2 * variance
    val beta2ResGrad = (1.0f - beta2) * resdGrad
    val beta2ResGradSquare = beta2ResGrad * resdGrad
    val varianceT = beta2Variance + beta2ResGradSquare
    // dispose temp NDArrays
    beta2Variance.dispose()
    beta2ResGrad.dispose()
    beta2ResGradSquare.dispose()

    val lrMeanT = learningRate * meanT
    val sqrtVar = NDArray.sqrt(varianceT)
    val sqrtVarPlusEpsilon = sqrtVar + epsilon
    val step = lrMeanT / sqrtVarPlusEpsilon
    // dispose temp NDArrays
    lrMeanT.dispose()
    sqrtVar.dispose()
    sqrtVarPlusEpsilon.dispose()
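For reference, the original three lines and the explicit-dispose version are meant to compute the same three quantities. The symbols here are chosen only for illustration: $g$ stands for resdGrad, $m$ for mean, $v$ for variance, $\beta_{1,t}$ for beta1t, $\beta_2$ for beta2, $\eta$ for learningRate and $\epsilon$ for epsilon, with all operations taken elementwise:

$$
\begin{aligned}
m_t &= \beta_{1,t}\, m + (1 - \beta_{1,t})\, g \\
v_t &= \beta_2\, v + (1 - \beta_2)\, g \odot g \\
\text{step} &= \frac{\eta\, m_t}{\sqrt{v_t} + \epsilon}
\end{aligned}
$$

Each product, sum and square root on the right-hand side corresponds to an intermediate NDArray in the Scala code, which is why the explicit version names every one of them before disposing it.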
The changes above fix the leak for now, but for some reason disposeDepsExcept is not doing its job in this case.
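To illustrate why the unnamed intermediates matter, here is a minimal, self-contained sketch in plain Scala. It does not use MXNet: `Arr`, `Live` and `LeakDemo` are hypothetical stand-ins invented for this example, and the sketch makes no claim about what disposeDepsExcept does internally. It only counts how many handles stay alive when an expression is chained versus when each temporary is named and disposed.

```scala
import scala.collection.mutable

// Hypothetical registry of handles that were created and not yet disposed.
object Live {
  val handles: mutable.Set[Int] = mutable.Set.empty[Int]
  private var nextId = 0
  def register(): Int = {
    nextId += 1
    handles += nextId
    nextId
  }
}

// Hypothetical stand-in for a natively backed array (NOT MXNet's NDArray).
// Every arithmetic operation allocates a new handle that must be disposed.
final class Arr(val value: Float) {
  private val id: Int = Live.register()
  def *(scalar: Float): Arr = new Arr(value * scalar)
  def +(that: Arr): Arr = new Arr(value + that.value)
  def dispose(): Unit = { Live.handles -= id; () }
}

object LeakDemo {
  // Chained form: the two products are never bound to a name,
  // so no caller can dispose them afterwards.
  def chained(beta: Float, mean: Arr, grad: Arr): Arr =
    mean * beta + grad * (1f - beta)

  // Explicit form: every temporary is named and disposed once it is used.
  def explicit(beta: Float, mean: Arr, grad: Arr): Arr = {
    val scaledMean = mean * beta
    val scaledGrad = grad * (1f - beta)
    val result = scaledMean + scaledGrad
    scaledMean.dispose()
    scaledGrad.dispose()
    result
  }

  // Number of handles that are still alive after evaluating `body`.
  def newLiveHandles(body: => Arr): Int = {
    val before = Live.handles.size
    val _ = body
    Live.handles.size - before
  }

  def main(args: Array[String]): Unit = {
    val mean = new Arr(1f)
    val grad = new Arr(2f)
    // chained: two products plus the result stay alive -> 3
    println(s"chained:  ${newLiveHandles(chained(0.9f, mean, grad))} new live handles")
    // explicit: only the result stays alive -> 1
    println(s"explicit: ${newLiveHandles(explicit(0.9f, mean, grad))} new live handles")
  }
}
```

In this toy model the caller owns the returned handle in both cases; the only difference is the two unnamed products, which accumulate on every call of the chained form unless some other mechanism releases them.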
----------Python Info----------
Version : 3.7.1
Compiler : GCC 7.3.0
Build : ('default', 'Dec 14 2018 19:28:38')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.1
Directory : /home/satya/anaconda3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.3.1
Directory : /home/satya/Documents/workspace/mxnet_1.3.x/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
system : Linux
node : DS5
release : 4.4.0-141-generic
version : #167-Ubuntu SMP Wed Dec 5 10:40:15 UTC 2018
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0405 sec, LOAD: 0.6186 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1403 sec, LOAD: 0.4726 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2418 sec, LOAD: 0.4049 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0445 sec, LOAD: 0.1894 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0779 sec, LOAD: 0.2447 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0409 sec, LOAD: 0.0746 sec.
Package used (Python/R/Scala/Julia): Scala
For Scala user, please provide:
Compiler (gcc/clang/mingw/visual studio): gcc
MXNet commit hash: 96b4b6e