Showing content from https://doi.org/10.1007/BF00992697 below:

Practical issues in temporal difference learning

References

Anderson, C.W. (1987). Strategy learning with multilayer connectionist representations. Proceedings of the Fourth International Workshop on Machine Learning (pp. 103–114).
Barto, A.G., Sutton, R.S., & Anderson, C.W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13, 835–846.

Berliner, H. (1977). Experiences in evaluation with BKG—a program that plays backgammon. Proceedings of IJCAI (pp. 428–433).
Berliner, H. (1979). On the construction of evaluation functions for large domains. Proceedings of IJCAI (pp. 53–55).
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. JACM, 36, 929–965.

Christensen, J. & Korf, R. (1986). A unified theory of heuristic evaluation functions and its application to learning. Proceedings of AAAI-86 (pp. 148–152).
Dayan, P. (1992). The convergence of TD(λ). Machine Learning, 8, 341–362.

Frey, P.W. (1986). Algorithmic strategies for improving the performance of game playing programs. In: D. Farmer, et al. (Eds.), Evolution, games and learning. Amsterdam: North Holland.

Griffith, A.K. (1974). A comparison and evaluation of three machine learning procedures as applied to the game of checkers. Artificial Intelligence, 5, 137–148.

Holland, J.H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In: R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.

Lee, K.-F. & Mahajan, S. (1988). A pattern classification approach to evaluation function learning. Artificial Intelligence, 36, 1–25.

Magriel, P. (1976). Backgammon. New York: Times Books.

Minsky, M.L. & Papert, S.A. (1969). Perceptrons. Cambridge, MA: MIT Press. (Republished as an expanded edition in 1988).

Mitchell, D.H. (1984). Using features to evaluate positions in experts' and novices' Othello games. Master's Thesis, Northwestern Univ., Evanston, IL.

Quinlan, J.R. (1983). Learning efficient classification procedures and their application to chess end games. In: R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning. Palo Alto, CA: Tioga.

Robbins, H. & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22, 400–407.

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In: D. Rumelhart & J. McClelland (Eds.), Parallel distributed processing. Vol. 1. Cambridge, MA: MIT Press.

Samuel, A. (1959). Some studies in machine learning using the game of checkers. IBM J. of Research and Development, 3, 210–229.

Samuel, A. (1967). Some studies in machine learning using the game of checkers, II—recent progress. IBM J. of Research and Development, 11, 601–617.

Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. Doctoral Dissertation, Dept. of Computer and Information Science, Univ. of Massachusetts, Amherst.

Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.

Tesauro, G. & Sejnowski, T.J. (1989). A parallel network that learns to play backgammon. Artificial Intelligence, 39, 357–390.

Tesauro, G. (1989). Connectionist learning of expert preferences by comparison training. In: D. Touretzky (Ed.), Advances in neural information processing, 1, 99–106.

Tesauro, G. (1990). Neurogammon: a neural network backgammon program. IJCNN Proceedings III, 33–39.

Utgoff, P.E. & Clouse, J.A. (1991). Two kinds of training information for evaluation function training. To appear in: Proceedings of AAAI-91.
Vapnik, V.N. & Chervonenkis, A.Ya. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. Appl., 16, 264–280.

Widrow, B., et al. (1976). Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE, 64, 1151–1162.

Zadeh, N. & Kobliska, G. (1977). On optimal doubling in backgammon. Management Science, 23, 853–858.
