Deep Reinforcement Learning Hands-On (photo-reprint edition, in English)

About the Book

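Recent developments in reinforcement learning (RL), combined with deep learning (DL), have brought unprecedented progress in training agents to solve complex problems in a human-like way. Google's use of algorithms to win at the well-known Atari arcade games pushed the field to prominence, and researchers continue to produce a steady stream of new ideas.
Deep Reinforcement Learning Hands-On (photo-reprint edition, in English) introduces the basics of RL and gives you the principles needed to code intelligent learning agents that can take on a range of demanding practical tasks. You will learn how to implement Q-learning in "grid world" environments, teach your agent to buy and trade stocks, and find out how natural language models have driven the boom in chatbots.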

About the Author

Maxim Lapan is a deep learning enthusiast and independent researcher. His background and 15 years of work experience as a software developer and systems architect range from low-level Linux kernel driver development to performance optimization and the design of distributed applications running on thousands of servers. With extensive work experience in big data, machine learning, and large parallel distributed HPC and non-HPC systems, he has a talent for explaining the gist of complicated things in simple words and vivid examples. His current areas of interest lie in practical applications of deep learning, such as deep natural language processing and deep reinforcement learning.
Maxim lives in Moscow, Russian Federation, with his family, and he works for an Israeli start-up as a Senior NLP developer.

Preface
Chapter 1: What is Reinforcement Learning?
Learning - supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary

Chapter 2: OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality - wrappers and monitors
Wrappers
Monitor
Summary

Chapter 3: Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue - loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example - GAN on Atari images
Summary

Chapter 4: The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary

Chapter 5: Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary

Chapter 6: Deep Q-Networks
Chapter 7: DQN Extensions
Chapter 8: Stocks Trading Using RL
Chapter 9: Policy Gradients - An Alternative
Chapter 10: The Actor-Critic Method
Chapter 11: Asynchronous Advantage Actor-Critic
Chapter 12: Chatbots Training with RL
Chapter 13: Web Navigation
Chapter 14: Continuous Action Space
Chapter 15: Trust Regions - TRPO, PPO, and ACKTR
Chapter 16: Black-Box Optimization in RL
Chapter 17: Beyond Model-Free - Imagination
Chapter 18: AlphaGo Zero
Other Books You May Enjoy
Index