About the Book

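  Recent developments in reinforcement learning (RL), combined with deep learning (DL), have brought unprecedented progress in training agents to solve complex problems in a humanlike way. Google's use of algorithms to beat the well-known Atari arcade games pushed the field to prominence, and researchers continue to produce a steady stream of new ideas.
  Deep Reinforcement Learning Hands-On (English-language reprint edition) introduces the fundamentals of RL and gives you the principles needed to code intelligent learning agents that take on a range of demanding practical tasks. You will learn how to implement Q-learning in "grid world" environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.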

About the Author

  Maxim Lapan is a deep learning enthusiast and independent researcher. His background and 15 years of work experience as a software developer and systems architect range from low-level Linux kernel driver development to performance optimization and the design of distributed applications running on thousands of servers. With extensive experience in big data, machine learning, and large parallel distributed HPC and non-HPC systems, he has a talent for explaining the gist of complicated things in simple words and vivid examples. His current areas of interest lie in practical applications of deep learning, such as deep natural language processing and deep reinforcement learning.
  Maxim lives in Moscow, Russian Federation, with his family, and he works for an Israeli start-up as a Senior NLP developer.

Table of Contents

Preface
Chapter 1: What is Reinforcement Learning?
Learning - supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary

Chapter 2: OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality - wrappers and monitors
Wrappers
Monitor
Summary

Chapter 3: Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue - loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example - GAN on Atari images
Summary

Chapter 4: The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary

Chapter 5: Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary

Chapter 6: Deep Q-Networks
Chapter 7: DQN Extensions
Chapter 8: Stocks Trading Using RL
Chapter 9: Policy Gradients - An Alternative
Chapter 10: The Actor-Critic Method
Chapter 11: Asynchronous Advantage Actor-Critic
Chapter 12: Chatbots Training with RL
Chapter 13: Web Navigation
Chapter 14: Continuous Action Space
Chapter 15: Trust Regions - TRPO, PPO, and ACKTR
Chapter 16: Black-Box Optimization in RL
Chapter 17: Beyond Model-Free - Imagination
Chapter 18: AlphaGo Zero
Other Books You May Enjoy
Index
