Python Reinforcement Learning Practice: Landing a Lunar Lander with the PPO Algorithm in TensorFlow 2

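Overview

Starting today we open a new chapter and work through (get swept up in) reinforcement learning (RL) together. In reinforcement learning, an agent observes its environment, chooses actions based on what it observes, and tries to maximize its future reward.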


Types of Reinforcement Learning Algorithms


On-policy vs Off-policy:

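On-policy: the training data is produced by the agent currently being trained as it keeps interacting with the environment.
Off-policy: the agent being trained and the agent interacting with the environment are not the same agent; in other words, someone else interacts with the environment and supplies the training data.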
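The difference shows up clearly in how collected experience is stored. The following is a minimal sketch, not code from this article: `OnPolicyBuffer` and `ReplayBuffer` are hypothetical names chosen for illustration. An on-policy buffer is consumed once by the policy that filled it and then emptied, while an off-policy replay buffer keeps old transitions (possibly produced by other policies) and samples from them repeatedly.

```python
import random
from collections import deque


class OnPolicyBuffer:
    """Rollout storage: used once by the policy that collected it, then emptied."""

    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def drain(self):
        # Hand over everything collected so far and start from scratch.
        batch, self.transitions = self.transitions, []
        return batch


class ReplayBuffer:
    """Off-policy storage: old transitions stay available for later updates."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.transitions = deque(maxlen=capacity)

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.transitions), batch_size)
```

The `Memory` class used later in this article behaves like the first kind: it is cleared after every policy update.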

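The PPO Algorithm

PPO stands for Proximal Policy Optimization. PPO is an on-policy algorithm. It updates the policy in small steps over mini-batches, which addresses the problem that too large a change between the old and new policies during training makes learning difficult.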
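One common way PPO keeps the new policy close to the old one is the clipped surrogate objective. The sketch below computes it for a single sample in plain Python. The function name `clipped_surrogate` is an assumption made for illustration; `eps_clip` mirrors the hyperparameter of the same name used in the training script later in this article.

```python
import math


def clipped_surrogate(new_logprob, old_logprob, advantage, eps_clip=0.2):
    """PPO clipped objective for one (state, action) sample."""
    # Probability ratio between the new and the old policy.
    ratio = math.exp(new_logprob - old_logprob)
    # Keep the ratio inside [1 - eps_clip, 1 + eps_clip].
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Take the more pessimistic of the two estimates.
    return min(ratio * advantage, clipped_ratio * advantage)


# The new policy is twice as likely to pick a good action (advantage > 0):
# the gain is capped at (1 + eps_clip) * advantage.
print(clipped_surrogate(math.log(2.0), 0.0, advantage=1.0))
```

With a positive advantage the objective stops rewarding the policy once the ratio exceeds `1 + eps_clip`; with a negative advantage the unclipped (worse) value is kept, so the penalty is never hidden.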


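The Actor-Critic Algorithm

An Actor-Critic algorithm has two parts. The first is the policy function, the Actor, which generates actions and interacts with the environment. The second is the value function, the Critic, which evaluates how well the Actor is doing.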
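As a rough illustration of that split, the sketch below uses plain Python instead of neural networks. The helper names `actor_sample` and `advantage` are hypothetical: the Actor turns action probabilities into a concrete action by sampling, and the Critic's value estimate is compared with the return that was actually observed.

```python
import random


def actor_sample(action_probs, rng):
    """Actor: draw one action index according to the policy's probabilities."""
    return rng.choices(range(len(action_probs)), weights=action_probs, k=1)[0]


def advantage(observed_return, critic_value):
    """Critic feedback: how much better the outcome was than the Critic expected."""
    return observed_return - critic_value


rng = random.Random(0)  # seeded only to make the sketch repeatable
action = actor_sample([0.1, 0.2, 0.3, 0.4], rng)
print(action, advantage(observed_return=1.5, critic_value=1.0))
```

A positive advantage means the chosen action did better than expected, so the Actor should make it more likely; a negative one pushes the other way.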


Gym

Gym is a package used all the time in reinforcement learning. It bundles many game environments. Below we use LunarLander-v2 to build an automated "Apollo moon landing".


Installation:

pip install gym

If you run into this error:

AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'

Fix:

pip install gym[box2d]

LunarLander-v2

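LunarLander-v2 is a lunar lander environment. The landing pad is at coordinates (0, 0), and the coordinates are the first two numbers of the state vector. Moving from the top of the screen to the landing pad and reaching zero speed earns roughly 100 to 140 points. The episode ends when the lander crashes or comes to rest, which yields an additional -100 or +100 points respectively. Each leg in contact with the ground is worth +10, firing the main engine costs 0.3 points per frame, and the environment counts as solved at 200 points.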
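To get a feel for how these numbers add up, here is a simplified tally that uses only the figures quoted above. The function `episode_score` and its parameters are hypothetical; the real environment computes its reward step by step with additional shaping that this sketch does not model.

```python
def episode_score(approach_reward, landed, legs_down, main_engine_frames):
    """Rough episode total built from the reward figures quoted in the text."""
    terminal_bonus = 100 if landed else -100   # came to rest vs. crashed
    leg_bonus = 10 * legs_down                 # +10 per leg touching the ground
    fuel_cost = 0.3 * main_engine_frames       # -0.3 per frame of main engine
    return approach_reward + terminal_bonus + leg_bonus - fuel_cost


# A clean landing: 120 points for the approach, both legs down,
# main engine fired for 100 frames.
print(round(episode_score(120, True, 2, 100), 1))

# The same flight ending in a crash with no leg contact.
print(round(episode_score(120, False, 0, 100), 1))
```

Under these assumptions the clean landing lands just above the 200-point "solved" mark, which shows how little room there is for wasting fuel.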


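Launching the Lander

Code:

import gym

# Create the environment
env = gym.make("LunarLander-v2")

# Reset the environment
env.reset()

# Run
for i in range(180):
    # Render the environment
    env.render()

    # Take a random action
    # (classic Gym API, gym < 0.26: step() returns a 4-tuple)
    observation, reward, done, info = env.step(env.action_space.sample())

    if i % 10 == 0:
        # Debug output
        print("Observation:", observation)
        print("Reward:", reward)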

Output:

Observation: [ 0.00861025  1.4061487   0.42930993 -0.11858992 -0.00789343 -0.05729095  0.          0.        ]
Reward: 0.4097546298543773
Observation: [ 0.04917412  1.3876126   0.41002613 -0.13066985 -0.06578191 -0.12604967  0.          0.        ]
Reward: -1.0858669952763478
Observation: [ 0.08917055  1.3429415   0.43598312 -0.2890789  -0.17471936 -0.23913136  0.          0.        ]
Reward: -2.9339827504803666
Observation: [ 0.1326253   1.2450166   0.44708318 -0.5567949  -0.32039645 -0.28250334  0.          0.        ]
Reward: -2.2779730990326357
Observation: [ 0.18323365  1.1110108   0.615291   -0.61922276 -0.43743232 -0.2921057  0.          0.        ]
Reward: -3.107298313736037
Observation: [ 0.24544087  0.94960684  0.66677517 -0.7835077  -0.5929364  -0.2968613  0.          0.        ]
Reward: -0.5472611013563438
Observation: [ 0.3148238   0.75122666  0.7238519  -0.98458177 -0.72915816 -0.26130882  0.          0.        ]
Reward: -2.5665300894414416
Observation: [ 0.38628978  0.49828076  0.74157137 -1.2624744  -0.85754734 -0.37227553  0.          0.        ]
Reward: -3.2562193227533087
Observation: [ 0.46820658  0.18855602  0.92624503 -1.4677961  -1.08614    -0.4508995  0.          0.        ]
Reward: -4.017106927961208
Observation: [ 0.57930076 -0.09440845  1.4345247  -0.693939   -2.0783656  -5.4039164  1.          0.        ]
Reward: -100
Observation: [ 0.7383894  -0.08930686  1.4662493  -0.13461255 -3.653495   -3.109081  0.          0.        ]
Reward: -100
Observation: [ 0.859124   -0.08471288  0.9377837   0.21408719 -3.8998525   0.10151418  0.          0.        ]
Reward: -100
Observation: [ 9.3801367e-01 -4.6761338e-02  6.5999150e-01  1.4583524e-01 -3.9281998e+00 -4.7179851e-06  0.0000000e+00  1.0000000e+00]
Reward: -100
Observation: [ 0.9879366  -0.04012476  0.33624884  0.08859511 -4.253908   -1.0233303  0.          0.        ]
Reward: -100
Observation: [ 1.0056045  -0.03840658  0.0733737   0.01812508 -4.6796274  -0.6103991  0.          0.        ]
Reward: -100
Observation: [ 1.0112988  -0.03921754  0.07890484 -0.00624387 -4.845023   -0.17111658  0.          0.        ]
Reward: -100
Observation: [ 1.0234139  -0.04488504  0.15701209 -0.0331554  -4.829875    0.07602684  0.          0.        ]
Reward: -100
Observation: [ 1.0306002e+00 -4.8987642e-02 -1.1189224e-02  8.7506004e-04 -4.8712435e+00 -1.5446089e-01  0.0000000e+00  0.0000000e+00]
Reward: -100
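Each observation printed above is an 8-element vector. In Gym's LunarLander the entries are, in order: horizontal and vertical position, horizontal and vertical velocity, angle, angular velocity, and two flags that indicate whether the left and right leg touch the ground. The helper below is a small sketch for giving those entries names; `LanderState` and `decode_observation` are illustrative names, not part of Gym.

```python
from collections import namedtuple

LanderState = namedtuple(
    "LanderState",
    ["x", "y", "vx", "vy", "angle", "angular_velocity",
     "left_leg_contact", "right_leg_contact"],
)


def decode_observation(observation):
    """Turn the raw 8-number observation into a named record."""
    values = [float(v) for v in observation]
    if len(values) != 8:
        raise ValueError("expected 8 values, got %d" % len(values))
    # The last two entries are 0.0/1.0 flags; expose them as booleans.
    return LanderState(*values[:6], values[6] > 0.5, values[7] > 0.5)


# First observation from the output above.
first = decode_observation([0.00861025, 1.4061487, 0.42930993, -0.11858992,
                            -0.00789343, -0.05729095, 0.0, 0.0])
print(first.y, first.left_leg_contact)
```

Read this way, the log tells a short story: the height `y` shrinks from about 1.4 to below zero, one leg flag flips to 1 at the moment of impact, and the reward drops to -100.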

Landing the Lunar Lander with PPO

PPO

import tensorflow as tf
from tensorflow_probability.python.distributions import Categorical


class Memory:
    def __init__(self):
        """Initialize the rollout buffers."""
        self.actions = []  # actions (4 possible)
        self.states = []  # states, 8 numbers each
        self.logprobs = []  # log-probabilities of the chosen actions
        self.rewards = []  # rewards
        self.is_terminals = []  # whether the episode has ended

    def clear_memory(self):
        """Clear the memory."""
        del self.actions[:]
        del self.states[:]
        del self.logprobs[:]
        del self.rewards[:]
        del self.is_terminals[:]


class ActorCritic(tf.keras.Model):
    def __init__(self, state_dim, action_dim, n_latent_var):
        super(ActorCritic, self).__init__()

        # Actor
        self.action_layer = tf.keras.Sequential([
            # [b, 8] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 4]
            tf.keras.layers.Dense(action_dim, activation="softmax")
        ])

        # Critic
        self.value_layer = tf.keras.Sequential([
            # [b, 8] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 1]
            tf.keras.layers.Dense(1)
        ])

    def forward(self):
        """Forward pass; act() is used instead."""
        raise NotImplementedError

    def build(self, input_shape):
        # No weight to train.
        super(ActorCritic, self).build(input_shape)  # Be sure to call this at the end

    def act(self, state, memory):
        """Choose an action."""
        # Probabilities of the 4 actions
        action_probs = self.action_layer(state)

        # Sample the action from the categorical distribution.
        # Pass probs= explicitly: the first positional argument of
        # Categorical is logits, not probabilities.
        dist = Categorical(probs=action_probs)
        action = dist.sample()

        # Store in memory
        memory.states.append(state)
        memory.actions.append(action)
        memory.logprobs.append(dist.log_prob(action))

        # Return the action
        return action.numpy()[0]

    def evaluate(self, state, action):
        """
        Evaluate
        :param state: states, in batches of 2000, shape [2000, 8]
        :param action: actions, in batches of 2000, shape [2000]
        :return:
        """
        # Action probabilities
        action_probs = self.action_layer(state)
        dist = Categorical(probs=action_probs)  # categorical distribution

        # Log-probability of the actions taken
        action_logprobs = dist.log_prob(action)

        # Entropy
        dist_entropy = dist.entropy()
        dist_entropy = tf.squeeze(dist_entropy)

        # Critic
        state_value = self.value_layer(state)
        state_value = tf.squeeze(state_value)  # [2000, 1] => [2000]

        # Return action log-probabilities, state values, and policy entropy
        return action_logprobs, state_value, dist_entropy


class PPO:
    def __init__(self, state_dim, action_dim, n_latent_var, lr, betas, gamma, K_epochs, eps_clip):
        self.lr = lr  # learning rate
        self.betas = betas  # betas
        self.gamma = gamma  # gamma
        self.eps_clip = eps_clip  # clipping range
        self.K_epochs = K_epochs  # number of update epochs

        # Initialize the policies
        self.policy = ActorCritic(state_dim, action_dim, n_latent_var)
        self.policy_old = ActorCritic(state_dim, action_dim, n_latent_var)

        # Build both networks so the old policy can start from the same weights
        for net in (self.policy, self.policy_old):
            net.action_layer.build((None, state_dim))
            net.value_layer.build((None, state_dim))
        self.sync_old_policy()

        # Optimizer (learning_rate replaces the deprecated lr keyword)
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=lr, beta_1=betas[0], beta_2=betas[1])
        self.MseLoss = tf.keras.losses.MeanSquaredError()  # loss function

    def sync_old_policy(self):
        """Copy the current weights into the old policy."""
        self.policy_old.action_layer.set_weights(self.policy.action_layer.get_weights())
        self.policy_old.value_layer.set_weights(self.policy.value_layer.get_weights())

    def update(self, memory):
        """Update the policy."""
        # Monte Carlo estimate of the discounted returns
        rewards = []
        discounted_reward = 0
        for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
            # End of an episode
            if is_terminal:
                discounted_reward = 0

            # Discounted return: current reward + gamma * return of the following step
            discounted_reward = reward + (self.gamma * discounted_reward)

            # Insert at the front
            rewards.insert(0, discounted_reward)

        # Standardize the returns
        rewards = tf.convert_to_tensor(rewards, dtype=tf.float32)
        rewards = (rewards - tf.reduce_mean(rewards)) / (tf.math.reduce_std(rewards) + 1e-5)

        # Convert to tensors
        old_states = tf.stack(memory.states)
        old_actions = tf.stack(memory.actions)
        old_logprobs = tf.stack(memory.logprobs)

        # Trainable variables of the actor and the critic
        variables = (self.policy.action_layer.trainable_variables
                     + self.policy.value_layer.trainable_variables)

        # Optimize for K epochs:
        for _ in range(self.K_epochs):
            with tf.GradientTape() as tape:
                # Evaluate
                logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

                # Probability ratios
                ratios = tf.exp(logprobs - old_logprobs)
                ratios = tf.squeeze(ratios)

                # Loss (the advantage is treated as a constant for the actor)
                advantages = rewards - tf.stop_gradient(state_values)
                surr1 = ratios * advantages
                surr2 = tf.clip_by_value(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
                loss = tf.reduce_mean(
                    -tf.minimum(surr1, surr2)
                    + 0.5 * self.MseLoss(state_values, rewards)
                    - 0.01 * dist_entropy
                )

            # Apply the gradients
            grads = tape.gradient(loss, variables)
            self.optimizer.apply_gradients(zip(grads, variables))

        # Copy the new weights into the old policy
        self.sync_old_policy()
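The return computation at the start of `update` is easy to check in isolation. The sketch below reimplements that loop and the standardization step in plain Python; the function names `discounted_returns` and `standardize` are chosen here for illustration and do not appear in the article's code.

```python
def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Walk the rollout backwards, resetting the running return at episode ends."""
    returns = []
    running = 0.0
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def standardize(values, eps=1e-5):
    """Shift to zero mean and scale by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / (std + eps) for v in values]


# Two short episodes stored back to back, with gamma = 0.5 for easy arithmetic.
returns = discounted_returns([1, 2, 3, 4], [False, True, False, True], gamma=0.5)
print(returns)               # [2.0, 2.0, 5.0, 4.0]
print(standardize(returns))
```

Two details are worth noticing. The terminal flag belongs to the last step of an episode, so the reset stops rewards from a later episode leaking into an earlier one. And because the training script only sets the flag when the environment reports `done`, an episode cut off by the step limit is not separated from the one that follows it in the buffer.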

main

import gym
import tensorflow as tf
from PPO import Memory, PPO

############## Hyperparameters ##############
env_name = "LunarLander-v2"  # environment name
env = gym.make(env_name)
state_dim = 8  # state dimension
action_dim = 4  # number of actions
render = False  # visualize
solved_reward = 230  # stop condition (reward > 230)
log_interval = 20  # print avg reward in the interval
max_episodes = 50000  # maximum number of episodes
max_timesteps = 300  # maximum steps per episode
n_latent_var = 64  # hidden-layer width
update_timestep = 2000  # update the policy every 2000 steps
lr = 0.002  # learning rate
betas = (0.9, 0.999)  # betas
gamma = 0.99  # gamma
K_epochs = 4  # number of policy update epochs
eps_clip = 0.2  # PPO clipping range
#############################################


def main():
    # Instantiate
    memory = Memory()
    ppo = PPO(state_dim, action_dim, n_latent_var, lr, betas, gamma, K_epochs, eps_clip)

    # Running totals
    total_reward = 0
    total_length = 0
    timestep = 0

    # Training
    for i_episode in range(1, max_episodes + 1):
        # Reset the environment (start a new game)
        state = env.reset()

        # Convert to a tensor
        state = tf.convert_to_tensor(state)
        state = tf.reshape(state, [1, 8])

        # Step through the episode
        for t in range(max_timesteps):
            timestep += 1

            # Get an action from the old policy
            action = ppo.policy_old.act(state, memory)

            # Take the action
            state, reward, done, _ = env.step(action)  # (new state, reward, done flag, extra debug info)

            # Convert to a tensor
            state = tf.convert_to_tensor(state)
            state = tf.reshape(state, [1, 8])

            # Update memory (reward / whether the game ended)
            memory.rewards.append(reward)
            memory.is_terminals.append(done)

            # Update the policy
            if timestep % update_timestep == 0:
                ppo.update(memory)

                # Clear the memory
                memory.clear_memory()

                # Reset the step counter
                timestep = 0

            # Accumulate
            total_reward += reward

            # Visualize
            if render:
                env.render()

            # Leave the loop when the game is over
            if done:
                break

        # Episode length
        total_length += t

        # Stop once the target (230 points) is reached
        if total_reward >= (log_interval * solved_reward):
            print("########## Solved! ##########")

            # Save the models
            tf.keras.models.save_model(ppo.policy.action_layer, r"\model\action")
            tf.keras.models.save_model(ppo.policy.value_layer, r"\model\value")

            # Leave the training loop
            break

        # Log every 20 episodes
        if i_episode % log_interval == 0:
            # Average length / reward over the last 20 episodes
            avg_length = int(total_length / log_interval)
            running_reward = int(total_reward / log_interval)

            # Debug output
            print('Episode {} \t avg length: {} \t average_reward: {}'.format(i_episode, avg_length, running_reward))

            # Reset
            total_reward = 0
            total_length = 0


if __name__ == '__main__':
    main()

Output

Episode 20  avg length: 93  reward: -243
Episode 40  avg length: 92  reward: -172
Episode 60  avg length: 79  reward: -192
Episode 80  avg length: 85  reward: -164
Episode 100  avg length: 90  reward: -179
Episode 120  avg length: 100  reward: -201
Episode 140  avg length: 91  reward: -175
Episode 160  avg length: 101  reward: -141
Episode 180  avg length: 86  reward: -153
Episode 200  avg length: 93  reward: -189
Episode 220  avg length: 96  reward: -221
Episode 240  avg length: 105  reward: -140
Episode 260  avg length: 94  reward: -121
Episode 280  avg length: 91  reward: -131
Episode 300  avg length: 91  reward: -122
Episode 320  avg length: 90  reward: -113
Episode 340  avg length: 100  reward: -110
Episode 360  avg length: 110  reward: -92
Episode 380  avg length: 110  reward: -75
Episode 400  avg length: 119  reward: -76
Episode 420  avg length: 162  reward: -77
Episode 440  avg length: 194  reward: -91
Episode 460  avg length: 144  reward: -28
Episode 480  avg length: 192  reward: -8
Episode 500  avg length: 244  reward: -25
Episode 520  avg length: 239  reward: -1
Episode 540  avg length: 269  reward: 21
Episode 560  avg length: 289  reward: 27
Episode 580  avg length: 270  reward: 65
Episode 600  avg length: 264  reward: 86
Episode 620  avg length: 256  reward: 66
Episode 640  avg length: 278  reward: 75
Episode 660  avg length: 235  reward: 11
Episode 680  avg length: 244  reward: 84
Episode 700  avg length: 253  reward: 73
Episode 720  avg length: 292  reward: 63
Episode 740  avg length: 293  reward: 104
Episode 760  avg length: 279  reward: 109
Episode 780  avg length: 246  reward: 86
Episode 800  avg length: 260  reward: 124
Episode 820  avg length: 276  reward: 131
Episode 840  avg length: 269  reward: 121
Episode 860  avg length: 194  reward: 67
Episode 880  avg length: 241  reward: 94
Episode 900  avg length: 259  reward: 98
Episode 920  avg length: 211  reward: 83
Episode 940  avg length: 260  reward: 105
Episode 960  avg length: 194  reward: 65
Episode 980  avg length: 202  reward: 68
Episode 1000  avg length: 243  reward: 79
Episode 1020  avg length: 260  reward: 66
Episode 1040  avg length: 289  reward: 117
Episode 1060  avg length: 252  reward: 94
Episode 1080  avg length: 262  reward: 114
Episode 1100  avg length: 272  reward: 112
Episode 1120  avg length: 263  reward: 97
Episode 1140  avg length: 256  reward: 93
Episode 1160  avg length: 274  reward: 120
Episode 1180  avg length: 256  reward: 117
Episode 1200  avg length: 241  reward: 105
Episode 1220  avg length: 238  reward: 103
Episode 1240  avg length: 267  reward: 121
Episode 1260  avg length: 283  reward: 124
Episode 1280  avg length: 299  reward: 149
Episode 1300  avg length: 281  reward: 126
Episode 1320  avg length: 266  reward: 102
Episode 1340  avg length: 282  reward: 128
Episode 1360  avg length: 275  reward: 114
Episode 1380  avg length: 285  reward: 105
Episode 1400  avg length: 294  reward: 123
Episode 1420  avg length: 293  reward: 132
Episode 1440  avg length: 248  reward: 85
Episode 1460  avg length: 281  reward: 115
Episode 1480  avg length: 291  reward: 152
Episode 1500  avg length: 279  reward: 130
Episode 1520  avg length: 267  reward: 103
Episode 1540  avg length: 270  reward: 137
Episode 1560  avg length: 269  reward: 120
Episode 1580  avg length: 260  reward: 113
Episode 1600  avg length: 282  reward: 147
Episode 1620  avg length: 259  reward: 125
Episode 1640  avg length: 240  reward: 90
Episode 1660  avg length: 284  reward: 125
Episode 1680  avg length: 282  reward: 123
Episode 1700  avg length: 274  reward: 123
Episode 1720  avg length: 273  reward: 130
Episode 1740  avg length: 260  reward: 117
Episode 1760  avg length: 243  reward: 106
Episode 1780  avg length: 241  reward: 90
Episode 1800  avg length: 290  reward: 144
Episode 1820  avg length: 258  reward: 131
Episode 1840  avg length: 283  reward: 142
Episode 1860  avg length: 262  reward: 100
Episode 1880  avg length: 273  reward: 132
Episode 1900  avg length: 255  reward: 92
Episode 1920  avg length: 251  reward: 117
Episode 1940  avg length: 220  reward: 103
Episode 1960  avg length: 221  reward: 111
Episode 1980  avg length: 205  reward: 83
Episode 2000  avg length: 227  reward: 102
Episode 2020  avg length: 251  reward: 123
Episode 2040  avg length: 227  reward: 100
Episode 2060  avg length: 255  reward: 135
Episode 2080  avg length: 273  reward: 136
Episode 2100  avg length: 256  reward: 126
Episode 2120  avg length: 273  reward: 141
Episode 2140  avg length: 280  reward: 109
Episode 2160  avg length: 266  reward: 112
Episode 2180  avg length: 249  reward: 88
Episode 2200  avg length: 247  reward: 119
Episode 2220  avg length: 270  reward: 143
Episode 2240  avg length: 257  reward: 65
Episode 2260  avg length: 250  reward: 30
Episode 2280  avg length: 261  reward: 112
Episode 2300  avg length: 270  reward: 139
Episode 2320  avg length: 275  reward: 128
Episode 2340  avg length: 290  reward: 149
Episode 2360  avg length: 269  reward: 139
Episode 2380  avg length: 272  reward: 137
Episode 2400  avg length: 232  reward: 105
Episode 2420  avg length: 242  reward: 127
Episode 2440  avg length: 241  reward: 134
Episode 2460  avg length: 249  reward: 113
Episode 2480  avg length: 287  reward: 154
Episode 2500  avg length: 289  reward: 149
Episode 2520  avg length: 258  reward: 129
Episode 2540  avg length: 250  reward: 101
Episode 2560  avg length: 287  reward: 158
Episode 2580  avg length: 271  reward: 145
Episode 2600  avg length: 253  reward: 120
Episode 2620  avg length: 255  reward: 127
Episode 2640  avg length: 254  reward: 122
Episode 2660  avg length: 238  reward: 123
Episode 2680  avg length: 243  reward: 115
Episode 2700  avg length: 241  reward: 93
Episode 2720  avg length: 232  reward: 90
Episode 2740  avg length: 215  reward: 83
Episode 2760  avg length: 241  reward: 112
Episode 2780  avg length: 273  reward: 129
Episode 2800  avg length: 269  reward: 133
Episode 2820  avg length: 246  reward: 91
Episode 2840  avg length: 261  reward: 130
Episode 2860  avg length: 261  reward: 136
Episode 2880  avg length: 289  reward: 128
Episode 2900  avg length: 271  reward: 131
Episode 2920  avg length: 277  reward: 145
Episode 2940  avg length: 251  reward: 117
Episode 2960  avg length: 253  reward: 120
Episode 2980  avg length: 270  reward: 133
Episode 3000  avg length: 240  reward: 85
Episode 3020  avg length: 284  reward: 141
Episode 3040  avg length: 255  reward: 117
Episode 3060  avg length: 299  reward: 134
Episode 3080  avg length: 263  reward: 122
Episode 3100  avg length: 259  reward: 126
Episode 3120  avg length: 270  reward: 125
Episode 3140  avg length: 299  reward: 150
Episode 3160  avg length: 256  reward: 116
Episode 3180  avg length: 264  reward: 124
Episode 3200  avg length: 271  reward: 128
Episode 3220  avg length: 259  reward: 122
Episode 3240  avg length: 261  reward: 125
Episode 3260  avg length: 271  reward: 129
Episode 3280  avg length: 242  reward: 126
Episode 3300  avg length: 218  reward: 93
Episode 3320  avg length: 230  reward: 116
Episode 3340  avg length: 223  reward: 109
Episode 3360  avg length: 249  reward: 122
Episode 3380  avg length: 224  reward: 104
Episode 3400  avg length: 261  reward: 131
Episode 3420  avg length: 280  reward: 140
Episode 3440  avg length: 264  reward: 125
Episode 3460  avg length: 247  reward: 105
Episode 3480  avg length: 276  reward: 141
Episode 3500  avg length: 282  reward: 149
Episode 3520  avg length: 282  reward: 141
Episode 3540  avg length: 290  reward: 152
Episode 3560  avg length: 282  reward: 141
Episode 3580  avg length: 291  reward: 151
Episode 3600  avg length: 289  reward: 166
Episode 3620  avg length: 266  reward: 142
Episode 3640  avg length: 277  reward: 91
Episode 3660  avg length: 272  reward: 114
Episode 3680  avg length: 281  reward: 159
Episode 3700  avg length: 287  reward: 160
Episode 3720  avg length: 254  reward: 78
Episode 3740  avg length: 296  reward: 174
Episode 3760  avg length: 267  reward: 124
Episode 3780  avg length: 273  reward: 148
Episode 3800  avg length: 275  reward: 147
Episode 3820  avg length: 276  reward: 145
Episode 3840  avg length: 283  reward: 151
Episode 3860  avg length: 275  reward: 142
Episode 3880  avg length: 290  reward: 142
Episode 3900  avg length: 290  reward: 154
Episode 3920  avg length: 283  reward: 141
Episode 3940  avg length: 273  reward: 145
Episode 3960  avg length: 290  reward: 161
Episode 3980  avg length: 268  reward: 145
Episode 4000  avg length: 270  reward: 142
Episode 4020  avg length: 283  reward: 156
Episode 4040  avg length: 283  reward: 149
Episode 4060  avg length: 299  reward: 172
Episode 4080  avg length: 292  reward: 158
Episode 4100  avg length: 274  reward: 143
Episode 4120  avg length: 299  reward: 163
Episode 4140  avg length: 290  reward: 153
Episode 4160  avg length: 299  reward: 165
Episode 4180  avg length: 290  reward: 160
Episode 4200  avg length: 299  reward: 157
Episode 4220  avg length: 299  reward: 171
Episode 4240  avg length: 271  reward: 148
Episode 4260  avg length: 265  reward: 139
Episode 4280  avg length: 258  reward: 137
Episode 4300  avg length: 280  reward: 137
Episode 4320  avg length: 262  reward: 133
Episode 4340  avg length: 255  reward: 110
Episode 4360  avg length: 275  reward: 134
Episode 4380  avg length: 282  reward: 154
Episode 4400  avg length: 264  reward: 128
Episode 4420  avg length: 299  reward: 150
Episode 4440  avg length: 275  reward: 151
Episode 4460  avg length: 257  reward: 116
Episode 4480  avg length: 256  reward: 104
Episode 4500  avg length: 263  reward: 134
Episode 4520  avg length: 299  reward: 164
Episode 4540  avg length: 265  reward: 137
Episode 4560  avg length: 265  reward: 147
Episode 4580  avg length: 283  reward: 138
Episode 4600  avg length: 299  reward: 152
Episode 4620  avg length: 281  reward: 154
Episode 4640  avg length: 289  reward: 161
Episode 4660  avg length: 264  reward: 143
Episode 4680  avg length: 285  reward: 138
Episode 4700  avg length: 291  reward: 143
Episode 4720  avg length: 280  reward: 154
Episode 4740  avg length: 284  reward: 125
Episode 4760  avg length: 296  reward: 136
Episode 4780  avg length: 254  reward: 127
Episode 4800  avg length: 281  reward: 147
Episode 4820  avg length: 282  reward: 143
Episode 4840  avg length: 243  reward: 119
Episode 4860  avg length: 280  reward: 139
Episode 4880  avg length: 270  reward: 137
Episode 4900  avg length: 278  reward: 150
Episode 4920  avg length: 203  reward: 83
Episode 4940  avg length: 272  reward: 153
Episode 4960  avg length: 289  reward: 151
Episode 4980  avg length: 289  reward: 157
Episode 5000  avg length: 299  reward: 168
Episode 5020  avg length: 292  reward: 136
Episode 5040  avg length: 290  reward: 158
Episode 5060  avg length: 286  reward: 157
Episode 5080  avg length: 282  reward: 154
Episode 5100  avg length: 278  reward: 121
Episode 5120  avg length: 291  reward: 138
Episode 5140  avg length: 297  reward: 143
Episode 5160  avg length: 290  reward: 165
Episode 5180  avg length: 290  reward: 157
Episode 5200  avg length: 276  reward: 150
Episode 5220  avg length: 278  reward: 149
Episode 5240  avg length: 287  reward: 153
Episode 5260  avg length: 274  reward: 145
Episode 5280  avg length: 299  reward: 176
Episode 5300  avg length: 299  reward: 173
Episode 5320  avg length: 299  reward: 164
Episode 5340  avg length: 271  reward: 157
Episode 5360  avg length: 299  reward: 180
Episode 5380  avg length: 279  reward: 156
Episode 5400  avg length: 268  reward: 133
Episode 5420  avg length: 279  reward: 136
Episode 5440  avg length: 278  reward: 130
Episode 5460  avg length: 268  reward: 137
Episode 5480  avg length: 273  reward: 152
Episode 5500  avg length: 299  reward: 168
Episode 5520  avg length: 266  reward: 95
Episode 5540  avg length: 294  reward: 146
Episode 5560  avg length: 289  reward: 165
Episode 5580  avg length: 288  reward: 139
Episode 5600  avg length: 299  reward: 174
Episode 5620  avg length: 291  reward: 168
Episode 5640  avg length: 281  reward: 147
Episode 5660  avg length: 270  reward: 126
Episode 5680  avg length: 263  reward: 153
Episode 5700  avg length: 283  reward: 161
Episode 5720  avg length: 271  reward: 154
Episode 5740  avg length: 281  reward: 154
Episode 5760  avg length: 281  reward: 144
Episode 5780  avg length: 272  reward: 145
Episode 5800  avg length: 275  reward: 128
Episode 5820  avg length: 290  reward: 159
Episode 5840  avg length: 274  reward: 142
Episode 5860  avg length: 243  reward: 122
Episode 5880  avg length: 236  reward: 124
Episode 5900  avg length: 255  reward: 139
Episode 5920  avg length: 288  reward: 140
Episode 5940  avg length: 271  reward: 140
Episode 5960  avg length: 254  reward: 108
Episode 5980  avg length: 299  reward: 149
Episode 6000  avg length: 289  reward: 149
Episode 6020  avg length: 258  reward: 109
Episode 6040  avg length: 289  reward: 129
Episode 6060  avg length: 238  reward: 94
Episode 6080  avg length: 270  reward: 87
Episode 6100  avg length: 268  reward: 96
Episode 6120  avg length: 279  reward: 142
Episode 6140  avg length: 233  reward: 112
Episode 6160  avg length: 268  reward: 142
Episode 6180  avg length: 260  reward: 133
Episode 6200  avg length: 210  reward: 109
Episode 6220  avg length: 248  reward: 111
Episode 6240  avg length: 229  reward: 92
Episode 6260  avg length: 210  reward: 98
Episode 6280  avg length: 218  reward: 102
Episode 6300  avg length: 225  reward: 117
Episode 6320  avg length: 235  reward: 112
Episode 6340  avg length: 259  reward: 124
Episode 6360  avg length: 252  reward: 113
Episode 6380  avg length: 239  reward: 119
Episode 6400  avg length: 242  reward: 95
Episode 6420  avg length: 249  reward: 111
Episode 6440  avg length: 257  reward: 136
Episode 6460  avg length: 259  reward: 123
Episode 6480  avg length: 259  reward: 112
Episode 6500  avg length: 259  reward: 129
Episode 6520  avg length: 215  reward: 101
Episode 6540  avg length: 249  reward: 137
Episode 6560  avg length: 245  reward: 121
Episode 6580  avg length: 259  reward: 127
Episode 6600  avg length: 267  reward: 142
Episode 6620  avg length: 257  reward: 86
Episode 6640  avg length: 278  reward: 141
Episode 6660  avg length: 255  reward: 92
Episode 6680  avg length: 289  reward: 145
Episode 6700  avg length: 259  reward: 133
Episode 6720  avg length: 247  reward: 116
Episode 6740  avg length: 243  reward: 56
Episode 6760  avg length: 274  reward: 114
Episode 6780  avg length: 279  reward: 133
Episode 6800  avg length: 269  reward: 152
Episode 6820  avg length: 252  reward: 105
Episode 6840  avg length: 254  reward: 123
Episode 6860  avg length: 253  reward: 98
Episode 6880  avg length: 273  reward: 132
Episode 6900  avg length: 249  reward: 108
Episode 6920  avg length: 248  reward: 84
Episode 6940  avg length: 250  reward: 107
Episode 6960  avg length: 279  reward: 99
Episode 6980  avg length: 279  reward: 140
Episode 7000  avg length: 270  reward: 105
Episode 7020  avg length: 250  reward: 109
Episode 7040  avg length: 202  reward: 87
Episode 7060  avg length: 188  reward: 56
Episode 7080  avg length: 229  reward: 93
Episode 7100  avg length: 248  reward: 105
Episode 7120  avg length: 218  reward: 105
Episode 7140  avg length: 213  reward: 77
Episode 7160  avg length: 279  reward: 128
Episode 7180  avg length: 247  reward: 110
Episode 7200  avg length: 269  reward: 124
Episode 7220  avg length: 217  reward: 64
Episode 7240  avg length: 258  reward: 140
Episode 7260  avg length: 279  reward: 116
Episode 7280  avg length: 244  reward: 97
Episode 7300  avg length: 245  reward: 104
Episode 7320  avg length: 213  reward: 81
Episode 7340  avg length: 268  reward: 126
Episode 7360  avg length: 277  reward: 124
Episode 7380  avg length: 251  reward: 122
Episode 7400  avg length: 234  reward: 108
Episode 7420  avg length: 267  reward: 127
Episode 7440  avg length: 218  reward: 89
Episode 7460  avg length: 199  reward: 80
Episode 7480  avg length: 154  reward: 55
Episode 7500  avg length: 228  reward: 114
Episode 7520  avg length: 197  reward: 49
Episode 7540  avg length: 147  reward: 59
Episode 7560  avg length: 139  reward: 49
Episode 7580  avg length: 181  reward: 74
Episode 7600  avg length: 191  reward: 61
Episode 7620  avg length: 176  reward: 78
Episode 7640  avg length: 160  reward: 35
Episode 7660  avg length: 159  reward: 50
Episode 7680  avg length: 143  reward: 68
Episode 7700  avg length: 227  reward: 103
Episode 7720  avg length: 192  reward: 59
Episode 7740  avg length: 248  reward: 118
Episode 7760  avg length: 250  reward: 128
Episode 7780  avg length: 261  reward: 110
Episode 7800  avg length: 279  reward: 157
Episode 7820  avg length: 249  reward: 153
Episode 7840  avg length: 212  reward: 78
Episode 7860  avg length: 249  reward: 144
Episode 7880  avg length: 257  reward: 107
Episode 7900  avg length: 271  reward: 136
Episode 7920  avg length: 244  reward: 129
Episode 7940  avg length: 262  reward: 145
Episode 7960  avg length: 224  reward: 94
Episode 7980  avg length: 247  reward: 110
Episode 8000  avg length: 190  reward: 81
Episode 8020  avg length: 157  reward: 67
Episode 8040  avg length: 171  reward: 67
Episode 8060  avg length: 203  reward: 96
Episode 8080  avg length: 225  reward: 87
Episode 8100  avg length: 166  reward: 84
Episode 8120  avg length: 196  reward: 82
Episode 8140  avg length: 249  reward: 120
Episode 8160  avg length: 216  reward: 112
Episode 8180  avg length: 178  reward: 97
Episode 8200  avg length: 221  reward: 120
Episode 8220  avg length: 265  reward: 122
Episode 8240  avg length: 240  reward: 125
Episode 8260  avg length: 266  reward: 146
Episode 8280  avg length: 253  reward: 116
Episode 8300  avg length: 233  reward: 129
Episode 8320  avg length: 260  reward: 126
Episode 8340  avg length: 264  reward: 138
Episode 8360  avg length: 196  reward: 88
Episode 8380  avg length: 189  reward: 60
Episode 8400  avg length: 227  reward: 66
Episode 8420  avg length: 257  reward: 114
Episode 8440  avg length: 254  reward: 99
Episode 8460  avg length: 268  reward: 127
Episode 8480  avg length: 263  reward: 131
Episode 8500  avg length: 246  reward: 107
Episode 8520  avg length: 281  reward: 127
Episode 8540  avg length: 273  reward: 146
Episode 8560  avg length: 290  reward: 124
Episode 8580  avg length: 261  reward: 103
Episode 8600  avg length: 294  reward: 140
Episode 8620  avg length: 236  reward: 110
Episode 8640  avg length: 261  reward: 125
Episode 8660  avg length: 284  reward: 108
Episode 8680  avg length: 278  reward: 141
Episode 8700  avg length: 256  reward: 124
Episode 8720  avg length: 245  reward: 95
Episode 8740  avg length: 258  reward: 136
Episode 8760  avg length: 289  reward: 147
Episode 8780  avg length: 229  reward: 98
Episode 8800  avg length: 277  reward: 138
Episode 8820  avg length: 237  reward: 129
Episode 8840  avg length: 276  reward: 141
Episode 8860  avg length: 224  reward: 102
Episode 8880  avg length: 220  reward: 108
Episode 8900  avg length: 277  reward: 137
Episode 8920  avg length: 259  reward: 120
Episode 8940  avg length: 242  reward: 124
Episode 8960  avg length: 275  reward: 119
Episode 8980  avg length: 256  reward: 140
Episode 9000  avg length: 263  reward: 110
Episode 9020  avg length: 247  reward: 101
Episode 9040  avg length: 251  reward: 99
Episode 9060  avg length: 266  reward: 128
Episode 9080  avg length: 247  reward: 119
Episode 9100  avg length: 227  reward: 95
Episode 9120  avg length: 242  reward: 95
Episode 9140  avg length: 234  reward: 120
Episode 9160  avg length: 271  reward: 145
Episode 9180  avg length: 234  reward: 106
Episode 9200  avg length: 230  reward: 102
Episode 9220  avg length: 217  reward: 111
Episode 9240  avg length: 182  reward: 68
Episode 9260  avg length: 225  reward: 111
Episode 9280  avg length: 224  reward: 110
Episode 9300  avg length: 195  reward: 97
Episode 9320  avg length: 245  reward: 110
Episode 9340  avg length: 249  reward: 87
Episode 9360  avg length: 238  reward: 105
Episode 9380  avg length: 231  reward: 83
Episode 9400  avg length: 245  reward: 60
Episode 9420  avg length: 251  reward: 81
Episode 9440  avg length: 218  reward: 86
Episode 9460  avg length: 177  reward: 62
Episode 9480  avg length: 212  reward: 64
Episode 9500  avg length: 213  reward: 96
Episode 9520  avg length: 267  reward: 121
Episode 9540  avg length: 195  reward: 89
Episode 9560  avg length: 259  reward: 140
Episode 9580  avg length: 246  reward: 116
Episode 9600  avg length: 266  reward: 122
Episode 9620  avg length: 255  reward: 104
Episode 9640  avg length: 203  reward: 116
Episode 9660  avg length: 239  reward: 117
Episode 9680  avg length: 239  reward: 118
Episode 9700  avg length: 254  reward: 137
Episode 9720  avg length: 269  reward: 144
Episode 9740  avg length: 274  reward: 136
Episode 9760  avg length: 259  reward: 123
Episode 9780  avg length: 230  reward: 102
Episode 9800  avg length: 268  reward: 139
Episode 9820  avg length: 258  reward: 120
Episode 9840  avg length: 271  reward: 111
Episode 9860  avg length: 260  reward: 130
Episode 9880  avg length: 280  reward: 135
Episode 9900  avg length: 269  reward: 126
Episode 9920  avg length: 290  reward: 159
Episode 9940  avg length: 286  reward: 129
Episode 9960  avg length: 259  reward: 117
Episode 9980  avg length: 299  reward: 139
Episode 10000  avg length: 298  reward: 141
Episode 10020  avg length: 294  reward: 115
Episode 10040  avg length: 284  reward: 117
Episode 10060  avg length: 299  reward: 156
Episode 10080  avg length: 290  reward: 145
Episode 10100  avg length: 280  reward: 151
Episode 10120  avg length: 299  reward: 163
Episode 10140  avg length: 290  reward: 151
Episode 10160  avg length: 269  reward: 133
Episode 10180  avg length: 259  reward: 134
Episode 10200  avg length: 272  reward: 137
Episode 10220  avg length: 260  reward: 121
Episode 10240  avg length: 259  reward: 103
Episode 10260  avg length: 260  reward: 126
Episode 10280  avg length: 279  reward: 150
Episode 10300  avg length: 268  reward: 128
Episode 10320  avg length: 261  reward: 140
Episode 10340  avg length: 243  reward: 111
Episode 10360  avg length: 236  reward: 113
Episode 10380  avg length: 219  reward: 112
Episode 10400  avg length: 267  reward: 140
Episode 10420  avg length: 279  reward: 146
Episode 10440  avg length: 285  reward: 137
Episode 10460  avg length: 255  reward: 107
Episode 10480  avg length: 249  reward: 115
Episode 10500  avg length: 241  reward: 106
Episode 10520  avg length: 219  reward: 102
Episode 10540  avg length: 200  reward: 52
Episode 10560  avg length: 267  reward: 124
Episode 10580  avg length: 235  reward: 111
Episode 10600  avg length: 223  reward: 86
Episode 10620  avg length: 220  reward: 90
Episode 10640  avg length: 269  reward: 145
Episode 10660  avg length: 255  reward: 133
Episode 10680  avg length: 277  reward: 130
Episode 10700  avg length: 280  reward: 142
Episode 10720  avg length: 278  reward: 128
Episode 10740  avg length: 260  reward: 90
Episode 10760  avg length: 288  reward: 145
Episode 10780  avg length: 238  reward: 94
Episode 10800  avg length: 278  reward: 136
Episode 10820  avg length: 288  reward: 150
Episode 10840  avg length: 280  reward: 148
Episode 10860  avg length: 240  reward: 117
Episode 10880  avg length: 257  reward: 124
Episode 10900  avg length: 261  reward: 130
Episode 10920  avg length: 229  reward: 115
Episode 10940  avg length: 259  reward: 144
Episode 10960  avg length: 238  reward: 138
Episode 10980  avg length: 230  reward: 112
Episode 11000  avg length: 254  reward: 126
Episode 11020  avg length: 281  reward: 141
Episode 11040  avg length: 270  reward: 120
Episode 11060  avg length: 297  reward: 174
Episode 11080  avg length: 261  reward: 138
Episode 11100  avg length: 259  reward: 125
Episode 11120  avg length: 292  reward: 173
Episode 11140  avg length: 275  reward: 146
Episode 11160  avg length: 299  reward: 165
Episode 11180  avg length: 299  reward: 175
Episode 11200  avg length: 289  reward: 161
Episode 11220  avg length: 299  reward: 166
Episode 11240  avg length: 278  reward: 160
Episode 11260  avg length: 290  reward: 142
Episode 11280  avg length: 299  reward: 164
Episode 11300  avg length: 279  reward: 155
Episode 11320  avg length: 299  reward: 178
Episode 11340  avg length: 299  reward: 150
Episode 11360  avg length: 265  reward: 110
Episode 11380  avg length: 288  reward: 156
Episode 11400  avg length: 278  reward: 146
Episode 11420  avg length: 268  reward: 141
Episode 11440  avg length: 291  reward: 130
Episode 11460  avg length: 299  reward: 161
Episode 11480  avg length: 284  reward: 142
Episode 11500  avg length: 262  reward: 132
Episode 11520  avg length: 287  reward: 149
Episode 11540  avg length: 288  reward: 150
Episode 11560  avg length: 288  reward: 157
Episode 11580  avg length: 288  reward: 156
Episode 11600  avg length: 284  reward: 133
Episode 11620  avg length: 287  reward: 152
Episode 11640  avg length: 249  reward: 130
Episode 11660  avg length: 240  reward: 106
Episode 11680  avg length: 271  reward: 131
Episode 11700  avg length: 271  reward: 117
Episode 11720  avg length: 286  reward: 143
Episode 11740  avg length: 293  reward: 150
Episode 11760  avg length: 289  reward: 155
Episode 11780  avg length: 290  reward: 137
Episode 11800  avg length: 289  reward: 133
Episode 11820  avg length: 273  reward: 121
Episode 11840  avg length: 274  reward: 109
Episode 11860  avg length: 261  reward: 147
Episode 11880  avg length: 210  reward: 114
Episode 11900  avg length: 245  reward: 143
Episode 11920  avg length: 210  reward: 115
Episode 11940  avg length: 218  reward: 102
Episode 11960  avg length: 214  reward: 102
Episode 11980  avg length: 269  reward: 133
Episode 12000  avg length: 262  reward: 144
Episode 12020  avg length: 235  reward: 131
Episode 12040  avg length: 253  reward: 149
Episode 12060  avg length: 227  reward: 120
Episode 12080  avg length: 202  reward: 98
Episode 12100  avg length: 240  reward: 117
Episode 12120  avg length: 231  reward: 108
Episode 12140  avg length: 230  reward: 122
Episode 12160  avg length: 228  reward: 108
Episode 12180  avg length: 233  reward: 96
Episode 12200  avg length: 252  reward: 123
Episode 12220  avg length: 272  reward: 154
Episode 12240  avg length: 251  reward: 122
Episode 12260  avg length: 273  reward: 147
Episode 12280  avg length: 239  reward: 111
Episode 12300  avg length: 287  reward: 126
Episode 12320  avg length: 278  reward: 121
Episode 12340  avg length: 258  reward: 120
Episode 12360  avg length: 265  reward: 104
Episode 12380  avg length: 279  reward: 118
Episode 
12400  avg length: 254  reward: 72Episode 12420  avg length: 187  reward: 74Episode 12440  avg length: 244  reward: 90Episode 12460  avg length: 228  reward: 116Episode 12480  avg length: 258  reward: 125Episode 12500  avg length: 247  reward: 118Episode 12520  avg length: 244  reward: 101Episode 12540  avg length: 267  reward: 135Episode 12560  avg length: 253  reward: 99Episode 12580  avg length: 285  reward: 135Episode 12600  avg length: 259  reward: 113Episode 12620  avg length: 256  reward: 108Episode 12640  avg length: 238  reward: 114Episode 12660  avg length: 265  reward: 128Episode 12680  avg length: 289  reward: 145Episode 12700  avg length: 287  reward: 147Episode 12720  avg length: 283  reward: 139Episode 12740  avg length: 255  reward: 108Episode 12760  avg length: 299  reward: 150Episode 12780  avg length: 277  reward: 138Episode 12800  avg length: 290  reward: 151Episode 12820  avg length: 284  reward: 159Episode 12840  avg length: 299  reward: 150Episode 12860  avg length: 289  reward: 146Episode 12880  avg length: 299  reward: 158Episode 12900  avg length: 299  reward: 144Episode 12920  avg length: 279  reward: 129Episode 12940  avg length: 282  reward: 132Episode 12960  avg length: 280  reward: 132Episode 12980  avg length: 278  reward: 108Episode 13000  avg length: 284  reward: 136Episode 13020  avg length: 289  reward: 128Episode 13040  avg length: 291  reward: 149Episode 13060  avg length: 299  reward: 140Episode 13080  avg length: 292  reward: 141Episode 13100  avg length: 290  reward: 139Episode 13120  avg length: 299  reward: 139Episode 13140  avg length: 291  reward: 151Episode 13160  avg length: 291  reward: 141Episode 13180  avg length: 299  reward: 169Episode 13200  avg length: 299  reward: 162Episode 13220  avg length: 299  reward: 170Episode 13240  avg length: 299  reward: 170Episode 13260  avg length: 299  reward: 155Episode 13280  avg length: 299  reward: 153Episode 13300  avg length: 299  reward: 163Episode 13320  avg length: 281  
reward: 131Episode 13340  avg length: 289  reward: 153Episode 13360  avg length: 285  reward: 133Episode 13380  avg length: 280  reward: 134Episode 13400  avg length: 282  reward: 134Episode 13420  avg length: 268  reward: 114Episode 13440  avg length: 290  reward: 142Episode 13460  avg length: 270  reward: 145Episode 13480  avg length: 257  reward: 127Episode 13500  avg length: 272  reward: 139Episode 13520  avg length: 270  reward: 129Episode 13540  avg length: 279  reward: 149Episode 13560  avg length: 269  reward: 95Episode 13580  avg length: 270  reward: 113Episode 13600  avg length: 258  reward: 125Episode 13620  avg length: 217  reward: 88Episode 13640  avg length: 157  reward: 59Episode 13660  avg length: 132  reward: 41Episode 13680  avg length: 220  reward: 92Episode 13700  avg length: 241  reward: 109Episode 13720  avg length: 252  reward: 127Episode 13740  avg length: 253  reward: 104Episode 13760  avg length: 269  reward: 128Episode 13780  avg length: 230  reward: 96Episode 13800  avg length: 258  reward: 127Episode 13820  avg length: 290  reward: 151Episode 13840  avg length: 299  reward: 135Episode 13860  avg length: 280  reward: 111Episode 13880  avg length: 268  reward: 124Episode 13900  avg length: 255  reward: 93Episode 13920  avg length: 258  reward: 128Episode 13940  avg length: 244  reward: 127Episode 13960  avg length: 238  reward: 117Episode 13980  avg length: 237  reward: 104Episode 14000  avg length: 251  reward: 123Episode 14020  avg length: 267  reward: 114Episode 14040  avg length: 271  reward: 109Episode 14060  avg length: 247  reward: 117Episode 14080  avg length: 282  reward: 129Episode 14100  avg length: 266  reward: 144Episode 14120  avg length: 256  reward: 132Episode 14140  avg length: 267  reward: 140Episode 14160  avg length: 289  reward: 149Episode 14180  avg length: 262  reward: 95Episode 14200  avg length: 278  reward: 128Episode 14220  avg length: 279  reward: 136Episode 14240  avg length: 249  reward: 105Episode 14260  avg 
length: 235  reward: 112Episode 14280  avg length: 273  reward: 131Episode 14300  avg length: 278  reward: 130Episode 14320  avg length: 259  reward: 123Episode 14340  avg length: 234  reward: 78Episode 14360  avg length: 268  reward: 125Episode 14380  avg length: 294  reward: 153Episode 14400  avg length: 299  reward: 150Episode 14420  avg length: 278  reward: 129Episode 14440  avg length: 297  reward: 155Episode 14460  avg length: 247  reward: 106Episode 14480  avg length: 289  reward: 154Episode 14500  avg length: 270  reward: 133Episode 14520  avg length: 259  reward: 133Episode 14540  avg length: 280  reward: 151Episode 14560  avg length: 268  reward: 129Episode 14580  avg length: 299  reward: 159Episode 14600  avg length: 279  reward: 131Episode 14620  avg length: 242  reward: 100Episode 14640  avg length: 236  reward: 114Episode 14660  avg length: 253  reward: 132Episode 14680  avg length: 272  reward: 134Episode 14700  avg length: 297  reward: 175Episode 14720  avg length: 278  reward: 148Episode 14740  avg length: 289  reward: 154Episode 14760  avg length: 288  reward: 148Episode 14780  avg length: 278  reward: 140Episode 14800  avg length: 266  reward: 128Episode 14820  avg length: 288  reward: 161Episode 14840  avg length: 278  reward: 145Episode 14860  avg length: 290  reward: 161Episode 14880  avg length: 279  reward: 139Episode 14900  avg length: 284  reward: 155Episode 14920  avg length: 245  reward: 136Episode 14940  avg length: 269  reward: 137Episode 14960  avg length: 262  reward: 146Episode 14980  avg length: 299  reward: 154Episode 15000  avg length: 273  reward: 172Episode 15020  avg length: 278  reward: 142Episode 15040  avg length: 277  reward: 150Episode 15060  avg length: 232  reward: 119Episode 15080  avg length: 280  reward: 141Episode 15100  avg length: 260  reward: 137Episode 15120  avg length: 285  reward: 167Episode 15140  avg length: 280  reward: 149Episode 15160  avg length: 237  reward: 118Episode 15180  avg length: 223  reward: 
111Episode 15200  avg length: 243  reward: 134Episode 15220  avg length: 269  reward: 138Episode 15240  avg length: 251  reward: 127Episode 15260  avg length: 289  reward: 157Episode 15280  avg length: 229  reward: 107Episode 15300  avg length: 277  reward: 143Episode 15320  avg length: 288  reward: 154Episode 15340  avg length: 289  reward: 149Episode 15360  avg length: 288  reward: 145Episode 15380  avg length: 260  reward: 134Episode 15400  avg length: 246  reward: 126Episode 15420  avg length: 244  reward: 132Episode 15440  avg length: 272  reward: 129Episode 15460  avg length: 267  reward: 134Episode 15480  avg length: 263  reward: 135Episode 15500  avg length: 280  reward: 141Episode 15520  avg length: 254  reward: 126Episode 15540  avg length: 275  reward: 133Episode 15560  avg length: 271  reward: 120Episode 15580  avg length: 270  reward: 130Episode 15600  avg length: 299  reward: 144Episode 15620  avg length: 254  reward: 88Episode 15640  avg length: 271  reward: 126Episode 15660  avg length: 289  reward: 153Episode 15680  avg length: 231  reward: 104Episode 15700  avg length: 227  reward: 127Episode 15720  avg length: 174  reward: 82Episode 15740  avg length: 214  reward: 92Episode 15760  avg length: 190  reward: 89Episode 15780  avg length: 159  reward: 49Episode 15800  avg length: 222  reward: 100Episode 15820  avg length: 269  reward: 133Episode 15840  avg length: 243  reward: 100Episode 15860  avg length: 191  reward: 68Episode 15880  avg length: 221  reward: 86Episode 15900  avg length: 206  reward: 109Episode 15920  avg length: 228  reward: 89Episode 15940  avg length: 250  reward: 108Episode 15960  avg length: 229  reward: 110Episode 15980  avg length: 263  reward: 139Episode 16000  avg length: 250  reward: 125Episode 16020  avg length: 270  reward: 140Episode 16040  avg length: 251  reward: 131Episode 16060  avg length: 258  reward: 124Episode 16080  avg length: 268  reward: 130Episode 16100  avg length: 263  reward: 125Episode 16120  avg length: 
280  reward: 150Episode 16140  avg length: 267  reward: 132Episode 16160  avg length: 284  reward: 137Episode 16180  avg length: 275  reward: 128Episode 16200  avg length: 269  reward: 132Episode 16220  avg length: 280  reward: 132Episode 16240  avg length: 279  reward: 145Episode 16260  avg length: 299  reward: 152Episode 16280  avg length: 238  reward: 112Episode 16300  avg length: 284  reward: 159Episode 16320  avg length: 280  reward: 136Episode 16340  avg length: 271  reward: 120Episode 16360  avg length: 281  reward: 139Episode 16380  avg length: 267  reward: 141Episode 16400  avg length: 299  reward: 164Episode 16420  avg length: 239  reward: 113Episode 16440  avg length: 276  reward: 143Episode 16460  avg length: 268  reward: 144Episode 16480  avg length: 269  reward: 134Episode 16500  avg length: 273  reward: 148Episode 16520  avg length: 247  reward: 97Episode 16540  avg length: 266  reward: 129Episode 16560  avg length: 267  reward: 119Episode 16580  avg length: 270  reward: 124Episode 16600  avg length: 262  reward: 101Episode 16620  avg length: 257  reward: 121Episode 16640  avg length: 233  reward: 99Episode 16660  avg length: 268  reward: 114Episode 16680  avg length: 261  reward: 126Episode 16700  avg length: 278  reward: 143Episode 16720  avg length: 278  reward: 117Episode 16740  avg length: 266  reward: 135Episode 16760  avg length: 282  reward: 140Episode 16780  avg length: 299  reward: 154Episode 16800  avg length: 279  reward: 144Episode 16820  avg length: 281  reward: 124Episode 16840  avg length: 280  reward: 132Episode 16860  avg length: 278  reward: 148Episode 16880  avg length: 280  reward: 113Episode 16900  avg length: 268  reward: 133Episode 16920  avg length: 291  reward: 147Episode 16940  avg length: 274  reward: 150Episode 16960  avg length: 281  reward: 137Episode 16980  avg length: 251  reward: 126Episode 17000  avg length: 261  reward: 135Episode 17020  avg length: 267  reward: 105Episode 17040  avg length: 274  reward: 176Episode 
17060  avg length: 262  reward: 131Episode 17080  avg length: 186  reward: 184Episode 17100  avg length: 225  reward: 150Episode 17120  avg length: 201  reward: 218Episode 17140  avg length: 211  reward: 220Episode 17160  avg length: 221  reward: 218Episode 17180  avg length: 232  reward: 210Episode 17200  avg length: 216  reward: 220Episode 17220  avg length: 226  reward: 203Episode 17240  avg length: 198  reward: 170Episode 17260  avg length: 196  reward: 222Episode 17280  avg length: 214  reward: 196Episode 17300  avg length: 229  reward: 205Episode 17320  avg length: 183  reward: 192Episode 17340  avg length: 212  reward: 186Episode 17360  avg length: 192  reward: 164########## Solved! ##########
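
A training log this long is easier to judge once it is parsed into numbers. The following is a minimal sketch, not part of the original training script, that extracts the episode index, average length, and reward from text in the format printed above and smooths the reward with a simple moving average. The names `parse_log` and `rolling_mean` are hypothetical helpers introduced here for illustration, and the pattern assumes every entry has the form `Episode N  avg length: L  reward: R`.

```python
import re

# Assumed entry format, taken from the log above:
# "Episode <n>  avg length: <steps>  reward: <score>"
LOG_ENTRY = re.compile(r"Episode (\d+)\s+avg length: (\d+)\s+reward: (-?\d+)")


def parse_log(text):
    """Return a list of (episode, avg_length, reward) tuples found in text.

    finditer scans the whole string, so entries are found even when
    several of them are fused onto a single line.
    """
    return [tuple(int(g) for g in m.groups()) for m in LOG_ENTRY.finditer(text)]


def rolling_mean(values, window):
    """Mean of each consecutive run of `window` values (hypothetical helper)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# The last three entries of the log above, used as sample input.
sample = """Episode 17320  avg length: 183  reward: 192
Episode 17340  avg length: 212  reward: 186
Episode 17360  avg length: 192  reward: 164"""

records = parse_log(sample)
rewards = [reward for _, _, reward in records]

print(records[-1])                      # last parsed entry
print(rolling_mean(rewards, window=2))  # smoothed reward curve
```

Feeding the full log through `parse_log` and plotting the smoothed rewards makes the late jump (from roughly 130 to above 200 around episode 17100) much easier to see than in the raw text.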

