[1435] reward: -0.72, reward 100-step MA: -0.01, Exploration: 0.17, action: [-2.], td-error: 0.0000: 36%|███▌ | 1436/4000 [01:51<04:07, 10.35it/s]
Total episode reward: -8226.80643325. Finished in 1435 steps.
[1595] reward: -6.25, reward 100-step MA: -0.06, Exploration: 0.13, action: [ 1.08452916], td-error: 0.2675: 40%|███▉ | 1596/4000 [02:07<03:52, 10.33it/s]
Total episode reward: -256.861952677. Finished in 160 steps.
[1735] reward: -0.00, reward 100-step MA: -0.00, Exploration: 0.10, action: [ 0.22459596], td-error: 0.0000: 43%|████▎ | 1736/4000 [02:20<03:38, 10.37it/s]
Total episode reward: -263.8297128. Finished in 141 steps.
[1863] reward: -7.65, reward 100-step MA: -0.08, Exploration: 0.08, action: [ 0.37197751], td-error: 0.0000: 47%|████▋ | 1864/4000 [02:33<03:26, 10.35it/s]
Total episode reward: -129.013232911. Finished in 127 steps.
[2029] reward: -8.44, reward 100-step MA: -0.08, Exploration: 0.06, action: [ 2.], td-error: 0.5105: 51%|█████ | 2030/4000 [02:49<03:11, 10.30it/s]
Total episode reward: -394.722244656. Finished in 166 steps.
[2213] reward: -0.06, reward 100-step MA: -0.00, Exploration: 0.04, action: [ 1.91606069], td-error: 0.0000: 55%|█████▌ | 2214/4000 [03:06<02:56, 10.14it/s]
Total episode reward: -466.202745731. Finished in 184 steps.
[2313] reward: -0.01, reward 100-step MA: -0.00, Exploration: 0.03, action: [-1.62057149], td-error: 0.0000: 58%|█████▊ | 2314/4000 [03:16<02:45, 10.18it/s]
Total episode reward: -0.679593910063. Finished in 101 steps.
[2475] reward: -0.15, reward 100-step MA: -0.00, Exploration: 0.02, action: [ 2.], td-error: 0.0000: 62%|██████▏ | 2476/4000 [03:32<02:34, 9.90it/s]
Total episode reward: -377.271135441. Finished in 161 steps.
[2575] reward: -0.01, reward 100-step MA: -0.00, Exploration: 0.02, action: [ 2.], td-error: 0.0000: 64%|██████▍ | 2576/4000 [03:42<02:23, 9.93it/s]
Total episode reward: -1.24576439359. Finished in 101 steps.
[2729] reward: -3.23, reward 100-step MA: -0.03, Exploration: 0.01, action: [-1.65990257], td-error: 0.0000: 68%|██████▊ | 2730/4000 [03:57<02:03, 10.26it/s]
Total episode reward: -346.408582049. Finished in 153 steps.
[2850] reward: -0.00, reward 100-step MA: -0.00, Exploration: 0.01, action: [ 0.86981618], td-error: 0.0000: 71%|███████▏ | 2851/4000 [04:09<01:51, 10.30it/s]
Total episode reward: -134.790559379. Finished in 122 steps.
[2951] reward: -0.00, reward 100-step MA: -0.00, Exploration: 0.00, action: [-0.06073409], td-error: 0.0000: 74%|███████▍ | 2952/4000 [04:19<01:43, 10.08it/s]
Total episode reward: -2.31354317179. Finished in 101 steps.
[3096] reward: -2.55, reward 100-step MA: -0.03, Exploration: 0.00, action: [-1.79324269], td-error: 0.3259: 77%|███████▋ | 3097/4000 [04:33<01:29, 10.12it/s]
Total episode reward: -251.240980277. Finished in 144 steps.
[3216] reward: -0.00, reward 100-step MA: -0.08, Exploration: 0.00, action: [-0.34413767], td-error: 0.0000: 80%|████████ | 3217/4000 [04:45<01:16, 10.23it/s]
Total episode reward: -131.774528854. Finished in 122 steps.
[3318] reward: -0.00, reward 100-step MA: -0.00, Exploration: 0.00, action: [ 1.29666018], td-error: 0.0000: 83%|████████▎ | 3319/4000 [04:55<01:06, 10.17it/s]
Total episode reward: -0.262464451205. Finished in 101 steps.
[3439] reward: -0.00, reward 100-step MA: -0.09, Exploration: 0.00, action: [ 1.13709033], td-error: 0.0000: 86%|████████▌ | 3440/4000 [05:07<00:55, 10.15it/s]
Total episode reward: -128.877967011. Finished in 122 steps.
[3542] reward: -0.33, reward 100-step MA: -0.00, Exploration: 0.00, action: [ 2.], td-error: 0.0000: 89%|████████▊ | 3543/4000 [05:18<00:45, 9.96it/s]
Total episode reward: -0.794406954628. Finished in 101 steps.
[3670] reward: -6.56, reward 100-step MA: -0.07, Exploration: 0.00, action: [-0.91683942], td-error: 0.0416: 92%|█████████▏| 3671/4000 [05:31<00:33, 9.97it/s]
Total episode reward: -129.253171385. Finished in 128 steps.
[3849] reward: -0.00, reward 100-step MA: -0.00, Exploration: 0.00, action: [ 1.30503345], td-error: 0.0393: 96%|█████████▋| 3850/4000 [05:48<00:14, 10.14it/s]
Total episode reward: -393.294628039. Finished in 180 steps.
[3951] reward: -0.86, reward 100-step MA: -0.01, Exploration: 0.00, action: [ 2.], td-error: 0.1530: 99%|█████████▉| 3952/4000 [05:58<00:04, 10.04it/s]
Total episode reward: -0.383064396993. Finished in 101 steps.
[3999] reward: -0.01, reward 100-step MA: -1.31, Exploration: 0.00, action: [ 1.62858033], td-error: 0.0000: 100%|██████████| 4000/4000 [06:03<00:00, 10.06it/s]