GLOBAL RESEARCH SYNDICATE

Generalization in Reinforcement Learning – Exploration vs Exploitation

by globalresearchsyndicate
February 28, 2020
in Data Analysis


In reinforcement learning, agents are commonly evaluated on the very environments they were trained on. In a supervised learning setting, this would be like testing a model on its training dataset.

OpenAI has open-sourced the Procgen Benchmark, which emphasizes generalization because RL agents struggle to generalize to new environments.

Procgen consists of 16 simple-to-use, procedurally generated Gym environments that provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills. The environments run at high speed (thousands of steps per second) on a single core. The observation space is a box space containing the RGB pixels the agent sees, as a NumPy array of shape (64, 64, 3), and the expected step rate for a human player is 15 Hz.
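To put these numbers in perspective, the sketch below does the arithmetic for a single observation and for the 10,000-timestep training run used later in this article. The 1,000 steps per second figure is an assumed lower bound for "thousands of steps per second"; actual throughput depends on the hardware and the game.

```python
# Observation layout described above: a 64x64 RGB image of uint8 values.
OBS_SHAPE = (64, 64, 3)
BYTES_PER_VALUE = 1  # uint8

HUMAN_STEP_HZ = 15                 # expected step rate for a human player
ASSUMED_ENV_STEPS_PER_SEC = 1_000  # assumption: lower bound for "thousands of steps per second"


def obs_num_bytes(shape, bytes_per_value=BYTES_PER_VALUE):
    """Memory footprint of one observation in bytes."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


def seconds_for_steps(num_steps, steps_per_sec):
    """Wall-clock seconds needed to take num_steps at a given step rate."""
    return num_steps / steps_per_sec


if __name__ == "__main__":
    steps = 10_000  # the training budget used in the example later in this article
    print("bytes per observation:", obs_num_bytes(OBS_SHAPE))  # 64 * 64 * 3 = 12288
    print("human player, minutes:", round(seconds_for_steps(steps, HUMAN_STEP_HZ) / 60, 1))
    print("simulator, seconds:", seconds_for_steps(steps, ASSUMED_ENV_STEPS_PER_SEC))
```

At 15 Hz a human would need roughly 11 minutes to play 10,000 steps, while the simulator, even at the assumed lower bound, needs about 10 seconds.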



Benchmarking RL agents on the Arcade Learning Environment (ALE) has long been considered standard practice because of the diverse set of environments ALE provides.

Nevertheless, one must ask: are the agents learning to generalize, or are they simply memorizing the specifics of each environment?




Procedurally Generated Environments

To address this question, Procgen's environments are procedurally generated. Consider the description of one of them, Chaser:

“Inspired by the Atari game “MsPacman”. Maze layouts are generated using Kruskal’s algorithm, and then walls are removed until no dead-ends remain in the maze. The player must collect all the green orbs. 3 large stars spawn that will make enemies vulnerable for a short time when collected. A collision with an enemy that isn’t vulnerable results in the player’s death. When a vulnerable enemy is eaten, an egg spawns somewhere on the map that will hatch into a new enemy after a short time, keeping the total number of enemies constant. The player receives a small reward for collecting each orb and a large reward for completing the level.”
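The maze recipe in this description can be sketched in a few lines. The code below is a hypothetical, simplified illustration, not Procgen's actual C++ implementation: it runs Kruskal's algorithm with a union-find over a grid of cells to carve a perfect maze (a spanning tree), then opens walls at dead ends until every cell has at least two passages. Seeding the random generator is what makes each level reproducible while still differing from other seeds.

```python
import random


def neighbors(cell, width, height):
    """Yield the in-bounds cells directly left, right, above and below `cell`."""
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def degree(cell, passages, width, height):
    """Number of open passages leading out of `cell`."""
    return sum(frozenset((cell, n)) in passages for n in neighbors(cell, width, height))


def kruskal_maze(width, height, seed):
    """Carve a perfect maze (a spanning tree of the grid) with randomized Kruskal's algorithm.

    Each passage is a frozenset of two adjacent (x, y) cells.
    """
    rng = random.Random(seed)  # same seed -> same level
    parent = {(x, y): (x, y) for y in range(height) for x in range(width)}

    def find(c):
        # Union-find root lookup with path halving
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Every wall between two adjacent cells, visited in random order
    walls = [((x, y), (x + 1, y)) for y in range(height) for x in range(width - 1)]
    walls += [((x, y), (x, y + 1)) for y in range(height - 1) for x in range(width)]
    rng.shuffle(walls)

    passages = set()
    for a, b in walls:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # only remove walls that join two separate regions
            parent[root_a] = root_b
            passages.add(frozenset((a, b)))
    return passages


def remove_dead_ends(passages, width, height, seed):
    """Open extra walls until no cell is a dead end (exactly one passage).

    Assumes width and height are both at least 2, so every dead end still
    has a closed wall it can open.
    """
    rng = random.Random(seed)
    passages = set(passages)
    for y in range(height):
        for x in range(width):
            cell = (x, y)
            if degree(cell, passages, width, height) == 1:
                closed = [n for n in neighbors(cell, width, height)
                          if frozenset((cell, n)) not in passages]
                passages.add(frozenset((cell, rng.choice(closed))))
    return passages
```

Because Kruskal's algorithm only removes walls between separate regions, the first stage always yields exactly width * height - 1 passages with no loops; the second stage then adds loops so the player is never trapped in a corridor with an enemy.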

Procedural generation also yields intrinsically diverse environments, which force the agent to learn robust policies that generalize instead of overfitting to a single environment. In doing so, the agent has to find the sweet spot between exploration and exploitation.
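The exploration/exploitation trade-off is easiest to see in its simplest form, a multi-armed bandit with an epsilon-greedy rule. This is a swapped-in textbook illustration, not what Procgen or the PPO agent trained below uses (PPO explores through its stochastic policy); the payout probabilities and epsilon values are arbitrary assumptions.

```python
import random


def epsilon_greedy_bandit(payout_probs, epsilon, steps, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy action selection.

    With probability epsilon the agent explores (random arm); otherwise it
    exploits the arm with the highest estimated value so far.
    Returns (estimated values, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    values = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit (ties -> lowest index)
        reward = 1 if rng.random() < payout_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total += reward
    return values, counts, total
```

With `epsilon=0.0` the agent never tries any arm but the first, because all estimates start at zero and ties go to arm 0; a small positive epsilon lets it discover that another arm pays better and then exploit it.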

Features

All Procgen environments were designed with the following criteria in mind:

  • High Diversity – The diversity of the generated levels presents agents with a meaningful generalization challenge.
  • Fast Evaluation – The environments run at thousands of steps per second on a single CPU core, enabling faster experimentation.
  • Tunable Difficulty – All environments support two difficulty settings, easy and hard. Easy environments require roughly an eighth of the resources to train, which helps when compute is limited.
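In the `procgen` package these knobs are exposed as keyword arguments to `gym.make`: `distribution_mode` (`"easy"` or `"hard"`), plus `num_levels` and `start_level`, which together fix the set of level seeds (`num_levels=0` means unlimited levels). The sketch below only builds those arguments, so it runs without `procgen` installed; the helper name `procgen_make_args` and the 200-level training budget are assumptions for illustration, and parameter names should be checked against the `procgen` README for your version.

```python
def procgen_make_args(game, split, distribution_mode="easy", train_levels=200):
    """Hypothetical helper: build (env_id, kwargs) for gym.make.

    Training uses a fixed, finite set of level seeds; testing uses the
    unlimited level distribution (num_levels=0), so almost every test level
    is one the agent never saw during training.
    """
    if distribution_mode not in ("easy", "hard"):
        raise ValueError("this sketch only handles 'easy' and 'hard'")
    env_id = f"procgen:procgen-{game}-v0"
    if split == "train":
        kwargs = {"num_levels": train_levels, "start_level": 0}
    elif split == "test":
        kwargs = {"num_levels": 0, "start_level": 0}
    else:
        raise ValueError("split must be 'train' or 'test'")
    kwargs["distribution_mode"] = distribution_mode
    return env_id, kwargs
```

With `gym` and `procgen` installed, the result would be passed on as `gym.make(env_id, **kwargs)`.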

The features above are summarized from OpenAI's Procgen release article.

Comparison with Gym Retro

Gym Retro also offers diverse environments for training RL agents. However, it differs considerably from Procgen in design and features:

  • Faster – Gym Retro environments are already fast, but Procgen environments can run >4x faster.
  • Non-deterministic – Gym Retro environments are always the same, so you can memorize a sequence of actions that will get the highest reward. Procgen environments are randomized so this is not possible.
  • Customizable – If you install from source, you can perform experiments where you change the environments, or build your own environments. The environment-specific code for each environment is often less than 300 lines. This is almost impossible with Gym Retro.
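The "Non-deterministic" point explains why memorization fails on Procgen. The toy below is hypothetical and far simpler than any real game: a 1-D corridor whose goal cell is drawn from a level seed, where success means standing on the goal when the episode ends. An open-loop policy that replays the actions that solved one level only succeeds when the goal lands on the same cell, while a policy that reacts to its observation solves every level.

```python
import random

EPISODE_STEPS = 10  # assumption: every episode lasts a fixed number of steps


class Corridor:
    """Hypothetical 1-D level: the agent must end the episode on the goal cell."""

    def __init__(self, seed, length=10):
        self.length = length
        self.goal = random.Random(seed).randrange(length)  # level layout depends on the seed
        self.pos = 0

    def observe(self):
        return self.goal - self.pos  # signed distance to the goal

    def step(self, action):
        # action is -1 (left), 0 (stay) or +1 (right); walls clamp the position
        self.pos = min(self.length - 1, max(0, self.pos + action))


def play(env, policy):
    """Run one episode; success means ending on the goal cell."""
    for t in range(EPISODE_STEPS):
        env.step(policy(t, env.observe()))
    return env.pos == env.goal


def memorize(seed):
    """Record the action sequence that solves one specific level (open loop)."""
    goal = Corridor(seed).goal
    plan = [1] * goal + [0] * (EPISODE_STEPS - goal)
    return lambda t, obs: plan[t]  # ignores the observation entirely


def reactive(t, obs):
    """Closed-loop policy: move toward the goal using the observation."""
    return (obs > 0) - (obs < 0)  # sign of the distance


def success_rate(policy, seeds):
    return sum(play(Corridor(s), policy) for s in seeds) / len(seeds)
```

On the level it was recorded on, the memorized plan is perfect; on randomized levels it only succeeds by coincidence, which is the Gym Retro versus Procgen contrast in miniature.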

Training Agents to Play in Procgen Environment

The following snippet trains an RL agent with stable-baselines in the Chaser environment; other Procgen games, such as CoinRun and StarPilot, can be used by changing env_id.

import time

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecVideoRecorder

video_folder = '/gdrive//videos'
video_length = 5000
env_id = "procgen:procgen-chaser-v0"

# stable-baselines expects a vectorized environment, so wrap the single Procgen env
env = DummyVecEnv([lambda: gym.make(env_id)])

# PPO with a convolutional policy, since observations are 64x64 RGB images
model = PPO2("CnnPolicy", env, verbose=1)

s_time = time.time()
model.learn(total_timesteps=int(1e4))
e_time = time.time()
print(f"Total run-time: {round(e_time - s_time, 3)} seconds")

# Record the video starting at the first step
env = VecVideoRecorder(env, video_folder, record_video_trigger=lambda x: x == 0,
                       video_length=video_length, name_prefix="trained-agent-{}".format(env_id))

obs = env.reset()
for _ in range(video_length + 1):
    # Act with the trained policy instead of sampling random actions
    action, _ = model.predict(obs)
    obs, _, _, _ = env.step(action)

# Save the video
env.close()

The agent above was trained for 10,000 timesteps with a CNN policy and Proximal Policy Optimization (PPO2 in stable-baselines).

The video below shows the agent's gameplay in the StarPilot environment after under 3 minutes of training on a GPU. Watch until the end to see the agent's rational behaviour.

The benchmark published by OpenAI reveals a large gap between agents' performance on training levels and on unseen test levels. It also highlights the flaw in training and evaluating agents on the same fixed set of levels, a longstanding issue in reinforcement learning research.
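The train/test gap comes from evaluating the same agent on the levels it trained on and on held-out levels. A minimal way to compute it, assuming you already have per-episode returns from both sets, is sketched below; the helper names and the disjoint, contiguous seed split are illustrative assumptions, not OpenAI's evaluation code.

```python
from statistics import mean


def split_level_seeds(num_train, num_test, start=0):
    """Hypothetical helper: disjoint, contiguous seed ranges for training and testing."""
    train = list(range(start, start + num_train))
    test = list(range(start + num_train, start + num_train + num_test))
    return train, test


def generalization_gap(train_returns, test_returns):
    """Mean return on training levels minus mean return on held-out levels.

    A large positive gap means the agent does much better on levels it has
    seen, i.e. it has partly memorized them rather than generalized.
    """
    return mean(train_returns) - mean(test_returns)
```

Keeping the two seed ranges disjoint guarantees that every test episode is played on a level the agent has never encountered.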


Copyright © 2024 Globalresearchsyndicate.com
