Beyond Classical Search

This notebook serves as supporting material for the chapter Beyond Classical Search. It illustrates the use of the code repository and demonstrates how the code can be extended to solve various search-related problems. Classical search (Chapter 3) addresses a single category of problems: observable, deterministic, known environments where the solution is a sequence of actions. Here, we look at what happens when these assumptions are relaxed. The discussion begins with local search in state spaces. Then we examine what happens when we relax the assumptions of determinism and observability. Finally, we study online search, in which the agent is faced with a state space that is initially unknown and must be explored.


In [60]:
%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar

Local Search Algorithms and Optimization Problems

These algorithms are suitable for problems in which all that matters is the solution state, not the path cost to reach it. For example, in the 8-queens problem, what matters is the final configuration of queens, not the order in which they were added. Local search algorithms therefore operate using a single current node (rather than multiple paths) and generally move only to the neighbors of that node; the paths followed by the search are not retained. In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.

To understand local search, we consider the state-space landscape. A landscape has both location (defined by the state) and elevation (defined by the value of the heuristic cost function or the objective function). If elevation corresponds to cost, the aim is to find the global minimum; if elevation corresponds to an objective function, the aim is to find the global maximum.

Hill-Climbing Search

Hill-climbing search is simply a loop that continually moves in the direction of increasing value. It terminates when it reaches a "peak" where no neighbor has a higher value. It does not maintain a search tree, so the data structure for the current node need only record the state and the value of the objective function. The algorithm does not look ahead beyond the immediate neighbors of the current state.

Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next. It chooses randomly among the set of best successors if there is more than one. Unfortunately, hill climbing often gets stuck for the following reasons:

  • Local maxima: A local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upwards towards the peak but will then be stuck with nowhere else to go.
  • Ridges: Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
  • Plateaux: A plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible.

Let's have a look at the pseudo code of hill-climbing search.


In [61]:
%%python
from notebookUtils import *
pseudocode('Hill Climbing')


Out[61]:

AIMA4e

function HILL-CLIMBING(problem) returns a state that is a local maximum
current ← problem.INITIAL-STATE
loop do
   neighbor ← a highest-valued successor of current
   if VALUE(neighbor) ≤ VALUE(current) then return current
   current ← neighbor


Figure 4.2 The hill-climbing search algorithm, which is the most basic local search technique. At each step the current node is replaced by the best neighbor.

AIMA3e

function HILL-CLIMBING(problem) returns a state that is a local maximum
current ← MAKE-NODE(problem.INITIAL-STATE)
loop do
   neighbor ← a highest-valued successor of current
   if neighbor.VALUE ≤ current.VALUE then return current.STATE
   current ← neighbor


Figure ?? The hill-climbing search algorithm, which is the most basic local search technique. At each step the current node is replaced by the best neighbor; in this version, that means the neighbor with the highest VALUE, but if a heuristic cost estimate h is used, we would find the neighbor with the lowest h.

The implementation of the above pseudo code can be viewed here.

This algorithm halts if it reaches a plateau where the best successor has the same value as the current state. It might be a good idea to allow sideways moves in the hope that the plateau is really a shoulder, but we must take care: an infinite loop can occur whenever the algorithm reaches a flat local maximum that is not a shoulder. One common solution is to put a limit on the number of consecutive sideways moves allowed.
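To make the idea concrete, here is a minimal, generic sketch of a hill-climbing loop that tolerates a bounded number of consecutive sideways moves. It is not part of aima-core; the bestNeighbor and value functions are caller-supplied placeholders standing in for a problem-specific successor function and objective.

import java.util.function.Function;
import java.util.function.ToDoubleFunction;

public class SidewaysHillClimbing {
    // Hill climbing that tolerates a limited number of consecutive sideways moves.
    // bestNeighbor and value are supplied by the caller for the problem at hand.
    public static <S> S climb(S start, Function<S, S> bestNeighbor,
                              ToDoubleFunction<S> value, int maxSideways) {
        S current = start;
        int sideways = 0;
        while (true) {
            S neighbor = bestNeighbor.apply(current);               // highest-valued successor
            double dv = value.applyAsDouble(neighbor) - value.applyAsDouble(current);
            if (dv < 0) return current;                             // strict peak: stop
            if (dv == 0 && ++sideways > maxSideways) return current; // plateau limit exhausted
            if (dv > 0) sideways = 0;                               // uphill move resets the counter
            current = neighbor;
        }
    }
}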

Let's try to solve the 8-queens problem using hill-climbing search.


In [62]:
import aima.core.agent.Action;
import aima.core.environment.nqueens.NQueensBoard;
import aima.core.environment.nqueens.NQueensFunctions;
import aima.core.environment.nqueens.QueenAction;
import aima.core.search.agent.SearchAgent;
import aima.core.search.framework.problem.Problem;
import aima.core.search.local.HillClimbingSearch;

int boardSize = 8;
Problem<NQueensBoard, QueenAction> problem =
        NQueensFunctions.createCompleteStateFormulationProblem(boardSize, NQueensBoard.Config.QUEEN_IN_EVERY_COL);
HillClimbingSearch<NQueensBoard, QueenAction> search = new HillClimbingSearch<>
        (NQueensFunctions.createAttackingPairsHeuristicFunction());
SearchAgent<NQueensBoard, QueenAction> agent = new SearchAgent<>(problem, search);

for (Action action : agent.getActions()) {
    System.out.println(action.toString());
}
System.out.println("Search Outcome=" + search.getOutcome());
System.out.println("Final State=\n" + search.getLastSearchState());


Action[name=moveQueenTo, location=(2, 1)]
Action[name=moveQueenTo, location=(1, 3)]
Action[name=moveQueenTo, location=(4, 1)]
Action[name=moveQueenTo, location=(6, 2)]
Action[name=moveQueenTo, location=(2, 0)]
Search Outcome=FAILURE
Final State=
--Q-----
----Q---
------Q-
-Q------
---Q----
-----Q--
Q-------
-------Q

Out[62]:
null

Notice that the above search can either attain the goal configuration or get stuck at a local minimum of the attacking-queen-pairs heuristic function.

Random-Restart Hill Climbing

The hill-climbing algorithms described so far are incomplete: they often get stuck on local maxima. Random-restart hill climbing therefore conducts a series of hill-climbing searches from randomly generated initial states until a goal is found. If each hill-climbing search has a probability $p$ of success, then the expected number of restarts required is $1/p$. The success of hill climbing depends very much on the shape of the state-space landscape: if there are few local maxima and plateaux, random-restart hill climbing will find a good solution very quickly.
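The restart loop itself is simple. Below is a minimal, generic sketch (not the repository's code): randomState, hillClimb, and isGoal are caller-supplied; for the 8-queens example above they could wrap a random NQueensBoard generator, a HillClimbingSearch run, and the goal test.

import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class RandomRestart {
    // Repeatedly run a hill-climbing search from fresh random initial states
    // until a goal is found (or the restart budget is exhausted).
    public static <S> S search(Supplier<S> randomState, UnaryOperator<S> hillClimb,
                               Predicate<S> isGoal, int maxRestarts) {
        S best = null;
        for (int i = 0; i < maxRestarts; i++) {
            S result = hillClimb.apply(randomState.get()); // one complete hill-climbing run
            if (isGoal.test(result)) return result;        // success: stop restarting
            best = result;                                  // remember the last local maximum
        }
        return best;                                        // no goal found within the budget
    }
}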

Simulated Annealing

A hill climbing algorithm that never makes "downhill" moves toward states with lower values is guaranteed to be incomplete, because it can get stuck on a local maximum. In contrast, a purely random walk is complete but extremely inefficient. Therefore we need to combine hill climbing with a random walk in some way that yields both efficiency and completeness. Simulated annealing is such an algorithm.

Instead of picking the best move, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than one. Suppose ${\Delta E}$ is the amount by which the evaluation is worsened (so ${\Delta E}<0$ for a bad move) and $T$ is the temperature: "bad" moves are more likely to be allowed at high $T$ and become more unlikely as $T$ decreases. The probability also decreases exponentially with the "badness" of the move; mathematically, the probability of accepting a move that worsens the evaluation function is $e^{\Delta E / T}$.

Let's have a look at the pseudo code of simulated annealing.


In [63]:
%%python
from notebookUtils import *
pseudocode('Simulated Annealing')


Out[63]:

AIMA4e

function SIMULATED-ANNEALING(problem, schedule) returns a solution state

current ← problem.INITIAL-STATE
for t = 1 to ∞ do
   T ← schedule(t)
   if T = 0 then return current
   next ← a randomly selected successor of current
   ΔE ← VALUE(next) - VALUE(current)
   if ΔE > 0 then current ← next
   else current ← next only with probability e^(ΔE/T)


Figure 4.6 The simulated annealing algorithm, a version of stochastic hill climbing where some downhill moves are allowed. The schedule input determines the value of the "temperature" T as a function of time; higher temperatures early in the schedule mean that downhill moves are accepted more readily; late in the schedule with low temperatures, downhill moves are mostly rejected.

AIMA3e

function SIMULATED-ANNEALING(problem, schedule) returns a solution state
inputs: problem, a problem
    schedule, a mapping from time to "temperature"

current ← MAKE-NODE(problem.INITIAL-STATE)
for t = 1 to ∞ do
   T ← schedule(t)
   if T = 0 then return current
   next ← a randomly selected successor of current
   ΔE ← next.VALUE - current.VALUE
   if ΔE > 0 then current ← next
   else current ← next only with probability e^(ΔE/T)


Figure ?? The simulated annealing algorithm, a version of stochastic hill climbing where some downhill moves are allowed. Downhill moves are accepted readily early in the annealing schedule and then less often as time goes on. The schedule input determines the value of the temperature T as a function of time.

The implementation of the above pseudo code can be viewed here and it can be visualized here.

Let's find out the probability of accepting a move at 2 different temperatures. As per the algorithm, bad moves are more likely to be accepted at a higher temperature.


In [64]:
import aima.core.agent.Action;
import aima.core.search.local.SimulatedAnnealingSearch;

SimulatedAnnealingSearch<String, Action> search = new SimulatedAnnealingSearch<>(null);
int deltaE = -1; //bad move as ΔE<0
double lowerTemperature = 1.0;
double higherTemperature = 20.0;

System.out.println("Probability of acceptance of move at lower temperature: " + search.probabilityOfAcceptance(lowerTemperature, deltaE));
System.out.println("Probability of acceptance of move at higher temperature: " + search.probabilityOfAcceptance(higherTemperature, deltaE));


Probability of acceptance of move at lower temperature: 0.36787944117144233
Probability of acceptance of move at higher temperature: 0.951229424500714
Out[64]:
null

Local Beam Search

The local beam search algorithm keeps track of $k$ states rather than just one. It begins with $k$ randomly generated states. At each step, all the successors of all $k$ states are generated. If any one of them is a goal, the algorithm halts. Otherwise, it selects the $k$ best successors from the complete list and repeats.

Note that random-restart search and local beam search are quite different. In a random-restart search, each search process runs independently of the others. In a local beam search, useful information is passed among the parallel search threads. In effect, the algorithm quickly abandons unfruitful searches and moves its resources to where the most progress has been made.
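Here is a minimal, generic sketch of the beam update described above; it is not the repository's implementation, and the successors, value, and isGoal functions are caller-supplied placeholders.

import java.util.*;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

public class LocalBeam {
    // Keep the k best states; at each step expand all of them and keep the
    // k best successors of the combined pool. Helpers are caller-supplied.
    public static <S> S search(List<S> initial, Function<S, List<S>> successors,
                               ToDoubleFunction<S> value, Predicate<S> isGoal,
                               int k, int maxSteps) {
        List<S> beam = new ArrayList<>(initial);
        for (int step = 0; step < maxSteps; step++) {
            List<S> pool = new ArrayList<>();
            for (S s : beam) pool.addAll(successors.apply(s));        // expand every beam state
            for (S s : pool) if (isGoal.test(s)) return s;            // stop as soon as a goal appears
            if (pool.isEmpty()) break;                                // nothing left to expand
            pool.sort(Comparator.comparingDouble(value).reversed());  // best (highest value) first
            beam = new ArrayList<>(pool.subList(0, Math.min(k, pool.size())));
        }
        return beam.get(0);                                           // best state found so far
    }
}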

Genetic Algorithms

A genetic algorithm (GA) is a variant of beam search in which successor states are generated by combining two parent states rather than by modifying a single state. Like beam search, a GA begins with a set of $k$ randomly generated states, called the population. Each state is rated by an objective function, also called the fitness function; a fitness function should return higher values for better states, and the probability of being chosen for reproduction is directly proportional to the fitness score. Pairs are selected at random for reproduction, in accordance with these probabilities. Then, for each pair to be mated, a crossover point is chosen at random, and offspring are created by crossing over the parents at that point. Finally, each offspring is subject to random mutation with a small independent probability.
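As a concrete illustration of the crossover and mutation steps (mirroring REPRODUCE and MUTATE in the pseudo code below, not the repository's GeneticAlgorithm class), here is a small sketch operating on individuals represented as lists of genes.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Reproduce {
    private static final Random RANDOM = new Random();

    // Single-point crossover: the child takes x up to a random cut point and y after it.
    public static <A> List<A> reproduce(List<A> x, List<A> y) {
        int n = x.size();
        int c = RANDOM.nextInt(n + 1);                     // crossover point in [0, n]
        List<A> child = new ArrayList<>(x.subList(0, c));
        child.addAll(y.subList(c, n));
        return child;
    }

    // Mutation: with small probability, replace one random gene with a random alphabet symbol.
    public static <A> void mutate(List<A> child, List<A> alphabet, double pMutation) {
        if (RANDOM.nextDouble() < pMutation) {
            child.set(RANDOM.nextInt(child.size()),
                      alphabet.get(RANDOM.nextInt(alphabet.size())));
        }
    }
}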

Let's have a look at the pseudo code of the genetic algorithm.


In [65]:
%%python
from notebookUtils import *
pseudocode('Genetic Algorithm')


Out[65]:

AIMA4e

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
inputs: population, the initial random population of individuals
    FITNESS-FN, a function that measures the fitness of an individual

repeat
   population ← [MUTATE(RECOMBINE(SELECT(2, population, FITNESS-FN)))
          for i in population]
until some individual is fit enough, or enough time has elapsed
return the best individual in population, according to FITNESS-FN


function SELECT(ρ, population, FITNESS-FN) returns a set of ρ individuals
selection ← a uniform random sample of 2 * ρ individuals from population
return the top ρ individuals in selection, ranked by FITNESS-FN


function RECOMBINE(x, y) returns an individual
inputs: x,y, parent individuals

n ← LENGTH(x)
crossover ← random integer from 0 to n
return APPEND(x[0:crossover], y[crossover: n])


Figure ?? A genetic algorithm. The algorithm is the same as the one diagrammed in Figure ??, with one variation: in this version, each recombination of two parents produces only one offspring, not two.

AIMA3e

function GENETIC-ALGORITHM(population,FITNESS-FN) returns an individual
inputs: population, a set of individuals
    FITNESS-FN, a function that measures the fitness of an individual

repeat
   new_population ← empty set
   for i = 1 to SIZE(population) do
     x ← RANDOM-SELECTION(population, FITNESS-FN)
     y ← RANDOM-SELECTION(population, FITNESS-FN)
     child ← REPRODUCE(x, y)
     if (small random probability) then child ← MUTATE(child)
     add child to new_population
   population ← new_population
until some individual is fit enough, or enough time has elapsed
return the best individual in population, according to FITNESS-FN


function REPRODUCE(x, y) returns an individual
inputs: x,y, parent individuals

n ← LENGTH(x); c ← random number from 1 to n
return APPEND(SUBSTRING(x, 1, c),SUBSTRING(y, c+1, n))


Figure ?? A genetic algorithm. The algorithm is the same as the one diagrammed in Figure ??, with one variation: in this more popular version, each mating of two parents produces only one offspring, not two.

The above pseudo code is implemented here and can be visualized here. Now let's try to solve the 8-queens problem using a genetic algorithm.


In [66]:
import aima.core.environment.nqueens.NQueensGenAlgoUtil;
import aima.core.search.framework.problem.GoalTest;
import aima.core.search.local.FitnessFunction;
import aima.core.search.local.GeneticAlgorithm;
import aima.core.search.local.Individual;

import java.util.*;

int boardSize = 8;
FitnessFunction<Integer> fitnessFunction = NQueensGenAlgoUtil.getFitnessFunction();
GoalTest<Individual<Integer>> goalTest = NQueensGenAlgoUtil.getGoalTest();
// Generate an initial population
Set<Individual<Integer>> population = new HashSet<>();
for (int i = 0; i < 50; i++) {
    population.add(NQueensGenAlgoUtil.generateRandomIndividual(boardSize));
}

GeneticAlgorithm<Integer> ga = new GeneticAlgorithm<>(boardSize,
        NQueensGenAlgoUtil.getFiniteAlphabetForBoardOfSize(boardSize), 0.15);

Individual<Integer> bestIndividual  = ga.geneticAlgorithm(population, fitnessFunction, goalTest, 0L);

System.out.println("");
System.out.println("Goal Test Best Individual=\n" + NQueensGenAlgoUtil.getBoardForIndividual(bestIndividual));
System.out.println("Board Size      = " + boardSize);
System.out.println("Fitness         = " + fitnessFunction.apply(bestIndividual));
System.out.println("Is Goal         = " + goalTest.test(bestIndividual));
System.out.println("Population Size = " + ga.getPopulationSize());
System.out.println("Itertions       = " + ga.getIterations());
System.out.println("Took            = " + ga.getTimeInMilliseconds() + "ms.");


Goal Test Best Individual=
--Q-----
-----Q--
---Q----
Q-------
-------Q
----Q---
------Q-
-Q------

Board Size      = 8
Fitness         = 28.0
Is Goal         = true
Population Size = 50
Iterations      = 175
Took            = 504ms.
Out[66]:
null

Searching with Non-Deterministic Actions

When the environment is either partially observable or non-deterministic, percepts become useful. In a partially observable environment, every percept helps narrow down the set of possible states the agent might be in, thus making it easier for the agent to achieve its goal. When the environment is non-deterministic, percepts tell the agent which of the possible outcomes of its actions has actually occurred. In both cases, the future percepts cannot be determined in advance, and the agent's future actions will depend on those future percepts. So the solution to a problem is not an action sequence but a contingency plan (also known as a strategy) that specifies what to do depending on what percepts are received. Solutions for non-deterministic problems can therefore contain nested if-then-else statements; this means that they are trees rather than sequences, which allows the selection of actions based on the contingencies arising during execution.

AND-OR Search Trees

In a deterministic environment, the only branching is introduced by the agent's own choices in each state. We call these nodes OR nodes. In a non-deterministic environment, branching is also introduced by the environment's choice of outcome for each action. We call these nodes AND nodes. These two kinds of nodes alternate, leading to an AND-OR search tree.

Let's have a look at the pseudo code of the AND-OR Graph Search algorithm.


In [67]:
%%python
from notebookUtils import *
pseudocode('And Or Graph Search')


Out[67]:

AIMA3e

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
 OR-SEARCH(problem.INITIAL-STATE, problem, [])


function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
if problem.GOAL-TEST(state) then return the empty plan
if state is on path then return failure
for each action in problem.ACTIONS(state) do
   plan ← AND-SEARCH(RESULTS(state,action), problem, [state | path])
   if plan ≠ failure then return [action | plan]
return failure


function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
for each s_i in states do
   plan_i ← OR-SEARCH(s_i, problem, path)
   if plan_i = failure then return failure
return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_n-1 then plan_n-1 else plan_n]


Figure ?? An algorithm for searching AND-OR graphs generated by nondeterministic environments. It returns a conditional plan that reaches a goal state in all circumstances. (The notation [x | l] refers to the list formed by adding object x to the front of list l.)

One key aspect of the algorithm is the way in which it deals with cycles, which often arise in nondeterministic problems. If the current state is identical to a state on the path from the root, then it returns with failure. This does not mean that there is no solution from the current state; it simply means that if there is a noncyclic solution, it must be reachable from the earlier incarnation of the current state, so the new incarnation can be discarded. With this check, we ensure that the algorithm terminates in every finite state space, because every path must reach a goal, a dead end, or a repeated state.

Above pseudo code is implemented here.

Let's try to solve the Erratic Vacuum World problem using the AND-OR Graph Search algorithm. We will first define the rules of the erratic vacuum world and then see how the code from the repository can be used to solve the problem. Note that we want a contingency plan, or strategy, as the solution to this problem.

In the erratic vacuum world, the Suck action works as follows:

  • When applied to a dirty square the action cleans the square and sometimes cleans up the dirt in an adjacent square, too.
  • When applied to a clean square the action sometimes deposits dirt on the carpet.

Erratic vacuum world can be visualized here.
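To make the non-determinism concrete, here is a toy sketch of RESULTS(state, Suck) for a two-square world. It is purely illustrative (the cell below uses the repository's VacuumWorldFunctions.createResultsFunction instead), and the State record is an assumption of this sketch.

import java.util.LinkedHashSet;
import java.util.Set;

public class ErraticSuck {
    // A tiny illustrative state: dirt status of squares A and B plus the agent's location.
    record State(boolean dirtyA, boolean dirtyB, char loc) {}

    // RESULTS(state, Suck) for the erratic vacuum world: the set of outcomes the
    // environment might choose for this one action.
    static Set<State> resultsOfSuck(State s) {
        boolean here = (s.loc() == 'A') ? s.dirtyA() : s.dirtyB();
        Set<State> outcomes = new LinkedHashSet<>();
        if (here) {
            outcomes.add(withHere(s, false));                 // cleans the current square
            outcomes.add(new State(false, false, s.loc()));   // sometimes cleans the other one too
        } else {
            outcomes.add(s);                                  // sometimes nothing happens
            outcomes.add(withHere(s, true));                  // sometimes dirt is deposited
        }
        return outcomes;
    }

    private static State withHere(State s, boolean dirty) {
        return (s.loc() == 'A') ? new State(dirty, s.dirtyB(), 'A')
                                : new State(s.dirtyA(), dirty, 'B');
    }
}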


In [68]:
import aima.core.agent.Action;
import aima.core.environment.vacuum.*;
import aima.core.search.agent.NondeterministicSearchAgent;
import aima.core.search.nondeterministic.NondeterministicProblem;
import static aima.core.environment.vacuum.VacuumEnvironment.*;

NondeterministicSearchAgent<VacuumEnvironmentState, Action> agent = new NondeterministicSearchAgent<>(percept -> (VacuumEnvironmentState) percept);

NondeterministicVacuumEnvironment world = new NondeterministicVacuumEnvironment(LocationState.Dirty, LocationState.Dirty);
world.addAgent(agent, LOCATION_A);

NondeterministicProblem<VacuumEnvironmentState, Action> problem = new NondeterministicProblem<>(
        (VacuumEnvironmentState) world.getCurrentState(),
        VacuumWorldFunctions::getActions,
        VacuumWorldFunctions.createResultsFunction(agent),
        VacuumWorldFunctions::testGoal,
        (s, a, sPrimed) -> 1.0);

agent.makePlan(problem);
System.out.println(agent.getPlan());


[Action[name=Suck], if {A=Clean, B=Dirty, Loc1=A} then [Action[name=Right], Action[name=Suck]], if {A=Clean, B=Clean, Loc1=A} then []]
Out[68]:
null

Let's try to solve the same problem by invoking the AND-OR Graph Search implementation directly.


In [69]:
import aima.core.agent.Action;
import aima.core.environment.vacuum.*;
import aima.core.search.agent.NondeterministicSearchAgent;
import aima.core.search.nondeterministic.AndOrSearch;
import aima.core.search.nondeterministic.NondeterministicProblem;
import static aima.core.environment.vacuum.VacuumEnvironment.*;

NondeterministicSearchAgent<VacuumEnvironmentState, Action> agent = new NondeterministicSearchAgent<>(percept -> (VacuumEnvironmentState) percept);

NondeterministicVacuumEnvironment world = new NondeterministicVacuumEnvironment(LocationState.Dirty, LocationState.Dirty);
world.addAgent(agent, LOCATION_A);

NondeterministicProblem<VacuumEnvironmentState, Action> problem = new NondeterministicProblem<>(
        (VacuumEnvironmentState) world.getCurrentState(),
        VacuumWorldFunctions::getActions,
        VacuumWorldFunctions.createResultsFunction(agent),
        VacuumWorldFunctions::testGoal,
        (s, a, sPrimed) -> 1.0);

AndOrSearch andOrSearch = new AndOrSearch();
System.out.println(andOrSearch.search(problem));


Optional[[Action[name=Suck], if {A=Clean, B=Dirty, Loc1=A} then [Action[name=Right], Action[name=Suck]], if {A=Clean, B=Clean, Loc1=A} then []]]
Out[69]:
null

Searching with Partial Observations

Here the agent's percepts do not suffice to pin down the exact state. Therefore, if the agent is in one of several possible states, an action may lead to one of several possible outcomes. The key concept required for solving partially observable problems is the belief state, representing the agent's current belief about the possible physical states it might be in, given the sequence of actions and percepts up to that point.

When the agent's percepts provide no information at all, we have what is called a sensorless problem. To solve sensorless problems, we search in the space of belief states rather than physical states. Notice that in belief-state space the problem is fully observable because the agent always knows its own belief state. Furthermore, the solution (if any) is always a sequence of actions.
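A minimal sketch of the belief-state successor (the prediction step) for a deterministic, sensorless problem is shown below. The result function is caller-supplied, not a repository API, and states need sensible equals/hashCode so that identical outcomes merge.

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiFunction;

public class BeliefStates {
    // PREDICT step for a sensorless problem with deterministic actions:
    // the successor belief state is the set of physical results of applying
    // the action to every state the agent might currently be in.
    public static <S, A> Set<S> predict(Set<S> belief, A action, BiFunction<S, A, S> result) {
        Set<S> next = new LinkedHashSet<>();
        for (S s : belief) {
            next.add(result.apply(s, action));   // duplicate outcomes collapse in the set
        }
        return next;
    }
}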

Online Search Agents and Unknown Environments

Offline search algorithms compute a complete solution before setting foot in the real world and then execute the solution. In contrast, an online search agent interleaves computation and action: first it takes an action, then it observes the environment and computes the next action. Online search is a necessary idea for unknown environments, where the agent does not know what states exist or what its actions do. In this state of ignorance, the agent faces an exploration problem and must use its actions as experiments in order to learn enough to make deliberation worthwhile.

Note that, the agent cannot determine RESULT(s,a) except by actually being in $s$ and doing $a$.

The total path cost is the cost of the path the agent actually travels. The ratio of this cost to the cost of the path the agent would follow if it knew the search space in advance (i.e. the actual shortest path) is called the competitive ratio; we would like it to be as small as possible. For example, if the agent travels a path of cost 12 while the shortest path costs 8, the competitive ratio is 1.5.

Online Search Agents

After each action, an online agent receives a percept telling it what state it has reached; from this information, it can augment its map of the environment. The agent stores this map in a table, RESULT[s, a], that records the state resulting from executing action $a$ in state $s$.

Let's have a look at the pseudo code of Online-DFS agent.


In [70]:
%%python
from notebookUtils import *
pseudocode('Online DFS Agent')


Out[70]:

AIMA3e

function ONLINE-DFS-AGENT(s') returns an action
inputs: s', a percept that identifies the current state
persistent: result, a table indexed by state and action, initially empty
      untried, a table that lists, for each state, the actions not yet tried
      unbacktracked, a table that lists, for each state, the backtracks not yet tried
      s, a, the previous state and action, initially null

if GOAL-TEST(s') then return stop
if s' is a new state (not in untried) then untried[s'] ← ACTIONS(s')
if s is not null and s' != result[s, a] then
   result[s, a] ← s'
   add s to front of unbacktracked[s']
if untried[s'] is empty then
   if unbacktracked[s'] is empty then return stop
   else a ← an action b such that result[s', b] = POP(unbacktracked[s'])
else a ← POP(untried[s'])
ss'
return a


Figure ?? An online search agent that uses depth-first exploration. The agent is applicable only in state spaces in which every action can be "undone" by some other action.

Note that, due to its method of backtracking, the Online DFS agent works only in state spaces where the actions are reversible. The implementation of the above pseudo code can be viewed here.

Let's try to solve the maze problem (AIMA-3e Fig. 4.19) using Online-DFS agent.


In [71]:
import aima.core.agent.*;
import aima.core.environment.map.ExtendableMap;
import aima.core.environment.map.MapEnvironment;
import aima.core.environment.map.MapFunctions;
import aima.core.search.framework.problem.GeneralProblem;
import aima.core.search.framework.problem.GoalTest;
import aima.core.search.framework.problem.OnlineSearchProblem;
import aima.core.search.online.OnlineDFSAgent;

ExtendableMap extendableMap = new ExtendableMap();
extendableMap.addBidirectionalLink("1,1", "1,2", 1.0);
extendableMap.addBidirectionalLink("1,1", "2,1", 1.0);
extendableMap.addBidirectionalLink("2,1", "2,2", 1.0);
extendableMap.addBidirectionalLink("3,1", "3,2", 1.0);
extendableMap.addBidirectionalLink("2,2", "2,3", 1.0);
extendableMap.addBidirectionalLink("3,2", "3,3", 1.0);
extendableMap.addBidirectionalLink("2,3", "1,3", 1.0);
extendableMap.addBidirectionalLink("2,1", "3,1", 1.0);

StringBuffer envChange = new StringBuffer();

MapEnvironment mapEnvironment = new MapEnvironment(extendableMap);
OnlineSearchProblem problem = new GeneralProblem(null,
        MapFunctions.createActionsFunction(extendableMap),
        null,
        GoalTest.isEqual("3,3"),
        MapFunctions.createDistanceStepCostFunction(extendableMap));

OnlineDFSAgent onlineDFSAgent = new OnlineDFSAgent(problem, MapFunctions.createPerceptToStateFunction());

mapEnvironment.addAgent(onlineDFSAgent, "1,1");
mapEnvironment.addEnvironmentView(new EnvironmentView() {
    public void notify(String msg) {
        envChange.append(msg).append(" -> ");
    }

    public void agentAdded(Agent agent, Environment source) {
    }

    public void agentActed(Agent agent, Percept percept, Action action, Environment source) {
        envChange.append(action).append(" -> ");
    }
});
mapEnvironment.stepUntilDone();

System.out.println(envChange.toString());


Action[name=moveTo, location=1,2] -> Action[name=moveTo, location=1,1] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=1,1] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=2,2] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=3,1] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=3,1] -> Action[name=moveTo, location=3,2] -> Action[name=moveTo, location=3,1] -> Action[name=moveTo, location=3,2] -> Action[name=moveTo, location=3,3] -> Action[name=NoOp] -> 
Out[71]:
null

Since hill-climbing search keeps just one current state in memory, hill climbing is already an online search algorithm! Unfortunately, it is not very useful in its simplest form because it leaves the agent sitting at local maxima with nowhere to go. However, augmenting hill climbing with memory turns out to be an effective approach. The basic idea is to store a "current best estimate" $H(s)$ of the cost to reach the goal from each state that has been visited. $H(s)$ starts out being just the heuristic estimate $h(s)$ and is updated as the agent gains experience in the state space. If the agent seems to be stuck in a local minimum, rather than staying where it is, it should follow what seems to be the best path to the goal, given the current cost estimates for its neighbors. The estimated cost to reach the goal through a neighbor $s'$ is the cost to get to $s'$ plus the estimated cost to get to a goal from there, i.e. $c(s,a,s')+H(s')$. The agent then moves to the best neighbor $s'$ and $H(s)$ is updated accordingly. Continuing this process, the agent will move back and forth several times, updating $H$ each time and "flattening out" the local minimum until it escapes.

An agent implementing this scheme is called a learning real-time $A^*$ ($LRTA^*$) agent. Let's have a look at the pseudo code of the $LRTA^*$ agent.


In [72]:
%%python
from notebookUtils import *
pseudocode('LRTAStar Agent')


Out[72]:

AIMA3e

function LRTA*-AGENT(s') returns an action
inputs: s', a percept that identifies the current state
persistent: result, a table, indexed by state and action, initially empty
      H, a table of cost estimates indexed by state, initially empty
      s, a, the previous state and action, initially null

if GOAL-TEST(s') then return stop
if s' is a new state (not in H) then H[s'] ← h(s')
if s is not null then
   result[s, a] ← s'
   H[s] ← min over b ∈ ACTIONS(s) of LRTA*-COST(s, b, result[s, b], H)
a ← an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[s', b], H)
ss'
return a

function LRTA*-COST(s, a, s', H) returns a cost estimate
if s' is undefined then return h(s)
else return c(s, a, s') + H[s']


Figure ?? LRTA*-AGENT selects an action according to the values of neighboring states, which are updated as the agent moves about the state space.

The above pseudo code is implemented here. Now let's try to solve the maze problem discussed earlier using $LRTA^*$ Agent.


In [73]:
import aima.core.agent.*;
import aima.core.environment.map.ExtendableMap;
import aima.core.environment.map.MapEnvironment;
import aima.core.environment.map.MapFunctions;
import aima.core.search.framework.problem.GeneralProblem;
import aima.core.search.framework.problem.GoalTest;
import aima.core.search.framework.problem.OnlineSearchProblem;
import aima.core.search.online.LRTAStarAgent;
import java.util.function.ToDoubleFunction;

ExtendableMap extendableMap = new ExtendableMap();
extendableMap.addBidirectionalLink("1,1", "1,2", 1.0);
extendableMap.addBidirectionalLink("1,1", "2,1", 1.0);
extendableMap.addBidirectionalLink("2,1", "3,1", 1.0);
extendableMap.addBidirectionalLink("2,1", "2,2", 1.0);
extendableMap.addBidirectionalLink("3,1", "3,2", 1.0);
extendableMap.addBidirectionalLink("2,2", "2,3", 1.0);
extendableMap.addBidirectionalLink("3,2", "3,3", 1.0);
extendableMap.addBidirectionalLink("2,3", "1,3", 1.0);

StringBuffer envChange = new StringBuffer();

MapEnvironment mapEnvironment = new MapEnvironment(extendableMap);
OnlineSearchProblem problem = new GeneralProblem(null,
        MapFunctions.createActionsFunction(extendableMap),
        null,
        GoalTest.isEqual("3,3"),
        MapFunctions.createDistanceStepCostFunction(extendableMap));

ToDoubleFunction<String> h = (state) -> 1.0;

LRTAStarAgent lrtaStarAgent = new LRTAStarAgent(problem,MapFunctions.createPerceptToStateFunction() ,h);


mapEnvironment.addAgent(lrtaStarAgent, "1,1");
mapEnvironment.addEnvironmentView(new EnvironmentView() {
    public void notify(String msg) {
        envChange.append(msg).append(" -> ");
    }

    public void agentAdded(Agent agent, Environment source) {
    }

    public void agentActed(Agent agent, Percept percept, Action action, Environment source) {
        envChange.append(action).append(" -> ");
    }
});
mapEnvironment.stepUntilDone();

System.out.println(envChange.toString());


Action[name=moveTo, location=1,2] -> Action[name=moveTo, location=1,1] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=1,1] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=2,2] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=3,1] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=2,2] -> Action[name=moveTo, location=2,3] -> Action[name=moveTo, location=1,3] -> Action[name=moveTo, location=2,3] -> Action[name=moveTo, location=2,2] -> Action[name=moveTo, location=2,1] -> Action[name=moveTo, location=3,1] -> Action[name=moveTo, location=3,2] -> Action[name=moveTo, location=3,1] -> Action[name=moveTo, location=3,2] -> Action[name=moveTo, location=3,3] -> Action[name=NoOp] -> 
Out[73]:
null
