This notebook serves as supporting material for the chapter Beyond Classical Search. The notebooks illustrate the use of the code repository and demonstrate how the code can be extended to solve various search-related problems. Classical Search (Chapter 3) addresses a single category of problems: observable, deterministic, known environments where the solution is a sequence of actions. Here, we look at what happens when these assumptions are relaxed. The discussion begins with local search in state spaces. Then we examine what happens when we relax the assumptions of determinism and observability. Finally, we study online search, in which the agent is faced with a state space that is initially unknown and must be explored.
In [60]:
%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar
These algorithms are suitable for problems in which all that matters is the solution state, not the path cost to reach it. For example, in the 8-queens problem, what matters is the final configuration of queens, not the order in which they were added. Therefore, local search algorithms operate using a single current node (rather than multiple paths) and generally move only to the neighbors of that node, i.e., the paths followed by the search are not retained. In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.
To understand local search, we consider the state-space landscape. A landscape has both location (defined by the state) and elevation (defined by the value of the heuristic cost function or the objective function). If elevation corresponds to cost, the aim is to find the global minimum; if elevation corresponds to an objective function, the aim is to find the global maximum.
Hill-climbing search is simply a loop that continually moves in the direction of increasing value. It terminates when it reaches a "peak" where no neighbor has a higher value. It does not maintain a search tree, so the data structure for the current node need only record the state and the value of the objective function. The algorithm does not look ahead beyond the immediate neighbors of the current state.
Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next. It chooses randomly among the set of best successors if there is more than one. Unfortunately, hill climbing often gets stuck for the following reasons: local maxima (peaks that are higher than each of their neighboring states but lower than the global maximum), ridges (sequences of local maxima that are very difficult for greedy algorithms to navigate), and plateaux (flat areas of the landscape, which may be flat local maxima or shoulders from which progress is still possible).
Let's have a look at the pseudo code of hill-climbing search.
In [61]:
%%python
from notebookUtils import *
pseudocode('Hill Climbing')
Out[61]:
The implementation of the above pseudo code can be viewed here.
This algorithm halts if it reaches a plateau where the best successor has the same value as the current state. It might be a good idea to allow a sideways move in the hope that the plateau is really a shoulder, but we must take care: an infinite loop can occur whenever the algorithm reaches a flat local maximum that is not a shoulder. One common solution is to put a limit on the number of consecutive sideways moves allowed.
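The following is a minimal, self-contained sketch of this idea (it does not use aima-core); the objective function is a toy one-dimensional landscape invented purely to make the plateau visible. With a large enough sideways budget the search crosses the shoulder and reaches the global peak; with maxSideways set to 0 it halts at the edge of the plateau.
In [ ]:
import java.util.function.IntUnaryOperator;
// Toy 1-D landscape: uphill to a flat "shoulder" (x = 10..15), then uphill again to a peak at x = 20.
IntUnaryOperator value = x -> {
    if (x <= 10) return x;          // initial slope
    if (x <= 15) return 10;         // shoulder: flat, but higher ground lies beyond it
    if (x <= 20) return x - 5;      // second slope, global maximum value 15 at x = 20
    return 35 - x;                  // downhill past the peak
};
int current = 0;
int sideways = 0;
int maxSideways = 8;                // cap on consecutive sideways moves
while (true) {
    int here = value.applyAsInt(current);
    int up = value.applyAsInt(current + 1);
    int down = value.applyAsInt(current - 1);
    if (Math.max(up, down) > here) {                    // a strictly better neighbor: take it
        current = (up >= down) ? current + 1 : current - 1;
        sideways = 0;
    } else if (up == here && sideways < maxSideways) {  // sideways move across the plateau
        current = current + 1;
        sideways++;
    } else {
        break;                                          // local maximum, or sideways budget exhausted
    }
}
System.out.println("Stopped at x=" + current + " with value " + value.applyAsInt(current));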
Let's try to solve the 8-queens problem using hill-climbing search.
In [62]:
import aima.core.agent.Action;
import aima.core.environment.nqueens.NQueensBoard;
import aima.core.environment.nqueens.NQueensFunctions;
import aima.core.environment.nqueens.QueenAction;
import aima.core.search.agent.SearchAgent;
import aima.core.search.framework.problem.Problem;
import aima.core.search.local.HillClimbingSearch;
int boardSize = 8;
Problem<NQueensBoard, QueenAction> problem =
NQueensFunctions.createCompleteStateFormulationProblem(boardSize, NQueensBoard.Config.QUEEN_IN_EVERY_COL);
HillClimbingSearch<NQueensBoard, QueenAction> search = new HillClimbingSearch<>
(NQueensFunctions.createAttackingPairsHeuristicFunction());
SearchAgent<NQueensBoard, QueenAction> agent = new SearchAgent<>(problem, search);
for (Action action : agent.getActions()) {
System.out.println(action.toString());
}
System.out.println("Search Outcome=" + search.getOutcome());
System.out.println("Final State=\n" + search.getLastSearchState());
Out[62]:
Notice that the above search may either reach the goal configuration or get stuck at a local minimum of the attacking-pairs heuristic function.
The hill-climbing algorithms described so far are incomplete: they often get stuck on local maxima. Random-restart hill climbing therefore conducts a series of hill-climbing searches from randomly generated initial states until a goal is found. If each hill-climbing search has a probability $p$ of success, then the expected number of restarts required is $1/p$. The success of hill climbing depends very much on the shape of the state-space landscape: if there are few local maxima and plateaux, random-restart hill climbing will find a good solution very quickly.
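Below is a self-contained sketch of random-restart hill climbing for 8-queens that does not rely on aima-core: the board is a plain int[] with one queen per column, the heuristic is the number of attacking pairs, and steepest-ascent hill climbing is restarted from a fresh random board until a zero-conflict configuration is found.
In [ ]:
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToIntFunction;
// Number of attacking queen pairs (0 means a solution); board[col] = row of the queen in that column.
ToIntFunction<int[]> attacks = b -> {
    int n = 0;
    for (int i = 0; i < b.length; i++)
        for (int j = i + 1; j < b.length; j++)
            if (b[i] == b[j] || Math.abs(b[i] - b[j]) == j - i) n++;
    return n;
};
Random rnd = new Random(42);
int size = 8;
int[] board = new int[size];
int restarts = 0;
while (true) {
    restarts++;
    // Random initial state: one queen per column, row chosen uniformly at random.
    for (int c = 0; c < size; c++) board[c] = rnd.nextInt(size);
    // Steepest-ascent hill climbing: repeatedly apply the single-queen move that most reduces conflicts.
    boolean improved = true;
    while (improved) {
        improved = false;
        int bestCol = -1, bestRow = -1, bestValue = attacks.applyAsInt(board);
        for (int c = 0; c < size; c++) {
            int original = board[c];
            for (int r = 0; r < size; r++) {
                if (r == original) continue;
                board[c] = r;
                int v = attacks.applyAsInt(board);
                if (v < bestValue) { bestValue = v; bestCol = c; bestRow = r; }
            }
            board[c] = original;
        }
        if (bestCol >= 0) { board[bestCol] = bestRow; improved = true; }
    }
    if (attacks.applyAsInt(board) == 0) break;   // goal reached; otherwise restart from a new random board
}
System.out.println("Solved after " + restarts + " restart(s): " + Arrays.toString(board));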
A hill climbing algorithm that never makes "downhill" moves toward states with lower values is guaranteed to be incomplete, because it can get stuck on a local maximum. In contrast, a purely random walk is complete but extremely inefficient. Therefore we need to combine hill climbing with a random walk in some way that yields both efficiency and completeness. Simulated annealing is such an algorithm.
Instead of picking the best move, simulated annealing picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than one. Suppose ${\Delta E}$ is the amount by which the move worsens the evaluation (so ${\Delta E}<0$ for a "bad" move) and $T$ is the temperature: "bad" moves are more likely to be allowed when $T$ is high, and they become increasingly unlikely as $T$ decreases. The probability also decreases exponentially with the "badness" of the move. Mathematically, the probability of accepting a move that worsens the evaluation function is $e^{\Delta E/T}$.
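For instance, a move that worsens the evaluation by 1 (${\Delta E}=-1$) is accepted with probability $e^{-1/20} \approx 0.95$ at the high temperature $T=20$, but only with probability $e^{-1/1} \approx 0.37$ at the low temperature $T=1$. The code cell after the pseudocode evaluates exactly these two cases.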
Let's have a look at the pseudo code of simulated annealing.
In [63]:
%%python
from notebookUtils import *
pseudocode('Simulated Annealing')
Out[63]:
In [64]:
import aima.core.agent.Action;
import aima.core.search.local.SimulatedAnnealingSearch;
SimulatedAnnealingSearch<String, Action> search = new SimulatedAnnealingSearch<>(null);
int deltaE = -1; //bad move as ΔE<0
double lowerTemperature = 1.0;
double higherTemperature = 20.0;
System.out.println("Probability of acceptance of move at lower temperature: " + search.probabilityOfAcceptance(lowerTemperature, deltaE));
System.out.println("Probability of acceptance of move at higher temperature: " + search.probabilityOfAcceptance(higherTemperature, deltaE));
Out[64]:
The local beam search algorithm keeps track of $k$ states rather than just one. It begins with $k$ randomly generated states. At each step, all the successors of all $k$ states are generated. If any one of them is a goal, the algorithm halts. Otherwise, it selects the $k$ best successors from the complete list and repeats.
Note that the two algorithms, random-restart search and local beam search, are quite different. In a random-restart search, each search process runs independently of the others. In a local beam search, useful information is passed among the parallel search threads. In effect, the algorithm quickly abandons unfruitful searches and moves its resources to where the most progress is being made.
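Here is a minimal, self-contained sketch of local beam search (not part of aima-core) on a made-up multimodal objective over the integers 0..99. Since there is no explicit goal test for this toy function, the loop runs for a fixed number of steps and reports the best state encountered.
In [ ]:
import java.util.*;
import java.util.function.IntUnaryOperator;
// Toy multimodal objective with several local maxima; purely illustrative.
IntUnaryOperator f = x -> (int) (40 * Math.sin(x / 7.0) + 30 * Math.cos(x / 3.0)) - Math.abs(x - 60) / 2;
int k = 4;                                  // beam width
Random rnd = new Random(7);
// Begin with k randomly generated states.
List<Integer> beam = new ArrayList<>();
for (int i = 0; i < k; i++) beam.add(rnd.nextInt(100));
int bestX = beam.get(0);
for (int step = 0; step < 100; step++) {
    // Generate ALL successors of ALL k states (here a successor of x is simply x-1 or x+1).
    Set<Integer> successors = new TreeSet<>();
    for (int x : beam) {
        if (x > 0) successors.add(x - 1);
        if (x < 99) successors.add(x + 1);
    }
    // Keep the k best successors from the pooled list. This is where the beam differs from k
    // independent restarts: unpromising regions are abandoned in favor of promising ones.
    List<Integer> pooled = new ArrayList<>(successors);
    pooled.sort((a, b) -> Integer.compare(f.applyAsInt(b), f.applyAsInt(a)));   // best first
    beam = new ArrayList<>(pooled.subList(0, Math.min(k, pooled.size())));
    if (f.applyAsInt(beam.get(0)) > f.applyAsInt(bestX)) bestX = beam.get(0);
}
System.out.println("Best state found: x=" + bestX + ", f(x)=" + f.applyAsInt(bestX));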
A genetic algorithm (GA) is a variant of beam search in which successor states are generated by combining two parent states rather than by modifying a single state. Like beam search, a GA begins with a set of $k$ randomly generated states, called the population. Each state is rated by an objective function, also called the fitness function; a fitness function should return higher values for better states, and the probability of being chosen for reproduction is directly proportional to the fitness score. Pairs of states are then selected at random for reproduction, in accordance with these probabilities. For each pair to be mated, a crossover point is chosen at random, and the offspring are created by crossing over the parent strings at that point. Finally, each location in an offspring is subject to random mutation with a small independent probability.
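The two genetic operators are easy to see in isolation. The sketch below (illustrative only; the aima-core GeneticAlgorithm used later performs these steps internally) applies single-point crossover and per-gene mutation to one pair of 8-queens individuals encoded as digit strings, where each digit is the row of the queen in the corresponding column.
In [ ]:
import java.util.Arrays;
import java.util.Random;
Random rnd = new Random(3);
// Two parent individuals for 8-queens: parent[col] = row (1..8) of the queen in that column.
int[] parentX = {3, 2, 7, 5, 2, 4, 1, 1};
int[] parentY = {2, 4, 7, 4, 8, 5, 5, 2};
// Single-point crossover: pick a crossover point at random and splice the two parents together.
int c = 1 + rnd.nextInt(parentX.length - 1);          // a cut strictly inside the string
int[] child = new int[parentX.length];
for (int i = 0; i < child.length; i++)
    child[i] = (i < c) ? parentX[i] : parentY[i];
// Mutation: each gene is replaced by a random value with a small independent probability.
double mutationProbability = 0.15;
for (int i = 0; i < child.length; i++)
    if (rnd.nextDouble() < mutationProbability)
        child[i] = 1 + rnd.nextInt(8);
System.out.println("Crossover point: " + c);
System.out.println("Offspring: " + Arrays.toString(child));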
Let's have a look at the pseudo code of the genetic algorithm.
In [65]:
%%python
from notebookUtils import *
pseudocode('Genetic Algorithm')
Out[65]:
In [66]:
import aima.core.environment.nqueens.NQueensGenAlgoUtil;
import aima.core.search.framework.problem.GoalTest;
import aima.core.search.local.FitnessFunction;
import aima.core.search.local.GeneticAlgorithm;
import aima.core.search.local.Individual;
import java.util.*;
int boardSize = 8;
FitnessFunction<Integer> fitnessFunction = NQueensGenAlgoUtil.getFitnessFunction();
GoalTest<Individual<Integer>> goalTest = NQueensGenAlgoUtil.getGoalTest();
// Generate an initial population
Set<Individual<Integer>> population = new HashSet<>();
for (int i = 0; i < 50; i++) {
population.add(NQueensGenAlgoUtil.generateRandomIndividual(boardSize));
}
GeneticAlgorithm<Integer> ga = new GeneticAlgorithm<>(boardSize,
NQueensGenAlgoUtil.getFiniteAlphabetForBoardOfSize(boardSize), 0.15);
Individual<Integer> bestIndividual = ga.geneticAlgorithm(population, fitnessFunction, goalTest, 0L);
System.out.println("");
System.out.println("Goal Test Best Individual=\n" + NQueensGenAlgoUtil.getBoardForIndividual(bestIndividual));
System.out.println("Board Size = " + boardSize);
System.out.println("Fitness = " + fitnessFunction.apply(bestIndividual));
System.out.println("Is Goal = " + goalTest.test(bestIndividual));
System.out.println("Population Size = " + ga.getPopulationSize());
System.out.println("Itertions = " + ga.getIterations());
System.out.println("Took = " + ga.getTimeInMilliseconds() + "ms.");
Out[66]:
When the environment is either partially observable or non-deterministic, percepts become useful. In a partially observable environment, every percept helps narrow down the set of possible states the agent might be in, thus making it easier for the agent to achieve its goal. When the environment is non-deterministic, the percept tells the agent which of the possible outcomes of its action has actually occurred. In both cases, the future percepts cannot be determined in advance, and the agent's future actions will depend on those future percepts. So the solution to a problem is not an action sequence but a contingency plan (also known as a strategy) that specifies what to do depending on which percepts are received. Therefore, solutions for non-deterministic problems can contain nested if-then-else statements; this means that they are trees rather than sequences. This allows the selection of actions based on contingencies arising during execution.
In a deterministic environment, the only branching is introduced by the agent's own choices in each state. We call these nodes OR nodes. In a non-deterministic environment, branching is also introduced by the environment's choice of outcome for each action. We call these nodes AND nodes. These two kinds of nodes alternate, leading to an AND-OR search tree.
Let's have a look at the pseudo code of the AND-OR Graph Search algorithm.
In [67]:
%%python
from notebookUtils import *
pseudocode('And Or Graph Search')
Out[67]:
One key aspect of the algorithm is the way in which it deals with cycles, which often arise in nondeterministic problems. If the current state is identical to a state on the path from the root, the algorithm returns with failure. This does not mean that there is no solution from the current state; it simply means that if there is a noncyclic solution, it must be reachable from the earlier incarnation of the current state, so the new incarnation can be discarded. With this check, we ensure that the algorithm terminates in every finite state space, because every path must reach a goal, a dead end, or a repeated state.
The above pseudo code is implemented here.
Let's try to solve the erratic vacuum world problem using the AND-OR Graph Search algorithm. We will first define the rules of the erratic vacuum world and then look at how the code from the repository can be used to solve the problem. Note that we want a contingency plan, or strategy, as the solution to this problem.
In the erratic vacuum world, the Suck action works as follows: when applied to a dirty square, it cleans that square and sometimes cleans up dirt in an adjacent square too; when applied to a clean square, it sometimes deposits dirt on the carpet.
Erratic vacuum world can be visualized here.
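For the initial state used below (both squares dirty, agent in square A, i.e., state 1 in the book's numbering), AIMA-3e derives the contingency plan [Suck, if State = 5 then [Right, Suck] else []], where state 5 is the state in which square A is clean, square B is dirty, and the agent is still in A. The plan printed by the repository code should be equivalent to this, although its textual representation may differ.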
In [68]:
import aima.core.agent.Action;
import aima.core.environment.vacuum.*;
import aima.core.search.agent.NondeterministicSearchAgent;
import aima.core.search.nondeterministic.NondeterministicProblem;
import static aima.core.environment.vacuum.VacuumEnvironment.*;
NondeterministicSearchAgent<VacuumEnvironmentState, Action> agent = new NondeterministicSearchAgent<>(percept -> (VacuumEnvironmentState) percept);
NondeterministicVacuumEnvironment world = new NondeterministicVacuumEnvironment(LocationState.Dirty, LocationState.Dirty);
world.addAgent(agent, LOCATION_A);
NondeterministicProblem<VacuumEnvironmentState, Action> problem = new NondeterministicProblem<>(
(VacuumEnvironmentState) world.getCurrentState(),
VacuumWorldFunctions::getActions,
VacuumWorldFunctions.createResultsFunction(agent),
VacuumWorldFunctions::testGoal,
(s, a, sPrimed) -> 1.0);
agent.makePlan(problem);
System.out.println(agent.getPlan());
Out[68]:
Let's now solve the same problem by invoking the AndOrSearch class directly.
In [69]:
import aima.core.agent.Action;
import aima.core.environment.vacuum.*;
import aima.core.search.agent.NondeterministicSearchAgent;
import aima.core.search.nondeterministic.AndOrSearch;
import aima.core.search.nondeterministic.NondeterministicProblem;
import static aima.core.environment.vacuum.VacuumEnvironment.*;
NondeterministicSearchAgent<VacuumEnvironmentState, Action> agent = new NondeterministicSearchAgent<>(percept -> (VacuumEnvironmentState) percept);
NondeterministicVacuumEnvironment world = new NondeterministicVacuumEnvironment(LocationState.Dirty, LocationState.Dirty);
world.addAgent(agent, LOCATION_A);
NondeterministicProblem<VacuumEnvironmentState, Action> problem = new NondeterministicProblem<>(
(VacuumEnvironmentState) world.getCurrentState(),
VacuumWorldFunctions::getActions,
VacuumWorldFunctions.createResultsFunction(agent),
VacuumWorldFunctions::testGoal,
(s, a, sPrimed) -> 1.0);
AndOrSearch andOrSearch = new AndOrSearch();
System.out.println(andOrSearch.search(problem));
Out[69]:
Here the agent's percepts do not suffice to pin down the exact state. Therefore, if the agent is in one of several possible states, an action may lead to one of several possible outcomes. The key concept required for solving partially observable problems is the belief state, which represents the agent's current belief about the possible physical states it might be in, given the sequence of actions and percepts up to that point.
When the agent's percepts provide no information at all, we have what is called a sensorless problem. To solve sensorless problems, we search in the space of belief states rather than physical states. Notice that in belief-state space the problem is fully observable because the agent always knows its own belief state. Furthermore, the solution (if any) is always a sequence of actions.
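The sketch below illustrates the idea without using the repository's vacuum classes: starting from the sensorless agent's full belief state (all 8 physical states of the ordinary, deterministic two-square vacuum world, encoded here as strings), it applies the fixed action sequence [Right, Suck, Left, Suck] and shows the belief state shrinking until only goal states remain.
In [ ]:
import java.util.*;
import java.util.function.BiFunction;
// A physical state is encoded as "loc|A|B": the agent's location and the Dirty/Clean status of squares A and B.
List<String> locations = Arrays.asList("A", "B");
List<Boolean> dirtStates = Arrays.asList(true, false);
// The sensorless agent's initial belief state: all 8 physical states are possible.
Set<String> belief = new LinkedHashSet<>();
for (String loc : locations)
    for (boolean dirtyA : dirtStates)
        for (boolean dirtyB : dirtStates)
            belief.add(loc + "|" + (dirtyA ? "Dirty" : "Clean") + "|" + (dirtyB ? "Dirty" : "Clean"));
// Deterministic transition model RESULT(s, a); applying action a to a belief state b yields { RESULT(s, a) : s in b }.
BiFunction<String, String, String> result = (state, action) -> {
    String[] parts = state.split("\\|");
    String loc = parts[0], a = parts[1], b = parts[2];
    switch (action) {
        case "Right": loc = "B"; break;
        case "Left":  loc = "A"; break;
        case "Suck":  if (loc.equals("A")) a = "Clean"; else b = "Clean"; break;
    }
    return loc + "|" + a + "|" + b;
};
System.out.println("Initial belief state (" + belief.size() + " states): " + belief);
for (String action : Arrays.asList("Right", "Suck", "Left", "Suck")) {
    Set<String> next = new LinkedHashSet<>();
    for (String s : belief) next.add(result.apply(s, action));
    belief = next;
    System.out.println("After " + action + " (" + belief.size() + " states): " + belief);
}
// The final belief state contains only states in which both squares are clean, so the fixed
// action sequence [Right, Suck, Left, Suck] solves the sensorless problem.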
Offline search algorithms compute a complete solution before setting foot in the real world and then execute the solution. In contrast, an online search agent interleaves computation and action: first it takes an action, then it observes the environment and computes the next action. Online search is a necessary idea for unknown environments, where the agent does not know what states exist or what its actions do. In this state of ignorance, the agent faces an exploration problem and must use its actions as experiments in order to learn enough to make deliberation worthwhile.
Note that the agent cannot determine $RESULT(s, a)$ except by actually being in $s$ and doing $a$.
The total path cost is the cost of the path the agent actually travels. The ratio of this cost to the cost of the path the agent would follow if it knew the search space in advance (i.e., the actual shortest path) is called the competitive ratio; we would like it to be as small as possible.
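For instance, an agent that ends up traversing a route of total cost 12 from its start state to the goal, when the true shortest path costs 8, has a competitive ratio of $12/8 = 1.5$.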
After each action, an online agent receives a percept telling it what state it has reached; from this information, it can augment its map of the environment. The agent stores this map in a table, $RESULT[s, a]$, that records the state resulting from executing action $a$ in state $s$.
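A trivial sketch of this bookkeeping is shown below; the state and action names are invented for illustration and are unrelated to the internals of the repository's OnlineDFSAgent.
In [ ]:
import java.util.HashMap;
import java.util.Map;
// The map RESULT[s, a] is filled in one entry at a time, only after the agent has actually
// executed action a in state s and perceived the outcome.
Map<String, String> resultTable = new HashMap<>();
String s = "1,1";                 // the state the agent was in (hypothetical name)
String a = "Up";                  // the action it executed (hypothetical name)
String observed = "1,2";          // the state reported by the next percept
resultTable.put(s + "|" + a, observed);
System.out.println("RESULT[" + s + ", " + a + "] = " + resultTable.get(s + "|" + a));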
Let's have a look at the pseudo code of Online-DFS agent.
In [70]:
%%python
from notebookUtils import *
pseudocode('Online DFS Agent')
Out[70]:
Note that, due to its method of backtracking, the Online-DFS agent works only in state spaces where the actions are reversible. The implementation of the above pseudo code can be viewed here.
Let's try to solve the maze problem (AIMA-3e Fig. 4.19) using Online-DFS agent.
In [71]:
import aima.core.agent.*;
import aima.core.environment.map.ExtendableMap;
import aima.core.environment.map.MapEnvironment;
import aima.core.environment.map.MapFunctions;
import aima.core.search.framework.problem.GeneralProblem;
import aima.core.search.framework.problem.GoalTest;
import aima.core.search.framework.problem.OnlineSearchProblem;
import aima.core.search.online.OnlineDFSAgent;
ExtendableMap extendableMap = new ExtendableMap();
extendableMap.addBidirectionalLink("1,1", "1,2", 1.0);
extendableMap.addBidirectionalLink("1,1", "2,1", 1.0);
extendableMap.addBidirectionalLink("2,1", "2,2", 1.0);
extendableMap.addBidirectionalLink("3,1", "3,2", 1.0);
extendableMap.addBidirectionalLink("2,2", "2,3", 1.0);
extendableMap.addBidirectionalLink("3,2", "3,3", 1.0);
extendableMap.addBidirectionalLink("2,3", "1,3", 1.0);
extendableMap.addBidirectionalLink("2,1", "3,1", 1.0);
StringBuffer envChange = new StringBuffer();
MapEnvironment mapEnvironment = new MapEnvironment(extendableMap);
OnlineSearchProblem problem = new GeneralProblem(null,
MapFunctions.createActionsFunction(extendableMap),
null,
GoalTest.isEqual("3,3"),
MapFunctions.createDistanceStepCostFunction(extendableMap));
OnlineDFSAgent onlineDFSAgent = new OnlineDFSAgent(problem, MapFunctions.createPerceptToStateFunction());
mapEnvironment.addAgent(onlineDFSAgent, "1,1");
mapEnvironment.addEnvironmentView(new EnvironmentView() {
public void notify(String msg) {
envChange.append(msg).append(" -> ");
}
public void agentAdded(Agent agent, Environment source) {
}
public void agentActed(Agent agent, Percept percept, Action action, Environment source) {
envChange.append(action).append(" -> ");
}
});
mapEnvironment.stepUntilDone();
System.out.println(envChange.toString());
Out[71]:
Since hill-climbing search keeps just one current state in memory, it is already an online search algorithm! Unfortunately, it is not useful in its simplest form because it leaves the agent sitting at local maxima with nowhere to go. However, augmenting hill climbing with memory turns out to be an effective approach. The basic idea is to store a "current best estimate" $H(s)$ of the cost to reach the goal from each state that has been visited. $H(s)$ starts out as just the heuristic estimate $h(s)$ and is updated as the agent gains experience in the state space. If the agent seems to be stuck in a local minimum, rather than staying where it is, it should follow what seems to be the best path to the goal given the current cost estimates for its neighbors. The estimated cost to reach the goal through a neighbor $s'$ is the cost to get to $s'$ plus the estimated cost to get to a goal from there, i.e., $c(s,a,s')+H(s')$. The agent moves to the best neighbor $s'$ and updates $H(s)$ accordingly. Continuing this process, the agent may move back and forth several times, updating $H$ each time and "flattening out" the local minimum until it escapes.
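For example, suppose the agent is in a state $s$ with $H(s)=2$ whose two neighbors each cost 1 to reach and currently have estimates $H(s_1')=2$ and $H(s_2')=4$. The best estimated route to the goal goes through $s_1'$ at cost $1+2=3$, so the agent moves to $s_1'$ and raises $H(s)$ from 2 to 3; on later visits the apparent local minimum at $s$ has been partly "flattened out".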
An agent implementing this scheme is called a learning real-time $A^*$ ($LRTA^*$) agent. Let's have a look at the pseudo code of the $LRTA^*$ agent.
In [72]:
%%python
from notebookUtils import *
pseudocode('LRTAStar Agent')
Out[72]:
The above pseudo code is implemented here. Now let's try to solve the maze problem discussed earlier using the $LRTA^*$ agent.
In [73]:
import aima.core.agent.*;
import aima.core.environment.map.ExtendableMap;
import aima.core.environment.map.MapEnvironment;
import aima.core.environment.map.MapFunctions;
import aima.core.search.framework.problem.GeneralProblem;
import aima.core.search.framework.problem.GoalTest;
import aima.core.search.framework.problem.OnlineSearchProblem;
import aima.core.search.online.LRTAStarAgent;
import java.util.function.ToDoubleFunction;
ExtendableMap extendableMap = new ExtendableMap();
extendableMap.addBidirectionalLink("1,1", "1,2", 1.0);
extendableMap.addBidirectionalLink("1,1", "2,1", 1.0);
extendableMap.addBidirectionalLink("2,1", "3,1", 1.0);
extendableMap.addBidirectionalLink("2,1", "2,2", 1.0);
extendableMap.addBidirectionalLink("3,1", "3,2", 1.0);
extendableMap.addBidirectionalLink("2,2", "2,3", 1.0);
extendableMap.addBidirectionalLink("3,2", "3,3", 1.0);
extendableMap.addBidirectionalLink("2,3", "1,3", 1.0);
StringBuffer envChange = new StringBuffer();
MapEnvironment mapEnvironment = new MapEnvironment(extendableMap);
OnlineSearchProblem problem = new GeneralProblem(null,
MapFunctions.createActionsFunction(extendableMap),
null,
GoalTest.isEqual("3,3"),
MapFunctions.createDistanceStepCostFunction(extendableMap));
ToDoubleFunction<String> h = (state) -> 1.0;
LRTAStarAgent lrtaStarAgent = new LRTAStarAgent(problem,MapFunctions.createPerceptToStateFunction() ,h);
mapEnvironment.addAgent(lrtaStarAgent, "1,1");
mapEnvironment.addEnvironmentView(new EnvironmentView() {
public void notify(String msg) {
envChange.append(msg).append(" -> ");
}
public void agentAdded(Agent agent, Environment source) {
}
public void agentActed(Agent agent, Percept percept, Action action, Environment source) {
envChange.append(action).append(" -> ");
}
});
mapEnvironment.stepUntilDone();
System.out.println(envChange.toString());
Out[73]: