
Genetic Algorithms in Search, Optimization, and Machine Learning


David E. Goldberg

The University of Alabama

ADDISON-WESLEY PUBLISHING COMPANY, INC.

Reading, Massachusetts Menlo Park, California Sydney Don Mills, Ontario Madrid San Juan New York Amsterdam Wokingham, England Tokyo Bonn Singapore


Foreword

I first encountered David Goldberg as a young, PhD-bound civil engineer inquiring about my course Introduction to Adaptive Systems. He was something of an anomaly because his severely practical field experience in the gas-pipeline industry, and his evident interest in that industry, did not seem to dampen his interest in what was, after all, an abstract course involving a lot of "biological stuff." After he enrolled in the course, I soon realized that his expressed interests in control, gas pipelines, and AI were the surface tell-tales of a wide-ranging curiosity and a talent for careful analysis. He was, and is, an engineer interested in building, but he was, and is, equally interested in ideas.

Not long thereafter, Dave asked if I would be willing to co-chair (with Ben Wylie, the chairman of our Civil Engineering Department) a dissertation investigating the use of genetic algorithms and classifier systems in the control of gas pipeline transmission. My first reaction was that this was too difficult a problem for a dissertation; there are no closed analytic solutions to even simple versions of the problem, and actual operation involves long, craftsmanlike apprenticeships. Dave persisted, and in a surprisingly short time produced a dissertation that, in turn, produced for him a 1985 NSF Presidential Young Investigator Award. So much for my intuition as to what constitutes a reasonable dissertation.

In the past few years GAs have gone from an arcane subject known to a few of my students, and their students, to a subject engaging the curiosity of many different research communities including researchers in economics, political science, psychology, linguistics, immunology, biology, and computer science. A major reason for this interest is that GAs really work. GAs offer robust procedures that can exploit massively parallel architectures and, applied to classifier systems, they provide a new route toward an understanding of intelligence and adaptation. David Goldberg's book provides a turnpike into this territory.

One cannot be around David Goldberg for long without being infected by his enthusiasm and energy. That enthusiasm comes across in this book. It is also an embodiment of his passion for clear explanations and carefully worked examples. His book does an exceptional job of making the methods of GAs and classifier systems available to a wide audience. Dave is deeply interested in the intellectual problems of GAs and classifier systems, but he is interested even more in seeing these systems used. This book, I think, will be instrumental in realizing that ambition.

John Holland
Ann Arbor, Michigan


Preface

This book is about genetic algorithms (GAs), search procedures based on the mechanics of natural selection and natural genetics. In writing it, I have tried to bring together the computer techniques, mathematical tools, and research results that will enable you to apply genetic algorithms to problems in your field. If you choose to do so, you will join a growing group of researchers and practitioners who have come to appreciate the natural analogues, mathematical analyses, and computer techniques comprised by the genetic algorithm methodology.

The book is designed to be a textbook and a self-study guide. I have tested the draft text in a one-semester, senior-level undergraduate/first-year graduate course devoted to genetic algorithms. Although the students came from different backgrounds (biochemistry, chemical engineering, computer science, electrical engineering, engineering mechanics, English, mathematics, mechanical engineering, and physics) and had wide differences in mathematical and computational maturity, they all acquired an understanding of the basic algorithm and its theory of operation. To reach such a diverse audience, the tone of the book is intentionally casual, and rigor has almost always been sacrificed in the interest of building intuition and understanding. Worked-out examples illustrate major topics, and computer assignments are available at the end of each chapter.

I have minimized the mathematics, genetics, and computer background required to read this book. An understanding of introductory college-level mathematics (algebra and a little calculus) is assumed. Elementary notions of counting and finite probability are used, and Appendix A summarizes the important concepts briefly. I assume no particular knowledge of genetics and define all required genetic terminology and concepts within the text. Last, some computer programming ability is necessary. If you have programmed a computer in any language, you should be able to follow the computer examples I present. All computer code in this book is written in Pascal, and Appendix B presents a brief introduction to the essentials of that language.

Although I have not explicitly subdivided the book into separate parts, the chapters may be grouped in two major categories: those dealing with search and optimization and those dealing with machine learning.

The first five chapters are devoted to genetic algorithms in search and optimization. Chapter 1 introduces the topic of genetic search; it also describes a simple genetic algorithm and illustrates the GA's application through a hand calculation. Chapter 2 introduces the essential theoretical basis of GAs, covering topics including schemata, the fundamental theorem, and extended analysis. If you dislike theory, you can safely skip Chapter 2 without excessive loss of continuity; however, before doing so, I suggest you try reading it anyway. The mathematical underpinnings of GAs are not difficult to follow, but their ramifications are subtle; some attention to analysis early in the study of GAs promotes fuller understanding of algorithm power. Chapter 3 introduces computer implementation of genetic algorithms through example. Specifically, a Pascal code called the simple genetic algorithm (SGA) is presented along with a number of extensions. Chapter 4 presents a historical account of early genetic algorithms together with a potpourri of current applications. Chapter 5 examines more advanced genetic operators and presents a number of applications illustrating their use. These include applications of micro- and macro-level operators as well as hybrid techniques.

Chapters 6 and 7 present the application of genetic algorithms in machine learning systems. Chapter 6 gives a generic description of one type of genetics-based machine learning (GBML) system, a classifier system. The theory of operation of such a system is briefly reviewed, and one Pascal implementation called the simple classifier system (SCS) is presented and applied to the learning of a boolean function. Chapter 7 rounds out the picture of GBML by presenting a historical review of early GBML systems together with a selective survey of other current systems and topics.

ACKNOWLEDGMENTS

In writing acknowledgments for a book on genetic algorithms, there is no question who should get top billing. I thank John H. Holland from the University of Michigan for his encouragement of this project and for giving birth to the infant we now recognize as the genetic algorithms movement. It hasn't been easy nurturing such a child. At times she showed signs of stunted intellectual growth, and the other kids on the block haven't always treated her very nicely. Nonetheless, John stood by his baby with the quiet confidence only a father can possess, knowing that his daughter would one day take her rightful place in the community of ideas.

I also thank two men who have influenced me in more ways than they know: E. Benjamin Wylie and William D. Jordan. Ben Wylie was my dissertation adviser

Corporation, Mr. Peter Prater, the Rowland Institute for Science, Texas Instruments Incorporated, and The University of Alabama.

Last, it has become a cliché in textbooks and monographs; after thanking one and all for their assistance, the author gallantly accepts blame for all remaining errors in the text. This is usually done with no small amount of pomp and circumstance, a ritualistic incantation to ward off the evil spirits of error. I will forgo this exercise and close these acknowledgments by paraphrasing a piece of graffiti that I first spotted on the third floor of the West Engineering Building at the University of Michigan:

To err is human. To really foul up, use a computer.

Unfortunately, in writing this book, I find myself subject to both of these sources of error, and no doubt many mistakes remain. I can only take comfort in knowing that error is the one inevitable side effect of our human past and the probable destiny of our artificially intelligent future.



Contents

FOREWORD iii
PREFACE v

1 A GENTLE INTRODUCTION TO GENETIC ALGORITHMS 1

What Are Genetic Algorithms? 1
Robustness of Traditional Optimization and Search Methods 2
The Goals of Optimization 6
How Are Genetic Algorithms Different from Traditional Methods? 7
A Simple Genetic Algorithm 10
Genetic Algorithms at Work - a Simulation by Hand 15
Grist for the Search Mill - Important Similarities 18
Similarity Templates (Schemata) 19
Learning the Lingo 21
Summary 22
Problems 23
Computer Assignments 25

2 GENETIC ALGORITHMS REVISITED: MATHEMATICAL FOUNDATIONS 27

Who Shall Live and Who Shall Die? The Fundamental Theorem 28
Schema Processing at Work: An Example by Hand Revisited 33
The Two-armed and k-armed Bandit Problem 36
How Many Schemata Are Processed Usefully? 40


The Building Block Hypothesis 41
Another Perspective: The Minimal Deceptive Problem 46
Schemata Revisited: Similarity Templates as Hyperplanes 53
Summary 54
Problems 55
Computer Assignments 56

3 COMPUTER IMPLEMENTATION OF A GENETIC ALGORITHM 59

Data Structures 60
Reproduction, Crossover, and Mutation 62
A Time to Reproduce, a Time to Cross 66
Get with the Main Program 68
How Well Does It Work? 70
Mapping Objective Functions to Fitness Form 75
Fitness Scaling 76
Codings 80
A Multiparameter, Mapped, Fixed-Point Coding 82
Discretization 84
Constraints 85
Summary 86
Problems 87
Computer Assignments 88

4 SOME APPLICATIONS OF GENETIC ALGORITHMS 89

The Rise of Genetic Algorithms 89
Genetic Algorithm Applications of Historical Interest 92
De Jong and Function Optimization 106
Improvements in Basic Technique 120
Current Applications of Genetic Algorithms 125
Summary 142
Problems 143
Computer Assignments 145


5 ADVANCED OPERATORS AND TECHNIQUES IN GENETIC SEARCH 147

Dominance, Diploidy, and Abeyance 148
Inversion and Other Reordering Operators 166
Other Micro-operators 179
Niche and Speciation 185
Multiobjective Optimization 197
Knowledge-Based Techniques 201
Genetic Algorithms and Parallel Processors 208
Summary 212
Problems 213
Computer Assignments 214

6 INTRODUCTION TO GENETICS-BASED MACHINE LEARNING 217

Genetics-Based Machine Learning: Whence It Came 218
What Is a Classifier System? 221
Rule and Message System 223
Apportionment of Credit: The Bucket Brigade 225
Genetic Algorithm 229
A Simple Classifier System in Pascal 230
Results Using the Simple Classifier System 245
Summary 256
Problems 258
Computer Assignments 259

7 APPLICATIONS OF GENETICS-BASED MACHINE LEARNING 261

The Rise of GBML 261

Development of CS-1, the First Classifier System 265
Smith's Poker Player 270

Other Early GBML Efforts 276

A Potpourri of Current Applications 293
Summary 304

Problems 306

Computer Assignments 307

8 A LOOK BACK, A GLANCE AHEAD 309

APPENDIXES 313



A A REVIEW OF COMBINATORICS AND ELEMENTARY PROBABILITY 313

Counting 313
Permutations 314
Combinations 316
Binomial Theorem 316
Events and Spaces 317
Axioms of Probability 318
Equally Likely Outcomes 319
Conditional Probability 321
Partitions of an Event 321
Independent Events 322
Bayes' Rule 322
Two Probability Distributions: Bernoulli and Binomial 323
Expected Value of a Random Variable 323
Limit Theorems 324
Summary 324
Problems 325

B PASCAL WITH RANDOM NUMBER GENERATION FOR FORTRAN, BASIC, AND COBOL PROGRAMMERS 327

Simple 1: An Extremely Simple Code 327

Simple 2: Functions, Procedures, and More I/O 330
Let's Do Something 332
Last Stop Before Freeway 338
Summary 341

C A SIMPLE GENETIC ALGORITHM (SGA) IN PASCAL 343

D A SIMPLE CLASSIFIER SYSTEM (SCS) IN PASCAL 351

E PARTITION COEFFICIENT TRANSFORMS FOR PROBLEM-CODING ANALYSIS 373

Partition Coefficient Transform 374
An Example: f(x) = x² on Three Bits a Day 375
What Do the Partition Coefficients Mean? 376


Using Partition Coefficients to Analyze Deceptive Problems 377
Designing GA-Deceptive Problems with Partition Coefficients 377

Summary 378

Problems 378
Computer Assignments 379

BIBLIOGRAPHY 381
INDEX 403



1 A Gentle Introduction to Genetic Algorithms

In this chapter, we introduce genetic algorithms: what they are, where they came from, and how they compare to and differ from other search procedures. We illustrate how they work with a hand calculation, and we start to understand their power through the concept of a schema or similarity template.

WHAT ARE GENETIC ALGORITHMS?

Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics. They combine survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search. In every generation, a new set of artificial creatures (strings) is created using bits and pieces of the fittest of the old; an occasional new part is tried for good measure. While randomized, genetic algorithms are no simple random walk. They efficiently exploit historical information to speculate on new search points with expected improved performance.

Genetic algorithms have been developed by John Holland, his colleagues, and his students at the University of Michigan. The goals of their research have been twofold: (1) to abstract and rigorously explain the adaptive processes of natural systems, and (2) to design artificial systems software that retains the important mechanisms of natural systems. This approach has led to important discoveries in both natural and artificial systems science.

The central theme of research on genetic algorithms has been robustness, the balance between efficiency and efficacy necessary for survival in many different environments. The implications of robustness for artificial systems are manifold. If artificial systems can be made more robust, costly redesigns can be reduced or eliminated. If higher levels of adaptation can be achieved, existing systems can perform their functions longer and better. Designers of artificial systems, both software and hardware, whether engineering systems, computer systems, or business systems, can only marvel at the robustness, the efficiency, and the flexibility of biological systems. Features for self-repair, self-guidance, and reproduction are the rule in biological systems, whereas they barely exist in the most sophisticated artificial systems.

Thus, we are drawn to an interesting conclusion: where robust performance is desired (and where is it not?), nature does it better; the secrets of adaptation and survival are best learned from the careful study of biological example. Yet we do not accept the genetic algorithm method by appeal to this beauty-of-nature argument alone. Genetic algorithms are theoretically and empirically proven to provide robust search in complex spaces. The primary monograph on the topic is Holland's (1975) Adaptation in Natural and Artificial Systems. Many papers and dissertations show the validity of the technique in function optimization and control applications. Having been established as a valid approach to problems requiring efficient and effective search, genetic algorithms are now finding more widespread application in business, scientific, and engineering circles. The reasons behind the growing numbers of applications are clear. These algorithms are computationally simple yet powerful in their search for improvement. Furthermore, they are not fundamentally limited by restrictive assumptions about the search space (assumptions concerning continuity, existence of derivatives, unimodality, and other matters). We will investigate the reasons behind these attractive qualities; but before this, we need to explore the robustness of more widely accepted search procedures.

ROBUSTNESS OF TRADITIONAL OPTIMIZATION AND SEARCH METHODS

This book is not a comparative study of search and optimization techniques. Nonetheless, it is important to question whether conventional search methods meet our robustness requirements. The current literature identifies three main types of search methods: calculus-based, enumerative, and random. Let us examine each type to see what conclusions may be drawn without formal testing. Calculus-based methods have been studied heavily. These subdivide into two main classes: indirect and direct. Indirect methods seek local extrema by solving the usually nonlinear set of equations resulting from setting the gradient of the objective function equal to zero. This is the multidimensional generalization of the elementary calculus notion of extremal points, as illustrated in Fig. 1.1. Given a smooth, unconstrained function, finding a possible peak starts by restricting search to those points with slopes of zero in all directions. On the other hand,

FIGURE 1.1 The single-peak function is easy for calculus-based methods.

direct (search) methods seek local optima by hopping on the function and moving in a direction related to the local gradient. This is simply the notion of hill climbing: to find the local best, climb the function in the steepest permissible direction. While both of these calculus-based methods have been improved, extended, hashed, and rehashed, some simple reasoning shows their lack of robustness.
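The hill-climbing idea can be made concrete with a tiny sketch. This is an illustrative modern fragment in Python, not code from the book; the function name `hill_climb` and all parameter values are mine. It estimates the local slope numerically and steps uphill until no improving move remains:

```python
def hill_climb(f, x, step=0.01, h=1e-6, iters=10_000):
    """Crude steepest-ascent hill climbing on a 1-D function using a numerical slope."""
    for _ in range(iters):
        slope = (f(x + h) - f(x - h)) / (2 * h)   # centered-difference slope estimate
        x_new = x + step * (1 if slope > 0 else -1)
        if f(x_new) <= f(x):
            break      # no uphill move available: stuck on a (possibly local) peak
        x = x_new
    return x

# On a single-peak function in the spirit of Fig. 1.1, any start climbs to the top;
# on a multimodal function the same routine stalls on whichever peak it starts near.
peak = hill_climb(lambda x: -(x - 2) ** 2, 0.0)
```

Note that the routine has no way to distinguish a local peak from the global one; that weakness is exactly the point of the discussion that follows.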

First, both methods are local in scope; the optima they seek are the best in a neighborhood of the current point. For example, suppose that Fig. 1.1 shows a portion of the complete domain of interest; a more complete picture is shown in Fig. 1.2. Clearly, starting the search or zero-finding procedures in the neighborhood of the lower peak will cause us to miss the main event (the higher peak). Furthermore, once the lower peak is reached, further improvement must be sought through random restart or other trickery. Second, calculus-based methods depend upon the existence of derivatives (well-defined slope values). Even if we allow numerical approximation of derivatives, this is a severe shortcoming. Many practical parameter spaces have little respect for the notion of a derivative and the smoothness this implies. Theorists interested in optimization have been too willing to accept the legacy of the great eighteenth- and nineteenth-century mathematicians who painted a clean world of quadratic objective functions, ideal constraints, and ever-present derivatives. The real world of search is fraught with discontinuities and vast multimodal, noisy search spaces, as depicted in a less calculus-friendly function in Fig. 1.3. It comes as no surprise that methods depending upon the restrictive requirements of continuity and derivative existence are unsuitable for all but a very limited problem domain. For this reason and because of their inherently local scope of search, we must reject calculus-based methods. They are insufficiently robust in unintended domains.

FIGURE 1.2 The multiple-peak function causes a dilemma. Which hill should we climb?

Enumerative schemes have been considered in many shapes and sizes. The idea is fairly straightforward: within a finite search space, or a discretized infinite search space, the search algorithm starts looking at objective function values at every point in the space, one at a time. Although the simplicity of this type of algorithm is attractive, and enumeration is a very human kind of search (when the number of possibilities is small), such schemes must ultimately be discounted in the robustness race for one simple reason: lack of efficiency. Many practical spaces are simply too large to search one at a time and still have a chance of using the information to some practical end. Even the highly touted enumerative scheme dynamic programming breaks down on problems of moderate size and complexity, suffering from a malady melodramatically labeled the "curse of dimensionality" by its creator (Bellman, 1961). We must conclude that less clever enumerative schemes are similarly, and more abundantly, cursed for real problems.

Random search algorithms have achieved increasing popularity as researchers have recognized the shortcomings of calculus-based and enumerative schemes. Yet random walks and random schemes that search and save the best must also be discounted because of the efficiency requirement. Random searches, in the long run, can be expected to do no better than enumerative schemes. In our haste to discount strictly random search methods, we must be careful to separate them from randomized techniques. The genetic algorithm is an example of a search procedure that uses random choice as a tool to guide a highly exploitative search through a coding of a parameter space. Using random choice as a tool in a directed search process seems strange at first, but nature contains many examples. Another currently popular search technique, simulated annealing, uses random processes to help guide its form of search for minimal energy states. A recent book (Davis, 1987) explores the connections between simulated annealing and genetic algorithms. The important thing to recognize at this juncture is that randomized search does not necessarily imply directionless search.

While our discussion has been no exhaustive examination of the myriad methods of traditional optimization, we are left with a somewhat unsettling conclusion: conventional search methods are not robust. This does not imply that they are not useful. The schemes mentioned and countless hybrid combinations and permutations have been used successfully in many applications; however, as more complex problems are attacked, other methods will be necessary. To put this point in better perspective, inspect the problem spectrum of Fig. 1.4. In the figure a mythical effectiveness index is plotted across a problem continuum for a specialized scheme, an enumerative scheme, and an idealized robust scheme. The gradient technique performs well in its narrow problem class, as we expect, but it becomes highly inefficient (if useful at all) elsewhere. On the other hand, the enumerative scheme performs with egalitarian inefficiency across the spectrum of problems, as shown by the lower performance curve. Far more desirable would be a performance curve like the one labeled Robust Scheme. It would be worthwhile sacrificing peak performance on a particular problem to achieve a relatively high level of performance across the spectrum of problems. (Of course, with broad, efficient methods we can always create hybrid schemes that combine the best of the local search method with the more general robust scheme. We will have more to say about this possibility in Chapter 5.) We shall soon see how genetic algorithms help fill this robustness gap.

FIGURE 1.3 Many functions are noisy and discontinuous and thus unsuitable for search by traditional methods.

FIGURE 1.4 Many traditional schemes work well in a narrow problem domain. Enumerative schemes and random walks work equally inefficiently across a broad spectrum. A robust method works well across a broad spectrum of problems.

THE GOALS OF OPTIMIZATION

Before examining the mechanics and power of a simple genetic algorithm, we must be clearer about our goals when we say we want to optimize a function or a process. What are we trying to accomplish when we optimize? The conventional view is presented well by Beightler, Phillips, and Wilde (1979, p. 1):

Man's longing for perfection finds expression in the theory of optimization. It studies how to describe and attain what is Best, once one knows how to measure and alter what is Good or Bad.... Optimization theory encompasses the quantitative study of optima and methods for finding them.

Thus optimization seeks to improve performance toward some optimal point or points. Note that this definition has two parts: (1) we seek improvement to approach some (2) optimal point. There is a clear distinction between the process of improvement and the destination or optimum itself. Yet, in judging optimization procedures we commonly focus solely upon convergence (does the method reach the optimum?) and forget entirely about interim performance. This emphasis stems from the origins of optimization in the calculus. It is not, however, a natural emphasis.

Consider a human decision maker, for example, a businessman. How do we judge his decisions? What criteria do we use to decide whether he has done a good or bad job? Usually we say he has done well when he makes adequate selections within the time and resources allotted. Goodness is judged relative to his competition. Does he produce a better widget? Does he get it to market more efficiently? With better promotion? We never judge a businessman by an attainment-of-the-best criterion; perfection is all too stern a taskmaster. As a result, we conclude that convergence to the best is not an issue in business or in most walks of life; we are only concerned with doing better relative to others. Thus, if we want more humanlike optimization tools, we are led to a reordering of the priorities of optimization. The most important goal of optimization is improvement. Can we get to some good, "satisficing" (Simon, 1969) level of performance quickly? Attainment of the optimum is much less important for complex systems. It would be nice to be perfect; meanwhile, we can only strive to improve. In the next chapter we watch the genetic algorithm for these qualities; here we outline some important differences between genetic algorithms and more traditional methods.

HOW ARE GENETIC ALGORITHMS DIFFERENT FROM TRADITIONAL METHODS?

In order for genetic algorithms to surpass their more traditional cousins in the quest for robustness, GAs must differ in some very fundamental ways. Genetic algorithms are different from more normal optimization and search procedures in four ways:

1. GAs work with a coding of the parameter set, not the parameters themselves.
2. GAs search from a population of points, not a single point.
3. GAs use payoff (objective function) information, not derivatives or other auxiliary knowledge.
4. GAs use probabilistic transition rules, not deterministic rules.

Genetic algorithms require the natural parameter set of the optimization problem to be coded as a finite-length string over some finite alphabet. As an example, consider the optimization problem posed in Fig. 1.5. We wish to maximize the function f(x) = x² on the integer interval [0, 31]. With more traditional methods we would be tempted to twiddle with the parameter x, turning it like the vertical hold knob on a television set, until we reached the highest objective function value. With GAs, the first step of our optimization process is to code the parameter x as a finite-length string. There are many ways to code the x parameter, and Chapter 3 examines some of these in detail. At the moment, let's consider an optimization problem where the coding comes a bit more naturally.

Consider the black box switching problem illustrated in Fig. 1.6. This problem concerns a black box device with a bank of five input switches. For every setting of the five switches, there is an output signal f; mathematically, f = f(s),

where s is a particular setting of the five switches. The objective of the problem is to set the switches to obtain the maximum possible f value. With other methods of optimization we might work directly with the parameter set (the switch settings) and toggle switches from one setting to another using the transition rules of our particular method. With genetic algorithms, we first code the switches as a finite-length string. A simple code can be generated by considering a string of five 1's and 0's where each of the five switches is represented by a 1 if the switch is on and a 0 if the switch is off. With this coding, the string 11110 codes the setting where the first four switches are on and the fifth switch is off. Some of the codings introduced later will not be so obvious, but at this juncture we acknowledge that genetic algorithms use codings. Later it will be apparent that genetic algorithms exploit coding similarities in a very general way; as a result, they are largely unconstrained by the limitations of other methods (continuity, derivative existence, unimodality, and so on).

FIGURE 1.5 A simple function optimization example, the function f(x) = x² on the integer interval [0, 31].

FIGURE 1.6 A black box optimization problem with five on-off switches illustrates the idea of a coding and a payoff measure. Genetic algorithms only require these two things: they don't need to know the workings of the black box.

In many optimization methods, we move gingerly from a single point in the decision space to the next using some transition rule to determine the next point. This point-to-point method is dangerous because it is a perfect prescription for locating false peaks in multimodal (many-peaked) search spaces. By contrast, GAs work from a rich database of points simultaneously (a population of strings), climbing many peaks in parallel; thus, the probability of finding a false peak is reduced over methods that go point to point. As an example, let's consider our black box optimization problem (Fig. 1.6) again. Other techniques for solving this problem might start with one set of switch settings, apply some transition rules, and generate a new trial switch setting. A genetic algorithm starts with a population of strings and thereafter generates successive populations of strings. For example, in the five-switch problem, a random start using successive coin flips (head = 1, tail = 0) might generate the initial population of size n = 4 (small by genetic algorithm standards):

01101
11000
01000
10011

After this start, successive populations are generated using the genetic algorithm. By working from a population of well-adapted diversity instead of a single point, the genetic algorithm adheres to the old adage that there is safety in numbers; we will soon see how this parallel flavor contributes to a genetic algorithm's robustness.
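The coin-flip initialization and the five-bit coding described above can be sketched in a few lines. This is a modern illustration in Python rather than the book's Pascal; the function names (`random_population`, `decode`) are mine, and the parameters (n = 4, string length 5) follow the hand example:

```python
import random

def random_population(n, l):
    """n strings of l bits each, every bit decided by a fair coin flip (head = 1, tail = 0)."""
    return [[random.randint(0, 1) for _ in range(l)] for _ in range(n)]

def decode(bits):
    """Interpret a bit string as an unsigned binary integer (leftmost bit most significant)."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

def f(x):
    """The chapter's example objective, f(x) = x^2, on the integer interval [0, 31]."""
    return x * x

population = random_population(4, 5)
for string in population:
    print(''.join(map(str, string)), decode(string), f(decode(string)))
```

For instance, the first string of the hand-generated population, 01101, decodes to x = 13 and pays off f(13) = 169.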

Many search techniques require much auxiliary information in order to work properly. For example, gradient techniques need derivatives (calculated analytically or numerically) in order to be able to climb the current peak, and other local search procedures like the greedy techniques of combinatorial optimization (Lawler, 1976; Syslo, Deo, and Kowalik, 1983) require access to most if not all tabular parameters. By contrast, genetic algorithms have no need for all this auxiliary information: GAs are blind. To perform an effective search for better and better structures, they only require payoff values (objective function values) associated with individual strings. This characteristic makes a GA a more canonical method than many search schemes. After all, every search problem has a metric (or metrics) relevant to the search; however, different search problems have vastly different forms of auxiliary information. Only if we refuse to use this auxiliary information can we hope to develop the broadly based schemes we desire. On the other hand, the refusal to use specific knowledge when it does exist can place an upper bound on the performance of an algorithm when it goes head to head with methods designed for that problem. Chapter 5 examines ways to use nonpayoff information in so-called knowledge-directed genetic algorithms; however, at this juncture we stress the importance of the blindness assumption to pure genetic algorithm robustness.

Chapter 1/A Gentle Introduction to Genetic Algorithms

Unlike many methods, GAs use probabilistic transition rules to guide their search. To persons familiar with deterministic methods this seems odd, but the use of probability does not suggest that the method is some simple random search; this is not decision making at the toss of a coin. Genetic algorithms use random choice as a tool to guide a search toward regions of the search space with likely improvement.

Taken together, these four differences (direct use of a coding, search from a population, blindness to auxiliary information, and randomized operators) contribute to a genetic algorithm's robustness and resulting advantage over other more commonly used techniques. The next section introduces a simple three-operator genetic algorithm.

A SIMPLE GENETIC ALGORITHM

The mechanics of a simple genetic algorithm are surprisingly simple, involving nothing more complex than copying strings and swapping partial strings. The explanation of why this simple process works is much more subtle and powerful. Simplicity of operation and power of effect are two of the main attractions of the genetic algorithm approach.

The previous section pointed out how genetic algorithms process populations of strings. Recalling the black box switching problem, remember that the initial population had four strings:

01101
11000
01000
10011

Also recall that this population was chosen at random through 20 successive flips of an unbiased coin. We now must define a set of simple operations that take this initial population and generate successive populations that (we hope) improve over time.

A simple genetic algorithm that yields good results in many practical problems is composed of three operators:

1. Reproduction
2. Crossover
3. Mutation

Reproduction is a process in which individual strings are copied according to their objective function values, f (biologists call this function the fitness function). Intuitively, we can think of the function f as some measure of profit, utility, or goodness that we want to maximize. Copying strings according to their fitness values means that strings with a higher value have a higher probability of contributing one or more offspring in the next generation. This operator, of course, is an artificial version of natural selection, a Darwinian survival of the fittest among string creatures. In natural populations fitness is determined by a creature's ability to survive predators, pestilence, and the other obstacles to adulthood and subsequent reproduction. In our unabashedly artificial setting, the objective function is the final arbiter of the string-creature's life or death.

TABLE 1.1 Sample Problem Strings and Fitness Values

No.     String    Fitness    % of Total
1       01101       169         14.4
2       11000       576         49.2
3       01000        64          5.5
4       10011       361         30.9
Total              1170        100.0

The reproduction operator may be implemented in algorithmic form in a number of ways. Perhaps the easiest is to create a biased roulette wheel where each current string in the population has a roulette wheel slot sized in proportion to its fitness. Suppose the sample population of four strings in the black box problem has objective or fitness function values f as shown in Table 1.1 (for now we accept these values as the output of some unknown and arbitrary black box; later we will examine a function and coding that generate these same values). Summing the fitness over all four strings, we obtain a total of 1170. The percentage of population total fitness is also shown in the table. The corresponding weighted roulette wheel for this generation's reproduction is shown in Fig. 1.7. To reproduce, we simply spin the weighted roulette wheel thus defined four times. For the example problem, string number 1 has a fitness value of 169, which represents 14.4 percent of the total fitness. As a result, string 1 is given 14.4 percent of the biased roulette wheel, and each spin turns up string 1 with probability 0.144.

FIGURE 1.7 Simple reproduction allocates offspring strings using a roulette wheel with slots sized according to fitness (slot sizes for the sample problem: 14.4%, 49.2%, 5.5%, and 30.9%). The sample wheel is sized for the problem of Tables 1.1 and 1.2.

Each time we require another offspring, a simple spin of the weighted roulette wheel yields the reproduction candidate. In this way, more highly fit strings have a higher number of offspring in the succeeding generation. Once a string has been selected for reproduction, an exact replica of the string is made. This string is then entered into a mating pool, a tentative new population, for further genetic operator action.
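The biased roulette wheel can be sketched in a few lines. This is a minimal illustration, not the book's code; the function name roulette_select and its structure are my own, and it assumes nonnegative fitness values.

```python
import random

def roulette_select(population, fitnesses, rng=None):
    """Spin a roulette wheel whose slots are sized in proportion to
    fitness; return the one string the spin lands on."""
    rng = rng or random.Random()
    total = sum(fitnesses)
    spin = rng.uniform(0, total)
    running = 0.0
    for string, f in zip(population, fitnesses):
        running += f
        if spin <= running:
            return string
    return population[-1]  # guard against floating-point round-off

population = ["01101", "11000", "01000", "10011"]
fitnesses = [169, 576, 64, 361]   # slot sizes: 14.4%, 49.2%, 5.5%, 30.9%
mating_pool = [roulette_select(population, fitnesses) for _ in range(4)]
```

Over many spins, string 11000 (49.2 percent of the wheel) is selected most often and string 01000 (5.5 percent) least often, which is exactly the bias reproduction is meant to introduce.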

After reproduction, simple crossover (Fig. 1.8) may proceed in two steps. First, members of the newly reproduced strings in the mating pool are mated at random. Second, each pair of strings undergoes crossing over as follows: an integer position k along the string is selected uniformly at random between 1 and the string length less one, [1, l - 1]. Two new strings are created by swapping all characters between positions k + 1 and l inclusively. For example, consider strings A₁ and A₂ from our example initial population:

A₁ = 0 1 1 0 | 1
A₂ = 1 1 0 0 | 0

Suppose in choosing a random number between 1 and 4, we obtain k = 4 (as indicated by the separator symbol). The resulting crossover yields two new strings, where the prime (′) means the strings are part of the new generation:

A′₁ = 0 1 1 0 0
A′₂ = 1 1 0 0 1
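Simple one-point crossover can be sketched as follows, using 1-indexed cross sites as in the text. This is an illustrative fragment (the helper name crossover is mine, not the book's).

```python
import random

def crossover(a, b, k=None, rng=None):
    """Cross two equal-length strings at site k: offspring swap all
    characters from position k + 1 through l (1-indexed, as in the text)."""
    rng = rng or random.Random()
    l = len(a)
    if k is None:
        k = rng.randint(1, l - 1)  # uniform choice in [1, l - 1]
    return a[:k] + b[k:], b[:k] + a[k:]

# Crossing A1 = 01101 and A2 = 11000 at k = 4 reproduces the example:
a1_new, a2_new = crossover("01101", "11000", k=4)
print(a1_new, a2_new)  # 01100 11001
```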


The mechanics of reproduction and crossover are surprisingly simple, involving random number generation, string copies, and some partial string exchanges. Nonetheless, the combined emphasis of reproduction and the structured, though randomized, information exchange of crossover give genetic algorithms much of their power. At first this seems surprising. How can two such simple (and computationally trivial) operators result in anything useful, let alone a rapid and robust search mechanism? Furthermore, doesn't it seem a little strange that chance should play such a fundamental role in a directed search process? We will examine a partial answer to the first of these two questions in a moment; the answer to the second question was well recognized by the mathematician J. Hadamard (1949, p. 29):

We shall see a little later that the possibility of imputing discovery to pure chance is already excluded. On the contrary, that there is an intervention of chance but also a necessary work of unconsciousness, the latter implying and not contradicting the former. . . . Indeed, it is obvious that invention or discovery, be it in mathematics or anywhere else, takes place by combining ideas.

Hadamard suggests that even though discovery is not a result, and cannot be a result, of pure chance, it is almost certainly guided by directed serendipity. Furthermore, Hadamard hints that a proper role for chance in a more humanlike discovery mechanism is to cause the juxtaposition of different notions. It is interesting that genetic algorithms adopt Hadamard's mix of direction and chance in a manner that efficiently builds new solutions from the best partial solutions of previous trials.

To see this, consider a population of n strings (perhaps the four-string population for the black box problem) over some appropriate alphabet, coded so that each is a complete idea or prescription for performing a particular task (in this case, each string is one complete switch-setting idea). Substrings within each string (idea) contain various notions of what is important or relevant to the task. Viewed in this way, the population contains not just a sample of n ideas; rather, it contains a multitude of notions and rankings of those notions for task performance. Genetic algorithms ruthlessly exploit this wealth of information by (1) reproducing high-quality notions according to their performance and (2) crossing these notions with many other high-performance notions from other strings. Thus, the action of crossover with previous reproduction speculates on new ideas constructed from the high-performance building blocks (notions) of past trials. In passing, we note that despite the somewhat fuzzy definition of a notion, we have not limited a notion to simple linear combinations of single features or pairs of features. Biologists have long recognized that evolution must efficiently process the epistasis (positionwise nonlinearity) that arises in nature. In a similar manner, the notion processing of genetic algorithms must effectively process notions even when they depend upon their component features in highly nonlinear and complex ways.

FIGURE 1.8 A schematic of simple crossover shows the alignment of two strings and the partial exchange of information, using a cross site chosen at random.

Exchanging of notions to form new ideas is appealing intuitively, if we think in terms of the process of innovation. What is an innovative idea? As Hadamard suggests, most often it is a juxtaposition of things that have worked well in the past. In much the same way, reproduction and crossover combine to search potentially pregnant new ideas. This process of emphasis and crossing is analogous to the human interaction many of us have observed at a trade show or scientific conference. At a widget conference, for example, various widget experts from around the world gather to discuss the latest in widget technology. After the lecture sessions, they all pair off around the bar to exchange widget stories. Well-known widget experts, of course, are in greater demand and exchange more ideas, thoughts, and notions with their lesser known widget colleagues. When the show ends, the widget people return to their widget laboratories to try out a surfeit of widget innovations. The process of reproduction and crossover in a genetic algorithm is this kind of exchange. High-performance notions are repeatedly tested and exchanged in the search for better and better performance.

If reproduction according to fitness combined with crossover gives genetic algorithms the bulk of their processing power, what then is the purpose of the mutation operator? Not surprisingly, there is much confusion about the role of mutation in genetics (both natural and artificial). Perhaps it is the result of too many B movies detailing the exploits of mutant eggplants that consume mass quantities of Tokyo or Chicago, but whatever the cause for the confusion, we find that mutation plays a decidedly secondary role in the operation of genetic algorithms. Mutation is needed because, even though reproduction and crossover effectively search and recombine extant notions, occasionally they may become overzealous and lose some potentially useful genetic material (1's or 0's at particular locations). In artificial genetic systems, the mutation operator protects against such an irrecoverable loss. In the simple GA, mutation is the occasional (with small probability) random alteration of the value of a string position. In the binary coding of the black box problem, this simply means changing a 1 to a 0 and vice versa. By itself, mutation is a random walk through the string space. When used sparingly with reproduction and crossover, it is an insurance policy against premature loss of important notions.
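Bitwise mutation as just described can be sketched as follows (an illustrative fragment; the name mutate and the parameter name p_m are my own choices, with p_m standing in for the small mutation probability):

```python
import random

def mutate(string, p_m=0.001, rng=None):
    """Flip each bit independently with small probability p_m; with
    p_m = 0.001 and 20 transferred bits, about 0.02 mutations are
    expected per generation."""
    rng = rng or random.Random()
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_m else bit
        for bit in string
    )
```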

To see that the mutation operator plays a secondary role in the simple GA, we simply note that the frequency of mutation needed to obtain good results in empirical genetic algorithm studies is on the order of one mutation per thousand bit (position) transfers. Mutation rates are similarly small (or smaller) in natural populations, leading us to conclude that mutation is appropriately considered as a secondary mechanism of genetic algorithm adaptation.

Other genetic operators and reproductive plans have been abstracted from the study of biological example. However, the three examined in this section, reproduction, simple crossover, and mutation, have proved to be both computationally simple and effective in attacking a number of important optimization problems. In the next section, we perform a hand simulation of the simple genetic algorithm to demonstrate both its mechanics and its power.


GENETIC ALGORITHMS AT WORK-A SIMULATION BY HAND

Let's apply our simple genetic algorithm to a particular optimization problem step by step. Consider the problem of maximizing the function f(x) = x², where x is permitted to vary between 0 and 31, a function displayed earlier as Fig. 1.5. To use a genetic algorithm we must first code the decision variables of our problem as some finite-length string. For this problem, we will code the variable x simply as a binary unsigned integer of length 5. Before we proceed with the simulation, let's briefly review the notion of a binary integer. As decadigited creatures, we have little problem handling base 10 integers and arithmetic. For example, the five-digit number 53,095 may be thought of as

5·10⁴ + 3·10³ + 0·10² + 9·10¹ + 5·10⁰ = 53,095.

In base 2 arithmetic, we of course only have two digits to work with, 0 and 1, and as an example the number 10,011 decodes to the base 10 number

1·2⁴ + 0·2³ + 0·2² + 1·2¹ + 1·2⁰ = 16 + 2 + 1 = 19.

With a five-bit (binary digit) unsigned integer we can obtain numbers between 0 (00000) and 31 (11111). With a well-defined objective function and coding, we now simulate a single generation of a genetic algorithm with reproduction, crossover, and mutation.

To start off, we select an initial population at random. We select a population of size 4 by tossing a fair coin 20 times. We can skip this step by using the initial population created in this way earlier for the black box switching problem. Looking at this population, shown on the left-hand side of Table 1.2, we observe that the decoded x values are presented along with the fitness or objective function values f(x). To make sure we know how the fitness values f(x) are calculated from the string representation, let's take a look at the third string of the initial population, string 01000. Decoding this string as an unsigned binary integer, we note that there is a single one in the 2³ = 8's position. Hence for string 01000 we obtain x = 8. To calculate the fitness or objective function we simply square the x value and obtain the resulting fitness value f(x) = 64. Other x and f(x) values may be obtained similarly.

You may notice that the fitness or objective function values are the same as the black box values (compare Tables 1.1 and 1.2). This is no coincidence: the black box optimization problem was well represented by the particular function, f(x), and coding we are now using. Of course, the genetic algorithm need not know any of this; it is just as happy to optimize some arbitrary switching function (or any other finite coding and function for that matter) as some polynomial function with straightforward binary coding. This discussion simply reinforces one of the strengths of the genetic algorithm: by exploiting similarities in codings, genetic algorithms can deal effectively with a broader class of functions than can many other procedures.

A generation of the genetic algorithm begins with reproduction. We select the mating pool of the next generation by spinning the weighted roulette wheel (shown in Fig. 1.7) four times. Actual simulation of this process using coin tosses has resulted in string 1 and string 4 receiving one copy in the mating pool, string 2 receiving two copies, and string 3 receiving no copies, as shown in the center of Table 1.2. Comparing this with the expected number of copies (n·pselectᵢ), we have obtained what we should expect: the best get more copies, the average stay even, and the worst die off.

TABLE 1.2 A Genetic Algorithm by Hand

String   Initial       x Value               pselectᵢ    Expected Count   Actual Count
No.      Population    (Unsigned    f(x)     (fᵢ/Σf)     (n·pselectᵢ)     from Roulette
         (Random)      Integer)     (x²)                                  Wheel
1        01101         13           169      0.14        0.58             1
2        11000         24           576      0.49        1.97             2
3        01000          8            64      0.06        0.22             0
4        10011         19           361      0.31        1.23             1
Sum                                1170      1.00        4.00             4.0
Average                             293      0.25        1.00             1.0
Max                                 576      0.49        1.97             2.0

TABLE 1.2 (Continued)

Mating Pool after     Mate         Crossover Site
Reproduction          (Randomly    (Randomly        New           x
(Cross Site Shown)    Selected)    Selected)        Population    Value    f(x)
0 1 1 0 | 1           2            4                0 1 1 0 0     12       144
1 1 0 0 | 0           1            4                1 1 0 0 1     25       625
1 1 | 0 0 0           4            2                1 1 0 1 1     27       729
1 0 | 0 1 1           3            2                1 0 0 0 0     16       256
Sum                                                                       1754
Average                                                                    439
Max                                                                        729

NOTES:
1) Initial population chosen by four repetitions of five coin tosses where heads = 1, tails = 0.
2) Reproduction performed through 1 part in 8 simulation of roulette wheel selection (three coin tosses).
3) Crossover performed through binary decoding of 2 coin tosses (TT = 00₂ = 0 = cross site 1, HH = 11₂ = 3 = cross site 4).
4) Crossover probability assumed to be unity, pc = 1.0.
5) Mutation probability assumed to be 0.001, pm = 0.001. Expected mutations = 5·4·0.001 = 0.02. No mutations expected during a single generation. None simulated.

With an active pool of strings looking for mates, simple crossover proceeds in two steps: (1) strings are mated randomly, using coin tosses to pair off the happy couples, and (2) mated string couples cross over, using coin tosses to select the crossing sites. Referring again to Table 1.2, random choice of mates has selected the second string in the mating pool to be mated with the first. With a crossing site of 4, the two strings 01101 and 11000 cross and yield two new strings 01100 and 11001. The remaining two strings in the mating pool are crossed at site 2; the resulting strings may be checked in the table.

The last operator, mutation, is performed on a bit-by-bit basis. We assume that the probability of mutation in this test is 0.001. With 20 transferred bit positions we should expect 20·0.001 = 0.02 bits to undergo mutation during a given generation. Simulation of this process indicates that no bits undergo mutation for this probability value. As a result, no bit positions are changed from 0 to 1 or vice versa during this generation.

Following reproduction, crossover, and mutation, the new population is ready to be tested. To do this, we simply decode the new strings created by the simple genetic algorithm and calculate the fitness function values from the x values thus decoded. The results of a single generation of the simulation are shown at the right of Table 1.2. While drawing concrete conclusions from a single trial of a stochastic process is, at best, a risky business, we start to see how genetic algorithms combine high-performance notions to achieve better performance. In the table, note how both the maximal and average performance have improved in the new population. The population average fitness has improved from 293 to 439 in one generation. The maximum fitness has increased from 576 to 729 during that same period. Although random processes help cause these happy circumstances, we start to see that this improvement is no fluke. The best string of the first generation (11000) receives two copies because of its high, above-average performance. When this combines at random with the next highest string (10011) and is crossed at location 2 (again at random), one of the resulting strings (11011) proves to be a very good choice indeed.

This event is an excellent illustration of the ideas and notions analogy developed in the previous section. In this case, the resulting good idea is the combination of two above-average notions, namely the substrings 11--- and ---11. Although the argument is still somewhat heuristic, we start to see how genetic algorithms effect a robust search. In the next section, we expand our understanding of these concepts by analyzing genetic algorithms in terms of schemata or similarity templates.

The intuitive viewpoint developed thus far has much appeal. We have compared the genetic algorithm with certain human search processes commonly called innovative or creative. Furthermore, hand simulation of the simple genetic algorithm has given us some confidence that indeed something interesting is going on here. Yet, something is missing. What is being processed by genetic algorithms and how do we know whether processing it (whatever it is) will lead to optimal or near optimal results in a particular problem? Clearly, as scientists, engineers, and business managers we need to understand the what and the how of genetic algorithm performance.
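The reproduction bookkeeping of the hand simulation (fitness, selection probability pselectᵢ = fᵢ/Σf, and expected count n·pselectᵢ) can be checked with a few lines. This is an illustrative sketch, not the book's code.

```python
population = ["01101", "11000", "01000", "10011"]
fitnesses = [int(s, 2) ** 2 for s in population]   # f(x) = x**2: 169, 576, 64, 361
total = sum(fitnesses)                             # 1170
pselect = [f / total for f in fitnesses]           # approx. 0.14, 0.49, 0.06, 0.31
expected = [len(population) * p for p in pselect]  # approx. 0.58, 1.97, 0.22, 1.23
```

Rounding these values to two places reproduces the pselect and expected-count columns of Table 1.2.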

To obtain this understanding, we examine the raw data available for any search procedure and discover that we can search more effectively if we exploit important similarities in the coding we use. This leads us to develop the important notion of a similarity template, or schema. This in turn leads us to a keystone of the genetic algorithm approach, the building block hypothesis.

GRIST FOR THE SEARCH MILL-IMPORTANT SIMILARITIES

For much too long we have ignored a fundamental question. In a search process given only payoff data (fitness values), what information is contained in a population of strings and their objective function values to help guide a directed search for improvement? To ask this question more clearly, consider the strings and fitness values originally displayed in Table 1.1 from the simulation of the previous section (the black box problem) and gathered below for convenience:

String    Fitness
01101       169
11000       576
01000        64
10011       361

What information is contained in this population to guide a directed search for improvement? On the face of it, there is not very much: four independent samples of different strings with their fitness values. As we stare at the page, however, quite naturally we start scanning up and down the string column, and we notice certain similarities among the strings. Exploring these similarities in more depth, we notice that certain string patterns seem highly associated with good performance. The longer we stare at the strings and their fitness values, the greater is the temptation to experiment with these high fitness associations. It seems perfectly reasonable to play mix and match with some of the substrings that are highly correlated with past success. For example, in the sample population, the strings starting with a 1 seem to be among the best. Might this be an important ingredient in optimizing this function? Certainly with our function (f(x) = x²) and our coding (a five-bit unsigned integer) we know it is (why is this true?). But what are we doing here? Really, two separate things. First, we are seeking similarities among strings in the population. Second, we are looking for causal relationships between these similarities and high fitness. In so doing, we admit a wealth of new information to help guide a search. To see how much and precisely


what information we admit, let us consider the important concept of a schema (plural, schemata), or similarity template.

SIMILARITY TEMPLATES (SCHEMATA)

In some sense we are no longer interested in strings as strings alone. Since important similarities among highly fit strings can help guide a search, we question how one string can be similar to its fellow strings. Specifically we ask, in what ways is a string a representative of other string classes with similarities at certain string positions? The framework of schemata provides the tool to answer these questions.

A schema (Holland, 1968, 1975) is a similarity template describing a subset of strings with similarities at certain string positions. For this discussion, let us once again limit ourselves without loss of generality to the binary alphabet {0, 1}. We motivate a schema most easily by appending a special symbol to this alphabet; we add the *, or don't care, symbol. With this extended alphabet we can now create strings (schemata) over the ternary alphabet {0, 1, *}, and the meaning of the schema is clear if we think of it as a pattern matching device: a schema matches a particular string if at every location in the schema a 1 matches a 1 in the string, a 0 matches a 0, or a * matches either. As an example, consider the strings and schemata of length 5. The schema *0000 matches two strings, namely {10000, 00000}. As another example, the schema *111* describes a subset with four members {01110, 01111, 11110, 11111}. As one last example, the schema 0*1** matches any of the eight strings of length 5 that begin with a 0 and have a 1 in the third position. As you can start to see, the idea of a schema gives us a powerful and compact way to talk about all the well-defined similarities among finite-length strings over a finite alphabet. We should emphasize that the * is only a metasymbol (a symbol about other symbols); it is never explicitly processed by the genetic algorithm. It is simply a notational device that allows description of all possible similarities among strings of a particular length and alphabet.
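The pattern-matching interpretation of a schema is mechanical enough to check in code. This is an illustrative sketch, not from the book; matches and matched_set are hypothetical helper names of my own.

```python
from itertools import product

def matches(schema, string):
    """A schema matches a string if every non-* schema position equals
    the corresponding string character."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

def matched_set(schema):
    """Enumerate all binary strings of the schema's length that it matches."""
    return {"".join(bits) for bits in product("01", repeat=len(schema))
            if matches(schema, "".join(bits))}

# *0000 matches two strings, *111* matches four, and 0*1** matches eight,
# as in the examples of the text.
```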

Counting the total number of possible schemata is an enlightening exercise. In the previous example, with l = 5, we note there are 3·3·3·3·3 = 3⁵ = 243 different similarity templates because each of the five positions may be a 0, 1, or *. In general, for alphabets of cardinality (number of alphabet characters) k, there are (k + 1)^l schemata. At first blush, it appears that schemata are making the search more difficult. For an alphabet with k elements there are only (only?) k^l different strings of length l. Why consider the (k + 1)^l schemata and enlarge the space of concern? Put another way, the length 5 example now has only 2⁵ = 32 different alternative strings. Why make matters more difficult by considering 3⁵ = 243 schemata? In fact, the reasoning discussed in the previous section makes things easier. Do you recall glancing up and down the list of four strings and fitness values and trying to figure out what to do next? We recognized that if we considered the strings separately, then we only had four pieces of information:
