tceic.com
学霸学习网 这下你爽了
广告
当前位置:首页 >> >>

Exact schema theorems for GP with one-point and standard crossover operating on linear stru


Exact Schema Theorems for GP with One-Point and Standard Crossover Operating on Linear Structures and their Application to the Study of the Evolution of Size
Riccardo Poli and Nicholas Freitag McPhee

School of Computer Science, The University of Birmingham Birmingham, B15 2TT, UK R.Poli@cs.bham.ac.uk, http://www.cs.bham.ac.uk/?rmp/ Division of Science and Mathematics, University of Minnesota, Morris Morris, MN, USA mcphee@mrs.umn.edu, http://www.mrs.umn.edu/?mcphee

Abstract. In this paper, ?rstly we specialise the exact GP schema theorem for one-point crossover to the case of linear structures of variable length, for example binary strings or programs with arity-1 primitives only. Secondly, we extend this to an exact schema theorem for GP with standard crossover applicable to the case of linear structures. Then we study, both mathematically and numerically, the schema equations and their ?xed points for in?nite populations for both a constant and a length-related ?tness function. This allows us to characterise the bias induced by standard crossover. This is very peculiar. In the case of a constant ?tness function, at the ?xed-point, structures of any length are present with non-zero probability. However, shorter structures are sampled exponentially much more frequently than longer ones.

1 Introduction
In recent work [6] an exact schema theorem for GP with one-point crossover has been introduced. This gives an exact expression for the expected number of instances of a , , in terms of macroscopic quantities (i.e. schema at generation properties of schemata, like their ?tness or number of instances, as opposed to microscopic properties of the individuals in the population, like their selection probability) measured at generation . The theorem has the form , where is the number of individuals in the population and , which we term the total transmission probability of , is the probability that an individual created through the selection/crossover/mutation process samples [12]. [6] provides an exact value of for a GP system with one-point crossover [9, 11]. This operator works by selecting a common crossover point in the parent programs and then swapping the corresponding subtrees, like standard crossover. To account for the possible structural diversity of the two parents, one-point crossover selects the crossover point only in the part of the two trees which have the same topology. This is called the common region. The theory is based on the de?nition of GP schema proposed in [9] in which a schema is a tree composed of functions from the set and terminals from the

 3?

 2¤

 1)

'

?

@ % 7 5 &9864

 3?

 2¤

¤  3? 1)    ) ' % ! 2¨ ¤ ? 10(&" $#?

! ¨ ? " §?

 ¤

 ¤

   

    ?§? ¨ ?

?

?

¤

?

?

¤

set , where and are the function set and the terminal set used in a GP run. The symbol is a “don’t care” symbol which stands for a single terminal or function. A schema represents programs having the same shape as and the same labels for the non- nodes. In order to be able to represent programs of different sizes and shapes the theory requires also the de?nition of the concept of hyperschema [7]. A GP hyperschema is a rooted tree composed of functions from the set and terminals from the set . The symbol = is as above, while the symbol # stands for any valid subtree. The notion of hyperschema is a generalisation of both the GP schemata de?ned above (which are hyperschemata without # symbols) and Rosca’s schemata [13] (which are hyperschemata without = symbols). With these de?nitions one can prove [6] that the total transmission probability for a GP schema under one-point crossover and no mutation is:

where: is the probability of crossover; is the probability of selecting an individual matching the schema ; , , are all the different schemata that can be built from = nodes only;1 the indices and range over all the different ; is the set of crossover points in the common region between schema and schema ; is the number of nodes in the common region; the index ranges over all the crossover points in ; is the hyperschema obtained by replacing all the nodes on the path between crossover point and the root node with = nodes, and all the subtrees connected to these nodes with # nodes; is the hyperschema obtained by replacing the subtree below crossover point with a # node (more details on these de?nitions can be found in [6, 7]). In ?tness proportionate selection, where is the number of programs matching at generation , is the mean ?tness of the programs matching the schema , and is the mean ?tness of the programs in the population. The hyperschemata and are important because they allow the identi?cation of parents that may lead to the creation of offspring in : if one crosses over at point any individual in with any individual in , the resulting offspring is always an instance of . In [6] an example was presented which used only unary functions. With this function set only linear trees can be created and, therefore, GP becomes a sort of variable-length GA. In that example, the new schema theorem was applied to a speci?c schema and a speci?c population. In this paper we study in much greater depth what happens in a GP system handling linear structures. We start by specialising the schema theorem for onepoint crossover to such a case, but without focusing on any particular schema (as was the case in the example in [6]), and by then extending this to an exact schema theorem for GP with standard crossover applicable to the case of linear structures.

y kx

1

Thus the

can be seen as cataloguing all the different program shapes.





? ?2¤

g

q

 e

 ?? ?

q r?g t e q

 3?

?

@ % 7 5 &9DC4

 g

q

?



q ? Ud

g



q

? 32¤

 e ¨

?32¤ q r

¤

 j? 

 ? T ????3?

i ph

g

q

g c e c V T ?  9fFdX Wb3?

 e

 e

 3?

q r?g

 2¤

q ? ??

 3?

 ¤

f e ss dYs ? q ? q  ?T

? ?2¤

 

  ??   j¤ ? ?32¤ ? 32¤ ¤   ??   j? ? ?2¤ 3`vu 3¤ ?w ? ¤  3? vu ? 2¤ ¤ l %   ? o y sp `txr mk3? ?T  2¤ ? ? ny p? ? n `o #qxv`o #?y

 ? ?vaT

 2¤

? ? ? ? ?y w u H?? ?d??xvt c s

 T V T R ¨ %  a`YX WUSQPI3?

h Y

¤

g

q

 e

A

q h % v?g iD

@

G H

4

% 7 5 F2EA

 2¤

g

 1)

q

 e

% ¤ @ % 7 5 &9B?A % ¤ X WT V q r

(1)

? q i ph g

2 Exact Schema Theory for Linear Structures
When only unary functions are used in GP, schemata (and programs) can only take the form where , for , and . Therefore, they can be written unambiguously as strings of symbols of the form . In order to make the specialisation of Equation 1 to the linear case easier we represent repeated symbols in a string using the power notation where means repeated times. For example, the schema 11100000===1 can be written as . Since in this case all trees are linear, the space of program shapes can be enumerated by where is for . Given this, the common region between shapes and is simply the shorter of the two schemata, and the size of the common region, , is simply . Therefore, the set of crossover points in the common region, , can be identi?ed with the set of indices where the index 0 represent a crossover point before the ?rst symbol in a string (the root node). In this linear representation the hyperschemata and are particularly simple: is and is , where (with the convention that for the hyperschema is simply ). So, if , otherwise,

where is the empty set. Therefore, the summation in in Equation 1 disappears because for all . As a result of these simpli?cations, one can transform Equation 1 into: Theorem 1. The total transmission probability for a linear GP schema of the form under one-point crossover and no mutation is

This result can then be extended to obtain the following: Theorem 2. The total transmission probability for a linear GP schema of the form under standard crossover with uniform selection of the crossover points and no mutation is

t ?? ? ? ? ? ??t ? f } ± c 3?  % z}} z  ?? t ?q?} ? r?T c ? ?? c XV ? ¨ g d?§T  3? Wz ?}?} ? ??t z §? ? ???aT } % t ?  td? g ? ? ? ? y ?   }} z  T V T R ¨  %   z}} z  ?3? Wz ?q} ? ra`°X WUSg QP?3? |?q?} ? ?1)  

? ? ??t  |f rbq?? ? ?? g c }?3? Wz ?}?} ? ??t z ??3??3? ? ?" t z ??} ? raT  } % T  % }} z c ? ?Ys  ? ? ¨  t t g ? ?? ? y T ?  }} z T T R ¨ %  z}} z  s X V d3?   z ?q} ? ra` g X V U?Q6?3?   ?q?} ? ?1)

? % ??e

e

q ?   z ??} ? ?t z ?" ? ? ? e U? 32¤ j? }} ?  % ? t

and

if , otherwise,

(2)

(3)

e

q ?   % z}} ?? t ??q} ? z? ¤?g ?b ?2¤ ?? f ? ? ? ? I? t ? g z}} ? ?% t ?q?} ? z ? ? ? G  z}} ?  % W?q?} ? G ?t z ??   j? z ?q?} } ? z  }  ?? ? D? ? 32¤ Gt ? 3¤   ??   ?t ? ? ?2¤ ? ?2¤ @ ¨ R  e ?? }} x&d?f ?dq?? ?q} ¨ ?7 ?    

?

q

?

? { ?? ?d{ ?  % ? ? ?

? S?

? ?¨

? §?

@ % 7 5 &FC?4 1t z ??? ? ? ?

? ¨% f?e

 Wz ?  z } } } } { ? jW?q?qx|z ? z ? z @ % 7 5 &9??A ? |z    }}}}   ?  z }}}} { z z z ?3?q????|z ? ?|????q~|? ? r ? ? ? %  §?3?  e q ? pv ? ?2¤  ? ?v?T ?  }} Wz ?q} ? z  }} Wz ?q} ? z

 q v?g q ?f qb??? g  e e ??  q r q  i g  e q ph g §??7 ? ? EC?  % ? ?q @ ? q ? ?"

Proof. The theorem can be derived as a special case of the more general result reported in [8]. However, here we provide an direct proof which shows how this result can be derived by modifying Equation 2. Equation 2 clearly indicates that one-point crossover with a given crossover point can create new instances of the schema only if selection picks up a ?rst parent whose ?rst nodes match and a second parent whose last nodes match . Given that one-point crossover forces the selection of a common crossover point, this means that the second parent must be always of length . However, if one used standard crossover, one could create instances of even if the length of the second parent is different from , provided that the last nodes of the second parent match and the second crossover point excised such nodes. If the second crossover point is at position , this can happen if selection picks up a second parent matching for any value of . Thus, an extra summation needs to be added to Equation 2 to deal with standard crossover. The probability of choosing crossover point in a second parent matching is . Therefore, after the change of variable one obtains that the probability of obtaining the subschema from the second parent is . This should replace the term in Equation 2. Since standard crossover does not limit the crossover point in the ?rst parent to belong to the common region, the probability of selecting crossover point needs to . This completes the proof of the theorem. to change from Equation 3 is in a form which makes it easy to see the similarities with Equation 2. However, the reader might ?nd it easier to understand the same result rewritten as in the following: Corollary 1. The total transmission probability for a linear GP schema of the form under standard crossover with uniform selection of the crossover points and no mutation can be written in the following equivalent forms:

(4)

(5)

Equation 4 makes the idea of summing over the set of possible crossover points clearer. Equation 5 makes the idea of summing over varying lengths of “don’t care” symbols clearer.

3 Evolution of Size in Linear Systems
Equations 2 and 3 can be used to study, among other things, the evolution of size in linear GP/GA systems. This is because they can be specialised to describe the transmis-

?

? 3? |??q} ? ?t z ?"3aT  z}} ?  %    t  ??q} ? ?t z z}} ? e R ? ? ? % ???a?6?  }} ? z? % Wz ??} ? ?t ????

? ? ??t p??? ?v° D? ? ?? ? R ? ? f ? c ?cg ? c X V T }°3?  z??}q} ? qt z ? ??3??3? ?" t z ??} ? zraT  } ? % T % }}  ? ? ¨   ? g ? j  ?q?} ? r?? ?V ?S3?I3?  z ??} ? r#) z}} z  T X T R ¨  %  }} z 

?

?

? ? ? ?f t j ?? t ? ?g c  3? Wz ?}q} } % T  % z}} z  c ? ??t z td? §? ? ???a?3?  t ? ?? t ?q?} ? r?T ¨   g ? ?3?  z ??} ? r?? X V USQP?3? }} z T T R ¨ %   

?

R??  z}} W?q?} ? z

R ??

? · i??

?

 z ?q} ? z }} ? ? ?3?

 z}} ? z ?  % W?q?} ? qt r??"



}} t z ??} ? z ? f ? F9¨

 }} ? Wz ?q} ? ?t z

± ? ? ? §t j ? ? ? ? ? ? ? ` ?o ? ? ????H???? `?~???~?9v?dq`y ? ?y  (?br?? ¨ R ? ? ? ?

 |f



? ?? ? ? rbq??9¨

? ? ?qt c X WT V ? ? ?  ??q} ? ?1) z}} z 

?

 ??q} ? ?t z z}} ?

 }} Wz ?q} ? z

?

since represents the entire where we exploited the fact that search space and is the empty set for any . This result indicates that length evolves under one-point crossover as if selection only was acting ( , for example, has no effect). So, one-point crossover is totally unbiased with respect to program length. This is made particularly clear if one assumes a ?at ?tness landscape in which for all . In these conditions all the dynamics in the system must be caused by crossover or by sampling effects. For a ?at landscape under ?tness proportionate selection Equation 6 becomes for ?nite in the in?nite population limit. This populations, and is because, in the in?nite population case, the quantity can be interpreted in two entirely equivalent ways: as the total transmission probability of the schema at generation or as the proportion of individuals in the population in at generation . In this second interpretation, we conventionally de?ne as the proportion of programs in at generation 0. Clearly, is entirely determined by the initialisation procedure adopted. The equation obtained for the in?nite population case is particularly important because it shows that when one-point crossover alone is acting, any initial distribution of lengths, , is a ?xed point for the system. For standard crossover (Equation 3) one obtains:

Alternative formulations for this equation can be obtained specialising Corollary 1. For example, from Equation 4 one obtains:

? ? t ? ?? t ? ? ??t ? f c ? ?g c c X V d3?  ??3?? X V USQPI3?  ??3#) } T ?   % T T R ¨ %   %   % 3? ?"3aT   % 3? ???aT    ?  ? ? ? g

s 3? 

?? ??? ? ? ? ? ? ? ? y ? ? ? g }?p?U8?f ? f va??? R ? ? ? ?? ±c   % 3? ? ???aT     f g d?§2d3?  ???a?Y?§?S36?3?  ???1) c XV T ?   % T X V T R ¨ %   %  % ??3?T  



g

which can be transformed into

  % g  3? ???aT xc 3?  ?"3?T ?V 2a3?  ???a` ?V ?jQP%  % X T ?   % T X T R ¨    ? ? ??t  ?f rb??g ? ? ?? c ? ?? g d?§2a3?  ???a`°?§?jQP?3?  ?"3#) c XV T ?   % T X V T R ¨ %   %  ?  3? ???a?3? ?"3?T  % T  %      ? ?? ? y g g ¤ XV ?§T t ?? ? ? ? ? ??t ? f c ? ?? g d?§T c XV ±c ?  3? ?%"3?T  %  3? ???aT  ?  ? ?? ? y g %   T  X V T R ¨  %  ???a?d?§?S36?3?  ???1)  %    '? °?3? ¨ ??R ¤  3? #)   2¤ ¨ R ??#?  ???1i?3?  ???1)  % ) %   %   %  %  ?"3?3?  D?3#)   %     2¤ g e ¨% ?$f  % ?" ? g  #)  ?R ¨  2¤ g ?%"E? ?"  % ¨?%?3? ?%"3?T  g  g ? e  % T ???a?%    1) ?  b3? ??R  D?3#) ¨  %  ¨ R ??x?  D?3#??3?  ???1)  % ) %   %   g ¤ 
(6)

sion probability of schemata of the form one obtains:

 %  ??

. For one-point crossover (Equation 2)

 3?

 ?  ?%  3`vu ?3? w ¤

 ¤ ?

 vu

¨ ? ?a?

(7)

(8)

in which is the population at generation , and vary over all the possible parents (i.e. the members of ) while and vary over all the possible crossover points in and , respectively. This equation explicitly includes one term for each of the possible ways in which offspring can be created from the parents in the population for all possible crossover points. In the equation, the term represents the probability that each of such event be the case, while the term makes sure that only the probabilities of the events that lead to the creation of an offspring of length are included in the sum. These equations show that for standard crossover not every initial distribution of program lengths is a ?xed point (for an in?nite population) even if one considers the case of a ?at landscape. For example, if one started at generation 0 with only programs of length , i.e. , assuming one would obtain the following distribution of lengths at generation 1:

This implies that which is in general different from (except for the trivial case in which ).2 So, standard crossover imposes its own speci?c bias on the distribution of lengths, although we expect that such a bias will have no in?uence on the average length of programs, since on average the subtrees/substrings swapped by crossover are of the same size. This conjecture can actually be proven mathematically obtaining the following: Theorem 3. The mean size of the programs at generation , , in a linear GP system with standard crossover, uniform selection of the crossover points, no mutation and an in?nite population is

with the same meaning of the symbols as in previous theorems.

2

If all programs include only one node (a terminal), standard crossover, like one-point crossover, cannot produce programs with more than one node.

 3?

Proof. By de?nition of mean into this equation one obtains:

? B ? p??% ? R ×  ? ? ? C? ? v??? ? ` 3?  

. By substituting Equation 9

% ¨ ? ?R

% ? ??1 ? v??E? ? R × ? ? ?y ? ` ?o ? ? y



á % Q?"3#)

 ¨ ? ¨ ?2? `?? 2? ?

? ? ` ? ? ? ? y  ? ? ` ?o ? ? y

} à F?

¨ ?% X V T

¨ % 0?? ? 90??? Q?"3#) ? ¨ %  á %  ? ? ? ? ? %  BF???~?  ???1)  % ? h ? R h R  ??? ??? 

 3?

? × ? × ? ? ? ? ? ? ? × ? × ? ? ? ? ? V ? d? ? V xu 3? xu ? ? × r??Q ×  ? v?? ?? ? c c c c X ?V T × T × ? ra?3? v?T  ? ? ? ? ? 3? y  ? ? ? ? ? y  ?  b3?  ???a?d?§?S36?3?  ???1)  %  T X V T R ¨ %   %   ? ??  % ) ???1p?  %  ¨ ? ? ???dv??   ?  % T ???a??    c XV T ?  d?§2a3?  ? % ? ?$8v  c ? ?bv?? % ¨ ? ?  % T ???a??   ? % ¨ ???R V ?"3#)  %   c  X V T R ¨ % ¨ ? ? |d?§?S36? ?b`?? ? ? ¨

 3?

An alternative way of expressing microscopic quantities is the following:

 % ??3#)  

for standard crossover in terms of (9)

 p?

(10)

(11)

?  × T × 3? va`3? ? raT  ? v??d ? v??  ? × ? ? × 

It is very dif?cult to ?nd the ?xed points (or to prove that there are none) for a GP system using standard crossover even on the hypothesis of in?nite populations and ?at landscapes. However, it is possible to ?nd such ?xed points numerically by implementing the schema equations and iterating them as described in the next section. We will show in Section 5 how the information provided by such simulations, along with a substantial amount of luck, has allowed us to identify a mathematical function which is provably a ?xed point for the size distribution in a linear GP system with standard crossover operating on a ?at landscape.

} ? ? %  ¨ ? ? °3`?????dv??
Corollary 2. On a ?at landscape,

?  r?Q ? v???p?% ? ?? ? v??? ? ` × ? × ? %  ? ? R × ? ?  ? v??d ? v?? ? × ? ? ×

? ?

 ?? ? V ? ? ?? ? V c c c ? ? ? ? ?? y  ? ? ? ? ? y 

 ?? ? V ? ? ?? ? V ? c c } ? °p?% ? R ×  ? ? ? ?v ? v??? ? ` ? c ? ? ? ? ?? y  ? ? ? ? ? y   ? ? ?  ×v??3 ? v?? xu 3? xu ? ?  × X T ?  ? ? c c ?V 2a3?  ???a?? | ?V ?S36%  % T c X T R ¨ 3? ? ×va`3? ? raT  T ×    ? ? ? ?  ? r?Q ? r?? ?? ? V ? ? ?? ? V xu ?? xu ? ? × ? × C??% ? CR? ? v???? ? ?` ? × c c c c  × T × 3? ? v??3? ? v?T   ? ? ? ? 3? y  ? ? ? ? ? y 

whereby

With a few calculations it is possible to show that

(12)

?  }?3?  ??3??? ?%  % T c   c  ?V ?S36% X T R ¨   c XV T R ¨ |Y?§?S36% c  ?V ?S36% X T R ¨ ? ? xu ?? X c ?V T

 % T  ???a??  % T  ???a??  % T  ???a??

? 3? v?T  ×  ? r??  ? ×  % T  ???a??  

?  × d3? ? vaT 

g X T ?    % 3? ??3?T f xc ?V 2a3?   g ?d?qWbxu ? y ? ?   3? ×vaT ? c s  ????? g d?§2a3? % c XV T ?    g ? xu ? c X T ?  ?r?Q3? raT ??V 2a3? × ? ×   ? xu ? ? 3? ? raT × c  ? xu ? ? ?xu ? ? c XV T ?  ?§2a3?  c ? 3? raT  ×  ? v??  ? × ? xu ?3? xu ? ? ? X T ?  c c ?V 2a3?

c XV T R ¨ |Y?§?S36% c X T R ¨ % ¨ ? ? | ?V ?S36? ?b`??

 % T ???a??  

4 Experimental Results
In our experiments we were interested in studying the evolution of program size and the bias imposed by different crossovers. Since the effects of selection on its own have been already studied in great depth in various studies, we concentrated on the effects produced by crossover, i.e. by setting . In the experiments we iterated the schema equations provided in the previous section on the assumption of in?nite populations. This corresponds to studying either the exact behaviour of a GP system with an in?nite population or the average behaviour of the GP system with a ?nite population over an in?nite number of runs. In the simulations we kept track of for such that . Since in an in?nite population standard crossover nearly doubles the length of the that need to be tracked grows longest program in each generation, the number of exponentially as a function of . As a result we restricted our attention to . Fortunately, this small number of generations was still suf?cient to show the system convergence. In the experiments we used three different initial conditions: the one peak distribution, where only programs of length 20 are present, the two peak distribution

and the uniform distribution

These three distributions are all characterised by the same average lengths of 20. In addition to a ?at ?tness landscape, realised by a ?tness function which always returned the value 10, we used the following size-related ?tness function

which returns 20 minus the distance between the program length, , and a target length of 20 for , and 0 for . On the ?at ?tness landscape one-point crossover behaved as expected. All initial distributions we tried were ?xed points. The situation was very different for standard crossover. As shown in Figure 1, with the one-peak initialisation, after one iteration (i.e. at ) we obtained the triangular pro?le calculated in the previous section.3 As hypothesised, in later generations the average program size remained constant. However, the distribution of sizes does not remain symmetric. In fact, it quickly approaches a limit distribution, where most of the
3

In this and the following ?gures, the plots for generation 0 and 1 are shown with thin lines. The lines become progressively thicker as the number of generations increase. Because of the quick convergence to a ?xed point, the plots for and are often indistinguishable.

 ? ?~é

?

? ~é

if otherwise.

?

% i?

if or otherwise,

,

,

? ?  E#3?

 % D?3#)  

é ? ?S?

? F?

ò ? b?

? ? ???

?

?

¨ % §S?

 ~?

?

?

?

? ? h ? R h R  % v  ???? ??? P?  ?q?} ? rvu z}} z  ? ?

é¨ % Y6??

?

?

% ? iv

?

? ? ?d?

 3?

¨ % XV è?§T ?

 % ???1)   % ¨ ???R

? ì % ¨  % ? í xF? ¨ $???R   ?"3#)

? % ¨  % ê} ??? ? ???R   ???1)

? é · ~???

 % D?3#)  

?

(13)

(14)

 ?R ¨

 % ?"3#)  

? ~é

?

?

?

?

programs were quite short. Exactly the same ?xed point distribution was approached when initialising the system using the two-peak and the uniform distributions as shown in Figures 2 and 3. This ?xed point distribution closely resembles a gamma distribution, a fact that is explored in more detail in the next section. As a sanity check to make sure that these results were not an artifact of our numerical simulations, we performed real GP runs with a population of 10,000 individuals initialised with the uniform distribution of lengths in Equation 13. The ?tness function was ?at. The runs were continued for 250 generations. No depth limit was imposed on the offspring produced by crossover. The distribution of lengths in the last 50 generations was recorded. Figure 4 shows the average proportion of programs of each length in the last 50 generations of one run (middle line) along with the 2-standard-deviation wide con?dence intervals. It is easy to see how close the average distribution of lengths is to the ?xed point distribution obtained by iterating the schema equations (Figure 3, thick line). In the schema-equation experiments we obtained different ?xed point distributions only when we used initial conditions with a different average size, suggesting that the ?xed point distribution for standard crossover is only a function of the average size, and independent of the actual shape of the initial distribution. This family of ?x-point distributions characterises the search bias imposed by standard crossover when acting on linear structures: in the absence of other biases standard crossover will tend to more heavily sample the space of smaller-than-average programs and will be unable to focus its search on programs of a particular size. This means that if selection prefers longer-than-average programs or programs of a certain length, standard crossover may negatively bias the search. This does not happen with one-point crossover, provided that the initial population includes suf?cient variety of shapes. This effect becomes evident when the size-related ?tness function in Equation 14 is used. With this ?tness function programs of length 20 are maximally ?t. However, due to the biases of standard crossover, the ?xed point distribution obtained with different initial conditions (see Figures 5, 6 and 7) all share the same features: inability to focus on the maximally ?t length and bias towards short programs. For comparison, Figure 8 shows the behaviour of one-point crossover on the same function with uniform initial conditions. This corroborates the theoretical results in Section 3 which indicated that one-point crossover is a transparent (i.e. unbiased) operator as far as program lengths are concerned.

5 Fixed-point Size Distribution for Standard Crossover on a Flat Landscape
As noted in the previous section, the ?xed-point distribution of lengths approached in the experiments with a ?at landscape under standard crossover strongly resembles a gamma distribution. This observation prompted us to try to verify mathematically whether this is indeed the case. A gamma distribution has the following form:

? ?v  ? %  ?§? ? v ? ?  ? 9? ó úùV ÷ ? H??? °? ? §?

(15)

alpha 0.2

alpha 0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

20

40

60

80

100 N

0

20

40

60

80

100 N

Fig. 1. Total transmission probability for standard crossover on a ?at landscape with onepeak initial conditions. The ?rst four generations are shown.
alpha 0.2

Fig. 2. Total transmission probability for standard crossover on a ?at landscape with twopeak initial conditions. The ?rst four generations are shown.
alpha 0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

20

40

60

80

100 N

0

20

40

60

80

100 N

Fig. 3. Total transmission probability for standard crossover on a ?at landscape with uniform initial conditions. The ?rst four generations are shown.
alpha 0.2

Fig. 4. Proportion of programs of each length in a real GP run for standard crossover on a ?at landscape with uniform initial conditions.
alpha 0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

20

40

60

80

100 N

0

20

40

60

80

100 N

Fig. 5. Total transmission probability for standard crossover on a size-related landscape (Equation 14) with one-point initial conditions. The ?rst four generations are shown.

Fig. 6. Total transmission probability for standard crossover on a size-related landscape with two-point initial conditions. The ?rst four generations are shown.

alpha 0.2

alpha 0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

20

40

60

80

100 N

0

20

40

60

80

100 N

Fig. 7. Total transmission probability for standard crossover on a size-related landscape with uniform initial conditions. The ?rst four generations are shown.
alpha 0.2

Fig. 8. Total transmission probability for onepoint crossover on a size-related landscape with uniform initial conditions. The ?rst four generations are shown.
alpha 0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

20

40

60

80

100 N

0

20

40

60

80

100 N

alpha 0.2

alpha 0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

20

40

60

80

100 N

0

20

40

? ¤?

? ? § ¨?



Fig. 11. Total transmission probability for standard crossover on a ?at landscape with initial conditions .

Fig. 12. Total transmission probability for standard crossover on a ?at landscape with initial conditions .

? ¤?

? ?? ? ??F?ü

Fig. 9. Total transmission probability for standard crossover on a ?at landscape with initial . conditions

Fig. 10. Total transmission probability for standard crossover on a ?at landscape with ini. tial conditions

? ¤?

? ? § ¨?

? ¤?



? ?? ? ?vF?ü

60

80

100 N

The gamma distribution has two parameters and that change its shape. For values of , . Also and the distribution has its maximum when . The mean of a gamma distribution is . So, the maximum of the distribution is shifted with respect to the mean by a distance (i.e. the maximum is when ). In order to verify if is a ?xed-point distribution for standard crossover on a ?at landscape, ?rstly one has to specialise Equation 8 by setting . This produces

Then one assumes that the system is in the hypothesised ?xed point and substitutes in this equation. If is indeed a ?xed-point for the system, then the r.h.s. resulting from this substitution must be equivalent to . , it is actually very If one uses the general form for dif?cult to check whether this is a ?xed point for Equation 16. However, the experiments in the previous section suggested that a good value for the parameter would be , which gives

This function is intuitively very appealing as a potential ?xed point because it transforms terms of the form in Equation 16 into exponentials of the form . Indeed, assuming allows one to simplify the r.h.s. of Equation 16 dramatically, obtaining:
$# %"?

So,

§ 2 &? 431E?

&? 1?0 )(& ? '

4

For integer

,

.

¨ R ?pW?

(17)

(18)

?

?ú !

% ??

s ?E?? ¨ R

 2¤

? f?? ? %

?

 ) %  1?I3?

 §?

 ? ?

? ? ? ? ? ? ? y ? g ±c } ? R ? ? °p???f ? f rbq?? ? ??  ??? ? ???1) ¨ R  %    



} 8à

f

g

} à

? v ó ???I? V ?"3#) % ¨ R  % 

 % ???1)

?

 ¤

¨ ?R

¨ ?R

 §?

 aT

 ? ?

?$ ÷

g dX Wb?a? c V T ? ¨ R

?$ ÷

 Y R ¨ ?Q ? ? S ?v? ?"3#?13? ???1) ?$ ? ÷ ? ¨ R  % ) %   % ¨   

? r ó

 Y $ ? SQ R ¨ ? ÷

?

¨

} ??
(16)

% ¨ R B??j? V ???1)  % 

%  p??

}

 ?

%  ?§?  ??  ? 9? ? ó ú V Hù §? ÷ ?

?

± ? ??v %  ? ? ??o ? ÷ ?  ? r  H ? ó ? ? ? ?6à ? ?? úù  ???? ÷ ?  % ) V T R ¨ %  ???13°X WBSQ6I3?  ?"3#)  %     §?  ? ?b? ???1) % ¨ R  %  ? ? ó ? V  V ? ? ? ?o ?bq`y ? ?y ±
 V ?? ?q? 5 6?

 §? ? v 9? ó ? R ? % BEB? ?  ¨?R?v 8? ? % ? %  ??2~? ? r ¨  ? ó ?

%  ?3?

 §?

 ? 9?

 % ???1)  

?

p? ? v  ? 9? ó ? v ó ???I? V ???1) % ¨ R  % 

?? ???

? ?

 ? ?v

where

?

is a generalisation of the factorial function4 de?ned by

? 6?

Since, in general, , we can conclude from this result that the assumed gamma distribution is not a ?xed point for the system.5 This had to be expected because the gamma distribution is a continuous probability distribution. While , it is not true that . So, sampling does not produce a discrete probability distribution. This is why cannot be a ?xed point. However, the distribution obtained by sampling can be transformed into a probability distribution by simply normalising each sample, obtaining: (19)

then reformulated in terms of the mean in , obtaining

which can be written more simply

. (For an alternative derivation of this and related results where see [14].) This result was also corroborated numerically by initialising the system at for different values of and iterating the schema equations for a few generations, as shown in Figures 9–12 where the plots for the ?rst four generations coincide perfectly. As noted before, on average standard crossover inserts and removes the same amount of genetic material. Therefore, on a ?at landscape and with an in?nite population no change in mean length can occur. So, given any initial distribution of lengths , if one assumes that there are no ?xed points other than the family of discrete gamma functions , then, by setting (the mean
, is always very small and decreases very quickly as increases. For

example, , , , , and . So, for most practical purposes a gamma distribution can be considered to be a ?xed point for the evolution of size under standard crossover on a ?at landscape.

? hG? ? ? ? e

?

5

d g?

? ?? iv?T

U

f ? ? e GhG? ?

d g?

? §? ¨?T

?? e ?G? ?

d g?

?? ??T

? ? ? e

?§sqrf ? ? §? h¨?T dp t ?v?Y T G? ? ?c f e §¨?T ?? d@? d@? ` b ` Y W $? ? c%? W ? V $? ` W $? ? a%? X? V b ` Y W

The

¨ ? ??? ¨ R B? ?  ? FF ? à R ¨ ? ਠS??d?R ? % ¨ R ?? ?v?  ??3#)  % D ¨ ? ??? ???? G E ÷ ¨ R  B?  ??  C?  8 ó  ?  p?  ? ??¨U?#v????¨R?v3d?C? ??% ? ? ? ¨ R  ? 8  ó E?? ???1?1?Bd? ???1) ? A ¨ R  % ) %  A ?  %     ?  ?R $ ° $ ? S3 H?§? ???a?  ???1) ¨  R ¨ úù  ? % ¨ R  %  ÷ ? ÷ ? ÷

of

for any positive value of and by setting

. This can be

relative

difference

between

and

,

? ? ? ¨ % ?Q?0?

Naturally, for , like

and and the distribution has a maximum . The mean of the discrete gamma distribution is , which is only slightly different from (and quickly converges to) . So, the maximum of the distribution is shifted with respect to the mean by a distance . This discrete gamma distribution can be shown mathematically to be a ?xed point. So, if (20)

 ??  ? % ó  % ?f ?   H ? ? ?"3#)   ? ó  ?? ? v g  H ó ? 

U 1??T § 2 ? SR"? ??? v?vP ? "? ??? v?vP ? ? Q %? ? %? Q  ¨?R  ?"3#?   % ) % ??  ?

 ±  ??   0k~ó ? ? ? %  0S§?  "ó  V ?? q? ? H % ?   ? 8 } ¨ ? ?R $ ° $ ? S3 Hù §? ?I§?  "ó  R ¨ ú V ? %   ? 8 ÷ ? ÷ ? ÷

¨ %  ??§?  ?

 ??? ? I???a?  ???1) ¨ R H ? H ? % ¨ R  %  ?

 ? ?

? v ó ? ? V ? ?

¨ ¨%  ¨ $??ER $ Y $ ? ?3  R ¨ ? ÷ ? ÷ ??  §?  "ó  ? 8  p?

 0Bv?????`?H ¨ ? ? ? ¨ R ? %

 % ¨ ???R  ???1)  %  ? 8 ó 

¨ % ??? ?? 
b WY ?$ ? ? ? §Y b WY

R ? % BEU? ? 9 ?@? R ¨ ? ?Q?? $ ? ?$ ? ÷ % ?? ? ÷  8 ?  ? "ó



¨ ??R

 H ?

 % ???1)  

? r ó ? ? V 7

(21)

(22)

length of the programs in the initial generation), Equation 21 gives the ?xed point to which GP will converge. At this stage we are unable to prove that there are no other ?xed points in the system. So, we cannot guarantee that the length distribution in GP with standard crossover on a ?at landscape will always converge towards a discrete gamma distribution or converge at all. The experiments described in the previous section always seemed to do so, which suggests that other ?xed points might be unlikely. Also, even if there are other ?xed point distributions, it seems likely that they will share important characteristics with . This is because GP with standard crossover will always be able to produce programs which are much longer than average. So, the only way in which the average length can remain constant is to have more shorter-than-average programs than longerthan-average ones.

6 Search Space Sampling under Standard Crossover
It is important to understand the consequences of the bias described in the previous section. Let us imagine that our linear GP system operating on a ?at landscape is at the ?xed point for some value of . Since, there are different programs of length in the search space, it is possible to compute the average probability that each of these will be sampled by standard crossover, namely

It is easy to study this function and to conclude that, for a ?at landscape, standard GP will sample a particular short program much more often than it will sample a particular long one. Figures 13 and 14 show the average sampling probability for programs of a given length for standard crossover on a ?at landscape at the ?xed points for assuming . In general, an increase in length of one order of magnitude corresponds to a drop in sampling probability of many orders of magnitude and this trend is largely independent of the particular value of . For example, when (i.e. when the average program length is about 20), on average (and approximately) GP will resample the same program of length 5 every 1054 crossovers, while it will resample the same program of length 50 every crossovers with a difference of 13 orders of magnitude! This difference does not change signi?cantly even for much larger values of . Indeed which is still a huge number. An alternative way of looking at this is to calculate the ratio , which for large values of is approximately 2. Preliminary work suggests that similar results hold for GP with standard crossover operating on trees. If these results are con?rmed the widely reported tendency to bloat is particularly remarkable.

7 Conclusions
In this paper, an exact schema theorem for genetic programming operating on linear structures using standard crossover has been provided. Using this theorem and the cor-

 §?

?

 ?  8 9? "ó

h ? hA

? 3?

h h %  ? ? 24 ???`?? ? FV

? ¨ ? ê} dCD??í

% h h % h ?UA ??B h4

9 ¨?? ??

??

% $ ?? ? ?? ¨? ÷ ?6?

}°§?v?? ????  "ó I?`dy D   %  ?  ? 8

? ?¨

?

%

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? xxxF~Y¨ FxF~Y¨ Fxd¨ ~d¨ Fê ? ?¨ ê ? %      ?   ? ?

?

±

 ¨ú

?? %  ? ê q??v?F? y D

?

w ? x??

w ? x??

? ??

u vT

u

?¨ ? í } d?xí ?í

? ????X? ? ? ??? ? V ? V ? $ ? V ? V % ? ? ? V ? y ????i? ? ? ??? ? ÷ ? ? ± ? Vy w ? x?? u T  ?? ¨ú ?q T? ê H?~r y D

?

?

 ??

 ? ?`dy D

 8  ? "ó

w ? x??

u vT

 ??

 8  ? "ó

(23)

log(prob) 0

log(prob) 0

-20

-20

-40

-40

-60

-60

-80

20

40

60

80

100 N

-80

20

40

60

80

100 N

responding version for one-point crossover, which is also reported, we have been able to study the evolution of size and the biases introduced by the operators both mathematically and in simulations. This research establishes a link between important areas of theoretical research in GP: the study of the evolution of shape and size [1], the biases imposed by different operators [10] and the theory of schemata [9, 7, 6]. In recent research we have started using these results to study bloat in linear representations [3]. In the future we hope to be able to use this theory to model mathematically and better understand the operator bias as well as the reasons for bloat, intron proliferation and code compression [2, 4, 5] in non-linear structures.

Acknowledgements
The authors would like to thank Bill Langdon, Jonathan Rowe and other members of the EEBIC (Evolutionary and Emergent Behaviour Intelligence and Computation) group at Birmingham for helpful comments and discussion. The second author would like to extend special thanks to The University of Birmingham School of Computer Science for graciously hosting him during his sabbatical, and various of?ces and individuals at the University of Minnesota, Morris, for making that sabbatical possible.

References
[1] W. B. Langdon, T. Soule, R. Poli, and J. A. Foster. The evolution of size and shape. In L. Spector, W. B. Langdon, U.-M. O’Reilly, and P. J. Angeline, editors, Advances in Genetic

?

j ? mU jk

?

? ¤?

?h?G?hGGXGhGhXhGG(l j ? ? ? ? § ? ? ? ? ? ? § ? ? ? ? § ? ? ? § j h ? ? ? U? ? i1F?ü

? ¤?

? g e d?? ?i f%?a?

? ¤?

Fig. 13. Plot of the average sampling probability vs. program length for standard crossover on a ?at landscape at the ?xed points for for . Thicker plots represent higher values of . Note the use of a logarithmic scale.

Fig. 14. Plot of the average sampling probability vs. program length for standard crossover on a ?at landscape at the ?xed points for for . Thicker plots represent higher values of . Note the use of a logarithmic scale.

U

? ? ? ? § ? ? ? ? ? i?h6??U

?

? ¤?

? U? ? h1Fdü

? ¤?

? g e d?? Hi hf%?a? ? ??
j ( jl U

?

j E jk

Programming 3, chapter 8, pages 163–190. MIT Press, Cambridge, MA, USA, June 1999. [2] N. F. McPhee and J. D. Miller. Accurate replication in genetic programming. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95), pages 303–309, Pittsburgh, PA, USA, 15-19 July 1995. Morgan Kaufmann. [3] N. F. McPhee and R. Poli. A schema theory analysis of the evolution of size in genetic programming with linear representations. In Genetic Programming, Proceedings of EuroGP 2001, LNCS, Milan, 18-20 Apr. 2001. Springer-Verlag. [4] P. Nordin and W. Banzhaf. Complexity compression and evolution. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95), pages 310–317, Pittsburgh, PA, USA, 15-19 July 1995. Morgan Kaufmann. [5] P. Nordin, F. Francone, and W. Banzhaf. Explicitly de?ned introns and destructive crossover in genetic programming. In J. P. Rosca, editor, Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, pages 6–22, Tahoe City, California, USA, 9 July 1995. [6] R. Poli. Exact schema theorem and effective ?tness for GP with one-point crossover. In D. Whitley, D. Goldberg, E. Cantu-Paz, L. Spector, I. Parmee, and H.-G. Beyer, editors, Proceedings of the Genetic and Evolutionary Computation Conference, pages 469–476, Las Vegas, July 2000. Morgan Kaufmann. [7] R. Poli. Hyperschema theory for GP with one-point crossover, building blocks, and some new results in GA theory. In R. Poli, W. Banzhaf, and et al., editors, Genetic Programming, Proceedings of EuroGP 2000. Springer-Verlag, 15-16 Apr. 2000. [8] R. Poli. General schema theory for genetic programming with subtree-swapping crossover. In Genetic Programming, Proceedings of EuroGP 2001, LNCS, Milan, 18-20 Apr. 2001. Springer-Verlag. [9] R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point crossover and point mutation. In J. R. Koza, K. Deb, M. Dorigo, D. B. Fogel, M. Garzon, H. Iba, and R. L. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 278–285, Stanford University, CA, USA, 13-16 July 1997. Morgan Kaufmann. [10] R. Poli and W. B. Langdon. On the search properties of different crossover operators in genetic programming. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 293–301, University of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann. [11] R. Poli and W. B. Langdon. Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6(3):231–252, 1998. [12] R. Poli, W. B. Langdon, and U.-M. O’Reilly. Analysis of schema variance and short term extinction likelihoods. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 284–292, University of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann. [13] J. P. Rosca. Analysis of complexity drift in genetic programming. In J. R. Koza, K. Deb, M. Dorigo, D. B. Fogel, M. Garzon, H. Iba, and R. L. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 286–294, Stanford University, CA, USA, 13-16 July 1997. Morgan Kaufmann. [14] J. E. Rowe and N. F. McPhee. The effects of crossover and mutation operators on variable length linear structures. Technical Report CSRP-01-7, University of Birmingham, School of Computer Science, January 2001.


推荐相关:
加微信领牛股 | 网站地图
All rights reserved Powered by 学霸学习网 www.tceic.com
copyright ©right 2010-2021。
文档资料库内容来自网络,如有侵犯请联系客服。zhit325@126.com