Here comes sex

The power of recombination

As we have seen, having a population and mutating it only takes you so far; there must be something more in Evolution that makes possible to create efficient structures and organisms. And one of these things is probably sex: after fusion of male and female genetic material, recombination takes place, so that the resulting organism takes some traits from each of its parents. In evolutionary computation, the operator that combines "genetic material" from two parents to generate one or more offspring is called crossover.

In its simplest form, crossover interchanges a chunk of the two parent's string, spawning two new strings.

Table 1. Two-point crossover on a string

Parent 1xxxxxxxxxx
Parent 2yyyyyyyyyy
Offspring 1xxyyyyxxxx
Sibling 2yyxxxxyyyy

The exact form of the crossover will obviously depend on the data structure we are using; in some cases it might not even be possible; but the general idea is to combine two or more solutions, so that whatever is good about them mingles, to (maybe) give something even better. Since recombination is blind, the result might be better or not, but it is quite enough that combination yields something better some times for climbing up the evolutionary ladder.

Crossover will be moved, along with the other utility functions, to a small module called LittleEA.pm, and takes the following form:

sub crossover {
  my ($chr1, $chr2) = @_;
  my $crossoverPoint = int (rand( length( $chr1->{_str})));
  my $range = int( rand( length( $chr1->{_str}) - $crossoverPoint + 1));
  my $str = $chr1->{_str};
  substr( $chr1->{_str}, $crossoverPoint, $range, 
	  substr( $chr2->{_str}, $crossoverPoint, $range));
  substr( $chr2->{_str}, $crossoverPoint, $range, 
	  substr( $str This is a possible implementation of a simple string
crossover, with two parents and two offspring. Both parameters are
passed by reference, and offspring take the place of parents. , $crossoverPoint, $range ));
}
. A crossover point is chosen randomly, and them, a length to swap that cannot be bigger that the total length of both strings. The characters spanned by that range are swapped between the two chromosomes. Since both parents have the same length, it does not matter which parent's length is used to generate the random crossover point; obviously, if variable-length strings are used, the minimal length will have to be used; for more complicated data structures, markers, or "hot points", are used sometimes.

Crossover is used in the following program (2ndga.pl; some parts have been suppressed for brevity):


require "LittleEA.pm";
my $generations = shift || 500; #Which might be enough
my $popSize = 100; # No need to keep it variable
my $targetString = 'yetanother';
my $strLength = length( $targetString );
my @alphabet = ('a'..'z');
sub fitness ($;$) {
  my $string = shift;
  my $distance = 0;
  for ( 0..($strLength -1)) {
    $distance += abs( ord( substr( $string, $_, 1)) -   ord( substr( $targetString, $_, 1)));
  }
  return $distance;
}
my @population = initPopulation( $popSize, $strLength, \@alphabet );
printPopulation( \@population);                           (1)
@population = sort { $a->{_fitness} <=> $b->{_fitness} } @population;
for ( 1..$generations ) {
  my $chr1 = $population[ rand( @population)];
  my $chr2 = $population[ rand( @population)];
  #Generate offspring that is better
  my $clone1 ={};
  my $clone2 ={};
  do {
    $clone1 = { _str => $chr1->{_str},
		_fitness => 0 };
    $clone2 = { _str => $chr2->{_str},
		_fitness => 0 };
    mutate( $clone1, \@alphabet );
    mutate( $clone2, \@alphabet );
    crossover( $clone1, $clone2 );
    $clone1->{_fitness} = fitness( $clone1->{_str} );
    $clone2->{_fitness} = fitness( $clone2->{_str} );
  } until ( ($clone1->{_fitness} <  $population[$#population]->{_fitness}) || 
	    ($clone2->{_fitness} <  $population[$#population]->{_fitness}));
  if ($clone1->{_fitness} >  $population[$#population]->{_fitness}) {
    $population[$#population]=$clone1;
  } else {
    $population[$#population]=$clone1;
  }
  @population = sort { $a->{_fitness} <=> $b->{_fitness} } @population;
  print "Best so far: $population[0]->{_str}\n";          (1)
  printPopulation( \@population );
  last if $population[0]->{_fitness} == 0;
}
(1)
The main loop is very similar to the first example, except that now two parents, instead of only one, are generated randomly, then mutated to generate variation, and then crossed over. In this case, new offspring is generated until at least one is better than the worst in the population, which it eventually substitutes. This requisite is a bit weaker than before: in the previous program, a new chromosome was admitted in the population only if it was better than its (single) parent.

The output of this program, after running it typing perl 2ndga.pl 100000 will be something like thishnbsqpgknl -> 97 gaheewieww -> 92 [More like this...] cwceluxeih kdcseymlot [Strings before crossover] cwceluxeot kdcseymlih [And after] ============= Best so far: zjrcstrhhk [More stuff...] Best so far: yetanother yetanother -> 0 yetanotier -> 1 yetaoother -> 1 yeuanother -> 1 [And the rest of the initial population]

In fact, in most cases, a few thousands evaluations are enough to reach the target string. The fitness of the best individual proceeds typically as shown in the figure below:

The last two fittest words found before the solution are also shown in the generation they showed up for the first time. They are at a distance of 2 and 1 from the target string, respectively; in this case, solution was found after around 2100 iterations; with two new individuals generated each generation, that means 4200 evaluations were needed to hit target. Not a bad mark. Not very good either, but not too bad.

Summing up, looks like crossover has made a simple random search something something a bit more complicated, which combines information about search space already present in the population to find better solutions; population allows to keep track of the solutions found so far, and recombination combines them, usually for the better.

Tip

Sex, is after all, important. And, for many, evolutionary computation without crossover cannot really be called that way, but something else: stochastic population-based search, maybe.