X-Men approach to evolutionary computation

Warning

The programs shown here have not been optimized for efficiency or elegance, but rather for clarity and brevity. Even so, they can probably be improved, so I would like to hear any comments or criticisms you have about them.

Although Darwin's view of the forces at work in the evolutionary process seems quite simple, setting them down in black and white in an actual algorithm is a completely different matter. What kind of modifications should be applied to the data structures? What do we do with the modified data structures? Should we put them into the population, just like that? Should we (gulp), like, off some other member of the population? If so, which one?

Historically, the first solution to this conundrum went more or less like this: create a population of possible solutions; take a member of the population and change it until you obtain something better (better meaning closer to the solution, that is, with better fitness); then eliminate one of the least fit members of the population, possibly the very least fit. That way, every time you introduce a new member into the population you go one step up the evolutionary ladder (just like the X-Men).

We will work, from now on, on the following problem: evolve a population of ASCII strings so that they are as close as possible to the string yetanother. The fitness of each (initially random) string will be the sum, over all positions, of the absolute differences between the ASCII value of the letter in that position and the ASCII value of the letter in the same position of the target string; for instance, yetanothes has a fitness of 1, since s and r are one ASCII code apart. Optimal strings will have 0 fitness, so the process will try to minimize fitness. The job is done by the following Perl program (1stga.pl; whitespace and POD comments have been suppressed):

#Declare variables                                         (1)
my $generations = shift || 500; #Which might be enough
my $popSize = 100; # No need to keep it variable
my $targetString = 'yetanother';
my $strLength = length( $targetString );
my @alphabet = ('a'..'z');                                (1)
my @population;
#Declare some subs (not the best place to do it, but we are going to
#need them)                                               (2)
sub fitness ($;$) {
  my $string = shift;
  my $distance = 0;
  for ( 0..($strLength -1)) {
    $distance += abs( ord( substr( $string, $_, 1)) -   ord( substr( $targetString, $_, 1)));
  }
  return $distance;                                       (2)
}
sub printPopulation {
  for (@population) {
    print "$_->{_str} -> $_->{_fitness} \n";
  }
}                                                         (3)
sub mutate {
  my $chromosome = shift;
  my $mutationPoint = rand( length( $chromosome->{_str}));
  substr(  $chromosome->{_str}, $mutationPoint, 1 ) = $alphabet[( rand( @alphabet))]; (3)
}
#Init population                                          (4)
for ( 1..$popSize ) {
  my $chromosome = { _str => '',
		     _fitness => 0 };
  for ( 1..$strLength ) {
    $chromosome->{_str} .= $alphabet[( rand( @alphabet))]; 
  }
  $chromosome->{_fitness} = fitness( $chromosome->{_str} );
  push @population, $chromosome;                          (4)
}
#Print initial population
printPopulation();
#Sort population
@population = sort { $a->{_fitness} <=> $b->{_fitness} } @population;
#Go ahead with the algorithm                              (5)
for ( 1..$generations ) {
  my $chromosome = $population[ rand( @population)];
  #Generate offspring that is better
  my $clone ={};
  do {
    $clone = { _str => $chromosome->{_str},
		  _fitness => 0 };
    mutate( $clone );
    $clone->{_fitness} = fitness( $clone->{_str} );
  } until ( $clone->{_fitness} < $chromosome->{_fitness});
  #Substitute worst
  $population[$#population]=$clone;
  #Re-sort
  @population = sort { $a->{_fitness} <=> $b->{_fitness} } @population;
  #Print best
  print "Best so far: $population[0]->{_str} => $population[0]->{_fitness} \n";
  #Early exit
  last if $population[0]->{_fitness} == 0;
}                                                         (5)
(1)
This is the initial setup for the evolutionary algorithm. It simply declares a group of variables: the number of generations (that is, the number of iterations that will be performed if the solution is not found), the size of the population (the number of different data structures kept in it), and several other constants that will be used throughout the program.
(2)
This function computes fitness. And yes, it could be done in a single line; a one-statement version is sketched right after these notes.
(3)
This is the only variation operator we are going to use in this first version of the evolutionary algorithm: it mutates a single character at a random position in the string, replacing it with another random character.
(4)
The population is initialized with random strings; at the same time, the data structure used for chromosomes, which is the conventional name for the stuff that evolves, is also introduced. From the point of view of evolutionary computation, a chromosome is anything that can be changed (mutated) and evaluated to be assigned a fitness, and fitness is anything that can be compared. In most cases it is a real number, but sometimes it can be something more complicated: a vector of different values, for instance.

The data structure used for evolution is really unimportant, although, traditionally, some data structures, such as binary strings, floating-point vectors, or trees, have been used. In many cases there is also a mapping between the data structure that is evolved and the data structure that is a solution to the problem: for instance, we might want to use binary strings to represent 2D floating-point vectors, which are the solutions to a numeric optimization problem (a sketch of such a mapping appears after these notes). All in all, the representation issue has been the origin of endless wars in the field of evolutionary computation.

Tip

Use the data structure that best suits your expertise, tickles your fancy, or is closest to the problem you want to solve. Testing different data structures for performance will not hurt, either.

(5)
This is the meat of the program, the loop that actually does the evolution. It takes a random element of the population, creates a copy of it, and mutates that copy until it finds a new string whose fitness is better than the original's; the new string is then inserted into the population, eliminating the worst member, which probably deserved it.
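As mentioned in note (2), fitness can indeed be computed in a single statement. This is just a sketch of a drop-in replacement for the fitness sub above, using the core List::Util module (which 1stga.pl itself does not use):

use List::Util qw(sum);
sub fitness {
  my $string = shift;
  #Sum of the per-character ASCII distances to the target string
  return sum map { abs( ord( substr( $string, $_, 1)) - ord( substr( $targetString, $_, 1))) } 0..($strLength - 1);
}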
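And, regarding the mapping mentioned in note (4), a binary string could be decoded into a 2D floating-point vector along these lines; this is only an illustrative sketch (the name decodeChromosome and the explicit range are assumptions, not something 1stga.pl needs):

#Turn a string of 0s and 1s into two floats in the range [$min, $max]
sub decodeChromosome {
  my ( $bits, $min, $max ) = @_;
  my $half = length( $bits ) / 2;
  my @vector;
  for my $chunk ( substr( $bits, 0, $half ), substr( $bits, $half ) ) {
    my $int = oct( "0b$chunk" );            #Binary chunk to integer
    my $maxInt = 2 ** length( $chunk ) - 1; #Largest value the chunk can hold
    push @vector, $min + ( $max - $min ) * $int / $maxInt;
  }
  return @vector;
}
#For instance, decodeChromosome( '0000000011111111', -1, 1 ) returns (-1, 1)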

This program (1stga.pl), when run, produces output like this:

cekyhtvvjh -> 97
mwehwoxscv -> 82
lalmbnbghi -> 81
[More like this...]
Best so far: vowjmwwgft => 41
Best so far: vowjmwwgft => 41
Best so far: vowjmwwgft => 41
[Maaany more like this...]

There are several problems with this algorithm. First, the population is not really used, and it is not actually needed: this is in fact a hill-climbing algorithm, and a very inefficient one at that, since it takes an element, improves it a bit, puts it back into the population, takes another one, and so on. It would be much better to just take a single random string and keep changing it until it hits the target, right? In any case, since it uses nothing but random mutation, what we are performing is basically a random search over the space of all possible strings. Not an easy task, and this is the reason why the solution is usually not found, even after several hundred thousand iterations.
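Such a single-string hill-climber might look like the following sketch, which reuses fitness, mutate, @alphabet and $strLength from 1stga.pl and is not part of the program itself:

#Start from a single random string...
my $solution = { _str => '', _fitness => 0 };
$solution->{_str} .= $alphabet[ rand( @alphabet ) ] for 1..$strLength;
$solution->{_fitness} = fitness( $solution->{_str} );
#...and keep mutating it, accepting only changes that improve it
while ( $solution->{_fitness} > 0 ) {
  my $candidate = { _str => $solution->{_str}, _fitness => 0 };
  mutate( $candidate );
  $candidate->{_fitness} = fitness( $candidate->{_str} );
  $solution = $candidate if $candidate->{_fitness} < $solution->{_fitness};
}
print "Found $solution->{_str} \n";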

Tip

Blind mutation usually takes you nowhere, and it takes a long time to do so.

This indicates there is something amiss here; even if nature is a blind watchmaker, it has the help of a very powerful tool: sex. And that is what we will use in the next section.