Extending Algorithm::Evolutionary

With a little help from our friend Perl

First, we are going to take advantage of a Perl module, to extend our library with new classes. There's a wealth of Perl modules out there, and many of them are devoted to working with data structures such as lists or strings. For instance, the Algorithm::Permute class comes in handy to create a permutation operator that acts on strings (included binary strings), as is done next

                                                           (1)
package  Algorithm::Evolutionary::Op::Permutation;
use Carp;
use Algorithm::Evolutionary::Op::Base;
use Algorithm::Permute;
our @ISA = qw (Algorithm::Evolutionary::Op::Base);
our $APPLIESTO =  'Algorithm::Evolutionary::Individual::String';(1)
our $ARITY = 1;                                           (2)
sub new {
  my $class = shift;
  my $rate = shift || 1;
  my $self = Algorithm::Evolutionary::Op::Base::new( 'Algorithm::Evolutionary::Op::Permutation', $rate );
  return $self;                                           (2)
}                                                         (3)
sub create {
  my $class = shift;
  my $rate = shift || 1; 
  my $self =  { rate => $rate };
  bless $self, $class;                                    (3)
  return $self;                                           (4)
}
sub apply ($;$) {
  my $self = shift;
  my $arg = shift || croak "No victim here!";
  my $victim = $arg->clone();
  croak "Incorrect type ".(ref $victim) if ! $self->check( $victim );
  my @arr = split("",$victim->{_str});
  my $p = new Algorithm::Permute( \@arr );
  $victim->{_str} = join( "",$p->next );                  (4)
  return $victim;
}
(1)
This is the usual introduction to modules, which should be preceded with some POD documentation: description, synopsis, and so on. After declaration of the package name, we declare needed modules:Algorithm::Permute, a class for fast permutations, and base class for all operators, Algorithm::Evolutionary::Op::Base. Two constants should be defined also for the module: one of them is optional, the $APPLIESTO variable, which states to which individual class it might apply to; this will be used in the apply method, but if it applies to a whole hierarchy, for instance, all subclasses of String, it's better to find out a more sophisticated check; the second one, $ARITY, is used by other objects to find the number of arguments the apply method needs.

Tip

Do not reinvent the wheel: always look up CPAN when writing operators or individuals; you might find the right class for the job.

(2)
The new method does not do much this time, other than forward object creation to the base class (as all objects should do). An operator just has a rate as default variable; and this one has not got any other.
(3)
This is equivalent to new, and it's a fossil. Do not worry about it, it will probably be eliminated in other versions of the library
(4)
This is the most important method: is the one that actually does the job, and the one it is called to modify chromosomes. Chromosomes are passed by value, that is why it is cloned, and the result of the modification is returned; this is the way the higher-level classes, such as Algorithm::Evolutionary::Op::Full, expect them to behave that way, but they might do something different for particular algorithms (for instance, Algorithm::Evolutionary::Op::QuadXOver takes both arguments by reference, and is used within the canonical GA). This method creates a permutation object from the chromosome, and permutes it, assigns it back to the created chromosome, and returns it.

Other methods, such as set or asXML, can also be overridden from the base class, but just if you want to do something very specific; the base class versions will do just fine in most cases. Sub-classing an Individual, or creating new kinds of data structures to evolve, is just as simple, and it is left as an exercise to the reader.

We will use this class to evolve DNA strings; just as we did before, but, first, our target string will only be composed of A, C, G and T, and, second, we will have no "distance to char", but overall distance among strings. The exercise is purely academic, but a similar problem is solved when sequence alignments want to be done, only in that cases there are several targets. We will use another module, String::Approx to compute approximate distances among strings (ea-ex5.pl).


use Algorithm::Evolutionary::Op::Creator;
use  Algorithm::Evolutionary::Op::Permutation;
use  Algorithm::Evolutionary::Op::IncMutation;
use Algorithm::Evolutionary::Op::Crossover;
use Algorithm::Evolutionary::Op::CanonicalGA;
use String::Approx qw( adistr );
my $target =shift || 'CGATACGTTGCA';
my $maxGenerations = shift || 100;
my $popSize = shift || 100;
my $fitness = sub {
  my $indi = shift;
  return 1 - abs ( adistr( $indi->Chrom, $target ) );
};
my $incmutation =  new Algorithm::Evolutionary::Op::IncMutation;
my $mutation = new Algorithm::Evolutionary::Op::Permutation;
my $crossover = new Algorithm::Evolutionary::Op::Crossover;
my $ez = new Algorithm::Evolutionary::Op::Easy $fitness, 0.4, [$mutation, $crossover, $incmutation ];
my $indiType = 'String';
my $hash = { length => length( $target ),
	     chars => ['A','C','G','T']} ;
my $creator = new Algorithm::Evolutionary::Op::Creator( $popSize, 'String', $hash);
my @population = ();
$creator->apply( \@population );
my $gen;
do { 
  $ez->apply (\@population );
  print "Best so far: ", $population[0]->asString(), "\n";
} until ( $population[0]->Chrom eq $target ) || ($gen++ > $maxGenerations) ;

print "Final\n", $population[0]->asString();
This program is very similar to previous examples. The only differences are that we use a different kind of chromosome, Individual::String, which uses any alphabet, and that we use several variation operators: Op::IncMutation, which increments a single element in the chromosome by one, taking into account the alphabet (that is, it would cycle A -> C -> G -> T); Op::Permutation, which we just declared. The fitness returns the distance between the string and the target string, taking into account the length difference, and the insertions and deletions needed to turn a string into the other. This is a problem, since AA and TA will have the same distance to GA, and there are many mutations which are neutral, leading to no change in fitness. Furthermore, strings such as GAAAAA are at a distance of 1 (or 1/divided by total length) from AAAAAG, but a very lucky permutation is needed to turn one into the other. This leads to the fact that, in this case, the evolutionary algorithm does not always find the solution.

Warning

Combinatorial optimization problems like this one are usually hard for evolutionary algorithms (and for any other search method, for that matter). It always help to have a good knowledge of the problem, and use any other methods available to us to improve search and make the fitness landscape less rugged.