# Parallel Genetic Algorithm in C++ Using TBB


I’ve just finished my AI course, and the idea of evolutionary computing has appealed to me immensely. Basically, evolutionary computing is a set of algorithms inspired by Charles Darwin’s theory of evolution, intended to provide insight into solving optimization problems. Its most prominent example is the Genetic Algorithm.

But what is the Genetic Algorithm? It’s really pretty simple (and beautiful): the Genetic Algorithm is a heuristic to search for the best solution to an optimization problem (such as the knapsack problem, finding the best parameters of a neural network, and many others…), where ‘best’ has a direct analogy to the theory of evolution’s ‘survival of the fittest’.

Each ‘individual’ (a.k.a. candidate solution) is encoded as a ‘chromosome’, which is generally represented by a vector of numbers. Each vector value is called a ‘gene’, and a set of individuals is called a ‘population’. Those individuals then compete for a chance to ‘reproduce’, which is the act of mixing two chromosomes to generate a new one, different from but similar to both parents. The fitter the individual, the greater its chance to reproduce. Reproduction is controlled by a ‘crossover function’, which always generates two new individuals, each having a certain percentage of genes from both parents. For example, if we define a crossover rate of 30%, one child will have 30% of parent A’s genes and 70% of parent B’s genes, while the other will have 70% of parent A’s genes and 30% of parent B’s genes. There’s also a chance of a gene suffering a ‘mutation’, i.e., changing to a random value, during the crossover process. The fitness of an individual is measured by a function called the ‘fitness function’. This is an iterative algorithm, and on each iteration we have a new ‘generation’ of individuals. There are many possible stop criteria, one of them being the number of generations. Crystal clear? Then let’s see a simple pseudo-code, shall we? :))
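The original listing isn’t reproduced here, so the sketch below is illustrative, but the algorithm just described boils down to something like this:

```text
population ← GENERATE_RANDOM_POPULATION()
while stop criterion not met (e.g., number of generations):
    fitnesses ← GET_FITNESS(individual) for each individual in population
    newPopulation ← empty
    while newPopulation is not full:
        parentA, parentB ← roulette selection (fitter individuals are more likely)
        childA, childB ← crossover(parentA, parentB)
        MUTATE each gene of childA and childB with a small probability
        add childA and childB to newPopulation
    population ← newPopulation
return the fittest individuals of population
```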

## Implementation in C++

As you may notice, the functions ‘GENERATE_RANDOM_POPULATION’, ‘GET_FITNESS’ and ‘MUTATE’ are all problem-dependent. They depend on information like the chromosome size, the domain of allowed values, etc. For our C++ implementation, we let the user provide pointers to their own user-defined functions. Also, the user may use things other than numbers to encode their chromosomes. That’s why we are going to use templates. Our little program starts as simple (or not) as follows:
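The original listing isn’t shown here, but based on the description, the entry point might look roughly like this. Every name and the exact parameter order are assumptions; treat it as a sketch, not the author’s code:

```cpp
#include <vector>

// Sketch of the entry point. T is the type used to encode a single gene.
template <typename T>
std::vector<T*> geneticAlgorithm(
    T* (*generateRandomChromosome)(int chromosomeSize),       // user-defined
    double (*getFitness)(T* chromosome, int chromosomeSize),  // user-defined
    void (*mutate)(T* chromosome, int genePosition),          // user-defined
    int chromosomeSize,    // raw arrays don't carry their size at runtime
    int populationSize,
    int generations,       // stop criterion: number of generations
    double mutationChance,
    double crossoverRate,
    bool maximization)     // true: maximize fitness; false: minimize it
{
    // Start with a fully random population.
    std::vector<T*> population(populationSize);
    for (int i = 0; i < populationSize; ++i)
        population[i] = generateRandomChromosome(chromosomeSize);
    // ... main loop goes here ...
    return population;
}
```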

As C++ is unable to look up a raw array’s size at runtime, we also had to add a parameter to provide that information (chromosomeSize). Also, notice the parameter ‘maximization’. It makes the algorithm flexible enough to handle both maximization and minimization problems.

Let’s advance and start implementing the main loop.
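The original listing isn’t shown, but the structure described in the next paragraph might look roughly like this fragment (all names are assumptions, and the helpers it calls are discussed below):

```cpp
// Sketch of the main loop (a fragment, not a full listing).
for (int generation = 0; generation < generations; ++generation) {
    // Fitness is computed only once per generation, outside the inner loop.
    std::vector<double> fitnesses = getFitnesses(population, maximization);

    // upperBound: the sum of all fitness values, needed by the roulette.
    double upperBound = 0;
    for (double fitness : fitnesses)
        upperBound += fitness;

    // Allocate the new population, fill it (the inner loop), then free the
    // old population and replace it with the new one.
    std::vector<T*> newPopulation(populationSize);
    // ... inner loop populates newPopulation ...
    clear(population);
    population = newPopulation;
}
```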

The function ‘getFitnesses’ is responsible for calculating the fitness value of all individuals. Since this only needs to be done once per generation, we moved it outside the inner loop (which populates the new population). We also calculated an ‘upperBound’, necessary to implement our roulette (check the pseudo-code). We then allocated memory for the new population and, after it had been populated, deallocated the memory of the old population (through the method ‘clear’) and assigned the reference to the new one. Implementations of the methods ‘getFitnesses’ and ‘clear’ can be found below:
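The original listings aren’t shown, so here is a sketch of what the two helpers might look like; the names and signatures are assumptions based on the description:

```cpp
#include <algorithm>
#include <vector>

// Computes every individual's fitness once per generation. When minimizing,
// the values are "inverted" (max - value) so that the roulette, which favours
// large values, ends up favouring the *smallest* original fitnesses.
template <typename T>
std::vector<double> getFitnesses(const std::vector<T*>& population,
                                 double (*getFitness)(T*, int),
                                 int chromosomeSize, bool maximization) {
    std::vector<double> fitnesses(population.size());
    for (size_t i = 0; i < population.size(); ++i)
        fitnesses[i] = getFitness(population[i], chromosomeSize);
    if (!maximization) {
        double maxValue = *std::max_element(fitnesses.begin(), fitnesses.end());
        for (double& f : fitnesses)
            f = maxValue - f;  // smaller original value -> larger roulette slice
    }
    return fitnesses;
}

// Frees the chromosomes of the old population after it has been replaced.
template <typename T>
void clear(std::vector<T*>& population) {
    for (T* chromosome : population)
        delete[] chromosome;
    population.clear();
}
```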

Notice that in ‘getFitnesses’, we had to “invert” the fitness values when maximization is disabled. That’s because the roulette always favours the individuals with the biggest range of values, so we need this little trick of subtracting each fitness from the maximum value found.

Questions? No? Alright, let’s advance with the implementation of the inner loop.
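Again, the original listing isn’t shown; a sketch of the inner loop, under the same assumed names, might be (a fragment, not a compilable unit):

```cpp
// Sketch of the inner loop: fill the new population two children at a time.
for (int j = 0; j < populationSize; j += 2) {
    // Roulette: pick two parents, biased towards the fitter individuals.
    std::pair<int, int> parents = selectIndividuals(fitnesses, upperBound);
    // Mix both parents' genes into two new children (mutation included).
    std::pair<T*, T*> children = crossover(population[parents.first],
                                           population[parents.second],
                                           chromosomeSize, crossoverRate,
                                           mutationChance, mutate);
    newPopulation[j] = children.first;
    if (j + 1 < populationSize)
        newPopulation[j + 1] = children.second;
}
```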

Notice we obtain the selected individuals through the method ‘selectIndividuals’, which returns a pair: the parents that will generate two new children through the method ‘crossover’. Let’s first take a look at ‘selectIndividuals’:
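Since the original listing isn’t shown, here is a sketch of the roulette described in the next paragraph (names are assumptions; `upperBound` is the sum of all fitness values):

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

std::pair<int, int> selectIndividuals(const std::vector<double>& fitnesses,
                                      double upperBound) {
    // Spin the roulette twice: two random values in [0, upperBound].
    double probA = ((double)rand() / RAND_MAX) * upperBound;
    double probB = ((double)rand() / RAND_MAX) * upperBound;
    // Fallback: two random individuals, in case every fitness is zero
    // (which happens on minimization problems when all individuals are equal).
    std::pair<int, int> chosen((int)(rand() % fitnesses.size()),
                               (int)(rand() % fitnesses.size()));
    for (size_t i = 0; i < fitnesses.size(); ++i) {
        probA -= fitnesses[i];
        probB -= fitnesses[i];
        // Once a value drops to zero or below, that individual is chosen;
        // resetting above upperBound prevents choosing again later.
        if (probA <= 0) { chosen.first = (int)i; probA = upperBound + 1; }
        if (probB <= 0) { chosen.second = (int)i; probB = upperBound + 1; }
    }
    return chosen;
}
```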

It’s a pretty straightforward implementation of the pseudo-code shown previously. We first pick two random numbers between 0 and the upper bound, which is the sum of all the population’s fitness values. Then, in a loop, we decrement those values until they reach a value less than or equal to zero, indicating that the corresponding individual has been found (resembling, in fact, the spinning of a roulette wheel). As a picture is worth a thousand words, here’s an illustration:

Hey, did you notice we initially assigned the variable ‘chosen’ to two random individuals? Well, that’s a protection for when the fitness values of all individuals are equal in minimization problems (can you tell me why it would be a bug otherwise? :)). Crystal clear? Excellent! Now let’s check out the ‘crossover’ function:
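As before, the original listing isn’t shown; a sketch of the crossover, with assumed names and a single split point at `crossoverRate`, could be:

```cpp
#include <cstdlib>
#include <utility>

// Copies crossoverRate of parent A plus the remainder of parent B into one
// child, and the mirrored proportions into the other; then each gene of each
// child may be mutated with probability mutationChance.
template <typename T>
std::pair<T*, T*> crossover(const T* parentA, const T* parentB,
                            int chromosomeSize, double crossoverRate,
                            double mutationChance, void (*mutate)(T*, int)) {
    T* childA = new T[chromosomeSize];
    T* childB = new T[chromosomeSize];
    int cut = (int)(chromosomeSize * crossoverRate);  // the split point
    for (int i = 0; i < chromosomeSize; ++i) {
        childA[i] = (i < cut) ? parentA[i] : parentB[i];
        childB[i] = (i < cut) ? parentB[i] : parentA[i];
    }
    // Mutation operator, applied gene by gene with a small probability.
    for (int i = 0; i < chromosomeSize; ++i) {
        if ((double)rand() / RAND_MAX < mutationChance) mutate(childA, i);
        if ((double)rand() / RAND_MAX < mutationChance) mutate(childB, i);
    }
    return std::make_pair(childA, childB);
}
```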

There’s nothing very complex happening here. We are just copying parts of both parents’ chromosomes into the two new children, following the logic described at the beginning of this tutorial. Here’s an illustration to make it clearer:

We then finally apply the mutation operator over every gene of the new children with a certain probability. Both the probability and the mutation behavior are user-defined. Mutations are very useful to help prevent the solutions from getting stuck in a local maximum.

And that’s it! That’s our simple implementation of the Genetic Algorithm in C++! The full code can be found below:

## Algorithm parallelization

If you’ve read this far, you may feel cheated, since we haven’t even mentioned the word ‘parallel’ yet. Fear not, my dear friend! Now is the time! First, however, we must analyze what can and cannot be parallelized.

If you take a quick look at the algorithm we just implemented, you may notice the existence of two loops, one nested inside the other, in the function ‘geneticAlgorithm’. There’s also a loop for calculating the fitness values of all individuals (function ‘getFitnesses’), a loop for applying the mutation operator (function ‘crossover’) and, finally, a loop for selecting the individuals (function ‘selectIndividuals’). So far we have five candidates. Which of them can be parallelized? All (yay!)? None (awww…)?

• Candidate I: The outer loop in ‘geneticAlgorithm’
That… cannot be parallelized. Aww… The reason is simple: the individuals of the next generation depend on the individuals of the previous generation. We cannot skip to the next generation without knowing which individuals composed the previous one.
• Candidate II: The inner loop in ‘geneticAlgorithm’
That… can be parallelized! Yay! And the reason is as simple as the previous one: we can generate new children in parallel (you see, even in the real world, parents can give birth to many children at once). In fact, there are no side effects during the children’s creation, other than adding them to a write-only list without collisions (and which will only be read on the next iteration).
• Candidate III: The loop in ‘getFitnesses’
You may be asking: which loop? There are actually three in that function, but only one is worth parallelizing. Why? Because the computation performed by the last two is too small; the overhead of starting new threads would surpass the computation of the block itself. That leaves the first loop, and that… can be parallelized! Yay! Calculating the fitness of an individual doesn’t depend on the fitness of the other individuals of the population, so it can easily be done in parallel.
• Candidate IV: The mutation loop in ‘crossover’
Though it could be parallelized, the mutation operator is too simple (it just assigns a random value to a position), so the overhead wouldn’t pay off. So, aww…
• Candidate V: The roulette loop in ‘selectIndividuals’
That’s a tough one, since we would need to synchronize two values (probA and probB) among several threads. Since we use both values throughout the whole block, the locking would certainly break the parallelism and turn everything into a single-threaded scheme! So, the answer is no. Aww…

Our score was 2/5, not bad, not bad at all. ;) Now that we know what can be parallelized, let’s find out how we can do it.

We can do it using the TBB library, which stands for Threading Building Blocks. It provides several features for parallelism, such as multi-threading, locks, concurrent containers, etc., all behind high-level abstractions. Of course, it’s not our intent to explain the details of the library here, so let’s start with the simplest example: the parallel for.

As the name indicates, the parallel for is a kind of loop whose iterations are executed in parallel. The syntax is as follows:
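The original snippet isn’t shown; in its simplest two-argument form, the call looks like this (a skeleton, not a complete program):

```cpp
tbb::parallel_for(range, body);

// 'range' is typically a tbb::blocked_range, and 'body' a lambda that
// receives a sub-range of iterations to process:
tbb::parallel_for(tbb::blocked_range<size_t>(begin, end),
                  [&](const tbb::blocked_range<size_t>& r) {
                      // process indices r.begin() .. r.end()
                  });
```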

Here’s an example using blocked_range:

In the above example, we are summing the values of a vector in parallel. Notice the use of a lambda function to implement the ‘body’ of the parallel for. It always expects to receive a range of the same type as defined in the ‘range’ parameter (in this case, size_t). A ‘blocked_range’ expects to receive two iterable values, a begin and an end, which can later be accessed inside the ‘parallel_for’ body. TBB is then responsible for never handing the same range values to two threads. In fact, it will initially create about x threads, where x is the number of cores of your processor, and will assign the first chunks of the range to those threads. The remaining chunks are then distributed dynamically as threads become free (under the hood, TBB uses a work-stealing scheduler).

But enough theory! That’s all we need to know for now. Let’s start turning the inner loop of ‘geneticAlgorithm’ into a parallel for!
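The original listing isn’t shown; here is a sketch of the parallelized inner loop (a fragment, with the same assumed names as before). It differs slightly from the description below in one detail: iterating over *pairs* of indices keeps the two write positions disjoint no matter how TBB splits the range. Also note that `rand()` is not a great citizen under threads; a per-thread generator would be safer, but this keeps the spirit of the original:

```cpp
// Each iteration writes its two children to unique indices, so storing them
// in the write-only newPopulation needs no lock.
tbb::parallel_for(tbb::blocked_range<size_t>(0, populationSize / 2),
    [&](const tbb::blocked_range<size_t>& range) {
        for (size_t pair = range.begin(); pair != range.end(); ++pair) {
            std::pair<int, int> parents = selectIndividuals(fitnesses, upperBound);
            std::pair<T*, T*> children = crossover(population[parents.first],
                                                   population[parents.second],
                                                   chromosomeSize, crossoverRate,
                                                   mutationChance, mutate);
            newPopulation[2 * pair] = children.first;       // unique index
            newPopulation[2 * pair + 1] = children.second;  // unique index
        }
    });
```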

That was pretty simple. We just created a ‘parallel_for’ with a blocked range from 0 to populationSize (identical to the previous for limits), and in the parallel_for body we use almost the same block of code, with the difference that we don’t increment a shared counter ‘j’ each time we add a new child; instead, the internal for loop advances by 2. The reason we adopted this logic instead of the previous one is simple: incrementing a shared counter inside the body would require a lock before adding a new child, slowing the process unnecessarily. We also don’t need a lock before assigning a new child to the new population, because it’s a write-only array without key collisions (we never assign two or more children to the same index).

Note: As we are using lambda functions, don’t forget to add the `-std=c++11` flag to the compiler, since lambda functions are a C++11 feature. Also, don’t forget to link the TBB library (`-ltbb`).

Now let’s parallelize our second candidate, the loop in the function ‘getFitnesses’. That’s also very simple; indeed, simpler than the previous one.

And that’s all! You can check the full code here:

## A concrete example: Finding the roots of a polynomial

Until now, we’ve only touched the realm of theory, without showing the practical side of genetic algorithms. After all, what are they good for? So here’s a “simple” minimization problem: given any polynomial:

axⁿ + bxⁿ⁻¹ + cxⁿ⁻² + … + C

the objective is to find the roots of the equation, i.e., the values of ‘x’ which evaluate the expression to zero when substituted.

I chose this problem because it’s non-trivial: in fact, for polynomials of degree five and higher there is no general closed-form solution (the Abel–Ruffini theorem). However, numerical approximation methods do exist. So why not use a genetic algorithm to provide us with an approximate result too?

Let’s formalize our problem:

Input:
• A real value X indicating the minimum allowed gene value;
• A real value Y indicating the maximum allowed gene value (X < Y);
• An integer value N indicating the population size (N > 10);
• An integer value M indicating the maximum number of generations (M > 0);
• A real value C indicating the mutation chance (0 <= C <= 1);
• A real value P indicating the crossover rate (0 <= P <= 1);
• An integer value I (I > 0) followed by I real values, the coefficients of the polynomial equation. For example: `4 3 4.5 2 1` translates to 3x³ + 4.5x² + 2x + 1.

Output:
A list of the 10 fittest individuals after M generations.

Our first step: Let’s start reading the input.

That’s a piece of cake. The next step is to create our own definition of the fitness function. Simply substituting the value of ‘x’ into the polynomial expression will be enough: the closer the result is to zero, the fitter the solution. As we are dealing with a minimization problem, we have to be careful to avoid negative values.
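The original listing isn’t shown; one way to write such a fitness function (names are assumptions; the coefficients come from the input) is to evaluate the polynomial with Horner’s rule and take the absolute value, which keeps the fitness non-negative:

```cpp
#include <cmath>
#include <vector>

std::vector<double> coefficients;  // filled while reading the input

// Evaluates the polynomial at x = chromosome[0]; smaller results are fitter.
double getFitness(double* chromosome, int /*chromosomeSize*/) {
    double x = chromosome[0];
    double result = 0;
    for (double coefficient : coefficients)
        result = result * x + coefficient;  // Horner's rule
    return std::fabs(result);  // absolute value: never negative
}
```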

Now let’s implement the functions ‘generateRandomPopulation’ and ‘mutate’. Those are also very simple:
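The original listings aren’t shown; here is a sketch of the remaining user-defined functions (names are assumptions, and the population generator is sketched per chromosome):

```cpp
#include <cstdlib>

double minGene, maxGene;  // filled while reading the input (X and Y)

// A random value uniformly drawn from [minGene, maxGene].
double randomGene() {
    return minGene + ((double)rand() / RAND_MAX) * (maxGene - minGene);
}

// Chromosomes of size 1: the single gene is a candidate value of x.
double* generateRandomChromosome(int chromosomeSize) {
    double* chromosome = new double[chromosomeSize];
    for (int i = 0; i < chromosomeSize; ++i)
        chromosome[i] = randomGene();
    return chromosome;
}

// Mutation simply replaces the gene with a fresh random value.
void mutate(double* chromosome, int position) {
    chromosome[position] = randomGene();
}
```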

Notice we are using chromosomes of size 1. That’s because we just want to find one value (the value of ‘x’). Clear? Excellent, because that’s all we need! Now let’s just connect the dots:
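The original listing isn’t shown; hypothetical glue code, matching the description below, might look like this fragment (a multimap keeps its entries sorted by key, here the fitness, so iterating it yields the fittest individuals first):

```cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <map>
#include <vector>

int main() {
    srand(time(0));  // a different random seed on every execution

    // ... read the input and call geneticAlgorithm as shown earlier ...
    std::vector<double*> population; /* = final population */

    // fitness -> x, automatically ordered by ascending fitness.
    std::multimap<double, double> ranking;
    for (double* chromosome : population)
        ranking.insert(std::make_pair(getFitness(chromosome, 1), chromosome[0]));

    // Print the 10 fittest individuals (the smallest fitness values).
    int printed = 0;
    for (auto it = ranking.begin(); it != ranking.end() && printed < 10;
         ++it, ++printed)
        std::cout << it->second << " (fitness " << it->first << ")\n";
    return 0;
}
```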

We use `srand(time(0))` to set a different random seed every time the program is executed, and then a map to sort the retrieved results by fitness value. The full code can be found below:

Let’s run some simple experiments. For example, for the input `-100 100 100 1000 0.05 0.5 2 1 0`, representing the polynomial equation x = 0, we get values similar to these:

As you can notice, the values approach zero, which is, indeed, the solution of this equation.

Let’s try a harder example. For the input `-100 100 100 1000 0.05 0.5 3 1 30 2`, representing the polynomial equation x² + 30x + 2 = 0, we get values similar to these:

The real solutions are `x' ≈ -0.066` and `x'' ≈ -29.933` (check the math). As you can notice, the algorithm tried to approximate both roots, through the values -31.4706 and -1.913. That’s already quite impressive.

## Conclusion

Wow! This tutorial turned out longer than I expected. Here we learnt about: i) Genetic Algorithms; ii) an implementation of Genetic Algorithms in C++; iii) the use of a TBB construct, the parallel for, to speed up our algorithm; iv) a practical example using a Genetic Algorithm. That’s a lot of things! I really hope you guys enjoyed reading this as much as I enjoyed writing it. See you in the next tutorial! ;)