Many of you know this, but I’m a genetic algorithm fanboy. I’ve coded dozens of simulation programs and tools using these evolutionary algorithms, and even recently presented a paper on GAs at the AIAA Modeling and Simulation Technologies Conference in Chicago.
As optimization algorithms go, I can understand and appreciate the simplicity behind the GA concept: mimic evolutionary biology to discover a solution to a system. And I must say, it’s a very elegant and sometimes creepy method of optimization; I still smile when my creatures suddenly learn how to do something new.
The applications of such an algorithm seem bounded only by the programmer’s ability to use it effectively. I’ve used GAs to schedule spacecraft tasks, play Yahtzee, play the stock market, bet on horse races, and even simulate a zombie apocalypse. From my own experience, I can say without hesitation that GAs can be a highly effective optimization tool.
I recently stumbled upon an article on www.Creation.com, a website by Creation Ministries International, whose purpose is to “support the church in proclaiming the truth of the Bible and thus its gospel message” and to provide “real-world answers to the most-asked questions in the vital area of creation/evolution.”
The article in question is by Dr. Don Batten and titled “Genetic algorithms – do they show that evolution works?”.
Now, as of yet, neoSprockets has not joined the debunking crowd. But this area is of great interest to me, and I can’t help but drop some strongly-typed Boolean truth on Batten’s GA-denying ass. So get ready for a long post, people: idiots get me fired up.
Batten, who has a Ph.D. in Agronomy and Horticultural Science, posts a list of reasons why he thinks genetic algorithms “do not mimic or simulate biological evolution”.
Having read his points many times, I get the general sense that Batten is basing his entire response on one or two specific GA applications, when he should have read the less interesting but revealing “mathy” white papers that really explain the algorithm in detail. Unfortunately, many of his objections target extremely specific, optional aspects of GAs that are used only to speed up search times, and he treats them as rules.
The other general sense I get here is confusion about intent. The intent of a GA is not to provide a verified and validated “evolution simulation”. The intent is to use the concepts of evolution to intelligently search a design space for a valuable solution. That said, there’s nothing stopping someone from using GAs to power an evolution simulation. In fact, many people have!
Anyways, enough preamble. Let me pick apart each of his arguments.
A ‘trait’ can only be quantitative so that any move towards the objective can be selected for. Many biological traits are qualitative—it either works or it does not, so there is no step-wise means of getting from no function to the function.
Batten uses the word ‘trait’ throughout the article as analogous to a GA “solution”. Here, he is saying that GAs are not allowed to have a non-stepwise solution; you cannot have a solution that simply works or doesn’t work.
You most certainly can; however, your search of the design space will be hindered if this is your only indicator of fitness. Why not use a quantitative fitness measure?
Batten claims that in biology there exist situations where something either works or it doesn’t, with no middle ground. I tried very hard to come up with a “qualitative biological trait” that I couldn’t represent quantitatively. This frog can’t jump == this frog can jump 0.0 meters. This dog doesn’t have spots on its head == average head spot frequency is 0.0%.
In this regard, every trait can be expressed quantitatively, and even a problem like designing sorting networks, where there is a very clear works/doesn’t-work distinction, can be optimized using GAs.
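To make the sorting-network case concrete, here’s a minimal Python sketch (the function names are my own, purely for illustration) that scores a candidate comparator network two ways. The binary check only says “works” or “doesn’t”. The graded score is the fraction of all 0/1 inputs the network sorts, which, by the zero-one principle, reaches 1.0 exactly when the network sorts every possible input. The goal is strictly pass/fail, yet the GA still gets a useful gradient from the second score.

```python
from itertools import product

# A comparator network is a list of (i, j) index pairs with i < j; each
# comparator swaps positions i and j when they are out of order.
Network = list[tuple[int, int]]


def apply_network(network: Network, values: tuple[int, ...]) -> list[int]:
    """Push one input sequence through the comparator network."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def graded_fitness(network: Network, n: int) -> float:
    """Quantitative fitness: fraction of all 0/1 inputs of length n sorted."""
    inputs = list(product((0, 1), repeat=n))
    sorted_count = sum(apply_network(network, bits) == sorted(bits) for bits in inputs)
    return sorted_count / len(inputs)


def sorts_everything(network: Network, n: int) -> bool:
    """Binary 'works or doesn't' fitness (zero-one principle)."""
    return graded_fitness(network, n) == 1.0


three_sorter = [(0, 1), (1, 2), (0, 1)]
print(graded_fitness([], 3), graded_fitness([(0, 1)], 3), graded_fitness(three_sorter, 3))
# 0.5 0.625 1.0
print(sorts_everything(three_sorter, 3))  # True
```

An empty network, a one-comparator network and a complete network score 0.5, 0.625 and 1.0 respectively, so each step toward a working network is rewarded even though the end result either sorts or it doesn’t.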
A GA can only select for a very limited number of traits. Even with the simplest bacteria, which are not at all simple, hundreds of traits have to be present for it to be viable (survive); selection has to operate on all traits that affect survival.
What’s to stop someone from optimizing hundreds of traits? See: weighted fitness functions.
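For anyone who hasn’t seen one, here’s a minimal, hypothetical weighted fitness function in Python. It assumes every trait score has already been normalized to [0, 1] and every weight is non-negative; the number of traits is limited only by how many you care to score.

```python
def weighted_fitness(trait_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Collapse any number of per-trait scores into a single fitness value.

    Assumes scores in [0, 1] and non-negative weights, so the result is a
    weighted average that also lies in [0, 1]. Traits missing from
    trait_scores count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * trait_scores.get(name, 0.0) for name, w in weights.items()) / total_weight


print(weighted_fitness({"speed": 1.0, "armor": 0.0}, {"speed": 3.0, "armor": 1.0}))  # 0.75

# Hundreds of traits are no harder than two.
many_weights = {f"trait_{k}": 1.0 for k in range(300)}
print(weighted_fitness({name: 0.5 for name in many_weights}, many_weights))  # 0.5
```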
Something always survives to carry on the process. There is no rule in evolution that says that some organism(s) in the evolving population will remain viable no matter what mutations occur. In fact, the GAs that I have looked at artificially preserve the best of the previous generation and protect it from mutations or recombination in case nothing better is produced in the next iteration.
It’s a common misconception in GAs that each subsequent generation will be better fitted on average than the previous generation. This is not a guarantee.
Secondly, there are no hard-coded rules in a GA other than the very general concepts of selection, crossover and mutation. Elitism (preserving the best solution between generations) is an optional subroutine used to speed up convergence. It is not a rule.
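Here’s a minimal sketch of that point: one generation of a bit-string GA in Python where elitism is nothing more than an optional parameter. The names are hypothetical, and binary tournament selection, single-point crossover and bit-flip mutation are just three common choices among many.

```python
import random

Genome = list[int]


def next_generation(population: list[Genome], fitness, rng: random.Random,
                    mutation_rate: float = 0.01, elite_count: int = 0) -> list[Genome]:
    """Produce a new population of the same size from bit-string genomes.

    Selection, crossover and mutation are the required steps. elite_count > 0
    switches on elitism: the best genomes are copied over untouched.
    Assumes at least two genomes, each at least two bits long, and
    elite_count <= len(population).
    """
    ranked = sorted(population, key=fitness, reverse=True)
    children = [list(g) for g in ranked[:elite_count]]  # optional elitism
    while len(children) < len(population):
        # Binary tournament selection: the fitter of two random picks.
        mom = max(rng.sample(population, 2), key=fitness)
        dad = max(rng.sample(population, 2), key=fitness)
        # Single-point crossover.
        cut = rng.randrange(1, len(mom))
        child = mom[:cut] + dad[cut:]
        # Bit-flip mutation, applied independently at every locus.
        child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
        children.append(child)
    return children


rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
plain = next_generation(population, sum, rng)                  # a complete GA, no elitism
elitist = next_generation(population, sum, rng, elite_count=1)  # same GA, elitism on
```

Only the elitist call is guaranteed to carry the best score forward; the plain one is still a perfectly valid GA, and its best (or average) fitness can drop from one generation to the next.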
Perfect selection (selection coefficient, s = 1.0) is often applied so that in each generation only the best survives to ‘reproduce’ to produce the next generation. In the real world, selection coefficients of 0.01 or less are considered realistic, in which case it would take many generations for an information-adding mutation to permeate through a population.
Once again, your selection method is not a hard-coded rule. The very common roulette-wheel selection method, for example, does not guarantee that the best creature is selected.
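For the curious, roulette-wheel (fitness-proportionate) selection fits in a few lines. This is a minimal Python sketch with a hypothetical function name, assuming non-negative fitness values; each individual owns a slice of the wheel proportional to its fitness.

```python
import bisect
import itertools
import random


def roulette_select(population, fitnesses, rng: random.Random):
    """Fitness-proportionate ("roulette-wheel") selection of one individual.

    The fittest individual is the most likely pick but is never guaranteed to
    be chosen. Fitness values are assumed to be non-negative.
    """
    # Lay the slices end to end: cumulative[i] is where slice i ends.
    cumulative = list(itertools.accumulate(fitnesses))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one fitness must be positive")
    ball = rng.random() * total  # where the ball lands, in [0, total)
    return population[bisect.bisect_right(cumulative, ball)]


rng = random.Random(1)
picks = [roulette_select(["weak", "strong"], [1.0, 3.0], rng) for _ in range(10_000)]
print(picks.count("weak"))  # expected to be around 2500: the weaker creature still breeds
```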
The flip side to this is that high rates of ‘reproduction’ are used. Bacteria can only double their numbers per generation. Many ‘higher’ organisms can only do a little better, but GAs commonly produce 100s or 1000s of ‘offspring’ per generation. For example, if a population of 1,000 bacteria had only one survivor (999 died), then it would take 10 generations to get back to 1,000.
It is common in GAs to use static population sizes in order to control generation run times, but there’s no rule against letting them go dynamic. The downside to this is that letting a population size grow every generation would take a LOT of computing power in later generations. And letting a population die out would mean starting the program over from scratch.
GAs typically assume a hard sequential generation transition: all parents die out instantly, and all children take over immediately. This means that to maintain a steady population, each creature is, on average, replaced by a single child creature. This assumption is not at all far off from any steady-state population.
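A minimal Python sketch of that hard generational transition, with the population size left as a knob. The function name, the `growth` parameter and the toy reproduction rule are all hypothetical; `growth=1.0` gives the usual static population, and anything else lets it go dynamic.

```python
import random


def generational_step(population, make_child, rng: random.Random, growth: float = 1.0):
    """Hard generational transition: every parent is discarded and the
    children alone form the next population.

    make_child(population, rng) is a caller-supplied function that selects
    parents and returns one new individual.
    """
    # Never shrink below one individual, since extinction would mean
    # restarting the run from scratch.
    next_size = max(1, round(len(population) * growth))
    return [make_child(population, rng) for _ in range(next_size)]


# Hypothetical reproduction rule: copy a random parent and add 1.
rng = random.Random(0)
parents = list(range(10))
children = generational_step(parents, lambda pop, r: r.choice(pop) + 1, rng)
bigger = generational_step(parents, lambda pop, r: r.choice(pop) + 1, rng, growth=1.5)
print(len(children), len(bigger))  # 10 15
```

The cost Batten ignores shows up immediately: with `growth` above 1.0, every later generation takes longer to evaluate than the one before it.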
Generation time is ignored. A generation can happen in a computer in microseconds whereas even the best bacteria take about 20 minutes. Multicellular organisms have far longer generation times.
My mind is blown.
This argument exposes Batten’s complete misunderstanding of simulations in general. Reread his objection. He is stating that GAs do not simulate evolution because they are too fast.
He is attacking the very definition of a simulation!
Replace the GA simulation with any weather simulation. “A storm takes many hours or days to form. Yet this so-called weather simulation calculates a prediction within seconds! Rubbish!”
The mutation rate is artificially high (by many orders of magnitude). This is sustainable because the ‘genome’ is small (see next point) and artificial rules are invoked to protect the best ‘organism’ from mutations, for example. Such mutation rates in real organisms would result in all the offspring being non-viable (error catastrophe). This is why living things have exquisitely designed editing machinery to minimize copying errors to the rate of one in about 10 billion (for humans).
The ‘genome’ is artificially small and only does one thing. The smallest real world genome is over 0.5 million base pairs (and it is an obligate parasite, which depends on its host for many of the substrates needed) with several hundred proteins coded. This is equivalent to over a million bits of information. Even if a GA generated 1800 bits of real information, as one of the commonly-touted ones claims, that is equivalent to maybe one small enzyme—and that was achieved with totally artificial mutation rates, generation times, selection coefficients, etc., etc. In fact, this is also how the body’s immune system develops specific antibodies, with these designed conditions totally different to any whole organism.
This 1800-bit, “commonly-touted” and conveniently unreferenced GA that Batten refers to is perplexing. A GA can produce any number of “bits” of information. Hell, the dead-simple GA that optimized my Yahtzee program produced 91 floating-point doubles, equating to 5824 “bits” of information. The argument here is that the genomes used by GAs are scaled down from those of modern organisms.
Once again, this scaling is deliberate, for the purpose of narrowing in on the relevant parameters in question. I don’t think anyone is arguing that a task scheduler program is as complex as a multicellular organism. But then again, nothing stops you from scaling up. You’d just need more computing time/power.
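The 5824 figure is just 91 genes times 64 bits per IEEE 754 double. A quick Python check, with placeholder zeros standing in for the actual evolved gene values:

```python
import struct

# Placeholder values standing in for the 91 evolved genes.
genes = [0.0] * 91
packed = struct.pack(f"<{len(genes)}d", *genes)  # 91 little-endian IEEE 754 doubles
print(len(packed) * 8)  # 5824
```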
In real organisms, mutations occur throughout the genome, not just in a gene or section that specifies a given trait. This means that all the deleterious changes to other traits have to be eliminated along with selecting for the rare desirable changes in the trait being selected for. This is ignored in GAs.
This statement is simply wrong. GAs permit mutations to occur throughout the genome. Simple as that.
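Here’s what that typically looks like: a minimal Gaussian mutation operator in Python (names and the default `sigma` are hypothetical) where every locus in the genome is eligible, not just some designated “trait” section.

```python
import random


def mutate(genome: list[float], rate: float, rng: random.Random,
           sigma: float = 0.1) -> list[float]:
    """Gaussian mutation over a real-valued genome.

    Each gene independently has probability `rate` of being nudged by
    N(0, sigma^2) noise, so changes can land anywhere in the genome.
    """
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in genome]


rng = random.Random(3)
child = mutate([0.0] * 1000, rate=0.01, rng=rng)
print(sum(x != 0.0 for x in child))  # about 10 genes hit, scattered across the genome
```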
With genetic algorithms, the program itself is protected from mutations; only target sequences are mutated. Indeed, if it were not quarantined from mutations, the program would very quickly crash. However, the reproduction machinery of an organism is not protected from mutations.
This man has never used a computer in his life.
His call to allow mutations in the program itself is hilarious. The GA program acts as the governing physical laws, not as the “reproduction machinery” of the organism. When real-life genetic mutation occurs, the universe doesn’t explode, right?
There is no problem of irreducible complexity with GAs. Many biological traits require many different components to be present, functioning together, for the trait to exist at all (e.g. protein synthesis, DNA replication, reproduction of a cell, blood clotting, every metabolic pathway, etc.).
The complexity of the system is entirely left open to the designer. My ZombieSim, for example, has competing organisms co-evolving in the same environment, each optimized within its own population. A fairly complex system of interactions when you think about it.
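To give a flavor of co-evolution, here’s a heavily simplified, hypothetical Python sketch (this is not ZombieSim’s actual code). Each creature’s genome is a single “speed” gene; humans are scored by the fraction of zombies they outrun and zombies by the fraction of humans they catch. Because each side is scored against the other’s current population, neither has a fixed target. Truncation selection with Gaussian mutation is used purely for brevity.

```python
import random


def coevolve_step(humans: list[float], zombies: list[float],
                  rng: random.Random, sigma: float = 0.05):
    """One generation of two co-evolving populations of single 'speed' genes."""
    def human_fitness(h: float) -> float:
        return sum(h > z for z in zombies) / len(zombies)

    def zombie_fitness(z: float) -> float:
        return sum(z >= h for h in humans) / len(humans)

    def breed(pop: list[float], fitness) -> list[float]:
        # Truncation selection: the better half survives unchanged...
        keep = max(1, len(pop) // 2)
        survivors = sorted(pop, key=fitness, reverse=True)[:keep]
        # ...and mutated copies of random survivors refill the population.
        children = [rng.choice(survivors) + rng.gauss(0.0, sigma)
                    for _ in range(len(pop) - keep)]
        return survivors + children

    # Both sides are scored against the *old* generation of the other.
    return breed(humans, human_fitness), breed(zombies, zombie_fitness)


rng = random.Random(7)
humans = [rng.uniform(0.0, 1.0) for _ in range(20)]
zombies = [rng.uniform(0.0, 1.0) for _ in range(20)]
for _ in range(50):
    humans, zombies = coevolve_step(humans, zombies, rng)
```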
Polygeny (where a trait is determined by the combined action of more than one gene) and pleiotropy (where one gene can affect several different traits) are ignored. Furthermore, recessive genes are ignored (recessive genes cannot be selected for unless present as a pair; i.e. homozygous), which multiplies the number of generations needed to get a new trait established in a population.
Also a blatantly false statement. The actions/traits of the agents in ZombieSim are determined by the neural network’s hundreds of genes, which is evidence of both polygeny and pleiotropy. There’s no reason why recessive genes can’t exist in such a system, either. How a gene is expressed can easily be analyzed.
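A toy version shows why a neural-network genome gets polygeny and pleiotropy for free. In this hypothetical Python sketch (not ZombieSim’s actual layout), a flat genome is read as the weights of a tiny two-layer network: each output “trait” depends on many genes, and a single input-to-hidden gene feeds into every output.

```python
import math


def decode_and_run(genome: list[float], inputs: list[float], n_hidden: int) -> list[float]:
    """Read a flat genome as the weights of a tiny two-layer network.

    The first len(inputs) * n_hidden genes are input->hidden weights; the rest
    are hidden->output weights, n_hidden per output. Each output ("trait")
    depends on many genes (polygeny), and each input->hidden gene can affect
    every output (pleiotropy).
    """
    n_in = len(inputs)
    w1 = genome[: n_in * n_hidden]
    w2 = genome[n_in * n_hidden:]
    n_out = len(w2) // n_hidden
    hidden = [math.tanh(sum(inputs[i] * w1[h * n_in + i] for i in range(n_in)))
              for h in range(n_hidden)]
    return [sum(hidden[h] * w2[o * n_hidden + h] for h in range(n_hidden))
            for o in range(n_out)]


genome = [0.5, -0.3, 0.8, 0.1,             # input->hidden genes
          0.2, -0.4, 0.7, 0.6, -0.5, 0.3]  # hidden->output genes
before = decode_and_run(genome, [1.0, 0.5], n_hidden=2)
mutant = genome.copy()
mutant[0] += 0.25  # mutate a single gene...
after = decode_and_run(mutant, [1.0, 0.5], n_hidden=2)
print([round(a - b, 3) for a, b in zip(after, before)])  # ...and all three traits shift
```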
Multiple coding genes are ignored. From the human genome project, it appears that, on average, each gene codes for at least three different proteins (see Genome Mania — Deciphering the human genome. In microbes, genes have been discovered that code for one protein when ‘read’ in one direction and a different protein when read backwards, or when the ‘reading’ starts one letter on. Creating a GA to generate such information-dense coding would seem to be out of the question. Such demands an intelligence vastly superior to human beings for its creation.
Despite what Batten thinks, humans are actually quite intelligent.
The outcome with a GA is ‘pre-ordained’. Evolution is by definition purposeless, so no computer program that has a pre-determined goal can simulate it—period.
The common use of a GA is to find a good solution before lunch. But a true GA is not goal-oriented and has no end-condition. ZombieSim will run forever (or it should if I ever fix those memory leaks).
Perhaps if the programmer could come up with a program that allowed any random change to happen and then measured the survivability of the ‘organisms’, it might be getting closer to what evolution is supposed to do! Of course that is impossible (as is evolution).
Batten’s final statement here echoes his inability to grasp GAs. The whole damn point is that they do allow random change to happen and measure the survivability of the organism! *bangs head on desk*
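Here’s a minimal, hypothetical Python sketch of exactly that: random change, then measured survival, with no target genome and no stop condition. The generator keeps yielding generations until the caller stops watching or the population goes extinct; nothing guarantees that something survives. The `cold_world` environment is made up for illustration.

```python
import itertools
import random


def open_ended_evolution(population: list[list[float]], survival_chance,
                         rng: random.Random, sigma: float = 0.1):
    """Yield one generation after another, with no target and no end condition.

    Each creature survives with probability survival_chance(genome), a
    caller-supplied model of the environment; survivors then reproduce with
    random mutation to refill the population.
    """
    size = len(population)
    while True:
        survivors = [g for g in population if rng.random() < survival_chance(g)]
        if not survivors:
            return  # extinction: the lineage simply ends
        population = [[gene + rng.gauss(0.0, sigma) for gene in rng.choice(survivors)]
                      for _ in range(size)]
        yield population


# Hypothetical environment: a cold world where gene 0 ("fur thickness") helps.
def cold_world(genome: list[float]) -> float:
    return min(1.0, max(0.0, 0.5 + 0.1 * genome[0]))


rng = random.Random(0)
start = [[0.0, 0.0] for _ in range(50)]
# The caller, not the algorithm, decides how long to watch.
history = list(itertools.islice(open_ended_evolution(start, cold_world, rng), 100))
```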
With a particular GA, we need to ask how much of the ‘information’ generated by the program is actually specified in the program, rather than being generated de novo. A number of modules or subroutines are normally specified in the program, and the ways these can interact is also specified. The GA program finds the best combinations of modules and the best ways of interacting them. The amount of new information generated is usually quite trivial, even with all the artificial constraints designed to make the GA work.
This final argument may initially appear to have some merit. The claim that no new information can be produced from a GA, that output modules are initially specified and simply changed and adjusted after the fact, is a valid concern. But this concern falls squarely on the system itself, not the GA.
How the genetic information is produced and mutated is the business of the GA. How the genetic information is used to interact with the environment is the business of the system. If you want a situation where new proteins can arise from nothing, you can have exactly that, provided the system allows it. This may be Batten’s point, but I think the concept in itself is a little self-referential.
ZombieSim, a genetically optimized neural network, is once again a good example. The neural network is set up to interpret the genetic code as movement “traits”. Including color traits or mass traits is simply a matter of adjusting the system to allow such a trait to exist.
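In code, that division of labor might look something like this minimal, hypothetical Python sketch (not ZombieSim’s real decoder). The GA only ever sees a flat list of numbers; a system-defined schema decides which traits exist, so adding color is a one-line change to the schema rather than to the GA.

```python
def express(genome: list[float], trait_schema: list[tuple[str, int]]) -> dict[str, list[float]]:
    """Map a flat genome onto whatever traits the *system* defines.

    trait_schema lists (trait_name, gene_count) pairs.
    """
    traits, pos = {}, 0
    for name, count in trait_schema:
        traits[name] = genome[pos:pos + count]
        pos += count
    if pos != len(genome):
        raise ValueError(f"schema expects {pos} genes, genome has {len(genome)}")
    return traits


movement = [("heading", 1), ("speed", 1)]
print(express([0.1, 0.2], movement))
# {'heading': [0.1], 'speed': [0.2]}
with_color = movement + [("color_rgb", 3)]
print(express([0.1, 0.2, 1.0, 0.0, 0.0], with_color)["color_rgb"])
# [1.0, 0.0, 0.0]
```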
In real life, no organism is going to develop a trait that cannot exist within the physical constraints of our universe. A frog won’t develop a “25th-dimension quantum-time temperature indicating eyeball” just as my zombies won’t suddenly evolve to play fantasy football. These traits aren’t within the scope of the system. New proteins can arise seemingly out of nothing, but they are not out of the scope of the system.
In review, I think Batten has completely missed the intent of GAs.
He could just as easily have made the argument: “Some animals have hair. In every genetic algorithm, there is no hair trait.”
The underlying concepts of GAs (selection, mutation and crossover) are based on our theory of evolution. These concepts have been applied in countless applications and are providing priceless cutting-edge breakthroughs in science and engineering. Just as humans once looked toward bird flight and sought to understand the concepts of aerodynamics, we’ve been able to turn our collective hours of observation, research and analysis into an actual working product.
But then again, it’s just a theory, right?
I feel like whatever was said today isn’t going to change anyone’s mind about evolution. But to satisfy Batten and all those GA-haters in the world, I’ve provided the world’s first ever “Creationism Algorithm” for all of them to use.
Let the discussion begin!