


Finding More Fit Chromosomes Using the Rapid Fire GA

JOSH JONES    BART RYLANDER

School of Engineering

University of Portland

Portland, OR 97203

U.S.A.

Abstract: The genetic algorithm (GA) has become a popular tool for estimating solutions to intractable problems, but it can take a long time to converge, and often upon non-optimal solutions. We present simple modifications that improve the genetic algorithm's performance. We use these common optimizations to create a hybrid algorithm we call the “Rapid Fire GA” (RFGA), which produces solutions much faster than the traditional GA and always finds the optimal solution if given enough time. We then compare the effectiveness of each algorithm when computing solutions to three instances of two dissimilar problems. Finally, we provide ideas for new research. This research is important because it presents a potentially better form of the genetic algorithm.

Key Words: Genetic Algorithm, Heap, NP Problem Optimization, Problem Complexity

1 Introduction

Many problems in industry for which we would like fast solutions fall into the computational complexity class NPC (non-deterministic polynomial time complete). These are problems for which no efficient (sub-exponential time) algorithm is known. This is troublesome because it means that regardless of the speed of the computer employed, solutions will not be found in a reasonable amount of time. Many algorithms have been developed to find sub-optimal solutions to these problems. One of these algorithms is the GA. It randomly produces a subset of possible solutions (the population), calculates each solution’s fitness (how good it is), and then produces new solutions (chromosomes) based on past chromosomes. This continues for a set number of generations, or until all chromosomes have converged upon a single solution. This method generates nearly-optimal solutions to otherwise intractable problems, but better solutions cost increasingly more computation time. Therefore, much research is still being conducted into better ways of developing solutions.

In this paper, we compare the results of our hybrid algorithm to that of the traditional GA.

Our traditional GA consists of a population held in a 2-dimensional array, with a separate array containing each chromosome’s fitness. The GA uses single-point crossover, 50% elitism, and single-allele mutation at a variable rate. For reasons explained later, our GA does not attempt mutation on the most fit solutions.
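As a concrete illustration, the two operators named above might look like the following minimal sketch. This is our own reading, not the authors' code, and the names `crossover` and `mutate` are ours:

```cpp
#include <cstddef>
#include <vector>

// Single-point crossover: the child takes parent a's genes up to the cut
// point and parent b's genes after it.
std::vector<int> crossover(const std::vector<int>& a,
                           const std::vector<int>& b,
                           std::size_t cut) {
    std::vector<int> child(a.begin(), a.begin() + cut);
    child.insert(child.end(), b.begin() + cut, b.end());
    return child;
}

// Single-allele mutation: flip exactly one chosen bit.
void mutate(std::vector<int>& chrom, std::size_t allele) {
    chrom[allele] ^= 1;
}
```

In a real run the cut point and mutated allele would be chosen at random; they are parameters here so the behavior is deterministic.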

Each of the algorithms we produced was run on three instances of two dissimilar problems. The first problem is the trivial “Max Ones” problem. It simply asks, “how many 1’s can there be in a binary string of length n?” It is used to test the efficiency of each GA, because it is plain to see that the correct answer is n. We test the algorithms on strings of length 128, 256, and 512.
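The Max Ones fitness function is simply a bit count; a minimal sketch:

```cpp
#include <vector>

// Max Ones fitness: the number of 1 bits in the chromosome. The optimum
// for a string of length n is therefore n.
int max_ones_fitness(const std::vector<int>& chrom) {
    int ones = 0;
    for (int bit : chrom) ones += bit;
    return ones;
}
```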

The second problem is 3-processor scheduling. This is an NPC problem that attempts to allocate a list of tasks to three independent processors in the order that minimizes total completion time. Our algorithms are again tested with task lists of length 128, 256, and 512.
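One plausible fitness for this problem (our assumption; the paper does not spell out its encoding) assigns each task to a processor and scores a schedule by its makespan, the finishing time of the busiest processor, using per-processor task times such as the four tasks listed later in the paper:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical encoding (ours): gene i holds the processor (0..2) that
// task i is assigned to; cost[i][p] is task i's running time on
// processor p. The value to minimize is the makespan.
long makespan(const std::vector<int>& assign,
              const std::vector<std::array<int, 3>>& cost) {
    std::array<long, 3> load{0, 0, 0};
    for (std::size_t i = 0; i < assign.size(); ++i)
        load[assign[i]] += cost[i][assign[i]];
    return *std::max_element(load.begin(), load.end());
}
```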

2 Common GA Inefficiencies

We now present common inefficiencies of the traditional GA, and produce an outline for improvements. The first problem is the population’s storage in a two-dimensional array. After each chromosome’s fitness is calculated, the population must be sorted according to fitness. Sorting then involves copying entire chromosomes through the population array in O(n log n) time. This becomes very burdensome for chromosomes of large length.

A simple solution to this is to represent each chromosome as a struct, and the population as an array of these structs. Each struct contains a fitness variable and a pointer to an array that contains the chromosome’s data. When chromosomes are sorted in this situation, only the pointer itself needs to be copied.
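A minimal sketch of this layout (names ours), where sorting moves only the small fitness-plus-pointer handles and never the gene data itself:

```cpp
#include <algorithm>
#include <vector>

// Each population entry is a small handle: the cached fitness plus a
// pointer to the (possibly very long) gene array. Sorting copies these
// handles, not the chromosome data they point to.
struct Chromosome {
    int fitness;
    std::vector<int>* genes;
};

// Sort fittest-first; the gene arrays never move in memory.
void sort_by_fitness(std::vector<Chromosome>& pop) {
    std::sort(pop.begin(), pop.end(),
              [](const Chromosome& a, const Chromosome& b) {
                  return a.fitness > b.fitness;
              });
}
```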

A simpler problem is the order in which the function that calculates a chromosome’s fitness is called. In the traditional GA, crossover and mutation are performed on the population, and then the fitness function is called on every chromosome in the population. For some problems, the fitness function can be very burdensome, and it does not always need to be calculated in full.

A solution is to call the fitness function from within the crossover and mutation functions. From these functions, the fitness function needs to be called only on the chromosomes that are modified. Furthermore, the crossover function can be written so that it generates a chromosome’s fitness as the chromosome is being produced. This saves the overhead of iterating through the length of the chromosome a second time, as well as the overhead of a function call.
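For the Max Ones problem, for instance, such a fused crossover might look like this sketch (ours):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Single-point crossover that accumulates the child's Max Ones fitness
// in the same loop that copies its genes, so no second pass (and no
// separate fitness call) is needed.
std::pair<std::vector<int>, int>
crossover_with_fitness(const std::vector<int>& a,
                       const std::vector<int>& b,
                       std::size_t cut) {
    std::vector<int> child(a.size());
    int fitness = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        child[i] = (i < cut) ? a[i] : b[i];
        fitness += child[i];
    }
    return {child, fitness};
}
```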

The final modification is to store the chromosomes in the population array using a heap layout instead of keeping the array linearly sorted. A heap is a complete binary tree with one additional requirement: each parent has a larger value (fitness) than both of its children. The array can be treated as a heap if each element [i] is said to have children [2i+1] and [2i+2]. The second half of the array then makes up the heap’s leaves, among which the least fit chromosome always lies.
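Restoring the heap property after a single chromosome changes can be sketched as a standard sift-down over the fitness values (our illustration; a real implementation would sift the struct handles rather than bare fitnesses):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Max-heap on fitness: element i has children 2i+1 and 2i+2, and each
// parent must be >= both children. sift_down repairs the property at
// index i in O(log n) after a single value changes.
void sift_down(std::vector<int>& fit, std::size_t i) {
    const std::size_t n = fit.size();
    for (;;) {
        std::size_t largest = i;
        std::size_t l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && fit[l] > fit[largest]) largest = l;
        if (r < n && fit[r] > fit[largest]) largest = r;
        if (largest == i) break;       // heap property holds here
        std::swap(fit[i], fit[largest]);
        i = largest;                   // continue repairing below
    }
}
```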

The benefit of using a heap appears with large population sizes. Re-sorting an array costs O(n log n), even when only minimal changes have been made, whereas maintaining the heap property costs only O(log n) per change. This is a conservative upper bound in the GA, because heap operations need to be called only on the individual chromosomes that change, not on the entire array.

Although the heap provides benefits with large population sizes, implementing it has drawbacks. Knowing only that one of the leaves possesses the worst fitness, we must scan the entire second half of the population to check for 100% convergence, where before we could simply compare the fitness of the first and last elements. Furthermore, it is possible for a leaf in a heap to have a higher value than a non-leaf, as long as the non-leaf is not its parent. Using a more conservative elitism (less than 50%) may avoid this problem. Finding the population size at which the heap becomes more efficient than a sorted array, and the leaf-uncle inversion’s effect on the converged solution, are suggested topics of research for the interested reader.

3 Enhancement Results

We created an enhanced GA based upon the preceding ideas, and compared it to the traditional GA described in the introduction. We ran two parameter settings for each problem instance. First, the GA ran with a small population and high mutation rate. Second, it ran with a large population and small mutation rate. The smaller population with larger mutation rate universally ran better on both GAs, but we continued testing with larger population sizes for comparison with the RFGA later in this paper. The population sizes and mutation rates were chosen based on which seemed to converge within 5-15 seconds, or reached the optimal solution most quickly. We assumed equal running times on the same algorithm produced similar outcomes. The following table shows the results.

|Length |# Chromosomes |Mut. Rate |Generations / Conv. Time |
|128 |20 |25 |536 (0 s) |
|128 |200 |3 |501 (5 s) |
|256 |30 |20 |542 (3 s) |
|256 |200 |2 |312 (3 s) |
|512 |50 |12 |1351 (10.5 s) |
|512* |200 |1 |200 (18 s) |

Table 1: Data from running the traditional GA on the max 1’s problem.

*did not reach the optimal solution

|Length |# Chromosomes |Mut. Rate |Generations / Conv. Time |
|128 |30 |28 |10,860 (1.3 s) |
|128 |200 |4 |2599 (3 s) |
|256* |30 |28 |1941 (0 s) |
|256 |200 |4 |3681 (6 s) |
|512 |50 |15 |6540 (6 s) |
|512 |200 |4 |2753 (9 s) |

Table 2: Data from running the enhanced GA on the max 1’s problem.

*-An unexpected decrease in calculation time

The data shows that for this trivial problem, the enhanced GA greatly reduces computation time as the number of generations to a solution increases. However, for smaller population and chromosome sizes, the enhanced GA needed more generations (and thus more time) to converge upon the optimal solution. We hypothesize this is due to the leaf-uncle inversion effect in the heap causing the GA to converge early on simple tests; this is recommended for further research.

To make the task list for our 3-processor scheduling problem reproducible, we repeated the same four tasks until a list of length n was reached. The four tasks, with their running times on each processor, are as follows:

|Task |Processor 1 |Processor 2 |Processor 3 |
|1 |16 |4 |12 |
|2 |10 |7 |15 |
|3 |7 |11 |15 |
|4 |4 |3 |4 |

|# of Tasks |Chromosomes / Mut. Rate |Generations / Conv. Time |Avg. Result / Std. Dev. |
|128 |8/68 |2199 / 6.3 s |272.8 / 4.7 |
|128 |200/3 |1859 / 19 s |306 / 7.5 |
|256 |8/65 |11,068 / 6.86 s |580 / 85 |
|256 |200/2 |261.4 / 8.00 s |651.9 / 7.3 |
|512 |8/66 |14,877 / 16.1 s |1111.4 / 20.9 |
|512 |200/1 |143.4 / 11.6 s |1362.2 / 20.3 |

Table 3: The traditional GA when solving 3-processor scheduling.

|# of Tasks |Chromosomes / Mut. Rate |Generations / Conv. Time |Avg. Result / Std. Dev. |
|128 |8/77 |68,754 / 8.4 s |269 / 0.95 |
|128 |200/4 |2355 / 4.36 s |304.7 / 6.4 |
|256 |8/70 |23,759 / 5.7 s |554.1 / 17.0 |
|256 |200/3 |649.4 / 3.0 s |649.9 / 18.8 |
|512 |8/73 |12,863 / 6.2 s |1148 / 66.3 |
|512 |200/3 |1708 / 11.0 s |1357 / 15.5 |

Table 4: The enhanced GA when solving 3-processor scheduling.

The data clearly shows that the enhanced GA calculates each generation much faster than the traditional GA, though more research with higher elitism is needed to make its crossovers as efficient as the traditional GA’s.

4 The RFGA

Analysis of the GA, traditional or enhanced, shows a flaw still remains in the algorithm’s structure. First, the GA must produce the solution it will eventually output as its result. Then, that solution must pass through the rest of the population before the GA can converge and terminate. Running larger populations with higher mutation rates may provide better final solutions, but may cause the time to convergence to approach infinity. The GA wastes a great deal of time waiting for convergence. Some GAs put a limit on the number of generations to process while waiting for convergence, but this risks exiting the algorithm while new solutions are still being produced.

The RFGA is a modification of the GA that bypasses these wasted calculations and focuses on producing the best answer in the shortest time. Instead of terminating when the population converges, the RFGA tracks only how frequently the population’s best solution is improved. Every time a new best chromosome is produced, the RFGA stores a system time stamp. After each generation, the program checks how much time has passed since the last improvement. When this time reaches a certain length, the program terminates.

For example, we can tell the program to continue trying to find new solutions until 10 seconds have passed without an improvement. We can then give it a large population size and a high mutation rate. Because the RFGA does not concern itself with convergence, these high values rapidly produce new chromosomes with the potential to be more fit than any current chromosome. Furthermore, we do not need to guess a number of generations at which to terminate that will not exit the GA prematurely. The RFGA will continue until 10 seconds go by without a new best chromosome, at which point the program will exit.
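The stopping rule described above can be sketched as follows (our illustration; `step_generation` stands in for one generation of crossover and mutation and returns the population's current best fitness):

```cpp
#include <chrono>

// RFGA stopping rule: keep running generations, time-stamping each
// improvement of the best fitness, and terminate once `patience_seconds`
// elapse with no new best chromosome.
template <typename StepFn>
int run_rfga(StepFn step_generation, double patience_seconds) {
    using clock = std::chrono::steady_clock;
    int best = step_generation();
    auto last_improvement = clock::now();
    while (std::chrono::duration<double>(clock::now() - last_improvement)
               .count() < patience_seconds) {
        int candidate = step_generation();
        if (candidate > best) {
            best = candidate;
            last_improvement = clock::now();  // new best: reset the timer
        }
    }
    return best;
}
```

With a patience of 10 seconds and a generous population and mutation rate, this loop realizes the behavior described above without any convergence test.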

It is now plain to see why the earlier GAs did not attempt to mutate the most fit chromosome. The RFGA needs it to keep track of how often a new best chromosome is produced, and must still have it available to output as the solution when the program terminates.

The RFGA builds off of the enhancements made to the traditional GA. Its purpose is to process as many generations as possible, as quickly as possible, for as long as new solutions are being produced. It is therefore necessary to maximize the speed at which it does this, without regard for how efficiently it would otherwise converge.

Convergence is actually a hindrance for the RFGA. Once its chromosomes begin to match a local optimum, it is difficult to break out and continue producing new solutions. This is easily counteracted by increasing the mutation rate for large populations. However, in our tests, high mutation rates tended to initially slow down crossovers. Further research is suggested on the effectiveness of the RFGA with dynamic mutation rates, which increase as the algorithm runs based on the convergence rate.

We tested the RFGA against the same problems as the two previous GAs. The results took two forms. First, we recorded the result produced by the RFGA in the time the enhanced GA required; that is, if the enhanced GA gave a solution in 16 seconds, what had the RFGA produced at 16 seconds? Second, we tested how long it took the RFGA to find the actual optimal* solution to the problem.

|String Length |Chromosomes |Mutation |Time to Optimal Solution (seconds) |
|128 |50 |5 |0* |
|128 |200 |5 |0 |
|256 |50 |15 |0 |
|256 |200 |5 |0 |
|512 |30 |70 |1 |
|512 |200 |20 |1 |

Table 5: The time for the RFGA to find the optimal solution to the Max 1’s problem.

*-0 seconds means the solution was produced virtually instantaneously

|Tasks |Chromosomes / Mutation |Time to Match / Solution Found |Optimal Solution* / Time to Find |
|128 |8/75 |1 second / 271 |264 / 25 seconds |
|128 |200/70 |1 second / 266 |264 / 2 seconds |
|256 |8/80 |1 second / 545 |528 / 40 seconds |
|256 |200/70 |1 second / 532 |528 / 5 seconds |
|512 |8/90 |15 seconds / 1094 |n/a |
|512 |300/65 |25 seconds / 1053 |1051 / 40 seconds |

Table 6: The time for the RFGA to find solutions to 3-processor scheduling.

*-Optimal solution is produced by running the RFGA for extended amounts of time, and therefore is only an estimate and not proven.

From these results it is clear that the RFGA runs much more quickly, and produces much better results, than the typical GA. The data was produced on a 1.0 GHz AMD Athlon running Windows 98.

5 Conclusion

We have presented the GA and some of its inefficiencies. We have listed simple fixes to these inefficiencies in the implementation of an enhanced GA. Each was then tested on three instances of two dissimilar problems, and examined. Finally, we presented the RFGA as a hybrid alternative to the GA, which followed logically from the enhancements used in the second GA. The RFGA was then used to solve the same problems, and showed tremendous potential to produce better solutions than the GA. Throughout the paper we mention ideas for further research: the leaf-uncle inversion problem in the heap structure and its effect on crossover efficiency, using a stricter elitism than 50% to counteract the same problem, the difference in efficiency between a sorted array and a heap structure in a GA, and finally, the use of dynamic mutation in the RFGA.

Other directions to take would be to test the RFGA against other NPC problems, such as the traveling salesman, or max clique. Another is to compare the RFGA to other evolutionary methods, such as Particle Swarm Optimization and Ant Colony Optimization.

