William John Holden



The combination formula has immediate applications to a wide range of disciplines, including statistics, project management, computer science, and finance. The wide reach of the combination formula gives it an importance that demands greater study. In this paper we will study the combination formula in some depth to encourage an intuitive understanding of its behavior and computation.A combination is an unordered subset of some set S. There are several common notations for combinations, including:n C r=Cn,r=nrnr can be read “n combinations taken r at a time,” “n choose r,” or “n take r.” There is a simple formula to compute nr, known as the combination formula or binomial coefficient, that we will introduce at the end of this chapter.Consider a set S={a,b,c,d,e,f}. The cardinality (size) of this set is S=6, so n=6 for use in the combination formula. If we wish to divide S into subsets into unit size then the set containing all possible subsets is:a,b,c,d,e,fWe can easily see that if we want subsets of size 1 then there are 61=6 possible ways to do this. We can generalize this ton1=nfor all positive integers n. In the special case of n=0 we say 01=0 because it is never possible to take more elements than are in a set. That is, 0≤r≤n.The case 00 is worth investigating. If S=0 then S={}, an empty set. Given an input set with no elements, then there is only one way to combine the empty elements of this set into another set – also an empty set. So 00=1. In fact, the same logic applies for all n≥0. It also works for the case r=n; there is only one combination of a set that contains of its elements. Thereforen0=nn=1A string of bits can be used as a convenient method to represent the presence or absence of an element in a set. For S={a,b,c,d,e,f} we need only six bits. We can easily iterate through all possible bit combinations to see that there are 26=64 possible combinations.000000001000010000011000100000101000110000111000000001001001010001011001100001101001110001111001000010001010010010011010100010101010110010111010000011001011010011011011100011101011110011111011000100001100010100011100100100101100110100111100000101001101010101011101100101101101110101111101000110001110010110011110100110101110110110111110000111001111010111011111100111101111110111111111If n=6 and r=2 then there are 15 possible combinations of bit strings with a Hamming weight of two:000011, 000101, 000110, 001001, 001010, 001100, 010001, 010010, 010100, 011000, 100001, 100010, 1001000, 101000, 11000000Notice that inverting all of these bits give the opposite combination.111100, 111010, 111001, 110110, 110101, 110011, 101110, 101101,101011, 100111, 011110, 011101, 0110111, 010111, 00111111With two bits set we represented all combinations with two elements chosen. By inverting each bit, we get all combinations with two elements not chosen. This generalizes to the identitynr=nn-rThe sum of all possible combinations 0≤r≤n for a set of size n is 2n.r=0nnr=i=1n2=2nThe case n2 has an intriguing relationship to the triangular numbers. If we start with S and select a then we must choose one of the remaining five elements. If we do not choose a and choose b then we must select one of the remaining four elements. If we choose neither a nor b but choose c then we must select one of the remaining three elements. This process continues until we choose e and have only f available to select. If we select no element before f then it is not possible to take two elements from a subset of size one (recall r≤n). In general,n2=n-1+n-2+…2+1=k=1nn-k2=k=1n-1k=nn-12The final case we will consider before moving on to a general formula is for r=3. For S={a,b,c,d,e,f} we choose a and now must take two of the remaining five elements b,c,d,e,f?S, which is given by 52. Next we eliminate a and choose b and take two of the remaining four elements {c,d,e,f}, which is given by 42. We select d and are forced to take each element in {e,f}, at which point no further combinations are possible. (Recall that in cases r>n we define nr=0). We therefore realize n3 is actually the sum of triangular numbers.n3=n-12+n-22+n-32+…+22+12=k=1n-1k2We will construct a closed-form solution for n3 but first let us develop an intuition for its structure. Visualize each triangular sum n2 as a triangle of bricks with unit height. A stack of these bricks will form half of a pyramid. We can easily find the volume of a pyramid using calculus. Assuming the height h of a pyramid relates to its width w linearly by some constant k=wh then the area A□h=w2=kh2=k2h2. Then the volume of the pyramid with a square base is given byV□h=0hA□x?x=0hk2x2?x=k20hx2?x=k2x330h=k2h33=wh2h33=w2h3If the pyramid has a triangular base instead of a square then A△h=A□(h)2 and the volume becomesV△h=0hA△(x)?x=0hA□(x)2?x=120hA□x?x=12V□h=w2h6We expect to see a constant factor of 16 in any closed-form solution to n3. For finding a closed form sum of triangular numbers we need the identityk=1xk2=2x+1x+1x6We prove this identity by induction. Given a basis ofk=11k2=21+11+116=3216=1Assume the induction hypothesis is true for some x=j and the sum is 2j+1j+1j6. Then at x=j+1 the sum must be larger by j+12.k=1j+1k2=2j+1j+1j6+j+12=2j2+2j+j+1j6+j2+2j+1=2j3+2j2+j2+j+6j2+12j+66=2j3+9j2+13j+66We see that this sum is equal to the following which proves the induction hypothesis is true for all j∈Z+.k=1j+1k2=2j+1+1j+1+1j+16=2j+2+1j+2j+16=2j+3j2+j+2j+26=2j+3j2+3j+26=2j3+6j2+4j+3j2+9j+66=2j3+9j2+13j+66 ?Armed with the identity for summing a series of squares we can now findn3=k=1n-1k2=k=1n-112k2-k=12k=1n-1k2-k=12k=1n-1k2-k=1n-1k=12(2n-1+1)(n-1+1)(n-1)6-(n-1)(n-1+1)2=12(2n-1)(n)(n-1)6-(n-1)(n)2=nn-122n-16-12=nn-122n-16-36=nn-122n-46=nn-1n-26We now have identities for combinations with 0≤r≤3. We could continue creating additional identities but we now have enough information to recognize the behavior of the combination formula. Again, for the purposes of building intuition, consider the hypervolume of a series of n-1 half pyramids arranged in ascending size. Though impossible to visualize, we can calculate the hypervolume using the repeated integralV△*h=0hA△(x)?2x=0hA□(x)2?2x=120hk2x33?x=12(k2x4)120h=k2h424=w2h224Notice the denominator of the hypervolume is 24 in four dimensions. It was 6 in three dimensions, 2 in two dimensions, and unit in a single dimension. Further integration into increasingly high-dimensional space would increase the denominator in the hypervolume equation according to the power rule. We can also see a pattern in the numerator. At r=1 the numerator was n, for r=2 the numerator was n(n-1), and at r=3 the numerator was n(n-1)(n-2).We can also understand this behavior intuitively. At r=2 we found nn-12 combinations. n-1 because we don’t combine any element with itself (otherwise it would have been n2) and 12 to deduplicate combinations with different orderings. In the combination formula we consider sets like {a,b,c}, {a,c,b}, {b,a,c}, {b,c,a}, {c,a,b}, and {c,b,a} to be the same, hence we would divide by 16 in the case of r=3. If we wanted to count all of these orderings then we simply would not divide by this factor. A counting of subsets where ordering matters is known as the permutation. In general, we can recursively define any combination by the sum of n-1 combinations taken r-1 at a time.nr=k=1n-1kr-1A recursive definition to calculate arbitrary combinations is equal to the canonical definition using factorials, known as the binomial coefficient identity.nr=n!r!n-r!Another recursive definition for finding combinations views the problem as a decision tree. Consider again S={a,b,c,d,e,f} and the problem 63. We construct a binary tree rooted at the original problem 63, where we branch based on whether we choose or do not choose a. If we choose a then we are left with 5 elements from which we must take 2. If we do not choose a then we must choose 3 elements from the remaining 5. Thus, 63=52+53.At the next level of the decision tree we choose or do not choose b. On one branch we had to choose two elements we now recurse to 52=41+42. On the other branch, where we had to choose three elements, we find 53=42+43. Recursion continues to base cases, such as n0=1 and 0r>0=0. The combination formula constructed as a decision tree generalizes to an identity known as Pascal’s Formula.nr=n-1r-1+n-1rNow let us consider the computational complexity on a computer.First, we compute nr=n!r!n-r! directly.def binomial(n, r, f): return f(n) // (f(r) * f(n - r))The parameter f, in this context, refers to a factorial function that is specified by the caller. Using Python’s built in math.factorial function we easily construct Pascal’s Triangle.import mathfor n in range(10): print(*[binomial(n, r, math.factorial) for r in range(n+1)])When executed, the program outputs Pascal’s Triangle:11 11 2 11 3 3 11 4 6 4 11 5 10 10 5 11 6 15 20 15 6 11 7 21 35 35 21 7 11 8 28 56 70 56 28 8 11 9 36 84 126 126 84 36 9 1It is simple to implement our own na?ve factorial function:def _factorial(n): return 1 if n == 0 else n * _factorial(n - 1)We can compare the performance of the built in math.factorial function to our own implementation.def benchmark(limit, f): start = timer() for n in range(limit): [binomial(n, r, f) for r in range(n+1)] end = timer() return end – startWhen executed we find the built in math.factorial function is considerably faster than the na?ve _factorial function we have built. This is actually quite common in programming. Developers of popular frameworks often invest in finding optimal approaches to computation that might not seem obvious. For example, it is faster in practice to compute factorials by divide and conquer rather than a straightforward multiplication.>>> benchmark(100, _factorial)0.19996605199412443>>> benchmark(100, math.factorial)0.02862050899420865Interestingly, we can beat Python’s built in math.factorial easily with dynamic programming by memozing factorial calculations for later. Python has a very convenient syntax for caching results of pure functions.from functools import lru_cache@lru_cache(maxsize=1024)def _factorial_dp(n): return 1 if n == 0 else n * _factorial_dp(n - 1)This single directive instructs Python to check if an identical invocation to _factorial_dp has been cached and, if so, return this result immediately.>>> _factorial_dp.cache_clear()>>> _factorial_dp.cache_info()CacheInfo(hits=0, misses=0, maxsize=1024, currsize=0)>>> benchmark(100, _factorial_dp)0.009150605008471757>>> _factorial_dp.cache_info()CacheInfo(hits=15149, misses=100, maxsize=1024, currsize=100)Here we see that the use of dynamic programming saved 15149 invocations of the _factorial_dp function when calculating the first 100 rows of Pascal’s Triangle. The benefits of the dynamic programming approach become increasingly pronounced as the size of the input grows.>>> _factorial_dp.cache_clear()>>> benchmark(500, _factorial_dp)1.2330169359920546>>> benchmark(500, math.factorial)5.665788909012917>>> _factorial_dp.cache_clear()>>> benchmark(600, _factorial_dp)2.3750273040204775>>> benchmark(600, math.factorial)11.008371475007152>>> _factorial_dp.cache_clear()>>> benchmark(700, _factorial_dp)4.588109176023863>>> benchmark(700, math.factorial)20.517555702012032The dynamic programming approach can also be beneficial if we implement Pascal’s Formula. Again, we define a benchmarking procedure to measure the time necessary to complete the calculation. The benchmark accepts a “combiner” that will be used to calculate the first n rows of Pascal’s Triangle.def benchmark_combination(limit, combiner): start = timer() for n in range(limit): [combiner(n, r) for r in range(n+1)] end = timer() return end – startdef combination(n, r): if r == 1: return n elif r > n: return 0 elif r == 0 or r == n: return 1 else: return combination(n-1, r-1) + combination(n-1, r)To our horror we heuristically assess that the doubly-recursive combination function has exponential computational complexity.>>> benchmark_combination(10, combination)0.00042666701483540237>>> benchmark_combination(15, combination)0.01228521199664101>>> benchmark_combination(20, combination)0.2757053799869027>>> benchmark_combination(25, combination)8.162856736016693Dynamic programming to the rescue!@lru_cache(maxsize=65536)def combination_dp(n, r): if r == 1: return n elif r > n: return 0 elif r == 0 or r == n: return 1 else: return combination_dp(n-1, r-1) + combination_dp(n-1, r)We change nothing in the code but enable caching of results. The maxsize parameter controls how large the dictionary can grow. A larger dictionary is not strictly better in practice.>>> benchmark_combination(1000, combination_dp)0.6813005529984366>>> combination_dp.cache_info()CacheInfo(hits=995006, misses=500500, maxsize=65536, currsize=65536)The memoized version of Pascal’s Formula outperforms even our memoized factorial function used in the binomial formula. Why? Pascal’s Formula consists only of addition, whereas the binomial formula requires a large factorial divided by another large factorial. Multiplication is generally quadratic, On2 in complexity (slightly faster with Karatsuba’s method) whereas addition is an O(n) linear time operation. With dynamic programming we trade calculation for memory and reduce an exponential time algorithm to linear time. ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download