Problem : To generate all Combinations of a set of distinct elements

Before we start discussing about the implementation we will go through the basic definitions of Combinations. Then we discuss the method to generate all the Combinations with examples and descriptions. Then the source code to implement the method will be presented. Continue reading.

Definitions

Combination:
  • A Combination of a set of distinct objects is an un-ordered arrangement of some or all of these objects.Let A = {1,2,3} , then {1}, {2}, {3}, {1,2}, {2,3} , {1,3}, {1,2,3} and {} are the all the combinations of set A
r-Combination:
  • An r-Combination of a set of distinct objects is an un-ordered arrangement of ‘r’ elements of these objects. Let there be n distinct elements in a set, then a combination of r elements selected from that set is an r-combination of the set.Let A = {1,2,3} then {1,2},{1,3},{2,3} are the 2-permutations of set A

A combination of a certain set is simply a subset of it. Like the sets {1,2} and {2,1} are the same combinations, because they are the same subsets of the set {1,2,3}. If there are ‘n’ elements in a set, then combinations with length ‘r’ (where r < n) would be (^{n}C_{r}) = 1 . The total number of combinations that can be generated is \sum_{r=0}^{n}(^{n}C_{r}) = 2^{n}. No repetitions of a certain set element is allowed here.

The Idea

The main idea is to group the bit positions with a ‘1’ of the binary count sequence and use that group to select a combination from a string. Each binary count is unique ie. the places where it has 1’s (or 0’s) are unique in each count. For each count we will get a unique set of bit positions where the bit is 1. For example, the 4 bit binary count starts from 0000 and ends in 1111 . When a bit position is ‘1’ we can think that bit position is selected. and such in 1010 bit position 1 and 3 are selected. Because each count has a unique set of positions which is ‘1’ , each count will select a unique set of positions, and as we count up different combinations of bit positions get selected, thus counting from 0000 to 1111 will select all possible combinations of bit positions. To select a combination from a source string this unique set of bit positions can be used to filter-out/select-from another string and make a combination.

Lets go through this with an example. Let there be a source string of length 4. Each count of the 4 bit binary number has 1’s and 0’s in a certain position and no other count has 1’s and 0’s in the same positions. Like in 1011 there are 1’s in the 0th , 1st and 3rd position and a 0 in the 2nd position. No other binary count has the 1’s and 0’s in the same positions as 1011 . We group the bit position and form a set. Like in the previous example, 1011 will make a set {0,1,3} . For each count we can make such a set consisting of bit-positions where the bit is a 1. Each count will generate a unique set of bit positions. Let the source string of length 4 be "abcd". For each such set generated from the binary count, we use the set elements (position numbers) to select from the string "abcd" and generate the combinations. Like the {0,1,3} selects the combination "dca". As we count up the binary sequence different positions get selected getting new sets and because each set is unique it will select a unique set of characters from the string “abcd” and form combinations.

All the combinations of a 4 length string is shown below. Study the below table to get the idea.

+-------+------------+-----------+
| Binary | Position  |  Selected |
| Count  | Set       |  String   |
+-------+------------+-----------+
| 0000   | {}        |  ""       |
| 0001   | {0}       |  "a"      |
| 0010   | {1}       |  "b"      |
| 0011   | {0,1}     |  "ab"     |
| 0100   | {2}       |  "c"      |
| 0101   | {0,2}     |  "ac"     |
| 0110   | {1,2}     |  "bc"     |
| 0111   | {0,1,2}   |  "abc"    |
| 1000   | {3}       |  "d"      |
| 1001   | {0,3}     |  "ad"     |
| 1010   | {1,3}     |  "bd"     |
| 1011   | {0,1,3}   |  "abd"    |
| 1100   | {2,3}     |  "cd"     |
| 1101   | {0,2,3}   |  "acd"    |
| 1110   | {1,2,3}   |  "bcd"    |
| 1111   | {0,1,2,3} |  "abcd"   |
+-------+------------+-----------+

Note: The 0th binary bit is the rightmost bit of the binary number. The 0th character is the leftmost character in the string.

The left side column denotes the binary counts, in the middle column we make the set of the bit positions which have a ‘1’ in the corresponding binary count. In the right most column, we use the bit positions from the set to select from the string. The character positions are selected from the source string, denoted by the corresponding bit position set. Like in the count 1001 the set is {0,3} so we select only the 0th and 3rd characters from the source string and thus the selection is “ad” . After all the binary count of a fixed bit length (4 bits here) is finished we have acquired all the unique sets and thus using the sets we have got all the combinations of the string.

For a 4 length string there is a total of 24 = 16 combinations. The break up is as below:

Zero length combinations : ^{4}C_{0} = 1
One length combinations : ^{4}C_{1} = 4
Two length combinations : ^{4}C_{2} = 6
Three length combinations : ^{4}C_{3} = 4
Four length combinations : ^{4}C_{4} = 1
Total = 16 combinations

The combinations generated in this method is not ordered according to string length. We cannot generate the r-combinations of a string for a certain value of ‘r’ with this method. To generate a fixed length say only select 4-combinations or 3-combinations from a set of n elements we need to generate the binary count in such a way so that it first counts all the binary numbers having four 1s . This is not discussed here.

The benefit of this method is we can generate combinations sorted according to weight. This means, if there is a certain weight (a value) assigned to each of the string characters, and we sort the characters of the string ascending (or descending) to their weights, then the combinations would be generated in ascending (or descending) weight order. In this way we can guarantee each combination generated has a lesser or an equal value than its previously generated combination. This can be used to select sub-strings in a scrabble solver application, as each letter has a weight there and so this combination method will generate high-scoring patterns to low-scoring patterns, and can be used to cut down the search space for the best solution by stopping as soon as the first scrabble match is found. Although special bonus scores are to be handled separately and will make this more complicated.

Programming Approach

The main problem is to decide how we will generate the binary counts. The simplest way is to use a normal regular unsigned integer variable to count and then use bit masking to determine the ‘1’ bit positions and only allow that certain character position of the source string to be selected. This has some limitations as the length of unsigned integer is not a fixed value and will change in different platforms. Generally the length of integer is 4bytes (at least 2bytes), so we get a 4*8 = 32bits length bit field. This restricts the combinations to be generated to 32 length. A long long int will have 8bytes = 64bits length.

Another method is to use an r-permutation program to count a binary string sequence and use that string to select from the combination source string. A r-permutation engine can be found here. This can be easily modified to count binary sequence by just setting the source set to “10” and tell the length of the binary count to generate all the binary numbers up-to that length.

In the second method practically there is no combination length limitation so we can generate even 100 length combinations also. The main draw back is the extra calculations and time needed to generate the permutation string and filtering them out. So there is a combination length (range) and the execution time trade-off. We will present the first approach where we will generate combinations with the help of an integer.

In most applications we will not need such long combinations to generate. Like for a scrabble game we do not expect a word to have more than 32 characters. Also You get 7 chips per turn in a scrabble game. In a single player scrabble the combination length is limited to 7. The first method would be good enough for these applications.

The Sourcecode

We now present the source code to generate the combinations of a source string input thriugh the keyboard. The code is discussed afterwards,

/* Program to generate all r-Combinations of a set with distinct element
 * This code is a part of https://phoxis.wordpress.com/2009/10/13/allcombgen/
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX 100
#define TRUE 1
#define FALSE 0

int generate_combination (const char *source_string,char *combination_string);

unsigned int comb_mask = 0x00000001;
unsigned int max_mask_length;

/* Main Function, driving generate_combination() */
int
main (void)
{
  char source_string[MAX], combination_string[MAX];
  int len;

  printf ("Enter the Source String:");
  scanf ("%s", source_string);

  len = strlen (source_string);
  max_mask_length = ~(0x00000001 << len);

  printf ("\nPrinting Combinatins:\n");
  while (1)
    {
      if (generate_combination (source_string, combination_string) == FALSE)
        break;
      printf ("%s\n", combination_string);
    }
  printf ("\nfinished\n");
  return 0;
}

/*
 *  File Name     : combination.c
 *  Function Name : generate_combination
 *  Parameters    :
 *                @ (const char *) source_string
 *                @ (char *)       combination_string
 *  Return Type   : (int)
 *                    # return FALSE if no more mermutations, else return TRUE
 *  Globals       :
 *                @ (unsigned int) comb_mask
 *                    # A bitmask. Used to select unique combinations from source
 *                      string. This function increments and updates this mask
 *                      in each iteration. Each call results in a new permutation.
 *                @ (unsigned int) max_mask_length
 *                    # A bitmask. Used to control the length of masking of
 *                      comb_mask variable. This is not modified in this function.
 *                      Initilized in function main()
 *  Description   : Each call to this function generated a new combination from
 *                  the source_string and then places into combination_string
 *                  The combination is done based upon comb_mask variable. The
 *                  positions, in which comb_mask has a '1' , are only selected
 *                  from the source_string to make a combination.
 *  Note          : The combination generation is comb_mask dependent. To generate
 *                  unique combinations in each call comb_mask should not be modified
 *                  any where else, if some explicit customized combinination is to be
 *                  generated. Each bit field has a fixed length (32bit) , the combintaion
 *                  can be generated is limited to this length, in this version of code.
 *                  This function does not return the null set.
 */

int
generate_combination (const char *source_string, char *combination_string)
{
  unsigned int mask = 0x00000001;
  int s_pos = 0, c_pos = 0;

  /* Main logic */
  while ((mask & max_mask_length))
    {
      if ((comb_mask & mask))
      {
          combination_string[c_pos] = source_string[s_pos];
          c_pos++;
      }
      s_pos++;
      mask <<= 1;
    }

  /*update permutation mask */
  comb_mask++;

  /* Terminate the combination_string with NULL character */
  combination_string[c_pos] = 0;

  /* If combination_string is empty, ie. c_pos == 0 , return FALSE else return TRUE */
  return (c_pos == 0 ? FALSE : TRUE);
}

Click Here To Download this Code

Code Description

The variable comb_mask here is the bit field which we use to count up the binary sequence and make the selection set. The comb_mask variable is set to 0x00000001 , that is the upper 31 bits are set to ‘0’ and the LSB is set to ‘1’, and counted upwards. The max_mask_length is used to limit the masking of variable mask and comb_mask up-to the length of the source string. Its is set to ~(0x00000001 << len) . If the length of the source string is 'n' then the '(n+1)'th position of the max_mask_length will be ‘0’, which determines the loop’s termination. The mask variable is initialized to 0x00000001 . It is left bit shifted in each iteration to shift the ‘1’ from right most to left positions. In each iteration it is checked with the max_mask_length to see if the max bit field length is exceeded and if the iteration should stop as all combinations are generated.
The loop termination is as follows: The mask has only one ‘1’ in a certain position. It is AND-ed with max_mask_length. If the length is 4 then max_mask_length will have a ‘0’ in the 5th position. The loop will be true till mask has the ‘1’ at positions less than 5th. Whenever mask will have the ‘1’ in the 5th position the bitwise operation (mask & max_mask_length) would be false , (both are complements of each other), and the loop will terminate. Thus length of the count is bound inside a limit.
For example if the length of the source string is 4 then max_mask_length is ~(0x00000001 << 4) = ~(0x00000010) = 0xffffffef = 11111111111111111111111111101111 in binary. in the while loop this value is masked with mask, when ever the mask’s 4th bit would get 1 it would mean that it has counted from 00000 to 01111 and generated all the combinations and now has become 10000 so we would stop. Thus the loop is terminated after all the combinations are generated.

In the loop the if statement tests each bit position of the comb_mask with the help of mask variable which scans from the rightmost bit to leftmost bit and checks if a certain bit position is a ‘1’, if it is then the character of the corresponding position of the source string is selected. The bit position being checked currently is tracked by s_pos counter. If a a bit position of comb_mask is ‘1’ then the position of that bit (recorded by s_pos) is used to select the corresponding character from the source_string and assign to combination_string . The c_pos variable points to the last character of the combination_string.
For example, comb_mask is 0x0000000a whose binary equivalent is 00000000000000000000000000001010 . Now mask is 00000000000000000000000000000001 . (mask & comb_mask) evaluates to 0, so the 0th character is not selected. next iteration mask is equal to 00000000000000000000000000000010 . (mask & comb_mask) evaluates to 1, so the 1st character is selected, and so on.

The mask is left shifted by 1 bit and updated in each iteration, so it can check the next bit. The comb_mask is incremented in each iteration to make select new combinations.

Notes

The null set is not selected by the above code.

The above code does not select the input string for repeat characters in it. So a string “aaaa” is allowed and each ‘a’ is considered as different entity.

This is not a process with which permutation without repetitions can be made, that permuting elements by changing their order within themselves cannot be made.

This is a good process to generate a bruteforce password generator, which can be used to input into an appropriate hash function and stored, or matched with the target hash.

Advertisements

4 thoughts on “All Combinations Without Repetitions

  1. Hello there, just became aware of your blog through Google, and found that it’s really informative. I’m going to watch out for brussels.

    I will be grateful if you continue this in future. Many people will be benefited from your writing.

    Cheers!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s