Yesterday i was translating some code i wrote in R to C++. I had some calculation to do which required the list of index of the top n values in a list. Therefore what i needed was not a the list sorted itself or get a sorted copy of a given list, but actually the sorted order of the index into the actual array. To describe the problem, for example consider the following list:

arr = 20 50 40 80 10

The sorted (non-decreasing) ordering of the values is obviously

sorted_arr = 10 20 40 50 80

The sorted (non-decreasing) ordering of the index into the original array is (index starts at 1 in this example)

sorted_index_into_arr = 5 1 3 2 4

Therefore the smallest value in the list arr could be found by indexing into arr using the first value of sorted_index_into_arr, which stores the index of the array arr holding the smallest value.

arr[ sorted_index_into_arr[1] ]

The sorted index ordering is very easy to get in languages like R and Octave or Matlab. For example in R we can do the following to get the sorted index order using the order function:

> arr <- c (20, 50, 40, 80, 10)
> arr
[1] 20 50 40 80 10
> order (arr)
[1] 5 1 3 2 4

In the case of Octave or Matlab you can get the index order using the sort function, which will return two lists (1xn matrix), the first list is the sorted array itself, and the next one is the sorted index order into the original array.

octave:1> a = [20 50 40 80 10]
a =

   20   50   40   80   10

octave:2> [sorted index] = sort (a)
sorted =

   10   20   40   50   80

index =

   5   1   3   2   4

Although these functional languages provides gives these features in C/C++ it is not immediately available using the builtin sort library functions. But the sorting routines accepts a function pointer to a comparison function, writing appropriate code for which will do the trick.

The Idea

From this point all index will be 0 based. Say, the length of the list is n. Initially we will have an index list idx = ( 0, 1, 2, ..., (n - 1) ) , and an original array with values arr = (x0, x1, x2, ..., x(n - 1)). The idx reflects the permutation of the elements in arr. If we were to sort the array itself, we would compare the values of the array arr and then depending on the outcome we would rearrange the values of arr. On the other hand if we need to get the sorted index ordering, we need to sort idx depending on the values of arr. That is, we need to compare the values of arr and depending on the outcome of the result, instead of modifying arr we perform the same modification in idx. When comparing values of arr we need to index it using the values of idx. This is because at all time idx will hold the modified array element permutation of arr. Thus, although arr is not modified itself, the modified arr values can be found by indexing into arr using idx values, which stores the permutation of the index values of original arr, reflecting a modified arr indirectly.

Say after some passes of sorting we have some modified idx array, which represents the partially sorted index order. In that case for each 0 <= i <= (n - 1), arr[idx[i]] will give the partially sorted values. Basically the contents idx[i] tells which is the ith element in the partially modified arr.

To demonstrate i am showing a simple selection sort code to get the sorted index order of a given list. The sort function is given below:

int *sorted_order (const int *arr, int n)
{
  int *idx, i, j;

  idx = malloc (sizeof (int) * n);

  for (i=0; i<n; i++)
  {
    idx[i] = i;
  }

  for (i=0; i<n; i++)
  {
    for (j=i+1; j<n; j++)
    {
      if (arr[idx[i]] > arr[idx[j]])
      {
        swap (&idx[i], &idx[j]);
      }
    }
  }

  return idx;
}

Note how the indexing is done in the above highlighted lines (16 and 18). In line 16 we compare the actual values of the partially modified array values accessed by arr[idx[i]] and then if a wrong ordering is found we swap the index idx[i] and idx[j] to reflect the change in arr and keep arr untouched. At any moment the ordering of the values of arr is held by the values in array idx. Say for the input of arr = (20, 50, 40, 80, 10), we have initially idx = (0, 1, 2, 3, 4). The passes of selection sort are shown below.

arr = (20, 50, 40, 80, 10)                  Stays unmodified throughout

Pass #1

Pass #1
  initial idx = (0, 1, 2, 3, 4)

  Iteration 1 compares:
    (arr[idx[0]] > arr[idx[1]]) --> (arr[0] > arr[1] ) --> (20 > 50) --> false
    no interchange
  
  Iteration 2 compcares:
    (arr[idx[0]] > arr[idx[2]]) --> (arr[0] > arr[2] ) --> (20 > 40) --> false
    no interchange
  
  Iteration 2 compares:
    (arr[idx[0]] > arr[idx[3]]) --> (arr[0] > arr[3] ) --> (20 > 80) --> false
    no interchange
  
  Iteration 2 compares:
    (arr[idx[0]] > arr[idx[4]]) --> (arr[0] > arr[4] ) --> (20 > 10) --> true
    swap (idx[0] with idx[4])
    updated idx = (4, 1, 2, 3, 0)
  
  Therefore new array ordering is arr[idx[0 to 4]] = arr[4, 1, 2, 3, 0] = 10, 50, 40, 80, 20
  which is what we wanted.

Pass #2

  
Pass #2

  updated idx = (4, 1, 2, 3, 0)

  Iteration 1 compares:
    (arr[idx[1]] > arr[idx[2]]) --> (arr[1] > arr[2] ) --> (50 > 40) --> true
    swap (idx[1] with idx[2])
    updated idx = (4, 2, 1, 3, 0)
  
  Iteration 2 compares:
    (arr[idx[1]] > arr[idx[3]]) --> (arr[2] > arr[3]) --> (40 > 80) --> false
    no interchange
  
  Iteration3 compares:
    (arr[idx[1]] > arr[idx[4]]) --> (arr[2] > arr[0]) --> (40 > 20) --> true
    swap (idx[1] with idx[4])
    updated idx = (4, 0, 1, 3, 2)
  
  Therefore new array ordering is arr[idx[0 to 4]] = arr[4, 0, 1, 3, 2] = 10, 20, 50, 80, 40
  so we are in the right track.

Pass #3

  Pass #3

  updated idx = (4, 0, 1, 3, 2)

  Iteration 1 compares:
   (arr[idx[2]] > arr[idx[3]]) --> (arr[1] > arr[3] ) --> (50 > 80) --> false
   no interchange
  
  Iteration 1 compares:
    (arr[idx[2]] > arr[idx[4]]) --> (arr[1] > arr[2] ) --> (50 > 40) --> true
    swap (idx[2] with idx[4])
    updated idx = (4, 0, 2, 3, 1)
  
  Therefore new array ordering is arr[idx[0 to 4]] = arr[4, 0, 2, 3, 1] = 10, 20, 40, 80, 50
  so we are in the right track.

Pass #4

  Pass #4

  updated idx = (4, 0, 2, 3, 1)

  Iteration 1 compares:
    (arr[idx[3]] > arr[idx[4]]) --> (arr[3] > arr[1] ) --> (80 > 50) --> true
    swap (idx[3] with idx[4])
    updated idx = (4, 0, 2, 1, 3)
  
  Therefore new array ordering is arr[idx[0 to 4]] = arr[4, 0, 2, 1, 3] = 10, 20, 40, 50, 80
  so we are in the right track.

Finally

Finally we get idx = (4, 0, 2, 1, 3) which is the sorted index order

Using with sort library functions

We don’t need to write own sorting function in order to get the sorted order index. It will not be good to write your own sorting function, as the libraries already comes with efficient sorting functions, also using the library sort methods will make the code portable. In C we have the bsort from stdlib and n C++ we have the std::sort method from C++ STL. I will describe how to use these to achieve the sorted index ordering.

Using qsort in C

qsort signature is as follows:

void qsort(void *base, size_t nmemb, size_t size,
                  int(*compar)(const void *, const void *));

It will accept a void pointer to the base address of an array of elements of any type in the base argument, the number of elements in the array in nmemb and the size of each element in size. The last element is a function pointer which we need to pass to tell how to compare the two elements, in case you pass strings, structures etc. and also control the sort ordering, non-increasing or non-decreasing. Here is what the manual has to say about the return value of the compare function: “The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second”.

Given these information we need to pass the idx array as the first argument and the nmemb and size as required (sizeof (type).) For our purposes the comparison function should compare the value at the original arr indexed by idx as we have seen in the above selection sort example. The qsort function will call the comparison function using the values in the passed array to it, which is actually pointer to the idx. Therefore a comparison function would be like below:

static int compar (const void *a, const void *b)
{
  int aa = *((int *)a), bb = *((int *)b);
  
  if (arr[aa] < arr[bb]) return -1;
  if (arr[aa] == arr[bb]) return 0;
  if (arr[aa] > arr[bb]) return 1;
}

Note that the ugly looking *((int *)a) is nothing but typecasting the void * pointer to integer first, then dereferencing it to get the value pointed to by the pointer.

Here we compare the values in arr, the original array, indexed with the passed integer values in compar function, which are actually the values of the idx array. There is one issue. Where will we get the arr inside the scope of this function? One way is we can make a global or static global pointer pointing to the original array and use it in the compare function.

Here goes the example code.

#include <stdio.h>
#include <stdlib.h>


/* holds the address of the array of which the sorted index
 * order needs to be found
 */
int *base_arr;

/* Note how the compare function compares the values of the
 * array to be sorted. The passed value to this function
 * by `qsort' are actually the `idx' array elements.
 */
static int
compar (const void *a, const void *b)
{
  int aa = *((int *) a), bb = *((int *) b);
  if (base_arr[aa] < base_arr[bb])
    return -1;
  if (base_arr[aa] == base_arr[bb])
    return 0;
  if (base_arr[aa] > base_arr[bb])
    return 1;
}

int
main (void)
{
  int *arr, *idx, n, i;

  printf ("\nEnter n: ");
  scanf ("%d", &n);

  arr = malloc (sizeof (int) * n);
  idx = malloc (sizeof (int) * n);
  printf ("\nEnter the list: ");
  for (i = 0; i < n; i++)
    {
      scanf ("%d", &arr[i]);
    }

  /* initialize initial index permutation of unmodified `arr'
   */
  for (i = 0; i < n; i++)
    {
      idx[i] = i;
    }

  /* Assign the address of out original array to the static global
   * pointer, this will be used by the compare function to index 
   * into the original array using `idx' values
   */
  base_arr = arr;

  qsort (idx, n, sizeof (int), compar);

  printf ("\nOriginal list: ");
  for (i = 0; i < n; i++)
    {
      printf ("%d ", arr[i]);
    }

  printf ("\nSorted index: ");
  for (i = 0; i < n; i++)
    {
      printf ("%d ", idx[i]);
    }

  printf ("\nSorted array using arr[sorted_idx[i]]: ");
  for (i = 0; i < n; i++)
    {
      printf ("%d ", arr[idx[i]]);
    }


  free (arr);
  free (idx);

  printf ("\n");
  return 0;
}

Note how the address of the original array arr is assigned to the static global pointer base_arr just before the call to bsort . The compare function will access this base_arr pointer to access the original values in arr using the values of the idx (a and b in compar are actually pointers to elements of idx

Using the C++ STL std::sort method

Passing predicate function

In C++ we always do not have the liberty to declare globals. If you are using to sort function without any classes around then the way is similar to what we have seen with C. Here we go with an example in this using a function.

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

static vector < int >*base_arr;

bool
compar (int a, int b)
{
  return ((*base_arr)[a] < (*base_arr)[b]);
}

int
main (void)
{
  vector < int >arr, idx;
  int n;

  cout << "\nEnter n: ";
  cin >> n;

  arr.resize (n);
  idx.resize (n);

  cout << "\nEnter the list: ";
  for (int i = 0; i < n; i++)
    {
      cin >> arr[i];
    }

  /* initialize initial index permutation of unmodified `arr'
   */
  for (int i = 0; i < n; i++)
    {
      idx[i] = i;
    }

  base_arr = &arr;
  sort (idx.begin (), idx.end (), compar);

  cout << "\nOriginal list: ";
  for (int i = 0; i < n; i++)
    {
      cout << (*base_arr)[i] << " ";
    }

  cout << "\nSorted index: ";
  for (int i = 0; i < n; i++)
    {
      cout << idx[i] << " ";
    }

  cout << "\nSorted array using arr[sorted_idx[i]]: ";
  for (int i = 0; i < n; i++)
    {
      cout << (*base_arr)[idx[i]] << " ";
    }

  cout << endl;
  return 0;
}

Passing overloaded () operator class object

There is another way to achieve the same results, which is through functor or function object. Which is a class with the () operator overloaded. So we can apply () operator on the class object to invoke the overloaded () function. In out case we overload the () operator being the comparison function, and we pass to sort an instance of this class. In the sort function whenever the it will attempt to call the function, actually () operator will be applied on the object by the sort function it will call the comparator actually. This is the same stuff which we wanted. Here is an example class about which I was talking about.

struct compare_index
{
  const vector<int> base_arr;
  compare_index (const vector<int> &arr) : base_arr (arr) {}

  bool operator () (int a, int b) const
  {
    return (base_arr[a] < base_arr[b]);
  }
};

In this case we call the sort function as below:

sort (idx.begin (), idx.end (), compare_index (arr));

By compare_index (arr) this will initialize the value of the original vector into the member variable base_arr of compare_index, using which we will be able to compare the array values. Thus, this eliminates the global variables as in the previous example.

You can also declare the class to accept any type, for which you can declare it to be as below

templace<class T>struct compare_index
{
  const T base_arr;
  compare_index (const T arr) : base_arr (arr) {}

  bool operator () (int a, int b) const
  {
    return (base_arr[a] < base_arr[b]);
  }
};

To use it with our example of vector use

sort (idx.begin (), idx.end (), compare_index<vector<int> &>(arr));

A sample code is given below, which also achieves the same result.

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;


struct compare_index
{
  const vector < int >base_arr;
    compare_index (vector < int >&arr):base_arr (arr)
  {
  }

  bool operator   () (int a, int b) const
  {
    return (base_arr[a] < base_arr[b]);
  }
};

class my_test
{
public:
  void test_sorting (void)
  {
    vector < int >arr, idx;
    int n;

      cout << "\nEnter n: ";
      cin >> n;

      arr.resize (n);
      idx.resize (n);

      cout << "\nEnter the list: ";
    for (int i = 0; i < n; i++)
      {
        cin >> arr[i];
      }

    /* initialize initial index permutation of unmodified `arr'
     */
    for (int i = 0; i < n; i++)
      {
        idx[i] = i;
      }

    sort (idx.begin (), idx.end (), compare_index (arr));

    cout << "\nOriginal list: ";
    for (int i = 0; i < n; i++)
      {
        cout << arr[i] << " ";
      }

    cout << "\nSorted index: ";
    for (int i = 0; i < n; i++)
      {
        cout << idx[i] << " ";
      }

    cout << "\nSorted array using arr[sorted_idx[i]]: ";
    for (int i = 0; i < n; i++)
      {
        cout << arr[idx[i]] << " ";
      }

    cout << endl;
  }
};

int
main (void)
{
  my_test tstobj;
  tstobj.test_sorting ();

  return 0;
}

Links and References

  1. Sort and get index list in C++
  2. http://www.cplusplus.com/reference/algorithm/sort/
  3. man bsort
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s