Jumble word solver again

I have already posted a jumbled word solver written in C language, although what I posted is actually become old, as I have changed some of the things in the code. I will update the post with this (hope to update!) with the new changes, but before it I would like to post the the same stuff in other languages and with different datastructure. Recently I am learning Perl and brushing up C++, therefore I will post jumble word solver written in C++ and Perl in this post.

First, let us review the theory. The implementation I made with C used trie trees to search (at that time I did not know what I constructed already had a name). This time I will use a map or a hashmap with chains.

A set of meaningful words whose characters are permutations of each other (referred to as jumble words from now on) will have a unique pattern when sorted (either non-increasing or non-decreasing). Therefore if we take a set of jumble words and then sort the characters of each of them, and use the sorted pattern as the key in the map, it will land in the same location (collision) of the map. Each value in the map is a pointer to list, which will hold the list meaningful words having the same sorted pattern. In short, the character sorted pattern of the words are used to key in the map and then the actual word is added into the linked list in that location of the map.

For example the words “spot”, “post”, “pots”, “opst” are jumble words, that is, they have the same sorted character patterns. The sorted pattern is “opst”, using this string as a key into the map we land in the same location for each of these strings. Say if we encounter “opts” first, then we key into the map using “opst” and fetch the linked list. Because it is the first string, there is no associated linked list in the map, so we allocate it and add the word “opts”. Now we encounter “pots”, and key into the location using the sorted pattern “opst” and fetch the existing linked list and add this word “pots” at the head. In similar way other words are added to the map. Below I have shown a sample map after we have inserted all the above words and also the word “hello’.

There is one issue to consider. The keying into the map will be case sensitive, which will create a problem. Like if the word “Post” had a ‘P’ in caps in the word list, the sorted order will be “Post” (increasing order), and therefore this will map into some other location of the map, which is undesirable. Therefore before doing any kind of processing, the first thing to do is to ransform all the characters of the every string to lowercase (or uppercase).

    .
    .
    .
+-------+----+   +------+    +------+    +------+    +------+
| opst  |  --+-->| spot |--->| post |--->| pots |--->| opts |
+-------+----+   +------+    +------+    +------+    +------+
|       |    +
+-------+----+
    .
    .
    .
+-------+---+   +-------+
| ehllo | --+-->| hello |
+-------+---+   +-------+
    .
    .
    .

Once such a map is constructed, searching for jumble word solution is easy. Just input a string for which the jumble word set has to be found (all meaningful character permutations of the input string), convert each character to lowercase, sort the characters, key into the map and fetch the list. If there is no list, there is no meaningful permutations of the input string. If a list is found then print the entire list, which has all the meaningful character permutations of the input string.

Here is a more formal outline of the process in pseudocode:

Insert

1. word = word_list_file.read ();
2. sorted = word; //Backup
3. sorted.tolower ();
4. sorted.sort_characters ();
5. map_entry = map.find (sorted);
   a. if (map_entry has linked list allocated)
           add word to linked list head
   b. else
           create new linked list and add the word
6. If there are more files Goto 1. else end.

Search

1. buffer = get_word_from_user ();
2. buffer.tolower ();
3. buffer.sort_characters ();
4. map_entry = map.find (buffer);
   a. if (map_entry has linked list allocated)
            print strings in all the nodes in the linked list
   b. else
            tell them that there is no solution with respect
            to the given wordlist
5. If more words are to be entered, Goto 1. else end.

Where is the word list? I will use the default word list file which comes with different flavours of Linux, placed in “/usr/share/dict/words”. If you have your own pass the path of it through the command-line argument.

Now for the sourcecodes. First I will post the C++ and then Perl.

Sourcecode C++

#include <iostream>
#include <fstream>
#include <string>
#include <list>
#include <map>
#include <algorithm>

using namespace std;

int main (int argc, char *argv[])
{
  const string default_dictfile = "/usr/share/dict/words";
  string dictfile, buffer, sorted;
  ifstream ifs;
  map<string, list<string> *> wmap;
  list<string> *wlist;
  map<string, list<string> *>::iterator wmit;

  //Open the file
  if (argc < 2)
  {
    cout << "Using default dictionary: \"" << default_dictfile << endl;
    dictfile = default_dictfile;
  }
  else if (argc == 2)
  {
    dictfile = argv[1];
    cout << "Using dictfile: \"" << dictfile << endl;
  }
  else
  {
    cout << "Usage: " << argv[0] << " [dictfile]" << endl;
    exit (0);
  }

  ifs.open (dictfile.c_str (), fstream::in);
  if (!ifs.good ())
  {
    cerr << "Error opening file \"" << dictfile << "\"" << endl;
    exit (1);
  }

  //Load file
  cout << "Loading file" << endl;
  while (ifs.good ())
  {
    //Get current string into buffer
    ifs >> buffer;

    //Transform each character to lowercase and sort the characters in the string
    sorted = buffer;
    transform (sorted.begin (), sorted.end (), sorted.begin (), ::tolower);
    sort (sorted.begin (), sorted.end ());

    //Key into map to get the linked list of words with the same sorted order
    wmit = wmap.find (sorted);
    if (wmit == wmap.end ())
    {
      //If the entry is empty, make a new linked list and insert
      //the pointer of this new list into the map keyed by the
      //sorted string
      wlist = new list<string> ();
      wmap.insert (pair<string, list<string> *> (sorted, wlist));
    }
    else
    {
      //If we already have a linked list there, then extract it
      wlist = wmit->second;
    }
    //Add the new word to the list we fetched from the map
    //Note that we need to create an object and add the newly created object
    wlist->push_front (string(buffer));
  }
  cout << "File loaded" << endl;
  
  ifs.close ();

  //Find jumble
  while (1)
  {
    //Get input
    cout << "Enter jumble (\"q\" to quit): ";
    cin >> buffer;

    //Transform each character to lowercase and sort the characters in the string
    transform (buffer.begin (), buffer.end (), buffer.begin (), ::tolower);
    if (buffer == "q")
    {
      break;
    }
    sort (buffer.begin (), buffer.end ());

    //Find the linked list in the map with the list of words consisting of the same
    //sorted pattern
    wmit = wmap.find (buffer);
    if (wmit == wmap.end ())
    {
      cout << "Not found" << endl;
    }
    else
    {
      //If found, then get the list and print every element of the list
      wlist = wmit->second;
      for (list<string>::iterator it = wlist->begin (); it != wlist->end (); it++)
      {
	cout << *it << ", ";
      }
      cout << "\b\b" << endl;
    }
  }

  //Iterate over each element of the map and delete the lists, which we allocated before
  cout << "Freeing memory" << endl;
  for (map<string, list<string> *>::iterator it = wmap.begin (); it != wmap.end (); it++)
  {
    delete it->second;
  }

  cout << "End" << endl;

  return 0;
}

Here I have used the map and the list STL container. The map<string, list <string>*> tells that we are declaring a map which is keyed with a string and the values stored are of type list *. And list tells that the list will contain string type objects. The transform will apply each of the elements starting from the first argument (iterator) to the second argument (iterator), and apply the function supplied as the fourth argument, and store the results starting from the third argument (iterator). The sort is used to sort the characters of the string by passing the start and the end iterators of the string.

When terminating I have deleted all the linked list pointers which I allocated while loading. Although it is not required in such a small program, but it is very important that you free what you allocate.

In C++11 we have forward_list which is implimented as single linked list, which can be used in the place of list.

You can also use the container unordered_map which uses Hash Tables to store the objects therefore you will get an O(1) insert, remove and search average time complexity, whereas map stores the data in Red Black Tree internally, which guarantees O(log n) insert search and remove. Basically the map implements Binary Search Tree, and the self balancing can be implemented by any algorithm. In my version it is implemented as Red Black Trees.

Note that unordered_map is added in C++0x therefore you need to add the -std=c++0x at least, (or c++11) to compile. If you are using forward_list, then you may need to add the switch -std=c++11 in gcc.

The unordered_map and the forward_list has similar interfaces (with added methods), therefore in this program you can simply replace map with unordered_map and list with forward_list.

I have executed the code with both the map and unordered_map and inspected that with the linux words wordlist the unordered_map version loads the file almost 60% faster. The map version takes 1.22 seconds real time whereas the unordered_map takes 0.71 seconds real time. I am executing both of them as (the program will quit for “q”):

echo "q" | time -p ./jumble

Now is the time for the Perl code.

Sourcecode Perl

#!/usr/bin/perl -w

use strict;
use warnings;

#Open the file
open (my $WL, "/usr/share/dict/words") or die "Cannot open \"/usr/share/dict/words\"\n";

#initialize hash
my %hashlist = ();

#Unset buffering just to print the messages at the point intended
$| = 1;

#For each read word, convert it into lower case, split each character, sort them and then join
#them to form the sorted pattern. Hash into the hashlist with the pattern and append the 
#string to the list in that hash location. Then Close file.
print "Loading word list .. ";
foreach my $string (<$WL>)
{
  chomp $string;
  my $pattern = join ('',sort (split ('', lc($string))));
  push (@{$hashlist{$pattern}}, $string);
}
close ($WL);
print "Complete\n";

#Input jumble words to solve, convert to lower case, split each character, sort them, join them
#to form the hash key, and fetch the list of words which has the same sorted pattern, show them.
print "\nEnter jumble (\"q\" to quit): ";
while (<STDIN>)
{
  chomp;
  last if (lc ($_) eq "q");
  print "Matched wordlist: ";
  my $pattern = join ('', sort (split ('', lc($_))));
  if (exists ($hashlist{$pattern}))
  {
    print join (', ', @{$hashlist{$pattern}}), "\n";
  }
  else
  {
    print "No meaningful word present in word list\n";
  }
  print "\nEnter jumble (\"q\" to quit): ";
}

print "Terminating\n";

my $pattern = join ('',sort (split ('', lc($string)))); is actually does.

Converts the string to lowercase
Splits the above result at each character (denoted by empty ”) and returns a list
Sorts the above result
Concatenates each character of the above result and stores into $pattern

push (@{$hashlist{$pattern}}, $string); will do

$hashlist{$pattern}: Fetch the value stored at location keyed with $pattern
Use the above value to cast into a list type “@{above result}”
Push the $string into the list we found in the above result. If there was no list, one will be automatically created when “push”ing

The codes are pretty self explanatory and I also have added comments so there is no point explaining them every line. But if you have something you cannot understand or you have found some error or bug (especially in Perl), please let me know about it in the comments.

4 thoughts on “Jumble word solver again”

Pingback: Jumble Word Solver | Phoxis
yish says:

September 29, 2014 at 7:28 pm

what is ur run time?

1. phoxis says:
  
  September 30, 2014 at 1:15 pm
  
  Once the word list is loaded in memory the search results are almost instant. The time to load the word list depends on the length of the file. Maybe give it a try once and checkout how fast the loading and searching works.
  
Pingback: A wide character trie | Phoxis