A wide character trie implementation

It’s been a long time since I posted here. Therefore I decided to dig out some quick and straightforward stuffs from my disk which I previously decided should’t be in this blog.

I have posted multiple trie and dictionary search based programs in C, C++ and Perl before Jumble Work Solver and Jumble Work Solver Again. This time (again!) it’s about a trie. Although this time there was a specific requirement from a group who needed to implement a trie based word distance counting for bangla language, therefore Unicode support. This was supposed to be modified more and plugged into a spelling correction for scanned OCR text in the Bengali language.

Initially I suggested that a ternary tree would be more appropriate as the memory cost for standard trie (not compressed) would be huge. Although, finally the decision was to go with plain and simple trie. I know it is mad, but this is what it is :D.

Also, before going into the implementation, I should note that there is an efficient implementation of Radix Tree present in libcprops library. Though which we won’t be using for this one. I would recommend you people to have a look into this library if you already haven’t seen it yet.

Let’s say, we have a word “hello” and a node with an array of pointers of length 26 representing each character of the English language, each of which indicates that if the ith character follows the character represented by this node. Therefore for this example “hello”, the head node’s array of pointers will have the location 7 pointing to another node, which will have the 4th location of the pointer pointing to another node, whose pointer array’s 11 th location will point to another node and so on. Note here the indexing starts from zero. When the word ends, then the next pointer can be pointed to a special terminal marker node, which can be common to all, and the node is marked as a terminal node. This is essential as a valid word can be a substring of another valid word, in which case the shorter would need to be decided. When another new word like “help” comes in, we will follow the same path upto “hel” created by the word “hello”, and as we find there are no pointer for “p” pointing to any node, a new node will be created as explained before. Continue reading “A wide character trie implementation”

Advertisements

Jumble word solver again

I have already posted a jumbled word solver written in C language, although what I posted is actually become old, as I have changed some of the things in the code. I will update the post with this (hope to update!) with the new changes, but before it I would like to post the the same stuff in other languages and with different datastructure. Recently I am learning Perl and brushing up C++, therefore I will post jumble word solver written in C++ and Perl in this post.
Continue reading “Jumble word solver again”

Jumble Word Solver


Jumbled Word: A string of characters is given, the task is to find all the meaningful words that can be created by rearranging its letters. Solving a jumble word means to find all the meaningful words that can be made with the initial string.

Objective Of The Article

In this article first we will describe how a jumbled word can be solved and then we present a very simple computer program. After this we present an advanced computer program which will make the solution very fast with the help of a specially designed tree.

Article Revision: 3
Notice 25.9.2009 : This article is undergoing an update and will be made online soon .

Here is another related post Jumbled word solver with C++ and Perl implementation with hash and list: Jumble word solver again.
*Please Note:* This article is very old and lengthy, I am planning for either a revision or a rewrite for the trie tree part which reflect the recent modifications. At the time I wrote this I did not know that the datastructure which I worked out was a trie :) therefore the long explanation.
Continue reading “Jumble Word Solver”