Everybody of us has definitely played the word search game in which a grid of characters are given, and a list of words are given which are to be found in the grid horizontally, vertically or diagonally. In this post i will show a simple solution to a more generalized version of the search which finds the strings in the grid with backtracking.

The Problem

Consider the following alphabet grid:

+----+----+----+----+----+
| w  | p  | a  | o  | u  |
+----+----+----+----+----+
| p  | i  | e  | d  | q  |
+----+----+----+----+----+
| l  | l  | n  | r  | o  |
+----+----+----+----+----+
| e  | c  | a  | u  | w  |
+----+----+----+----+----+
| y  | d  | x  | s  | z  |
+----+----+----+----+----+

This is an alphabet grid consisting of different words hidden in it. An alphabet x is adjacent to another alphabet * in the grid if * is horizontally, vertically or diagonally adjacent to x. The following diagram shows the adjacent positions of character * with respect to x.

+----+----+----+
| *  | *  | *  |
+----+----+----+
| *  | x  | *  |
+----+----+----+
| *  | *  | *  |
+----+----+----+

The task is to find a given string in the grid with the condition that all the characters of the given string should be adjacent in the order in which they appear in the string, and no character in the tile may be used more than once to construct the given string.

For example the word linux can be constructed by the following index positions of the above grid:

+----+----+----+----+----+
|    |    |    |    |    |
+----+----+----+----+----+
|    | I  |    |    |    |
+----+----+----+----+----+
| L  |    | N  |    |    |
+----+----+----+----+----+
|    |    |    | U  |    |
+----+----+----+----+----+
|    |    | X  |    |    |
+----+----+----+----+----+

Solution Idea

The idea to solve this problem of finding a string as defined above in the given grid is simple backtracking. We can visualize this grid as a graph, where each cell in the grid is a node of the graph and two nodes has an edge between them if the the corresponding alphabets of the two nodes are adjacent in the grid. In this the solution approach will be to start from each node of the graph and a depth first search (DFS) in the graph and check if there exists a simple path in the graph which corresponds to the given string in the grid.

To do this we can start from each node of the graph and then at each iteration select an adjacent branch and visit the node on the other end of this branch. We will take a count of the depth at which the DFS is in. After visiting a node at depth d we need to match if the dth character in the given string matches. If yes then we can continue with DFS from this node, else this partial path seen so far at depth d will not lead to a solution (mismatch at level d), therefore we need to back track and return back at level (d - 1) and continue DFS by selecting the next branch at this level, trying to find another path.

If the top level node in the DFS has depth 0, then the string is found in the grid/graph if the DFS reaches a depth of equal to the length of the given string.

The DFS could be implemented without explicitly representing the graph corresponding to the grid structure. We need to perform a DFS in the graph, for which we need to get the children at each node of the graph. At each node of the graph the children of the node are those nodes which are adjacent to that node. Therefore to perform DFS, we can simply construct a recursive function accepting the depth and the index into the alphabet grid of the current node and then pass the index into the alphabet grid of the next child to be processed in DFS order.

One point which should be noted that the edge cells in the alphabet grid does not have 8 children, the corner ones have 3, and the edge ones have 5 children. Therefore we need to carefully call such that the DFS respects this constraint.

The function can be constructed as follows:

/* `mat' is the alphabet grid of dimension MAX x MAX
 * `pat' is the string to be found in the grid
 * `d' is the current depth of the DFS
 * `i' and `j' represents the cell of the grid/node of graph
 */
int grid_match (char mat[MAX][MAX], char *pat, int len, int d, int i, int j)
{

  /* Base conditions */
  /* [1] Return success if length of `pat' is equal to depth `d' */
  /* [2] Return failure if the index `i' or `j' are out of bounds */
  /* [3] Return failure if the character at this child 
   *     at `mat[i][j]' is not equal to `pat[d]'
   * [4] Return failure if the character at this child is already
   *     visited by this path.
   */
 
  /* Mark the `mat[i][j]' cell/node is visited, so that only simple paths 
   * are searched.
   */

  /* Perform DFS on children of cell mat[i][j] on the following order */
  /* Accumulate success or failure values in `r'. If any of the branch 
   * returns success, then we have found one solution.
   */
  r = r | grid_match (mat, pat, d+1, i-1, j-1); /* Top Left cell */
  r = r | grid_match (mat, pat, d+1, i-1, j);   /* Top cell */
  r = r | grid_match (mat, pat, d+1, i-1, j+1); /* Top Right cell */
  r = r | grid_match (mat, pat, d+1, i, j+1);   /* Right cell */
  r = r | grid_match (mat, pat, d+1, i+1, j+1); /* Bottom Right cell */
  r = r | grid_match (mat, pat, d+1, i+1, j);   /* Bottom cell */
  r = r | grid_match (mat, pat, d+1, i+1, j-1); /* Bottom Left cell */
  r = r | grid_match (mat, pat, d+1, i, j-1);   /* Left cell */

  /* Unmark the `mat[i][j]' cell/node as visited, so that we can visit this
   * cell/node following another simple path
   */

  /* return the value of `r' representing the status if this path leads to 
   * a match, to the previous depth
   */
}

The above pseudocode renders the solution idea. The comments in the above pseudo code presents the implementation of idea. The | is the bitwise OR operator in the expression r = r | expression. In the above sequence the variable r will have value 1 if any of the function calls in grid_match () returns 1. This is because we want to return a success value (1 for this case) in case atleast one path matching the string is found.

To implement the condition which tells that “no character in the tile may be used more than once to construct the given string” we need to ensure that the path created by the DFS does not intersect, that is basically perform a graph search (instead of tree search). For this, the above code marks the node a node at depth d whenever it is a valid one (passes through the base conditions). When the recursion rolls back to the level (d - 1) we need to unmark this node/cell at level d such that it could be visited by some other path in this DFS.

The success (1) and failure (0) values are passed up the depth of the DFS tree.

Note that the pseudocode of the recursive function considers the grid to be square. The solution can be generalized by making an m x n grid simply by changing the dimensions.

The parameter len holds the length of pat. This is done to stop re-computation of the same string length over and over in each recursive depth.

To search if the entered string is present in the alphabet grid, we need to perform a DFS starting from every character of the grid, ie. searching a path representing the string in the equivalent graph, starting at each node of the graph. This can be done as follows:

`mat' is the alphabet grid
`pat' is the pattern to search

  for i=0 to number of rows
    for j=0 to number of columns
      r = r | grid_match (mat, pat, 0, i, j);

The above pseudocode simply starts a DFS at each cell by making a recursive call at every index (i,j) setting the depth to 0. If r returns success (1) then the
given string is in the grid, else it is not.

Sourcecode

Now I will show the entire sourcecode which accepts any grid and any string and performs the search. In addition this code highlights the string in the grid, if found.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define STR_MAX 128

typedef struct _grid_t
{
  char **mat;
  int r_max, c_max;
} grid_t;

/* MAIN Search Functions START */

/* grid_t *grid   : grid 
 * char *pat      : poiner to string to be searched
 * int len        : length of `pat' ie. `strlen (pat)'
 * unsigned int d : the current depth
 * int i          : the current row of the grid at this path depth
 * int j          : the current column of the grid at this path depth
 */
int
grid_match (grid_t * grid, char *pat, int len, unsigned int d, int i, int j)
{
  int r = 0;
  char temp;

  /* [1] Return success if length of `pat' is equal to depth `d' */
  if ((int) d == len)
    return 1;

  /* [2] Return failure if the index `i' or `j' are out of bounds */
  if (((i < 0) || (j < 0)) || (i >= grid->r_max) || (j >= grid->c_max))
    return 0;

  /* [3] Return failure if the character at this child at `mat[i][j]' is not equal to `pat[d]'
   */
  if (grid->mat[i][j] != pat[d])
    return 0;

  /* [4] Return failure if the character at this child is already visited by this path.
   */
  if (isupper (grid->mat[i][j]))
    return 0;


  /* Save character and mark location to indicate
   * DFS has visited this node, to stop other branches
   * to enter here and cross over path
   */
  temp = grid->mat[i][j];
  grid->mat[i][j] = 0;

  /* Only return the first encountered match. If a success ie. r = 1
   * is found in any of the branch we will not make anymore DFS calls
   * because we have found the first occurance
   */
  r |= grid_match (grid, pat, len, d + 1, i - 1, j - 1);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i - 1, j);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i - 1, j + 1);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i, j + 1);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i + 1, j + 1);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i + 1, j);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i + 1, j - 1);
  if (r == 0)
    r |= grid_match (grid, pat, len, d + 1, i, j - 1);

  /* restore value */
  grid->mat[i][j] = temp;

  /* mark if success by making the matched characters to uppercase
   * the grid_show_match () will only print the uppercase chars
   * and print the lowercase chars as dots `.'
   */
  if ((grid->mat[i][j] == pat[d]) && (r == 1))
    {
      grid->mat[i][j] = toupper (grid->mat[i][j]);
    }

  return r;
}


/* Find the match by executing DFS starting at each cell
 */
int
grid_find_match (grid_t * grid, char *pat)
{
  int i, j, r = 0, len;


  /* make all the grid characters to lowercase. 
   * uppercase char means it is a part of a 
   * successful path
   */
  for (i = 0; i < grid->r_max; i++)
    {
      for (j = 0; j < grid->c_max; j++)
        {
          grid->mat[i][j] = tolower (grid->mat[i][j]);
        }
    }

  len = strlen (pat);
  for (i = 0; i < grid->r_max; i++)
    {
      for (j = 0; j < grid->c_max; j++)
        {
          r |= grid_match (grid, pat, len, 0, i, j);
          /* only find one occurrence of the string */
          if (r == 1)
            {
              break;
            }
        }
      if (r == 1)
        {
          break;
        }
    }

  return r;
}

/* MAIN Search Functions END */


/* AUXILIARY Functions START */


/* alloc grid
 */
grid_t *
grid_allocate (int r, int c)
{
  grid_t *grid;
  int i;

  grid = malloc (sizeof (grid_t));
  grid->mat = malloc (sizeof (char *) * r);
  for (i = 0; i < r; i++)
    {
      grid->mat[i] = malloc (sizeof (char) * c);
    }
  grid->r_max = r;
  grid->c_max = c;

  return grid;
}


/* free allocated memory
 */
void
grid_free (grid_t * grid)
{
  int i;

  if (grid == NULL)
    {
      return;
    }
  for (i = 0; i < grid->r_max; i++)
    {
      free (grid->mat[i]);
    }
  free (grid->mat);
  free (grid);
}

/* get the alphabet grid from stdin
 */
void
grid_input (grid_t * grid)
{
  int i, j;

  for (i = 0; i < grid->r_max; i++)
    {
      for (j = 0; j < grid->c_max; j++)
        {
          scanf (" %c", &grid->mat[i][j]);
        }
    }
}

/* Simply show the grid
 */
void
grid_show (grid_t * grid)
{
  int i, j;

  for (i = 0; i < grid->r_max; i++)
    {
      for (j = 0; j < grid->c_max; j++)
        {
          printf ("%c ", grid->mat[i][j]);
        }
      printf ("\n");
    }
}

/* Highlight the matched portion of the grid
 * print matched character in the found path as
 * uppercase, and other characters as a dot `.'
 */
void
grid_show_match (grid_t * grid)
{
  int i, j;

  for (i = 0; i < grid->r_max; i++)
    {
      for (j = 0; j < grid->c_max; j++)
        {
          if (isupper (grid->mat[i][j]))
            printf ("%c ", grid->mat[i][j]);
          else
            printf (". ");
        }
      printf ("\n");
    }
}

/* AUXILIARY Functions END */

/* A sample DRIVER code to demonstrate the grid_match () function
 */

int
main (void)
{
  grid_t *grid;
  int r, c;
  char pat[STR_MAX];
  int i;

  printf ("\nEnter grid size (r, c): ");
  scanf (" %d %d", &r, &c);
  grid = grid_allocate (r, c);

  printf ("\nEnter the characters of the grid in row major order: ");
  grid_input (grid);

  printf ("\n\nEntered grid:\n");
  grid_show (grid);

  while (1)
    {
      printf ("\n\nEnter pattern to be searched (\"x\") to quit: ");
      scanf ("%s", pat);

      for (i = 0; pat[i] != '\0'; i++)
        {
          pat[i] = tolower (pat[i]);
        }

      if (strcmp (pat, "x") == 0)
        {
          break;
        }

      r = grid_find_match (grid, pat);

      if (r == 0)
        {
          printf ("\nThe pattern \"%s\" is not found in the grid", pat);
        }
      else
        {
          printf ("\nThe pattern \"%s\" was found in the grid", pat);
          printf ("\nShowing the match:\n\n");
          grid_show_match (grid);
        }
    }

  grid_free (grid);

  printf ("\n");
  return 0;
}

Code Issues

An issue is that if there are more than one occurrence of the string in the grid, then this code will highlight and match only the first occurrence encountered by the DFS. This depends of the order in which the children of each node is exploded. Also if one path is split in two or more, for example a first portion matches “orac” and there are two possible direction where “le” matches, in such a case only first path in the split will be matched.

If we want to only highlight all occurrence matches, then we need continue search even if one successfully path is found that is, r=1 at any level. This could be implemented by removing the if (r == 0) condition before each call to grid_match (). The modification will look like this:

int grid_match (grid_t *grid, char *pat, unsigned int d, int i, int j)
{
             .
             .
             .
    r |= grid_match (grid, pat, len, d+1, i-1, j-1);
    r |= grid_match (grid, pat, len, d+1, i-1, j);
    r |= grid_match (grid, pat, len, d+1, i-1, j+1);
    r |= grid_match (grid, pat, len, d+1, i, j+1);
    r |= grid_match (grid, pat, len, d+1, i+1, j+1);
    r |= grid_match (grid, pat, len, d+1, i+1, j);
    r |= grid_match (grid, pat, len, d+1, i+1, j-1);
    r |= grid_match (grid, pat, len, d+1, i, j-1);
             .
             .
             .
}

and in the grid_find_match () function

int grid_find_match (grid_t *grid, char *pat)
{
  int i, j, r = 0;
  
  for (i=0; i<grid->r_max; i++)
  {
    for (j=0; j<grid->c_max; j++)
    {
      r |= grid_match (grid, pat, 0, i, j);
      /* we allow continuing the search even r = 1
       * that is a match was found
       */
    }
  }
  return r;
}

The problem of matching more than one occurrence of string is that in the grid the matched characters will be highlighted, but because there is no direction indicated in the grid and multiple characters are highlighted, it will be difficult to trace the matched paths in the grid.

For example in the grid in the beginning of the post the string “oracle” matches three split path as follows

. . . . . 
. . E . . 
L L . R O 
E C A . . 
. . . . .

Another minor issue is that all the matching is done lowercase. The input string to be matched is converted to lowercase and the toupper marking of the grid changes the original grid, in which all the characters are stored in lowercase. At each call of grid_find_match () all the uppercase characters from the previous call are converter to lowercase.

To avoid input string with characters not present in the grid we can first check if all the characters present in the string is also present in the character the same number of time (the current problem definition does not allow using of one single character more than once), and then we proceed.

Modifying definition of adjacent cells

If we want to change the definition of the adjacent cells from the existing, to only the cells which are horizontally or vertically adjacent, then we need to simply remove the recursive calls which continue DFS with the diagonal cells of the grid.

Output

Let me show the output when the grid shown at the beginning of this post is input in the program. We have a 5×5 grid.

In this grid the following words are present:
linux, windows, apple, oracle, dell , and probably one or two more which i forgot (made the grid around 2hr before writing this section).


Enter grid size (r, c): 5 5

Enter the characters of the grid in row major order: w p a o u p i e d q l l n r o e c a u w y d x s z


Entered grid:
w p a o u 
p i e d q 
l l n r o 
e c a u w 
y d x s z 


Enter pattern to be searched ("x") to quit: linux

The pattern "linux" was found in the grid
Showing the match:

. . . . . 
. I . . . 
L . N . . 
. . . U . 
. . X . . 


Enter pattern to be searched ("x") to quit: windows

The pattern "windows" was found in the grid
Showing the match:

W . . . . 
. I . D . 
. . N . O 
. . . . W 
. . . S . 


Enter pattern to be searched ("x") to quit: apple

The pattern "apple" was found in the grid
Showing the match:

. P A . . 
P . E . . 
. L . . . 
. . . . . 
. . . . . 


Enter pattern to be searched ("x") to quit: oracle

The pattern "oracle" was found in the grid
Showing the match:

. . . . . 
. . . . . 
L . . R O 
E C A . . 
. . . . . 


Enter pattern to be searched ("x") to quit: dell

The pattern "dell" was found in the grid
Showing the match:

. . . . . 
. . E D . 
L L . . . 
. . . . . 
. . . . . 


Enter pattern to be searched ("x") to quit: acer

The pattern "acer" is not found in the grid

Enter pattern to be searched ("x") to quit: ibm

The pattern "ibm" is not found in the grid

Enter pattern to be searched ("x") to quit: hello

The pattern "hello" is not found in the grid

Enter pattern to be searched ("x") to quit: x

Complexity

The memory requirements are very good because only the child in the current partially traversed path is expanded therefore the growth of space requirements will be \mathcal{O}(n) .

The problem is to find a path in a graph which satisfies a certain condition. The time complexity is not straight forward. To study the worst case consider the following 6×6 grid of alternating a’s and b’s:

+----+----+----+----+----+----+
| a  | b  | a  | b  | a  | b  |
+----+----+----+----+----+----+
| a  | b  | a  | b  | a  | b  |
+----+----+----+----+----+----+
| a  | b  | a  | b  | a  | b  |
+----+----+----+----+----+----+
| a  | b  | a  | b  | a  | b  |
+----+----+----+----+----+----+
| a  | b  | a  | b  | a  | b  |
+----+----+----+----+----+----+
| a  | b  | a  | b  | a  | b  |
+----+----+----+----+----+----+

If we input the string “abababababababababababababababababaa” to be searched then it will take an insane amount of time to report failure. Why? Because the input string matches all the first 35 characters and then fails to match the last one. Therefore the DFS starts at each of the 36 cells and for each such DFS it goes down upto (n-1) level just to discover failure and backtrack and try another branch.

Note that in the above example the problem is basically finding a Hamiltonian path in the graph formed by the alphabet grid. Finding a Hamiltonian path in a graph is NP-Complete, and we can reduce the worst case instance of this character search algorithm into an instance of Hamiltonian path, which can be reduced in polynomial time (formal proof not shown), therefore, from the definition of NP-Completeness, the worst case instance of finding a string in the alphabet grid is also NP-Complete. Therefore no polynomial time algorithm exists to fix the worst case time. This conclusion was from the post in stackoverflow in this post from the answer by missingno: Improve a word search game worst case.

Although for moderate length strings which does not form such situation which does not get close to find a Hamiltonian path, in which the string matches a majority of its length with the grid at almost every DFS and fails at then end, will perform fast.

The Original Grid Game

In the original grid game only strings matched horizontally, vertically and diagonally are allowed, not zigzag pattern as the above generalized version is allowed. To implement the problem to search strings horizontally, vertically or diagonally, we can implement six functions, one searching for each direction. Two function for left-to-right and right-to-left horizontal strings, two for top-to-bottom and bottom-to-top strings, two for the two diagonal strings and run each function for each input string. Each of these functions can be constructed by removing appropriate recursive calls from the grid_match () . For example, for the horizontal matcher keep only those calls which performs DFS with only the Left and Right cells, for vertical matcher keep only those recursive calls which performs DFS with only the Top and Bottom cells. For diagonal create two functions, one for the 45 degree diagonal and one for the 135 degree diagonal similarly allowing appropriate adjacent diagonal elements recursive calls. I am not posting the source code in this post, as the modification is pretty much straight forward.

Although the original word search game could be made much easily as the search is straight forward. To search horizontally spanning strings, we can take each row of the grid and check for if the input string and its reverse is a substring of the row, similarly we can check for vertical and diagonal string for substrings.

Links

The problem was taken from tech-queries.blogspot.in/2011/07/find-if-given-pattern-is-present-in.html

Update Info

  • 20.07.2012 : Code modified. Removed re-computation of strlen (pat). Instead we are now passing the length of the pat in the grid_match ()recursive function. Minor text addition to reflect this change.
    • Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s