In CLI programs, reading from the standard input is essential, and often we need to read strings typed by the user. Depending on how the program interprets its input, a string may be delimited by a newline, a blank space, or some other character. Here I discuss some ways to safely read strings that contain blank spaces.

gets

The first function most of us think of for reading a string is gets (). It is the first one taught in a lot of books and texts. gets () reads a whole line, so the string may contain blank spaces, and leading and trailing blank spaces are preserved; the terminating newline is discarded. But it has a serious problem: the buffer is finite, and there is no way to tell gets () how large it is. A sufficiently long input therefore overflows the buffer, after which the behavior is undefined (the program might crash). For this reason gets () was removed from the language in C11 and should not be used in new code.

Here is sample code to demonstrate the problem. Enter a string longer than 9 characters.

#include <stdio.h>

int main (void)
{
  char buffer[10];
  gets (buffer);   /* unsafe: no limit on the number of characters stored */
  printf ("[%s]\n", buffer);
  return 0;
}

fgets

With fgets () we can set the maximum number of characters stored in the buffer, and thus avoid the buffer overflow. fgets () has the following prototype:

char *fgets(char *s, int size, FILE *stream);

s is a pointer to the buffer, size is the size of the buffer, and stream is the stream/file from which the string is read. fgets () reads at most size - 1 characters and stops early at end of file or after a newline. So if you enter a line shorter than size - 1 characters, everything up to and including the newline is stored. If you enter a longer line, the first size - 1 characters are stored and the last location in the buffer holds the terminating null character ('\0'); the rest of the line stays in the input stream for the next read. Note that if a newline is read, it is also placed in the buffer.

stdin is the standard input stream, from which we can read user input. It is opened before the program starts and is available to the program for standard input.

The problem is that you may not want the '\n' at the end of the buffer when reading from the keyboard. If you enter the string "hello" and press enter, fgets () stores 'h', 'e', 'l', 'l', 'o', '\n', '\0' in the buffer. Depending on your requirements fgets () is a good option: it accepts strings with blank spaces inside them and preserves both leading and trailing blank spaces. But the trailing newline can be a problem in some applications. For example, when you hash the input, the extra newline changes the hash.

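#include <stdio.h>

int main (void)
{
  char buffer[10];
  /* stores at most 9 characters plus the terminating '\0' */
  if (fgets (buffer, sizeof buffer, stdin) != NULL)
    printf ("[%s]\n", buffer);
  return 0;
}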

scanf

Next we come to scanf (). The %s format specifier is the usual way to read a string, but it has some problems. Without a width it has the same buffer overflow problem as gets (), and it cannot read a string containing blank spaces, because %s skips leading whitespace and stops at the next whitespace character. We can overcome the buffer overflow by specifying a maximum field width. The format for this is "%ns", where n is an integer constant giving the maximum number of characters to store. The width does not count the terminating null character, so n must be at most one less than the buffer size. Therefore we can do

#include <stdio.h>

int main (void)
{
  char buffer[10];
  /* width 9 leaves room for the terminating '\0' */
  if (scanf ("%9s", buffer) == 1)
    printf ("[%s]\n", buffer);
  return 0;
}

This reads a maximum of 9 characters into buffer, leaving room for the terminating null character, and so overcomes the buffer overflow problem. Also, no trailing newline character is stored in the string.

So is there a way to read a string with blank spaces in it? Yes, there is. We can use a scanset to read everything up to the '\n'. The format string to do this is "%[^\n]".

#include <stdio.h>

int main (void)
{
  char buffer[10];
  /* no width is given here, so a long line can still overflow buffer */
  if (scanf ("%[^\n]", buffer) == 1)
    printf ("[%s]\n", buffer);
  return 0;
}

A brief description of the scanset: input is scanned up to the first character that is not in the set listed between the brackets [ … ]. Therefore, if we have "%[abcd]" as the format string and we input "abcdacdxabcd", only "abcdacd" is stored in the buffer (terminated by a null character). The remaining part of the input, starting at 'x', stays in the stdin buffer. A '^' as the first character inside the brackets inverts the set, so that it matches every character other than those listed. Therefore "%[^abc]" matches a string up to the first occurrence of 'a', 'b', or 'c'.

In the format above, "%[^\n]" matches characters until a newline is encountered. Unlike %s, a scanset does not skip leading whitespace, so we can read a string with any number of blank spaces in it (leading, embedded or trailing), and they are preserved.

Here, too, we can limit the maximum number of characters scanned, for example with "%9[^\n]" for a 10-byte buffer. This change overcomes the buffer overflow problem, avoids the trailing newline, accepts blank spaces inside the string, and preserves leading and trailing blank spaces.

There is one problem with this method, which surfaces when you use such statements several times in a row. For example, consider the following code:

#include <stdio.h>

int main (void)
{
  /* initialized so that printing is safe even when a scanf stores nothing */
  char str1[128] = "", str2[128] = "", str3[128] = "";

  printf ("\nEnter str1: ");
  scanf ("%[^\n]", str1);
  printf ("\nstr1 = %s", str1);

  printf ("\nEnter str2: ");
  scanf ("%[^\n]", str2);
  printf ("\nstr2 = %s", str2);

  printf ("\nEnter str3: ");
  scanf ("%[^\n]", str3);
  printf ("\nstr3 = %s", str3);

  printf ("\n");
  return 0;
}

When this is executed, only the first scanf waits for input; the program does not stop for the following scanf calls. Recall that a scanset matches only the characters in the set you have defined and stops at the first character that is not in it. The remaining characters stay in the input buffer. In the case of "%[^\n]", when the first '\n' is encountered it is not consumed: it remains in the stdin buffer, and the first scanf returns. The next scanf finds this '\n' immediately, matches nothing, and returns 0 without storing anything. The '\n' is still there for the call after that, so every subsequent scanf with this format fails in the same way.

What we need to do is read and discard the '\n' that the scanset left unmatched. This can be done by placing a getchar () after the scanf (). I suggest using the assignment-suppression character '*' instead. In this case the format is "%[^\n]%*c", and it is used like this (for a 128-byte buffer):

scanf ("%127[^\n]%*c", buffer);

The scanset portion scans the line up to, but not including, the newline character '\n' and places it in the buffer. The following "%*c" reads one character from the stdin buffer, which is the newline left over from the scanset, and stores it nowhere. Effectively, this discards the newline and fixes the problem seen above. The maximum width 127 limits the string length so that a 128-byte buffer cannot overflow. One caveat: if the user enters an empty line, the scanset matches nothing, scanf stops before reaching "%*c", and the newline is left in the buffer again.

Hard-coding the maximum string length inside the scanf () format string, as in "%127[^\n]%*c", has a drawback: if the length needs to change, you must change it in every place it appears. Instead, we can first build the format string with sprintf () and then pass it to scanf (), like this:

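sprintf (format_string, "%%%d%s", max_len, "[^\n]%*c");
scanf (format_string, buffer);

The "%%" produces a literal '%' character, the "%d" inserts max_len, and the "%s" appends "[^\n]%*c". Here max_len must be at most one less than the size of buffer, and format_string must be large enough to hold the result. Note that the '\n' in the appended string is a real newline character; writing it as "\\n" would put a backslash and an 'n' into the scanset instead of the newline.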

You can also avoid the problem with a macro and C's concatenation of adjacent string literals, like this:

#define LEN "5"
      .
      .
      .
scanf ("%"LEN"[^\n]", buffer);

loop and getchar

This is the most flexible way, since you can scan the input character by character and process it however you wish. But writing such a function for every application, especially when you need nothing special, is not worth it; it is better to use the standard library functions, which already provide a lot of features.

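#include <stdio.h>

#define LEN 25

int main (void)
{
  char buffer[LEN]="hello";
  int c;   /* int, not char, so that EOF can be told apart from data */
  int i;

  /* store at most LEN-1 characters, stopping at newline or end of input */
  for (i=0; i<LEN-1; i++)
  {
    c = getchar ();
    if (c == '\n' || c == EOF)
     break;
    buffer[i] = (char) c;
  }
  buffer[i] = '\0';
  printf ("[%s]\n", buffer);
  return 0;
}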

References and Links

  1. scanf man page
  2. http://en.wikipedia.org/wiki/Standard_streams

So that’s it.


