Bash Script: Counting lines, words, characters


The objective is to write a shell script that mimics the wc command. The script is built from bash built-ins and standard coreutils commands, and it tries to match the counts reported by wc as accurately as possible.

The Idea

The file is read one line at a time by redirecting it to the input of a while loop, which counts the lines as it reads them. Once a line is read, its characters are counted, and the line is split at the IFS characters to count its words. This is repeated for every line in the file. The source code is shown first, followed by a detailed description.


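#!/bin/bash
#This code is a part of

# Totals over all the files, updated by my_wc
total_l=0
total_w=0
total_c=0

function my_wc ()
{
  local IFS_BAK="$IFS"     # backup, IFS is switched inside the loop
  local file_name="$1"
  local l=0 w=0 c=0        # line, word, character counts of this file
  local -a word            # temporary array for the words of one line
  local line

  # set IFS to \n only, so that read preserves the leading and
  # trailing blanks of each line
  IFS=$'\n'
  # -r is needed to interpret the backslash characters as the part of
  # the text in the file, and not as escape sequences
  while read -r line
  do
      l=$((l + 1))
      c=$((c + ${#line}))
      # set IFS to normal field terminating characters
      IFS="$IFS_BAK"
      # -r again, so that a backslash does not glue two words together
      read -r -a word <<< "$line"
      w=$((w + ${#word[@]}))
      # set IFS to \n only so that in the next read command
      # only \n terminated lines at once are read
      IFS=$'\n'
  done < "$file_name"
  #If last line is not \n terminated then the last read will fail
  #and while will break. We will calculate the last line separately
  #if it is not null
  if [ -n "$line" ]
  then
      # We should not count this line, because it
      # is not terminated by newline character
      c=$((c + ${#line}))
      IFS="$IFS_BAK"
      read -r -a word <<< "$line"
      w=$((w + ${#word[@]}))
  fi
  IFS="$IFS_BAK"
  # The newline characters are also characters, so count them
  c=$((c + l))
  #   echo -e "\nFile name: $file_name \nLines : $l\tWords: $w\tCharacters: $c"
  echo "  $l  $w $c $file_name"

  # update the totals over all the files
  total_l=$((total_l + l))
  total_w=$((total_w + w))
  total_c=$((total_c + c))
}


# Main execution sequence

file_count=$#

while [ -n "$1" ]
do
  file_name="$1"
  if [ ! -f "$file_name" ]
  then
     echo "File \"$file_name\" does not exist or is not a regular file"
     shift 1
     continue
  fi
  my_wc "$1"

  shift 1
done

if [ $file_count -gt 1 ]
then
   echo -e " $total_l $total_w $total_c total"
fi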


The main driver

The line, word, and character counting code is written as a module, the my_wc () function. The main execution sequence calls it with valid file names. We look at the main execution sequence first. The number of file names passed on the command line is saved in the shell variable file_count. The while loop then takes the supplied file names one at a time: each shift moves the next positional parameter into $1, which is passed to my_wc () for processing. Before calling the function, the loop checks whether the file exists and is a regular file. The path in $1 is processed only if it is a regular file; otherwise it is skipped and the next file is processed.

Just like wc, if multiple files are supplied on the command line, the script prints the total count over all of them. The total counts of lines, words, and characters are kept in the total_l, total_w, and total_c shell variables, which are initialized to 0 at the beginning. These variables are updated by my_wc at the end of the function.
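The argument-walking pattern of the driver can be tried on its own. What follows is only a minimal sketch, assuming bash: walk_args, processed, skipped, and tmp_file are hypothetical names used for illustration, and the function merely records how many arguments are regular files instead of counting their contents.

```bash
# Hypothetical helper: walks its arguments the way the main driver does.
walk_args ()
{
  processed=0
  skipped=0
  while [ -n "$1" ]
  do
    if [ -f "$1" ]
    then
      processed=$((processed + 1))   # a regular file, would be counted
    else
      skipped=$((skipped + 1))       # missing or not a regular file
    fi
    shift 1                          # the next argument becomes $1
  done
}

tmp_file=$(mktemp)                   # an existing regular file
walk_args "$tmp_file" "$tmp_file.missing" "$tmp_file"
echo "processed=$processed skipped=$skipped"
rm -f "$tmp_file"
```

Note that a loop of this form stops at the first empty argument, because the test is [ -n "$1" ] and not a check on the number of remaining parameters.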

This is how the main driver works. Now we move to the my_wc () function.

my_wc () function

First, the contents of the IFS variable are backed up in IFS_BAK. This is needed because the value of IFS is switched inside the loop, so that a line is interpreted differently depending on the field separator characters (described later). file_name is initialized with the parameter passed to the function. Note that no file-related error checking is done here; the function trusts its caller. The line, word, and character counts for this particular file are stored in the shell variables l, w, and c respectively. An array named word is also declared; it is used as a temporary array to count the words in a line.

The contents of file_name are redirected to the input of the while loop. Before the loop, IFS is set to \n. The read command in the while loop reads one line per iteration, and with IFS set to \n the leading and trailing blank spaces of that line are preserved. With the standard set of IFS characters they would be stripped and would not be counted as characters.
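The effect of IFS on leading blanks can be checked with a short experiment. This is a sketch assuming bash and an unmodified default IFS at the start; sample, with_default, and with_newline are names made up for the illustration.

```bash
sample='   indented text'    # three leading spaces, 16 characters in total

# Default IFS: read strips the leading blanks
read -r with_default <<< "$sample"

# IFS limited to newline for this one read: the blanks are kept
IFS=$'\n' read -r with_newline <<< "$sample"

echo "${#with_default} ${#with_newline}"   # prints "13 16"
```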

The -r switch in read -r ensures that a backslash ('\') in the file is treated as an ordinary character and not as the start of an escape sequence. Without this switch the backslash would be removed, and the two characters in "\d" would be read as the single character "d".
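The difference is easy to see by reading the same text with and without the switch. A small sketch, assuming bash; sample, raw, and cooked are illustrative names only.

```bash
sample='a\d'                 # three characters: a, backslash, d

read -r raw <<< "$sample"    # backslash kept as an ordinary character
read cooked <<< "$sample"    # backslash removed by read

echo "${#raw} ${#cooked}"    # prints "3 2"
```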

After a line is read, the line counter is incremented by one. The number of characters in the current line is given by ${#line} and is added to c, the character count of the current file.

Now we need to count the number of words in the line being processed. First, IFS is restored to its original value from the backup. The current line is then fed to the input of read through a here string (<<< "$line"). The -a switch makes read split the line at each IFS character and store the resulting words in the array word. Because IFS has been restored to its original value, the words are separated in the usual way, at spaces and tabs. Once the words are loaded into the elements of the word array, the number of array elements is added to the word counter. At the end of the loop body IFS is set to \n again, ready for the next iteration. The loop reads through the entire file and counts its lines, words, and characters.
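The word-splitting step can be sketched on a single line of text. This assumes bash with the default IFS; count_words, n_words, n_blank, and n_empty are hypothetical names, not part of the script.

```bash
# Hypothetical helper: prints the number of words in its first argument.
count_words ()
{
  local -a word
  read -r -a word <<< "$1"     # split at IFS characters into the array
  echo "${#word[@]}"           # number of array elements = word count
}

n_words=$(count_words '  the quick   brown fox ')
n_blank=$(count_words '     ')   # only blanks: no words
n_empty=$(count_words '')        # empty line: no words

echo "$n_words $n_blank $n_empty"   # prints "4 0 0"
```

Runs of blanks and leading or trailing blanks do not produce empty array elements, which is why the element count can be used directly as the word count.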

There is one special case. If the file does not end with a newline, that is, if the last line of the file has no newline character at the end, then read still stores this line in the variable but returns false. This terminates the while loop, and the line would go unprocessed. To handle it, an if statement is placed after the loop. If the last line did not end with a newline character, the variable line is not null, and the line is processed separately in the body of the if statement. Its characters and words are counted in the same manner as in the while loop. Note that the line counter is not incremented here: the line is not terminated by a newline character, so, like wc, we do not count it as a line.
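This behaviour of read can be observed with a two-line file whose last line has no newline. A minimal sketch, assuming bash; tmp_file is a temporary file created only for the demonstration.

```bash
tmp_file=$(mktemp)
printf 'one\ntwo' > "$tmp_file"    # "two" has no trailing newline

l=0
while read -r line
do
  l=$((l + 1))
done < "$tmp_file"

# The loop body ran once, but the unterminated text is still in $line
echo "lines=$l leftover=$line"     # prints "lines=1 leftover=two"
rm -f "$tmp_file"
```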

Once this is done, we also need to remember that the newline characters which terminate each line are characters too, and they are not counted by the ${#line} expansion. Therefore the number of newline characters, which equals the line count l, is added to the character count.
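The correction can be checked against wc -c on a small file. This is a sketch assuming bash and plain ASCII text, where characters and bytes coincide; tmp_file and wc_bytes are illustrative names.

```bash
tmp_file=$(mktemp)
printf 'ab\ncde\n' > "$tmp_file"     # 5 letters and 2 newlines

l=0
c=0
while IFS=$'\n' read -r line
do
  l=$((l + 1))
  c=$((c + ${#line}))                # newline itself is not in $line
done < "$tmp_file"

c=$((c + l))                         # one newline per counted line
wc_bytes=$(( $(wc -c < "$tmp_file") ))
echo "$c $wc_bytes"                  # prints "7 7"
rm -f "$tmp_file"
```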

The counts are then printed to the terminal. Recall that the variables total_l, total_w, and total_c hold the total number of lines, words, and characters of all the files passed on the command line. They are updated with the counts of the current file before the function returns. The totals are printed by the main execution sequence if more than one parameter was passed.


Sample output of this script is shown below, along with the output of wc for comparison.

  • Counting the number of lines, words, and characters in the script itself.

    [phoxis@localhost ~]$ ./
      79  243 1714
    [phoxis@localhost ~]$ wc ./
      79  243 1714 ./
  • Counting all the .sh files contents with the script and wc

    [phoxis@localhost ~]$ ./ *.sh
      83  217 1426
      47  69 537
      43  80 600
      15  77 567
      72  220 1486
      13  40 264
      40  105 605
      35  70 392
      81  281 2149
      40  88 571
      28  48 394
      32  62 426
      110  333 1963
      40  160 895
      89  171 1341
      45  137 847
      72  210 1231
      8  9 85
      47  118 777
      36  61 470
      83  163 1140
      58  178 1331
      40  119 718
      102  515 3547
      43  138 931
      79  243 1714
     1381 3912 26407 total
    [phoxis@localhost ~]$ wc *.sh
       83   217  1426
       47    69   537
       43    80   600
       15    77   567
       72   220  1486
       13    40   264
       40   105   605
       35    70   392
       81   281  2149
       40    88   571
       28    48   394
       32    62   426
      110   333  1963
       40   160   895
       89   171  1341
       45   137   847
       72   210  1231
        8     9    85
       47   118   777
       36    61   470
       83   163  1140
       58   178  1331
       40   119   718
      102   515  3547
       43   138   931
       79   243  1714
     1381  3912 26407 total


Externally, this script differs from wc in two ways. First, it only reads from regular files, and when a file does not exist or is not a regular file, its error message differs from that of wc. Second, the formatting of the output lines is different. Internally, the major problem with this code is its execution time: it takes a very long time to count the contents of even a moderately large file. As a test, count the lines, words, and characters of the file /usr/share/dict/linux.words. wc finishes almost immediately, while the script takes far longer, although it still counts the lines, words, and characters correctly. For small to medium files the script is fast enough.


5 Responses to Bash Script: Counting lines, words, characters

  1. It’s SLOW! Why the heck would you want to write it in an interpreted language? Time to write a Bash compiler for LLVM, I guess…

    • phoxis says:

      Definitely it is slow, as i have discussed in the “Comments” section. I would never write a script for this matter, and would always prefer to use the wc command or best to write my own C code which is my primary language of choice. This was just a demonstration and nothing else.

      • Prantik Maitra says:

        It won’t really matter whether it is slow or fast…what does matter is that the above demonstration will bring marks for us…

      • Prantik Maitra says:

        One must write this code because it is the one which comes in our university exams….
        I guess this reason is sufficient…

  2. wow, i couldnt do this, im in awe of people who can :)
