Find process IDs of a running process by name


In this post I will describe a procedure to find the process IDs of running processes by name, which can then be used to send signals or do other things. For example, if you have multiple instances of bash open, this should get you the list of process IDs (PIDs) of those bash instances.

First, note that a shell utility for this already exists: pidof, which is part of the sysvinit-tools package. That package contains a whole bunch of tools which let you query PIDs based on different requirements, send signals to sets of processes, and so on. Do check them out.

I will only outline how this is done and post the source code to do it. After that, it can be extended with many features, just like the tools of the sysvinit-tools package, or more.

We will use information stored in the /proc file system. Last time I used the /proc filesystem to fetch CPU utilization information. Note that /proc is not an actual filesystem which exists on the disk. It is a pseudo-filesystem provided by the kernel to give access to the kernel’s internal data. Although we use the same filesystem calls to access the files and directories in /proc, the underlying mechanism with which these pseudo-files are handled is very different from that of actual files in a physical filesystem. Visit http://en.wikipedia.org/wiki/Procfs for introductory information about /proc.

For each process created in the system, the kernel makes a directory named after the process ID under the /proc directory. Therefore if you run firefox and its PID happens to be 1807, then you will have /proc/1807 created. Under that directory there are a lot of files which contain process-specific information, including the command name, the arguments with which it was invoked, a link to the executable, the current working directory of the process, open file descriptors (including sockets and pipes), memory maps, various process stats, and many others. Have a look at man 5 proc and compare the information in it with the contents of /proc/[pid]/.

Four files are of interest here: /proc/[pid]/comm, /proc/[pid]/cmdline, /proc/[pid]/status and /proc/[pid]/stat. I will give a brief description of each file, highlighting only the portions which are relevant to the current application. For more, check the man pages.

/proc/[pid]/comm

This file holds the command name as a string, that is, the name of the executable file (not its path) which is being run by the process [pid]. You can also set the command name of the process by writing to this file. For example, to get the command name of the process with PID 27527, do

cat /proc/27527/comm
firefox

The output is firefox. To set the name you can do

echo -e "my_name\0" > /proc/27527/comm

Although we won’t set the name here, it is interesting to know about it. Note that the length of the name string is limited by the TASK_COMM_LEN macro in the kernel, which is currently set to 16 and includes the terminating null byte. Therefore any command name longer than 15 characters will be truncated.
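Reading the name from a C program is just a matter of reading one line and dropping the trailing newline. Below is a minimal sketch of that step. The helper name read_comm_line is hypothetical, and it takes the full path as a parameter so that it can be pointed at any /proc/[pid]/comm file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the first line of `path' (for example
 * "/proc/27527/comm") into `buf' and strips the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_comm_line (const char *path, char *buf, size_t size)
{
  FILE *fp;

  if (size == 0)
  {
    return -1;
  }
  fp = fopen (path, "r");
  if (fp == NULL)
  {
    return -1;
  }
  if (fgets (buf, (int) size, fp) == NULL)
  {
    fclose (fp);
    return -1;
  }
  fclose (fp);
  buf[strcspn (buf, "\n")] = '\0';
  return 0;
}
```

Reading the whole line with fgets, instead of a single whitespace-delimited word, keeps names which contain spaces intact.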

/proc/[pid]/cmdline

The contents of this file reflect the entire list of command line arguments with which the process with the specific pid was executed. For example, if you have executed myout p1 p2 p3 or ./myout p1 p2 p3 or /home/guest/myout p1 p2 p3, the file will contain the exact command line, along with whatever path was given at the time of the invocation. The point to note is that the arguments in this file are separated by null characters. This is very important to know. If you cat the cmdline file for the above executable then you will see something like

myoutp1p2p3

We cannot see the separating null characters '\0' in between the arguments. To confirm, just execute

cat -v /proc/[pid]/cmdline
myout^@p1^@p2^@p3^@

This also shows the non-printing characters, and you can see ^@ after each argument. Therefore, even if a single argument contains blank spaces, we can parse the arguments easily and unambiguously.
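Splitting such a buffer comes down to walking from one null byte to the next. The following is a minimal sketch, assuming the raw bytes of cmdline have already been read into memory. The name split_cmdline and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: `buf' holds `len' bytes as read from
 * /proc/[pid]/cmdline, where every argument ends with '\0'.
 * Stores up to `max_args' pointers to the arguments in `argv_out'
 * and returns how many were stored.
 */
size_t split_cmdline (const char *buf, size_t len,
                      const char **argv_out, size_t max_args)
{
  size_t pos = 0, count = 0;

  while (pos < len && count < max_args)
  {
    const char *end = memchr (buf + pos, '\0', len - pos);
    if (end == NULL)
    {
      break; /* last chunk is not terminated, ignore it */
    }
    argv_out[count++] = buf + pos;
    pos = (size_t) (end - buf) + 1;
  }
  return count;
}
```

The pointers stored in argv_out point into buf itself, so buf has to stay alive for as long as the arguments are used.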

/proc/[pid]/stat

This file contains a whole lot of information about the process: pid, filename of the executable, parent PID, group ID, different time measurements, priority, number of threads and so on. Check the corresponding section of man 5 proc. We are only interested in the comm field. This field is identical to what I have described for /proc/[pid]/comm. Here the command name is given in parentheses.
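Pulling the comm field out of a stat line can be sketched as below. The helper name comm_from_stat is hypothetical. Since the command name may itself contain spaces or parentheses, this sketch takes everything between the first '(' and the last ')' of the line.

```c
#include <string.h>

/* Hypothetical helper: copies the comm field of a /proc/[pid]/stat
 * line into `out'. Returns 0 on success, -1 on malformed input.
 */
int comm_from_stat (const char *stat_line, char *out, size_t out_size)
{
  const char *lparen = strchr (stat_line, '(');
  const char *rparen = strrchr (stat_line, ')');
  size_t len;

  if (lparen == NULL || rparen == NULL || rparen < lparen || out_size == 0)
  {
    return -1;
  }
  len = (size_t) (rparen - lparen - 1);
  if (len >= out_size)
  {
    len = out_size - 1; /* truncate to fit the output buffer */
  }
  memcpy (out, lparen + 1, len);
  out[len] = '\0';
  return 0;
}
```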

/proc/[pid]/status

This file basically contains the information in /proc/[pid]/stat and /proc/[pid]/statm (details about memory) in a more readable form. Here also we can find the name of the process, beside the field named Name:.
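Picking the name out of this file means finding the line which starts with Name: and skipping the whitespace after the colon. A minimal sketch for a single line follows. The helper name name_from_status_line is hypothetical, and the caller is assumed to feed it the lines of the file one by one.

```c
#include <string.h>

/* Hypothetical helper: if `line' is the "Name:" line of a
 * /proc/[pid]/status file, copies the name into `out' and returns 0.
 * Returns -1 for any other line.
 */
int name_from_status_line (const char *line, char *out, size_t out_size)
{
  const char *prefix = "Name:";
  size_t prefix_len = strlen (prefix), len;

  if (out_size == 0 || strncmp (line, prefix, prefix_len) != 0)
  {
    return -1;
  }
  line += prefix_len;
  while (*line == ' ' || *line == '\t')
  {
    line++; /* skip the separator after the colon */
  }
  len = strcspn (line, "\n");
  if (len >= out_size)
  {
    len = out_size - 1;
  }
  memcpy (out, line, len);
  out[len] = '\0';
  return 0;
}
```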

The plan

There are a lot of ways to fetch the command name. But why did I describe these files, and how are we going to use them? I am sure you have guessed it (if you didn’t know it already). We are going to scan one of the above files for each running process, get the process name from it, and compare it with the given name. If they match, we have the PID of an instance of that running process, else we skip it. The directories in /proc with numerical names are the directories of the running processes. The procedure is outlined below.

A process name string `to_match' is given

for each numerically named directory `pid' in /proc
{
   - Open the file `/proc/pid/comm' (or any of the others above)
   - Read the line with the command name into `read_buffer'
   - If `read_buffer' matches `to_match'
     {
       add `pid' to the list, and continue.
     }
   - else
     {
       this is not the process we are looking for,
       so just continue.
     }
}

Pretty simple.

Which file to parse

We need to decide which of the four files mentioned above to parse. Note that the second field of /proc/[pid]/stat, the Name: field of /proc/[pid]/status and /proc/[pid]/comm store identical information. Therefore if we only want the name of the command which was executed (a truncated string), we can use any of them and parse out the appropriate field. In this case I will go with /proc/[pid]/comm, because we only need the name of the command, and comm contains only one line holding the required string.

On the other hand, /proc/[pid]/cmdline holds the entire command line, so taking the first argument and comparing it with the given string may fail even when the process is the one we are looking for. This is because if you have run the process as ./a.out param1 or /home/guest/Documents/a.out param1, the first argument is ./a.out or /home/guest/Documents/a.out respectively. If you now want to match a.out, the comparison will fail, because the executed command may have a relative or absolute path before the actual file name. Therefore we need to strip off the path from the first argument, either manually or using basename.
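Stripping the path manually can be sketched as below. The helper name strip_path is hypothetical. It is used here in place of basename because some implementations of basename may modify the string passed to them.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the part of `arg'
 * after the last '/', or `arg' itself if it contains no '/'.
 * The input string is not modified.
 */
const char *strip_path (const char *arg)
{
  const char *slash = strrchr (arg, '/');
  return (slash != NULL) ? slash + 1 : arg;
}
```

With this, ./a.out and /home/guest/Documents/a.out both reduce to a.out before the comparison.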

There is another issue. Do you want to consider the name of a shell script, a perl script or any other script as the name of the process, or do you want to consider its interpreter’s executable name as the process name? For example, if you have a bash script named test.sh and you execute it as ./test.sh, then cmdline will hold

 "/bin/bash^@./test.sh^@"

where the ^@ are null characters. If you take only the first argument, you will conclude that the process is bash, while it is actually executing test.sh. So, to detect a script using the cmdline file, you need to parse the other arguments as well. I think it can be done by first detecting whether the first argument is an interpreter or a shell. If yes, then parse its other command line arguments to detect which script it is running. This is pretty cumbersome work, because you probably need to store a list of shells and interpreters. Parsing will also be messy, as you need to skip the other command line switches which the shell/interpreter executable might accept.

On the other hand, the /proc/[pid]/comm file stores the script name instead of the interpreter/shell. Therefore for the above example test.sh will be stored in the /proc/[pid]/comm file. The pidof tool, when executed without any flags, takes the name of the shell/interpreter as the name of the process. To detect scripts as well, you need to use the -x switch:

pidof -x test.sh

The comm file has a limitation of its own because of the truncation. If two executable names differ only in the characters beyond the limit, then both processes will be reported as instances of the same program, which is not the case. For example, executable files named “abcdefghijklmnoxxxx” and “abcdefghijklmnoyyyy” will both show up as “abcdefghijklmno” in the comm file (15 bytes of name, with the 16th byte of the kernel buffer holding the terminating null). Try making two dummy executables which wait for some input, send them to the background in the shell, and then execute ps and/or look at the contents of comm. pidof abcdefghijklmnoxxxx will properly detect the actual PID, because it accesses the cmdline file, but without -x it won’t detect running scripts.

I think it will be interesting to have a look at the source code of the pidof command to see what is going on.
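The ambiguity caused by the truncation can be made concrete with a small comparison routine. This is only a sketch. The name comm_matches and the constant COMM_VISIBLE_LEN are hypothetical, and the value 15 assumes TASK_COMM_LEN is 16 as described earlier.

```c
#include <string.h>

/* Assumption: TASK_COMM_LEN is 16, so comm shows at most 15 bytes. */
#define COMM_VISIBLE_LEN 15

/* Hypothetical helper: returns 1 if `comm' (as read from
 * /proc/[pid]/comm) is what the kernel would show for a program
 * named `wanted', that is, `wanted' cut down to 15 bytes.
 */
int comm_matches (const char *comm, const char *wanted)
{
  size_t wanted_len = strlen (wanted);

  if (wanted_len > COMM_VISIBLE_LEN)
  {
    wanted_len = COMM_VISIBLE_LEN;
  }
  return strlen (comm) == wanted_len
         && strncmp (comm, wanted, wanted_len) == 0;
}
```

A matcher like this reports the same comm string as a match for both long names from the example, which is exactly why comm alone cannot tell them apart.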

Sourcecode

I will keep it simple in this post and use the /proc/[pid]/comm file to read the executable name. If a script test.sh runs in bash and you search for the PIDs of bash, the code below will not find it, as it inspects the /proc/[pid]/comm file. To detect the script, run the following code with the script name as the command line argument.

Here is the code in C

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <dirent.h>
#include <libgen.h>

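/* checks if the string is purely an integer
 * we can do it with `strtol' also
 */
int check_if_number (char *str)
{
  int i;

  /* an empty string is not a number */
  if (str[0] == '\0')
  {
    return 0;
  }
  for (i=0; str[i] != '\0'; i++)
  {
    /* isdigit needs a value in the range of unsigned char */
    if (!isdigit ((unsigned char) str[i]))
    {
      return 0;
    }
  }
  return 1;
}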

#define MAX_BUF 1024
#define PID_LIST_BLOCK 32

int *pidof (char *pname)
{
  DIR *dirp;
  FILE *fp;
  struct dirent *entry;
  int *pidlist, *new_list, pidlist_index = 0, pidlist_realloc_count = 1;
  char path[MAX_BUF], read_buf[MAX_BUF];

  dirp = opendir ("/proc/");
  if (dirp == NULL)
  {
    perror ("Fail");
    return NULL;
  }

  pidlist = malloc (sizeof (int) * PID_LIST_BLOCK);
  if (pidlist == NULL)
  {
    closedir (dirp);
    return NULL;
  }

  while ((entry = readdir (dirp)) != NULL)
  {
    if (check_if_number (entry->d_name))
    {
      snprintf (path, sizeof (path), "/proc/%s/comm", entry->d_name);

      /* The file may not exist any more if the process terminated
       * after readdir returned its entry. To be accurate we should
       * check that the error really is "file does not exist"; here
       * any failure to open is simply skipped.
       */
      fp = fopen (path, "r");
      if (fp != NULL)
      {
        /* comm may contain spaces, so read the whole line instead of
         * a single word, then strip the trailing newline.
         */
        if (fgets (read_buf, sizeof (read_buf), fp) != NULL)
        {
          read_buf[strcspn (read_buf, "\n")] = '\0';
          if (strcmp (read_buf, pname) == 0)
          {
            /* add to list and expand list if needed */
            pidlist[pidlist_index++] = atoi (entry->d_name);
            if (pidlist_index == PID_LIST_BLOCK * pidlist_realloc_count)
            {
              pidlist_realloc_count++;
              new_list = realloc (pidlist, sizeof (int) * PID_LIST_BLOCK * pidlist_realloc_count);
              if (new_list == NULL)
              {
                /* realloc failed: the old block is still ours to free */
                free (pidlist);
                fclose (fp);
                closedir (dirp);
                return NULL;
              }
              pidlist = new_list;
            }
          }
        }
        fclose (fp);
      }
    }
  }

  closedir (dirp);
  pidlist[pidlist_index] = -1; /* indicates end of list */
  return pidlist;
}

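int main (int argc, char *argv[])
{
  int *list, i;

  if (argc != 2)
  {
    printf ("Usage: %s proc_name\n", argv[0]);
    return 1;
  }
  list = pidof (argv[1]);
  if (list == NULL)
  {
    /* /proc could not be opened or memory ran out */
    return 1;
  }
  for (i=0; list[i] != -1; i++)
  {
    printf ("%d ", list[i]);
  }
  /* print the newline only if at least one PID was printed */
  if (i > 0)
  {
    printf ("\n");
  }
  free (list); /* free only after the last use of the list */
  return 0;
}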

Just to note, here I am storing the found PIDs in the list pidlist, which can grow dynamically. Initially I allocate a block of memory with room for PID_LIST_BLOCK elements. Whenever the number of matching processes fills the list, the existing block is extended by another PID_LIST_BLOCK elements using realloc. Other than that, nothing special is going on in this code.

Learning perl is also fun, therefore here is the same thing in perl.

#!/usr/bin/perl -w

use strict;
use warnings;

my @pid_list;

die ("Usage: $0 process_name\n") if (scalar (@ARGV) != 1);
my $target_name = $ARGV[0];
opendir (my $proc_dir, "/proc/") or die ("Cannot open \"/proc/\": $!\n");

while (defined (my $pid = readdir ($proc_dir)))
{
  if ($pid =~ m/^[0-9]+$/) #If purely an integer
  {
    #The process may have exited in the meantime, so a failed open is skipped
    if (open (my $comm_file, "<", "/proc/$pid/comm"))
    {
      my $command_name = <$comm_file>;
      close ($comm_file);
      next if (!defined ($command_name));
      chomp ($command_name); #remove trailing new line
      push (@pid_list, $pid) if ($command_name eq $target_name);
    }
  }
}
closedir ($proc_dir);

print (join (" ", @pid_list), "\n");

Run either program with one command line argument: the name of the process whose running instances are to be found.

About phoxis

This entry was posted in Computer Science, Linux Programming.