EmbLogic's Blog

Understanding Linux InterProcess Communication-II: Process Management

Before we discuss pipes and FIFOs, note that there is a difference between an “Inter-process Communication Mechanism” and an “Inter-process Communication Resource/Facility”, though it is difficult to draw a sharp line between the two.

Pipes and FIFOs are “Inter-process Communication Mechanisms” while semaphores, message queues and shared memory segments are “Inter-process Communication Resources”.

A useful way to remember the difference: Inter-process Communication Mechanisms emphasize “how and why” data communication occurs between two User Mode processes, whereas Inter-process Communication Resources pursue the same objective in a more polished manner, by implementing the functionality through programming interfaces (most of the time, rather complex ones!).

To understand pipes and FIFOs, we first have to look at how a process is created in the Linux operating system.

Linux is a very dynamic system with constantly changing computing needs. The representation of the computational needs of Linux centers around the common abstraction of the process. Processes can be short-lived (a command executed from the command line) or long-lived (a network service). For this reason, the general management of processes and their scheduling is very important.

From user-space, processes are represented by process identifiers (PIDs). From the user’s perspective, a PID is a numeric value that uniquely identifies the process. A PID doesn’t change during the life of a process, but PIDs can be reused after a process dies, so it’s not always ideal to cache them.

In user-space, you can create processes in any of several ways. You can execute a program (which results in the creation of a new process) or, within a program, you can call fork to create a new process and then one of the exec functions to run a different program in it.

Process representation

Within the Linux kernel, a process is represented by a rather large structure called task_struct. This structure contains all of the necessary data to represent the process, along with a plethora of other data for accounting and to maintain relationships with other processes (parents and children).

A full description of the task_struct is beyond the scope of this article, but a portion of task_struct is shown here. This code contains the specific elements this article explores. Note that task_struct resides in ./linux/include/linux/sched.h.

A small portion of task_struct is reproduced below:

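struct task_struct {
    volatile long state;              /* run state (runnable, sleeping, ...) */
    void *stack;                      /* kernel stack of the process */
    unsigned int flags;               /* per-process flags */
    int prio, static_prio;            /* dynamic and static priority */
    struct list_head tasks;           /* links into the circular task list */
    struct mm_struct *mm, *active_mm; /* address space descriptors */
    pid_t pid;                        /* process ID */
    pid_t tgid;                       /* thread group ID */
    struct task_struct *real_parent;  /* parent process */
    char comm[TASK_COMM_LEN];         /* executable name */
    struct thread_struct thread;      /* architecture-specific CPU state */
    struct files_struct *files;       /* open file table */
    /* ... remaining fields omitted ... */
};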

Process Management

Now, let’s explore how you manage processes within Linux. In most cases, processes are dynamically created and represented by a dynamically allocated task_struct. One exception is the init process itself, which always exists and is represented by a statically allocated task_struct. You can see an example of this in ./linux/arch/i386/kernel/init_task.c.

Let's take an example. If you have two terminal windows showing on your screen, then you are probably running the same terminal program twice, so you have two terminal processes. Each terminal window is probably running a shell; each running shell is another process. When you invoke a command from a shell, the corresponding program is executed in a new process; the shell process resumes when that process completes.

All processes in Linux are collected in two different ways. The first is a hash table, which is hashed by the PID value; the second is a circular doubly linked list. The circular list is ideal for iterating through the task list.

Process IDs are positive integers that Linux assigns sequentially as new processes are created. They were historically limited to 16-bit values (a default maximum of 32768); on current kernels the limit is configurable through /proc/sys/kernel/pid_max.
Every process also has a parent process (except the special init process). Thus, you can think of the processes on a Linux system as arranged in a tree, with the init process at its root. The parent process ID, or ppid, is simply the process ID of the process’s parent.

When referring to process IDs in a C program, always use the pid_t typedef, which is defined in <sys/types.h>. A program can obtain the process ID of the process it’s running in with the getpid() system call, and it can obtain the process ID of its parent process with the getppid() system call.
Let's see an example of getpid() and getppid():

#include <stdio.h>
#include <unistd.h>

int main ()
{
    printf ("The process ID is %d\n", (int) getpid ());
    printf ("The parent process ID is %d\n", (int) getppid ());
    return 0;
}

If you invoke this program several times, a different process ID is reported because each invocation is in a new process. However, if you invoke it every time from the same shell, the parent process ID (that is, the process ID of the shell process) is the same.

Viewing Active Processes

The ps command displays the processes that are running on your system. The GNU/Linux version of ps has lots of options because it tries to be compatible with versions of ps on several other UNIX variants. These options control which processes are listed and what information about each is shown.

[root@localhost ~]# ps
  PID TTY          TIME CMD
21693 pts/8    00:00:00 bash
21694 pts/8    00:00:00 ps

Killing a Process

You can kill a running process with the kill command. Simply specify on the command line the process ID of the process to be killed.

Creating Processes

Two common techniques are used for creating a new process.

  • The first, using system(), is relatively simple but should be used sparingly because it is inefficient and carries considerable security risks.
  • The second, using fork() and exec(), is more complex but provides greater flexibility, speed, and security.

Using system() call

The system function in the standard C library provides an easy way to execute a command from within a program, much as if the command had been typed into a shell.
In fact, system creates a subprocess running the standard Bourne shell (/bin/sh) and hands the command to that shell for execution.

For example, the following program invokes the ls command to display the contents of the root directory, as if you had typed ls -l / into a shell.

#include <stdlib.h>

int main ()
{
    int return_value;

    /* Run "ls -l /" through /bin/sh and pass its status back to the caller. */
    return_value = system ("ls -l /");
    return return_value;
}

Because the system function uses a shell to invoke your command, it’s subject to the features, limitations, and security flaws of the system’s shell. You can’t rely on the availability of any particular version of the Bourne shell.

Using fork and exec

Linux provides one function, fork, that makes a child process that is an exact copy of its parent process. Linux provides another set of functions, the exec family, that causes a particular process to cease being an instance of one program and to instead become an instance of another program.

To spawn a new process, you first use fork to make a copy of the current process. Then you use exec to transform one of these processes into an instance of the program you want to spawn.

Calling fork()

The general syntax of the fork() call is:

#include <unistd.h>

pid_t fork(void);

fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent, except for the following points:

  • The child has its own unique process ID, and this PID does not match the ID of any existing process group.
  • The child’s parent process ID is the same as the parent’s process ID.
  • The child does not inherit its parent’s memory locks.
  • Process resource utilizations and CPU time counters are reset to zero in the child.
  • The child’s set of pending signals is initially empty.
  • The child does not inherit semaphore adjustments from its parent.
  • The child does not inherit timers from its parent.

The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects.

The child inherits copies of the parent’s set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).

The child inherits copies of the parent’s set of open message queue descriptors (see mq_overview(7)). Each descriptor in the child refers to the same open message queue description as the corresponding descriptor in the parent. This means that the two descriptors share the same flags (mq_flags).

So how do the two processes differ? First, the child process is a new process and therefore has a new process ID, distinct from its parent’s process ID. One way for a program to distinguish whether it’s in the parent process or the child process is to call getpid.

However, the fork function also provides different return values to the parent and child processes: one process “goes in” to the fork call, and two processes “come out,” with different return values. The return value in the parent process is the process ID of the child. The return value in the child process is zero. Because no process ever has a process ID of zero, this makes it easy for the program to determine whether it is now running as the parent or the child process. If fork fails, it returns -1 in the parent and no child is created.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main ()
{
    pid_t child_pid;

    printf ("the main program process ID is %d\n", (int) getpid ());
    fflush (stdout);   /* keep buffered output from being copied into the child */
    child_pid = fork ();
    if (child_pid == -1)
    {
        /* fork failed: no child was created */
        perror ("fork");
        return 1;
    }
    if (child_pid != 0)
    {
        /* parent: fork returned the child's process ID */
        printf ("this is the parent process, with id %d\n", (int) getpid ());
        printf ("the child's process ID is %d\n", (int) child_pid);
    }
    else
        /* child: fork returned zero */
        printf ("this is the child process, with id %d\n", (int) getpid ());
    return 0;
}

In our next article we will cover the exec family and how to combine exec with fork.

Sources:

  • Linux man pages
  • Advanced Linux Programming by Mark Mitchell, Jeffrey Oldham, and Alex Samuel
