Linux Namespaces

Starting from kernel 2.6.24, there are 6 different types of Linux namespaces. Namespaces are useful in isolating processes from the rest of the system, without needing to use full low level virtualization technology. CLONE_NEWIPC: IPC Namespaces: SystemV IPC and POSIX Message Queues can be isolated. CLONE_NEWPID: PID Namespaces: PIDs are isolated, meaning that a PID inside of the namespace can conflict with a PID outside of the namespace. PIDs inside the namespace will be mapped to other PIDs outside of the namespace. The first PID inside the namespace will be ‘1’ which outside of the namespace is assigned to init CLONE_NEWNET: Network Namespaces: Networking (/proc/net, IPs, interfaces and routes) are isolated. Services can be run on the same ports within namespaces, and “duplicate” virtual interfaces can be created. CLONE_NEWNS: Mount Namespaces. We have the ability to isolate mount points as they appear to processes. Using mount namespaces, we can achieve similar functionality to chroot() however with improved security. CLONE_NEWUTS: UTS Namespaces. This namespaces primary purpose is to isolate the hostname and NIS name. CLONE_NEWUSER: User Namespaces. Here, user and group IDs are different inside and outside of namespaces and can be duplicated. Let’s look first at the structure of a C program, required to demonstrate process namespaces. The following has been tested on Debian 6 and 7. First, we need to allocate a page of memory on the stack, and set a pointer to the end of that memory page. We use alloca to allocate stack memory rather than malloc which would allocate memory on the heap. void *mem = alloca(sysconf(_SC_PAGESIZE)) + sysconf(_SC_PAGESIZE); Next, we use clone to create a child process, passing the location of our child stack ‘mem’, as well as the required flags to specify a new namespace. We specify ‘callee’ as the function to execute within the child space: mypid = clone(callee, mem, SIGCHLD | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | CLONE_FILES, NULL); After calling clone we then wait for the child process to finish, before terminating the parent. If not, the parent execution flow will continue and terminate immediately after, clearing up the child with it: while (waitpid(mypid, &r, 0) < 0 && errno == EINTR) { continue; } Lastly, we’ll return to the shell with the exit code of the child: if (WIFEXITED(r)) { return WEXITSTATUS(r); } return EXIT_FAILURE; Now, let’s look at the callee function: static int callee() { int ret; mount("proc", "/proc", "proc", 0, ""); setgid(u); setgroups(0, NULL); setuid(u); ret = execl("/bin/bash", "/bin/bash", NULL); return ret; } Here, we mount a /proc filesystem, and then set the uid (User ID) and gid (Group ID) to the value of ‘u’ before spawning the /bin/bash shell. […]