ASE ++ : Linux Kernel Programming


Introduction

The objective of this short course is to understand the internals of the Linux Kernel. More specifically we will see:

  • How to set up a development environment for programming Linux kernel code on a PC. Since the course is short, we will only cover programming on a PC (for embedded programming the tools are slightly different).
  • What is a kernel module, and how to write a simple one
  • How scheduling works in Linux
  • How to debug and trace kernel code.

The course is not exhaustive; we will just touch the surface of some topics. However, it is a good starting point for people who would like to pursue kernel programming further. It is also useful for students interested in other topics, because it gives an overview of the internal workings of the kernel, and of the many difficulties that you can run into when developing system code.

Setting up the environment

Setting a VM

In this course we will use kvm as the virtualisation solution. For more information about running a VM with kvm you can look at the man page of qemu:

man qemu

In fact, kvm is a simple wrapper around qemu-kvm, which runs a virtual machine natively on Linux. To run kvm you must enable virtualisation in the BIOS (Intel VT-x or AMD AMD-V). If it is not possible to enable virtualisation, you can still run qemu in emulation mode (but it will be much slower).

To simplify the task of running VM with qemu and setting all the right options, I prepared a Debian image for the course, and a script that you can use to run your virtual machine with Linux. The Debian image and the script are available at /kvm/debian32.img and /start_debian on your student account at FIL.

A copy of this image file is available here.

The start_debian script is quite complex to parse, so here are a few explanations of how it works. I will explain the parameters of the VM using two simpler scripts:

  • kvm-std.sh runs the machine with its own kernel (version 3.16.3):

    #!/bin/bash
    SMP=${2-2}
    
    kvm -smp $SMP -m 2048 -boot c \
      -hda debian32.img \
      -net nic -net user,hostfwd=tcp::10022-:22 \
      -serial stdio
    
  • The second script, kvm-term.sh, uses a kernel that you compiled yourself:

    #!/bin/bash
    KERNEL=${1-/home/lipari/devel/linux-3.19/build/kvm-32/arch/x86/boot/bzImage}
    #KERNEL=${1-/home/lipari/devel/linux-3.16.3/arch/x86/boot/bzImage}
    SMP=${2-2}
    
    kvm -smp $SMP -m 2048 -boot c \
      -hda debian32.img \
      -net nic -net user,hostfwd=tcp::10022-:22 \
      -serial stdio \
      -kernel $KERNEL -append "root=/dev/sda1 console=ttyS0 rw" \
      -display none
    

    The only thing you need to modify is the value of the KERNEL variable: it must point to the location on your hard disk of the compiled kernel you wish to use.

Notice that TCP port 22 of the guest (used by ssh) has been "forwarded" to a port on the host: 10022 in the example scripts above, and 4022 in the start_debian script used for the course (this is the port used in the examples below). Therefore, when you connect to 127.0.0.1:4022 on the local machine, you are actually forwarded to the virtual machine.

Now, to run the VM you need to

  • copy the start_debian script in your home dir, so you can modify it at will;
  • launch it from your home directory.

The VM that I prepared has two users:

  • user root with password ase++ and
  • user ase with password ase++.

To ssh to a running guest do:

ssh ase@127.0.0.1 -p 4022


Compiling the kernel sources

As shown in the previous section, you need to compile the kernel by yourself. The first thing you need to do is to download a kernel from https://www.kernel.org/. I suggest version 4.1.19.

You can also try to download the latest bleeding-edge development version from the git repository, and switch to the development branch of the scheduler subsystem:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git checkout -b tip-sched-core origin/sched/core

Once you have the kernel sources on your file system, you need to compile the kernel. Since it is not so easy to configure the kernel properly for use with the VM, I provide a config file already configured for you. You can take this as a starting point for your configurations.

kvm32.config

Save this file in the Linux source directory, overwriting the existing .config file (if any). Then type

make menuconfig

to start the configuration, and select the options that you would like to enable. For the time being you do not need to change anything; we will come back to this step later in the course.

To compile the kernel, you can specify the output directory where you want to put your compiled objects for a certain architecture. For example, you could think of a directory structure as follows:

(Figure dir-tree.png: the kernel source tree, e.g. linux-4.1.19/, next to a separate build/kvm32/ directory that holds the compiled objects.)

In this case,

  • just create the directory structure build/kvm32/
  • copy kvm32.config into that directory with the name .config
  • run
make O=../build/kvm32 -j8

from within the linux-4.1.19 directory. If everything goes well, you will find the file bzImage in the directory build/kvm32/arch/x86/boot/.

Transferring files to the virtual machine

To transfer files to the virtual machine, there are two methods: a) using SSH to connect to a running virtual machine, or b) mounting the virtual machine image through a loopback device.

SSH access

(This is the suggested method.) To copy files, you can use the scp command as follows:

scp -P 4022 <files_to_copy>  ase@127.0.0.1:<destination_path>

Loop back device

To mount the image through a loopback device, first find the start sector of the partition:

fdisk -u -l /path/to/debian.img

Then you can compute:

offset = sector size * start sector

Normally that would be 32256 or 1048576. Then you need to mount the loop back device:

mount -o loop,offset=1048576 /path/to/debian.img /path/to/mountpoint

At this point, you can access the debian disk as if it were a local device mounted in /path/to/mountpoint.

Writing a kernel module

A loadable kernel module is a software module that can be dynamically loaded into kernel space and interact with the kernel. It is, from all points of view, kernel code: it runs in kernel space with kernel privileges, and it can access all kernel symbols. The only difference is that, instead of being statically linked into the kernel image at compile/link time, it can be compiled separately and later loaded and linked with the rest of the kernel code.

Usually, kernel modules provide access to device drivers and extra functionality that is not central to the kernel. They also make it easy for programmers to customise kernel features.

Books and other material

A very good introduction to the art of writing kernel modules and device drivers is found in the following book:

Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman "Linux Device Drivers".

The book is freely available for download, and I strongly recommend it to anybody willing to be introduced to kernel programming. See in particular Chapter 2 for an introduction to module programming.

Here I am going to provide a short tutorial, taken in part from the book and in part from various web sites.

Another interesting resource is the "Crash course on kernel programming", by Robert P.J. Day:

I also recommend the book "Linux Kernel Development", by Robert Love, for a good explanation of the internals of the Linux kernel.

Finally, a very useful reference for searching and exploring the Linux kernel code is the following:

The makefile

First of all, it is important to understand how to compile a module. Please refer to the following makefile.

# If KERNELRELEASE is defined, we've been invoked from the
# kernel build system and can use its language.
ifneq ($(KERNELRELEASE),)
        obj-m := hello.o
# Otherwise we were called directly from the command
# line; invoke the kernel build system.
else
        KERNELDIR ?= <kernel source dir>
        BUILDDIR  ?= <kernel build dir>
        PWD := $(shell pwd)
default:
        $(MAKE) -C $(KERNELDIR) O=$(BUILDDIR) M=$(PWD) modules
endif

You can download the Makefile from here.

Of course, you need to adjust the KERNELDIR and BUILDDIR variables to point to the root of your kernel sources, and to your build directory, respectively.

From the LDD book:

This makefile is read twice on a typical build. When the makefile is invoked from the command line, it notices that the KERNELRELEASE variable has not been set. It locates the kernel source directory by taking advantage of the fact that the symbolic link build in the installed modules directory points back at the kernel build tree.

If you are not actually running the kernel that you are building for, you can supply a KERNELDIR option on the command line, set the KERNELDIR environment variable, or rewrite the line that sets KERNELDIR in the makefile.

Once the kernel source tree has been found, the makefile invokes the default target, which runs a second make command (parameterized in the makefile as $(MAKE)) to invoke the kernel build system as described previously. On the second reading, the makefile sets obj-m, and the kernel makefiles take care of actually building the module.

This mechanism for building modules may strike you as a bit unwieldy and obscure. Once you get used to it, however, you will likely appreciate the capabilities that have been programmed into the kernel build system. Do note that the above is not a complete makefile; a real makefile includes the usual sort of targets for cleaning up unneeded files, installing modules, etc.

Our first module

Here is the first module example, which simply prints a hello world message in the kernel log.

#include <linux/init.h>
#include <linux/module.h>
MODULE_LICENSE("Dual BSD/GPL");

static int hello_init(void)
{
  printk(KERN_ALERT "Hello, world\n");
  return 0;
}

static void hello_exit(void)
{
  printk(KERN_ALERT "Goodbye, cruel world\n");
}

module_init(hello_init);
module_exit(hello_exit);

The printk function prints a string to various outputs, depending on the first macro, which is the message priority. There are 8 possible priorities; they can be found in kernel.h along with their explanation. If the priority is (numerically) less than the console_loglevel variable, the message is also printed on the current console. In any case, the message goes to the kernel log, which can be displayed with the command:

dmesg
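
For reference, here is a minimal sketch of the eight priority macros in use, from the most to the least severe (the function show_printk_levels is hypothetical; you could call it, for instance, from hello_init() of the module above):

#include <linux/kernel.h>   /* printk() and the KERN_* priority macros */

/* Illustrative only: one message at each of the 8 priority levels. */
static void show_printk_levels(void)
{
    printk(KERN_EMERG   "KERN_EMERG:   system is unusable\n");
    printk(KERN_ALERT   "KERN_ALERT:   action must be taken immediately\n");
    printk(KERN_CRIT    "KERN_CRIT:    critical condition\n");
    printk(KERN_ERR     "KERN_ERR:     error condition\n");
    printk(KERN_WARNING "KERN_WARNING: warning condition\n");
    printk(KERN_NOTICE  "KERN_NOTICE:  normal but significant condition\n");
    printk(KERN_INFO    "KERN_INFO:    informational message\n");
    printk(KERN_DEBUG   "KERN_DEBUG:   debug-level message\n");
}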

By compiling the hello module above you obtain a hello.ko file, which is an object file ready to be loaded into the kernel. To load the module, type:

insmod hello.ko

Loading a module requires superuser privileges, so you may want to run it with sudo, or just as root.

To remove the module, you type:

rmmod hello

again as superuser.

A few things to notice:

  • You are inside the kernel. Therefore, you cannot use glibc: no printf() (but you can use printk), no scanf, no fopen, no malloc (but you can use kmalloc), etc. You need to use the equivalent functions available in the kernel.
  • module_init(hello_init) tells the kernel which function to call upon loading the module. This function contains initialisation code for the module environment.
  • module_exit(hello_exit) tells the kernel which function to call upon removal of the module.
  • Notice that there is no main(). In fact, modules do not run as normal user processes. Typically, a module contains functions that are called by the kernel upon the occurrence of certain events. Therefore, if we want to write something meaningful,
    • first of all we need to understand what kind of events happen in the kernel,
    • then, understand which events are useful for us,
    • finally, we need to write and install some function to be called when a certain event happens.

This module has no special function except the ones for initialisation and cleanup.

Symbols

Which symbols can you access from your module? Here is a quick explanation.

To list all kernel symbols:

cat /proc/kallsyms

On each line, the symbol name is preceded by

  • the address in memory
  • a character [DdSsTt], with the following meaning:

    • D or d The symbol is in the initialised data section.
    • S or s The symbol is in an uninitialised data section for small objects.
    • T or t The symbol is in the text (code) section.

    Uppercase symbols are global/exported; lowercase are local unexported symbols.
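
Note that a module can only link against symbols that the kernel, or other loaded modules, explicitly export with EXPORT_SYMBOL (or EXPORT_SYMBOL_GPL). As a minimal sketch (the function name my_shared_function is hypothetical), this is how a module makes one of its own functions callable from other modules; once the module is loaded, you can look the symbol up in /proc/kallsyms:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

/* Hypothetical helper made available to other modules. */
int my_shared_function(int x)
{
    return x + 1;
}
EXPORT_SYMBOL(my_shared_function);

static int __init exporter_init(void)
{
    printk(KERN_INFO "exporter: my_shared_function is now exported\n");
    return 0;
}

static void __exit exporter_exit(void)
{
    printk(KERN_INFO "exporter: unloaded\n");
}

module_init(exporter_init);
module_exit(exporter_exit);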

Interacting with the module

The module runs in kernel space, that is, in a different memory space than normal user programs. It cannot use normal glibc functions (such as fprintf() and fscanf()) or other user-space libraries. Moreover, there is no main() function, so module functions are executed only in response to events.

So, how can we communicate with the module?

There are several possibilities (syscalls, for example). Here we will study the possibility to interact through a "file" in the /proc file system.

What follows is an example of a module which prints the kernel jiffies (i.e. a counter that is incremented at every kernel tick). To activate the feature, you first have to write '1' to the file /proc/crash_jiffies; you can disable it again by writing '0' to the same file.

This module has been adapted from a similar one in the Crash Course web-site by R. P.J. Day.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/jiffies.h>
#include <linux/string.h>
#include <asm/uaccess.h>

#define JIFFIES_BUFFER_LEN 4
static char jiffies_buffer[JIFFIES_BUFFER_LEN];
static int  jiffies_flag = 0;

static int 
jiffies_proc_show(struct seq_file *m, void *v)
{
    if (jiffies_flag)
        seq_printf(m, "%llu\n",
                   (unsigned long long) get_jiffies_64());
    return 0;
}

static int 
jiffies_proc_open(struct inode *inode, struct file *file)
{
    return single_open(file, jiffies_proc_show, NULL);
}

static ssize_t
jiffies_proc_write(struct file *filp, const char __user *buff,
                   size_t len, loff_t *data)
{
    long res;
    printk(KERN_INFO "JIFFIES: Write has been called\n");
    if (len > (JIFFIES_BUFFER_LEN - 1)) {
        printk(KERN_INFO "JIFFIES: error, input too long\n");
        return -EINVAL;
    }
    else if (copy_from_user(jiffies_buffer, buff, len)) {
        return -EFAULT;
    }
    /* make sure the buffer is a NUL-terminated string */
    jiffies_buffer[len] = 0;

    if (kstrtol(jiffies_buffer, 0, &res))
        return -EINVAL;
    jiffies_flag = res;

    return len;
}

static const struct file_operations jiffies_proc_fops = {
    .owner      = THIS_MODULE,
    .open       = jiffies_proc_open,
    .read       = seq_read,
    .write      = jiffies_proc_write,
    .llseek     = seq_lseek,
    .release    = single_release,
};

static int __init
jiffies_proc_init(void)
{
    proc_create("crash_jiffies", 0666, NULL, &jiffies_proc_fops);
    return 0;
}

static void __exit
jiffies_proc_exit(void)
{
    remove_proc_entry("crash_jiffies", NULL);
}

module_init(jiffies_proc_init);
module_exit(jiffies_proc_exit);

MODULE_AUTHOR("Modified by Giuseppe Lipari from Robert P. J. Day, http://crashcourse.ca");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("A jiffies /proc file.");

A few comments:

  • The init function jiffies_proc_init just creates an entry in the /proc directory, with name crash_jiffies. This will be seen as a file by the user.
  • When creating the entry, we specify a file_operations data structure, which contains a set of pointers to functions. Each function is a callback that is going to be called when the user performs some operation on the proc file. For example, if the user tries to open the file, the jiffies_proc_open function is called. Notice in particular seq_read and seq_lseek (which are functions that already exist in the Linux kernel) and jiffies_proc_write (which is a function provided by the module).

    You do not need to specify all possible function pointers in the data structure.

  • The jiffies_proc_open function calls single_open(), which sets up a "sequence file" structure and installs the function jiffies_proc_show, which provides the content of the file when needed.
  • The jiffies_proc_show function just prints the current number of jiffies (a 64-bit integer) to the sequence structure with seq_printf().
  • For more information on sequence files and why they are useful, please refer to this web page.
  • When the user writes to the proc file, function jiffies_proc_write is called, which copies the data from user space to the internal buffer jiffies_buffer by using the copy_from_user function.

    WARNING: pay attention to the length of the data to be copied. It is absolutely necessary to not exceed the buffer length, otherwise the kernel will crash unpredictably!

  • Then the buffer content is terminated with 0, and transformed into a long with kstrtol.
  • Notice that the jiffies_proc_show prints on the file only if the flag is set to 1.
  • After compiling and installing the module, you can test it with

    cat /proc/crash_jiffies
    echo '1' > /proc/crash_jiffies
    cat /proc/crash_jiffies
    echo '0' > /proc/crash_jiffies
    cat /proc/crash_jiffies
    

    You should see nothing on the first cat, the current jiffies value on the second cat, and nothing again on the third one.

KProbes

Another thing that you can do is intercept kernel functions, or arbitrary kernel code, by using probes. The full documentation of Kprobes is available in the kernel documentation, e.g. https://www.kernel.org/doc/Documentation/kprobes.txt

In particular, jprobes are useful for intercepting calls to a specific kernel function. (Note that jprobes have been removed from recent kernels; they are still available in the kernel versions used in this course.) For example, you can install a jprobe with the following code snippet:

static struct jprobe my_jprobe = {
    .kp = {
        .symbol_name = "sched_setscheduler",
    },
    .entry = (kprobe_opcode_t *) my_callback
};

// ...
// within the init function
register_jprobe(&my_jprobe);
printk(KERN_ALERT "plant jprobe at %p, handler addr %p\n",
       my_jprobe.kp.addr, my_jprobe.entry); 
// ...

If the register_jprobe() is successful, function my_callback will be called just before the sched_setscheduler function is called within the kernel, and you will be able to access the same arguments as the original function. In fact, my_callback must have the same prototype as the original sched_setscheduler function.
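
As a minimal sketch (the prototype below matches sched_setscheduler in the kernel versions used in this course, but you should verify it in your own kernel sources, as the exercise asks), the handler could look as follows; note that a jprobe handler must always end by calling jprobe_return():

#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

/* Same prototype as the probed function sched_setscheduler(). */
static int my_callback(struct task_struct *p, int policy,
                       const struct sched_param *param)
{
    printk(KERN_INFO "jprobe: sched_setscheduler(pid=%d, policy=%d)\n",
           p->pid, policy);
    /* A jprobe handler never returns normally: jprobe_return() gives
       control back to the probed function. */
    jprobe_return();
    return 0; /* never reached */
}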

Exercise:

  1. Search the kernel sources to locate the function and identify the prototype of sched_setscheduler.
  2. Write a simple kernel module that installs a callback as a jprobe; when the callback is called it should printk something.
  3. Write a simple C program that creates a Pthread and calls sched_setscheduler on it (a sketch is given below). When you execute the program, you should see the printk (on the screen or in the logs, depending on the priority level).
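
For point 3, a minimal user-space sketch could look like the following (run it as root so that switching to SCHED_FIFO is allowed, and compile it with gcc -pthread; on Linux, sched_setscheduler(0, ...) applies to the calling thread):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* The new thread changes its own scheduling policy, which ends up
   invoking the kernel function probed by the module. */
static void *thread_body(void *arg)
{
    struct sched_param sp = { .sched_priority = 10 };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
        perror("sched_setscheduler");
    else
        printf("thread: now running with SCHED_FIFO\n");

    sleep(1);
    return NULL;
}

int main(void)
{
    pthread_t th;

    pthread_create(&th, NULL, thread_body, NULL);
    pthread_join(th, NULL);
    return 0;
}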


The project

The goal of this project is to write a kernel module which monitors the execution time of a task on the different cores of a processor. Here are the requirements:

  1. The module initially installs file /proc/ase_cmd in the proc file system, and a directory /proc/ase/ which will contain the output of the module.
  2. Then it waits for the user to write a valid process ID (PID) to the file (for example with echo). From then on, it starts to track the process execution time in an internal data structure. It also creates a file with a name equal to the process ID in the directory /proc/ase/
  3. When the user reads from this file, the module outputs the total execution time of the process so far.
  4. When the process terminates, the module must detect this, remove the corresponding file from the /proc/ase directory, and free the corresponding memory for the data structure. This can be done by intercepting the do_exit function of the Linux kernel (see the sketch after this list).
  5. You can think of giving an additional command in /proc/ase/ to stop tracking a certain process. In that case, even if the process has not yet terminated, the corresponding data structures are destroyed, and the memory is freed.
  6. Track the execution time of a process on each core. This means that, if the process never migrates, the run time on all but one of the cores must be zero. Write a test using affinities that verifies that this is indeed the case (hint: you must intercept the appropriate functions to identify when a process migrates; read https://www.systutorials.com/239971/migration-thread-works-inside-linux-kernel/ to understand how migration works in the kernel).
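
For point 4, here is a minimal sketch of how do_exit can be intercepted with a plain kprobe (this shows only the interception mechanism, not a solution to the project; the handler name my_do_exit_handler is hypothetical). In the pre-handler, current is the task that is about to exit:

#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

/* Called just before do_exit() runs; current is the exiting task. */
static int my_do_exit_handler(struct kprobe *kp, struct pt_regs *regs)
{
    printk(KERN_INFO "ase: task %d (%s) is exiting\n",
           current->pid, current->comm);
    return 0;
}

static struct kprobe do_exit_probe = {
    .symbol_name = "do_exit",
    .pre_handler = my_do_exit_handler,
};

/* In the module init function:   register_kprobe(&do_exit_probe);   */
/* In the module exit function:   unregister_kprobe(&do_exit_probe); */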

Author: G. Lipari

Created: 2017-05-15 Mon. 16:25
