# ASE ++ : Linux Kernel Programming

## ChangeLog

• 8 April 2018: I removed the section on KProbes, since they have been deprecated in the kernel. Consequently the project has been slightly changed (simplified). I also added an additional section of explanations and hints on how to progress in the project.
• 25 March 2018: This file has been updated for the 2017-2018 course.

## Introduction

The objective of this short course is to understand the internals of the Linux Kernel. More specifically we will see:

• How to set up a development environment for programming Linux kernel code on a PC. Since the course is short, we will only see programming on a PC (for embedded programming the tools to be used are slightly different).
• What is a kernel module, and how to write a simple one
• How scheduling works in Linux
• How to debug and trace kernel code.

The course is not exhaustive; we will just touch the surface of some topics. However, it is a good starting point for people who would like to pursue the topic of kernel programming. It is also useful for students interested in other topics, because it gives an overview of the internal workings of the kernel, and of the many difficulties that you can encounter when developing system code.

## Setting up the environment

### Setting a VM

In this course we will use kvm as the virtualisation solution. For more information about running a VM with kvm, you can look at the man page of qemu:

man qemu



In fact, kvm is a simple wrapper around qemu-kvm, which runs a virtual machine natively on Linux. To run kvm you must enable virtualisation in the BIOS (Intel VT-x or AMD-V). If it is not possible to enable virtualisation, you can still run qemu in emulation mode (but it will be much slower).

To simplify the task of running a VM with qemu and setting all the right options, I prepared a Debian image for the course, and a script that you can use to run your virtual machine with Linux. The Debian image and the script are available at /local/debian32.img.

A copy of this image file is available here.

Here are two scripts for launching the virtual machine:

• kvm-std.sh runs the machine with its own kernel (version 3.16.3):

#!/bin/bash
SMP=${2-2}

kvm -smp $SMP -m 2048 -boot c \
    -hda debian32.img \
    -net nic -net user,hostfwd=tcp::10022-:22 \
    -serial stdio

• The second script, kvm-term.sh, uses a kernel that you compiled yourself:

#!/bin/bash
KERNEL=${1-/home/lipari/devel/linux-3.19/build/kvm-32/arch/x86/boot/bzImage}
SMP=${2-2}

kvm -smp $SMP -m 2048 -boot c \
    -hda debian32.img \
    -net nic -net user,hostfwd=tcp::10022-:22 \
    -serial stdio \
    -kernel $KERNEL -append "root=/dev/sda1 console=ttyS0 rw" \
    -display none


The only thing you need to modify is the value of the KERNEL variable: it must point to the location on your hard disk of the compiled kernel you wish to use.

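Notice that TCP port 22 of the guest (used by ssh) is "forwarded" to a port of the host through the hostfwd option. The scripts above forward it to port 10022, whereas the ssh and scp commands below use port 4022: make sure the two match, either by writing hostfwd=tcp::4022-:22 in your copy of the script, or by using port 10022 in the commands. Once they match, when you connect to 127.0.0.1 on that port on the local machine, you are actually forwarded to the virtual machine.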

Now, to run the VM you need to

• copy the script into your home directory, so you can modify it at will;
• launch it from your home.

The VM that I prepared has two users:

• user root with password ase++ and
• user ase with password ase++.

To ssh to a running guest do:

ssh ase@127.0.0.1 -p 4022



### Compiling the kernel sources

As shown in the previous section, you need to compile the kernel by yourself. The first thing you need to do is to download a kernel from https://www.kernel.org/. I suggest version 4.15.13.

Once you have the kernel sources on your file system, you need to compile the kernel. Since it is not so easy to configure the kernel properly for use with the VM, I provide a config file already configured for you. You can take this as a starting point for your configurations.

Save this file in your Linux directory.

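To compile the kernel, you can specify the output directory where you want to put your compiled objects for a certain architecture. For example, you could keep the sources in linux-4.15.13/ and the compiled objects in a sibling directory build/kvm32/, which is the layout assumed by the commands below.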

Go inside the Linux directory (linux-4.15.13) and type

make O=../build/kvm32 menuconfig



to start the configuration. Select <Load>, and type the name of the config file (kvm32.config). After the configuration has been loaded correctly, select the options that you would like to enable. For the time being you do not need to change anything. Then, select <Save> and save the configuration under the name .config. This is the default name for a configuration file in Linux. Finally, select <Exit>: this will create a Makefile in the build directory, and it will copy the .config file to the right place.

Finally, to compile the kernel, type

make O=../build/kvm32 -j8 bzImage



from within the linux-4.15.13 directory. If everything goes well, you will find the file bzImage in the directory build/kvm32/arch/x86/boot/. Test the kernel by launching the virtual machine with the kvm-term.sh script, in which you have updated the KERNEL variable with the correct path.

#### Bleeding edge

You can also try to download the latest bleeding edge development version from the git repository, and switch to the development branch for the real-time subsystem:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
cd tip
git checkout -b tip-sched-core origin/sched/core



Configuration and compilation are the same.

### Transferring files to the virtual machine

To transfer files to the virtual machine, there are two methods: a) using SSH to a running virtual machine, or b) mounting the virtual machine image using a loop back device.

#### SSH access

This is the suggested method. To copy files, you can use the scp command as follows:

scp -P 4022 <files_to_copy>  ase@127.0.0.1:<destination_path>



#### Loop back device

To mount the image through a loop back device, first find the start sector of the partition:

fdisk -u -l /path/to/debian.img



Then you can compute:

offset = sector size * start sector



Normally that would be 32256 (start sector 63) or 1048576 (start sector 2048) with 512-byte sectors. Then you need to mount the loop back device:

mount -o loop,offset=1048576 /path/to/debian.img /path/to/mountpoint



At this point, you can access the Debian disk as if it were a local device mounted in /path/to/mountpoint.
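As a small worked example of the offset computation above, the following C helper multiplies the sector size by the start sector reported by fdisk. The function name loop_offset is an assumption made for this sketch; it is not part of any existing tool.

```c
#include <stdint.h>

/*
 * Byte offset of a partition inside a disk image.
 *   sector_size:  size of one sector in bytes (usually 512)
 *   start_sector: first sector of the partition, as listed by fdisk -u -l
 * The name loop_offset is hypothetical, chosen for this sketch only.
 */
uint64_t loop_offset(uint64_t sector_size, uint64_t start_sector)
{
    return sector_size * start_sector;
}
```

With 512-byte sectors, a start sector of 63 gives 32256 and a start sector of 2048 gives 1048576, which is the value to pass to mount in the offset= option.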

## Writing a kernel module

A loadable kernel module is a software module that can be dynamically loaded into kernel space and interact with the kernel. It is, from all points of view, kernel code which runs in kernel space with kernel privileges, and which can access all the symbols exported by the kernel. The only difference is that, instead of being statically linked into the kernel image at compilation and linking time, it can be compiled separately and later loaded and linked with the rest of the kernel code.

Usually, kernel modules provide device drivers and extra functionality that is not central to the kernel. They also give programmers an easy way to customise the kernel features.

### Books and other material

A very good introduction to the art of writing kernel modules and device drivers is found in the book "Linux Device Drivers" (LDD).

The book is freely available for download, and I strongly recommend it to anybody willing to be introduced to kernel programming. See in particular Chapter 2 for an introduction to module programming.

Here I am going to provide a short tutorial, taken in part from the book and in part from various web sites.

Another interesting resource is the "Crash course on kernel programming", by Robert P.J. Day (http://crashcourse.ca).

I also recommend the book "Linux Kernel Development", by Robert Love, for a good explanation of the internals of the Linux kernel.

Finally, a very useful reference for searching and exploring the Linux kernel code is the following:

### The makefile

First of all, it is important to understand how to compile a module. Please refer to the following makefile.

# If KERNELRELEASE is defined, we've been invoked from the
# kernel build system and can use its language.
ifneq ($(KERNELRELEASE),)
    obj-m := hello.o

# Otherwise we were called directly from the command
# line; invoke the kernel build system.
else
    KERNELDIR ?= <kernel source dir>
    BUILDDIR ?= <kernel build dir>
    PWD := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) O=$(BUILDDIR) M=$(PWD) modules

endif


Of course, you need to adjust the KERNELDIR and BUILDDIR variables to point to the root of your kernel sources, and to your build directory, respectively.

From the LDD book:

This makefile is read twice on a typical build. When the makefile is invoked from the command line, it notices that the KERNELRELEASE variable has not been set. It locates the kernel source directory by taking advantage of the fact that the symbolic link build in the installed modules directory points back at the kernel build tree.

If you are not actually running the kernel that you are building for, you can supply a KERNELDIR option on the command line, set the KERNELDIR environment variable, or rewrite the line that sets KERNELDIR in the makefile.

Once the kernel source tree has been found, the makefile invokes the default target, which runs a second make command (parameterized in the makefile as $(MAKE)) to invoke the kernel build system as described previously. On the second reading, the makefile sets obj-m, and the kernel makefiles take care of actually building the module.

This mechanism for building modules may strike you as a bit unwieldy and obscure. Once you get used to it, however, you will likely appreciate the capabilities that have been programmed into the kernel build system. Do note that the above is not a complete makefile; a real makefile includes the usual sort of targets for cleaning up unneeded files, installing modules, etc.

### Our first module

Now the first module example, which simply prints "Hello, world" in the kernel log.

#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("Dual BSD/GPL");

static int hello_init(void)
{
    printk(KERN_ALERT "Hello, world\n");
    return 0;
}

static void hello_exit(void)
{
    printk(KERN_ALERT "Goodbye, cruel world\n");
}

module_init(hello_init);
module_exit(hello_exit);

The printk function prints a string on various outputs, depending on the macro that precedes the string, which sets the message priority. There are 8 possible priorities; they are defined in include/linux/kern_levels.h (kernel.h in older kernels) along with their explanation. If the priority value is less than the variable console_loglevel, the message is also printed on the console. In any case it goes into the kernel log, which can be displayed with the command:

dmesg

By compiling this file you obtain a hello.ko file, which is an object file ready to be loaded into the kernel. To load the module, type:

insmod hello.ko

Loading a module requires superuser privileges, so you may want to run it with sudo, or just as root. To remove the module, type:

rmmod hello

again as superuser.

A few things to notice:

• You are inside the kernel.
Therefore, you cannot use the glibc: no printf() (but you can use printk), no scanf, no fopen, no malloc (but you can use kmalloc), etc. You need to use the equivalent functions available in the kernel.
• module_init(hello_init) tells the kernel which function to call upon loading the module. This function contains initialisation code for the module environment.
• module_exit(hello_exit) tells the kernel which function to call upon removal of the module.
• Notice that there is no main(). In fact, modules do not run as normal user processes. Typically, the module contains functions that are called by the kernel upon occurrence of certain events.

Therefore, if we want to write something meaningful,

• first of all we need to understand what kind of events happen in the kernel,
• then, understand which events are useful for us,
• finally, we need to write and install some function to be called when a certain event happens.

This module has no special function except the ones for initialisation and cleanup.

### Symbols

Which symbols can you access from your module? Here is a quick explanation:

To list all kernel symbols:

cat /proc/kallsyms

On each line, the symbol name is preceded by

• the address in memory
• a character [DdSsTt], with the following meaning:
  • D or d: the symbol is in the initialised data section.
  • S or s: the symbol is in an uninitialised data section for small objects.
  • T or t: the symbol is in the text (code) section.

Uppercase letters denote global symbols, lowercase letters local ones. A module can only use the global symbols that the kernel explicitly exports with EXPORT_SYMBOL or EXPORT_SYMBOL_GPL.

### Interacting with the module

The module runs in kernel space, that is, in a different memory space than normal user programs. Also, it cannot use normal glibc functions (such as fprintf() and fscanf()) and other user space libraries. Also, there is no main() function, so module functions are executed only in response to events.

So, how can we communicate with the module? There are several possibilities (syscalls, for example).
Here we will study the possibility to interact through a "file" in the /proc file system.

What follows is an example of a module which prints the kernel jiffies (i.e. a counter that is incremented at every kernel tick). To activate the feature, you first have to write '1' to the file /proc/crash_jiffies, and you can disable it by writing '0' to it. This module has been adapted from a similar one in the Crash Course web-site by R. P.J. Day.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/jiffies.h>
#include <linux/string.h>
#include <linux/uaccess.h>

#define JIFFIES_BUFFER_LEN 4

static char jiffies_buffer[JIFFIES_BUFFER_LEN];
static int jiffies_flag = 0;

/* Produces the content of the file: the jiffies, if the flag is set. */
static int jiffies_proc_show(struct seq_file *m, void *v)
{
    if (jiffies_flag)
        seq_printf(m, "%llu\n", (unsigned long long) get_jiffies_64());
    return 0;
}

static int jiffies_proc_open(struct inode *inode, struct file *file)
{
    return single_open(file, jiffies_proc_show, NULL);
}

static ssize_t jiffies_proc_write(struct file *filp, const char __user *buff,
                                  size_t len, loff_t *data)
{
    long res;

    printk(KERN_INFO "JIFFIES: Write has been called\n");
    /* Leave room for the terminating 0. */
    if (len > (JIFFIES_BUFFER_LEN - 1)) {
        printk(KERN_INFO "JIFFIES: error, input too long\n");
        return -EINVAL;
    }
    if (copy_from_user(jiffies_buffer, buff, len))
        return -EFAULT;
    jiffies_buffer[len] = 0;

    /* kstrtol returns 0 on success, a negative error code otherwise. */
    if (kstrtol(jiffies_buffer, 0, &res))
        return -EINVAL;
    jiffies_flag = res;

    return len;
}

static const struct file_operations jiffies_proc_fops = {
    .owner = THIS_MODULE,
    .open = jiffies_proc_open,
    .read = seq_read,
    .write = jiffies_proc_write,
    .llseek = seq_lseek,
    .release = single_release,
};

static int __init jiffies_proc_init(void)
{
    /* proc_create returns NULL if the entry cannot be created. */
    if (!proc_create("crash_jiffies", 0666, NULL, &jiffies_proc_fops))
        return -ENOMEM;
    return 0;
}

static void __exit jiffies_proc_exit(void)
{
    remove_proc_entry("crash_jiffies", NULL);
}

module_init(jiffies_proc_init);
module_exit(jiffies_proc_exit);

MODULE_AUTHOR("Modified by Giuseppe Lipari from Robert P. J. Day, http://crashcourse.ca");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("A jiffies /proc file.");

A few comments:

• The init function jiffies_proc_init just creates an entry in the /proc directory, with name crash_jiffies. This will be seen as a file by the user.
• When creating the entry, we specify a file_operations data structure, which contains a set of pointers to functions. Each function is a callback that is going to be called when the user performs some operation on the proc file. For example, if the user tries to open the file, the jiffies_proc_open function is called. Notice in particular seq_read and seq_lseek (which are functions that already exist in the Linux kernel) and jiffies_proc_write (which is a function provided by the module). You do not need to specify all possible function pointers in the data structure.
• jiffies_proc_open calls another function (single_open) that opens a "sequence file structure" and installs another function, jiffies_proc_show, which finally provides the content of the file when needed.
• jiffies_proc_show just prints the current number of jiffies (a 64-bit integer) on the sequence structure with seq_printf().
• For more information on sequence files and why they are useful, please refer to this web page.
• When the user writes to the proc file, function jiffies_proc_write is called, which copies the data from user space to the internal buffer jiffies_buffer by using the copy_from_user function. WARNING: pay attention to the length of the data to be copied. It is absolutely necessary not to exceed the buffer length, otherwise kernel memory is corrupted and the kernel may crash unpredictably!
• Then the buffer content is terminated with 0, and transformed into a long with kstrtol.
• Notice that jiffies_proc_show prints on the file only if the flag is set (non-zero).
• After compiling and installing the module, you can test it with

cat /proc/crash_jiffies
echo '1' > /proc/crash_jiffies
cat /proc/crash_jiffies
echo '0' > /proc/crash_jiffies
cat /proc/crash_jiffies

You should see nothing on the first cat, the jiffies on the second one, and nothing again on the last one.

## Other useful information

## The project

The goal of this project is to write a kernel module which monitors the execution time of a task. Here are the requirements:

1. The module initially installs the file /proc/ase_cmd in the proc file system, and a directory /proc/ase/ which will contain the output of the module.
2. Then it waits for the user to write a valid process ID (PID) to the file (for example with echo). From then on, it tracks the process execution time in an internal data structure. It also creates a file with name equal to the process ID in the directory /proc/ase/. If the process ID is not valid, the module does nothing.
3. When the user reads from this file, the module outputs the total execution time of the process so far, and the index of the CPU on which the task is currently running.
4. When the process terminates, the module must detect this and remove the corresponding file from the /proc/ase directory.
5. Add a command in /proc/ase_cmd to stop tracking a certain process. For example, by writing a negative number as follows:

echo "-123" > /proc/ase_cmd

the module will stop tracking process 123. In this case, even if the process has not yet terminated, the corresponding data structures are destroyed, and the corresponding memory is freed.

In the following I give some hints on how to realize the project.

### Task descriptor

A task (a process or a thread) in Linux is described by a data structure, the struct task_struct. This structure contains all information about a specific task, and it is also called Task Descriptor. The task_struct is a relatively large data structure, at around 1.7 kilobytes on a 32-bit machine.
It contains many different fields, for example (from include/linux/sched.h):

struct task_struct {
    ...;
    volatile long state;
    const struct sched_class *sched_class;
    int exit_state;
    int exit_code;
    int exit_signal;
    ...;
    pid_t pid;
    ...;
}

Inside the kernel, tasks are typically referenced directly by a pointer to their task_struct structure. In fact, most kernel code that deals with processes works directly with struct task_struct. Consequently, it is very useful to be able to quickly look up the process descriptor of the currently executing task, which is done via the current macro.

You can look up a task struct by its pid by using the following:

extern struct task_struct *find_task_by_vpid(pid_t nr);

However, this function may not be available in a module (the symbol is not exported). Therefore, you have to find the task by using the following:

struct task_struct *my_task = NULL;
struct pid *pid_struct = NULL;

pid_struct = find_get_pid((int) pid);
my_task = pid_task(pid_struct, PIDTYPE_PID);

(remember to add error control: both calls can return NULL, and the reference taken by find_get_pid must eventually be released with put_pid!)

### Data structures

You have to declare a structure where the module will store all the information related to a process that the module is tracking. As a first step, you could declare a static array of such structures to hold a limited number of processes (for example, up to 4). This means that the module will be able to track only up to 4 processes. However, this also simplifies the development, because you avoid allocating and deallocating dynamic memory. Once everything works fine, you will add dynamic allocation to remove the limitation on the number of processes.

To dynamically allocate and free memory in the kernel, you can use kmalloc() and kfree(), which are similar to the malloc and free of the C standard library.

The final project must deal with dynamically allocated data structures.

### Task termination

Since we cannot use KProbes, there is no way to inform the module when a task terminates.
This means that one of the tasks that the module is tracking may have terminated in the meanwhile. Therefore, before doing any operation on a task_struct, the module must check whether the task is still alive. If not, the module has to free the data structure. This happens when we are asked to list the content of the directory /proc/ase/ and when we read the content of one of the files in /proc/ase/. In the latter case, the module has to return a "file does not exist" error.

For example, suppose we are tracking the process with pid=1234:

echo "1234" > /proc/ase_cmd

Then, after process 1234 terminates, we try to do:

$ cat /proc/ase/1234
cat: /proc/ase/1234: No such file or directory

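The check "is the task still alive?" could be sketched as follows. This is only a sketch under assumptions: the helper name ase_task_alive is hypothetical, and the code relies on find_get_pid(), pid_task(), put_pid() and the exit_state field as they appear in the 4.15 sources, so check them against the kernel version you actually build. It is kernel code and only compiles as part of a module.

```c
#include <linux/pid.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/types.h>

/*
 * Hypothetical helper: returns true if a task with PID nr exists
 * and has not exited yet.
 */
static bool ase_task_alive(pid_t nr)
{
    struct pid *pid_struct;
    struct task_struct *task;
    bool alive = false;

    /* Takes a reference on the struct pid; NULL if the PID is unknown. */
    pid_struct = find_get_pid(nr);
    if (!pid_struct)
        return false;

    rcu_read_lock();    /* pid_task() must be called under RCU protection */
    task = pid_task(pid_struct, PIDTYPE_PID);
    /* exit_state is non-zero for a task that has exited (e.g. a zombie). */
    if (task && task->exit_state == 0)
        alive = true;
    rcu_read_unlock();

    put_pid(pid_struct);    /* drop the reference taken by find_get_pid() */
    return alive;
}
```

A design note: looking a task up by number every time is exposed to PID reuse by a new process. Keeping the struct pid returned by find_get_pid() for the whole tracking period, and calling put_pid() only when tracking stops, would be one way to avoid that.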

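The bookkeeping suggested in the hints above (a static array of at most 4 tracked processes, a negative number to stop tracking, and freeing the entries of dead tasks) can be prototyped in user space before moving it into the module. The sketch below is such a prototype, not module code: all the names (ase_entry, ase_track, ase_untrack, ase_command, ase_purge) are assumptions made for this example, and demo_is_alive is a stub that stands in for the real liveness check by pretending that process 1234 has terminated.

```c
#include <stdbool.h>

#define ASE_MAX_TRACKED 4    /* first-step limit suggested in the text */

/* One slot per tracked process (hypothetical layout). */
struct ase_entry {
    bool used;
    int pid;
};

static struct ase_entry ase_table[ASE_MAX_TRACKED];

/* Returns the slot index of pid, or -1 if pid is not tracked. */
int ase_find(int pid)
{
    for (int i = 0; i < ASE_MAX_TRACKED; i++)
        if (ase_table[i].used && ase_table[i].pid == pid)
            return i;
    return -1;
}

/* Starts tracking pid. Returns 0 on success, -1 if pid is invalid,
 * already tracked, or the table is full. */
int ase_track(int pid)
{
    if (pid <= 0 || ase_find(pid) >= 0)
        return -1;
    for (int i = 0; i < ASE_MAX_TRACKED; i++) {
        if (!ase_table[i].used) {
            ase_table[i].used = true;
            ase_table[i].pid = pid;
            return 0;
        }
    }
    return -1;
}

/* Stops tracking pid. Returns 0 on success, -1 if pid was not tracked. */
int ase_untrack(int pid)
{
    int i = ase_find(pid);

    if (i < 0)
        return -1;
    ase_table[i].used = false;
    return 0;
}

/* Interprets a number written to the command file:
 * positive = track that PID, negative = stop tracking its absolute value. */
int ase_command(long value)
{
    if (value > 0)
        return ase_track((int) value);
    if (value < 0)
        return ase_untrack((int) -value);
    return -1;
}

/* Frees every slot whose task is reported dead by is_alive.
 * Returns the number of slots freed. */
int ase_purge(bool (*is_alive)(int pid))
{
    int freed = 0;

    for (int i = 0; i < ASE_MAX_TRACKED; i++) {
        if (ase_table[i].used && !is_alive(ase_table[i].pid)) {
            ase_table[i].used = false;
            freed++;
        }
    }
    return freed;
}

/* Demonstration stub: pretends that only process 1234 has terminated. */
bool demo_is_alive(int pid)
{
    return pid != 1234;
}
```

In the module, ase_purge would be called before listing /proc/ase/ or reading one of its files, and the static table would later be replaced by entries allocated with kmalloc() and released with kfree().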
To create a kernel thread, it is necessary to call the kthread_create() function (see here). kthread_create is somewhat similar to pthread_create, except that the new thread is not executed immediately. To activate a kernel thread, you have to invoke wake_up_process() (see here).
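The two steps described above (create, then wake up) might look as follows in a minimal module. This is a sketch under assumptions: the names ase_thread and ase_thread_fn and the one-second period are invented for the example, and it is kernel code that only builds against the kernel sources.

```c
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/sched.h>

MODULE_LICENSE("GPL");

static struct task_struct *ase_thread;    /* hypothetical name */

/* Body of the kernel thread: loops until kthread_stop() is called. */
static int ase_thread_fn(void *data)
{
    while (!kthread_should_stop()) {
        printk(KERN_INFO "ase: thread running\n");
        msleep(1000);
    }
    return 0;
}

static int __init ase_thread_init(void)
{
    /* The thread is created but does not run yet. */
    ase_thread = kthread_create(ase_thread_fn, NULL, "ase_thread");
    if (IS_ERR(ase_thread))
        return PTR_ERR(ase_thread);

    wake_up_process(ase_thread);    /* now the thread starts running */
    return 0;
}

static void __exit ase_thread_exit(void)
{
    /* Asks the thread to stop and waits for ase_thread_fn to return. */
    kthread_stop(ase_thread);
}

module_init(ase_thread_init);
module_exit(ase_thread_exit);
```

The kernel also provides kthread_run(), which combines kthread_create() and wake_up_process() in a single call.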