Friday, August 28, 2015

Linux Kernel Development I - "Hello, World!" char driver

So here is my first post about Linux kernel module development!

I published this article on my old blog before. We will write a "Hello, world!" char device driver. I also released the code a long time ago [source].

Here is why I am starting with this article: when you google printk you get plenty of results: tutorials, stackoverflow.com questions, .doc files, .pdf files, stuff about formatting and console log levels etc. But when you search for how to write a simple working driver (I mean a driver which does something ''besides'' printing text to dmesg) you get only a handful of helpful results, some of which are fairly outdated.

What I want to give you is a working char device driver with minimal kernel bureaucracy. You will still need to read a good book on drivers and write lots of code later. But you will get your own working little piece of kernel code. As they say, "Seeing is believing", and I think there is a good number of people who are put off just because they don't see their code do something meaningful (or who study a working module/driver source but can't find any piece of code which makes sense :). I aim to help people who are curious about kernel development but lose interest thinking there is too much stuff to learn or too much code to write even for the simplest working driver. So if you are stuck while reading a book or taking a course on developing device drivers for Linux you have probably come to the right place.

To follow this tutorial you will need to be able to code in C and obviously you need to be able to compile kernels and modules. You also need to know basic system calls about files, like opening and writing into a file. It is better if you have an idea or some theory on what happens in the background when you copy a file from one disk to another. And most importantly you have to realize the fact that you are not writing some independent and separate code, you are adding a tiny piece of code into a very big, versatile and complex software (which is called the operating system kernel) at run-time.

The module we will study doesn't do much. It lacks so many functionalities it would commit suicide if it had a conscience. If the kernel had a conscience it would shoot the module right in the head as soon as it loaded to end its misery. This is absolutely not a correct way to write modules. But we just want to see a module doing some module stuff with minimal code, so it's OK I guess. Just a reminder, don't run this module on a production machine. Ideally one should develop and test modules in a virtual environment of some sort (as I have explained in the test VM series previously on this blog).

So let's start with some theory. I will jump over the questions "What is an OS kernel?" and "What is a device driver?". Drivers in the Linux kernel can be compiled into the kernel binary or compiled as modules. Modules are binary pieces of code added to the kernel at run time. Note that a module is compiled for a specific kernel build. If you change some options and recompile the kernel, you have to recompile all the modules afterwards too.

Usually a device driver is responsible for three things:
* Holding/allocating/managing device specific info and data
* Answering system calls (which allows OS to interact with hardware)
* Handling hardware interrupts (which allows hardware to interact with OS)

However there are also drivers for devices which exist only in kernel memory like a virtual network tunnel device or a tty.

And there are intermediary drivers which take a system call from user-space and tell another driver to do something (usually in some sort of hierarchy). For example filesystem drivers decide where data will be written to or read from on a block device and then tell the block device driver to do the job.

Which means there are some other drivers which don't take orders from user-space but from fellow drivers or other kernel mechanisms.

We will be writing a module for a char device of the 2nd type above: a device which exists only in kernel memory. Its only responsibilities will be creating/removing a virtual device in kernel memory and answering some basic system calls.

So what exactly is a char device? Well, I don't know if there is an exact definition. But many OS functions and peripheral devices are implemented as, or at least can be manipulated through, char devices. For example ttys, the sound subsystem, sound cards, frame buffers, printers, HIDs (keyboards and mice). They differ from block devices in that block devices read and write in predefined block sizes and go through caches/queues, whereas char devices can read and write any amount of data at any time. There are other differences both in functionality and code implementation. And network drivers are different from both. You should read a book for more precise information.

User-space interacts with char devices through "file" system calls. You may have heard that everything is represented with files under Unix. Well that's not true when it comes to networking and some other areas but for most things it is true. And char devices are no exception. Our module will be accessed from user-space with system calls like open(), read() and write(). We will create a node for our device in the filesystem. And when a user-space program opens that device file (our "node") it will be able to communicate with our module!

I learned writing kernel modules by reading the famous (and free) LDD3 book. They begin with a char driver named "scull" in the book. It does everything formally and supports all basic file operations which can be applied to a char driver. As if that's not enough it also includes a somewhat complicated data structure for storing data in kernel memory.

My driver "scully" however just registers itself to the kernel with major number 100 (if it is free) and gives the hard-coded char string inside to anyone who attempts to read it. It looks like it supports writing too but it is so lazy, it just tells user-space it wrote whatever they sent without doing any actual work.

So let's analyse the source code from bottom to top.

The module_init() and module_exit() macros name the functions which run when the module is loaded into the kernel and just before it is removed, respectively. Basically the exit function frees any resources taken by the module and rolls back any changes the module has made. You may think at first that the init function is a parallel to the int main(int argc, char **argv) function. However that's quite wrong. It is more like the booster rockets of the space shuttle, which provide enough thrust to get off the ground and are discarded as soon as the fuel inside is depleted. The init function should return right after all the necessary data structures are set up and registered. If it is marked __init, its code is then removed from kernel memory for good. Any local variables stored on its stack are discarded into oblivion.

You see, kernel modules are not procedural programs actually. They are a composition of functions and data structures which are registered into the kernel in order to fulfill some pre-defined operations. But more on that later, let's see what init and exit actually do.

There are two global (and static) variables which are used in these functions.

One is dev_t mydevnum which holds the device number. Device numbers are used to keep track of devices inside the kernel. Each one is made of two parts, namely a major and a minor number. You can type ls -l /dev/ and see them for device files in the column where regular files show their size. If you want to register or create a new char device you need a major and minor number pair which is not being used by another char device. It is just an index to the devices really.
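As a small user-space illustration of the major/minor pair (this is not part of the module), the sketch below reads the numbers of an existing device file with stat(). The helper name device_numbers() is my own invention; major() and minor() are the glibc macros from <sys/sysmacros.h>.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() (glibc) */
#include <sys/types.h>

/*
 * Hypothetical helper: look up the device numbers behind a device file.
 * Returns 0 on success, -1 if the path cannot be stat()ed or is not a
 * char device.
 */
int device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode))
        return -1;

    /* For device files, st_rdev holds the major/minor pair. */
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

On a typical Linux system, calling device_numbers("/dev/null", &maj, &min) yields major 1 and minor 3, the same pair that ls -l /dev/null shows.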

The other one is struct cdev mydev which is the kernel's representation of a char device. We will not touch its internals.

dev_t is not a struct with named fields but a plain integer type which packs both numbers, so we access it with macros instead of writing something like "mydevnum.major = 100;". In my code I wrote mydevnum = MKDEV(100,0); which stores major number 100 and minor number 0 inside the variable. After that you can read the major number with MAJOR(mydevnum) and the minor number with MINOR(mydevnum).
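To make the packing concrete, here is a user-space model of those macros. It mirrors the layout used in the kernel's include/linux/kdev_t.h (the minor number in the low 20 bits, the major number above it), but the MODEL_ names are stand-ins of mine so that the snippet compiles outside the kernel.

```c
#include <stdint.h>

/*
 * User-space model of the kernel's dev_t helpers.
 * The MODEL_ prefix marks these as stand-ins, not the kernel macros.
 */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1U << MODEL_MINORBITS) - 1)

#define MODEL_MKDEV(ma, mi) (((uint32_t)(ma) << MODEL_MINORBITS) | (uint32_t)(mi))
#define MODEL_MAJOR(dev)    ((unsigned int)((dev) >> MODEL_MINORBITS))
#define MODEL_MINOR(dev)    ((unsigned int)((dev) & MODEL_MINORMASK))

/* The number scully asks for: major 100, minor 0. */
static const uint32_t scully_devnum = MODEL_MKDEV(100, 0);
```

With this layout MODEL_MAJOR(scully_devnum) gives back 100 and MODEL_MINOR(scully_devnum) gives back 0, which is why the code never needs to know how the two numbers are stored.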

The init function scully_init() begins with the above line and then calls register_chrdev_region(mydevnum, 1, "scully");. This function registers a name for a range of device numbers. This is enough to see our driver listed in the "Character devices" section of /proc/devices with name "scully" and major number 100. The count argument (which is 1 in our case) is the number of consecutive device numbers we want to reserve. In our case we just reserve the device number (100,0). We could reserve (100,1) to (100,n) too if we increased the count.

One should of course check the return value to see if we actually succeeded in registering the device numbers. And you should unregister the numbers with unregister_chrdev_region() when the module is unloaded (or if it fails to load after register_chrdev_region() function) so that other devices can use the numbers later.

Since it is not possible to know beforehand which major and minor numbers are free, one can use the function alloc_chrdev_region() which finds a free major number and registers it. I didn't use it for the sake of simplicity. If major number 100 is taken when you try to load the module just change it in the code and recompile.

After registering some numbers in the kernel we actually define our device with cdev_init(&mydev, &f_ops);. One thing you should know about drivers is that a driver manages multiple devices of the same kind (or at least of similar kinds). For example the e1000e driver provides code for managing Intel gigabit ethernet cards. It is loaded into kernel memory once and after that it manages all the ethernet cards on the computer which it supports. Whether you have a Gbit port on your laptop's Intel mainboard or two 4-port ethernet cards on your server (which means 8 devices), all the management is done with the same functions in the code. That is one tricky part of writing drivers, you never know how many devices your code will manage. So with cdev_init() we only initialize a general data structure, a template, for all the devices we are going to use. I will get to the f_ops struct later, but for now I can say that it contains the interface which allows user-space to communicate with our driver.

Now that our cdev struct is initialized and associated with the necessary file operations we can finally create devices in memory. But just before that there is another necessary line of code: mydev.owner = THIS_MODULE;. With it we tell the kernel that this struct belongs to our module (namely scully.ko when compiled).

Finally with cdev_add(&mydev, mydevnum, 1); we make the char device live in the kernel. Similar to the register function, it covers "count" many device numbers starting with mydevnum. Again we only create one device with number (100,0).

If this function fails we free the numbers we registered and return a negative error code. That means the module could not be loaded and insmod prints the relevant error.

If it succeeds we just return 0 and say goodbye to the init function.

The exit function just does everything in reverse order. It deletes the device we created and then unregisters/frees the device numbers.
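Put together, the init and exit functions described above could look roughly like the sketch below. It is reconstructed from the description in this post rather than copied from the released source, so treat the details as assumptions: the name scully_exit, the __init/__exit markers and the nearly empty f_ops placeholder are mine. It is kernel code, so it only builds as part of a module against a kernel tree.

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/cdev.h>

static dev_t mydevnum;
static struct cdev mydev;

/* Placeholder: the real table also carries read/write/open/release hooks. */
static const struct file_operations f_ops = {
	.owner = THIS_MODULE,
};

static int __init scully_init(void)
{
	int err;

	mydevnum = MKDEV(100, 0);

	/* Reserve one device number, (100,0), under the name "scully". */
	err = register_chrdev_region(mydevnum, 1, "scully");
	if (err)
		return err;

	cdev_init(&mydev, &f_ops);
	mydev.owner = THIS_MODULE;

	/* From here on the device is live and system calls can reach us. */
	err = cdev_add(&mydev, mydevnum, 1);
	if (err) {
		unregister_chrdev_region(mydevnum, 1);
		return err;
	}

	return 0;
}

static void __exit scully_exit(void)
{
	/* Undo init in reverse order. */
	cdev_del(&mydev);
	unregister_chrdev_region(mydevnum, 1);
}

module_init(scully_init);
module_exit(scully_exit);
MODULE_LICENSE("GPL");
```

Note how every failure path after register_chrdev_region() gives the device number back before returning, so a failed load leaves nothing behind.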

When we move up, we see the struct file_operations structure. It contains function pointers used as hooks to place our own driver-specific methods. You can see the whole list in "include/linux/fs.h" under the kernel source directory. Intuitively, most of them correspond to the file-related system calls you know from the standard C library, with a few differences. We also set the owner field to THIS_MODULE to mark that this file_operations structure belongs to our binary (scully.ko).

If we don't assign a function to a hook then, depending on the hook, the system call always succeeds, the kernel substitutes another function, or the kernel returns an error to the caller. For example, on the older kernels which have an *aio_read hook, if we provide *read but not *aio_read then the kernel automatically uses the blocking *read when an asynchronous read is requested. If *mmap is undefined then mmap() calls fail with ENODEV. If *open is not specified then open() calls always succeed but we are not notified (our module can still handle read() and write() calls after the device file is opened).
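The "missing hook" rules are easier to picture with a toy model. The user-space sketch below is not kernel code: struct toy_fops and the toy_ functions are made up for illustration, and they imitate only the two defaults mentioned above (a missing open hook succeeds, a missing mmap hook fails with ENODEV).

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct file_operations with just two hooks. */
struct toy_fops {
    int (*open)(void);
    int (*mmap)(void);
};

/* Models open(): a missing hook means "succeed without telling the driver". */
int toy_do_open(const struct toy_fops *fops)
{
    if (fops->open == NULL)
        return 0;
    return fops->open();
}

/* Models mmap(): a missing hook means the call fails with ENODEV. */
int toy_do_mmap(const struct toy_fops *fops)
{
    if (fops->mmap == NULL)
        return -ENODEV;
    return fops->mmap();
}

/* A driver-supplied open hook that refuses every caller. */
int toy_busy_open(void)
{
    return -EBUSY;
}
```

A table with both pointers left NULL therefore opens fine and refuses mmap, while a table that plugs in toy_busy_open gets to decide the result of open itself.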

We will use four of these functions but you need to read a book to learn how each "file operation" works and what is expected from them. Let's start with read() and write() calls.

Their return type is ssize_t and as you can guess the return value is the number of bytes read or written (or a negative error code on failure). The function arguments need some explanation.

struct file *fptr contains information about how our device file is accessed and which device file is accessed if there is more than one (with fptr->f_dentry->d_inode, or the file_inode(fptr) helper on newer kernels where f_dentry no longer exists). We don't use it but it is crucial for normal drivers. Again you'd better read a book to understand the various data it carries. It also has a pointer named "private_data" which can be used to hold useful information for the device as long as the device file remains open. For example the "scull" driver in LDD3 uses it to store a pointer to the per-device structure (the one embedding the struct cdev) which the opened device file corresponds to.

char __user *uspptr is another interesting function argument. It represents a portal to the user-space!

Specks of warm sunlight coming through it shed some light on the dark denizens of kernel space. You can almost catch a glimpse of a tree or some fluffy clouds. You can use it to yell for help. "HELP ME! I'm trapped inside this cold, damp, lightless dungeon they call kernel! Help me, get me out of here!"

Or you can use it to send your dark demon-spawn minions onto the user-space and start an invasion that will mark the beginning of an age of relentless and gruesome wars which will decisively make you the supreme ruler of all user-space realms. You will teach what true power means to those puny and weak processes who lived in peace and harmony under the ward of the scheduler and the virtual memory manager. You shall bring the init process (the one with pid == 1) down to its knees and your reign of terror and tyranny will be eternal. (well it is kind of late here and I guess I really need some sleep.)

Better yet, you can use it to supply/take whatever information is requested/offered by the system call which introduced the user-space pointer. To do that we use copy_to_user() to send data to user-space and copy_from_user() to take data from user-space.

Well, normally accessing user-space is a delicate matter. There are many things you should be aware of. However since we don't delve into the details here, I will just say that these functions work like memcpy(), copying one buffer to another (only those buffers probably won't be even in the same memory mapping). Again, read a book to learn how interacting with user-space works and things you should be aware of.

The remaining two arguments are how many bytes we are going to take or send and a pointer to the file position where reading/writing will occur (which we don't care about).

So in the read function we copy the requested length of our static "Hello, world!" string to user-space with copy_to_user().

In the write function we don't do anything but return the byte count the write() system call asked us to write, just to look like we are working.
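A sketch of what these two hooks might look like follows. Again it is reconstructed from the description, not taken from the released source: the function names, the greeting variable and in particular the clamping of count to the string length are my assumptions. It is kernel code and only compiles as part of a module.

```c
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/uaccess.h>

static const char greeting[] = "Hello, world!";

static ssize_t scully_read(struct file *fptr, char __user *uspptr,
			   size_t count, loff_t *offset)
{
	/* Assumption: never hand out more than the string itself holds. */
	size_t len = sizeof(greeting) - 1;

	if (count > len)
		count = len;

	/* copy_to_user() returns the number of bytes it could NOT copy. */
	if (copy_to_user(uspptr, greeting, count))
		return -EFAULT;

	/* *offset is ignored, so every read() starts at 'H' again. */
	return count;
}

static ssize_t scully_write(struct file *fptr, const char __user *uspptr,
			    size_t count, loff_t *offset)
{
	/* Pretend everything was written without touching uspptr. */
	return count;
}

static const struct file_operations f_ops = {
	.owner = THIS_MODULE,
	.read  = scully_read,
	.write = scully_write,
};
```

Because the read hook never advances the file position and never returns 0, a reader that keeps calling read() never sees end-of-file.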

Let's wrap this tutorial up by talking about the remaining two functions, open and release.

They have two arguments. One is of type struct inode and the other is struct file we mentioned above. The inode struct contains information about the device file or more precisely it ''represents'' the device file. For detailed information you should again consult other sources but here's the difference between struct file and struct inode: struct file is more like the file descriptors you use with system calls. The file must be opened to have a struct file and it contains info about the process which opened the file. struct inode on the other hand represents the file, just like an inode in the filesystem and it is unique. struct file also contains a pointer to the file's inode.

The open function is called each time a process opens our device file with open(). The release function is called when that open file is finally closed. And by "closed" I mean when all of the file descriptors referring to it are closed, i.e. the process is done with the file. It is not necessarily called each time close() is called. For example a file descriptor may have been duplicated with dup() or inherited across fork(); release is called only when the last copy is closed.

We could have added an llseek() function too. It is used to change the file position without actually reading or writing stuff. But we don't utilize a file position in the first place.

There is one other function hook, for the ioctl() syscall, which is pretty useful (it was named ioctl in older kernels and is named unlocked_ioctl in current ones). With read() and write() all we can do is I/O. ioctl() on the other hand is generally used to modify the behaviour of the hardware. Well, I can't think of a good example but let's say you are writing a driver for an LCD display which prints text and supports multiple fonts and display sizes. You can write the driver so that you put stuff on the LCD screen with write() and use ioctl() to change the font and display size. You guessed it right, reading a book on device drivers is a good start to understand how ioctl works.

Finally the MODULE_LICENSE("GPL"); macro declares that your code is licensed under the GPL and may therefore use the kernel symbols which are exported only to GPL-compatible modules. It also affects the letter that shows up in the taint flags when the kernel breaks (oopses). It shows G when all loaded modules are GPL-compatible and P when a proprietary module has been loaded. There might be other things I don't know about it too, just do your own research if you are curious.

I didn't write anything about printk because as I said in the beginning there are lots of sources on that. I used the KERN_ALERT log level so that everything shows up in dmesg regardless of your kernel config.

So we have finished the code for our module. But we are not finished yet. We need a device file to be able to access our module from user-space. To do that you can use the script makedevfile I provide with the source. It is a simple script which creates a device file under /dev/ named scully. You can of course do it yourself by using the mknod program. You need to give it 4 parameters: the name of the device file, the device type (c for char device, b for block device), the major number of the device and the minor number of the device. You may want to change the access permissions after that. The makedevfile script just gives rw to everyone which is probably not suitable for most real devices.

To sum up: when a user-space process needs to interact with a char device in the kernel space, it needs a device file with major and minor numbers set to the device's numbers. Then it can send and receive stuff by means of system calls used for regular files. The char device drivers give and take stuff from user-space processes by using copy_to_user() and copy_from_user() functions with the user-space pointer the system call provides.

Finally we can actually get that "Hello, World!" phrase from our module. Just typing "cat /dev/scully" will print the string inside the module indefinitely. You can use "dd if=/dev/scully of=testoutput bs=13 count=1" to copy the portion of it we are interested in (you need to "cat testoutput" afterwards). Using "dd if=/dev/scully of=feelings bs=4 count=13" will show how our module feels about its life in general.
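The same experiment can be done from C. The helper below is a minimal sketch (read_once is a name I made up): it opens a device file, issues a single read() of up to n bytes and closes the file again.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a device file, issue a single read() of up to
 * n bytes, and close it again. Returns what read() returned, or -1 if the
 * file could not be opened.
 */
ssize_t read_once(const char *path, char *buf, size_t n)
{
    ssize_t got;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    got = read(fd, buf, n);  /* for a char device this ends up in the driver's read hook */
    close(fd);
    return got;
}
```

With the module loaded and the device file in place, read_once("/dev/scully", buf, 13) plays the role of the dd command with bs=13 count=1.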

I have a few things to say about compiling. If you point the kerneldir link which comes with the source at your kernel source directory and write the absolute path of the module source in farmake.script, you should be able to compile by just running farmake.script. The kernel in the source directory must have been compiled at least once though. If you don't know how to compile a kernel then you need to learn how to do it before attempting to write modules.

Well, this tutorial took a lot longer than I thought. It leaves out lots of topics because I wanted to keep it short. I hope it helps anyone stuck trying to learn kernel programming. Please drop a comment if it helps you. Just typing something like "thanks" is enough. If you think there is something wrong or misleading write it in the comments too (there are probably many such things).

Remember, we only wrote a pathetic little module which doesn't fulfill most of its responsibilities and lacks a lot of functionality. One needs to learn lots of topics to write valid modules: like concurrency issues, interrupt handling, memory management etc. However this module is a big and important step for a kernel newbie I think.
