Data Structures in the Linux Kernel

Doubly linked list

Linux kernel provides its own implementation of doubly linked list, which you can find in the include/linux/list.h. We will start Data Structures in the Linux kernel from the doubly linked list data structure. Why? Because it is very popular in the kernel, just try to search

First of all, let’s look on the main structure in the include/linux/types.h:

  1. struct list_head {
  2. struct list_head *next, *prev;
  3. };

You can note that it is different from many implementations of doubly linked list which you have seen. For example, this doubly linked list structure from the glib library looks like :

  1. struct GList {
  2. gpointer data;
  3. GList *next;
  4. GList *prev;
  5. };

Usually a linked list structure contains a pointer to the item. The implementation of linked list in Linux kernel does not. So the main question is - where does the list store the data?. The actual implementation of linked list in the kernel is - Intrusive list. An intrusive linked list does not contain data in its nodes - A node just contains pointers to the next and previous node and list nodes part of the data that are added to the list. This makes the data structure generic, so it does not care about entry data type anymore.

For example:

  1. struct nmi_desc {
  2. spinlock_t lock;
  3. struct list_head head;
  4. };

Let’s look at some examples to understand how list_head is used in the kernel. As I already wrote about, there are many, really many different places where lists are used in the kernel. Let’s look for an example in miscellaneous character drivers. Misc character drivers API from the drivers/char/misc.c is used for writing small drivers for handling simple hardware or virtual devices. Those drivers share same major number:

  1. #define MISC_MAJOR 10

but have their own minor number. For example you can see it with:

  1. ls -l /dev | grep 10
  2. crw------- 1 root root 10, 235 Mar 21 12:01 autofs
  3. drwxr-xr-x 10 root root 200 Mar 21 12:01 cpu
  4. crw------- 1 root root 10, 62 Mar 21 12:01 cpu_dma_latency
  5. crw------- 1 root root 10, 203 Mar 21 12:01 cuse
  6. drwxr-xr-x 2 root root 100 Mar 21 12:01 dri
  7. crw-rw-rw- 1 root root 10, 229 Mar 21 12:01 fuse
  8. crw------- 1 root root 10, 228 Mar 21 12:01 hpet
  9. crw------- 1 root root 10, 183 Mar 21 12:01 hwrng
  10. crw-rw----+ 1 root kvm 10, 232 Mar 21 12:01 kvm
  11. crw-rw---- 1 root disk 10, 237 Mar 21 12:01 loop-control
  12. crw------- 1 root root 10, 227 Mar 21 12:01 mcelog
  13. crw------- 1 root root 10, 59 Mar 21 12:01 memory_bandwidth
  14. crw------- 1 root root 10, 61 Mar 21 12:01 network_latency
  15. crw------- 1 root root 10, 60 Mar 21 12:01 network_throughput
  16. crw-r----- 1 root kmem 10, 144 Mar 21 12:01 nvram
  17. brw-rw---- 1 root disk 1, 10 Mar 21 12:01 ram10
  18. crw--w---- 1 root tty 4, 10 Mar 21 12:01 tty10
  19. crw-rw---- 1 root dialout 4, 74 Mar 21 12:01 ttyS10
  20. crw------- 1 root root 10, 63 Mar 21 12:01 vga_arbiter
  21. crw------- 1 root root 10, 137 Mar 21 12:01 vhci

Now let’s have a close look at how lists are used in the misc device drivers. First of all, let’s look on miscdevice structure:

  1. struct miscdevice
  2. {
  3. int minor;
  4. const char *name;
  5. const struct file_operations *fops;
  6. struct list_head list;
  7. struct device *parent;
  8. struct device *this_device;
  9. const char *nodename;
  10. mode_t mode;
  11. };

We can see the fourth field in the miscdevice structure - list which is a list of registered devices. In the beginning of the source code file we can see the definition of misc_list:

  1. static LIST_HEAD(misc_list);

which expands to the definition of variables with list_head type:

  1. #define LIST_HEAD(name) \
  2. struct list_head name = LIST_HEAD_INIT(name)

and initializes it with the LIST_HEAD_INIT macro, which sets previous and next entries with the address of variable - name:

  1. #define LIST_HEAD_INIT(name) { &(name), &(name) }

Now let’s look on the misc_register function which registers a miscellaneous device. At the start it initializes miscdevice->list with the INIT_LIST_HEAD function:

  1. INIT_LIST_HEAD(&misc->list);

which does the same as the LIST_HEAD_INIT macro:

  1. static inline void INIT_LIST_HEAD(struct list_head *list)
  2. {
  3. list->next = list;
  4. list->prev = list;
  5. }

In the next step after a device is created by the device_create function, we add it to the miscellaneous devices list with:

  1. list_add(&misc->list, &misc_list);

Kernel list.h provides this API for the addition of a new entry to the list. Let’s look at its implementation:

  1. static inline void list_add(struct list_head *new, struct list_head *head)
  2. {
  3. __list_add(new, head, head->next);
  4. }

It just calls internal function __list_add with the 3 given parameters:

  • new - new entry.
  • head - list head after which the new item will be inserted.
  • head->next - next item after list head.

Implementation of the __list_add is pretty simple:

  1. static inline void __list_add(struct list_head *new,
  2. struct list_head *prev,
  3. struct list_head *next)
  4. {
  5. next->prev = new;
  6. new->next = next;
  7. new->prev = prev;
  8. prev->next = new;
  9. }

Here we add a new item between prev and next. So misc list which we defined at the start with the LIST_HEAD_INIT macro will contain previous and next pointers to the miscdevice->list.

There is still one question: how to get list’s entry. There is a special macro:

  1. #define list_entry(ptr, type, member) \
  2. container_of(ptr, type, member)

which gets three parameters:

  • ptr - the structure list_head pointer;
  • type - structure type;
  • member - the name of the list_head within the structure;

For example:

  1. const struct miscdevice *p = list_entry(v, struct miscdevice, list)

After this we can access to any miscdevice field with p->minor or p->name and etc… Let’s look on the list_entry implementation:

  1. #define list_entry(ptr, type, member) \
  2. container_of(ptr, type, member)

As we can see it just calls container_of macro with the same arguments. At first sight, the container_of looks strange:

  1. #define container_of(ptr, type, member) ({ \
  2. const typeof( ((type *)0)->member ) *__mptr = (ptr); \
  3. (type *)( (char *)__mptr - offsetof(type,member) );})

First of all you can note that it consists of two expressions in curly brackets. The compiler will evaluate the whole block in the curly braces and use the value of the last expression.

For example:

  1. #include <stdio.h>
  2. int main() {
  3. int i = 0;
  4. printf("i = %d\n", ({++i; ++i;}));
  5. return 0;
  6. }

will print 2.

The next point is typeof, it’s simple. As you can understand from its name, it just returns the type of the given variable. When I first saw the implementation of the container_of macro, the strangest thing I found was the zero in the ((type *)0) expression. Actually this pointer magic calculates the offset of the given field from the address of the structure, but as we have 0 here, it will be just a zero offset along with the field width. Let’s look at a simple example:

  1. #include <stdio.h>
  2. struct s {
  3. int field1;
  4. char field2;
  5. char field3;
  6. };
  7. int main() {
  8. printf("%p\n", &((struct s*)0)->field3);
  9. return 0;
  10. }

will print 0x5.

The next offsetof macro calculates offset from the beginning of the structure to the given structure’s field. Its implementation is very similar to the previous code:

  1. #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

Let’s summarize all about container_of macro. The container_of macro returns the address of the structure by the given address of the structure’s field with list_head type, the name of the structure field with list_head type and type of the container structure. At the first line this macro declares the __mptr pointer which points to the field of the structure that ptr points to and assigns ptr to it. Now ptr and __mptr point to the same address. Technically we don’t need this line but it’s useful for type checking. The first line ensures that the given structure (type parameter) has a member called member. In the second line it calculates offset of the field from the structure with the offsetof macro and subtracts it from the structure address. That’s all.

Of course list_add and list_entry is not the only functions which <linux/list.h> provides. Implementation of the doubly linked list provides the following API:

  • list_add
  • list_add_tail
  • list_del
  • list_replace
  • list_move
  • list_is_last
  • list_empty
  • list_cut_position
  • list_splice
  • list_for_each
  • list_for_each_entry

and many more.