Original Date Published: March 8, 2022


This blog post covers io_uring, a new Linux kernel system call interface, and how I exploited it for local privilege escalation (LPE).

A breakdown of the topics and questions discussed:

  • What is io_uring? Why is it used?
  • What is it used for?
  • How does it work?  
  • How do I use it?
  • Discovering a 0-day to exploit, CVE-2021-41073 [13].
  • Turning a type confusion vulnerability into memory corruption.
  • Linux kernel memory fundamentals and tracking.
  • Exploring the io_uring codebase for tools to construct exploit primitives.
  • Creating new Linux kernel exploitation techniques and modifying existing ones.
  • Finding target objects in the Linux kernel for exploit primitives.
  • Mitigations and considerations to make exploitation harder in the future.

Like my last post, I had no knowledge of io_uring when starting this project. This blog post will document the journey of tackling an unfamiliar part of the Linux kernel and ending up with a working exploit. My hope is that it will be useful to those interested in binary exploitation or kernel hacking and demystify the process. I also break down the different challenges I faced as an exploit developer and evaluate the practical effect of current exploit mitigations.

io_uring: What is it?

Put simply, io_uring is a system call interface for Linux. It was first introduced in upstream Linux kernel version 5.1 in 2019 [1]. It enables an application to initiate system calls that can be performed asynchronously. Initially, io_uring supported only simple I/O system calls like read() and write(), but the list of supported operations is growing rapidly. It may eventually have support for most system calls [5].

Why is it Used?

The motivation behind io_uring is performance. Although it is still relatively new, its performance has improved quickly over time. Just last month, the creator and lead developer Jens Axboe boasted 13M per-core peak IOPS [2]. There are a few key design elements of io_uring that reduce overhead and boost performance.

With io_uring, system calls can be completed asynchronously. This means an application thread does not have to block while waiting for the kernel to complete the system call. It can simply submit a request for a system call and retrieve the results later; no time is wasted by blocking.

Additionally, batches of system call requests can be submitted all at once. A task that would normally require multiple system calls can be reduced to just one. There is even a new feature that can reduce the number of system calls to zero [7]. This vastly reduces the number of context switches from user space to kernel and back. Each context switch adds overhead, so reducing them improves performance.

In io_uring, the bulk of the communication between the user space application and the kernel is done via shared buffers. This removes a large amount of overhead when performing system calls that transfer data between kernel and user space. For this reason, io_uring can be a zero-copy system [4].
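To make the shared-buffer idea more concrete, here is a minimal user space model of a ring with a producer-owned tail and a consumer-owned head. It is only a sketch and not the kernel's actual layout: the names `struct request`, `struct ring`, `ring_push`, and `ring_pop` are made up for illustration, everything runs in a single thread, and the memory barriers that a real shared ring needs when publishing the head and tail are left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 4u                  /* must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

static_assert((RING_ENTRIES & RING_MASK) == 0, "ring size must be a power of two");

/* Hypothetical request record; a real submission entry carries far more. */
struct request {
    uint32_t opcode;
    uint64_t user_data;
};

/* Toy ring. In io_uring the equivalent memory is mapped into both the
 * application and the kernel, so neither side copies the entries. */
struct ring {
    uint32_t head;                       /* advanced only by the consumer */
    uint32_t tail;                       /* advanced only by the producer */
    struct request entries[RING_ENTRIES];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_push(struct ring *r, struct request req)
{
    if (r->tail - r->head == RING_ENTRIES)
        return -1;
    r->entries[r->tail & RING_MASK] = req;
    r->tail++;                           /* publishing the entry is just an index bump */
    return 0;
}

/* Consumer side: returns 0 on success, -1 if the ring is empty. */
static int ring_pop(struct ring *r, struct request *out)
{
    if (r->head == r->tail)
        return -1;
    *out = r->entries[r->head & RING_MASK];
    r->head++;                           /* hands the slot back to the producer */
    return 0;
}
```

The point of the model is that handing work to the other side only requires writing an entry and advancing an index in memory both sides can see. The indices are free-running counters that are masked on access, which is why the size has to be a power of two.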

There is also a feature for “fixed” files that can improve performance. Before a read or write operation can occur with a file descriptor, the kernel must take a reference to the file. Because the file reference occurs atomically, this causes overhead [6]. With a fixed file, this reference is held open, eliminating the need to take the reference for every operation.

The overhead of blocking, context switches, or copying bytes may not be noticeable for most cases, but in high performance applications it can start to matter [8]. It is also worth noting that system call performance has regressed after workaround patches for Spectre and Meltdown, so reducing system calls can be an important optimization [9].

What is it Used for?

As noted above, high performance applications can benefit from using io_uring. It can be particularly useful for applications that are server/backend related, where a significant proportion of the application time is spent waiting on I/O.

How Do I Use it?

Initially, I intended to use io_uring by making io_uring system calls directly (similar to what I did for eBPF). This is a pretty arduous endeavor, as io_uring is complex and the user space application is responsible for a lot of the work to get it to function properly. Instead, I did what a real developer would do if they wanted their application to make use of io_uring - use liburing.

liburing is the user space library that provides a simplified API to interface with the io_uring kernel component [10]. It is developed and maintained by the lead developer of io_uring, so it is updated as things change on the kernel side.

Put an io_uring on it - Exploiting the Linux Kernel