This is the code repository of "Userspace Bypass: Accelerating Syscall-intensive Applications".
License: GPL
Author: Zhe Zhou
- Software configuration
  - Ubuntu 20.04.2 with kernel version 5.4.44
  - Python 3.8 with module miasm v0.1.3
  - gcc 9.4.0
  - (Optional) QEMU 4.2.1 (Debian 1:4.2-3ubuntu6.24) with KVM modules
    - Used for virtual machine evaluation
  - Redis 6.2.6
  - Nginx 1.20.0
- Hardware configuration
  - Server machine: Intel Xeon Platinum 8175*2, 192G memory, Samsung 980 Pro NVMe SSD, and Mellanox ConnectX-3 NIC.
  - Client machine: Intel Xeon Platinum 8260, 128G memory, and Mellanox ConnectX-5 NIC.
  - This is the hardware platform we used; it is not mandatory.
- Change the kernel version to 5.4.44 and modify it. (Or just replace these three files with the ones from `/source_codes/kernel_modify`.)
  - Kernel 5.4.44 can be downloaded here.
  - Patch the kernel using the patch file `source_codes/kernel_modify/linux-5.4.44.patch`:
    - Move the patch file into the root directory of linux-5.4.44.
    - Run `patch -p1 < linux-5.4.44.patch` to patch the kernel.

  If you patch the kernel using the patch file, the next three steps on modifying the kernel can be skipped.
- Modify the code in `linux-5.4.44/arch/x86/entry/common.c` like this:

      // Add these two lines before the do_syscall_64() function:
      void (*zz_var)(struct pt_regs *, unsigned long ts);
      EXPORT_SYMBOL(zz_var);

      // Change the do_syscall_64() function as below:
      __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
      {
          struct thread_info *ti;
          unsigned long ts = ktime_get_boottime_ns();

          enter_from_user_mode();
          local_irq_enable();
          ti = current_thread_info();
          if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
              nr = syscall_trace_enter(regs);

          if (likely(nr < NR_syscalls)) {
              nr = array_index_nospec(nr, NR_syscalls);
              regs->ax = sys_call_table[nr](regs);
      #ifdef CONFIG_X86_X32_ABI
          } else if (likely((nr & __X32_SYSCALL_BIT) &&
                            (nr & ~__X32_SYSCALL_BIT) < X32_NR_syscalls)) {
              nr = array_index_nospec(nr & ~__X32_SYSCALL_BIT,
                                      X32_NR_syscalls);
              regs->ax = x32_sys_call_table[nr](regs);
      #endif
          }

          if (zz_var != NULL)
              (*zz_var)(regs, ts);

          syscall_return_slowpath(regs);
      }
- Modify the code in `linux-5.4.44/arch/x86/mm/fault.c` like this:

      // Add at the beginning of no_context():
      int (*UB_fault_address_space)(unsigned long, struct task_struct *, unsigned long);

      // Add just ahead of "#ifdef CONFIG_VMAP_STACK":
      UB_fault_address_space = (void *)kallsyms_lookup_name("UB_fault_address_space");
      if (UB_fault_address_space) {
          int ret = UB_fault_address_space(address, tsk, regs->r13);
          /*
           * ret == 1 means UB_fault_address_space() determines that this
           * fault is caused by UB (in UDS SFI calling, R13 will be the
           * base address), so we will handle that.
           */
          if (ret == 1) {
              /*
               * Return an error to UB: first we look up and call
               * UB_SFI_error_handler(), which will return a fix-up
               * function in the context.
               */
              unsigned long (*UB_SFI_error_handler)(int);
              unsigned long UB_error_return;

              UB_SFI_error_handler = (void *)kallsyms_lookup_name("UB_SFI_error_handler");
              if (UB_SFI_error_handler) {
                  /* -0x200 means address access error */
                  UB_error_return = UB_SFI_error_handler(-0x200);
                  regs->ip = UB_error_return;
                  return;
              }
          }
      }
- Modify the code in `linux-5.4.44/arch/x86/mm/pageattr.c` after the function `set_memory_x()` like this:

      int set_memory_x(unsigned long addr, int numpages)
      {
          if (!(__supported_pte_mask & _PAGE_NX))
              return 0;

          return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0);
      }
      // Add this line:
      EXPORT_SYMBOL(set_memory_x);
- Then compile the kernel. This is a short tutorial (steps 1-5) on how to compile the Linux kernel.
  - Tip: you can compile with multiple threads to save time. In step 5, use `make -j xx`, where `xx` is the number of threads to use for compiling. Alternatively, after step 4, use the script in `source_codes/scripts/compile_kernel/` to compile the kernel. The script needs to be moved into the `linux-5.4.44/` directory.
  - A `.config` file in `source_codes/kernel_modify` is the config file we used when compiling the kernel. The default Ubuntu 20.04.2 kernel compilation options are fine; this file is for reference only.
- Modify the grub to start with the new kernel.
  - Run `grep menuentry /boot/grub/grub.cfg` to check the option of the new kernel, like this:

        if [ x"${feature_menuentry_id}" = xy ]; then
          menuentry_id_option="--id"
          menuentry_id_option=""
        export menuentry_id_option
        menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
        submenu 'Advanced options for Ubuntu' $menuentry_id_option 'gnulinux-advanced-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
                menuentry 'Ubuntu, with Linux 5.15.0-69-generic' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-69-generic-advanced-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
                menuentry 'Ubuntu, with Linux 5.15.0-69-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.15.0-69-generic-recovery-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
                menuentry 'Ubuntu, with Linux 5.8.0-43-generic' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.8.0-43-generic-advanced-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
                menuentry 'Ubuntu, with Linux 5.8.0-43-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.8.0-43-generic-recovery-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
        ->      menuentry 'Ubuntu, with Linux 5.4.44' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.4.44-advanced-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
                menuentry 'Ubuntu, with Linux 5.4.44 (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.4.44-recovery-3ce46e7e-eb73-4980-b6da-c03947b8e717' {
  - Here we want to use the option `menuentry 'Ubuntu, with Linux 5.4.44'`. Modify grub to replace the boot kernel:
    - Run `sudo vim /etc/default/grub` and change the first line to `GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.4.44"`.
    - Run `grub-install --version` to check the grub version.
    - Run `sudo update-grub` (grub version < 2.0) or `sudo update-grub2` (grub version >= 2.0) to update grub.
- After boot-up, use the `uname -r` command to check whether the kernel version has been changed.
- Disable address randomization as the `su` (`sudo su`) user: `echo 0 > /proc/sys/kernel/randomize_va_space`
- Run the program to be boosted.
- Find the candidate syscall addresses of the program. (Or just use the pre-hardcoded addresses in `source_codes/ub/zz_daemon/main.c`; if an address has changed, please add the new address in `source_codes/ub/zz_daemon/main.c`.)
  - How to find a syscall address:
    - Use `strace` to find the addresses of syscalls, e.g., for `write` of redis: `sudo strace -ip xxx`, where `xxx` is the pid of redis-server. (The redis-server must be busy while tracing, i.e., a redis client must be talking to it: run `./redis-server` in one terminal and `./redis-benchmark` in another.)
    - Then find the address of `write` and continue with the next step (modifying the daemon program).
- Modify the code in the daemon program `source_codes/ub/zz_daemon/main.c`:

      // add the syscall address in targets[]
      // redis -> write
      const unsigned long targets[] = {0x7ffff7e5232f};
- Compile the daemon program using `make`.
- Insert the kernel module in the `ub/zz_lkm` folder with `sudo insmod zz_lkm.ko`, then run the daemon program with `sudo ./zz_daemon` in the `zz_daemon` folder.
- Run the program to be boosted and wait for the boost to complete.
  - A message is printed to `dmesg` after every 500k syscalls are captured; check `dmesg` to see whether the syscall has been boosted.
- Finally, unload the module using `sudo rmmod zz_lkm`.
- Every program needs to be boosted individually: re-insert the kernel module and re-run the daemon program. We provide a script named `start.sh` in `source_codes/ub/` to run UB. If the syscall address is right and the kernel module and daemon program have been compiled, just run `sudo ./start.sh` to start UB.
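Before running `start.sh`, the two host prerequisites from the steps above (the 5.4.44 kernel and disabled address randomization) can be checked programmatically. The following is a minimal sketch, not part of the repository; the function names `check_prerequisites` and `read_live_values` are hypothetical.

```python
import platform
from pathlib import Path

EXPECTED_KERNEL = "5.4.44"  # kernel version this README targets
ASLR_FILE = "/proc/sys/kernel/randomize_va_space"


def check_prerequisites(kernel_release, aslr_value):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not kernel_release.startswith(EXPECTED_KERNEL):
        problems.append(
            f"kernel is {kernel_release}, expected {EXPECTED_KERNEL}"
        )
    if aslr_value.strip() != "0":
        problems.append("address randomization is still enabled")
    return problems


def read_live_values():
    """Read the values from the running system (Linux only)."""
    release = platform.release()  # same string that `uname -r` prints
    aslr = Path(ASLR_FILE).read_text()
    return release, aslr
```

On the server machine, `check_prerequisites(*read_live_values())` returns an empty list when the host is ready for the daemon and kernel module.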
To simplify the artifact, we also wrote several scripts to reduce the repetitive workload of client tests. Please see the source/scripts/ folder; the usage of each script is described inside the script.
All steps tagged '(Optional: has been Pre-hardcode)' can be skipped. If the boost does not succeed, please redo the experiment starting from the '(Optional: has been Pre-hardcode)' step, or follow the instructions on how to find the syscall address.
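When an address has to be looked up again, the manual reading of `strace -i` output can be automated. The sketch below assumes the usual `strace -i` line layout, where the instruction pointer is printed in square brackets before the syscall name; the helper names `hot_syscall_sites` and `format_targets` are hypothetical and not part of the repository.

```python
import re
from collections import Counter

# Matches the instruction-pointer column of `strace -i`, e.g.
# "[00007ffff7e5232f] write(8, ..., 5) = 5"
_LINE = re.compile(r"\[([0-9a-f]{8,16})\]\s+(\w+)\(")


def hot_syscall_sites(strace_lines, syscall):
    """Count how often each instruction address issued `syscall`.

    Returns (address, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in strace_lines:
        match = _LINE.search(line)
        if match and match.group(2) == syscall:
            counts[int(match.group(1), 16)] += 1
    return counts.most_common()


def format_targets(addresses):
    """Render addresses as the C array line used in zz_daemon/main.c."""
    body = ", ".join(hex(address) for address in addresses)
    return f"const unsigned long targets[] = {{{body}}};"
```

Feeding it the captured output of `sudo strace -ip <pid>` and the syscall name gives the most frequent call sites, which can then be pasted into `targets[]`. Addresses still need to be double-checked against your own binary and libc.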
- Two separate experiments: SSD disk read and memory read.
- For the SSD disk read test:
  - The code is in `source_codes/apps/io_file`. We have modified the `syscall_read` code to run the read function test 11 times: the first run covers the boost period, and the following 10 runs are used for evaluation.
  - First, create a big file named `test.file` in the `toRead` folder. We use `dd` to build a 2 GiB file, e.g., `dd if=/dev/zero of=test.file bs=1M count=2048`.
  - Modify the code in `io_file/syscall_read.c`:
    - Make sure `FILE_POS` is `1`.
    - Make sure the `WITH_SUM` parameter corresponds to Table 3 in the paper.
  - `make`
  - (Optional: has been Pre-hardcode) Run `sudo ./syscall_read <IO_SIZE>`, e.g., `sudo ./syscall_read 1024` for 1024 bytes per read, and use `strace` to get the syscall address. Currently we support the `pread64()` syscall.
  - (Optional: has been Pre-hardcode) Modify `ub/zz_daemon/main.c` and add the syscall address to the `targets[]` array. Recompile the daemon program.
  - Insert the kernel module, and run the daemon program.
  - Run the `syscall_read` program: `sudo ./syscall_read <IO_SIZE>`. The boost period happens during the first read function of the program (we repeat the read function 11 times), and the following 10 runs benefit from the boosting.
- For the memory read test:
  - The only difference is to build the file in the `/dev/shm/` folder and set `FILE_POS` to `0` in `apps/io_file/syscall_read.c`.
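To make the access pattern of this micro-benchmark concrete, here is a rough Python sketch of an 11-pass `pread` loop in which the first pass is discarded. It only illustrates the workload shape; the real benchmark is the C program `syscall_read`, and the meaning given to `with_sum` here (summing every byte read) is an assumption, not taken from the source code.

```python
import os
import time


def read_pass(path, io_size, with_sum=False):
    """Read `path` front to back, one pread() call per `io_size` bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        checksum = 0
        while True:
            buf = os.pread(fd, io_size, offset)
            if not buf:  # an empty result means end of file
                break
            if with_sum:
                checksum += sum(buf)  # touch every byte that was read
            offset += len(buf)
        return offset, checksum
    finally:
        os.close(fd)


def evaluate(path, io_size, passes=11, with_sum=False):
    """Time `passes` full reads and drop the first (boost-period) pass."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        read_pass(path, io_size, with_sum)
        timings.append(time.perf_counter() - start)
    return timings[1:]
```

Smaller `io_size` values mean more syscalls per byte transferred, which is the regime where the per-syscall overhead dominates.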
- For the io_uring test:
  - We use fio 3.16 to test io_uring.

        sudo apt install fio
        sudo fio --name=/dev/shm/test.file --bs=<IO_SIZE> --ioengine=io_uring --iodepth=<IO_DEPTH> --iodepth_batch_submit=<IO_DEPTH> --iodepth_batch_complete=<IO_DEPTH> --iodepth_batch_complete_min=<IO_DEPTH> --rw=read --direct=0 --size=<FILE_SIZE> --numjobs=1 --sqthread_poll=1 --runtime=240 --group_report

  - To be fair, we use a different file size for each IO size (IO size - file size): 64 - 256MiB, 256 - 1GiB, 1024 - 8GiB, 4096 - 16GiB.
  - We also test the influence of different io_depth values on memory read. The range is 2^1 to 2^10, which corresponds to Fig 6 in the paper.
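The parameter combinations for the io_uring test can be generated rather than typed by hand. This sketch copies the flags verbatim from the fio command above and only fills in the placeholders; it assumes the IO sizes are in bytes and passes the file size as a plain byte count to avoid relying on fio's unit suffixes. The names `SIZE_PAIRS`, `IO_DEPTHS` and `fio_command` are hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30

# (IO size in bytes, file size in bytes) pairs from the list above.
SIZE_PAIRS = [(64, 256 * MIB), (256, 1 * GIB), (1024, 8 * GIB), (4096, 16 * GIB)]

# io_depth values 2^1 .. 2^10.
IO_DEPTHS = [2 ** k for k in range(1, 11)]


def fio_command(io_size, file_size, io_depth):
    """Build the argument list for one fio run (flags copied from above)."""
    return [
        "sudo", "fio",
        "--name=/dev/shm/test.file",
        f"--bs={io_size}",
        "--ioengine=io_uring",
        f"--iodepth={io_depth}",
        f"--iodepth_batch_submit={io_depth}",
        f"--iodepth_batch_complete={io_depth}",
        f"--iodepth_batch_complete_min={io_depth}",
        "--rw=read",
        "--direct=0",
        f"--size={file_size}",
        "--numjobs=1",
        "--sqthread_poll=1",
        "--runtime=240",
        "--group_report",
    ]
```

Each returned list can be passed to `subprocess.run`, or joined with spaces and pasted into a shell.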
- Redis version: 6.2.6. Download and compile.
- Bind the redis-server to a specific NIC and port in `config.conf` (find `bind` in `config.conf`).
- (Optional: has been Pre-hardcode) Get the syscall address of redis-server. Here we only support the `write` syscall of redis-server. Add the syscall address in `source_codes/ub/zz_daemon/main.c` and compile the daemon program.
- Insert the kernel module, then run the daemon program.
- Run redis-server in `redis-6.2.6/src`: `./redis-server ../redis-conf`.
- Run redis-client. In our environment, we use two servers and a pair of directly connected Mellanox ConnectX-3/5 NICs to do the experiment: `./redis-benchmark -h <IP_ADDRESS_OF_REDIS_SERVER> -p <PORT_OF_REDIS_SERVER> -t get -n 1000000 -d 3 --threads 2`. The parameter `-t` specifies the method, e.g., `get` or `set`, and `-d` sets the data size of the value.

- We verify `-d` from $2^0$ to $2^{14}$.
- Every `get` test should be preceded by a `set` test with the same `-d` parameter.
- The boosting period may need 20-30 s for redis, so the `-n` parameter needs to be large enough. The acceleration gets better as the benchmark runs longer.
- After the boost completes, you can stop the benchmark and start a new benchmark test without boosting again.
- The redis-server and redis-client can run on the same machine.
- Different hardware settings will give different results.
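The `-d` sweep described in the notes above (a `set` run before each `get` run, for value sizes from 2^0 to 2^14) can be written down as a command generator. This is a sketch using only the `redis-benchmark` options shown in the command above; the function names and the example host and port in the usage note are placeholders.

```python
# Value sizes 2^0 .. 2^14 bytes.
DATA_SIZES = [2 ** k for k in range(0, 15)]


def benchmark_command(host, port, method, data_size,
                      requests=1_000_000, threads=2):
    """Build one redis-benchmark invocation as an argument list."""
    return [
        "./redis-benchmark",
        "-h", host,
        "-p", str(port),
        "-t", method,
        "-n", str(requests),
        "-d", str(data_size),
        "--threads", str(threads),
    ]


def sweep(host, port):
    """Return the commands for the full sweep: set, then get, per size."""
    commands = []
    for size in DATA_SIZES:
        commands.append(benchmark_command(host, port, "set", size))
        commands.append(benchmark_command(host, port, "get", size))
    return commands
```

For example, `sweep("192.168.1.10", 6379)` yields 30 commands. Because the boost period takes 20-30 s, `requests` may need to be raised for the first run after the daemon starts.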
- F-Stack
  - Use the F-Stack official tutorial to install and run.
  - Bind one NIC to DPDK.
  - The redis-6.2.6 is in the `app` folder; compile it and bind it to the DPDK NIC.
  - Start redis from F-Stack: `sudo redis-server --conf config.ini --proc-type=primary --proc-id=0 app/redis-6.2.6/redis.conf`

- Multiple NICs are needed for the DPDK configuration.
- If you use a Mellanox NIC and the driver is mlx4 or newer, DPDK is supported natively. No DPDK NIC binding is needed.
- Nginx version: 1.20.0.
- Install tutorial. `libpcre-dev` is needed. Configure options we used:

      sudo ./configure --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/run/nginx.pid --lock-path=/var/lock/nginx.lock --modules-path=/usr/lib/nginx/module --with-http_gunzip_module --with-http_gzip_static_module
      make && make install

- The nginx configuration files are in `source_codes/apps/nginx`; move them to the `/etc/` folder: `mv source_codes/apps/nginx /etc/`. The website files need to be put in `/var/www/html`, and they can be accessed through port `8088`. Use `dd` to make files of a specific size, e.g., `sudo dd if=/dev/zero of=4k.html bs=4K count=1`.
- Run `sudo nginx` to start the nginx daemon. Test whether it is working with `curl` or `wget`, e.g., `curl http://localhost:8088/4k.html`.
- Run the benchmark using wrk from another machine: `./wrk -t8 -c1024 -d12 <URL_&_FILES>`. Here `-t8 -c1024 -d12` mean 8 threads, 1024 connections, and 12 seconds, respectively.
- (Optional: has been Pre-hardcode) `strace` the nginx worker thread to find the syscall addresses. Currently we support accelerating 5 syscalls: `openat`, `setsockopt`, `writev`, `sendfile`, `close`. Add the addresses of these 5 syscalls in `source_codes/ub/zz_daemon/main.c`, and recompile the daemon program.
- Insert the kernel module first, and then run the daemon program as root.
- Run wrk from another machine (the same machine is also OK) and wait for the boost to complete. The boost period may take more than 3 minutes depending on the RPS, so the first run needs a large wrk `-d` value.
- After the acceleration is complete, stop wrk and continue to use `-d12` for testing.

- The gaps between some nginx syscalls may be very large, so modify `syscall_short_th` and `hot_caller_th` in `source_codes/ub/zz_lkm/stat.c` to capture them. Increasing `syscall_short_th` and reducing `hot_caller_th` catches syscalls that execute more slowly and at longer intervals.
- Modifying `worker_processes` and `worker_cpu_affinity` in the nginx configuration file `/etc/nginx/nginx.conf` sets the number of nginx worker processes and their affinity (`worker_cpu_affinity` sets the core affinity as a binary bitmap). For example, `worker_cpu_affinity: 0010000000000000` means that the machine has 16 cores and the only worker process is bound to core 13.
  - After changing the configuration, use `sudo nginx -s reload` to load the new config.
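The `worker_cpu_affinity` bitmap is read right to left: the rightmost bit stands for core 0. A small sketch for building and decoding such masks (the helper names `affinity_mask` and `cores_in_mask` are hypothetical):

```python
def affinity_mask(total_cores, core):
    """Return the bitmap string that binds one worker to `core`.

    The rightmost character of the mask corresponds to core 0.
    """
    if not 0 <= core < total_cores:
        raise ValueError("core index out of range")
    return format(1 << core, f"0{total_cores}b")


def cores_in_mask(mask):
    """Return the core indices whose bit is set in `mask`."""
    return [i for i, bit in enumerate(reversed(mask)) if bit == "1"]
```

With these, `affinity_mask(16, 13)` reproduces the `0010000000000000` value from the note above, and `cores_in_mask` can be used to double-check a mask copied from an existing configuration.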
- Two machines (client and server) are needed. The code is in the `source_codes/apps/socket/udp` folder.
- The client uses `send_udp.c` as the sender. Change the 'xxx' in `theirAddr.sin_addr.s_addr = inet_addr("xxx.xxx.xxx.xxx");` in `source_codes/apps/socket/send_udp.c` to one of the server's NIC addresses. Use `gcc send_udp.c -o send_udp -lpthread` to compile the sender, and run it with `./send_udp`.
- On the server, change the 'xxx' in `const char *opt = "xxx";` in `source_codes/apps/socket/udp/raw_socket_udp.c` to the real name of the chosen NIC. Run `make` to compile the server, and `sudo ./sniff <0_OR_1>` to run it. 0 or 1 selects whether to do the calculation on the incoming packets.
- (Optional: has been Pre-hardcode) As before, use `strace` to get the syscall address after running these two programs. Here we support the server's `recvfrom()` syscall. Then add its address in the daemon program.
- Recompile the daemon program, insert the kernel module, and run the daemon.
- Run the sender and receiver programs again and wait for the boost to complete.
- Here we also modified the receiver to run the socket read test 11 times. The first run covers the boost period, and the following 10 runs are used for evaluation.
- Install the dependencies: `sudo apt install python3-bpfcc` and `sudo pip install bcc`.
- Two machines (client and server) are needed. The code is in the `source_codes/apps/socket/bpf` folder.
- The client uses `send_udp.c` as the sender. Change the 'xxx' in `theirAddr.sin_addr.s_addr = inet_addr("xxx.xxx.xxx.xxx");` in `source_codes/apps/socket/send_udp.c` to one of the server's NIC addresses. Use `gcc send_udp.c -o send_udp -lpthread` to compile the sender, and run it with `./send_udp`.
- On the server, change the 'xxx' in `device = "xxx"` in `source_codes/apps/socket/bpf/main.py` to the real name of the chosen NIC.
- Just run `sudo python3 wrapper.py`. The script prints its output every 10 seconds.
- In most situations, the performance gain is larger when KPTI is turned on. Newer processors may not be vulnerable to Meltdown, so KPTI does not affect them.
- How to turn off KPTI: modify the `GRUB_CMDLINE_LINUX_DEFAULT=""` line in `/etc/default/grub` and add the `nopti` option inside the double quotation marks. Then update grub and reboot.
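The edit to `/etc/default/grub` can also be scripted. The sketch below works on the file's text and appends `nopti` to `GRUB_CMDLINE_LINUX_DEFAULT` only if it is not already there; it assumes the value is enclosed in double quotes, as in the stock Ubuntu file. The function name `add_kernel_option` is hypothetical.

```python
import re

# Captures the opening part, the current options, and the closing quote.
_CMDLINE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
)


def add_kernel_option(grub_text, option="nopti"):
    """Return `grub_text` with `option` present in the default cmdline."""

    def repl(match):
        options = match.group(2).split()
        if option not in options:
            options.append(option)
        return match.group(1) + " ".join(options) + match.group(3)

    return _CMDLINE.sub(repl, grub_text)
```

After writing the result back (as root), grub still has to be updated and the machine rebooted, as described above.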
The addresses below were collected in our setting; please double-check them in yours.
If the addresses need to be updated, please follow the instructions on how to find the syscall address.
| Application | Syscalls | Address |
|---|---|---|
| redis | write | 0x7ffff7e5232f |
| nginx | openat | 0x7ffff7fa1abb |
| nginx | setsockopt | 0x7ffff7df274e |
| nginx | writev | 0x7ffff7de6487 |
| nginx | sendfile | 0x7ffff7de4fae |
| nginx | close | 0x07ffff7fa1437 |
| raw socket | recvfrom | 0x7ffff7fa76ca |
| read memory/file | pread | 0x7ffff7ed116a |