Linux Emulation Benchmarking
Did you know that NetBSD, FreeBSD 11, and SmartOS (based on Illumos) can all run unmodified Linux binaries by emulating the Linux system call interface?
Many past benchmarks have tried to compare Linux with FreeBSD or Illumos, but their results are often muddied by differences in compilers, compiler settings, and available libraries. For example, GCC’s libgomp uses completely different synchronization code (built on futexes) when compiled for Linux than it does on other systems.
I’ve run some quick tests using a single benchmark from the NAS Parallel Benchmarks (NPB): IS, class C.
Methodology
All tests were run using the exact same binary, compiled on a Debian Jessie machine with GCC 4.9. This is an OpenMP benchmark, but libgomp is statically linked into the binary. As a result, the only thing that changes between runs is the kernel (plus the system call emulation layer, where applicable), so that is really all we are measuring.
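Because the whole comparison rests on the binary being fully static, it can be worth confirming that before each run. The sketch below is one way to check it, not the method used for these results: a statically linked ELF executable has no PT_INTERP program header (the entry that names the dynamic loader), so its absence means libgomp and libc were linked in. It parses the ELF header with Python’s struct module; the function name has_interpreter is hypothetical.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader (ld.so)


def has_interpreter(data):
    """Return True if the ELF image in `data` requests a dynamic loader.

    A fully static executable (e.g. built with gcc -static) has no
    PT_INTERP entry, so this returns False for it.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, byte_order = data[4], data[5]
    endian = "<" if byte_order == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    if elf_class == 2:  # ELF64 header fields following the 16-byte e_ident
        fields = struct.unpack_from(endian + "HHIQQQIHHHHHH", data, 16)
    elif elf_class == 1:  # ELF32 header fields following the 16-byte e_ident
        fields = struct.unpack_from(endian + "HHIIIIIHHHHHH", data, 16)
    else:
        raise ValueError("unknown ELF class")
    # fields: e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
    #         e_flags, e_ehsize, e_phentsize, e_phnum, ...
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
    for i in range(e_phnum):
        # p_type is the first 32-bit field of a program header in both classes
        (p_type,) = struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

For example, calling has_interpreter on the bytes of is.C.x (NPB’s usual output name, assumed here) should return False for a -static build. Running file(1) on the binary, which reports "statically linked", gives the same answer without any code.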
Important notes:
- I don’t have NetBSD results because NetBSD 7 fails to boot on this particular machine and I couldn’t resolve the issue.
- The FreeBSD results are based on a snapshot build of FreeBSD 11 (20150917-r287930). This is not a stable release, and development snapshots ship with debugging options (such as INVARIANTS and WITNESS) that can slow down the kernel. It should be re-tested once 11.0 is released.
Hardware
- Dell Studio XPS 435MT
- 18GB DDR3
- Intel Core i7 920
This machine has 4 cores with 2-way hyperthreading, for a total of 8 hardware threads.
Tested OSs
- Debian GNU/Linux 8.0, Linux 3.16
- FreeBSD 11 snapshot, 20150917-r287930
- NetBSD 7.0 (no results; it would not boot on this machine)
- Joyent SmartOS 20150917T232817Z, Debian 7 LX-branded zone
Benchmark
- NAS NPB 3.3.1, IS (integer sort) class C, OpenMP version
- Compiled with Debian 8.0’s GCC 4.9 using -O3 -fopenmp -mcmodel=medium -static
Results
Below are the results (higher is better). FreeBSD appears to be a bit slower, but since this is not a release, I’m not sure it’s fair to draw any conclusions yet. Joyent’s SmartOS, on the other hand, is essentially as fast as native Linux in terms of peak Mop/s.
The most interesting differences appear when the number of threads doesn’t map neatly onto the hardware. With 5, 6, or 7 threads on this 4-core hyperthreaded machine, SmartOS is consistently and significantly faster than Linux. This suggests to me that the Illumos scheduler handles these uneven cases better.
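To see why 5 to 7 threads is an awkward fit, here is a toy model; it is an illustration only and is not derived from these measurements. It assumes the work is split evenly across threads, threads are spread round-robin over the 4 cores before any core gets a second thread, two threads sharing a core each run at half speed (ignoring any hyperthreading throughput gain), and the parallel region finishes only when the slowest thread does. The function names threads_per_core and modeled_speedup are hypothetical.

```python
def threads_per_core(n_threads, n_cores=4, smt=2):
    """Idealized round-robin placement: fill every core once before doubling up."""
    if not 1 <= n_threads <= n_cores * smt:
        raise ValueError("thread count outside available hardware threads")
    counts = [0] * n_cores
    for t in range(n_threads):
        counts[t % n_cores] += 1
    return counts


def modeled_speedup(n_threads, n_cores=4, smt=2):
    """Speedup over one thread under the toy model described above.

    Each thread does 1/n of the work; on a core shared by k threads it runs
    at 1/k speed, so the region takes max(k)/n of the single-thread time.
    """
    counts = threads_per_core(n_threads, n_cores, smt)
    return n_threads / max(counts)


for n in range(1, 9):
    print(n, threads_per_core(n), modeled_speedup(n))
```

Under these assumptions, 4 threads model a speedup of 4.0, while 5, 6, and 7 threads give only 2.5, 3.0, and 3.5 before 8 threads return to 4.0. A scheduler that keeps rotating which threads share a core, so no single thread is stuck at half speed for the whole run, could narrow that gap. That is one plausible reading of the SmartOS numbers, though this model does not establish it.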
(default/active/passive refer to the OMP_WAIT_POLICY setting for libgomp.)
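A sweep over thread counts and wait policies like this one could be scripted roughly as follows. This is a sketch under assumptions, not the harness used for these results: the binary path ./is.C.x, the 1 to 8 thread range, treating "default" as leaving OMP_WAIT_POLICY unset, and the format of the "Mop/s total" line in NPB’s report are all assumptions. OMP_NUM_THREADS and OMP_WAIT_POLICY (ACTIVE or PASSIVE) are standard OpenMP environment variables that libgomp reads at startup.

```python
import os
import re
import subprocess

# Hypothetical path to the static NPB IS class C binary; adjust as needed.
BINARY = "./is.C.x"
# "default" is assumed to mean OMP_WAIT_POLICY is left unset.
POLICIES = ("default", "active", "passive")
# Assumed format of NPB's report line, e.g. " Mop/s total     =   1234.56".
MOPS_RE = re.compile(r"Mop/s total\s*=\s*([0-9]+(?:\.[0-9]+)?)")


def run_env(threads, policy, base=None):
    """Build the environment for one run: thread count plus wait policy."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)
    env.pop("OMP_WAIT_POLICY", None)  # never inherit a stale setting
    if policy != "default":
        env["OMP_WAIT_POLICY"] = policy.upper()  # ACTIVE or PASSIVE
    return env


def parse_mops(output):
    """Extract the total Mop/s figure from the benchmark's stdout."""
    match = MOPS_RE.search(output)
    if match is None:
        raise ValueError("no 'Mop/s total' line in benchmark output")
    return float(match.group(1))


def sweep(max_threads=8):
    """Run the binary for every (policy, thread count) pair; return Mop/s."""
    results = {}
    for policy in POLICIES:
        for n in range(1, max_threads + 1):
            proc = subprocess.run(
                [BINARY],
                env=run_env(n, policy),
                capture_output=True,
                text=True,
                check=True,
            )
            results[(policy, n)] = parse_mops(proc.stdout)
    return results
```

Since the binary is identical everywhere, the same script could be run unchanged on each OS (inside the LX zone on SmartOS), assuming Python is available there; sweep() returns a dict keyed by (policy, threads).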