I mean, ideally you could get rid of the performance hit from syscalls, since the app and the barebones kernel would run in the same address space and a syscall becomes a plain function call. You could also get rid of paging and virtual memory, and have a very fast malloc implementation. read()/write() could also be super fast, since there's no user/kernel boundary to cross. In addition, you can get rid of the scheduler (assuming no threads) and a ton of other complexity. Also, with nothing else competing for the CPU and no context switches, the caches would stay hella warm.
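To make the "very fast malloc" point concrete, here's a rough sketch of a bump allocator in C. It assumes a single-threaded app that owns one fixed memory region and never frees individual objects. The names (`arena`, `ARENA_SIZE`, `bump_alloc`, `bump_reset`) are made up for illustration and don't come from any particular unikernel.

```c
#include <stddef.h>

/* Hypothetical fixed arena. In a single-address-space setup this could be
 * all of free RAM; 1 MiB is an arbitrary size picked for the sketch. */
#define ARENA_SIZE ((size_t)1 << 20)

static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_off = 0;

/* Bump allocation: round the request up to 16 bytes and advance an offset.
 * No locks, no free lists, no syscalls. Returns NULL when the arena is full. */
static void *bump_alloc(size_t n)
{
    if (n > ARENA_SIZE)
        return NULL;                      /* also guards the rounding below */

    size_t need = (n + 15u) & ~(size_t)15u;
    if (need > ARENA_SIZE - arena_off)
        return NULL;

    void *p = &arena[arena_off];
    arena_off += need;
    return p;
}

/* Release everything at once, e.g. between requests or trading sessions. */
static void bump_reset(void)
{
    arena_off = 0;
}
```

An allocation here is a compare and an add, which is about as cheap as it gets. The catch is that you can only free everything at once, so this only works if the app's allocation pattern fits that model.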

Of course, you’re throwing away a lot: memory protection, process isolation, and multitasking, for starters. But for certain applications (like high-frequency trading), the potential benefits seem very attractive.