Very cool! I will definitely give this a try, I've been looking to build Go bindings to Mach[0] soon.
It looks like this would make cross-compiling CGO easier (no target C toolchain needed?)
Does this do anything to reduce the overhead of CGO calls / the stack size problem? IIRC the reason CGO overhead exists is at least partly because goroutines only have an ~8k stack to start with, and the C code doesn't know how to expand it, so CGO calls "must" first have the goroutine switched to an OS thread which has an ~8MB stack.
One reason I think Go <-> Zig could be a fantastic pairing is that Zig plans to add a builtin which tells you the maximum stack size of a given function[1], so you could grow the goroutine stack to that size and then call Zig (or, since Zig can compile C code, you could also call C with a tiny shim to report the stack required?) and then eliminate the goroutine -> OS thread switching overhead.
Contributor here: Purego doesn’t do anything to improve the overhead of calling into C. It uses the same mechanisms that Cgo does to switch to the system stack and then call the C code. Purego just avoids needing a C toolchain to cross-compile code that calls into C from Go.
I’ve actually been quite interested in Zig. If that built-in were added, then it would likely be possible to grow the goroutine stack to the proper size and then call the Zig code. Very interesting stuff!
Makes sense! I also wonder (if you know): last I looked I recall that each CGO call requires switching to the system stack, but I can't recall what happens after. Does it switch back to a regular goroutine stack once the syscall has completed?
I wonder if a more tailored CGO implementation could pin a goroutine to a thread which is guaranteed to have a system stack available, so that each CGO call need not worry about that switching at all. Maybe that'd require runtime changes though?
Stack switching isn't that much of the overhead. "ordinary" cgo overhead is <100ns now, has been for a few years, and is much closer to 30 than 80 on recent processors. Most of the overhead is a set of 4 CAS operations (incidentally this means that AMD has measurably lower cgo overhead because of something with its caching model I don't understand).
If cgo's only overhead was the "ordinary" overhead, most people wouldn't have an issue with it. It's downright zippy, in fact... as long as your syscall/C call takes less than 1us. If you stay under the 1us threshold, go will put the OS thread used for the syscall back where it found it and everything moves on.
The issue is that the OS thread was previously serving N goroutines that other parts of the program may be waiting on to move forward, and the OS thread is in a state where go can't pre-empt it and allow those other goroutines to move forward, and it has no idea how long it will be until it can move forward.
As a result, if a syscall/c call takes longer than 1us, go has no choice at this time but to resume a new thread, context switch all the old work onto that thread, and then suspend the syscall thread when it comes back. If you do this a lot, your performance will crater.
There are also, separately, a few issues around how Go chooses to resume/suspend OS threads (for instance, if an OS-locked goroutine does a cooperative park for any reason to wait on another thread to do something, Go will suspend the thread it was on and context switch to a different thread; then when the goroutine wakes up, it will realize its mistake, resume the thread it was on, and context switch again).
This is all fixable stuff, but all the use cases that Google cares about are working fine, so it doesn't really get any attention.
Yeah, the default behavior is to switch back. It’s possible to pin a goroutine to a thread with runtime.LockOSThread(). However, I don’t believe it avoids the stack switching; its purpose is to make sure that thread-local storage works properly. The runtime is pretty smart, though, so it might already do the optimization you suggested in some way. I know it has a background monitor (sysmon) specifically for detecting when a thread is stuck in an external call, so it can spawn a new thread to continue work.
Maybe I should play with removing the stack switching from purego so that under condition of a locked OS thread you can avoid that overhead :) I might give that a shot sometime
I thought I had a piece of dust on my screen, but as I scrolled the dust scrolled: what do these 0xB7 characters do in the identifiers? Are they just "name mangling" to keep them from being exported or something?
"In Go object files and binaries, the full name of a symbol is the package path followed by a period and the symbol name: fmt.Printf or math/rand.Int. Because the assembler's parser treats period and slash as punctuation, those strings cannot be used directly as identifier names. Instead, the assembler allows the middle dot character U+00B7 and the division slash U+2215 in identifiers and rewrites them to plain period and slash."[0]
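So in an assembly file, a function belonging to the current package is written with the leading middle dot (a hypothetical amd64 example, not from purego itself):

```
// func Add(a, b int64) int64
// The leading · (U+00B7) is rewritten by the assembler to "pkgpath."
TEXT ·Add(SB), NOSPLIT, $0-24
	MOVQ a+0(FP), AX
	ADDQ b+8(FP), AX
	MOVQ AX, ret+16(FP)
	RET
```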
I did not know that this could reasonably be done. For some reason it did not occur to me that you could break the chicken and egg problem by simply linking to libdl dynamically; Go binaries are usually static and I didn't even realize it had a mechanism for dynamically linking like this.
This is pretty cool, because you can already do this sort of thing on Windows (using the syscall package, since the Windows loader is always available from kernel32 anyway), and I use it all the time. Probably the most consequential thing I've done with it is my WebView2 bindings. But with this, you could probably do the same thing on Linux and Mac with WebKitGTK and WebKit respectively, and get a native HTML window without CGo on Windows, Mac, and Linux. Perhaps this has already been done (I haven't paid attention), but it would make a pretty nice way to get a UI going in Go. (It's not like I'm a fan of using HTML UIs for native apps, but it works pretty well if you don't overdo it; using native widgets on a given platform does mess up predictability a bit, but it saves disk space at least.)
It loads the dynamic library at runtime, instead of linking against it, which means it makes cross-compiling with CGO easier as no target C toolchain is needed.
What slimsag wrote is correct. It makes cross-compiling code that needs to call C functions as easy as setting GOOS and GOARCH and just building. This means no need to worry about building a C cross-compiler.
I do want to write an article about how purego works under the hood.
I don't have either. I was gonna figure out how to post it after I actually sat down and wrote it lol. I'll probably post it in the golang subreddit and maybe link to it in the README.md since it describes how purego works.
If memory serves, dev.to tends to be downranked or outright filtered by a bunch of places.
(I have no idea how or why that came about, I've merely observed people having all sorts of trouble getting posts on there visible in aggregators and etc.)
Interestingly it’s easy to do this with just standard library on Windows: syscall.NewLazyDLL plus NewProc are enough. (Of course in practice you should probably use golang.org/x/sys instead.) I’ve never thought about why dlopen isn’t offered in syscall on *nix until now.
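The Windows pattern looks roughly like this (a Windows-only sketch; GetTickCount64 is used just as a harmless example of a kernel32 export):

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Resolve kernel32.dll lazily at first call; no import-table entry
	// and no C toolchain involved.
	kernel32 := syscall.NewLazyDLL("kernel32.dll")
	getTickCount := kernel32.NewProc("GetTickCount64")

	// Call returns (r1, r2, lastErr); lastErr is non-nil even on
	// success, so check r1 (or use golang.org/x/sys/windows instead).
	ticks, _, _ := getTickCount.Call()
	fmt.Println("ms since boot:", ticks)
}
```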
It’s pretty simple to use if you are familiar with dlopen and friends.
Just call purego.Dlopen("libname.so", purego.RTLD_GLOBAL)
Take the returned library handle (make sure to check for errors with purego.Dlerror() first) and call purego.Dlsym(lib, "cfuncName"). If it exists, then you can call it with either purego.SyscallN or purego.RegisterFunc.
It would be good to have more documentation on usage, though: things like how to deal with struct padding (or packed structs), common OS API types (presumably manual munging of UCS-2/UTF-16 is needed for Windows), etc., or at least a mention of whether it's unchanged from …/x/sys.
It's easier with dlopen because it's still C and therefore you have the normal headers…
GP is not exactly right; CGO is not involved in building a shared library except for exporting C functions which can be called. It doesn't "convert Go code to C code" or anything like that.
`-buildmode=c-shared` is what produces a shared library with a C ABI exposed; https://golang.org/s/execmodes was the design document for it, which explains:
> It follows that all Go code shares a single runtime. All Go code uses the same memory allocator, the same goroutine scheduler, and in general acts as though it were linked into a single Go program. This is true even when multiple shared libraries are involved.
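A minimal c-shared sketch (names are illustrative; note that building this side does require cgo and a C toolchain, which is exactly the dependency purego removes on the calling side):

```go
// libadd.go — build with:
//   go build -buildmode=c-shared -o libadd.so libadd.go
package main

import "C"

//export Add
func Add(a, b C.int) C.int {
	// Runs on the single Go runtime embedded in the shared library.
	return a + b
}

// A main function is required for buildmode=c-shared, but is never called.
func main() {}
```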
You can already do this with CGo. The difference between CGo and purego is that CGo requires you have a C toolchain installed while purego seems to allow you to call a function that has already been compiled apart from the Go build system. In either case, you can call the Go library from Python, but the tough bits are translating Go objects into Python objects (and vice versa) as well as making sure object lifetimes are correctly managed (I think this is mostly "copy data rather than passing pointers across the language boundary").
I know about CGo and have used it. The post title, "A library for calling C functions from Go without Cgo", made me curious how to achieve something similar with purego without using CGO.
I’m one of the main contributors. I’ve looked into it because I wanted to know if I could build iOS apps without Cgo. At the moment, it is not possible. The reason is that when you run go build to create a shared object, it runs the cgo tool. That tool, although written entirely in Go, doesn’t know about purego and so will go ahead and import runtime/cgo, which requires a C toolchain. It might be possible to circumvent that with a custom Go build toolchain, but the goal of purego was to be seamless to use in a project: just use it and then go build like any other dependency.
Cross-compiling with cgo can be frustrating at times, at least in my own experience. Since Ebitengine is a game engine, and they made this repository, I am presuming it's related.
I am in love with Zig (as you know), but feel the need to say: please be careful with blanket claims like this. purego is a pretty admirable approach to fixing cgo cross-compilation without Zig (though it has its own drawbacks; no static binaries, for example).
Using Zig for Go cross-compilation, although quite great, isn't bulletproof. Finding the right CC/CXX incantation can be fairly tricky, the articles on this are not super up-to-date, and you need a copy of the macOS SDK[0] if you intend to cross-compile for macOS. You may also run into a few scary linker warnings and need to figure out the right Go build flags.
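For reference, the kind of incantation involved looks roughly like this (flags and target triples are illustrative and version-dependent):

```shell
# Cross-compile a cgo-using Go program to Linux/amd64, with Zig as the
# C/C++ toolchain.
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 \
  CC="zig cc -target x86_64-linux-gnu" \
  CXX="zig c++ -target x86_64-linux-gnu" \
  go build ./...
```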
I definitely think Zig and Go can be best friends; after all, they share so many similar qualities and seem quite complementary to each other. It's definitely possible to get Go <-> Zig CGO cross-compilation working (I did so for Sourcegraph), but Zig needs a little more love before it'll 'just work' as claimed for the Go use case, so best not to claim otherwise until then.
Curious - what is frustrating about it? I found the process very easy and with no weird bugs, but I only made a thin layer over a supplied C SDK, so maybe my use of cgo is not representative?
Make sure you have the right C cross-compiler for the target platform. Then you might encounter the usual "wrong version of libc" problems, etc. So it's not enough to compile with GOOS=linux, for example; you also need to make sure that your C cross-compiler has the right version of libc for the target machine, or statically link the version you want, or deploy inside a container, etc.
Pure Go? You don't need anything else besides the Go compiler.
Ah, I see, thank you for the explanation. The reason I never had any problems of this kind is that I always compile Go in a container anyway, so the environment is controlled completely. Makes sense that using the C toolchain on the host would be painful, yes.
Setting up the cross-compiling toolchain is a pain in itself for occasional dabblers and/or when your target is off the beaten track. I can trivially target MIPS-LE with `GOARCH=mipsle GOOS=linux`, but as soon as I added CGO into the mix to support SQLite, things went off the rails, and I gave up after trying to set up a MIPS cross-compiling toolchain in a container.
When cross compiling pure Go you set an environment variable, build, and you have a binary ready to run, it is trivial.
When cross compiling with cgo you need a C cross compiling toolchain for the target platform installed with the right libc, etc. It is not impossible and that is how C/C++ is always cross compiled, but it is much more hassle than compiling pure Go.
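Concretely, the difference looks something like this (the compiler name is just an example from Debian's cross packages):

```shell
# Pure Go: cross-compiling is two environment variables and a build.
GOOS=linux GOARCH=arm64 go build ./...

# With cgo: you also need a C cross-compiler (with a matching libc)
# for the target platform.
CGO_ENABLED=1 GOOS=linux GOARCH=arm64 CC=aarch64-linux-gnu-gcc go build ./...
```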
Why would Rust even have this problem? Rust has native extern "C" blocks and good FFI.
The issue in Go is that goroutines run on small stacks, and C code has no way to know that or to grow the stack, so Go's C calling facility (cgo) has to go through a thread with a proper stack.
There are some wild assembly hacks to get around it ^^
This project isn't really about calling C functions, Go already supports that after all (CGO). Rust does suffer the same problem, which is that cross-compilation when using C code is a bit of a nightmare.
purego solves this by using dlopen and friends.
My understanding is Rust solves this through libloading (same approach effectively) and more heavy-handed approaches like cross-rs which distribute full C/C++ build toolchains for each target (in Docker images or something?)
> C should have never been part of Go to begin with
It’s unavoidable on most platforms. Linux is pretty much the only mainstream platform where the syscall interface is considered the stable interface between user and kernel. Other platforms (like macOS, *BSD, and Windows) consider libc (or equivalent in Win32) to be the stable interface.
Win32 in particular is known for guzzling KBs of stack before getting around to making an actual syscall, so you wouldn’t be able to avoid trampolining through an ABI-compatible environment anyway.
I really really want to learn and use Go, I spent months with it.
In the end, it's a very different paradigm and it is a pain for me to switch from the C-family-syntax to Go and back on a daily basis, and I gave up. Using C, JS, Java, C++, Dart, maybe even Rust(only tried it shortly) on the other hand is much more comfortable and natural for me. Go is just such a different style.
Being vastly different does carry some cost, shall I say. Go could be a great language for new programmers, however.
Dependencies need updating from time to time, whatever the dev team would prefer to be true. In a monorepo, projects like "upgrade all uses of X to X'" can be done by a central team if needed, and safely. The dev teams don't have to do much.
If everyone has their own repos, and there are too many of them, it becomes difficult for central teams to do more than write migration tools and dashboards and nag people.
[0] https://github.com/hexops/mach
[1] https://github.com/ziglang/zig/issues/157