Mac OS X shared library initialization

I was part of the team that was passed through the Kylix threshing machine originally, so I decided to do a little research into shared library initialization and termination early on.  Some very hard-to-debug things can happen to you if you get surprised by library load/unload sequencing on a platform.

When we did the original Linux work, we started out with the assumption that the loader worked the same as Windows with respect to initialization order.  We assumed dependencies were initialized first, in a depth-first ordering.  I don't remember anymore what the default ordering was, but it wasn't right.  We found out that the tools are responsible for supplying the shared object initialization order in a special section in the executable.  I didn't want to have a similar experience late in the development cycle for the Mac.

The first thing to do was to find out how shared objects (e.g. .dylib files) specify initialization procedures.  If you look at the Mach-O spec, or in mach-o/loader.h, you'll find the LC_ROUTINES load command, which seems just right for the job.  Comments in the header say this is used for C++ static constructors.  Excellent!  A little further reading shows the ld option -init allows you to specify the startup routine manually, and directs you to the man page for ld.  OK, off we go, and there we see only one sentence, which includes a clause stating that this is used rarely.  Red flag.  And there the documentation trail dies.  Red flag.  OK, let's go look at a C++ static constructor example built with g++.  Hmm, no LC_ROUTINES.  Red flag.  OK, let's look for LC_ROUTINES in some of the .dylibs in the shipping OS.  Hmm, no LC_ROUTINES.  Red flag.  So, how do C++ static constructors get called?  Looking around a bit (see otool -lv), I see sections like __mod_init_func, with the S_MOD_INIT_FUNC_POINTERS flag set.  Good luck finding documentation on the S_MOD_INIT_FUNC_POINTERS flag (yeah, ok, red flag).

So, here's where we are saved a ton of time by the fact that the OS is open source.  I have the dyld source handy, and a little grepping and reading gives a pretty clear picture of how things really work.  LC_ROUTINES is only suitable for calling an initialization routine.  LC_ROUTINES does not support a termination notification.  So now LC_ROUTINES is a porcupine of red flags, and I wouldn't touch it with a ten-foot pole.  S_MOD_INIT_FUNC_POINTERS and S_MOD_TERM_FUNC_POINTERS are the way to go.  These indicate sections that are arrays of pointers to functions that the OS calls for both the executable and for shared objects, if it finds them.  Each function in the S_MOD_INIT_FUNC_POINTERS section is called with a mess of parameters, which I'll describe later.  Each function in the S_MOD_TERM_FUNC_POINTERS section is called with no parameters at all.  The functions in the S_MOD_TERM_FUNC_POINTERS array are called in reverse order.  So for init, it's init[0], init[1], ..., init[n], and for term it's term[n], term[n-1], ..., term[0].  Without source, this would have been a long slow learning experience.
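To make the calling order concrete, here is a small simulation of how the two arrays get walked.  It is only a sketch, not dyld code: the arrays are ordinary C arrays standing in for the two sections, the function names are made up, and the initializers take no parameters here to keep things short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Trace buffer so the call order can be inspected afterwards. */
static char trace[128];

static void record(const char *name) {
    /* Room for the name, a separating space, and the terminating NUL. */
    assert(strlen(trace) + strlen(name) + 2 <= sizeof trace);
    strcat(trace, name);
    strcat(trace, " ");
}

/* Hypothetical initializers and terminators for one image. */
static void init0(void) { record("init0"); }
static void init1(void) { record("init1"); }
static void term0(void) { record("term0"); }
static void term1(void) { record("term1"); }

typedef void (*hook_fn)(void);

/* Stand-ins for the S_MOD_INIT_FUNC_POINTERS and
   S_MOD_TERM_FUNC_POINTERS sections of a single image. */
static hook_fn init_funcs[] = { init0, init1 };
static hook_fn term_funcs[] = { term0, term1 };

/* Initializers run front to back: init[0], init[1], ..., init[n]. */
static void run_initializers(void) {
    size_t n = sizeof init_funcs / sizeof init_funcs[0];
    for (size_t i = 0; i < n; i++)
        init_funcs[i]();
}

/* Terminators run back to front: term[n], term[n-1], ..., term[0]. */
static void run_terminators(void) {
    size_t n = sizeof term_funcs / sizeof term_funcs[0];
    for (size_t i = n; i-- > 0; )
        term_funcs[i]();
}

/* Runs both passes and returns the recorded call order. */
const char *simulate_image_lifetime(void) {
    trace[0] = '\0';
    run_initializers();
    run_terminators();
    return trace;
}
```

Calling simulate_image_lifetime() returns the recorded sequence, with the terminators in the opposite order from their position in the array.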

The loader source code also indicates a recursive descent initialization of the shared objects, but the devil is in the details, and rather than try to fully grok the source code, and possibly miss something critical, I decided to switch back to the empirical, and write a bunch of tests.  Armed with the details of how the loader wants us to set up the init and term functions, I descended on our C++ linker, and brought it to the point where it could build a Mac .dylib (mostly).  Many of the results are mundane, but a few are sort of interesting, and I'll try to present them here compactly.

First some notation.  In all the examples below, X refers to the executable.  Other letters refer to shared objects.  Solid directed lines indicate a static dependency between objects.  For example, X -> A, means the executable depends on the shared library A, and we'd expect A to be initialized before X runs.  Also we'd expect A to be terminated after X runs.  Dotted lines in the examples mean we dynamically load a shared object with dlopen.  That's where a lot of the really horrible things happened in Kylix, because of all the package load/unload operations, so I invested some tests there.  To show the actual initialization results, I'm going to use a very simple notation.  I use the letter of the object for initialization, and the same letter, preceded by ~ to indicate termination.  So for the example X -> A, I expect the following sequence: A X ~X ~A.
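For anyone who wants to repeat these experiments, a test library needs little more than one function that runs at load and one that runs at unload.  The sketch below is an assumption on my part about how to do that with stock tools: it uses the constructor and destructor attributes of GCC and clang rather than the linker described in this post, and depending on the toolchain version the destructor may be registered through a different mechanism than a terminator section.  The names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shared trace so a test can inspect the order afterwards; a real
   test library could simply print instead. */
char init_trace[64];

static void trace_event(const char *event) {
    /* Room for the event, a separating space, and the terminating NUL. */
    assert(strlen(init_trace) + strlen(event) + 2 <= sizeof init_trace);
    strcat(init_trace, event);
    strcat(init_trace, " ");
    fprintf(stderr, "%s ", event);
}

/* Runs when the image containing this code is initialized. */
__attribute__((constructor))
static void library_a_init(void) {
    trace_event("A");
}

/* Runs when the image is terminated (unload or process exit). */
__attribute__((destructor))
static void library_a_term(void) {
    trace_event("~A");
}
```

Build one of these per library, changing the letter, and the output on stderr reads in the same notation as the results below.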

So, simple tests first.
X -> A

result: A X ~X ~A
  ---> A
 /     ^
X      |
 \     |
  ---> B

result: A B X ~X ~B ~A

And a diamond:
        B
      /   \
X -> D     A
      \   /
        C

result: A B C D X ~X ~D ~C ~B ~A

OK, moving on to a complicated static linking example.  This one is a ladder, and here we should do a little explanation.
     D -> C -> B -> A
   / |    ^    |    ^
 X   |    |    |    |
   \ v    |    v    |
     H -> G -> F -> E

In the example above, we have two straight legs of dependencies, with cross dependencies running back and forth between the legs.  You can't do a naive recursive depth-first initialization of this, because you could end up running some initializers before their dependencies are initialized.  You have to do a topological ordering of the dependencies, and initialize in that order.  That can be done with a recursive operation, and it is done that way by the Mac loader.  The easiest way to observe it is to follow all the dependency sequences in the ladder above, starting from D, which is at the top of every chain.

The longest chain is D -> H -> G -> C -> B -> F -> E -> A.  Read from right to left, it is the initialization chain we need, since it guarantees all dependencies will be initialized first.  It is roughly what we expect out of the loader, barring ties in initialization order due to libraries at the same level in the dependency graph.

Here's what we got when we ran the example:  A E F B C G H D X ~X ~D ~H ~G ~C ~B ~F ~E ~A
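The same order falls out of a recursive walk that initializes an image only after all of its dependencies.  The sketch below is not the loader's code; it is a plain depth-first post-order visit over the ladder, with the edges read off the diagram.  The order of the entries within each dependency list is my assumption, and a different order could break ties differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { A, B, C, D, E, F, G, H, X, IMAGE_COUNT };
static const char names[] = "ABCDEFGHX";

/* Static dependencies from the ladder diagram; -1 ends each list.
   The order inside each list is an assumption. */
static const int deps[IMAGE_COUNT][3] = {
    [A] = { -1 },
    [B] = { A, F, -1 },
    [C] = { B, -1 },
    [D] = { C, H, -1 },
    [E] = { A, -1 },
    [F] = { E, -1 },
    [G] = { F, C, -1 },
    [H] = { G, -1 },
    [X] = { D, H, -1 },
};

static int visited[IMAGE_COUNT];
static char order[2 * IMAGE_COUNT + 1];

/* Initialize every dependency first, then the image itself. */
static void init_image(int image) {
    if (visited[image])
        return;
    visited[image] = 1;
    for (const int *d = deps[image]; *d != -1; d++)
        init_image(*d);

    size_t len = strlen(order);
    assert(len + 2 < sizeof order);
    order[len] = names[image];
    order[len + 1] = ' ';
    order[len + 2] = '\0';
}

/* Returns the initialization order for the executable X. */
const char *ladder_init_order(void) {
    memset(visited, 0, sizeof visited);
    order[0] = '\0';
    init_image(X);
    return order;
}
```

With these lists, ladder_init_order() returns "A E F B C G H D X ", which matches the initialization half of the observed sequence.  Termination is the same list walked backwards.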

OK, so things are looking up.  I am getting what I expect out of the loader, and I'm not terribly surprised, because otherwise the world would have fallen apart long ago for the Mac, right?  That's what we thought on Kylix, and it's why I looked so closely here.  Anyway, now on to the dynamic loading cases.  This is where we mix static loading with dynamic loading, and make sure the semantics are reasonable, and don't require us to jump through too many hoops in the linker and RTL.
X -> A
B -> C

In this example, X loads B with dlopen, and then unloads it with dlclose before exiting.  B depends on C, but not A.  Here's what we got: A X [load B] C B [unload B] ~B ~C ~X ~A.
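The dynamic part of X in a test like this is just a dlopen followed by a dlclose.  A minimal sketch, assuming the library path is passed in by the caller; the function name and the bracketed log format are mine, chosen to match the notation used here.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Loads the library at `path`, then unloads it again.
   Returns 0 on success and -1 if either step fails. */
int load_then_unload(const char *path) {
    assert(path != NULL);

    fprintf(stderr, "[load %s]\n", path);
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        /* dlerror() describes the most recent dl* failure. */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    fprintf(stderr, "[unload %s]\n", path);
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return -1;
    }
    return 0;
}
```

Calling it from main with the path to B (for example a hypothetical libB.dylib) puts the load and unload markers between whatever the initializers and terminators print.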

Now, just for fun, let's try the same case, but let's not call dlclose in X.  Meaning we orphan the B shared library.  Here's where it gets harder to guess what would happen.  Here's what we got:  A X [load B] C B ~B ~C ~X ~A.  Yep, same as the case where we called dlclose!  I certainly wouldn't rely on this feature, but it's nice to know about.

Now, a more complicated case where we try out load and unload sequences.
X --> A
.     ^
.     |
. . . B
.     ^
.     |
C ----

In the case above, there's a special runtime sequence.  X loads B, then loads C, then unloads B and unloads C.  The interleaved order is purposeful - to be mean.  There are no cyclical dependencies involved, but there are reference count issues.  C depends on B, so when we manually unload B in X, we hope the OS doesn't call the terminate routine in B, because C still needs B.  And here's what we got:  A X [load B] B [load C] C [unload B] [unload C] ~C ~B ~X ~A.  That's what we wanted.
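One way to reason about this result is a toy reference-count model.  To be clear, this is my mental model of the behavior and not the loader's implementation: each open bumps a count, the first open of an image also takes a reference on its dependency, and an image is terminated only when its count drops to zero.  A is left out because it is never unloaded in this sequence.

```c
#include <assert.h>
#include <string.h>

enum { IMG_B, IMG_C, IMG_COUNT };
static const char *const img_names[IMG_COUNT] = { "B", "C" };

/* C statically depends on B; B has no dependency in this model. */
static const int img_dep[IMG_COUNT] = { -1, IMG_B };

static int refcount[IMG_COUNT];
static char events[64];

static void emit(const char *prefix, int img) {
    assert(strlen(events) + strlen(prefix) + strlen(img_names[img]) + 2
           <= sizeof events);
    strcat(events, prefix);
    strcat(events, img_names[img]);
    strcat(events, " ");
}

/* First reference: take a reference on the dependency, then initialize. */
static void model_open(int img) {
    if (refcount[img]++ == 0) {
        if (img_dep[img] != -1)
            model_open(img_dep[img]);
        emit("", img);
    }
}

/* Last reference: terminate, then drop the reference on the dependency. */
static void model_close(int img) {
    assert(refcount[img] > 0);
    if (--refcount[img] == 0) {
        emit("~", img);
        if (img_dep[img] != -1)
            model_close(img_dep[img]);
    }
}

/* Replays the sequence: load B, load C, unload B, unload C. */
const char *interleaved_unload_trace(void) {
    memset(refcount, 0, sizeof refcount);
    events[0] = '\0';
    model_open(IMG_B);
    model_open(IMG_C);
    model_close(IMG_B);
    model_close(IMG_C);
    return events;
}
```

In this model the early unload of B only drops its count from 2 to 1, so nothing is terminated until C goes away, and the trace comes out as "B C ~C ~B ", the same relative order as in the observed run.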

And for our last example, we'll include a cyclical case.  This is a case where the user manually, via dlopen, causes a cyclical dependency to appear in the shared object dependencies.  This is bad, and there is no good way for the OS to resolve it.  Still, we'd like to know what happens.  For example, one outcome could have been for the OS to panic and kill the task.  The way we create the cycle is to have a shared object use dlopen in its initialization routine to load another shared object that has a static dependency on something further up the initialization chain that hasn't been initialized yet.
X -> D -> C -> B -> A
     ^         .
     |         .
     |         .
     --------- E

And what we got was this:  A B E C D X ~X ~D ~C ~E ~B ~A

So what happened was that the OS stopped the initialization of the library loaded with dlopen at the point where it hit pending dependencies in an existing initialization chain.  That meant the loaded library E had its initializer run with some dependencies uninitialized (C and D).  This protected the main executable's dependencies, sort of, but either way, the user is probably hosed.  As a basic rule of thumb, people should avoid calling dlopen in a library initializer anyway unless they really know what they are doing.

So there's a basic way of summarizing the results here:

  • Each shared library has its init/term routine(s) called once and only once per process.

  • The list of shared libraries for a task is initialized in one order, and terminated in the reverse order.

  • All dependencies are initialized before their dependents.

This is a good result.  It's similar to Windows, and it appears to meet our basic requirements for package initialization code in the RTL without us having to add more magic to the tools over and above what we've done in the past to preserve initialization order.

One last thing.  I promised to list the parameters to the initialization functions in the S_MOD_INIT_FUNC_POINTERS sections.

void func(int argc, const char **argv, const char **envp, const char **apple, struct ProgramVars *pvars);

And ProgramVars is this:
struct ProgramVars {
    const void* mh;
    int* NXArgcPtr;
    const char*** NXArgvPtr;
    const char*** environPtr;
    const char** __prognamePtr;
};

The interesting item here, really, is the mh field. That one is actually the Mach header for the image, which is really nice, because it means on OS X, a shared library has a reliable way of finding useful things about itself, like the precise start of sections in memory.  I don't know why there is the redundancy of information in the parameters passed to the init function, and the fields of the struct of the last parameter.
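An initializer that wants to hang on to that header can just stash it in a static.  A minimal sketch, assuming the signature and struct shown above; the function and variable names are made up, and the constructor attribute is my assumption about how you would get the function into the init section with GCC or clang.  It is guarded so that the function is registered as an initializer only on Apple platforms, where the loader passes these five parameters.

```c
#include <assert.h>
#include <stddef.h>

struct ProgramVars {
    const void*   mh;
    int*          NXArgcPtr;
    const char*** NXArgvPtr;
    const char*** environPtr;
    const char**  __prognamePtr;
};

/* Values captured by the initializer for later use. */
static const void *own_mach_header;
static int         saved_argc;

#ifdef __APPLE__
__attribute__((constructor))
#endif
void remember_image(int argc, const char **argv, const char **envp,
                    const char **apple, struct ProgramVars *pvars) {
    (void)argv;
    (void)envp;
    (void)apple;
    assert(pvars != NULL);
    own_mach_header = pvars->mh;   /* Mach header of this image */
    saved_argc = argc;
}

/* Accessors for the rest of the library. */
const void *image_header(void) { return own_mach_header; }
int image_argc(void) { return saved_argc; }
```

Once the header is saved, the rest of the library can use it as the starting point for walking its own load commands.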

Ideally, none of you out there will ever have to worry about this at all, but if you do, there it is.


  • Guest
    Allen Bauer Friday, 29 January 2010

    "As a basic rule of thumb, people should avoid calling dlopen in a library initializer anyway unless they really know what they are doing."

    There are warnings all over in Windows about this very thing (see Raymond Chen's blog).  The problem in Windows is that there are some things that may implicitly cause a library to be loaded from another library's init, namely nearly anything using COM :-(.

  • Guest
    Primoz Friday, 29 January 2010

    I'm strictly a Windows programmer and have no intentions to program on OS X in any foreseeable future but I still have to say - excellent posts, all of them! I love reading about OS idiosyncrasies any time. Just keep posting!

  • Guest
    Mike Dixon Sunday, 31 January 2010

    These blog posts could really use a "Mac OS" category tag to go along with the Delphi tag.

  • Guest
    Elbaz Haim Sunday, 7 November 2010

    can i develop an apple application with c++builder 10
