Debugging `launchd` on OSX 10.10.3
With the recent discovery of some IPC vulns in OSX (one, two, three), digging into the way IPC works in OSX came to the front of my TODO list. That, along with the re-write of `launchd` for 10.10.x... well, let’s just say it has thus far been a fruitful endeavor, but that story is for another post. Which leads me to this post - why would I want to debug `launchd`?
Mach is the low-level functionality within the XNU kernel that provides IPC between threads and processes; IPC endpoints are referenced via ‘ports’ (unsigned integers). Without going into too much detail about OSX IPC, the flow is very similar to DNS. When one process wants to speak to another process, it queries a special Mach port called ‘bootstrap’ (`launchd` in the context of OSX IPC) to look up the port of the endpoint it would like to communicate with. This lookup is done via a string like ‘com.apple.networkd’. `launchd` responds with a ‘port’ that is used as the destination of future IPC messages. ‘bootstrap’/`launchd` is therefore the first point of contact when two processes communicate.
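The DNS analogy can be sketched as a toy model. This is plain Python, not the Mach API: the `Bootstrap` class, its method names, and the port number are made up for illustration, and only the service name ‘com.apple.networkd’ comes from the text above.

```python
class Bootstrap:
    """Toy stand-in for launchd's bootstrap role: maps service names to ports."""

    def __init__(self):
        self._services = {}

    def register(self, name, port):
        # A service announces the port it receives messages on.
        self._services[name] = port

    def look_up(self, name):
        # A client asks for the port behind a name, much like a DNS query.
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"unknown service: {name}") from None


bootstrap = Bootstrap()
bootstrap.register("com.apple.networkd", 0x1103)  # hypothetical port number
port = bootstrap.look_up("com.apple.networkd")
print(hex(port))  # the client would now send its messages to this port
```

Every conversation starts with a lookup like this, so the process that answers the lookups sits in front of all IPC on the system.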
The reason I had an interest in debugging `launchd` is that I had been able to trigger some crashes. `launchd` is like `init` for linux; the kernel spawns it as PID 1 and every process is executed under it. When `launchd` crashes, the kernel panics, and your machine reboots with the “there was a problem, press any key to continue” screen. User-land triggering kernel bugs is obviously interesting due to the trust boundary crossed.
This blog post outlines my thought process (and associated fails) on how I was (kind-of) able to debug a crash in `launchd`. A quick outline is as follows:
- Attach debugger… FAIL
- Default crash logs… FAIL
- Kernel debugging… FAIL
- Old launchd source… FAIL
- dtrace… SUCCESS YAY (kind of)
- Kernel debugging again (with KDK)… FAIL
- Kernel debug build… YES!
Attach debugger
My first thought was to simply attach a debugger (`lldb`) to `launchd`; however, after attempting to, I received the following message:
I initially thought this was some anti-debug trick Apple introduced to prevent debugging of the service. However, I got to thinking - if `launchd` is PID 1, and everything is spawned off of it, then if `lldb` was able to break `launchd`, `lldb` should be halted as well - something like the “chicken or the egg” problem. It would be like spawning a thread, then putting a breakpoint in the parent while expecting the thread to keep running. It is just not possible.
Default crash logs
Based on some other OSX research I had done in the past, I know there are some default directories where crash logs are stored:
- /Library/Logs/CrashReporter - Where crash data will be stored (similar to the pop-up for “do you want to send these details to Apple”)
- /Library/Logs/DiagnosticReports - Where application/kernel panic logs will be stored
- /var/log/ - Where default application logs are stored
I enabled all possible logging options for `launchd`, and was unable to find anything useful from the `launchd` logging output. There were, however, the following:
Kernel panic logs
The file `/Library/Logs/DiagnosticReports/Kernel*.panic` shows the details of the kernel panic; however, it was not very useful, as it contains only direct memory address references, with no backtrace, no function names, no debugging output, etc:
One useful thing here is that the crash log refers to a specific file/line of the xnu source describing why the kernel panic occurred - the kernel will panic if `launchd` exits, and the bugs I had found cause `launchd` to crash.
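When repeatedly triggering a crash, it helps to grab the latest panic report automatically. A minimal sketch, assuming the reports match the `Kernel*.panic` pattern in the directory listed above; the helper name `newest_panic_report` is made up for illustration.

```python
import glob
import os


def newest_panic_report(directory="/Library/Logs/DiagnosticReports"):
    """Return the most recently modified Kernel*.panic file, or None."""
    reports = glob.glob(os.path.join(directory, "Kernel*.panic"))
    return max(reports, key=os.path.getmtime) if reports else None


if __name__ == "__main__":
    path = newest_panic_report()
    print(path or "no kernel panic reports found")
```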
launchd crash logs
From everything I had read about other `launchd` crashes, there should be a crashdump file like for any other process; however, since the `launchd` re-write, I can only assume Apple has disabled that feature. Instead, you get a semi-useful `/usr/bin/sample` output located in the `/var/log/com.apple.xpc.launchd/` directory. Although this gives a bit more information than the kernel panic, I was still leaps and bounds away from finding the root cause of these crashes.
Kernel debugging
My next thought was to move to kernel debugging, and try to catch the crash before it jumped into the kernel.
Note: At this point in time none of the 10.10.x kernel debug kits (KDKs) were available.
In order to enable kernel debugging (on the guest), you must set some specific `nvram` flags. The flags are just added together to make the final value to set (0x141 = 0x100 + 0x040 + 0x001). Flags are set using the `nvram flag=value` command.
Note: DO NOT set the ‘nvram boot-args’ with kernel debugging enabled but without the ARP flag (0x40) on a VM - you won’t be able to connect over the network, and you’ll need to find a way of reverting the nvram.
| Flag | Meaning |
|---|---|
| 0x01 | Stop at boot time and wait for the debugger to attach |
| 0x02 | Send kernel debugging output to the console |
| 0x04 | Drop into debugger on a nonmaskable interrupt |
| 0x08 | Send kernel debugging information to a serial port |
| 0x10 | Make ddb the default debugger |
| 0x20 | Output diagnostics information to the system log |
| 0x40 | Allow the debugger to ARP and route |
| 0x80 | Support old versions of gdb on newer systems |
| 0x100 | Disable the graphical panic dialog screen |
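Since each flag is a distinct bit, adding them together is the same as OR-ing them, and a combined value can be split back into its parts. A small sketch of both directions, using only the flags from the table; the function names are made up for illustration.

```python
DEBUG_FLAGS = {
    0x01: "Stop at boot time and wait for the debugger to attach",
    0x02: "Send kernel debugging output to the console",
    0x04: "Drop into debugger on a nonmaskable interrupt",
    0x08: "Send kernel debugging information to a serial port",
    0x10: "Make ddb the default debugger",
    0x20: "Output diagnostics information to the system log",
    0x40: "Allow the debugger to ARP and route",
    0x80: "Support old versions of gdb on newer systems",
    0x100: "Disable the graphical panic dialog screen",
}


def combine(*flags):
    """OR individual flags into a single debug= value."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def decode(value):
    """Split a debug= value into the flags listed above, plus leftover bits."""
    known = [flag for flag in sorted(DEBUG_FLAGS) if value & flag]
    leftover = value & ~combine(*DEBUG_FLAGS)
    return known, leftover


print(hex(combine(0x100, 0x40, 0x01)))
for flag in decode(0x146)[0]:
    print(f"{flag:#05x}  {DEBUG_FLAGS[flag]}")
```

A non-zero `leftover` means the value uses bits that the table does not describe.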
Here are a few useful setups I found:
- `nvram boot-args="-v debug=0x141"` - Verbose, wait for a debugger at boot
- `nvram boot-args="-v debug=0x146"` - Verbose, wait for a debugger upon kernel crash/panic
- `nvram boot-args="-v debug=0xd04 _panicd_ip=192.168.121.1"` - Verbose, cause a coredump to be transmitted to a panic server (must have a panic server running - directions are in the KDK ReadMe.html file)
When dealing with kernel crashes, having to reboot and re-attach every time became quite annoying, so I found myself using the flags that waited for the debugger upon panic.
When a crash occurs (I was triggering the `launchd` bug), the OS/VM should look something like this (note the bottom of the screen showing “waiting for debugger”):
From the host, you can now connect to the kernel debugger via lldb’s
Note: Using an lldbinit from deroko to mimic @osxreverser’s gdbinit
When thinking about the kernel in relation to debugging `launchd`, the kernel is just a process - more specifically, the parent process of `launchd`. From the backtrace we can see that in the debugger we are in the context of the kernel. This is telling us exactly what the kernel panic error message from the crash said - the kernel panicked at kern_exit.c:359. As far as I am aware, after a crash and attaching a debugger to the parent, there is no way of switching the debugger to the context of a spawned/child process. (Although, as I am writing this, there may be some way of causing the kernel debugger to wait upon boot, follow threads, and somehow stop following threads when you are in the `launchd` context - I have not tried this.)
At this point, I started looking into the older kernel debug kit, and noticed there were some additional python libraries/functions for `lldb` that may allow me to switch context from the kernel to `launchd`. I spent a little time trying to get the 10.9 KDK `lldb` libraries working on 10.10, but decided to just wait for the 10.10 KDKs to come out.
Old launchd source
While waiting for the 10.10 KDKs to come out, it came to my attention that older versions of `launchd` have been open sourced (code browser or tarball).
After diving a bit into this older version of the `launchd` source, it seems that there are a bunch of flags to trigger different debugging behavior; one behavior was that `launchd` should trap into the kernel debugger. This was exactly what I was looking for; however, the `launchd` source was from OSX 10.9, and I was on 10.10. Those flags are:
- Creating the /var/db/disableAppleInternal file
- Creating the /var/db/.launchd_shutdown_debugging file
- Having the nvram boot-args set to verbose mode (`nvram boot-args="-v"`)
- Setting the ‘launchd_trap_sigkill_bugs’ value in nvram boot-args (`nvram boot-args="launchd_trap_sigkill_bugs"`)
Unfortunately, none of the above gave me an interactive kernel debugger for `launchd`. Upon the kernel panicking, the debugger would still put me into the context of the kernel. Either I was doing something wrong, misunderstanding the code, or Apple removed this functionality with the re-write of `launchd`.
dtrace
During the first kernel debugging process, I had an epiphany: `dtrace` hooks happen at a pretty low level; I know there is a hook to perform actions upon applications exiting/faulting; if, somehow, someway, this fault hook would execute before the kernel panic occurs, I may be able to gain some additional information.
I ran a simple dtrace script to perform a stacktrace on `launchd` upon it crashing, redirecting the output to a file (this can be done as a one-liner):
And voila! I now had a specific location, within `launchd`, of where this crash is occurring. That being said, it was still quite hard to trace backwards from there and understand exactly why the crash occurred.
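For readers who want to try something similar, here is a hedged sketch of the kind of invocation described. It is not the exact script I ran: the probe name `proc:::exit`, the output file name, and the helper `build_dtrace_command` are assumptions for illustration, while `-n`, `-o`, `-q` and `ustack()` are standard dtrace features. It only builds the command line; running it requires root.

```python
import shlex


def build_dtrace_command(pid=1, probe="proc:::exit", out_path="launchd_stack.txt"):
    """Build a dtrace one-liner that records the user stack of `pid`.

    The probe is an assumption; swap in whichever exit/fault probe your
    system's `dtrace -l` output actually lists.
    """
    program = f"{probe} /pid == {pid}/ {{ ustack(); }}"
    return ["dtrace", "-q", "-n", program, "-o", out_path]


argv = build_dtrace_command()
print(shlex.join(argv))  # paste into a root shell before triggering the crash
```

Writing to a file with `-o` matters here, because the machine panics right after the probe fires and anything left on the terminal is lost.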
Kernel debugger again (with KDK)
Apple releases Kernel Debug Kits (KDKs) with each version of their kernel. These packages provide debug/developer builds of the kernel, along with `lldb` scripts/functions/tools to help in debugging kernel/driver code. While performing the previous research, there was no KDK available for my version of OSX.
Once I noticed that the 10.10.x KDKs had been released, I pulled and installed one. My first thought was that some of the additional python libraries would allow me to get more insight into `launchd`, and possibly switch context from the kernel to `launchd`. Unfortunately that was not the case, but there are some very interesting features in the KDK python scripts/functions for `lldb`. The location of these scripts is
Now, these additional tools did not provide any help to me while debugging `launchd`, but there are a WHOLE BUNCH of them and I felt it worth noting.
To include these scripts, follow the instructions when attaching the debugger:
Just from viewing the `help` command, you can see how much additional functionality has been introduced (this is only a small segment):
While writing this, I noticed some functions that may be of some benefit. Specifically `showtaskstacks -F launchd`; however, the backtrace shows that it has jumped/switched context into the kernel debugger. This may not be a lost cause; I did spend too much time here.
Kernel debug build
After the kernel debug kit came out, and a second look at the XNU source, I noticed something very interesting near the kernel crash:
It looks like, if you are running a DEVELOPMENT or DEBUG build of the kernel and `launchd` crashes, the kernel will perform a coredump of `launchd` somewhere. After a little more sifting through the XNU source, the file should exist in ‘/cores/core.[PID]’.
The ReadMe.html in the KDK (/Library/Developer/KDKs/*/ReadMe.html) has details on how to install/run a DEVELOPMENT/DEBUG kernel. After following that, triggering the crash, and rebooting, I finally found a golden nugget - there was a file at ‘/cores/core.1’!
Note: DO NOT try to run a developer/debug kernel for a different version of OSX - it won’t work, your kernel won’t boot, and you’ll have to find a way of reverting (I WAS DESPERATE!)
By loading the `core.1` file with `lldb` (`lldb /sbin/launchd -c core.1`), I can finally at least debug the crash. Unfortunately, it seems to be a null pointer dereference; however, I haven’t fully debugged the root cause to understand the full impact.
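The same core file can also be walked from lldb’s Python bindings, which is handy when collecting backtraces from several crashes. A minimal sketch, assuming the `lldb` module that ships with lldb is importable; the helper names `core_path` and `print_core_backtraces` are made up for illustration, and I have not run this against a real `launchd` core.

```python
def core_path(pid, directory="/cores"):
    """Path where the kernel is expected to write the dump: /cores/core.[PID]."""
    return f"{directory}/core.{pid}"


def print_core_backtraces(binary="/sbin/launchd", core=None):
    # Imported lazily: the lldb module is bundled with lldb, not installed via pip.
    import lldb

    debugger = lldb.SBDebugger.Create()
    target = debugger.CreateTarget(binary)
    process = target.LoadCore(core or core_path(1))
    for thread in process:      # iterating an SBProcess yields its threads
        print(thread)
        for frame in thread:    # iterating an SBThread yields its frames
            print("   ", frame)


if __name__ == "__main__":
    print(core_path(1))
```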
Note: Some information is [REDACTED] because I have not reported any associated bugs. Once they are reported/fixed, [REDACTED] information will be removed