When I was writing mustardwatch, I realized that the Linux ptrace API was even more complicated than I remembered. It was impossible to handle stopping signals correctly before PTRACE_SEIZE, introduced in Linux 3.4 (2012). If you don't need to support earlier kernels, simplify your life by only using PTRACE_SEIZE. I'll treat all APIs that aren't compatible with SEIZE (e.g. PTRACE_TRACEME) as deprecated, and ignore them here.

In particular, everything here assumes the following rules:

* Always attach with PTRACE_SEIZE.
* Always set PTRACE_O_TRACESYSGOOD.

Here's the way ptrace works:

* You (the tracer) seize a tracee.
* The tracee can be running, or in several kinds of stopped states.
* You can call waitpid() on it -- as if it's your child -- which can give you various events, including many ptrace-only events.
* When the tracee is stopped, you can inspect or change it with various ptrace calls.
* You can restart the tracee in various ways.

More details:

* To seize: PTRACE_SEIZE takes a flags argument (the same flags as SETOPTIONS).

  * Make sure to set TRACESYSGOOD to distinguish syscalls from other stops.
  * If you want to trace children, set TRACE{CLONE,FORK,VFORK}.

  If you're forking a child, you probably want it to be stopped. So your seize code might look like:

      pid = fork();
      if (pid == 0) {
        // Child.
        raise(SIGSTOP);
        execve(...);
      } else {
        // Parent.
        waitpid(pid, &wstatus, WUNTRACED);  // Wait for child to stop.
        ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACESYSGOOD);  // Seize child.
        // XXX: Is this correct? Do we need to INTERRUPT?
        kill(pid, SIGCONT);  // Continue child execution.
      }

  Followed by the main wait()/ptrace() loop. The loop might look like this:

      while (1) {
        poll(...);  // Possibly on a signalfd to get SIGCHLD notifications.
        // If we got a SIGCHLD, run waitpid to get ptrace events from the child.
        while (1) {
          int wstatus;
          pid_t pid = waitpid(-1, &wstatus, WNOHANG|__WALL);
          if (pid == 0 || (pid < 0 && errno == ECHILD)) break;
          // We got a child event.
          if (WIFEXITED(wstatus) || WIFSIGNALED(wstatus) || WIFCONTINUED(wstatus)) {
            // This isn't a stop, e.g. the child exited.
          }
          if (WIFSTOPPED(wstatus)) {
            // This is a stop. See below.
            int stop_sig = WSTOPSIG(wstatus);
            int stop_event = wstatus >> 16;
            // ...
            ptrace(ptrace_restart, pid, 0, continue_sig);
            // In the case of a group-stop, you probably want to use PTRACE_LISTEN
            // rather than a normal restart call.
          }
        }
      }

  Seizing a tracee does not stop it.

  Note that when a tracee forks and you have TRACEFORK set, you may get events for the new pid before seeing the PTRACE_EVENT_FORK message (so you won't necessarily recognize the pid from wait()).

* Stopped states.

  In various cases, a tracee may enter a stopped state. You can tell when you call waitpid and WIFSTOPPED(wstatus) is true. When you get a stop, you can inspect or modify the tracee, and then continue execution. When you continue execution, you can pass ptrace a signal number as the last argument. For each kind of stop, I specify the no-op action that mimics untraced behavior.

  The specific kind of ptrace stop is given in two parts of the wstatus, referred to below as:

      int stop_sig = WSTOPSIG(wstatus);  // Stopping signal.
      int stop_event = wstatus >> 16;    // ptrace event identifier.

  There are several kinds of stops:

  * Syscall-stop: Delivered right before entering and right after exiting a system call, if you use PTRACE_SYSCALL.

    Can be identified with:

        stop_sig == (SIGTRAP|0x80) && stop_event == 0

    No-op action: Continue execution with continue_sig = 0.

  * Event stop: Delivered to indicate other special ptrace-specific events. The event type is given in stop_event. Mostly you sign up for events explicitly: If you enable PTRACE_O_TRACEFORK, you might get PTRACE_EVENT_FORK events, and so on. For many events you can request additional event information with PTRACE_GETEVENTMSG.
    PTRACE_EVENT_STOP is special (you signed up for it when you used PTRACE_SEIZE):

    * If you get a STOP event and stop_sig is SIGTRAP, that indicates that you successfully interrupted a tracee (or a tracee's new forked/cloned child, which automatically gets interrupted at startup). (But note that if your INTERRUPT request happens at the same time as some other stop, you may get notified of that stop instead.)
    * If you get a STOP event and stop_sig is a stopping signal, that indicates a group-stop. See below. (Note: If you INTERRUPT a tracee which is currently in a non-ptrace stopped state, it goes into a group-stopped state. See below.)
    * There are no other STOP events. (TODO: Double-check this.)

    Can be identified with:

        stop_event != 0

    No-op action: For all event stops other than group-stops, continue execution with continue_sig = 0. For group-stops, see below.

  * Signal-delivery-stop: When a process gets a signal (other than SIGKILL), its tracer is notified first, and may change the signal by passing a different one to the restart call, or drop it by passing 0.

    Can be identified with:

        stop_sig != (SIGTRAP|0x80) && stop_event == 0

    No-op action: Continue execution with continue_sig = stop_sig.

  * Group-stop: A group-stop is a special kind of event stop that needs to be treated differently.

    When a process gets a stopping signal -- e.g. when the user presses ^Z -- you're first notified as above, with a signal-delivery-stop (if the process has multiple threads, only one thread gets a signal-delivery-stop). Then, if you pass the signal on to the tracee, it gets stopped. Every thread in the process -- including the one that got the signal-delivery-stop -- gets a group-stop event, indicated with stop_event == PTRACE_EVENT_STOP, and stop_sig one of the stopping signals (SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU).

    A group-stop indicates that the tracee is stopped, but in a special ptrace-stop state, not a regular stopped state; SIGCONT won't be delivered in this state. A tracee is also put into this state if you INTERRUPT it while it's in a regular stopped state.

    Can be identified with:

        stop_event == PTRACE_EVENT_STOP && stop_sig != SIGTRAP

    The no-op action is *not* to continue execution -- since the tracee is supposed to be stopped -- but to put it in a regular stopped state (rather than the special ptrace stopped state it's currently in), with PTRACE_LISTEN (and continue_sig = 0).

* Inspecting and modifying the tracee.

  The APIs available are mostly the ones you'd expect: Get/set register state, memory, signal state, etc. (Note that a lot of the information you want is exposed via /proc/pid/ rather than ptrace.)

  For reading and writing memory, process_vm_{read,write}v is much better than the direct ptrace API, which operates a machine word at a time.

  Note: If you want to do a system call -- say, to allocate memory in the tracee -- you'll need the instruction pointer to be on a syscall instruction (or equivalent) on an executable page. There's no nice way to do this; your reasonable options are:

  * Run the tracee until the next time it does a system call;
  * Scan its memory for a system call instruction (perhaps in the VDSO);
  * Temporarily write a syscall instruction to executable memory.

  (A special note on PTRACE_POKEDATA: It can write even to non-writable pages in the tracee. If you write to a shared read-only page, it automatically gets copied to a private mapping and unshared.)

* Restarting the tracee.

  When the tracee is stopped, you can restart it with CONT, SYSCALL, or SINGLESTEP (and with the related SYSEMU and SYSEMU_SINGLESTEP, which skip over executing system calls to let you emulate them yourself). The restart calls take a signal argument, as described above.

  You can also "restart" a tracee after group-stop with PTRACE_LISTEN, as described above.

  You can also detach from the tracee with PTRACE_DETACH.

// TODO: Write about ESRCH handling.
// BPF?
// TODO: Add a note about EINTR. Some system calls (such as epoll_wait, though
// not poll) will return EINTR when a process is interrupted, even though there
// was no signal. The ptrace man page says this is a kernel bug.

Requests:

    I {PEEK,POKE}{TEXT,DATA,USER}
    I {GET,SET}{FP,}REGS
    I {GET,SET}REGSET
    I {GET,SET,PEEK}SIGINFO
    I {GET,SET}SIGMASK
    I GETEVENTMSG
    I SECCOMP_GET_FILTER
    I {GET,SET}_THREAD_AREA
    I GET_SYSCALL_INFO [new]
    I SETOPTIONS
    R CONT
    R SYSCALL, SINGLESTEP
    R SYSEMU, SYSEMU_SINGLESTEP
    R LISTEN
    R DETACH
    G INTERRUPT
    G SEIZE
    x KILL
    x ATTACH
    x TRACEME

Options:

    EXITKILL
    TRACE{CLONE,FORK,VFORK}
    TRACEEXEC
    TRACEEXIT
    TRACESYSGOOD
    TRACEVFORKDONE
    TRACESECCOMP
    SUSPEND_SECCOMP
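
Appendix: the identification rules from the "Stopped states" section can be collected into one helper. This is only a sketch, assuming the tracee was attached with PTRACE_SEIZE and PTRACE_O_TRACESYSGOOD as required throughout these notes. The names `stop_kind` and `classify_stop` are made up for illustration; they aren't part of any API. The fallback definition of PTRACE_EVENT_STOP is only there in case the libc headers are too old to provide it.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

#ifndef PTRACE_EVENT_STOP
#define PTRACE_EVENT_STOP 128  /* Value from linux/ptrace.h. */
#endif

/* Hypothetical names, for illustration only. */
enum stop_kind {
    NOT_A_STOP,            /* Exited, killed, or continued. */
    STOP_SYSCALL,          /* No-op: restart with continue_sig = 0. */
    STOP_SIGNAL_DELIVERY,  /* No-op: restart with continue_sig = stop_sig. */
    STOP_INTERRUPT,        /* No-op: restart with continue_sig = 0. */
    STOP_GROUP,            /* No-op: PTRACE_LISTEN, not a restart. */
    STOP_EVENT,            /* Other PTRACE_EVENT_*; no-op: continue_sig = 0. */
};

/* Classify a wstatus returned by waitpid() for a seized tracee. */
enum stop_kind classify_stop(int wstatus)
{
    if (!WIFSTOPPED(wstatus))
        return NOT_A_STOP;

    int stop_sig = WSTOPSIG(wstatus);  /* Stopping signal. */
    int stop_event = wstatus >> 16;    /* ptrace event identifier. */

    if (stop_event == 0)
        return stop_sig == (SIGTRAP | 0x80) ? STOP_SYSCALL
                                            : STOP_SIGNAL_DELIVERY;
    if (stop_event == PTRACE_EVENT_STOP)
        return stop_sig == SIGTRAP ? STOP_INTERRUPT : STOP_GROUP;
    return STOP_EVENT;
}
```

In the main loop you would call `classify_stop(wstatus)` right after waitpid() returns and switch on the result to choose between a restart call and PTRACE_LISTEN.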