Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Runtime] Allow inspection of function names from a compiled .so #16836

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Lunderberg
Copy link
Contributor

Prior to this commit, the LLVMModuleNode provided a "get_func_names" function that would return the names of functions available within the result of tvm.build. However, this utility was not preserved across a round-trip through mod.export_library and tvm.runtime.load_module.

This commit adds a similar "__get_func_names" function to a DSOLibrary, which returns the symbols available for use in the library. This is exposed in the Module.keys() method, mimicking the interface that a Python user would expect.

In addition, this is used to improve the error message shown when a library does not contain the requested function. The difflib module (from Python's stdlib) is used to find functions that are contained in the module, and have similar names to the one requested.

@tqchen
Copy link
Member

tqchen commented Apr 3, 2024

seems this behavior was platform dependent, and ideally we should not introduce behavior that only works on one platform but not others. It is also hard for other implementations of the same Module interface to support this behavior.
As a result, keeping the minimal necessary functionalities is preferred here.

struct link_map* map = nullptr;
dlinfo(lib_handle_, RTLD_DI_LINKMAP, &map);

Elf64_Sym* symbol_table = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this will work on windows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular implementation definitely wouldn't work on Windows, and is in the #ifdef for the linux-specific code. Currently, the Windows implementation returns an empty list.

@Lunderberg
Copy link
Contributor Author

seems this behavior was platform dependent, and ideally we should not introduce behavior that only works on one platform but not others.

My goal was partly to avoid needing to alter an existing IRModule to support this functionality. Since the Library interface already abstracts over the platform-specific behavior, it makes sense for additional platform-specific to be implemented by extending the Library interface.

@Lunderberg
Copy link
Contributor Author

Lunderberg commented Apr 3, 2024

As an alternative implementation for the same end-user functionality, what would you think about a lowering pass that is applied just before tir.transform.MakePackedAPI()? That could generate the __get_func_names function within the IRModule, and wouldn't require platform-specific support to read symbols from the shared library.

Edit: Thinking on it, I like this approach more and more. While inspecting the symbol table would only let us determine the function names, generating the metadata in a IRModule transform could later be extended to provide the function signature as well.

@tqchen
Copy link
Member

tqchen commented Apr 3, 2024

the only main issue about generate approach is that it may not be comprehensive if we in the end link multiple exports together, which is a less frequent usecase as in static library case(where we append the symbol prefix).

I think this is more of an optional feature rather than mandatory feature, so we should avoid exposing it as things like keys, but maybe use some of the info if available for error reporting

@Lunderberg
Copy link
Contributor Author

the only main issue about generate approach is that it may not be comprehensive if we in the end link multiple exports together, which is a less frequent usecase as in static library case(where we append the symbol prefix).

That's a good point, though I believe we'll have the correct behavior by default. For both the mod.imported_modules and the static libraries, the user should only be calling the functions through the externally-exposed wrappers in mod. These are the names present in the IRModule itself.

(As an extension, we could indicate which functions are present within the static and/or imported modules, but that would be a later extension.)

I think this is more of an optional feature rather than mandatory feature, so we should avoid exposing it as things like keys, but maybe use some of the info if available for error reporting

I think we want to expose it to the user, for use debugging. If a user receives an error message with suggested typo corrections, or stating that there no similarly named functions, the obvious next step is to look at what functions actually are available. Since runtime.Module already provides __getitem__ like a python dictionary, other dictionary-like methods such as .keys() would be the obvious place for a new developer to look.

@Lunderberg
Copy link
Contributor Author

I've reverted the implementation of this PR, but kept the same unit tests. The platform-dependent implementation is entirely removed, replaced with a TIR lowering pass.

Prior to this commit, the `LLVMModuleNode` provided a
`"get_func_names"` function that would return the names of functions
available within the result of `tvm.build`.  However, this utility was
not preserved across a round-trip through `mod.export_library` and
`tvm.runtime.load_module`.

This commit adds a similar `"__get_func_names"` function to a
`DSOLibrary`, which returns the symbols available for use in the
library.  This is exposed in the `Module.keys()` method, mimicking
the interface that a Python user would expect.

In addition, this is used to improve the error message shown when a
library does not contain the requested function.  The `difflib` module
(from Python's stdlib) is used to find functions that are contained in
the module, and have similar names to the one requested.
@Lunderberg Lunderberg force-pushed the runtime_allow_listing_of_runtime_module_functions branch from 934bb63 to 74456cf Compare April 13, 2024 14:04
@Lunderberg Lunderberg force-pushed the runtime_allow_listing_of_runtime_module_functions branch from 74456cf to 41a0662 Compare April 13, 2024 19:52
@Lunderberg
Copy link
Contributor Author

Looks like the re-implemented version has trouble with RPC devices and with microtvm runtime. In order to compile the metadata function, I needed to add the ability to return a tir::StringImm from a PrimFunc, where previously this was only used to provide const char* arguments. There's a difference in how TVMArgValue and TVMRetValue represent kTVMStr: in TVMArgValue it stores a non-owning const char* in TVMValue::v_str, while in TVMRetValue it stores an owning std::string* in TVMValue::v_handle.

There is no straightforward way to perform the equivalent of rv.v_handle = new std::string("string literal"); in generated LLVM code, and it would be very implementation-dependent. Instead, I implemented T.ret("string literal") by delegating to a PackedFunc to allocate the TVMRetValue, then returning that value. This works very nicely for most use cases, but fails for the microtvm runtime (because the backend helper function is not present) and for RPC devices (because the string's buffer doesn't seem to be returned).

I think the ability to query a module's contents is too useful to set aside entirely, but this PR is going on the back burner for me until I can find a better way to implement it.

@Lunderberg
Copy link
Contributor Author

Making a few notes here for later. The key limitation is that a PrimFunc cannot reliably return a const char*, even to a statically-allocated string.

  • Even when using the same type code, TVMArgValue and TVMRetValue can have different internal representations.
  • A TVMRetValue with type code kTVMStr should have v_handle as its active member, and will be cast to a std::string*. The TVMRetValue owns the std::string, and must delete it when destructed.
  • A TVMArgValue with type code kTVMStr should have v_handle as its active member, and will be cast to a const char*. The TVMArgValue does not own the const char*, and must not delete it when destructed.
  • No other type code exists that could return a value as a
  • Allocating and constructing a std::string* depends on the c++ stdlib implementation. Reproducing this, either in the LLVM codegen or in the C runtime, would be implementation-dependent and error-prone.
  • The workaround in the current implementation calls from TIR to a packed function, which then constructs the std::string instance. This workaround is not available in AOT environments.

Idea for a design:

  1. Update TVMByteArray to have two additional members, free_func and free_func_arg.

    typedef struct {
        // A pointer to the byte array.
        const char* data;
        
        // The number of bytes in the array
        size_t size;
        
        // If the TVMPODValue_ owns the allocation, a function that can be called to
        // free the allocation.  If the TVMPODValue_ does not own the alllocation, should
        // be set to NULL.
        void (*free_func)(void*);
        
        // If NULL, then `free_func` is called using `data` as an argument.  If non-NULL
        // then `free_func` is called using `free_func_arg` as an argument.  This allows
        // additional context about the allocation to be provided, but only if required.
        void* free_func_arg;
    } TVMByteArray;
  2. TVMArgValue should still hold a non-owning pointer to the buffer. This would be stored with free_func = NULL.

  3. TVMRetValue should still hold an owning pointer to the buffer. For existing FFI calls, this would be stored with free_func = [](void* ptr) { delete static_cast<std::string*>(ptr); } and free_func_arg = &my_str.

  4. When returning a static string from TIR, this would be stored with free_func = NULL.

Steps 1/2/3 would all be in C++, and would be straightforward to implement with the existing APIs. Step 4 would allow a PrimFunc to return a non-owning const char* pointer, which should be converted to a python str by the FFI, and which should not have a destructor called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants