Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add is_external_fun/1 and is_local_fun/1 guards #8554

Open
starbelly opened this issue Jun 8, 2024 · 45 comments
Open

Add is_external_fun/1 and is_local_fun/1 guards #8554

starbelly opened this issue Jun 8, 2024 · 45 comments
Assignees
Labels
enhancement team:VM Assigned to OTP team VM

Comments

@starbelly
Copy link
Contributor

Is your feature request related to a problem? Please describe.

No problems.

Describe the solution you'd like

The code exists in beam utilities to discern whether a fun is an external fun ref or a local fun. It would be quite nice to have these as guards. Said guards would mostly be convenience since we can get this information from erlang:fun_info/1 today. Though, there may be performance benefits to having a simple guard vs what erlang:fun_info/1 returns.

Describe alternatives you've considered

Simply use erlang:fun_info/1 in the body of a function required to check whether a function is external or local.

Additional context

In some situations one must validate that function is not a local function in order to ensure this is not slung around between nodes, as this can be a problem in a mixed cluster with different versions of OTP. In my case, we need to ensure callback functions in configuration are only ever external funs for this reason.

Additionally, I'd be happy to do the PR, but I figured an issue regarding the feature was appropriate first.

@okeuday
Copy link
Contributor

okeuday commented Jun 8, 2024

Creating only the guard function is_function_external/1 would avoid using "local" for use that doesn't relate to Distributed Erlang nodes.

I still have hope that Erlang/OTP might fix the fact that node/0 should not be a guard function, due to it not being a referentially transparent function (creating an error in the draft language specification). If that language bug gets fixed, that would likely lead to guards like is_local_pid/1, is_local_port/1, is_local_reference/1, is_remote_pid/1, is_remote_port/1, is_remote_reference/1 (as described at #5568).

@starbelly
Copy link
Contributor Author

Additionally note, while I put is_external_fun/1 in the issue name, this may be too broad, my original thought was is_ext_fun_ref/1 it could also be is_external_fun_ref/1. Also, If we just had guard to to indicate whether a term was a ext fun ref or not, the need for is_local_fun/1 becomes less useful (at least I have no use case for this).

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Jun 10, 2024
@jhogberg
Copy link
Contributor

jhogberg commented Jun 10, 2024

I agree that this would be useful. I'd like to change the names however (and not just here, though the ship might have sailed already) as "local fun" and "external fun"/"remote fun" might make people think about distribution, when it's more about how the fun relates to its module: the former being a fun tied to a specific module and the latter a fun tied to an MFA.

@jhogberg jhogberg self-assigned this Jun 10, 2024
@michalmuskala
Copy link
Contributor

Yes, I agree this would be very helpful to have, and I also agree that we could likely do better with the naming.

I also thing this should be coupled with having more places in the system use such funs instead of MFA tuples - with the recent changes they are smaller in memory, and also integrate far better with various analysis tools - typecheckers, dialyzer, static analysis, IDE tooling (go to definition navigation, etc), linting...
Using actual closures though in some places is not desirable since they prevent code upgrades and cross-node evaluation to work transparently (unlike MFA tuples), so being able to easily verify and specify what type of function is desirable would be useful in those cases.

@RaimoNiskanen
Copy link
Contributor

What about is_export_function/1,2 (arity 2 for the fun's arity, compare to is_function/1,2) since it relates to an module's export table and not the Erlang Distribution protocol.

There are, as said, important cases where you need the restriction that a fun() has to be an export entry fun, but rarely the opposite. Therefore I think that the module local variant isn't needed, since it in those rare cases be accomplished by combining is_function/1,2 and is_export_function/1,2.

@okeuday
Copy link
Contributor

okeuday commented Jun 10, 2024

@RaimoNiskanen is_function_exported/1,2 seems more natural to me, but I also always try to use common prefixes to group ideas (when I am allowed to). Either way, it sounds like a good approach.

@jhogberg
Copy link
Contributor

I don't think it's common here. A "local" fun is not tied to a particular instance of a node, and works just fine everywhere as long as the required code is available. "Local" refers to the fun's relation to a specific module, which is relevant in non-distributed scenarios (code upgrades, funs persisted to disk or similar).

@michalmuskala
Copy link
Contributor

I'd float some ideas is_free_function or some sort of super-verbose is_safely_shareable_function; or opposite is_module_bound_function or just is_bound_function

@RaimoNiskanen
Copy link
Contributor

Floating on...: is_function(F, export), is_function(F, export, Arity).
Or promote fun_info/2 into a guard so you can write fun_info(F, type) =:= external.

@starbelly
Copy link
Contributor Author

I like the idea of promoting fun_info/2 into a guard, as it covers cases in the future that might not be thought of today. I do rather like is_bound_function/1 more or something along those lines as it is succinct and can be documented simply (i.e., takes a function reference, returns true if it is bound to a module, otherwise false).

@starbelly
Copy link
Contributor Author

Coming back to this, I think is_bound_function/1,2 doesn't tell the whole truth here, I believe we can technically say funs evaluated via erl_eval are bound to that module, as such is_export_function/1,2 or is_exported_function/1,2 makes more sense. Maybe there's more ideas...

@michalmuskala
Copy link
Contributor

I was thinking about this a bit more. I think using the term "exported" will be problematic given the semantics of the already-existing erlang:function_exported/3. In particular, it verifies the function actually exists (and the module is loaded). In this case (I presume) we want a guard that succeeds even if the module is not loaded and doesn't actually check the function exists - only that it's the "external" format of the fun syntax.

@starbelly
Copy link
Contributor Author

@michalmuskala I agree with that and your presumption at least as far as what I had in mind was definitely not to check if the code was loaded.

@okeuday
Copy link
Contributor

okeuday commented Jun 13, 2024

I think using the term "exported" is valid because it is describing a future expectation that the function must be exported to be used, otherwise it will be considered undefined (i.e., an undefined function error occurs). The existence of the erlang:function_exported/3 does make the situation a little unusual if a guard function like is_function_exported/1,2 exists too, but the documentation can describe how the is_function_... guard function is only checking the function type and not whether the function's module is loaded.

@starbelly
Copy link
Contributor Author

@okeuday I definitely don't think you are wrong here, especially coupled with what @RaimoNiskanen stated, that this is related to the export table on a module. I think part of it for me is what's easier on the eyes and would what result in less ambiguity, the head of clause should be enough to disambiguate.

What about a happy medium between promoting fun_info/2 to a guard and is_exported_fun/1 (or some variation of), such is is_fun_type/2 ? I'd rather have a single arity guard, but that would be nicer than having to do a abs equal comparison in the case fun_info/2 was promoted to a guard.

@okeuday
Copy link
Contributor

okeuday commented Jun 15, 2024

@starbelly People haven't liked fun_info/1 in the past. The documentation says it is only meant for debugging and the function returns more than you normally need, in proplist format due to its age while most people would likely prefer a map now. Either way, the return value of fun_info/1 isn't friendly for guard expressions. The fun_info/1,2 type value as local or external isn't common and isn't represented in a type (it shouldn't be used more than it is meant to be used, for debugging). That makes fun_info/2 a bad choice as a guard, partially due to its association to fun_info/1.

That is why I saw this as a new guard function to associate with the existing is_function/1,2 guard functions, except providing true for a subset of the types that are true for is_function/1,2. That would make it a variation on the is_function/1,2 guard functions, so to communicate that variation I think it is best to use a suffix describing the variation. The simplest name using common terminology seems to be is_function_exported/1,2 because -export([...]). is always present in a module (is_function_export/1,2 is possibly not treating export as a verb and it is easier to understand export in the past tense, in the is_function_... use).

Using suffixes to describe variation is a little odd for people that want to write function names as English sentences because using suffixes for the variation should be equivalent to a subject-object-verb word order. Using subject-object-verb word order is most common among natural languages though, so it should be how most people naturally think.

@starbelly
Copy link
Contributor Author

@okeuday I don't disagree with what you've stated, but I'm still wondering if there is a better name that doesn't "run together" with what exists already. Needs a little more thought, I do believe, this would be something that sticks for a long time after all 😄

@starbelly
Copy link
Contributor Author

Another idea is_named_function/1

@okeuday
Copy link
Contributor

okeuday commented Jun 23, 2024

@starbelly A named function could be a local function, so the "named" word doesn't help to distinguish between exported functions and local functions.

@starbelly
Copy link
Contributor Author

@starbelly A named function could be a local function, so the "named" word doesn't help to distinguish between exported functions and local functions.

Yes, I thought about this after I shared the idea, and ambiguity therein. So, I think we need to be clear about the "export" part, at least I can not think of something that is quite ambiguous, sans what @michalmuskala suggested around is_safe_and_shareable_will_not_blow_up_in_your_face_as_long_as_the_module_and_function_is_there_regardless_of_the_version/1

I'd like to hear from OTP team at this point 😄

@okeuday
Copy link
Contributor

okeuday commented Jun 23, 2024

@starbelly Using is_function_atomic or is_function_updatable would avoid using exported, but the function names seem a bit less intuitive.

@starbelly
Copy link
Contributor Author

I think export / exported is fine, and it makes sense as that's precisely what we're talking about. I think ideally we'd have a name that conveys something more, but thus far I don't think anyone has come up with something that is non-ambiguous and sends a signal to the user.

@starbelly
Copy link
Contributor Author

I'm about to open a PR for this, while it would be nice to have the name ironed out here vs the PR, I'm not sure it'll play out like that. As happy medium and what I think might just work well, is is_export/1,2. Let me know your thoughts.

@michalmuskala
Copy link
Contributor

I was thinking about this again. Perhaps it would be easier to name things if we flipped the problem around? Something like an is_closure_function/1 to detect non-export funs - while usage to detect specifically export funs would be much more verbose (with something like is_function(X) and also not is_closure_function(X)) however, the naming part could be easier to settle

@jhogberg
Copy link
Contributor

jhogberg commented Aug 27, 2024

I like it, though I'm wary of how verbose it will seem a decade down the line when we're all familiar. is_binary_data would've made sense at the time binaries were added, but aren't we happy it's is_binary now?

How about is_closure/1,2 and is_export/1,2?

Edit: Come to think of it is_function/1,2 itself is a good example: the syntax says fun and everyone calls them "funs," yet the guard function is the odd one out name-wise. If we had a time machine and could change this, would is_fun/1,2 have been a better choice?

@RaimoNiskanen
Copy link
Contributor

I suspect this problem isn't perfectly suited for flipping around.

We want to know if a function is safe to keep around over module upgrades so when you later call it the call may be to a new module:function/arity, that is if the call goes to an exported/external function.

If a function has a closure is a slightly different property:

foo() ->
  F1 = fun (Y) -> erlang:is_boolean(Y),
  F2 = fun erlang:is_boolean/1,
  {F1, F2}.

Here F2 is an export entry fun, but F2 is not. Neither has a closure. Although an export entry fun cannot have a closure, a "local" fun also may not have one.

@michalmuskala
Copy link
Contributor

I would still say F1 is a closure, it just happens to close over an empty environment. But yes, I agree it's somewhat ambiguous.

@jhogberg
Copy link
Contributor

You can also argue that F1 is a true closure because it closes the specific module version it's in.

@RaimoNiskanen
Copy link
Contributor

RaimoNiskanen commented Aug 27, 2024

So the property we want: is module specific function or is code upgrade safe function.

But I see what you mean; it is possible to define a closure fun to be any fun that isn't called trough an export entry, that is every fun that is created through the fun (Args...) -> end syntax, and by that we are home free as long as there is only two forms to create funs; the above, and fun M:F/A.
That seems solid enough... I think I have no more objections.

@starbelly
Copy link
Contributor Author

is_closure/1,2 sounds nice to me.

@starbelly
Copy link
Contributor Author

starbelly commented Aug 27, 2024

Hmm, while it sounds nice, and maybe this is niche, folks will have to perform two checks with is_closure/1,2 vs one (i.e., is_function(Fun) andalso not is_closure(Fun)), thoughts?

@jhogberg
Copy link
Contributor

I'm partial to having both is_closure/1,2 and is_export/1,2.

@starbelly
Copy link
Contributor Author

starbelly commented Sep 2, 2024

Humm, should extend the type t_fun or add t_closure / t_export in the same PR?

Edit:

I suppose only extending t_fun makes sense, but the question still remains.

@josevalim
Copy link
Contributor

If I may push back on this feature a bit. I agree that being able to distinguish between these two is important but is it common enough to warrant adding a guard for it? Only few functions are available in guards today and we typically teach the guards early on and we tend to memorize them. Is this distinction important enough to make it front-facing in the language?

I have written several dozens of thousands of Erlang/Elixir code at this point and I probably had to distinguish between them 2 or 3 times... and I tend to be writing lower level code. How many use cases can we even surface collectively?

In my opinion, calling erlang:fun_info(Fun, env) in those cases is more than enough. Perhaps we could add a functions module to keep some convenience helpers, but I don't see a need for more functions in erlang (it is already quite large) and especially not in guards.

@okeuday
Copy link
Contributor

okeuday commented Sep 18, 2024

@josevalim Yes, it is common enough to add a guard for it. The difference matters when you care about hot-code loading of functions that require validation before they are loaded. Often hot-code loading is avoided by Erlang developers (due to operations preferring static source code) but it is a critical feature of the Erlang language that influenced its design and helps to justify its use (for the important fault-tolerance use-cases that would be nice to see in the future).

I don't believe the existence of Elixir is a reason to not improve Erlang.

@josevalim
Copy link
Contributor

josevalim commented Sep 18, 2024

I don't believe the existence of Elixir is a reason to not improve Erlang.

Where was this argument laid out? Can you please show me so I can improve myself? Because I don't think I said at any moment that Erlang should not improve because Elixir exists.

@jhogberg
Copy link
Contributor

If I may push back on this feature a bit. I agree that being able to distinguish between these two is important but is it common enough to warrant adding a guard for it? Only few functions are available in guards today and we typically teach the guards early on and we tend to memorize them. Is this distinction important enough to make it front-facing in the language?

On the flip side a guard that isn't used doesn't need to be memorized. If any of us were asked to provide a list of guards off the top of our heads, who would remember to include list/1 and friends?

@okeuday
Copy link
Contributor

okeuday commented Sep 20, 2024

Just to be clearer about the use-cases of the is_export/1,2 (or whatever name it becomes) guard function, the main use-case described in the issue description is regarding Distributed Erlang use of functions which is a very valid and important problem.

A separate use-case is when a module wants to support hot-code loading and returns a function. In that situation you want to ensure the function returned only calls exported functions. If the module uses configuration to use Erlang functions dynamically, then you care about having the is_export/1,2 guard function used. If you are able to stick with explicit function calls, you may not need the is_export/1,2 guard function for validation, but you likely still want the guard function to have the use-case documented (to avoid the surprise failure that occurs when you don't care about this problem in your module you thought you could hot-code load).

The compiler can't check for that problem in all possible ways it can occur, so any check for it would be incomplete (if one was added). If a compiler check was added, it would likely be a warning that is always turned off to avoid causing noise for source code that doesn't care about this problem.

So, the use-cases for these guards are important but might appear unusual to people.

@okeuday
Copy link
Contributor

okeuday commented Sep 20, 2024

An additional idea is adding a separate function syntax that forces you to only use exported functions for your function calls in the function body. That would be a way of avoiding the is_export/1,2 guard function use-cases (when creating anonymous functions). A closure that takes a variable with a function may not be able to determine whether a variable is an exported function, but having a separate function syntax would ensure that causes a compilation error.

@josevalim
Copy link
Contributor

josevalim commented Sep 20, 2024

I don’t think the use cases have been unclear to the folks in this thread. I was personally asking how common they would be in practice. Providing code examples that do such checks today (or should be doing them) would, IMO, be a useful addition to the discussion.

Of course, it is not my decision to make and no one has to answer to my concerns. I only hope that me asking for examples is not going to lead to more strawman arguments that I don’t want Erlang to evolve.

@starbelly
Copy link
Contributor Author

starbelly commented Sep 21, 2024

Providing code examples that do such checks today (or should be doing them) would, IMO, be a useful addition to the discussion.

I too would like to hear from other people how they would use this, tbh. I have stated my cases, which to brief is to guard against anon functions being stored (either in disk or in memory) for safe code upgrades (and thus used at all within the code that would said stored functions, which require consistent validation, vs just once) and guarding against anon functions being slung over the wire.

I think I I understand your concerns aside from usage, which may be, it could be confusing to new comers, when to use what, is that right?

Edit:

Additionally, if we really don't want new guards or bifs, it is still possible to modify is_function to take a third argument. I'm ok with the new guards, but if enough people think it's clutter, then that's still a possibility. The annoying thing about this approach, is that in order to check whether something is or isn't a closure, you always have to provide the arity. Though, we could pattern match and dispatch based on the type of the second argument, though I feel that's a bit messy.

@josevalim
Copy link
Contributor

My concern is not newcomers per-se, but rather that the number of guards have been historically kept small. And given what is a guard or not is mostly an arbitrary decision, I'd say that keeping the list small is a good thing, since it impacts the way you write the code (it controls what goes on the right side of when, inside if, etc).

There is also the argument that there are other functions that we could make guards: is_alive, is_process_alive, function_exported, etc. which I have used more frequently than fun_info. Should some of them be guards too?

While I don't think any of the arguments above are blockers, I think more examples of how we could use this feature would be helpful. The most recent example I can think of is in telemetry but that's not a use case screaming to be a guard. :)

@starbelly
Copy link
Contributor Author

There is also the argument that there are other functions that we could make guards: is_alive, is_process_alive, function_exported, etc. which I have used more frequently than fun_info. Should some of them be guards too?

These are arguments I can get behind. I can not say the use wouldn't be rare, as it would not. Not rare in the sense that it's never used, but the frequency of usage in a code base would not be great, contrast that with is_binary/1. Of course, map_size/2 and guards of this sort are the same for me, that is, they are useful, but the frequency of usage is not great, yet I'm always happy they exist and at my disposal when needed.

Another alternative, in response to your points, these do not have to be guards. What drove this initially on my end? I don't fancy the thought of using a debug function to get some of the information I'm after to make a decision, especially if if I'm going to have to call it not once, but many times 😄 . These could just simply be bifs, if they still lived in erlang, then if needed they could be promoted to guards later on.

Still, we need to hear more from others, I'm a fan of listening 😄

@okeuday
Copy link
Contributor

okeuday commented Sep 21, 2024

Guard functions need to be referentially transparent, as described in the spec. Unfortunately, node/0 is not referentially transparent due to being a read of global data that is allowed to change (but there is an ignored issue for that at #5568). That also means that is_alive/0 and is_process_alive/1 should not be guards due to being reads of global data that is allowed to change (mutable global data).

The more guard functions that are available, the better source code can be, so there is a benefit by making things simpler. From that perspective, using if should always be preferred instead of case, if that is possible. People generally get mixed-up on what referential transparency means (the spec does since it has had the node/0 error for so long), but there is a description at https://github.com/okeuday/effects#readme which can help to clarify things.

I ran into the hot-code loading use-case when attempting to hot-code load https://github.com/CloudI/CloudI/blob/develop/src/lib/cloudi_service_monitoring/src/cloudi_service_monitoring.erl due to the aspect functions (that are used to attach extra monitoring to services) which needed to use only exported functions. That is fixed now, but there is the potential for getting into that problem elsewhere and that type of bug is easy to have in source code (due to nothing checking for it).

@starbelly
Copy link
Contributor Author

Still haven't heard back from others on use cases, I'm just going to defer to @jhogberg on whether to open this PR or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

7 participants