Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHELL=sh zellij is broken #1943

Open
raphCode opened this issue Nov 16, 2022 · 14 comments
Open

SHELL=sh zellij is broken #1943

raphCode opened this issue Nov 16, 2022 · 14 comments

Comments

@raphCode
Copy link
Contributor

This worked previously:
SHELL=sh zellij
and zellij would start up with the specified shell.

Since #1769 this results in:

Error occurred in server:

  × Thread 'pty' panicked.
  ├─▶ Originating Thread(s)
  │     1. ipc_server: NewClient
  │     2. pty_thread: NewTab
  │
  ├─▶ At zellij-server/src/pty.rs:441:13
  ╰─▶ Cannot find shell sh
@har7an
Copy link
Contributor

har7an commented Nov 16, 2022

You're right, the lookup is wrong in this case. Is this something we actively supported previously, or was it more of a side-effect of the implementation? Because afaik, echo $SHELL always returns a full path to a executable.

@raphCode
Copy link
Contributor Author

raphCode commented Nov 16, 2022

I was happy that I could use it.
Not sure if users relied on this. (Probably)

SHELL=sh does the right thing in vim (:ter), but not im tmux. Can't think of more programs to test.

@har7an
Copy link
Contributor

har7an commented Nov 16, 2022

Lol, good one. Fair point, I'll reenable it. A bit of which magic should fix this, thanks for raising it!

@imsnif
Copy link
Member

imsnif commented Nov 16, 2022

If I may intervene - why not remove the exists check entirely? I don't think it's doing us any favors.

@har7an
Copy link
Contributor

har7an commented Nov 16, 2022

why not remove the exists check entirely?

Because starting zellij with a shell that doesn't exist (and for that matter it's irrelevant whether its specified in the config.kdl or in $SHELL) causes a freeze and makes zellij unresponsive to any input. This is not an issue on a local connection, because I just kill the terminal but I've had this happen via SSH on a remote machine in university yesterday. That's annoying to "fix".

Although now that you mention it, I guess this needs a fix in another location where we catch both cases, right? I assume that #1769 touched a different codepath than what a shell defined in config.kdl uses? No idea where they go tbh.

@sux2mfgj
Copy link
Contributor

It seem there is a handling code for the command couldn't found.
https://github.com/zellij-org/zellij/blob/main/zellij-server/src/pty.rs#L211

I don't know why does zellij cause the freeze.

@imsnif
Copy link
Member

imsnif commented Nov 16, 2022

I see - makes sense.

We have a bit of logic in place to look for the shell and similar commands when we spawn them. I wouldn't advise pre-checking with the same logic, but rather riding on that error.

We in face already do this and have a special error for this case:

Some(ZellijError::CommandNotFound { terminal_id, .. }) => {

I guess it doesn't help in this case because the screen thread never tells the client to unblock. I'm not sure there's a trivial fix for this.

@har7an
Copy link
Contributor

har7an commented Nov 16, 2022

RunCommand::command is used in 19 places in our codebase, we can't check it in all of these locations. There is server::os_input_output::command_exists which checks whether RunCommand::command does in fact exist, but it seems that it isn't called in all codepaths.

I think it'd be a good idea to rewrite RunCommand with a fallible constructor, then we could make it fail when either the command doesn't exist, or CWD doesn't exist. What are your thoughts?

@imsnif
Copy link
Member

imsnif commented Nov 16, 2022

That might be problematic, because in some cases it is constructed in an environment that is not guaranteed to be the same environment in which it will be run (eg. sending a cli command).

@har7an
Copy link
Contributor

har7an commented Nov 16, 2022

How so? I mean how would this influence the sanity checks? Can the CWD be relative?

@imsnif
Copy link
Member

imsnif commented Nov 16, 2022

Yes, the CWD can be relative. When we start doing more involved things with client/server (eg. web clients, sharing backends between machines, etc.) it will also be dangerous to assume in the code that it's running on the same machine.

This error, in the context of Zellij starting, trickles down to here eventually:

Some(ZellijError::CommandNotFound { terminal_id, .. }) => {

I think we even send it back to the screen (just weeding out the error). I think the "right" solution would be to handle it there. It's just a bit of work.

@imsnif
Copy link
Member

imsnif commented Nov 16, 2022

On the upside though, if we handle it in screen we can directly display it to the user in the terminal pane rather than crashing.

@har7an
Copy link
Contributor

har7an commented Nov 16, 2022

When we start doing more involved things with client/server (eg. web clients, sharing backends between machines, etc.) it will also be dangerous to assume in the code that it's running on the same machine.

Right, good point.

This error, in the context of Zellij starting, trickles down to here eventually:

But do all of them end up there? I'm not familiar with this bit of the code.

Displaying an error in the pane when the shell couldn't be found and unblocking the input handling sounds like a reasonable thing to do. In this case the user can also quit the application regularly, which helps in SSH sessions. In fact, not entirely sure how displaying text in panes is handled, we already have the mechanisms in place to get a "stringified" version of all anyhow::Errors and feed them into a closure of our choice. Maybe we can build something nifty with that?

@imsnif
Copy link
Member

imsnif commented Nov 16, 2022

We already do this here:

send_command_not_found_to_screen(

The problem is that here we do this with held panes. Meaning we are guaranteed that the pane exists in the screen thread. We don't have this guarantee in this case. In fact, the screen thread doesn't even know anything happened yet.

What happens here is (in a simplified way):

  1. The server starts, sending the layout to the pty thread.
  2. The pty thread opens all the shells/commands and sends an instruction to the screen with a list of all the terminal ids
  3. The srcreen thread opens up panes for all of the ptys according to the same layout

This error happens at the end of 2, is ignored and its id is sent to the screen (I think - gotta test it to be sure). To fix this we'll have to add a facility in screen to handle this case (namely, not weed out the error and look for it in the screen and then do the right thing with it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants