Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove 'source lifetime #135

Closed
Tracked by #268
mitsuhiko opened this issue Nov 4, 2022 · 4 comments · Fixed by #275
Closed
Tracked by #268

Remove 'source lifetime #135

mitsuhiko opened this issue Nov 4, 2022 · 4 comments · Fixed by #275
Labels
enhancement New feature or request

Comments

@mitsuhiko
Copy link
Owner

Today the engine holds &'source str all over the place in the instructions and by extension in the templates. This is a design pillar that I originally inherited from the initial API that was inspired by tinytemplate. It's a pretty clever setup but it has the disadvantage that the API cannot deal with dynamic templates well.

The solution to this problem so far has been the source feature which introduces a Source type which lets the engine lie about lifetimes through the use of self-cell and memo-map.

There is a alternative that could be evaluated where these borrows to the source string are not taking place through the references but string handles. That would likely require significant changes in the lexer and potentially the parser as well, but the changes to the instructions are probably quite limited.

The way this could work is that the instructions that currently hold an &'source str would instead hold something like a StrHandle which looks something like this:

struct StrHandle {
    offset: u32,
    len: u32,
    #[cfg(debug_assertions)]
    instructions_id: u64,
}

impl StrHandle {
    pub fn as_str(&self, instr: &Instructions) -> &str {
        debug_assert_eq!(instr.id, self.instructions_id);
        &instr.source[self.offset as usize..self.offset as usize + self.len as usize]
    }
}

Obviously one complexity with this approach is that at no point the handles must be separated from the instructions they are contained in. It also comes at least through the safe interface with additional cost as Rust will require a lot more bounds checks. unsafe would work here, but actually making a safe interface where someone does not change the source string underneath the instructions accidentally could be quite tricky.

The benefit of doing this would be that the engine ends up with less dependencies, overall less unsafe code and a nicer API.

@mitsuhiko mitsuhiko added the enhancement New feature or request label Nov 4, 2022
@mitsuhiko
Copy link
Owner Author

One other thing is that the engine has a few places where that basic model does not work well:

  • Token::Str holds a Cow so that it can store unescaped strings
  • for internal loop usage Instruction::Lookup and Instruction::GetAttr also get static strings at times
  • Internal maps often hold &'source str and with that change they would need to hold a String instead since we could not adjacently borrow

I think a model that could work is to have a StrCache<'source> which is used everywhere where currently &'source str is used to hold the source and then a few StrHandle given out by the StrCache<'source>. For as long as also for release builds an internal ID is given out that could be made safe without incurring bounds checks. For the few places where a &'static str is placed there, I think the handle could be allowed to store references to static strings or specific instructions are used instead. For the the Cow case that only exists in the Token but never in Instruction so that problem is much smaller in comparison. In the instructions it's already converted into a Value which clones.

@mitsuhiko
Copy link
Owner Author

I made some small changes internally that address some of these issues. In d53160d I removed static strings added to instructions for loop internals and in 6e60023 I removed the Cow from the string token.

@mitsuhiko
Copy link
Owner Author

I had back out one of these due to performance regressions.

@mitsuhiko
Copy link
Owner Author

I have a new way now in which I think I want to go about this. The 'source lifetime is quite useful for multiple scenarios so it does not need outright removing. What actually however is annoying is that you don't have a convenient way to add owned strings into the environment. This runs against a bunch of lifetime issues, but that is already true for the Source feature too.

I think what I might be doing for MiniJinja 1.0 and to close off this issue is the following:

  • When the source feature is enabled, the Engine always holds a Source object internally to replace the template map
  • Environment::add_template stays and borrows
  • Environment::add_owned_template (name TDB) is does what today is env.source_mut().add_template(...)
  • Hide the Source object internally.

Since after you can add owned templates there is not a lot of reason to actually expose the Source object. The main of the Source object which at that point is internal is to set a loader:

Before:

let mut env = Environment::new();
env.set_source(Source::with_loader(|name| {
    if name == "layout.html" {
        Ok(Some("...".into()))
    } else {
        Ok(None)
    }
}));

After:

let mut env = Environment::new();
env.set_loader(|name| {
    if name == "layout.html" {
        Ok(Some("...".into()))
    } else {
        Ok(None)
    }
});

And for owned templates:

Before:

let mut env = Environment::new();
let mut source = Source::new();
source.add_template("index.html", "...").unwrap();
env.set_source(source);

After:

let mut env = Environment::new();
env.add_owned_template("index.html", "...").unwrap();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant