
Prepare for release #81

Merged
merged 19 commits into rustformers:main from prepare-for-release
Apr 2, 2023

Conversation

philpax
Collaborator

@philpax philpax commented Mar 26, 2023

This was referenced Mar 27, 2023
@philpax philpax changed the title from "WIP: Prepare for release" to "Prepare for release" Mar 29, 2023
@philpax philpax added the meta:maintenance label (Changes that will make it easier for us to maintain code) Mar 29, 2023
@philpax
Collaborator Author

philpax commented Mar 29, 2023

This is good to go aside from the Apple Silicon fix (I need to test it on my work laptop).

Something I noticed: InferenceParameters contains play_back_previous_tokens (which is only relevant to inference_with_prompt), and inference_with_prompt takes maximum_token_count as a separate argument. We should consider moving these into a struct of parameters specific to inference_with_prompt, like:

    pub fn inference_with_prompt<E: std::error::Error + 'static>(
        &mut self,
        model: &Model,
        vocab: &Vocabulary,
        inference_params: &InferenceParameters,
        inference_with_prompt_params: &InferenceWithPromptParameters,
        prompt: &str,
        rng: &mut impl rand::Rng,
        callback: impl Fn(&str) -> Result<(), E>,
    ) -> Result<InferenceStats, InferenceError> {
    }
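
where InferenceWithPromptParameters would be something along these lines (a rough sketch only; the field names come from the existing parameters, but the types here are a guess, not settled):

    /// Sketch of the proposed struct for inference_with_prompt-only options.
    /// Field types are assumptions for illustration.
    pub struct InferenceWithPromptParameters {
        /// Whether to play the previous tokens back through the callback.
        pub play_back_previous_tokens: bool,
        /// Maximum number of tokens to generate, if any.
        pub maximum_token_count: Option<usize>,
    }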

We could also potentially move the &mut impl rand::Rng into InferenceParameters. I'm not sure yet whether that's a good idea, but it would let us cut the number of arguments down some more.

Is there a context in which you would use the Vocabulary of a model without the model itself? I'm leaning towards moving that into Model, too.

@KerfuffleV2
Contributor

KerfuffleV2 commented Mar 29, 2023

> We could also potentially move the &mut impl rand::Rng into InferenceParameters.

You'd have to add a lifetime to the struct, and it might need to be mutable as well, so it's probably not worth it. (Also, that's a lot of tasks checked off. Nice work!)
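
For illustration (names and fields made up for the example), holding the RNG by mutable reference would push the struct towards something like this, which is the lifetime/generics cost I mean:

    // Sketch only: storing the RNG by mutable reference forces a lifetime
    // and a generic parameter onto the struct (impl Trait isn't allowed in
    // field position), and callers then need mutable access to the struct.
    pub struct InferenceParameters<'a, R: rand::Rng> {
        pub rng: &'a mut R,
        // ...plus the existing fields...
    }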

@hhamud
Contributor

hhamud commented Mar 30, 2023

I'm on Apple Silicon and I haven't noticed any performance issues compared to the original C implementation of llama.cpp. What are the performance issues?

@philpax
Collaborator Author

philpax commented Mar 30, 2023

We don't compile with any architecture-specific flags on ARM, so the compiler won't use NEON etc. It's easy enough to fix; I just haven't done it yet.
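
For the record, the fix would be roughly along these lines (a sketch only: it assumes ggml's C sources are built from a build.rs via the cc crate, and the path and flag here are illustrative, not tested):

    // build.rs (sketch): pass an ARM-specific flag to the C compiler so it
    // can emit NEON instructions when building ggml.
    fn main() {
        let mut build = cc::Build::new();
        build.file("ggml/ggml.c"); // assumed path to the vendored ggml source
        let arch = std::env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
        if arch == "aarch64" {
            build.flag("-mcpu=native");
        }
        build.compile("ggml");
    }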

@philpax philpax mentioned this pull request Mar 31, 2023
Collaborator

@setzer22 setzer22 left a comment


Thanks a lot for the changes! :) Everything looks good, there's just a very minor nitpick comment from my end.

Also, why does ggml.rs appear as a removed file? The code seems to still be using it, and I don't see where that code has moved to 🤔

llama-rs/src/lib.rs (review comment, outdated and resolved)
@philpax
Collaborator Author

philpax commented Apr 1, 2023

> Thanks a lot for the changes! :) Everything looks good, there's just a very minor nitpick comment from my end.
>
> Also, why does ggml.rs appear as a removed file? The code seems to still be using it, and I don't see where that code has moved to 🤔

It's moved to ggml/src/lib.rs. Not sure why Git can't detect the move.

Collaborator

@setzer22 setzer22 left a comment


LGTM! 😄

@philpax philpax merged commit 8a87611 into rustformers:main Apr 2, 2023
@philpax philpax deleted the prepare-for-release branch April 2, 2023 08:51
Labels
meta:maintenance (Changes that will make it easier for us to maintain code)
Development

Successfully merging this pull request may close these issues.

Publish to crates.io
4 participants