Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite History (third version) #401

Merged
merged 28 commits into from
Jun 6, 2022
Merged

Conversation

phiresky
Copy link
Contributor

This is the third version of a history database for reedline, based on the previous discussions on Github and Discord.

  • The history trait is modified to be more flexible and support all the interesting metadata:

    /// Represents a history file or database
    /// Data could be stored e.g. in a plain text file, in a JSONL file, in a SQLite database
    pub trait History: Send {
    /// save a history item to the database
    /// if given id is None, a new id is created and set in the return value
    /// if given id is Some, the existing entry is updated
    fn save(&mut self, h: HistoryItem) -> Result<HistoryItem>;
    /// load a history item by its id
    fn load(&self, id: HistoryItemId) -> Result<HistoryItem>;
    /// gets the newest item id in this history
    fn newest(&self) -> Result<Option<HistoryItemId>> {
    let res = self.search(SearchQuery::last_with_search(SearchFilter::anything()))?;
    Ok(res.get(0).and_then(|e| e.id))
    }
    /// creates a new unique session id
    fn new_session_id(&mut self) -> Result<HistorySessionId>;
    /// count the results of a query
    fn count(&self, query: SearchQuery) -> Result<i64>;
    /// return the total number of history items
    fn count_all(&self) -> Result<i64> {
    self.count(SearchQuery::everything(SearchDirection::Forward))
    }
    /// return the results of a query
    fn search(&self, query: SearchQuery) -> Result<Vec<HistoryItem>>;
    /// update an item atomically
    fn update(
    &mut self,
    id: HistoryItemId,
    updater: &dyn Fn(HistoryItem) -> HistoryItem,
    ) -> Result<()>;
    /// remove an item from this history
    fn delete(&mut self, h: HistoryItemId) -> Result<()>;
    /// ensure that this history is written to disk
    fn sync(&mut self) -> std::io::Result<()>;
    }

    The FileBackedHistory was rewritten to be based on this new trait, returning errors if the user tries to filter by things it doesn't support.

    Each history item stores the command line plus some additional info:

    pub struct HistoryItem<ExtraInfo: HistoryItemExtraInfo = Anything> {
    /// primary key, unique across one history
    pub id: Option<HistoryItemId>,
    /// date-time when this command was started
    pub start_timestamp: Option<chrono::DateTime<Utc>>,
    /// the full command line as text
    pub command_line: String,
    /// a unique id for one shell session.
    /// used so the history can be filtered to a single session
    pub session_id: Option<HistorySessionId>,
    /// the hostname the commands were run in
    pub hostname: Option<String>,
    /// the current working directory
    pub cwd: Option<String>,
    /// the duration the command took to complete
    pub duration: Option<Duration>,
    /// the exit status of the command
    pub exit_status: Option<i64>,
    /// arbitrary additional information that might be interesting
    pub more_info: Option<ExtraInfo>,
    }

  • All the stateful cursor functionality (back, forward) is removed from the History trait and moved to a separate HistoryCursor struct that works with any history. I also moved all the history tests in there because they use the cursor.

  • The historical metadata is stored is used by using the new update_last_command_context function so as to not make this a breaking change:

            Ok(Signal::Success(buffer)) => {
                line_editor
                    .update_last_command_context(&|mut c: reedline::HistoryItem| {
                        c.start_timestamp = Some(chrono::Utc::now());
                        c.hostname = Some(gethostname::gethostname().to_string_lossy().to_string());
                        c.cwd = std::env::current_dir()
                            .ok()
                            .map(|e| e.to_string_lossy().to_string());
                        c.session_id = Some(session_id);
                        c
                    })?;

    This can be called multiple times to set values that are only known after the command has run (e.g. duration, exit_status)

Stuff that should be considered (later):

  • Errors: Most methods of the history trait are fallible and IMO should stay that way. Right now the Error type is String because I don't know how you want to handle errors. Right now it seems like only std::io::Error is used in other places, but there's no specific Error struct. Errors from the history are handled as .except("todo: error handling") right now because I didn't want to change the signature of any public reedline methods.
  • Right now there's no UI or anything to use the additional features. Also the history is instantly merged across all shells for SQLite. These should be changed later.

References:

@fdncred
Copy link
Collaborator

fdncred commented Apr 18, 2022

I haven't looked at the code yet, but from your write up, it sounds like a great start.

(later) We may want to add a separate table to capture the one "big" thing that was left out of the HistoryItem struct, run_count, but I'm fine with moving ahead without it at this point. The run_count can be used to understand which commands are used most and be useful in a fuzzy search that takes run_count into consideration, but my biggest reason for wanting it, was to create a replacement for z, zoxide, and the like, but built into reedline/nushell.

Thanks @phiresky for all your hard work. I look forward to reviewing this a little bit later and playing with it!

@phiresky
Copy link
Contributor Author

I intentionally left out run_count because (if I'm understanding you correctly) it's not really something that makes sense for a normalized database structure. The SQLite history (as I implemented it) stores duplicates of the same command again every time because the start time, working dir, exit status etc. can all be different even if the command is the same.

This means you can compute the run_count column easily by doing a query like select *, count(*) as run_count from history group by command_line without having to store it. This is also more flexible because you can e.g. get the run count of a specific command in a specific directory.

To get z like functionality you would probably query the history for the run count by directory (not by command) like this:

select hostname, cwd,
	count(*) as number_of_commands_run
from history
group by hostname, cwd
order by number_of_commands_run desc
limit 10

That'll give you the 10 most used directories by number of commands run in them

@fdncred
Copy link
Collaborator

fdncred commented Apr 18, 2022

I intentionally left out run_count because (if I'm understanding you correctly) it's not really something that makes sense for a normalized database structure. The SQLite history (as I implemented it) stores duplicates of the same command again every time because the start time, working dir, exit status etc. can all be different even if the command is the same.

This means you can compute the run_count column easily by doing a query like select *, count(*) as run_count from history group by command_line without having to store it. This is also more flexible because you can e.g. get the run count of a specific command in a specific directory.

To get z like functionality you would probably query the history for the run count by directory (not by command) like this:

select hostname, cwd,
	count(*) as number_of_commands_run
from history
group by hostname, cwd
order by number_of_commands_run desc
limit 10

That'll give you the 10 most used directories by number of commands run in them

That seems reasonable to me too.

Just want to get @sholderbach's and @elferherrera's buy-in on the changes.

Also, I'm usining [bundled-full] feature on another command I'm working on. Do you have any opinions about bundled vs bundled-full?

@phiresky
Copy link
Contributor Author

As far as I know bundled-full only activates the other features of rusqlite (vtables, user-defined-functions etc), it doesn't change anything about the shipped SQLite version. Right now we're not using any of those additional features so I don't think bundled-full would change anything.

@stormasm
Copy link
Contributor

cargo build --features=sqlite
cargo run --features=sqlite

@phiresky so far everything looks good !
I ran the code and it works out of the box...
great job... 👍
really appreciate the fact that you hung in there
and made the group consensus modifications.
looking forward to the other folks opinions as well
as having this PR get landed....

@elferherrera
Copy link
Contributor

I won't be able to check it until tomorrow night. Can @fdncred or @stormasm check if the history menu works as it should in reedline when using SQL history?

@stormasm
Copy link
Contributor

@elferherrera It looks to me like it is working ! I will let @fdncred give it a whirl as well.... When I hit my ctrl-x key with @phiresky code it brings up the history....

@stormasm
Copy link
Contributor

open ./history.sqlite3 | get history

@rgwood the above command works nicely on the newly created history db 👍

@fdncred
Copy link
Collaborator

fdncred commented Apr 19, 2022

I'd rather the filename be history.db does anyone have any objections?

@phiresky
Copy link
Contributor Author

phiresky commented Apr 19, 2022

I'm a fan of .sqlite3 because .db can also mean other things (e.g. Thumbs.db on windows), sqlite3 can easily be googled (compared to .db) and if it's sqlite3 the association with all tools like e.g. sqlitebrowser is unambiguous ;)

@elferherrera
Copy link
Contributor

Is there a particular reason why you skipped the partial completion from the history completer?

@phiresky
Copy link
Contributor Author

Yeah, the reason I omitted it (for now) is that LIMIT+OFFSET-based pagination only works well under certain circumstances in SQLite (as well as most SQL databases https://use-the-index-luke.com/sql/partial-results/fetch-next-page).

The reason is that the database engine can only optimize an OFFSET query under very specific circumstances, so in most cases it has to scan through all elements from index 0 up to offset every time. Of course this is still better than actually scanning and actually retrieving all elements from 0 to maxid every time, but it's not great.

The better solution for pagination would be pagination tokens: you call search(..., limit, pagination_token). and get back a new pagination token for the next page. The pagination token here would simply be the last id returned, but that would still require a change to the interface of the partial_completion method. I didn't verify how much work that would be or if it would even be possible.

So I really just didn't want to add an offset parameter to the search function in general because it encourages bad use. If the partial_completion interface can't "easily" be changed though we should probably add it.

Copy link
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thank you very much for all the effort and patience with us. It really looks promising to have something that both covers the old history and sql.

I collected my thoughts after I finally got time to read through the actual implementation.

Before devolving into my comment furball:

  • The feature bashisms used by nushell for the !! bash style history expansion is currently not yet up to date and fails compilation
  • There are a few design related questions (e.g. iter_chronologic or another easy to use iterator, having the ids prominently floating around) that might be worth discussing before addressing the minor stuff.
  • Yeah the error handling should be able to distinguish between issues resulting from IO fails/version incompatibilities/unsupported features.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
Comment on lines -22 to -23
/// Chronologic interaction over all entries present in the history
fn iter_chronologic(&self) -> Iter<'_, String>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the basic iterator is very valuable and should remain for ease of use. (If the implementation would favor having it on HistoryCursor would be fine as well)

Comment on lines +76 to +102
pub fn all_that_contain_rev(contains: String) -> SearchQuery {
SearchQuery {
direction: SearchDirection::Backward,
start_time: None,
end_time: None,
start_id: None,
end_id: None,
limit: None,
filter: SearchFilter::from_text_search(CommandLineSearch::Substring(contains)),
}
}
pub fn last_with_search(filter: SearchFilter) -> SearchQuery {
SearchQuery {
direction: SearchDirection::Backward,
start_time: None,
end_time: None,
start_id: None,
end_id: None,
limit: Some(1),
filter,
}
}
pub fn last_with_prefix(prefix: String) -> SearchQuery {
SearchQuery::last_with_search(SearchFilter::from_text_search(CommandLineSearch::Prefix(
prefix,
)))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the constructor names more consistent?

src/history/cursor.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/history/cursor.rs Outdated Show resolved Hide resolved
src/history/cursor.rs Outdated Show resolved Hide resolved
src/history/base.rs Outdated Show resolved Hide resolved
Comment on lines +143 to +148
/// update an item atomically
fn update(
&mut self,
id: HistoryItemId,
updater: &dyn Fn(HistoryItem) -> HistoryItem,
) -> Result<()>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it: is this a capability of the base history trait? (would it make sense to lift Reedline::update_last_command_context to the concrete history type or a subtype trait ExtendableHistory? [gnarf: can we export it from Reedline while it has to have a concrete dyn History trait bound to exist in the struct (in a type generic way it would be possible but messy)])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually pretty much how I had it in #376. The main issue there was that the outside application had to retain a handle to the SqliteHistory so I had to wrap it in a Mutex. This is better now since update is part of the History trait itself, so the .history() (or .history_mut() function could work. If you want a more concrete trait then some downcasting needs to happen..

Another issue is that the update_last_command_context requires state (the last_run_command_id currently stored in Reedline), which the History trait doesn't know (and can't reconstruct because multiple shells could be open).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I kind of misread your comment. Multiple parts of the history trait (e.g. parts of the search params) only work on some History types, but I think it would be fairly hard to separate those out (instead of failing at runtime as it does now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah unless we completely switch over to compiled generics, we will have to do runtime dispatch in some form (if we also want to be able to hotswap some components). This was just spitballing if we could get it a bit neater but for now going for effectively NotImplementedErrors should be fine

@sholderbach
Copy link
Member

Thanks for the changes!

I think the consensus is that we go ahead and merge it as soon as the tests pass and you are happy with the state. We would try to test it in nushell behind a feature flag to gradually test the more advanced features.

I think the truncates_* tests that fail at the moment I would consider details of the FileBackedHistory
I think that implementations might choose their eviction strategies.

On a similar note, the current concurrency tests rely on eviction of entries but I think we might relax the general test to just test for the presence of the relevant entry without enforcing eviction. (Filebacked might have its own more stringent test around eviction related problems with concurrent accesses) (But haven't looked into what is going on with concurrent_histories_are_threadsafe for both)

I think error handling in reedline can be improved in general (don't think we should carry anyhow, but defining our Errors and implementing necessary conversions should be done at some point)
I think you can judge it best, what errors are important to propagate from the history to actually react on, else we land and improve the errors in separate PRs as people encounter them/get annoyed. As most failures in reedlines are std::io::Errors I think it is a good default for now

@phiresky
Copy link
Contributor Author

phiresky commented Apr 20, 2022

Ok, thanks for the info. I'll fix up the test errors and move the ones related to limiting history size to FileBackedHistory. I'll also move the originals of concurrent histories to FileBackedHistories and modify the generic ones to not require truncation.

For errors I'd change the String error type to a very simple enum/struct like I mentioned above wrapped inside io Error where useful. But I'd keep panicing on them inside reedline everywhere where the current interface is not fallible. Later we can consider how the public interface should look to make reedline more fallible in places where it would be a breaking change.

@sholderbach
Copy link
Member

Awesome, sounds good to me!

@fdncred
Copy link
Collaborator

fdncred commented Apr 26, 2022

hey @phiresky, how's it going?

@phiresky
Copy link
Contributor Author

hey! I've not had time to work on it since the last changes. I think there's two things missing before this can be merged:

  1. fixing the tests that currently fail
  2. Improving the error type

Then afterwards we should probably do these things:

  1. add tests for the database-only filtering functionality
  2. work on a UI for these changes / improve the search widget
  3. make configurable, integrate into nushell

@phiresky
Copy link
Contributor Author

phiresky commented May 8, 2022

I've fixed the tests. Please allow the workflow(s) to run

@sholderbach
Copy link
Member

Awesome! can you give cargo fmt a go over the project?

@phiresky
Copy link
Contributor Author

phiresky commented May 8, 2022

sure. i also still need to fix the error handling

@fdncred
Copy link
Collaborator

fdncred commented May 9, 2022

i see some progress has been made. nice!

@fdncred
Copy link
Collaborator

fdncred commented May 16, 2022

hey @phiresky - weekly check-in. i fixed the fmt but there were some errors. can you look into them when you get a chance please?

@phiresky
Copy link
Contributor Author

hi! i'll be on vacation the next two weeks so won't be able to continue working on it. I think the only thing missing is fixing the bashisms feature

@fdncred
Copy link
Collaborator

fdncred commented May 16, 2022

ok. when i checked, there was a function commented out that seemed to be required. so it wouldn't compile.

@fdncred
Copy link
Collaborator

fdncred commented May 16, 2022

to be clear, this is what i get with cargo build

warning: variant is never constructed: `HistoryDatabaseError`
  --> src\result.rs:10:5
   |
10 |     HistoryDatabaseError(String),
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default
note: `ReedlineErrorVariants` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
  --> src\result.rs:6:17
   |
6  | #[derive(Error, Debug)]
   |                 ^^^^^
   = note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: `reedline` (lib) generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 18.85s

this is what i get with cargo build --features=sqlite

   Compiling reedline v0.5.0 (C:\Users\dschroeder\source\repos\forks\reedline)
    Finished dev [unoptimized + debuginfo] target(s) in 4.46s

and this is what i was talking about with cargo build --features=bashsims,sqlite

 Compiling reedline v0.5.0 (C:\Users\dschroeder\source\repos\forks\reedline)
error[E0599]: no method named `iter_chronologic` found for struct `Box<(dyn History + 'static)>` in the current scope
    --> src\engine.rs:1181:22
     |
1181 |                     .iter_chronologic()
     |                      ^^^^^^^^^^^^^^^^ method not found in `Box<(dyn History + 'static)>`

error[E0599]: no method named `iter_chronologic` found for struct `Box<(dyn History + 'static)>` in the current scope
    --> src\engine.rs:1187:22
     |
1187 |                     .iter_chronologic()
     |                      ^^^^^^^^^^^^^^^^ method not found in `Box<(dyn History + 'static)>`

error[E0599]: no method named `iter_chronologic` found for struct `Box<(dyn History + 'static)>` in the current scope
    --> src\engine.rs:1192:22
     |
1192 |                     .iter_chronologic()
     |                      ^^^^^^^^^^^^^^^^ method not found in `Box<(dyn History + 'static)>`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `reedline` due to 3 previous errors

i'm also getting errors opening the sqlite db file. Error: malformed database schema (history) - near "strict": syntax error
this above error is due to an out of date sqlite3.exe

@phiresky
Copy link
Contributor Author

phiresky commented Jun 6, 2022

Thanks for fixing the bashisms feature @elferherrera ! I've updated it to be a bit more performant in a8d8703

That was the last thing I think, so from my perspective this could be merged now!

@sholderbach
Copy link
Member

Thank you very much for all the hard work @phiresky. Let's land this bad boy!

@sholderbach sholderbach merged commit 4f25ce5 into nushell:main Jun 6, 2022
sholderbach added a commit to nushell/nushell that referenced this pull request Jun 14, 2022
…status metadata (#5721)

This PR adds support for an SQLite history via nushell/reedline#401

The SQLite history is enabled by setting history_file_format: "sqlite" in config.nu.

* somewhat working sqlite history
* Hook up history command
* Fix error in SQlitebacked with empty lines

When entering an empty line there previously was the "No command run"
error with `SqliteBackedHistory` during addition of the metadata

May be considered a temporary fix

Co-authored-by: sholderbach <[email protected]>
fennewald pushed a commit to fennewald/nushell that referenced this pull request Jun 27, 2022
…status metadata (nushell#5721)

This PR adds support for an SQLite history via nushell/reedline#401

The SQLite history is enabled by setting history_file_format: "sqlite" in config.nu.

* somewhat working sqlite history
* Hook up history command
* Fix error in SQlitebacked with empty lines

When entering an empty line there previously was the "No command run"
error with `SqliteBackedHistory` during addition of the metadata

May be considered a temporary fix

Co-authored-by: sholderbach <[email protected]>
@sholderbach sholderbach mentioned this pull request Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants