Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement transaction rollback #142

Closed
coffeepots opened this issue Sep 11, 2018 · 18 comments
Closed

Implement transaction rollback #142

coffeepots opened this issue Sep 11, 2018 · 18 comments

Comments

@coffeepots
Copy link
Contributor

coffeepots commented Sep 11, 2018

We require a 'snapshot' ability to rollback the VM state, for instance if there is an error.

Currently this is a commented stub defined in nimbus\db\db_chain.nim.

For example, Py-Evm describes it's snapshot as:

Snapshots are a combination of the state_root at the time of the snapshot and the id of the changeset from the journaled DB.

Each time a recording is started, the underlying journal creates a new changeset and assigns an id to it. The journal then keeps track of all changes that go into this changeset.

Discarding a changeset simply throws it away inculding all subsequent changesets that may have followed. Commiting a changeset merges the given changeset and all subsequent changesets into the previous changeset giving precidence to later changesets in case of conflicting keys.

Nothing is written to the underlying db until persist() is called.

Some example implementation references:

@tersec
Copy link
Contributor

tersec commented Sep 11, 2018

I'm currently working around this -- targeting more immediately implementable fixes -- but aside from the more general utility this has, #113 depends on it.

@coffeepots
Copy link
Contributor Author

coffeepots commented Sep 11, 2018

Yes, it's become much higher priority now you're implementing more complex state tests, and it's an important requirement for the VM operation. We can't fully test computation without it.

How it's implemented could well have a big performance impact, so we should start looking at potential designs soon.

@zah
Copy link
Contributor

zah commented Sep 12, 2018

@yglukhov mentioned that he is working on this in our last call IIRC.

Conceptually, the roll-back is a very simple thing - you just restore the root hash to a previous state. The JournalDB aims only to optimize things by reducing the number of writes to persistent memory.

What I suggest here is that if this is a blocker for something, you can just take the short-cut of restoring a previous root hash.

@yglukhov
Copy link
Contributor

Here's a crude example of what I meant: #144

@mratsim
Copy link
Contributor

mratsim commented Sep 14, 2018

Regarding the design, I prefer a commit mechanism similar to the CacheDB rather than a rollback one.

That would leave the state consistent in case of a failure.

@arnetheduck
Copy link
Member

arnetheduck commented Sep 14, 2018

+1 for explicit commit and rollback-as-default

@tersec
Copy link
Contributor

tersec commented Sep 14, 2018

Another +1 -- there tends to be fewer ways to succeed than fail, so in terms of minimizing boilerplate and maximizing clarity, it seems better to put commit along the success paths.

@zah
Copy link
Contributor

zah commented Sep 18, 2018

This has landed in master with an usage example here:
https://github.com/status-im/nimbus/blob/master/nimbus/vm/computation.nim#L89

The API is quite straight-forward. A transaction is started with beginTransaction, followed by
a call to either commit or rollback. dispose is a convenient destuctor-like proc that can be safely
executed in a finally block or a defer statement. It will rollback the transaction unless it was already committed.

@zah zah closed this as completed Sep 18, 2018
@arnetheduck
Copy link
Member

what's the explicit rollback for then? looks like it's redundant

@zah
Copy link
Contributor

zah commented Sep 18, 2018

There must be a rollback in order to tell the database system that you are no longer interested in "recording" the changes to a memory layer. In this sense, it really means "end/discard transaction", but I don't think these are better names.

@arnetheduck
Copy link
Member

well, for the code to be correct (exception safety, state rollback etc), you must call discard (manually, unfortunately).. since discard does rollback unless commit has been called, I can't think of a case where calling both rollback and discard makes sense - ie when would you want to continue doing stuff without "recording" changes?

@zah
Copy link
Contributor

zah commented Sep 18, 2018

I assume you meant dispose when you used discard. There are some subtle differences between rollback and dispose. Calling rollback on a committed transaction would be considered a programming error (an assertion), so the programmer gets to see when his expectations were violated.

Most code should just use dispose though. rollback is only for when you really want to throw away the transaction in a middle of a block and start doing something else. You'll find the exact set of methods in C# for example, so this design can be considered battle-tested.

@arnetheduck
Copy link
Member

arnetheduck commented Sep 18, 2018

dispose, right.

Calling rollback on a committed transaction would be considered a programming error (an assertion)

this looks like an invitation for extra error handling code - in this sense, I'd consider dispose a superior API and never use rollback

In C# both exist because dispose/IDisposable is non-deterministic, and you need a way to control that rollback happens at a specific point - ie the expectation is that you call rollback, and if you forget, the garbage collector may call it for you at some point in the future.

"start doing something else" in the same context as an aborted transaction sounds like an anti-pattern - what would be good examples of this? the whole point of a transaction is to contextually tie together several operations in an atomic all-or-nothing setting

@zah
Copy link
Contributor

zah commented Sep 18, 2018

In C# both exist because dispose/IDisposable is non-deterministic, and you need a way to control that rollback happens at a specific point - ie the expectation is that you call rollback, and if you forget, the garbage collector may call it for you at some point in the future.

This is not true. Dispose in C# is tied to the using statement - it's exactly the deterministic destructor-like mechanism offered by the language:
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/using-statement

@arnetheduck
Copy link
Member

That describes one possible use of it - here's the full picture:

https://docs.microsoft.com/en-us/dotnet/api/system.idisposable?view=netframework-4.7.2

The primary use of this interface is to release unmanaged resources. The garbage collector automatically releases the memory allocated to a managed object when that object is no longer used. However, it is not possible to predict when garbage collection will occur.
If your language supports a construct such as the using statement in C# and the Using statement in Visual Basic, you can use it instead of explicitly calling IDisposable.Dispose yourself. 

@zah
Copy link
Contributor

zah commented Sep 19, 2018

My point was that the call to Dispose in C# happens in a fully deterministic way, at the end of the using statement block.

@arnetheduck
Copy link
Member

yeah, but that's not the reason both dispose and rollback exist in .net - both exist because depending on the situation, you'll use one or the other (ie rollback or using - I still don't see when both are needed in the nim case - you always have to call one of them at least, and dispose, as implemented, has a superset of useful functionality obsoleting the need for rollback to exist (unless you consider raising an extra error a feature - which perhaps might be useful to catch other resource mgmt bugs where things are disposed twice and the like) ..

@zah
Copy link
Contributor

zah commented Sep 20, 2018

Well, I'm not convinced the situation in C# is any different. The recommendation is still to use using/Dispose pretty much everywhere, and to make calls to Rollback only if you have some very specific control-flow needs that cannot be represented easily with structured blocks.

Maybe that's not very obvious here, but dispose is supposed to become a destructor eventually and you woudn't have to call in manually with defer: t.dispose() any more. In case, you haven't picked this up by the examples, this is the idiomatic and recommended way to use the API now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants