
Transaction Support #109

Closed
ritch opened this issue Dec 20, 2013 · 24 comments

@ritch
Member

ritch commented Dec 20, 2013

/to @raymondfeng @altsang
/cc @bajtos

In the interest of LoopBack reliability, I'd like to start a discussion on making LoopBack ACID compliant. After dusting off earlier attempts to look into the issue, I had an idea.

I think Transactions + Multiversioning will get us 80% or more to ACID.

Transactions

var domain = require('domain');

function Transaction(dataSource) {
  this.domain = domain.create();
  this.dataSource = dataSource;
  this.id = this.getHashCode(); // unique, ideally sortable id (not shown)
  this.snapshots = [];

  // expose the transaction on its domain so Transaction.getCurrent() can find it
  this.domain.transaction = this;
  this.domain.on('error', this.onError.bind(this));
}

Transaction.prototype.commit = function(callback) {
  this.domain.run(function run() {
    this.dataSource.startTransaction(this, callback);
  }.bind(this));
};

Transaction.prototype.snapshot = function(name, callback) {
  this.snapshots.push(this.dataSource.snapshot(name, this, callback));
};

Transaction.prototype.rollback = function(name, callback) {
  this.currentSnapshot = name ? this.getSnapshot(name) : this.snapshots.pop();
  if (this.currentSnapshot) {
    this.dataSource.applySnapshot(this.currentSnapshot, callback);
  }
};

Transaction.prototype.onError = function(err) {
  this.rollback(null, function rollback(rollbackErr) {
    this.dispose();
    throw rollbackErr || err;
  }.bind(this));
};

Transaction.prototype.dispose = function(callback) {
  this.domain.dispose();
  this.dataSource.endTransaction(this, callback);
};

Transaction.getCurrent = function() {
  return domain.active && domain.active.transaction;
};

// strong-remoting hooking
remotes.before('**', function(ctx, next, method) {
  var t = ctx.transaction = new Transaction(method.constructor.dataSource);
  t.domain.add(ctx.req);
  t.domain.add(ctx.res);
  t.commit(next);
});

remotes.after('**', function(ctx, next) {
  ctx.transaction.dispose(next);
});

The above requires the dataSource / connector to implement methods that delegate to driver transactions, or to multiversioning. To support multiversioning, an outer wrapper of the connector could be devised that tracks each method call and model instance and changes the behavior slightly.
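A minimal sketch of what such connector-level hooks could look like. The method names (startTransaction, snapshot, applySnapshot, endTransaction) come from the proposal above; the wrapper, its bookkeeping, and the method bodies are assumptions for illustration, not an existing LoopBack API.

// Hypothetical connector wrapper; bodies are placeholders showing where a
// real connector would plug in.
function TransactionalConnector(connector) {
  this.connector = connector;    // the real connector (oracle, mongodb, ...)
  this.active = {};              // transaction.id -> per-transaction state
}

TransactionalConnector.prototype.startTransaction = function(tx, cb) {
  // SQL connectors: acquire a dedicated connection and issue BEGIN;
  // NoSQL connectors: just record the transaction id for multiversioning.
  this.active[tx.id] = { ops: [] };
  process.nextTick(cb);
};

TransactionalConnector.prototype.snapshot = function(name, tx, cb) {
  // capture enough state to undo the operations performed so far
  var snap = { name: name, ops: this.active[tx.id].ops.slice() };
  process.nextTick(cb);
  return snap;
};

TransactionalConnector.prototype.applySnapshot = function(snap, cb) {
  // undo (or replay) operations until the state matches the snapshot
  process.nextTick(cb);
};

TransactionalConnector.prototype.endTransaction = function(tx, cb) {
  // release the dedicated connection / drop the bookkeeping
  delete this.active[tx.id];
  process.nextTick(cb);
};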

Multiversioning

By default, all update operations during a transaction would create copies of the given model instance and tag them with the transaction.id. Models created during the transaction would also be tagged with the transaction.id.

By default, all queries would exclude instances tagged with transaction.ids.

Failure / Cleanup

If the transaction fails, all models tagged with a transaction.id are removed. This could also be ensured on startup / by a job of some sort that essentially garbage collects tagged instances.

If the transaction completes without error, all tagged model instances would replace the matching untagged models.
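A rough sketch of how the tagging and cleanup could work, assuming a document-style store. The _txId field and the collection API (findOne/insert/find/remove) are illustrative assumptions only:

// write path: instead of updating in place, store a tagged copy
function taggedUpdate(collection, tx, where, data, cb) {
  collection.findOne(where, function(err, original) {
    if (err) return cb(err);
    var copy = Object.assign({}, original, data, { _txId: tx.id });
    collection.insert(copy, cb);
  });
}

// read path: regular queries never see uncommitted (tagged) copies
function defaultFind(collection, where, cb) {
  collection.find(Object.assign({ _txId: null }, where), cb);
}

// failure / startup garbage collection: drop everything tagged with tx.id
function rollbackTransaction(collection, tx, cb) {
  collection.remove({ _txId: tx.id }, cb);
}

// success: promote tagged copies over their untagged originals
function commitTransaction(collection, tx, cb) {
  collection.find({ _txId: tx.id }, function(err, copies) {
    if (err) return cb(err);
    // for each copy: overwrite the matching untagged record and drop the tag
    // (ordering matters here; see the concern about sortable ids below)
    cb();
  });
}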

Concerns
During the final phase of a successful transaction, since transactions aren't isolated by locking, there is a potential for cleanup to happen out of order. If the transaction.id is sortable, we might be able to avoid this.
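If a sortable id is needed, a timestamp-plus-counter scheme is one option, so that lexicographic order matches creation order and cleanup can be applied in commit order. A minimal sketch; the format is an assumption, not part of the proposal:

// hypothetical sortable transaction id: zero-padded timestamp + counter
var txCounter = 0;

function nextTransactionId() {
  var ts = String(Date.now()).padStart(14, '0');
  var seq = String(txCounter++ % 1e6).padStart(6, '0');
  return ts + '-' + seq; // e.g. "01734700000000-000042"
}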

@bajtos
Member

bajtos commented Dec 20, 2013

Adding @piscisaureus, he might want to comment on the way domains are used here.

First of all, I think adding some sort of transactions would be great, as it will allow us to implement transaction-level caching and save database roundtrips when the same object is queried again.

Could you add a code sample showing how to do a transaction with commit? How to rollback the transaction on error?

Getting transactions right is a very tricky business. Imagine this scenario (Tx1, Tx2 and Tx3 are transactions started by different HTTP requests):

  1. Tx1 reads Account A1 balance ($10).
  2. Tx1 updates A1.balance - adds $1.
  3. Tx2 reads A1.balance. If my understanding is correct, your proposal says that Tx2 will always read the original A1.balance ($10) due to multiversioning.
  4. Tx2 updates A1.balance - adds $2.
  5. Tx1 is committed; A1.balance is $11.
  6. Tx2 is committed. What happens now? Is A1.balance $12? Does the commit fail due to a versioning conflict?
  7. Tx3 reads A1.balance. What is the result?

If we changed step 3 so that Tx2 gets the updated value ($11), consider an alternative scenario where Tx1 is rolled back in step 5. Swapping the order of steps 2 and 3 makes the situation even more interesting.

Things get much more complicated when we want to have a single abstraction for SQL and multiple NoSQL stores, because each database uses a very different approach to achieve consistency. To name a few:

  • RDBMSs (SQL) have full ACID transactions and have ways to solve the problem described above, usually using locks. E.g. Tx1 could lock the whole Account1, so that Tx2 starts only after Tx1 is finished.
  • Riak uses vector clocks; writes are always successful and reads report consistency conflicts [1]. Tx3 would have to resolve the conflict created by Tx2's write.
  • MongoDB seems to use the "last-write-wins" and "commutative/foldable operations" strategies [2]. Since LoopBack converts "add $1" to "set $11", Tx2 creates inconsistent data (see the sketch after the references below).
  • CouchDB returns 409 Conflict when you attempt to save an update that is based on an outdated version [3]. Commit of Tx2 would fail with 409.

[1] https://docs.basho.com/riak/1.3.0rc1/references/appendices/comparisons/Riak-Compared-to-MongoDB/ (row Data Versioning and Consistency)
[2] https://blog.mongodb.org/post/520888030/on-distributed-consistency-part-5-many-writer
[3] https://wiki.apache.org/couchdb/HTTP_Document_API#PUT
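To illustrate the MongoDB point: a foldable operator like $inc stays consistent under concurrent writers, while the read-modify-write with $set that LoopBack issues today is last-write-wins. A sketch using the native mongodb driver (2.x-style callback API; database, collection and field names are made up):

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost/bank', function(err, db) {
  if (err) throw err;
  var accounts = db.collection('accounts');

  // foldable: concurrent "+$1" and "+$2" increments always end up +$3
  accounts.update({ _id: 42 }, { $inc: { balance: 1 } }, function(err) {
    // ...
  });

  // last-write-wins: the read-modify-write below loses one of the two
  // updates when transactions interleave (the scenario in steps 1-6 above)
  accounts.findOne({ _id: 42 }, function(err, acc) {
    if (err) throw err;
    accounts.update(
      { _id: 42 },
      { $set: { balance: acc.balance + 1 } },
      function(err) { /* ... */ }
    );
  });
});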

@ritch
Member Author

ritch commented Dec 20, 2013

Getting transactions right is a very tricky business.

Agreed.

When you have concurrent transactions against the same data, isolation can only be maintained by ensuring order. Locking makes this reliable but slow. Snapshot isolation is how I think we would be able to handle this generically. The idea is that a transaction would fail if any of its updates conflict. The upside is performance and a relatively simple implementation. The downside is that users have to handle conflict errors for all transactions.

  • In step 4, Tx2 updating a model that is tagged with a transaction.id will cause a conflict error in step 6.
  • Tx2 will fail and all of its operations will be rolled back.
  • Tx3 would see a balance of $11.
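A rough sketch of the conflict check this implies: at commit time, the transaction is rejected if any record it wrote was modified after it read it. The record/store shape (readVersion, currentVersion, discard, promote) is an assumption for illustration:

// hypothetical write-conflict detection for snapshot isolation
function commitWithConflictCheck(store, tx, cb) {
  var conflict = tx.writes.some(function(w) {
    // w.readVersion is the version the transaction saw when it read the record
    return store.currentVersion(w.id) !== w.readVersion;
  });

  if (conflict) {
    // roll back the transaction's tagged copies and surface the error
    return store.discard(tx.id, function() {
      cb(new Error('Transaction ' + tx.id + ' aborted: write conflict'));
    });
  }

  // no conflict: promote tagged copies and bump the record versions
  store.promote(tx.id, cb);
}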

@raymondfeng
Member

Let's take advanced features (such as 2PC, nested TXs, and NoSQL) out of the discussion and focus on the simple 1PC case.

  1. 1PC transactions for RDBs need to use the same connection
  2. We need to have a way to control the transaction boundary (begin --> commit/rollback)
  3. We need to have a way to set and access the current transaction and associate the transaction with exactly one connection
  4. I assume as long as the TX is active, all SQL statements run against the connection will be part of the TX, no matter whether they are executed in parallel or serially from Node's perspective.

@bajtos
Member

bajtos commented Dec 20, 2013

@raymondfeng Before taking NoSQL out of the discussion, please consider this: because we don't support MongoDB's foldable operators and instead always use $set, which works in a "last-writer-wins" manner, it's impossible to achieve data consistency in a vanilla LoopBack application at the moment.

Unless the plan is to have a set of transaction APIs, a different API for every database type (SQL, Mongo, Couch, etc.), in which case it's OK to focus on SQL only. (But I don't think such a set of APIs could be called "easy to work with".)

@ritch Failing transactions with a conflict is a possible solution. We will probably have to associate transaction.id with Model.id, not with a Model instance, because there may be multiple instances (JS objects) representing the same database record (Model.id), but that's an implementation detail.

@bajtos
Member

bajtos commented Dec 26, 2013

I did a bit of thinking about this topic in the past days and would like to make a few more points.

  1. LoopBack is database agnostic; it makes it easy to use existing data in any database. Applying this approach to read/write consistency, LoopBack should make it easy to use the tools provided by the database, not invent a new one. This means transactions for SQL, foldable operators for Mongo, etc. (I do realise this is sort of a 180 degree turn from my previous comment.)
  2. Making LoopBack transactions a very thin layer on top of the native SQL transactions will give us several benefits. Most importantly, we can provide more features with less effort. Secondly, LoopBack users can re-use the skills and expertise they already have in their company, instead of learning how to work with yet another mechanism for maintaining data consistency.

I am proposing the following incremental steps:

1. Expose native low-level API

The first step is to allow LB users to re-use a single connection for multiple commands/queries and allow them to call the native API (send SQL commands, call Mongo's update with foldable operators, etc.).

This addresses the following points from Raymond's comment:

(1) 1PC transactions for RDBs need to use the same connection
(2) We need to have a way to control the transaction boundary (begin --> commit/rollback)

/*** Oracle ***/
var loopback = require('loopback');
var oracle = loopback.createDataSource({ /* Oracle config */ });

// to avoid an extra callback, the connection is initialised lazily
// as part of sending the first request
var conn = oracle.getConnection();
conn.execute('START TRANSACTION', function(err) {
  if (err) { conn.close(); return fail(); }
  conn.execute(
    'SELECT balance FROM accounts WHERE id = 42',
    function(err, res) {
      if (err) { /* rollback, close connection, return fail */ }
      conn.execute(
        'UPDATE accounts SET balance = :bal WHERE id = 42',
        { bal: res[0].balance + 1 },
        function(err) {
          if (err) { /* rollback, close connection, return fail */ }
          conn.execute(
            'COMMIT',
            function(err) {
              conn.close();
              if (err) return fail();
              done();
            }
          );
        }
      );
    }
  );
});

/*** MongoDB ***/
var mongo = loopback.createDataSource({ /* MongoDB config */ });

var conn = mongo.getConnection();
conn.accounts.update({ _id: 42 }, { $inc: { balance: 1 } }, function(err) {
  conn.close();
  if (err) return fail();
  done();
});

2. Models

Once we have the low-level mechanisms available, we can integrate them with Models. As Raymond wrote:

(3) We need to have a way to set and access the current transaction and associate the transaction with exactly one connection
(4) I assume as long as the TX is active, all SQL statements run against the connection will be part of the TX, no matter whether they are executed in parallel or serially from Node's perspective.

I see two possible approaches here:

  1. Keep using singletons: make a singleton-like Transaction object that will be used by all DAO objects (Models), probably as described by Ritchie at the beginning of this thread. Due to the async nature of Node, we have to come up with a mechanism for saving the Transaction instance when the flow enters an async call and restoring the correct Transaction object when a callback is called later (a sketch of one such mechanism appears after this list).

    (time-line view)
    [A] HTTP request comes in
    [A] Tx1 is created
    [A] (static DAO functions like findById use Tx1)  
    [A] Tx1 calls async fn
    [B] another HTTP request comes in
    [B] Tx2 is created
    [B] (static DAO functions use Tx2)
    [B] Tx2 calls async fn
    [A] async fn callback is called
    [A] (static DAO functions use Tx1)
    etc.
    
  2. Session & repository pattern

    Instead of using the ActiveRecord pattern, switch to the Repository pattern when running inside transactions. Because all static functions like findById are accessed via member properties, we don't need to worry about restoring a global/singleton transaction on async boundaries.

    var tx = db.createTransaction();
    // tx.accounts.findById is equivalent to
    // Account.findById within the transaction tx
    tx.accounts.findById(10, function(err, res) {
      // etc.
      tx.commit();
    });
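For approach 1, domains (as in Ritchie's original sketch) are the usual Node mechanism for carrying such context across async boundaries. A hedged illustration; runInTransaction and currentTransaction are hypothetical helpers:

var domain = require('domain');

// run `fn` with `tx` as the ambient transaction
function runInTransaction(tx, fn) {
  var d = domain.create();
  d.transaction = tx;   // stash the transaction on the domain
  d.run(fn);            // async callbacks created inside stay bound to `d`
}

// lookup used by static DAO functions such as findById
function currentTransaction() {
  return domain.active && domain.active.transaction;
}

// usage: callbacks scheduled inside the domain still see the right transaction
var tx1 = { id: 'tx-1' };  // stand-in for a real Transaction object
runInTransaction(tx1, function() {
  setTimeout(function() {
    console.log(currentTransaction() === tx1); // true
  }, 10);
});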

3. REST

Provide convenient helpers to automatically wrap certain remotable methods in a transaction. Note that this should be an optional detail built on top of existing APIs, because for more complex operations involving multiple external services, users should limit the duration of the DB transaction to only the relevant part of the operation (i.e. don't hold a DB transaction open while waiting for a result from a SOAP call).

This can be implemented via strong-remoting hooks as described by Ritchie, or even by extending LDL and adding a new configuration property.

fn.shared = true;
fn.wrapInTransaction = true;
fn.accepts = [ /* etc. */ ];

@ritch
Member Author

ritch commented Dec 30, 2013

Expose native low-level API

+1

@ritch
Member Author

ritch commented Dec 30, 2013

... example above

var tx = db.createTransaction();
// tx.accounts.findById is equivalent to
// Account.findById within the transaction tx
tx.accounts.findById(10, function(err, res) {
  // etc.
  tx.commit();
});

This is a great idea.

@ritch
Member Author

ritch commented Dec 30, 2013

LoopBack should make it easy to use the tools provided by the database, not invent a new one.

After contemplating this a bit more, this is where I landed as well. We don't have to hold back features for the sake of compatibility / abstraction. We should expose features as they exist and clearly explain the discrepancies.

@raymondfeng raymondfeng self-assigned this Feb 28, 2014
@raymondfeng raymondfeng added this to the 2.0.0 milestone Feb 28, 2014
@bajtos bajtos mentioned this issue Sep 30, 2014
@bajtos bajtos modified the milestone: #Rel lb 2.0.0 Sep 30, 2014
@FelipeAguayo

Have you implemented this (transactions)? I really need this in my project.

@bajtos
Member

bajtos commented Jan 19, 2015

Have you implemented this?

Not yet, it hasn't been a priority for us.

@FelipeAguayo

Uhmmm, data consistency should be a priority, especially if the framework works with databases.

@altsang
Contributor

altsang commented Jan 19, 2015

If the DB driver supports transactions, you should be able to call it natively from the LB connector. We're not implementing transactions at the Juggler level at this time. 2PC is difficult enough when multiple instances of a data source support it, let alone across multiple disparate data sources.
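For example, with the MySQL connector you can reach the underlying driver and use its native transaction API directly. A hedged sketch, assuming the connector exposes its connection pool as dataSource.connector.client (an implementation detail; check your connector version); the app and data source names are made up:

var app = require('./server/server');  // hypothetical LoopBack app
var ds = app.dataSources.mysqlDs;      // hypothetical data source name
var pool = ds.connector.client;        // assumption: the underlying mysql pool

pool.getConnection(function(err, conn) {
  if (err) return console.error(err);
  conn.beginTransaction(function(err) {
    if (err) { conn.release(); return console.error(err); }
    conn.query(
      'UPDATE accounts SET balance = balance - 10 WHERE id = ?', [1],
      function(err) {
        if (err) return conn.rollback(function() { conn.release(); });
        conn.commit(function(err) {
          if (err) return conn.rollback(function() { conn.release(); });
          conn.release();
        });
      }
    );
  });
});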

Our priorities for @bajtos and the rest of the very small team are driven by our customer obligations and community feedback such as yours. We appreciate the +1 in this area. If there's an urgent commercial need for support, please feel free to contact our sales team.

@FelipeAguayo

OK, thanks for the answers. I'll wait for later versions.

@vladmiller

Absence of transactions is a pain.

@violet-day

👍

@javiboo

javiboo commented Apr 15, 2015

+1

@melochale

+1

@SnehalSatardekar

Hello,

Have you implemented "Transactions" yet?

@raymondfeng
Member

Not yet. I'm in the process of refactoring the RDBMS connector implementations, which will make it easier to add transaction support.

@antoineverger

+1

@raymondfeng raymondfeng added #tob and removed #tob labels May 12, 2015
@richardaum

In Strongloop, is it possible to attach a transaction only to a single Model? Is it possible to use operations on multiple models inside one transaction block without chaining transactions, as in SQL?

@raymondfeng
Member

There is no reliable way in Node to implicitly propagate the transaction context over the async invocation chain. As a result, we chose to use the options arg to control the transaction scope.
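A hedged sketch of the options-arg approach, based on the transaction API documented for loopback-datasource-juggler with SQL connectors (check the docs for your version); the Post/Comment models and the done callback are made up:

// begin a transaction on a SQL-backed model...
Post.beginTransaction({ isolationLevel: 'READ COMMITTED' }, function(err, tx) {
  if (err) return done(err);
  // ...and pass {transaction: tx} in the options arg of every operation
  // that should run inside it
  Post.create({ title: 'first' }, { transaction: tx }, function(err, post) {
    if (err) return tx.rollback(function() { done(err); });
    Comment.create({ postId: post.id, body: 'hello' }, { transaction: tx }, function(err) {
      if (err) return tx.rollback(function() { done(err); });
      tx.commit(done);
    });
  });
});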

@visusharma

Is there any way to pass these options to a native SQL query as well?

@bajtos
Member

bajtos commented Apr 28, 2017

Is there any way to pass these options to a native SQL query as well?

@visusharma please open a new issue to discuss that use case.

@strongloop strongloop locked and limited conversation to collaborators Apr 28, 2017