DOI: 10.1109/DSN.2011.5958214
Article

CEC: Continuous eventual checkpointing for data stream processing operators

Published: 27 June 2011

Abstract

The checkpoint roll-backward methodology is the underlying technology of several fault-tolerance solutions for continuous stream processing systems today, implemented using either the memories of replica nodes or a distributed file system. In this scheme, the recovering node loads its most recent checkpoint and requests log replay to reach a consistent pre-failure state. Challenges with this technique include its complexity (it is typically implemented via copy-on-write), the associated overhead (exception handling under state updates), and limits to the frequency of checkpointing. The last of these determines how much information must be replayed, and infrequent checkpoints lead to long recovery times. In this work we introduce continuous eventual checkpointing (CEC), a novel mechanism that provides fault-tolerance guarantees by taking continuous incremental state checkpoints with minimal pausing of operator processing. We achieve this by separating operator state into independent parts and producing frequent, independent partial checkpoints of them. Our results show that our method can achieve low-overhead fault tolerance with adjustable checkpoint intensity, trading off recovery time against performance.
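The idea of checkpointing independent parts of operator state separately, then recovering by loading those partial checkpoints and replaying the log, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the mechanism as implemented in the paper: the per-key summing operator, the names `PartitionedOperator`, `checkpoint_part`, and `recover`, the in-memory dictionary standing in for stable storage, and the sequence-numbered input log are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedOperator:
    """Toy per-key summing operator; each key is one independent part of the state."""
    state: dict = field(default_factory=dict)        # key -> running sum
    applied: dict = field(default_factory=dict)      # key -> sequence number of last tuple applied
    checkpoints: dict = field(default_factory=dict)  # key -> (seq, value); stands in for stable storage

    def process(self, seq, key, value):
        """Apply one input tuple to the part of the state it belongs to."""
        self.state[key] = self.state.get(key, 0) + value
        self.applied[key] = seq

    def checkpoint_part(self, key):
        """Persist a single part; the other parts are neither copied nor paused."""
        self.checkpoints[key] = (self.applied[key], self.state[key])


def recover(checkpoints, log):
    """Rebuild an operator from partial checkpoints plus replay of the input log."""
    op = PartitionedOperator(checkpoints=dict(checkpoints))
    for key, (seq, value) in checkpoints.items():
        op.state[key] = value
        op.applied[key] = seq
    for seq, key, value in log:
        # Replay only tuples newer than the checkpoint of the part they update.
        if seq > op.applied.get(key, -1):
            op.process(seq, key, value)
    return op


# Hypothetical input log of (sequence number, key, value) tuples.
log = [(0, "a", 1), (1, "b", 10), (2, "a", 2), (3, "b", 20), (4, "a", 3)]

op = PartitionedOperator()
for seq, key, value in log:
    op.process(seq, key, value)
    if seq == 2:
        op.checkpoint_part("a")  # partial checkpoint of part "a" only
    if seq == 3:
        op.checkpoint_part("b")  # later, independent checkpoint of part "b"

restored = recover(op.checkpoints, log)
```

In this sketch the two parts are checkpointed at different points in the stream, and recovery replays only the tuples that arrived after each part's own checkpoint. Checkpointing parts more often shortens the replay at the cost of more checkpoint writes, which is the kind of trade-off the abstract describes as adjustable checkpoint intensity.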

Published In

DSN '11: Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks
June 2011
597 pages
ISBN: 9781424492329

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2024) Fault Tolerance Placement in the Internet of Things. Proceedings of the ACM on Management of Data 2:3 (1-29). DOI: 10.1145/3654941. Online publication date: 30-May-2024
  • (2020) Resource Management and Scheduling in Distributed Stream Processing Systems. ACM Computing Surveys 53:3 (1-41). DOI: 10.1145/3355399. Online publication date: 28-May-2020
  • (2018) A survey of state management in big data processing systems. The VLDB Journal — The International Journal on Very Large Data Bases 27:6 (847-872). DOI: 10.1007/s00778-018-0514-9. Online publication date: 1-Dec-2018
  • (2017) Samza. Proceedings of the VLDB Endowment 10:12 (1634-1645). DOI: 10.14778/3137765.3137770. Online publication date: 1-Aug-2017
  • (2016) Operator Migration for Distributed Complex Event Processing in Device-to-Device Based Networks. Proceedings of the 3rd Workshop on Middleware for Context-Aware Applications in the IoT (13-18). DOI: 10.1145/3008631.3008634. Online publication date: 12-Dec-2016
  • (2015) Dynamic Resource Management In a Massively Parallel Stream Processing Engine. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (13-22). DOI: 10.1145/2806416.2806449. Online publication date: 17-Oct-2015
  • (2014) MCEP. ACM Transactions on Internet Technology 14:1 (1-24). DOI: 10.1145/2633688. Online publication date: 7-Aug-2014
  • (2014) Adaptive Speculative Processing of Out-of-Order Event Streams. ACM Transactions on Internet Technology 14:1 (1-24). DOI: 10.1145/2633686. Online publication date: 7-Aug-2014
  • (2013) MigCEP. Proceedings of the 7th ACM international conference on Distributed event-based systems (183-194). DOI: 10.1145/2488222.2488265. Online publication date: 29-Jun-2013
  • (2013) Rollback-recovery without checkpoints in distributed event processing systems. Proceedings of the 7th ACM international conference on Distributed event-based systems (27-38). DOI: 10.1145/2488222.2488259. Online publication date: 29-Jun-2013