Elastic stream processing in the Cloud

Published: 01 September 2013

Abstract

Stream processing is a computing paradigm that has emerged from the need to handle high volumes of data in real time. In contrast to traditional databases, stream-processing systems perform continuous queries and handle data on the fly. Today, a wide range of application areas rely on efficient pattern detection and queries over streams. The advent of Cloud computing fosters the development of elastic stream-processing platforms, which can adapt dynamically according to different cost-benefit trade-offs. This article provides an overview of the historical evolution and the key concepts of stream processing, with a special focus on adaptivity and Cloud-based elasticity.

Published In

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Volume 3, Issue 5
September 2013
47 pages

Publisher

John Wiley & Sons, Inc.

United States
