SG10201906917QA - Processing data from multiple sources - Google Patents

Processing data from multiple sources

Info

Publication number
SG10201906917QA
Authority
SG
Singapore
Prior art keywords
data
processing engine
data processing
multiple sources
receiving
Prior art date
Application number
SG10201906917QA
Inventor
Ian Schechter
Tim Wakeling
Ann M Wollrath
Original Assignee
Ab Initio Technology Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ab Initio Technology Llc
Publication of SG10201906917QA

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

PROCESSING DATA FROM MULTIPLE SOURCES

In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage: executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster; receiving a computer-executable program by the data processing engine; executing at least part of the program by the first instance of the data processing engine; receiving, by the data processing engine, a second portion of data from the external data source; storing the second portion of data other than in HDFS storage; and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data. (Fig. 1)
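
The flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation and does not use any Hadoop API. The class `EngineInstance`, its methods, and the `join_on_customer` program are hypothetical names introduced only for illustration. A plain list stands in for the portion of data the node stores in HDFS, and an in-memory buffer stands in for storage "other than in HDFS".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class EngineInstance:
    """Hypothetical stand-in for one engine instance running on a cluster node."""

    # First portion: stands in for data the node already stores in HDFS.
    local_records: List[Record]
    # Second portion: kept in process memory, i.e. not written to HDFS.
    external_buffer: List[Record] = field(default_factory=list)

    def receive_external(self, rows: Iterable[Record]) -> None:
        # Buffer rows arriving from the external source in memory.
        self.external_buffer.extend(rows)

    def run(self, program: Callable[[List[Record], List[Record]], List[Record]]) -> List[Record]:
        # The program identifies the operation applied to both portions.
        return program(self.local_records, self.external_buffer)


def join_on_customer(local: List[Record], external: List[Record]) -> List[Record]:
    """Example program: inner join of the two portions on 'customer_id'."""
    by_id = {row["customer_id"]: row for row in external}
    return [
        {**row, **by_id[row["customer_id"]]}
        for row in local
        if row["customer_id"] in by_id
    ]


if __name__ == "__main__":
    engine = EngineInstance(local_records=[
        {"customer_id": 1, "amount": 40},
        {"customer_id": 2, "amount": 15},
        {"customer_id": 3, "amount": 7},
    ])
    engine.receive_external([
        {"customer_id": 1, "region": "EU"},
        {"customer_id": 3, "region": "US"},
    ])
    for joined in engine.run(join_on_customer):
        print(joined)
```

The join is only one possible operation. Any function that takes both portions could be passed to `run` in this sketch.
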
SG10201906917QA (en): Processing data from multiple sources. Priority date 2014-04-17; filing date 2015-04-16.

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
US14/255,579 US9607073B2 (en) 2014-04-17 2014-04-17 Processing data from multiple sources

Publications (1)

Publication Number Publication Date
SG10201906917QA (en) 2019-09-27

Family

ID=53016776

Family Applications (2)

Application Number Publication Number Priority Date Filing Date Title
SG10201906917QA SG10201906917QA (en) 2014-04-17 2015-04-16 Processing data from multiple sources
SG11201608186RA SG11201608186RA (en) 2014-04-17 2015-04-16 Processing data from multiple sources

Family Applications After (1)

Application Number Publication Number Priority Date Filing Date Title
SG11201608186RA SG11201608186RA (en) 2014-04-17 2015-04-16 Processing data from multiple sources

Country Status (7)

Country Link
US (4) US9607073B2 (en)
EP (1) EP3132348B1 (en)
JP (3) JP6581108B2 (en)
AU (2) AU2015247639B2 (en)
CA (1) CA2946118A1 (en)
SG (2) SG10201906917QA (en)
WO (1) WO2015161025A1 (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9607073B2 (en) 2014-04-17 2017-03-28 Ab Initio Technology Llc Processing data from multiple sources
US10672078B1 (en) * 2014-05-19 2020-06-02 Allstate Insurance Company Scoring of insurance data
CN105205082A (en) * 2014-06-27 2015-12-30 国际商业机器公司 Method and system for processing file storage in HDFS
US10599648B2 (en) * 2014-09-26 2020-03-24 Applied Materials, Inc. Optimized storage solution for real-time queries and data modeling
US10120904B2 (en) * 2014-12-31 2018-11-06 Cloudera, Inc. Resource management in a distributed computing environment
US10706970B1 (en) 2015-04-06 2020-07-07 EMC IP Holding Company LLC Distributed data analytics
US10404787B1 (en) 2015-04-06 2019-09-03 EMC IP Holding Company LLC Scalable distributed data streaming computations across multiple data processing clusters
US10366111B1 (en) * 2015-04-06 2019-07-30 EMC IP Holding Company LLC Scalable distributed computations utilizing multiple distinct computational frameworks
US10812341B1 (en) 2015-04-06 2020-10-20 EMC IP Holding Company LLC Scalable recursive computation across distributed data processing nodes
US10348810B1 (en) * 2015-04-06 2019-07-09 EMC IP Holding Company LLC Scalable distributed computations utilizing multiple distinct clouds
US10425350B1 (en) 2015-04-06 2019-09-24 EMC IP Holding Company LLC Distributed catalog service for data processing platform
US10505863B1 (en) 2015-04-06 2019-12-10 EMC IP Holding Company LLC Multi-framework distributed computation
US10515097B2 (en) * 2015-04-06 2019-12-24 EMC IP Holding Company LLC Analytics platform for scalable distributed computations
US10509684B2 (en) 2015-04-06 2019-12-17 EMC IP Holding Company LLC Blockchain integration for scalable distributed computations
US10511659B1 (en) * 2015-04-06 2019-12-17 EMC IP Holding Company LLC Global benchmarking and statistical analysis at scale
US10791063B1 (en) 2015-04-06 2020-09-29 EMC IP Holding Company LLC Scalable edge computing using devices with limited resources
US10528875B1 (en) 2015-04-06 2020-01-07 EMC IP Holding Company LLC Methods and apparatus implementing data model for disease monitoring, characterization and investigation
US10496926B2 (en) 2015-04-06 2019-12-03 EMC IP Holding Company LLC Analytics platform for scalable distributed computations
US10541936B1 (en) * 2015-04-06 2020-01-21 EMC IP Holding Company LLC Method and system for distributed analysis
US10270707B1 (en) 2015-04-06 2019-04-23 EMC IP Holding Company LLC Distributed catalog service for multi-cluster data processing platform
US10776404B2 (en) 2015-04-06 2020-09-15 EMC IP Holding Company LLC Scalable distributed computations utilizing multiple distinct computational frameworks
US10541938B1 (en) 2015-04-06 2020-01-21 EMC IP Holding Company LLC Integration of distributed data processing platform with one or more distinct supporting platforms
US10860622B1 (en) 2015-04-06 2020-12-08 EMC IP Holding Company LLC Scalable recursive computation for pattern identification across distributed data processing nodes
US11222072B1 (en) * 2015-07-17 2022-01-11 EMC IP Holding Company LLC Graph database management system and method for a distributed computing environment
US10929417B2 (en) 2015-09-11 2021-02-23 International Business Machines Corporation Transforming and loading data utilizing in-memory processing
US11403318B2 (en) * 2015-10-01 2022-08-02 Futurewei Technologies, Inc. Apparatus and method for managing storage of a primary database and a replica database
US10656861B1 (en) 2015-12-29 2020-05-19 EMC IP Holding Company LLC Scalable distributed in-memory computation
US10374968B1 (en) 2016-12-30 2019-08-06 EMC IP Holding Company LLC Data-driven automation mechanism for analytics workload distribution
CN106941524A (en) * 2017-03-14 2017-07-11 郑州云海信息技术有限公司 A kind of WEB file configuration methods of HDFS
US10776121B2 (en) * 2017-05-10 2020-09-15 Atlantic Technical Organization System and method of execution map generation for schedule optimization of machine learning flows
US10338963B2 (en) * 2017-05-10 2019-07-02 Atlantic Technical Organization, Llc System and method of schedule validation and optimization of machine learning flows for cloud computing
US10437643B2 (en) 2017-11-10 2019-10-08 Bank Of America Corporation Independent storage and processing of data with centralized event control
US20190361999A1 (en) * 2018-05-23 2019-11-28 Microsoft Technology Licensing, Llc Data analysis over the combination of relational and big data
US11030204B2 (en) 2018-05-23 2021-06-08 Microsoft Technology Licensing, Llc Scale out data storage and query filtering using data pools
JP7313123B2 (en) * 2018-05-25 2023-07-24 ヤフー株式会社 Computing system and computing method
US20190362016A1 (en) * 2018-05-25 2019-11-28 Salesforce.Com, Inc. Frequent pattern analysis for distributed systems
US11714992B1 (en) * 2018-12-13 2023-08-01 Amazon Technologies, Inc. Neural network processing based on subgraph recognition
US11782706B1 (en) 2021-06-29 2023-10-10 Amazon Technologies, Inc. Reconfigurable neural network processing based on subgraph recognition

Family Cites Families (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5226159A (en) 1989-05-15 1993-07-06 International Business Machines Corporation File lock management in a distributed data processing system
US6446070B1 (en) 1998-02-26 2002-09-03 Sun Microsystems, Inc. Method and apparatus for dynamic distributed computing over a network
US5966072A (en) * 1996-07-02 1999-10-12 Ab Initio Software Corporation Executing computations expressed as graphs
US5897638A (en) 1997-06-16 1999-04-27 Ab Initio Software Corporation Parallel virtual file system
US6389420B1 (en) 1999-09-30 2002-05-14 Emc Corporation File manager providing distributed locking and metadata management for shared data access by clients relinquishing locks after time period expiration
US7587467B2 (en) 1999-12-02 2009-09-08 Western Digital Technologies, Inc. Managed peer-to-peer applications, systems and methods for distributed data access and storage
US7146524B2 (en) 2001-08-03 2006-12-05 Isilon Systems, Inc. Systems and methods for providing a distributed file system incorporating a virtual hot spare
US7200747B2 (en) 2001-10-31 2007-04-03 Hewlett-Packard Development Company, L.P. System for ensuring data privacy and user differentiation in a distributed file system
US7340452B2 (en) * 2003-12-16 2008-03-04 Oracle International Corporation Parallel single cursor model on multiple-server configurations
US7315926B2 (en) 2004-09-21 2008-01-01 Emc Corporation Lock management for concurrent access to a single file from multiple data mover computers
US7716630B2 (en) * 2005-06-27 2010-05-11 Ab Initio Technology Llc Managing parameters for graph-based computations
US20080189251A1 (en) * 2006-08-25 2008-08-07 Jeremy Branscome Processing elements of a hardware accelerated reconfigurable processor for accelerating database operations and queries
US8069190B2 (en) * 2007-12-27 2011-11-29 Cloudscale, Inc. System and methodology for parallel stream processing
JP5557430B2 (en) 2008-04-11 2014-07-23 日東電工株式会社 PROTON CONDUCTIVE POLYMER ELECTROLYTE MEMBRANE, PROCESS FOR PRODUCING THE SAME, MEMBRANE-ELECTRODE ASSEMBLY USING THE SAME, AND POLYMER ELECTROLYTE FUEL CELL
JP5535230B2 (en) * 2008-10-23 2014-07-02 アビニシオ テクノロジー エルエルシー Fuzzy data manipulation
US8239847B2 (en) * 2009-03-18 2012-08-07 Microsoft Corporation General distributed reduction for data parallel computing
US8209664B2 (en) * 2009-03-18 2012-06-26 Microsoft Corporation High level programming extensions for distributed data parallel processing
WO2010140883A2 (en) * 2009-06-02 2010-12-09 Vector Fabrics B.V. Improvements in embedded system development
AU2010295547B2 (en) * 2009-09-16 2015-05-07 Ab Initio Technology Llc Mapping dataset elements
JP6084037B2 (en) * 2009-12-14 2017-02-22 アビニシオ テクノロジー エルエルシー Specifying user interface elements
US8539192B2 (en) * 2010-01-08 2013-09-17 International Business Machines Corporation Execution of dataflow jobs
US8918388B1 (en) * 2010-02-26 2014-12-23 Turn Inc. Custom data warehouse on top of mapreduce
US8555265B2 (en) 2010-05-04 2013-10-08 Google Inc. Parallel processing of data
US9495427B2 (en) * 2010-06-04 2016-11-15 Yale University Processing of data using a database system in communication with a data processing framework
US8260840B1 (en) * 2010-06-28 2012-09-04 Amazon Technologies, Inc. Dynamic scaling of a cluster of computing nodes used for distributed execution of a program
US9727438B2 (en) * 2010-08-25 2017-08-08 Ab Initio Technology Llc Evaluating dataflow graph characteristics
US9552206B2 (en) * 2010-11-18 2017-01-24 Texas Instruments Incorporated Integrated circuit with control node circuitry and processing circuitry
KR20120067133A (en) * 2010-12-15 2012-06-25 한국전자통신연구원 Service providing method and device using the same
US20120239612A1 (en) 2011-01-25 2012-09-20 Muthian George User defined functions for data loading
US9116955B2 (en) 2011-05-02 2015-08-25 Ab Initio Technology Llc Managing data queries
US8661449B2 (en) * 2011-06-17 2014-02-25 Microsoft Corporation Transactional computation on clusters
US8954568B2 (en) 2011-07-21 2015-02-10 Yahoo! Inc. Method and system for building an elastic cloud web server farm
US8356050B1 (en) * 2011-11-21 2013-01-15 Yahoo! Inc. Method or system for spilling in query environments
US20130198120A1 (en) * 2012-01-27 2013-08-01 MedAnalytics, Inc. System and method for professional continuing education derived business intelligence analytics
US9172608B2 (en) * 2012-02-07 2015-10-27 Cloudera, Inc. Centralized configuration and monitoring of a distributed computing cluster
US9268590B2 (en) 2012-02-29 2016-02-23 Vmware, Inc. Provisioning a cluster of distributed computing platform based on placement strategy
US9367601B2 (en) 2012-03-26 2016-06-14 Duke University Cost-based optimization of configuration parameters and cluster sizing for hadoop
US9158843B1 (en) * 2012-03-30 2015-10-13 Emc Corporation Addressing mechanism for data at world wide scale
US10169083B1 (en) * 2012-03-30 2019-01-01 EMC IP Holding Company LLC Scalable method for optimizing information pathway
US20130325814A1 (en) 2012-05-30 2013-12-05 Spectra Logic Corporation System and method for archive in a distributed file system
US9235446B2 (en) 2012-06-22 2016-01-12 Microsoft Technology Licensing, Llc Parallel computing execution plan optimization
US9182957B2 (en) * 2012-07-10 2015-11-10 Loring Craymer Method and system for automated improvement of parallelism in program compilation
CA2879668C (en) * 2012-07-24 2020-07-07 Ab Initio Technology Llc Mapping entities in data models
US9201638B2 (en) 2012-08-07 2015-12-01 Nec Laboratories America, Inc. Compiler-guided software accelerator for iterative HADOOP® jobs
CN103714073B (en) 2012-09-29 2017-04-12 国际商业机器公司 Method and device for querying data
US9411558B2 (en) * 2012-10-20 2016-08-09 Luke Hutchison Systems and methods for parallelization of program code, interactive data visualization, and graphically-augmented code editing
US10459920B2 (en) * 2012-10-31 2019-10-29 Hewlett-Packard Development Company, L.P. Support actual and virtual SQL dataflow by streaming infrastructure
US20140149715A1 (en) * 2012-11-28 2014-05-29 Los Alamos National Security, Llc Scalable and programmable computer systems
US9264346B2 (en) * 2012-12-31 2016-02-16 Advanced Micro Devices, Inc. Resilient duplicate link aggregation emulation
US9342557B2 (en) * 2013-03-13 2016-05-17 Cloudera, Inc. Low latency query engine for Apache Hadoop
US9292373B2 (en) * 2013-03-15 2016-03-22 International Business Machines Corporation Query rewrites for data-intensive applications in presence of run-time errors
US9256460B2 (en) * 2013-03-15 2016-02-09 International Business Machines Corporation Selective checkpointing of links in a data flow based on a set of predefined criteria
US20140304545A1 (en) * 2013-04-05 2014-10-09 Hewlett-Packard Development Company, L.P. Recovering a failure in a data processing system
US20140304549A1 (en) * 2013-04-05 2014-10-09 Hewlett-Packard Development Company, L.P. Recovering a failure in a data processing system
US9113299B2 (en) 2013-05-17 2015-08-18 Xerox Corporation Method and apparatus for automatic mobile endpoint device configuration management based on user status or activity
US10133800B2 (en) * 2013-09-11 2018-11-20 Microsoft Technology Licensing, Llc Processing datasets with a DBMS engine
US20150127880A1 (en) 2013-11-01 2015-05-07 Cognitive Electronics, Inc. Efficient implementations for mapreduce systems
US10776325B2 (en) * 2013-11-26 2020-09-15 Ab Initio Technology Llc Parallel access to data in a distributed file system
KR20150092586A (en) * 2014-02-05 2015-08-13 한국전자통신연구원 Method and Apparatus for Processing Exploding Data Stream
US9576039B2 (en) * 2014-02-19 2017-02-21 Snowflake Computing Inc. Resource provisioning systems and methods
US9607073B2 (en) 2014-04-17 2017-03-28 Ab Initio Technology Llc Processing data from multiple sources
US9679041B2 (en) * 2014-12-22 2017-06-13 Franz, Inc. Semantic indexing engine
US10191948B2 (en) * 2015-02-27 2019-01-29 Microsoft Technology Licensing, Llc Joins and aggregations on massive graphs using large-scale graph processing

Also Published As

Publication number Publication date
US10642850B2 (en) 2020-05-05
JP6815456B2 (en) 2021-01-20
WO2015161025A1 (en) 2015-10-22
US9607073B2 (en) 2017-03-28
US20170220646A1 (en) 2017-08-03
AU2015247639A1 (en) 2016-10-20
SG11201608186RA (en) 2016-10-28
AU2015247639B2 (en) 2020-03-19
EP3132348B1 (en) 2022-09-21
US11403308B2 (en) 2022-08-02
CA2946118A1 (en) 2015-10-22
US20220365928A1 (en) 2022-11-17
JP6983990B2 (en) 2021-12-17
JP2021057072A (en) 2021-04-08
JP2017518561A (en) 2017-07-06
AU2020203145A1 (en) 2020-06-04
AU2020203145B2 (en) 2022-02-10
US20200265047A1 (en) 2020-08-20
US11720583B2 (en) 2023-08-08
EP3132348A1 (en) 2017-02-22
US20150302075A1 (en) 2015-10-22
JP6581108B2 (en) 2019-09-25
JP2019200819A (en) 2019-11-21
