CN118012917B - Data stream processing method, scheduling controller and distributed data stream processing system - Google Patents
- Publication number
- CN118012917B (granted publication; application number CN202410417403.XA)
- Authority
- CN
- China
- Prior art keywords
- processing
- flow
- execution
- message
- scheduling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The invention discloses a data stream processing method, a scheduling controller, and a distributed data stream processing system in the field of data processing. The data stream processing method includes determining a target processing item based on a scheduling message, where the scheduling message contains a flow credential and flow positioning information, as follows: a corresponding flow configuration diagram is extracted based on the flow credential, where the flow configuration diagram is a directed acyclic graph indicating the processing flow of a corresponding data stream processing task, each node of the diagram is a processing item, and the relationships between nodes give the order of data stream processing; the current processing position in the flow configuration diagram is then determined based on the flow positioning information, and the processing item corresponding to the next node after the current processing position is taken as the target processing item. The invention remains easy to maintain and easy to extend while being applicable to complex business scenarios whose data processing flows are not uniform.
Description
Technical Field
The present invention relates to the field of data processing, and in particular, to a data stream processing method, a scheduling controller, and a distributed data stream processing system.
Background
In practical applications, different data processing business scenarios involve different types of data to be processed and different data processing flows.
In the prior art, data processing is often implemented as sequentially written code; that is, the data processing logic for each business scenario is coded procedurally. This approach produces duplicated code and is therefore difficult to maintain and difficult to extend.
To address these problems:
Patent document CN113065029A discloses a data stream processing scheme that unitizes and componentizes the content-source data stream processing flows of different business scenarios into a unified processing logic framework, which then processes the content-source data streams of the different scenarios. However, that unified processing logic framework is a fixed flow framework, so the scheme applies only to scenarios with different data sources but the same data processing flow.
Patent document CN111597058A discloses a scheme in which each processing node automatically constructs a topology from its own topology parameters and processes data streams according to that topology. The topology is likewise a fixed flow framework, so this scheme is also limited to single-flow data stream processing scenarios.
Disclosure of Invention
To remedy the defect that the prior art suits only a single data processing flow, the invention provides a data stream processing method and system, and further provides a scheduling controller.
To solve the above technical problems, the invention adopts the following technical solutions:
In a first aspect, a data stream processing method is provided, including the steps of:
acquiring a scheduling message;
determining target processing items based on the scheduling message, and generating execution messages in one-to-one correspondence with the target processing items;
sending each execution message to the corresponding processor based on its processing item, the processors being in one-to-one correspondence with the processing items;
wherein the scheduling message contains a flow credential and flow positioning information, and the target processing item is determined from the scheduling message as follows:
extracting a corresponding flow configuration diagram based on the flow credential, the flow configuration diagram being a directed acyclic graph indicating the processing flow of a corresponding data stream processing task, each node of the diagram being a processing item and the relationships between nodes giving the order of data stream processing;
and determining the current processing position in the flow configuration diagram based on the flow positioning information, and taking the processing item corresponding to the next node after the current processing position as the target processing item.
In a second aspect, there is provided a scheduling controller comprising:
an acquisition module, configured to acquire a scheduling message, the scheduling message containing a flow credential and flow positioning information;
an analysis processing module, configured to determine target processing items based on the scheduling message and to generate execution messages in one-to-one correspondence with the target processing items;
a sending module, configured to send each execution message to the corresponding processor based on its processing item, the processors being in one-to-one correspondence with the processing items;
wherein the analysis processing module comprises:
an extraction unit, configured to extract a corresponding flow configuration diagram based on the flow credential, the flow configuration diagram being a directed acyclic graph indicating the processing flow of a corresponding data stream processing task, each node of the diagram being a processing item and the relationships between nodes giving the order of data stream processing;
a target acquisition unit, configured to determine the current processing position in the flow configuration diagram based on the flow positioning information and to take the processing item corresponding to the next node after the current processing position as the target processing item.
In a third aspect, a distributed data stream processing system is provided, comprising:
the scheduling queue is used for storing scheduling messages, each scheduling message containing a flow credential and flow positioning information;
the execution queues are in one-to-one correspondence with the processing items and are used for storing the execution messages corresponding to the processing items;
The processors are in one-to-one correspondence with the processing items and are used for processing corresponding execution messages;
The scheduling controller is used for acquiring scheduling messages from the scheduling queues, determining target processing items based on the scheduling messages, generating execution messages corresponding to the target processing items one by one, and sending the execution messages to the corresponding execution queues based on the processing items;
and the execution controller is used for acquiring the execution messages from each execution queue, carrying out concurrency judgment on the execution messages, and distributing the execution messages to corresponding processors based on the concurrency judgment result.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
Through the design of the flow configuration diagram and the flow positioning information, the next data flow direction can be determined without a centralized coordinator, and processors can be reused across data stream processing tasks of different data flows; the method therefore remains easy to maintain and easy to extend while being applicable to complex business scenarios whose data processing flows are not uniform.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart showing the determination of a target processing item based on the scheduling message in embodiment 1;
FIG. 2 is a schematic diagram of a modular connection of a dispatch controller according to the present invention;
FIG. 3 is a schematic diagram of the module connections of the analysis processing module of FIG. 2;
FIG. 4 is a schematic diagram of the modular connections of a distributed data stream processing system according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples, which are illustrative of the present invention and are not intended to limit the present invention thereto.
Embodiment 1, a data stream processing method, includes a scheduling control method.
The scheduling control method comprises the following steps:
s100, acquiring a scheduling message.
The scheduling message in this embodiment includes a flow credential and flow location information.
Flow credential:
The flow credential is the unique identification code of a flow configuration diagram; the flow configuration diagram indicates the processing flow of the corresponding data stream processing task.
In this embodiment, the flow configuration diagram is a directed acyclic graph: each node of the diagram is a processing item, and the relationships between nodes give the order of data stream processing; that is, each node corresponds to a processor, and the edges between nodes indicate the execution order of those processors.
A processing item is a specific data processing step that can be completed independently. A person skilled in the art can define processing items according to actual needs: for example, a data call or a data notification can each serve as a processing item, and the unified processing logic framework or the topology described in the documents cited in the background can likewise serve as a processing item; this specification imposes no specific limitation.
In practical application, a worker configures a flow configuration diagram according to actual needs and uses its unique identification code as the flow credential; when a data stream processing task is initiated, a scheduling message containing that flow credential is generated.
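The configuration and registration described above can be sketched as follows. This is a minimal illustration only: the patent prescribes behavior, not a data model, so every class, function, and field name here is an assumption.

```python
# Minimal sketch: a flow configuration diagram stored as a directed acyclic
# graph (adjacency list) and registered under its unique flow credential.
from dataclasses import dataclass, field

@dataclass
class FlowConfig:
    credential: str                      # unique identification code of the diagram
    edges: dict = field(default_factory=dict)  # processing item -> direct successors

# registry mapping flow credentials to configured diagrams
FLOW_REGISTRY: dict = {}

def register_flow(config: FlowConfig) -> None:
    FLOW_REGISTRY[config.credential] = config

# the branching flow used as the running example in this description:
# A -> {B, C} -> D
register_flow(FlowConfig(
    credential="flow-001",
    edges={"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []},
))
```

A task-initiating scheduling message would then carry `"flow-001"` as its flow credential, letting the scheduler recover the diagram without any hard-coded flow logic.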
Flow positioning information:
The flow positioning information indicates the current processing position in the flow configuration diagram.
The flow positioning information includes a processing type and a processing progress. The processing type is either an entry type or a callback type: the entry type indicates that the current processing position is the flow entry, and the callback type indicates that the current processing position is a specific processing progress.
Scheduling messages are divided into task-initiating messages and processing feedback messages:
a task-initiating message is generated when a data stream processing task is initiated; its processing type is the entry type and its processing progress is null;
a processing feedback message is generated when a processor completes the corresponding processing item based on an execution message; its processing type is the callback type and its processing progress indicates the processing item just completed.
Those skilled in the art can define the flow positioning information according to the actual situation; for example, the processing progress alone can serve as the flow positioning information. This specification imposes no specific limitation.
In this embodiment, the scheduling message is pulled from a message queue.
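The two message types above can be sketched as one small structure; the field and function names are illustrative assumptions, not terms fixed by the patent.

```python
# Sketch of a scheduling message carrying a flow credential plus flow
# positioning information (processing type + processing progress).
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchedulingMessage:
    flow_credential: str            # identifies the flow configuration diagram
    processing_type: str            # "entry" or "callback"
    progress: Optional[str] = None  # completed processing item; None at entry

def initiate_task(flow_credential: str) -> SchedulingMessage:
    """Task-initiating message: entry type, null processing progress."""
    return SchedulingMessage(flow_credential, "entry")

def processing_feedback(flow_credential: str, completed_item: str) -> SchedulingMessage:
    """Processing feedback message: callback type, progress = item just completed."""
    return SchedulingMessage(flow_credential, "callback", completed_item)
```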
S200, determining target processing items based on the scheduling message, and generating execution messages corresponding to the target processing items one by one.
When no target processing item is obtained, the corresponding data stream processing task is complete and no further processing is needed.
S300, based on the processing items, sending the execution message to a corresponding processor, wherein the processor corresponds to the processing items one by one.
In this embodiment, message queues are provided in one-to-one correspondence with the processing items, and each execution message is sent to the corresponding message queue according to its processing item.
Referring to fig. 1, the specific steps of step S200 include:
s210, extracting a corresponding flow configuration diagram based on the flow certificate.
S220, determining the current processing position in the flow configuration diagram based on the flow positioning information, and taking a processing item corresponding to a next node of the current processing position as a target processing item.
The next node is a node that is a direct successor of the current processing position.
As shown in fig. 1, when the current processing position is processing item A, the position of the corresponding node A is obtained from the flow configuration diagram, together with node B and node C, the direct successors of node A; processing item B corresponding to node B and processing item C corresponding to node C are then taken as the target processing items.
S230, generating corresponding execution messages for each target processing item in turn.
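Steps S210–S220 amount to a successor lookup in the directed acyclic graph. A minimal sketch, using the A → {B, C} → D flow of fig. 1 and assumed names throughout:

```python
# Sketch of steps S210–S220: given a scheduling message's positioning
# information, resolve the target processing items from the flow
# configuration diagram (here a plain adjacency list).
EDGES = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
ENTRY = ["A"]   # processing items at the flow entry

def determine_targets(processing_type: str, progress):
    if processing_type == "entry":   # task-initiating message: start at entry
        return ENTRY
    # callback message: direct successors of the item just completed
    return EDGES[progress]

# after processing item A completes, B and C become the target items
assert determine_targets("callback", "A") == ["B", "C"]
# an empty result means the data stream processing task is finished
assert determine_targets("callback", "D") == []
```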
For a single data processing flow scenario, the data stream need only flow along a pre-built processing architecture; in a scenario with multiple data processing flows, however, each processor typically has to be scheduled by a centralized node, and such a centralized design runs counter to the goal of easy extension.
In this embodiment, through the design of the flow configuration diagram and the flow positioning information, the next data flow direction is determined without a centralized coordinator, and processors can be reused across data stream processing tasks of different data flows; the method therefore remains easy to maintain and easy to extend while being applicable to complex business scenarios whose data processing flows are not uniform.
Each processing item has corresponding service parameters and/or result parameters: the service parameters are the input parameters to be processed by the processing item, and the result parameters are the results produced by the processing item.
Further:
a scheduling message that is a processing feedback message also contains at least one result parameter;
each node of the flow configuration diagram contains the service parameters of its processing item and the parameter configuration information for those service parameters, the parameter configuration information including the name of each service parameter and a name mapping table;
when the processing type of the scheduling message is the callback type, the execution message for a target processing item in step S230 is generated as follows:
obtaining result parameters in the scheduling message;
extracting parameter configuration information corresponding to the target processing item from the flow configuration diagram;
when a result parameter has the same name as a corresponding service parameter, assigning the result parameter to that service parameter;
when no service parameter shares a name with a result parameter, mapping the result parameter to the corresponding service parameter through the name mapping table;
And generating a corresponding execution message based on the obtained service parameters.
In this embodiment, the parameter configuration information is in JSON format; a person skilled in the art can choose the format according to actual needs, for example XML.
In practical application, if the parameter configuration information of the next processing item is null, the result parameters of the current processing item are used directly as its service parameters.
The parameter configuration information is the set of parameter mappings defined for a specific processor in the flow configuration diagram. By converting the result parameters of a preceding processing item into the service parameters of the next processing item, this design eliminates coupling between processors.
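The mapping rule above — assign directly on a name match, otherwise route through the name mapping table — can be sketched as follows; the function and field names are illustrative assumptions.

```python
# Sketch of result-parameter -> service-parameter conversion using a node's
# parameter configuration information (expected names + name mapping table).
def build_service_params(result_params: dict, expected: list, name_map: dict) -> dict:
    service = {}
    for name, value in result_params.items():
        if name in expected:            # same name: assign directly
            service[name] = value
        elif name in name_map:          # otherwise consult the name mapping table
            service[name_map[name]] = value
    return service

# a preceding item emitted "article_url"; the next item expects "url"
params = build_service_params(
    {"article_url": "http://example.com/1"},
    expected=["url"],
    name_map={"article_url": "url"},
)
# params == {"url": "http://example.com/1"}
```

Because each node carries its own mapping table, a processor never needs to know which processor ran before it — which is precisely the decoupling the design aims at.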
Further:
the nodes of the flow configuration diagram also contain state attributes corresponding to the processing items;
The state attribute indicates whether the corresponding processing item is to be started or skipped:
when the state attribute of a target processing item is start, an execution message is generated for it;
when the state attribute of a target processing item is skip, the item is regarded as completed, and the processing item corresponding to its next node in the flow configuration diagram becomes the new target processing item.
In actual use, staff can adjust an existing flow configuration diagram to obtain the required one; that is, by setting the state attributes of an existing diagram according to actual needs, a flow configuration diagram meeting the requirements of the corresponding data stream processing task can be generated quickly.
For example, referring to the flow configuration diagram in fig. 1, the data processing flow is { A }, { B, C }, { D } in order, so the data flow splits into branches (A, B, D) and (A, C, D), where processor D is multiplexed;
Generating a corresponding scheduling message after the processing item A is completed;
determining that the target processing items are the processing item B and the processing item C based on the scheduling message;
when the state attribute of processing item B is skip and the state attribute of processing item C is start, the target processing items are updated from processing item B and processing item C to processing item D and processing item C;
at this point, for data flow branch (A, B, D), the state attribute of processing item D is checked in turn; when it is start, the result parameters of processing item A are mapped to the service parameters of processing item D and an execution message for processing item D is generated;
for data flow branch (A, C, D), the result parameters of processing item A are mapped to the service parameters of processing item C and an execution message for processing item C is generated; the service parameters of processing item D are subsequently obtained by mapping the result parameters of processing item C.
Further:
when the state attribute of any target processing item is skip, all current target processing items are regarded as completed, and the processing item corresponding to the next node of each target processing item becomes a new target processing item based on the flow configuration diagram.
Taking the data processing flow { A }, { B, C }, { D } as an example again: when the state attribute of processing item B is found to be skip, the target processing items are updated directly from processing item B and processing item C to processing item D;
the previous node of both processing item B and processing item C is processing item A, so their nodes are regarded as peer nodes. In practical application, peer nodes within the same data stream processing task carry the same state attribute, so to avoid repeated checks, once one processing item's state attribute is skip, all processing items at the same level are judged to be skipped.
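The level-wide skip rule above can be sketched as a loop that collapses a skipped level into its successors; graph literal and names are assumptions mirroring the { A }, { B, C }, { D } example.

```python
# Sketch of skip resolution: if any target item at the current level has
# state attribute "skip", the whole peer level is treated as completed and
# replaced by its direct successors.
EDGES = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def resolve_targets(targets: list, state: dict) -> list:
    while any(state.get(t) == "skip" for t in targets):
        nxt = []
        for t in targets:              # collect successors of the skipped level
            for s in EDGES[t]:
                if s not in nxt:       # D is multiplexed, so de-duplicate
                    nxt.append(s)
        targets = nxt
    return targets

# with B marked skip, the {B, C} level collapses straight to {D}
assert resolve_targets(["B", "C"], {"B": "skip"}) == ["D"]
```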
Further:
when generating the corresponding execution message from a scheduling message fails, the scheduling message becomes a scheduling retry message;
scheduling retry messages are fetched and processed at a preset time interval, by exactly the same steps as ordinary scheduling messages.
This embodiment provides a scheduling fast queue and a scheduling slow queue:
the scheduling fast queue holds scheduling messages;
the scheduling slow queue holds scheduling retry messages.
In practical application, scheduling messages are consumed from the scheduling fast queue in real time, consuming events quickly to guarantee high throughput, while scheduling retry messages are consumed from the scheduling slow queue at the preset time interval, absorbing failed events and ensuring that no data is lost.
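The fast/slow split can be sketched as two queues and a timer-driven retry pass. The failure condition and all names are illustrative assumptions; only the queue discipline comes from the description above.

```python
# Sketch of the scheduling fast/slow queue split: the fast queue is drained
# in real time; the slow queue is polled at a preset interval so failed
# events are retried instead of being lost.
from collections import deque

fast_queue = deque()   # scheduling messages, consumed in real time
slow_queue = deque()   # scheduling retry messages, consumed on a timer

def build_execution_message(msg: dict) -> dict:
    # stand-in for the real generation step; the failure condition here
    # ("unresolvable") is purely illustrative
    if msg.get("unresolvable"):
        raise ValueError("cannot generate execution message")
    return {"item": msg["item"]}

def dispatch(msg: dict) -> bool:
    try:
        build_execution_message(msg)
        return True
    except ValueError:
        slow_queue.append(msg)    # park as a scheduling retry message
        return False

def retry_tick() -> None:
    """Invoked at the preset time interval: replay parked retry messages
    through the identical processing steps."""
    for _ in range(len(slow_queue)):
        dispatch(slow_queue.popleft())
```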
Further:
when a processor processes its execution message successfully, the corresponding processing feedback message is generated and placed, as a scheduling message, into the scheduling fast queue so that the next processing item is handled quickly;
when a processor fails to process its execution message, it is judged whether the failure belongs to a preset negligible failure type;
if it does, the current service parameters are used directly as the result parameters to generate the corresponding processing feedback message, which is placed into the scheduling fast queue as a scheduling message;
if it does not, the current execution message becomes an execution retry message; execution retry messages are fetched and processed at a preset time interval, by exactly the same steps as ordinary execution messages.
This embodiment provides several execution queues in one-to-one correspondence with the processing items;
each group of execution queues comprises an execution fast queue and an execution slow queue:
the execution fast queue holds execution messages;
the execution slow queue holds execution retry messages.
In practical application, execution messages are consumed from the execution fast queue in real time to guarantee high throughput, while execution retry messages are consumed from the execution slow queue at the preset time interval, absorbing failed events and ensuring that no data is lost.
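The success/negligible-failure/retry branching described above can be sketched as a small routing function. The set of negligible failure types and all names are assumptions for illustration.

```python
# Sketch of processor-result routing: success and negligible failures both
# feed a processing feedback message back into the scheduling fast queue;
# other failures become execution retry messages on the slow path.
NEGLIGIBLE_FAILURES = {"summary_timeout"}   # assumed ignorable failure types

def route_result(service_params: dict, ok: bool, failure_type=None):
    if ok:
        # success: feedback message goes to the scheduling fast queue
        return ("schedule_fast", service_params)
    if failure_type in NEGLIGIBLE_FAILURES:
        # negligible failure: pass the service parameters through as the
        # result parameters and let the flow continue
        return ("schedule_fast", service_params)
    # otherwise park the execution message as an execution retry message
    return ("execute_slow", service_params)
```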
Further:
each node of the flow configuration diagram contains a concurrency control configuration for its processing item; the concurrency control configuration comprises a signal key and a dynamic parameter key, where the signal key indicates the underlying service on which the processing item's concurrency is controlled, and the dynamic parameter key designates the service parameter used for concurrency control;
the execution message contains the concurrency control parameter of the target processing item.
The concurrency control parameter is obtained as follows:
extracting the concurrency control configuration of the target processing item from the flow configuration diagram;
when a dynamic parameter key is designated in the concurrency control configuration, obtaining the corresponding service parameter via the dynamic parameter key and using it as the concurrency control parameter;
when no dynamic parameter key is designated in the concurrency control configuration, using the signal key itself as the concurrency control parameter.
For example:
processing item A is news content acquisition; its signal key is NEWS_CONTENT and its dynamic parameter key is url, where url is a uniform resource locator among the service parameters;
processing item B generates a summary based on a large model; its signal key is BIG_MODEL and no dynamic parameter key is set;
processing item C extracts company entities based on the same large model as processing item B; its signal key is BIG_MODEL and no dynamic parameter key is set.
The concurrency control parameter of processing item A is therefore the corresponding url, giving concurrency control per data source.
The concurrency control parameter of both processing item B and processing item C is BIG_MODEL, giving concurrency control over the shared underlying service.
Through the design of concurrency control parameters, this embodiment enables concurrent processing of data stream processing tasks; through the dynamic parameter key, operations on the same source data wait for one another, avoiding abnormal overwrites caused by repeated processing and preventing dirty reads and phantom reads; through the signal key, scarce resources are controlled globally, reducing the risk of systemic breakdown and improving data processing quality and success rate.
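The resolution rule — prefer the service parameter named by the dynamic parameter key, otherwise fall back to the signal key — can be sketched directly; configuration field names are assumptions.

```python
# Sketch of concurrency-control-parameter resolution from a node's
# concurrency control configuration and the current service parameters.
def concurrency_param(config: dict, service_params: dict) -> str:
    dyn_key = config.get("dynamic_param_key")
    if dyn_key:                          # per-resource control (e.g. same url)
        return service_params[dyn_key]
    return config["signal_key"]          # per-service control (e.g. BIG_MODEL)

# item A throttles per source url; items B and C share the BIG_MODEL signal
assert concurrency_param(
    {"signal_key": "NEWS_CONTENT", "dynamic_param_key": "url"},
    {"url": "http://example.com/news/1"},
) == "http://example.com/news/1"
assert concurrency_param({"signal_key": "BIG_MODEL"}, {}) == "BIG_MODEL"
```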
Embodiment 2, a data stream processing method, includes a concurrency control method performed before step S300 of embodiment 1.
The concurrency control method comprises the following steps:
s400, carrying out concurrency judgment based on concurrency control parameters in the execution message or the execution retry message;
S500, when the concurrency judgment result is execute, sending the execution message to the corresponding processor based on its processing item, and locking based on the concurrency control parameter.
The specific steps are as follows:
s510, inquiring a local lock object based on the concurrency control parameter;
The local lock object is stored in the local runtime environment, so it can be queried quickly; when a local lock object is found, the execution message is made to wait for execution.
S520, when the local lock object is not queried, querying the distributed lock object based on the concurrency control parameter;
The distributed lock object is stored in a distributed cache service, which may, for example, be Redis.
S530, when no distributed lock object is found either, the concurrency judgment result is execute: the execution message is sent to the corresponding processor based on its processing item, and a corresponding local lock object and distributed lock object are generated from the concurrency control parameter.
This embodiment uses the local lock and the distributed lock to control the concurrency of data processing and the contention for scarce resources, and on that basis supports distributed deployment and horizontal scaling: on one hand, using the queue-and-message mechanism, processing items together with their execution queues and processors can be added as needed to scale out; on the other hand, the distributed lock coordinates concurrency among the processors of the distributed architecture.
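Steps S510–S530 can be sketched as a two-level check. Here the distributed cache service (which the description says may be Redis) is stubbed by an in-process dict purely so the sketch is self-contained; all names are assumptions.

```python
# Sketch of steps S510–S530: check the fast local lock first, then the
# distributed lock; only when neither exists does execution proceed, taking
# both locks on the concurrency control parameter.
local_locks = set()      # process-local runtime locks (fast path, S510)
distributed_locks = {}   # in-memory stand-in for a shared cache service (S520)

def try_acquire(param: str) -> bool:
    if param in local_locks:           # S510: local lock hit -> wait
        return False
    if param in distributed_locks:     # S520: distributed lock hit -> wait
        return False
    # S530: no lock anywhere -> execute and take both locks
    local_locks.add(param)
    distributed_locks[param] = True
    return True

def release(param: str) -> None:
    local_locks.discard(param)
    distributed_locks.pop(param, None)

assert try_acquire("BIG_MODEL") is True    # first caller executes
assert try_acquire("BIG_MODEL") is False   # a second caller must wait
```

With a real cache service, the distributed acquire would be an atomic set-if-absent (for instance Redis's `SET key value NX` with an expiry), so that processors on different hosts contend on the same key.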
Embodiment 3, a scheduling controller, as shown in fig. 2, includes:
an obtaining module 100, configured to obtain a scheduling message, where the scheduling message includes a flow credential and flow positioning information;
an analysis processing module 200, configured to determine target processing items based on the scheduling message and to generate execution messages in one-to-one correspondence with the target processing items;
a sending module 300, configured to send the execution message to a corresponding processor based on a processing item, where the processor corresponds to the processing item one by one;
referring to fig. 3, the analysis processing module 200 includes:
an extracting unit 210, configured to extract a corresponding flow configuration diagram based on the flow credential, where the flow configuration diagram is a directed acyclic graph, and is configured to indicate a processing flow corresponding to a data stream processing task, each node of the flow configuration diagram is a processing item, and a relationship between nodes is a sequence of data stream processing;
a target obtaining unit 220, configured to determine a current processing position in the flow configuration diagram based on the flow positioning information, and take a processing item corresponding to a next node of the current processing position as a target processing item;
a generating unit 230, configured to sequentially generate a corresponding execution message for each target processing item.
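The work of units 210-230 can be sketched as a lookup over a directed acyclic flow configuration diagram: locate the current position from the flow positioning information, take the next node's processing item(s) as targets, and emit one execution message per target. The graph representation, the `"__start__"` sentinel for an entry-type message, and the message field names are illustrative assumptions.

```python
def next_processing_items(flow_graph, current_item):
    """Given a flow configuration diagram (a DAG mapping each processing
    item to its successor items) and the current processing position,
    return the processing items of the next node(s) as targets.
    current_item=None models an entry-type message (progress is null)."""
    if current_item is None:
        return flow_graph.get("__start__", [])
    return flow_graph.get(current_item, [])

def build_execution_messages(flow_credential, targets):
    # One execution message per target processing item, in order.
    return [{"flow_credential": flow_credential, "processing_item": t}
            for t in targets]
```

For example, with a linear extract-transform-load flow, an entry message yields the first item and a callback at "extract" yields "transform".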
Further:
The nodes of the flow configuration diagram include a concurrency control configuration corresponding to the processing item; the concurrency control configuration includes a signal key and a dynamic parameter key, where the signal key indicates the underlying service subject to concurrency control for the corresponding processing item, and the dynamic parameter key specifies the service parameter used for concurrency control;
the execution message contains concurrency control parameters corresponding to the target processing items;
Referring to fig. 3, the analysis processing module 200 further includes a concurrency control configuration unit 240, where the concurrency control configuration unit 240 is:
configured to extract the concurrency control configuration corresponding to the target processing item from the flow configuration diagram;
further configured to, when a dynamic parameter key is specified in the concurrency control configuration, obtain the corresponding service parameter based on the dynamic parameter key as the concurrency control parameter;
and further configured to, when no dynamic parameter key is specified in the concurrency control configuration, extract the signal key as the concurrency control parameter.
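The fallback logic of unit 240 reduces to a small resolution function: prefer the service parameter named by the dynamic parameter key, otherwise fall back to the signal key itself. The dictionary field names below are assumptions for illustration.

```python
def resolve_concurrency_parameter(concurrency_config, service_params):
    """Sketch of concurrency control configuration unit 240.
    concurrency_config: {"signal_key": ..., "dynamic_parameter_key": ...?}
    service_params: service parameters carried for the target item."""
    dynamic_key = concurrency_config.get("dynamic_parameter_key")
    if dynamic_key:
        # Dynamic key specified: lock granularity follows a service value
        # (e.g. per-order), so unrelated flows do not block each other.
        return service_params[dynamic_key]
    # No dynamic key: lock on the signal key, i.e. the underlying service.
    return concurrency_config["signal_key"]
```

This makes the lock granularity configurable per node: a per-business-value lock when a dynamic key is given, a coarse per-service lock otherwise.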
Embodiment 4, a distributed data stream processing system, referring to fig. 4, includes:
a scheduling queue 10, configured to store scheduling messages, where the scheduling messages include a flow credential and flow positioning information;
the execution queues 20 are in one-to-one correspondence with the processing items and are used for storing the execution messages corresponding to the processing items;
The processors 30 are in one-to-one correspondence with the processing items, and the processors 30 are used for processing corresponding execution messages;
a scheduling controller 40, configured to obtain a scheduling message from the scheduling queue 10, determine target processing items based on the scheduling message, generate execution messages in one-to-one correspondence with the target processing items, and send each execution message to the corresponding execution queue 20 based on the processing item; the scheduling controller 40 is the scheduling controller 40 described in embodiment 3;
the execution controller 50 is configured to obtain an execution message from each execution queue 20, perform concurrency determination on the execution message, and distribute the execution message to the corresponding processor 30 based on the concurrency determination result.
Further:
The scheduling queue 10 comprises a scheduling fast queue 11 and a scheduling slow queue 12;
the scheduling fast queue 11 is used for placing scheduling messages;
the scheduling slow queue 12 is used for placing scheduling retry messages.
The scheduling controller 40:
is configured to obtain and process scheduling messages from the scheduling fast queue 11 in real time;
is further configured to, when processing a scheduling message or a scheduling retry message fails, place the message into the scheduling slow queue 12 as a scheduling retry message;
and is further configured to obtain and process scheduling retry messages from the scheduling slow queue 12 at a preset interval.
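The fast/slow queue discipline above can be sketched as follows: drain the fast queue in real time, and demote any message whose handling fails into the slow queue as a retry message (which a separate loop would poll at a preset interval, not shown). The handler signature and retry-message shape are assumptions.

```python
import queue

def drain_fast_queue(fast_q, slow_q, handler):
    """Sketch of the scheduling controller's queue discipline.
    fast_q, slow_q: queue.Queue instances; handler: callable(msg)."""
    while True:
        try:
            msg = fast_q.get_nowait()      # real-time path: no blocking
        except queue.Empty:
            break
        try:
            handler(msg)
        except Exception:
            # Failed message becomes a scheduling retry message in the
            # slow queue, so it cannot stall the real-time fast path.
            slow_q.put({"retry_of": msg})
```

Separating fast and slow queues keeps transient failures from delaying healthy traffic, at the cost of retried messages waiting for the slow queue's polling interval.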
Further:
The processor 30 is configured to:
when the corresponding execution message is processed successfully, generating a corresponding processing feedback message and placing it into the scheduling fast queue 11 as a scheduling message, so that the next processing item can proceed quickly;
when processing the corresponding execution message fails, judging whether the failure belongs to a preset negligible failure type;
if it belongs to a preset negligible failure type, directly generating a corresponding processing feedback message with the current service parameters as result parameters, and placing it into the scheduling fast queue 11 as a scheduling message;
if it does not belong to a preset negligible failure type, placing the current execution message into the execution queue 20 as an execution retry message.
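The processor's three feedback branches can be sketched as a single dispatch function. The message shapes, the use of exception class names as "failure types", and the list-based queues are illustrative assumptions, not the patented message format.

```python
def handle_processor_result(msg, error, negligible_types, fast_q, retry_q):
    """Sketch of processor 30's feedback rules.
    msg: execution message; error: None on success, else the exception;
    negligible_types: set of failure-type names that may be ignored."""
    if error is None:
        # Success: feedback message drives the next processing item.
        fast_q.append({"type": "feedback",
                       "progress": msg["processing_item"]})
    elif type(error).__name__ in negligible_types:
        # Negligible failure: feed current service parameters through
        # as result parameters so the flow can still advance.
        fast_q.append({"type": "feedback",
                       "progress": msg["processing_item"],
                       "result_parameters": msg.get("service_parameters", {})})
    else:
        # Non-negligible failure: requeue as an execution retry message.
        retry_q.append({"retry_of": msg})
```

The negligible-failure branch is what lets a flow tolerate known benign errors without entering the retry path.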
Further:
The execution queue 20 includes an execution fast queue 21 and an execution slow queue 22;
The execution fast queue 21 is used for placing execution messages;
The execution slow queue 22 is used to place execution retry messages.
The execution controller 50:
is configured to obtain and process execution messages from the execution fast queue 21 in real time;
is further configured to, when the concurrency judgment result is waiting, return the corresponding execution message or execution retry message to its message queue;
and is further configured to obtain and process execution retry messages from the execution slow queue 22 at a preset interval.
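The execution controller's dispatch decision can be sketched as: run the concurrency judgment, then either hand the message to its processor or return it to its queue to wait. Here `gate` is any callable returning "execute" or "wait" (for instance, a lock-based gate as sketched in embodiment 2); all names are assumptions.

```python
def dispatch_execution_message(msg, gate, processor, requeue):
    """Sketch of execution controller 50's decision for one message.
    gate: callable(concurrency_parameter) -> "execute" | "wait"
    processor: callable(msg); requeue: callable(msg) returning it to
    the corresponding message queue."""
    if gate(msg["concurrency_parameter"]) == "execute":
        processor(msg)          # concurrency slot acquired: run it
        return "dispatched"
    requeue(msg)                # result is waiting: back to its queue
    return "requeued"
```

Because waiting messages are requeued rather than blocked in place, the controller thread stays free to keep draining the fast queue.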
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
In addition, the specific embodiments described in the present specification may differ in terms of parts, shapes of components, names, and the like. All equivalent or simple changes of the structure, characteristics and principle according to the inventive concept are included in the protection scope of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions in a similar manner without departing from the scope of the invention as defined in the accompanying claims.
Claims (10)
1. A method of data stream processing comprising the steps of:
acquiring a scheduling message;
Determining target processing items based on the scheduling messages, and generating execution messages corresponding to the target processing items one by one;
based on the processing items, sending the execution messages to corresponding processors, wherein the processors are in one-to-one correspondence with the processing items;
the scheduling message comprises a flow credential and flow positioning information, and the specific steps of determining a target processing item based on the scheduling message are as follows:
extracting a corresponding flow configuration diagram based on the flow credential, wherein the flow configuration diagram is a directed acyclic graph used for indicating the processing flow of the corresponding data stream processing task, each node of the flow configuration diagram is a processing item, and the relationship among the nodes is the sequence of data stream processing;
and determining the current processing position in the flow configuration diagram based on the flow positioning information, and taking the processing item corresponding to the next node of the current processing position as the target processing item.
2. A data stream processing method according to claim 1, characterized in that:
The flow positioning information comprises a processing type and a processing progress, wherein the processing type comprises an entry type and a callback type;
The type of the scheduling message comprises a task initiating message and a processing feedback message;
the task initiating message is a message generated when a data stream processing task is initiated, the processing type of the task initiating message is an entry type, and the processing progress is null;
the processing feedback message is generated when the processor completes the corresponding processing item based on the execution message; the processing type of the processing feedback message is the callback type, and its processing progress indicates the currently completed processing item.
3. A data stream processing method according to claim 2, characterized in that:
The scheduling message belonging to the processing feedback message also comprises at least one result parameter;
the node of the flow configuration diagram comprises service parameters corresponding to the processing items and parameter configuration information corresponding to the service parameters, wherein the parameter configuration information comprises names corresponding to the service parameters and a name mapping table;
When the processing type of the scheduling message is a callback type, the specific steps of generating the corresponding execution message for the target processing item are as follows:
obtaining result parameters in the scheduling message;
extracting parameter configuration information corresponding to the target processing item from the flow configuration diagram;
when a result parameter has the same name as the corresponding service parameter, assigning the result parameter to the service parameter;
when there is no service parameter with the same name as a result parameter, mapping the result parameter to the corresponding service parameter based on the name mapping table;
And generating a corresponding execution message based on the obtained service parameters.
4. A data stream processing method according to any one of claims 1 to 3, characterized in that:
The node of the flow configuration diagram comprises concurrent control configuration corresponding to the processing item, wherein the concurrent control configuration comprises a signal key and a dynamic parameter key, the signal key is used for indicating basic service of concurrent control corresponding to the processing item, and the dynamic parameter key is used for specifying business parameters for concurrent control;
The execution message contains concurrency control parameters corresponding to the target processing items;
The specific steps for obtaining the concurrency control parameters are as follows:
Extracting concurrency control configuration corresponding to the target processing item from the flow configuration diagram;
when a dynamic parameter key is specified in the concurrency control configuration, acquiring the corresponding service parameter based on the dynamic parameter key as the concurrency control parameter;
And when the dynamic parameter key is not specified in the concurrency control configuration, extracting the signal key as the concurrency control parameter.
5. The method according to claim 4, wherein before the execution message is sent to the corresponding processor based on the processing item, the method further comprises a concurrency judgment step, comprising:
Carrying out concurrency judgment based on concurrency control parameters in the execution message;
And when the concurrency judgment result is execution, sending the execution message to a corresponding processor based on the processing item, and locking based on the concurrency control parameter.
6. The method for processing data streams according to claim 5, wherein the specific step of performing concurrency determination based on the concurrency control parameter in the execution message is as follows:
performing local lock object inquiry based on the concurrency control parameters;
When the local lock object is not queried, performing distributed lock object query based on the concurrency control parameter;
and when the distributed lock object is not queried, judging that the concurrency judgment result is execution, sending the execution message to a corresponding processor based on a processing item, and generating a corresponding local lock object and distributed lock object based on the concurrency control parameter.
7. A data stream processing method according to any one of claims 1 to 3, characterized in that:
When the generation of the corresponding execution message based on the scheduling message fails, the scheduling message is used as a scheduling retry message;
And acquiring and processing the scheduling retry message based on a preset time interval.
8. A dispatch controller comprising:
an acquisition module, configured to acquire a scheduling message, wherein the scheduling message comprises a flow credential and flow positioning information;
an analysis processing module, configured to determine target processing items based on the scheduling message and to generate execution messages in one-to-one correspondence with the target processing items;
The sending module is used for sending the execution message to the corresponding processor based on the processing items, and the processor corresponds to the processing items one by one;
The analysis processing module comprises:
an extraction unit, configured to extract a corresponding flow configuration diagram based on the flow credential, wherein the flow configuration diagram is a directed acyclic graph used for indicating the processing flow of the corresponding data stream processing task, each node of the flow configuration diagram is a processing item, and the relationship among the nodes is the sequence of data stream processing;
the target acquisition unit is used for determining the current processing position in the flow configuration diagram based on the flow positioning information, and taking a processing item corresponding to a next node of the current processing position as a target processing item.
9. A dispatch controller according to claim 8, wherein:
The nodes of the flow configuration diagram comprise concurrency control configuration corresponding to the processing items, the concurrency control configuration comprises a signal key and a dynamic parameter key, the signal key is used for indicating basic services of concurrency control of the corresponding processing items, and the dynamic parameter key is used for specifying business parameters for concurrency control;
the execution message contains concurrency control parameters corresponding to the target processing items;
the analysis processing module further comprises a concurrency control configuration unit, wherein the concurrency control configuration unit is:
configured to extract the concurrency control configuration corresponding to the target processing item from the flow configuration diagram;
further configured to, when a dynamic parameter key is specified in the concurrency control configuration, obtain the corresponding service parameter based on the dynamic parameter key as the concurrency control parameter;
and further configured to, when no dynamic parameter key is specified in the concurrency control configuration, extract the signal key as the concurrency control parameter.
10. A distributed data stream processing system, comprising:
a scheduling queue, configured to store scheduling messages, wherein the scheduling messages comprise a flow credential and flow positioning information;
the execution queues are in one-to-one correspondence with the processing items and are used for storing the execution messages corresponding to the processing items;
The processors are in one-to-one correspondence with the processing items and are used for processing corresponding execution messages;
a scheduling controller, configured to acquire a scheduling message from the scheduling queue, determine target processing items based on the scheduling message, generate execution messages in one-to-one correspondence with the target processing items, and send each execution message to the corresponding execution queue based on the processing item; the scheduling controller is the scheduling controller according to claim 8 or 9;
and the execution controller is used for acquiring the execution messages from each execution queue, carrying out concurrency judgment on the execution messages, and distributing the execution messages to corresponding processors based on the concurrency judgment result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410417403.XA CN118012917B (en) | 2024-04-09 | 2024-04-09 | Data stream processing method, scheduling controller and distributed data stream processing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118012917A CN118012917A (en) | 2024-05-10 |
CN118012917B true CN118012917B (en) | 2024-06-11 |
Family
ID=90959660
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521232A (en) * | 2011-11-09 | 2012-06-27 | Ut斯达康通讯有限公司 | Distributed acquisition and processing system and method of internet metadata |
CN108268319A (en) * | 2016-12-31 | 2018-07-10 | 中国移动通信集团河北有限公司 | Method for scheduling task, apparatus and system |
CN115827280A (en) * | 2022-12-27 | 2023-03-21 | 北京奇艺世纪科技有限公司 | Message processing method and device, electronic equipment and storage medium |
CN116050179A (en) * | 2023-02-28 | 2023-05-02 | 安徽交欣科技股份有限公司 | Simulation scheduling method based on historical interaction data flow |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10685034B2 (en) * | 2017-10-17 | 2020-06-16 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing concurrent dataflow execution with write conflict protection within a cloud based computing environment |
Legal Events
Date | Code | Title | Description
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||