CN100356335C - Method and apparatus to preserve trace data - Google Patents
- Publication number
- CN100356335C CNB2005100730734A CN200510073073A
- Authority
- CN
- China
- Prior art keywords
- partition
- data
- trace data
- error log
- interrupt handler
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0775—Content or structure details of the error report, e.g. specific table structure, specific error fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0712—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a virtual computing platform, e.g. logically partitioned systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
A method, apparatus, and computer instructions for processing trace data in a logical partitioned data processing system. A partition causing an exception is identified in response to detecting the exception. The partition is one within a set of partitions in the logical partitioned data processing system. The trace data for the identified partition is stored in an error log or other data structure for a machine check interrupt handler.
Description
Technical field
The present invention relates generally to an improved data processing system and, in particular, to a method and apparatus for processing data. More particularly, the present invention relates to a method, apparatus, and computer instructions for managing trace data in a logical partitioned data processing system.
Background technology
Increasingly, large symmetric multiprocessor data processing systems, such as the IBM eServer P690 from International Business Machines Corporation, the DHP9000 Superdome Enterprise Server from Hewlett-Packard Company, and the Sunfire 15K server from Sun Microsystems, Inc., are not being used as single large data processing systems. Instead, these types of data processing systems are partitioned and used as smaller systems. These systems are also referred to as logical partitioned (LPAR) data processing systems. A logical partitioned functionality within a data processing system allows multiple copies of a single operating system, or multiple heterogeneous operating systems, to run simultaneously on a single data processing system platform. A partition within which an operating system image runs is assigned a non-overlapping subset of the platform's resources. These platform-allocable resources include one or more architecturally distinct processors with their interrupt management areas, regions of system memory, and input/output (I/O) adapter bus slots. The partition's resources are represented to the operating system image by the platform's firmware.
Each distinct operating system or operating system image running within a platform is protected from the others, so that software errors on one logical partition cannot affect the correct operation of any other partition. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each operating system image and by providing mechanisms that ensure no image can control any resource that has not been allocated to it. Furthermore, software errors in the control of an operating system's allocated resources are prevented from affecting the resources of any other image.
Thus, each operating system image, or each different operating system, directly controls a distinct set of allocable resources within the platform. With respect to hardware resources in a logical partitioned data processing system, these resources are shared disjointly among the partitions. These resources may include, for example, input/output (I/O) adapters, memory DIMMs, non-volatile random access memory (NVRAM), and hard disk drives. Each partition within an LPAR data processing system may be booted and shut down repeatedly without having to power-cycle the entire data processing system.
When a logical partitioned data processing system experiences a failure, data relating to processes and system state is needed to help identify and analyze the failure. In current logical partitioned data processing systems, some of the data needed to diagnose a failure is unavailable because of the current system design. For example, the platform firmware includes a trace facility that allows code paths in the firmware to be traced. One example of platform firmware used in logical partitioned data processing systems is a hypervisor, which is available from International Business Machines Corporation.
With the trace facility currently in use, trace information showing the code path taken in the platform firmware, together with critical data values, is written into a trace buffer each time a partition makes a call to the platform firmware. This trace information is especially important when a partition encounters an error and the path leading to the error is traced along with the critical data values.
Currently, all logical partitioned data processing system platforms support a hypervisor trace facility, which writes trace point data from hypervisor code execution into a trace buffer located in hypervisor space. When a system failure occurs, this hypervisor call trace data is critical for effective failure analysis in the field.
This situation creates a problem in large configurations in which processors are dedicated to multiple partitions that all write to the same buffer. These buffers are typically organized in a circular fashion. Consequently, if a partition crashes, its trace data may quickly be overwritten by other partitions in the logical partitioned data processing system. As a result, critical data needed to help diagnose the problem may be lost.
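By way of illustration only, the overwrite problem described above can be sketched as follows. The buffer capacity, the tuple layout of an entry, and the function name hypervisor_call are assumptions made for this sketch and do not come from the described system.

```python
from collections import deque

# Hypothetical model: one fixed-size circular trace buffer shared by all partitions.
BUFFER_ENTRIES = 8  # assumed capacity, kept small for illustration

# A deque with maxlen silently drops the oldest entry when a new one is appended.
trace_buffer = deque(maxlen=BUFFER_ENTRIES)

def hypervisor_call(partition_id, trace_point):
    """Record one trace entry tagged with the calling partition's ID."""
    trace_buffer.append((partition_id, trace_point))

# Partition 1 makes a few calls and then crashes.
for step in range(3):
    hypervisor_call(1, f"p1-step-{step}")

# Partitions 2 and 3 keep running and keep writing to the same buffer.
for step in range(BUFFER_ENTRIES):
    hypervisor_call(2 + step % 2, f"other-step-{step}")

# Collect whatever trace entries of the crashed partition are still present.
surviving = [entry for entry in trace_buffer if entry[0] == 1]
print(len(surviving))
```

In this sketch, once the other partitions have written as many entries as the buffer holds, nothing from the crashed partition remains to be examined.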
One solution is to create a larger buffer. However, as the number of partitions grows, the trace buffer must also grow to accommodate the additional partitions. The buffer structure must be preallocated for the largest configuration contemplated, because each logical partitioned data processing system is configured individually and allows dynamic reconfiguration. As a result, storage space is wasted in smaller configurations. Furthermore, in systems where system memory is expensive, the wasted space adds to the cost of the logical partitioned data processing system.
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for preserving trace data.
Summary of the invention
The present invention provides a method, apparatus, and computer instructions for processing trace data in a logical partitioned data processing system. In response to detecting an exception, the partition that caused the exception is identified. The partition is one of a set of partitions in the logical partitioned data processing system. The trace data for the identified partition is stored in an error log or other data structure for a machine check interrupt handler.
Description of drawings
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
Fig. 1 is a block diagram of a data processing system in which the present invention may be implemented;
Fig. 2 is a block diagram of an exemplary logical partitioned platform in which the present invention may be implemented;
Fig. 3 is a diagram of the components of an existing logical partitioned data processing system used to process trace data;
Fig. 4 is a diagram of a configuration for managing trace data to eliminate security vulnerabilities in accordance with a preferred embodiment of the present invention;
Fig. 5 is a flowchart of a process for filtering trace data in accordance with a preferred embodiment of the present invention; and
Fig. 6 is a flowchart of a process for preserving trace data in an error log in accordance with a preferred embodiment of the present invention.
Embodiment
With reference now to the figures, and in particular to Fig. 1, a block diagram of a data processing system in which the present invention may be implemented is depicted. Data processing system 100 may be a symmetric multiprocessor (SMP) system including a plurality of processors 101, 102, 103, and 104 connected to system bus 106. For example, data processing system 100 may be an IBM eServer, a product of International Business Machines Corporation in Armonk, New York, implemented as a server within a network. Alternatively, a single processor system may be employed. Also connected to system bus 106 is memory controller/cache 108, which provides an interface to a plurality of local memories 160-163. I/O bus bridge 110 is connected to system bus 106 and provides an interface to I/O bus 112. Memory controller/cache 108 and I/O bus bridge 110 may be integrated as depicted.
Data processing system 100 is a logical partitioned (LPAR) data processing system. Thus, data processing system 100 may have multiple heterogeneous operating systems (or multiple instances of a single operating system) running simultaneously. Each of these operating systems may have any number of software programs executing within it. Data processing system 100 is logically partitioned such that different PCI I/O adapters 120-121, 128-129, and 136, graphics adapter 148, and hard disk adapter 149 may be assigned to different logical partitions. In this case, graphics adapter 148 provides a connection for a display device (not shown), while hard disk adapter 149 provides a connection to control hard disk 150.
Thus, for example, suppose data processing system 100 is divided into three logical partitions, P1, P2, and P3. Each of PCI I/O adapters 120-121, 128-129, and 136, graphics adapter 148, hard disk adapter 149, each of host processors 101-104, and memory from local memories 160-163 is assigned to one of the three partitions. In these examples, memories 160-163 may take the form of dual in-line memory modules (DIMMs). DIMMs are not normally assigned to partitions on a per-DIMM basis. Instead, a partition receives a portion of the overall memory seen by the platform. For example, processor 101, some portion of memory from local memories 160-163, and I/O adapters 120, 128, and 129 may be assigned to logical partition P1; processors 102-103, some portion of memory from local memories 160-163, and PCI I/O adapters 121 and 136 may be assigned to logical partition P2; and processor 104, some portion of memory from local memories 160-163, graphics adapter 148, and hard disk adapter 149 may be assigned to logical partition P3.
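By way of illustration only, the example assignment above can be written down and checked for the non-overlap property that logical partitioning relies on. The resource labels (such as "cpu101") and the function name overlapping_resources are hypothetical names chosen for this sketch.

```python
from itertools import combinations

# Assignment taken from the example in the text. Memory is omitted because
# partitions receive portions of the overall memory rather than whole DIMMs.
# The string labels are hypothetical shorthand for the numbered components.
assignments = {
    "P1": {"cpu101", "io120", "io128", "io129"},
    "P2": {"cpu102", "cpu103", "io121", "io136"},
    "P3": {"cpu104", "gfx148", "hdd149"},
}

def overlapping_resources(assignments):
    """Return {(a, b): shared} for every pair of partitions sharing a resource."""
    conflicts = {}
    for (a, res_a), (b, res_b) in combinations(assignments.items(), 2):
        shared = res_a & res_b
        if shared:
            conflicts[(a, b)] = shared
    return conflicts

# An empty result means every resource belongs to exactly one partition.
print(overlapping_resources(assignments))
```

A non-empty result would indicate that two partitions could directly control the same resource, which is the condition the platform is described as preventing.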
Each operating system executing within data processing system 100 is assigned to a different logical partition. Thus, each operating system executing within data processing system 100 may access only those I/O units that are within its logical partition. For example, one instance of the Advanced Interactive Executive (AIX) operating system may be executing within partition P1, a second instance (image) of the AIX operating system may be executing within partition P2, and a Linux or OS/400 operating system may be operating within logical partition P3.
Peripheral component interconnect (PCI) host bridge 114, connected to I/O bus 112, provides an interface to PCI local bus 115. A number of PCI input/output adapters 120-121 may be connected to PCI bus 115 through PCI-to-PCI bridge 116, PCI bus 118, PCI bus 119, I/O slot 170, and I/O slot 171. PCI-to-PCI bridge 116 provides an interface to PCI bus 118 and PCI bus 119. PCI I/O adapters 120 and 121 are placed into I/O slots 170 and 171, respectively. Typical PCI bus implementations support between four and eight I/O adapters (i.e., expansion slots for add-in connectors). Each PCI I/O adapter 120-121 provides an interface between data processing system 100 and input/output devices such as, for example, other network computers, which are clients of data processing system 100.
An additional PCI host bridge 122 provides an interface for an additional PCI bus 123. PCI bus 123 is connected to a plurality of PCI I/O adapters 128-129. PCI I/O adapters 128-129 may be connected to PCI bus 123 through PCI-to-PCI bridge 124, PCI bus 126, PCI bus 127, I/O slot 172, and I/O slot 173. PCI-to-PCI bridge 124 provides an interface to PCI bus 126 and PCI bus 127. PCI I/O adapters 128 and 129 are placed into I/O slots 172 and 173, respectively. In this manner, additional I/O devices, such as modems or network adapters, may be supported through each of PCI I/O adapters 128-129. Data processing system 100 thus allows connections to multiple network computers.
A memory-mapped graphics adapter 148 inserted into I/O slot 174 may be connected to I/O bus 112 through PCI bus 144, PCI-to-PCI bridge 142, PCI bus 141, and PCI host bridge 140. Hard disk adapter 149 may be placed into I/O slot 175, which is connected to PCI bus 145. In turn, this bus is connected to PCI-to-PCI bridge 142, which is connected to PCI host bridge 140 by PCI bus 141.
PCI host bridge 130 provides an interface for PCI bus 131 to connect to I/O bus 112. PCI I/O adapter 136 is connected to I/O slot 176, which is connected to PCI-to-PCI bridge 132 by PCI bus 133. PCI-to-PCI bridge 132 is connected to PCI bus 131. This PCI bus also connects PCI host bridge 130 to service processor mailbox interface and ISA bus access pass-through logic 194 and to PCI-to-PCI bridge 132. Service processor mailbox interface and ISA bus access pass-through logic 194 forwards PCI accesses destined for PCI/ISA bridge 193. NVRAM storage 192 is connected to ISA bus 196. Service processor 135 is coupled to service processor mailbox interface and ISA bus access pass-through logic 194 through its local PCI bus 195. Service processor 135 is also connected to processors 101-104 via a plurality of JTAG/I²C buses 134. JTAG/I²C buses 134 are a combination of JTAG/scan buses (see IEEE 1149.1) and Philips I²C buses. Alternatively, however, JTAG/I²C buses 134 may be replaced by only Philips I²C buses or only JTAG/scan buses. All SP-ATTN signals of host processors 101, 102, 103, and 104 are connected together to an interrupt input signal of the service processor. Service processor 135 has its own local memory 191 and has access to hardware operator panel 190.
When data processing system 100 is initially powered up, service processor 135 uses JTAG/I²C buses 134 to interrogate the system (host) processors 101-104, memory controller/cache 108, and I/O bridge 110. At the completion of this step, service processor 135 has an inventory and topology understanding of data processing system 100. Service processor 135 also executes built-in self-tests (BISTs), basic assurance tests (BATs), and memory tests on all elements found by interrogating host processors 101-104, memory controller/cache 108, and I/O bridge 110. Any error information for failures detected during the BISTs, BATs, and memory tests is gathered and reported by service processor 135.
If a meaningful/valid configuration of system resources is still possible after taking out the elements found to be faulty during the BISTs, BATs, and memory tests, then data processing system 100 is allowed to proceed to load executable code into local (host) memories 160-163. Service processor 135 then releases host processors 101-104 to execute the code loaded into local memories 160-163. While host processors 101-104 are executing code from their respective operating systems within data processing system 100, service processor 135 enters a mode of monitoring and reporting errors. The items monitored by service processor 135 include, for example, cooling fan speed and operation, thermal sensors, power supply regulators, and recoverable and non-recoverable errors reported by processors 101-104, local memories 160-163, and I/O bridge 110.
Service processor 135 is responsible for saving and reporting error information related to all the monitored items in data processing system 100. Service processor 135 may also take action based on the type of error and defined thresholds. For example, service processor 135 may take note of excessive recoverable errors in a processor's cache memory and decide that this is predictive of a hard failure. Based on this determination, service processor 135 may mark that resource for deconfiguration during the current running session and future initial program loads (IPLs). An IPL is also sometimes referred to as a "boot" or "bootstrap".
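By way of illustration only, the threshold-based action described above might be sketched as follows. The class name ServiceProcessorMonitor, the threshold value, and the resource labels are assumptions made for this sketch; the description gives no specific threshold.

```python
from collections import Counter

RECOVERABLE_ERROR_THRESHOLD = 3  # assumed value; the text does not give a number

class ServiceProcessorMonitor:
    """Hypothetical sketch of threshold-based marking for deconfiguration."""

    def __init__(self, threshold=RECOVERABLE_ERROR_THRESHOLD):
        self.threshold = threshold
        self.error_counts = Counter()
        self.marked_for_deconfiguration = set()

    def report_recoverable_error(self, resource):
        """Count a recoverable error and mark the resource once the threshold is exceeded."""
        self.error_counts[resource] += 1
        if self.error_counts[resource] > self.threshold:
            self.marked_for_deconfiguration.add(resource)

monitor = ServiceProcessorMonitor()
for _ in range(4):
    monitor.report_recoverable_error("cpu101-cache")  # exceeds the assumed threshold
monitor.report_recoverable_error("cpu102-cache")      # a single error, below threshold
print(sorted(monitor.marked_for_deconfiguration))
```

Only the resource whose recoverable error count exceeds the assumed threshold is marked; the marking itself stands in for deconfiguration during the current session and future IPLs.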
Data processing system 100 may be implemented using various commercially available computer systems. For example, data processing system 100 may be implemented using an IBM eServer iSeries Model 840 system available from International Business Machines Corporation. Such a system may support logical partitioning using the OS/400 operating system, which is also available from International Business Machines Corporation.
Those of ordinary skill in the art will appreciate that the hardware depicted in Fig. 1 may vary. For example, other peripheral devices, such as optical disk drives, may be used in addition to or in place of the hardware depicted. The depicted example is not meant to imply architectural limitations with respect to the present invention.
With reference now to Fig. 2, a block diagram of an exemplary logical partitioned platform in which the present invention may be implemented is depicted. The hardware in logical partitioned platform 200 may be implemented as, for example, data processing system 100 in Fig. 1. Logical partitioned platform 200 includes partitioned hardware 230, operating systems 202, 204, 206, and 208, and partition management firmware 210. Operating systems 202, 204, 206, and 208 may be multiple copies of a single operating system or multiple heterogeneous operating systems running simultaneously on logical partitioned platform 200. These operating systems may be implemented using OS/400, which is designed to interface with partition management firmware such as a hypervisor. OS/400 is used only as an example in these illustrative embodiments. Of course, other types of operating systems, such as AIX and Linux, may be used depending on the particular implementation. Operating systems 202, 204, 206, and 208 are located in partitions 203, 205, 207, and 209. Hypervisor software is an example of software that may be used to implement partition management firmware 210 and is available from International Business Machines Corporation. Firmware is "software" stored in a memory chip that holds its content without electrical power, such as, for example, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), and nonvolatile random access memory (nonvolatile RAM).
Additionally, these partitions also include partition firmware 211, 213, 215, and 217. Partition firmware 211, 213, 215, and 217 may be implemented using initial bootstrap code, IEEE-1275 Standard Open Firmware, and run-time abstraction services (RTAS), which is available from International Business Machines Corporation. When partitions 203, 205, 207, and 209 are instantiated, a copy of the bootstrap code is loaded into partitions 203, 205, 207, and 209 by platform firmware 210. Thereafter, control is transferred to the bootstrap code, which then loads the Open Firmware and RTAS. The processors associated with or assigned to the partitions are then dispatched to the partition's memory to execute the partition firmware.
With reference now to Fig. 3, a diagram of the components of an existing logical partitioned data processing system used to process trace data is depicted. In this illustrative example, partition 300 includes operating system 302 and RTAS 304. Partition 306 includes operating system 308 and RTAS 310. In these depicted examples, up to 255 partitions may be present. All of these partitions are managed by platform firmware, such as hypervisor 312.
When partition 300 makes calls to hypervisor 312, hypervisor code path 318 is generated. Trace data generated by these calls is stored in hypervisor call trace buffer 316. This trace data is stored in entries of hypervisor call trace buffer 418 of Fig. 4. In these examples, the trace data includes trace information showing the code path taken and critical data values. In addition to the trace data, each entry also includes a partition identifier that identifies the partition that made the call generating the trace data. Partition manager 314 also writes trace information into hypervisor call trace buffer 316 when code paths occur. This trace information is written when partition manager 314 makes calls to hypervisor 312. Partition manager 314 is a component within hypervisor 312. This component manages the partitions and includes functions such as starting and stopping partitions. Trace data for calls made by partition manager 314 is stored in hypervisor call trace buffer 316.
In a similar fashion, when partition 306 makes calls to hypervisor 312, hypervisor code path 320 is formed, and the trace data for this path is also stored in hypervisor call trace buffer 316.
When each partition is activated, its operating system is loaded and begins executing. When an operating system such as operating system 302 needs platform resources, it makes an RTAS call to RTAS 304, which in turn makes a hypervisor call to hypervisor 312. As the hypervisor call executes, special "trace points" are executed, at which trace data is written to hypervisor call trace buffer 316 without segregating the data by partition. In other words, the trace data for all calls made by all partitions is placed in this one buffer.
Not only is hypervisor data for hypervisor calls executed on behalf of partitions written into the single buffer, but partition manager 314, which is a hypervisor facility, also writes its trace data to the same trace buffer. In addition, machine check interrupt handler 322 uses the same trace facility and stores information in hypervisor call trace buffer 316.
However, this existing system stores all of this information in hypervisor call trace buffer 316. As a result, data may be overwritten in existing systems because of the limited space in these types of buffers. The problem grows in large configurations in which all processors are dedicated to multiple partitions that write to the same buffer, organized in a circular fashion. If a partition crash occurs, the trace data from hypervisor code path 318 will quickly be overwritten by other partitions, causing the loss of critical data that could help diagnose the problem.
With reference now to Fig. 4, a diagram of a configuration for managing trace data to eliminate security vulnerabilities is depicted in accordance with a preferred embodiment of the present invention. In this illustrative example, partition 400 and partition 402 are present and are managed by hypervisor 404. Partition 400 includes operating system 406 and RTAS 408, and partition 402 includes operating system 410 and RTAS 412. When partition 400 or partition 402 makes calls to hypervisor 404, hypervisor code path 414 and hypervisor code path 416 are generated, and trace data is stored in hypervisor call trace buffer 418.
In addition, partition manager 420 may make calls for which trace data is stored in hypervisor call trace buffer 418. Machine check interrupt handler 422 also stores data in hypervisor call trace buffer 418.
Machine check interrupt handler 422 includes error buffer 426. This buffer contains error log 428. The error log includes an error-specific section whose format may be selected or used by a particular analysis program. The log also includes a user section that can be used to store the trace data in the illustrative examples.
In the illustrative examples, each time a machine check exception occurs, trace data in hypervisor call trace buffer 418 is copied into error log 428 in error buffer 426. In these illustrative examples, the copying is performed by machine check interrupt handler 422. An exception is an error condition that changes the normal flow of control in a program. An exception may be generated or raised by hardware or software. Hardware exceptions include, for example, resets, interrupts, or signals from a memory management unit. Exceptions may be generated by an arithmetic logic unit or floating-point unit for numerical errors such as divide by zero, overflow, or underflow, or by instruction decoding for errors such as privileged, reserved, trap, or undefined instructions.
A large percentage of partition crashes also generate a machine check exception. The mechanism of the present invention uses this type of event to obtain the trace data and store it in a manner that avoids its loss. Specifically, in these illustrative examples, the user section of error log 428 is otherwise unused, and this section is used to store the trace data. Any unused portion of the error log, or other unused space in error buffer 426, may be used to store trace data from hypervisor call trace buffer 418. Each time a machine check exception occurs, machine check interrupt handler 422 writes the trace buffer data into the unused portion of error log 428. This feature makes the data readily available for debugging when the error log is delivered to all running partitions.
Only a small number of simple code changes to machine check interrupt handler 422 are needed to include this additional function. During machine check processing initiated by an exception, the current pointer to the trace data is used to copy the last partition data that led to the machine check from hypervisor call trace buffer 418 into error log 428. The trace data may be filtered so that only trace data for the partition in which the exception occurred is copied into error log 428. In these illustrative examples, this filtering may be implemented in partition data filter 424 within hypervisor 404. Partition data filter 424 identifies the partition in which the exception occurred by examining the partition identifier found in the last trace entry of hypervisor call trace buffer 418. This ID is used to filter the trace data requested by machine check interrupt handler 422. Of course, the filter may be implemented elsewhere, for example, in machine check interrupt handler 422.
If a partition terminates, the partition's data will be present in hypervisor call trace buffer 418 when machine check interrupt handler 422 is invoked. Machine check interrupt handler 422 copies the trace data into error log 428 to preserve it for debugging purposes. Depending on the amount of space available in error log 428, not all of the trace data may be copied from hypervisor call trace buffer 418 into error log 428. Machine check interrupt handler 422 copies data into error log 428 until all of the trace data has been copied or no space remains in error log 428.
After machine check interrupt handler 422 copies the partition trace data into error log 428, the log is written to nonvolatile storage, such as NVRAM 430. The log is retrieved by a special error collection routine, which then copies the log to all active partitions for analysis. In these examples, the routine is part of the machine check interrupt handler.
With reference now to Fig. 5, a flowchart of a process for filtering tracking data is shown in accordance with a preferred embodiment of the present invention. The process shown in Fig. 5 may be implemented in a filter process such as partition data filter 424 of Fig. 4. The process is initiated when a request is received for data from a trace buffer, such as supervisor call trace buffer 418 of Fig. 4. In these illustrative examples, the request is received from the hardware check interrupt handler.
The process begins by receiving a request for tracking data (step 500). Next, the partition identifier (PID) in the most recent tracking data entry is identified (step 502). A tracking data entry is then selected (step 504). The selected entry is one that has not yet been sent to the requester. In these illustrative examples, entries are selected from newest to oldest. In other words, the newest entry is selected first for transmission.
Next, data is obtained from the selected tracking data entry (step 506). A determination is then made as to whether the data is for the identified PID (step 508). If the data is for the identified PID, the data is returned to the hardware check interrupt handler (step 510). A determination is then made as to whether more requests for tracking data are present (step 512). If no more requests for tracking data are present, the process terminates thereafter.
With reference again to step 508, if the data is not for the identified PID, the next tracking data entry is selected for processing (step 514), and the process returns to step 506 as described above. In step 512, if more requests for tracking data have been made, the process returns to step 514 as described above.
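The filtering flow of Fig. 5 can be approximated with a generator, where each value pulled from the generator corresponds to one request from the handler. This is a sketch under stated assumptions: the `TraceEntry` type and the function name `filter_trace` are hypothetical, and the buffer is assumed to be ordered from oldest to newest.

```python
from typing import Iterator, NamedTuple, Sequence


class TraceEntry(NamedTuple):
    pid: int    # partition identifier recorded with the entry
    data: str   # payload (hypothetical representation)


def filter_trace(buffer: Sequence[TraceEntry]) -> Iterator[TraceEntry]:
    """Yield entries for the PID of the newest entry, newest first.

    `buffer` is assumed to be ordered oldest to newest.
    """
    if not buffer:
        return
    target_pid = buffer[-1].pid        # step 502: PID of the most recent entry
    for entry in reversed(buffer):     # steps 504 and 514: newest to oldest
        if entry.pid == target_pid:    # step 508: is the data for that PID?
            yield entry                # step 510: return it to the requester
```

Because the generator is lazy, entries are examined only when the requester asks for more data, which mirrors the check for further requests in step 512.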
With reference now to Fig. 6, a flowchart of a process for preserving tracking data in an error log is shown in accordance with a preferred embodiment of the present invention. The process shown in Fig. 6 may be implemented in a hardware check interrupt handler such as hardware check interrupt handler 422 of Fig. 4. The process is initiated when an exception occurs that requires handling by the hardware check interrupt handler.
The process begins by identifying a pointer to the platform firmware tracking data (step 600). This pointer points to the current record in the tracking data buffer. Next, the tracking data for a record in the tracking data buffer is requested from the trace buffer (step 602). The pointer may be decremented to request different records in the trace buffer for examination. In response to the request, a determination is made as to whether tracking data has been received (step 604). In these examples, the request is answered either with tracking data or with an indication that no more data is present. If tracking data is received, the data is copied to the error log (step 606). A determination is then made as to whether more free space is present in the error log (step 608). If no more free space is present in the error log, the process terminates.
With reference again to step 608, if more space is present in the error log, the process returns to step 602 as described above. With reference again to step 604, if no tracking data is received, the process terminates.
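The preservation loop of Fig. 6 can be sketched as shown below. The callable `request_record`, the record-count limit `max_records`, and the use of `None` as the indication that no more data is present are assumptions chosen for illustration; the patent does not specify these interfaces.

```python
from typing import Callable, List, Optional


def preserve_trace(request_record: Callable[[int], Optional[bytes]],
                   current_index: int,
                   log: List[bytes],
                   max_records: int) -> int:
    """Copy trace records into `log`, newest first, until data or space runs out.

    Returns the number of records copied.
    """
    if len(log) >= max_records:
        return 0  # guard added for safety; not part of the described flow
    pointer = current_index                 # step 600: pointer to current record
    copied = 0
    while True:
        data = request_record(pointer)      # step 602: request the record
        if data is None:                    # step 604: no more data is present
            break
        log.append(data)                    # step 606: copy to the error log
        copied += 1
        if len(log) >= max_records:         # step 608: no free space remains
            break
        pointer -= 1                        # decrement to reach an older record
    return copied
```

In practice `request_record` would be backed by the filter process, so that only records for the partition associated with the exception are returned.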
Thus, the present invention provides an improved method, apparatus, and computer instructions for processing tracking data. The mechanism of the present invention preserves tracking data by storing it in existing error log space. In these examples, the error log is a log associated with the hardware check interrupt handler, or a log created for use by the hardware check interrupt handler. Data is obtained whenever an exception occurs that is to be handled by the hardware check interrupt handler. The mechanism also filters the obtained data so that only data from the partition that caused the exception is returned.
It is important to note that, while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention can be distributed in the form of a computer-readable medium of instructions and in a variety of other forms, and that the present invention applies equally regardless of the particular type of signal-bearing medium actually used to carry out the distribution. Examples of computer-readable media include recordable-type media, such as a floppy disk, a hard disk drive, RAM, CD-ROMs, and DVD-ROMs, and transmission-type media, such as digital and analog communication links and wired or wireless communication links using transmission forms such as radio frequency and light wave transmissions. The computer-readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. For example, the illustrative embodiments store tracking data in an error log. This data may instead be stored in any data structure accessible by the process that handles the exception. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (28)
1. A method of processing tracking data in a logical partitioned data processing system, the method comprising:
in response to detecting an exception, identifying, from a set of partitions in the logical partitioned data processing system, the partition associated with the exception to form an identified partition; and
storing the tracking data for the identified partition in an error log for use by a hardware check interrupt handler, wherein the error log is included in the hardware check interrupt handler.
2. The method of claim 1, further comprising:
storing the error log in a non-volatile storage device.
3. The method of claim 2, wherein the non-volatile storage device is a non-volatile random access memory.
4. The method of claim 1, wherein the step of identifying the partition comprises:
identifying the partition identifier in the most recent tracking data entry.
5. The method of claim 1, wherein the identifying step and the storing step are performed by platform firmware.
6. The method of claim 1, wherein the storing step is performed by the hardware check interrupt handler.
7. The method of claim 1, wherein the identifying step is performed by a partition data filter.
8. The method of claim 1, wherein only the tracking data for the identified partition is stored in the error log.
9. The method of claim 1, wherein the tracking data is initially located in a platform firmware trace buffer.
10. A logical partitioned data processing system for processing tracking data, the logical partitioned data processing system comprising:
identifying means for identifying, in response to detecting an exception, the partition associated with the exception from a set of partitions in the logical partitioned data processing system to form an identified partition; and
storing means for storing the tracking data for the identified partition in an error log for use by a hardware check interrupt handler, wherein the error log is included in the hardware check interrupt handler.
11. The data processing system of claim 10, wherein the storing means is a first storing means, and further comprising:
second storing means for storing the error log in a non-volatile storage device.
12. The data processing system of claim 11, wherein the non-volatile storage device is a non-volatile random access memory.
13. The data processing system of claim 10, wherein the identifying means comprises:
means for identifying the partition identifier in the most recent tracking data entry.
14. The data processing system of claim 10, wherein the identifying means and the storing means are located in platform firmware.
15. The data processing system of claim 10, wherein the storing means is located in the hardware check interrupt handler.
16. The data processing system of claim 10, wherein the identifying means is located in a partition data filter.
17. The data processing system of claim 10, wherein only the tracking data for the identified partition is stored in the error log.
18. The data processing system of claim 10, wherein the tracking data is initially located in a platform firmware trace buffer.
19. An apparatus for processing tracking data, the apparatus comprising:
first means for identifying, in response to detecting an exception, the partition associated with the exception from a set of partitions in a logical partitioned data processing system to form an identified partition; and
second means for storing the tracking data for the identified partition in an error log for use by a hardware check interrupt handler, wherein the error log is included in the hardware check interrupt handler.
20. The apparatus of claim 19, further comprising:
third means for storing the error log in a non-volatile storage device.
21. The apparatus of claim 20, wherein the non-volatile storage device is a non-volatile random access memory.
22. The apparatus of claim 19, wherein the first means comprises:
sub-means for identifying the partition identifier in the most recent tracking data entry.
23. The apparatus of claim 19, wherein the first means and the second means are operated by platform firmware.
24. The apparatus of claim 19, wherein the second means is operated by the hardware check interrupt handler.
25. The apparatus of claim 19, wherein the first means is operated by a partition data filter.
26. The apparatus of claim 19, wherein only the tracking data for the identified partition is stored in the error log.
27. The apparatus of claim 19, wherein the tracking data is initially located in a platform firmware trace buffer.
28. A logical partitioned data processing system, the system comprising:
a bus system;
a memory connected to the bus system, wherein the memory includes a set of instructions; and
a processing unit connected to the bus system, wherein the processing unit executes the set of instructions to identify, in response to detecting an exception, the partition associated with the exception from a set of partitions in the logical partitioned data processing system to form an identified partition; and to store the tracking data for the identified partition in an error log for use by a hardware check interrupt handler, wherein the error log is included in the hardware check interrupt handler.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/857,459 | 2004-05-28 | ||
US10/857,459 US7343521B2 (en) | 2004-05-28 | 2004-05-28 | Method and apparatus to preserve trace data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1702625A CN1702625A (en) | 2005-11-30 |
CN100356335C CN100356335C (en) | 2007-12-19 |
Family
ID=35461909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100730734A Expired - Fee Related CN100356335C (en) | 2004-05-28 | 2005-05-27 | Method and apparatus to preserve trace data |
Country Status (4)
Country | Link |
---|---|
US (2) | US7343521B2 (en) |
JP (1) | JP5579354B2 (en) |
CN (1) | CN100356335C (en) |
TW (1) | TW200611115A (en) |
Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2004-05-28 | US | US10/857,459 | US7343521B2 (en) | Expired - Fee Related |
2005-05-03 | TW | TW094114177A | TW200611115A (en) | unknown |
2005-05-27 | CN | CNB2005100730734A | CN100356335C (en) | Expired - Fee Related |
2005-05-27 | JP | JP2005155947A | JP5579354B2 (en) | Expired - Fee Related |
2008-01-30 | US | US12/022,511 | US7930594B2 (en) | Expired - Fee Related |
Also Published As
Publication number | Publication date |
---|---|
CN1702625A (en) | 2005-11-30 |
TW200611115A (en) | 2006-04-01 |
US20080140985A1 (en) | 2008-06-12 |
JP2005339561A (en) | 2005-12-08 |
US20050278574A1 (en) | 2005-12-15 |
JP5579354B2 (en) | 2014-08-27 |
US7343521B2 (en) | 2008-03-11 |
US7930594B2 (en) | 2011-04-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20071219; termination date: 20200527 |