US20130080709A1 - System and Method for Performing Memory Operations In A Computing System - Google Patents
- Publication number
- US20130080709A1 (application US13/683,367)
- Authority
- US
- United States
- Prior art keywords
- state
- cache line
- cache
- transaction
- processor
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0831—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
Definitions
- Upon execution of an Abort instruction, the processor enters the Abort state. In this state, all changes to memory performed during the aborted transaction are discarded, restoring the contents of the Write Set to their state prior to the start of the transaction. To accomplish this, the following actions are performed:
- The processor transitions to the Suspended state (transition 5 in the state diagram of FIG. 1) until a Commit instruction is executed. Commit instructions stall if dispatched while in the Abort state and execute as soon as the transition to the Suspended state occurs.
- The processor enters the Suspended state as soon as it completes the cleanup of the aborted transaction in the Abort state. While in the Suspended state, the processor executes as in the Normal state except that all transactional memory reference instructions are treated as NOPs. Upon executing a Commit instruction, the processor transitions to the Normal state, making it ready to begin another transaction.
- TEST T (R)—Sets register R to a non-zero Reason Code (reason codes to be defined) if the processor is currently in the Abort or Suspended states; sets R to zero otherwise. This instruction is used to test to see whether the current transaction has been aborted to allow skipping the execution of useless instructions.
- ABORT Aborts the current transaction—If the processor is in the Transaction state, sets the Transaction State to the Abort state thereby initiating the actions described above. If the current transaction has already aborted, or the processor is in any state other than the Transaction state, this instruction acts as a NOP.
- Transactional Memory Reference instructions: For the following group of Transactional Memory Reference instructions, if the processor's state is Normal, executing any of them sets the processor state to Transaction. These instructions may come in single-word and double-word, integer, and floating-point forms.
- LT Load Transactional
- This instruction acts exactly like an ordinary Load instruction, except that it sets the cache state to the ST state instead of the S state. If the cache is already in the S or E state, it transitions to ST; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to ST. If the cache is already in any *T state, the state remains unchanged.
- LTX Load Transactional Exclusive
- This instruction acts exactly like an ordinary Load instruction, except that it issues a read exclusive request to the directory and sets the cache state to the ET state instead of the S state. If the cache is already in the S, ST, or E states, it sends an Upgrade request to the directory and transitions to ET; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to the ET state. If the cache is already in ET or DT state, the state remains unchanged. This instruction may replace a LL instruction.
- STX Store Transactional—Performs a Store and adds the referenced memory location to the Write Set of the current transaction.
- This instruction acts exactly like an ordinary Store instruction, except that it sets the cache state to the DT state instead of the D state. If the cache is already in the S, ST, or E states, it sends an Upgrade request to the directory and transitions to the DT state; if already in the D state it performs an ordinary Writeback with data retained and transitions to the DT state; if already in the ET state, the cache transitions to the DT state. If the cache is already in the DT state, the state remains unchanged.
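The per-instruction rules for LT, LTX, and STX given above can be collected into one lookup table. The following Python sketch is illustrative only: the table name, the `transactional_access` helper, and the action labels (`"miss"`, `"upgrade"`, `"writeback_retained"`) are hypothetical, and the handling of a line that starts in the I state is an assumption (the text says only that each instruction otherwise behaves like its ordinary counterpart).

```python
# Each entry maps a current cache state to (next state, coherency action).
# Action labels are hypothetical names for the behaviour described in the text.
TX_TRANSITIONS = {
    "LT": {
        "I": ("ST", "miss"),                 # assumed: ordinary load miss, ends in ST
        "S": ("ST", None),
        "E": ("ST", None),
        "D": ("ST", "writeback_retained"),   # Writeback with Data Retained
        "ST": ("ST", None),                  # any *T state is left unchanged
        "ET": ("ET", None),
        "DT": ("DT", None),
    },
    "LTX": {
        "I": ("ET", "miss"),                 # assumed: read exclusive request
        "S": ("ET", "upgrade"),
        "ST": ("ET", "upgrade"),
        "E": ("ET", "upgrade"),
        "D": ("ET", "writeback_retained"),
        "ET": ("ET", None),
        "DT": ("DT", None),
    },
    "STX": {
        "I": ("DT", "miss"),                 # assumed: ordinary store miss, ends in DT
        "S": ("DT", "upgrade"),
        "ST": ("DT", "upgrade"),
        "E": ("DT", "upgrade"),
        "D": ("DT", "writeback_retained"),
        "ET": ("DT", None),
        "DT": ("DT", None),
    },
}


def transactional_access(op: str, state: str):
    """Return (next_state, action) for a transactional instruction on a line."""
    return TX_TRANSITIONS[op][state]
```

For example, `transactional_access("STX", "ET")` yields `("DT", None)`, matching the statement that a line already in the ET state moves to DT without a directory request.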
- FIG. 3 shows the cache state transitions due to instruction execution. The following shows the system behavior for the various cache states under the extended coherency model needed to support the functions described above.
- Invalid (I) Cache line is not in use and contains no valid data.
- the directory may be in any state.
- the directory will be in the S state and its sharing vector will point at this node.
- the collection of all cache lines in the ST state plus all of the cache lines in the Eviction List constitutes the Read Set of a transaction.
- the directory will be in the S state and its sharing vector will point at this node.
- Exclusive (E)—Cache line contains a copy of data that is the same as the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions.
- the directory will be in the E state and its pointer will point at this node.
- the directory will be in the E state and its pointer will point at this node.
- an eviction of the line from the processor's cache will cause the evicted address to be added to the Writeback List and the TVE bit for that cache tag to be set.
- the directory will be in the E state and its pointer will point at this node.
- the directory will be in the E state and its pointer will point at this node.
- an eviction of the line from the processor's cache will cause the evicted address and data to be added to the Writeback List and the TVE bit for that cache tag to be set.
- the state of the processor during memory transactions is maintained in a transaction record of the processor.
- the coherency protocol for the cache lines is extended to include additional states. By providing support for memory transactions along with an expanded cache state implementation, an improved cache coherency protocol is achieved.
- The processing discussed above may be incorporated entirely in computer software code, on a computer readable medium, or be incorporated into a combined software/hardware implementation.
- the cache coherency protocol does not need to be changed. Moreover, the directory structures are unchanged on the memory modules. Another important advantage is that the footprint of a transaction is not limited by the size of the cache within a processor module. A sequence of instructions can be treated as a single transaction that is either atomically executed with respect to other sequences of instructions or is not executed. The number of distinct memory locations referenced by an instruction sequence as a single transaction, in a system having a processor module with a processor and a cache, is not limited by the size of the cache.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A processor may operate in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon receipt of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line and data associated with the memory location. While in the Transaction operating state, any changes to the data and the cache line are not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction. After changes become visible, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded.
Description
- This application is a continuation application of U.S. application Ser. No. 13/084,280, filed Apr. 11, 2011, which is a continuation application of U.S. application Ser. No. 12/168,689 filed Jul. 7, 2008, now U.S. Pat. No. 7,925,839, which is a continuation of U.S. application Ser. No. 10/836,932 filed Apr. 30, 2004, now U.S. Pat. No. 7,398,359 which claims the benefit of U.S. Provisional Application No. 60/467,019 filed Apr. 30, 2003, all of which are hereby incorporated by reference herein.
- The present invention relates in general to computer system processing and more particularly to a system and method for performing memory operations in a computing system.
- In computer systems, there is a disparity between processor cycle time and memory access time. Since this disparity limits processor utilization, caches have been introduced to solve this problem. Caches, which are based on the principle of locality, provide a small amount of extremely fast memory directly connected to a processor to avoid the delay in accessing the main memory and reduce the bandwidth needed to the main memory. Even though caches significantly improve system performance, a coherency problem occurs as a result of the main memory being updated with new data while the cache contains old data. For shared multi-processor systems, a cache is almost a necessity since access latency to memory is further increased due to contention for the path to the memory. It is not possible for the operating system to ensure coherency since processors need to share data to run parallel programs and processors cannot share a cache due to bandwidth constraints.
- Various algorithms and protocols have been developed to handle cache coherency. For example, in a directory based caching structure, a write invalidate scheme allows for a processor to modify the data in its associated cache at a particular time and force the other processors to invalidate that data in their respective caches. When a processor reads the data previously modified by another processor, the modifying processor is then forced to write the modified data back to the main memory. Though such a scheme handles cache coherency in theory, limitations in system performance are still apparent.
- From the foregoing, it may be appreciated by those skilled in the art that a need has arisen for an extended coherency protocol and an ability to track access to memory locations involved in a transaction and processor state information. In accordance with the present invention, there is provided a system and method for performing memory operations in a computing system that substantially eliminates or greatly reduces disadvantages and problems associated with conventional coherency protocols.
- According to an embodiment of the present invention, there is provided a system for performing memory operations in a computing system that includes a processor that operates in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon execution of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line in a cache memory associated with the memory location to include changing from a MESI coherency protocol to one of a plurality of transactional coherency states associated with the Transaction operating state. While in the Transaction operating state, any changes to the data and the cache line are not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction.
- After changes become visible and the cache line is returned to the MESI coherency protocol, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded. Upon discarding the changes, the processor transitions to a Suspended state and awaits receipt of a commit instruction before transitioning to the Normal operating state.
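The operating-state flow summarized above can be sketched as a small transition table. This is an illustrative Python model rather than the described hardware: the event names (`tx_mem_ref`, `commit`, `abort_event`, `cleanup_done`) and the `step` and `run` helpers are hypothetical, and any stall or NOP behaviour is modelled simply as leaving the state unchanged.

```python
from enum import Enum, auto


class OpState(Enum):
    NORMAL = auto()
    TRANSACTION = auto()
    COMMIT = auto()
    ABORT = auto()
    SUSPENDED = auto()


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (OpState.NORMAL, "tx_mem_ref"): OpState.TRANSACTION,    # transaction instruction
    (OpState.TRANSACTION, "commit"): OpState.COMMIT,        # commit instruction
    (OpState.TRANSACTION, "abort_event"): OpState.ABORT,    # abort before commit
    (OpState.COMMIT, "cleanup_done"): OpState.NORMAL,       # changes made visible
    (OpState.ABORT, "cleanup_done"): OpState.SUSPENDED,     # changes discarded
    (OpState.SUSPENDED, "commit"): OpState.NORMAL,          # awaited commit arrives
}


def step(state: OpState, event: str) -> OpState:
    """Return the next operating state; unlisted pairs leave the state as is."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=OpState.NORMAL):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

Under this model, an aborted transaction reaches Normal again only after a commit event is seen in the Suspended state, which mirrors the requirement that the processor await a commit instruction.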
- The present invention provides various technical advantages over conventional coherency protocols. For example, one technical advantage is to treat memory access and operations as transactions. Another technical advantage is to provide a transaction record in the processor to track the state of the processor during memory transactions. Yet another technical advantage is to integrate an extended cache coherency protocol with the transaction record of the processor. Embodiments of the present invention may include all, some, or none of these technical advantages while other technical advantages may be readily apparent to those skilled in the art from the following figures, description, and claims.
- For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings, wherein like reference numerals represent like parts, in which:
- FIG. 1 illustrates a state diagram for a processor in a computing system;
- FIG. 2 illustrates the implementation of a transaction record maintained by the processor;
- FIG. 3 illustrates the cache coherency state transitions due to instruction execution.
- FIG. 1 shows a state diagram of the transition states that are entered into by a processor during operation. The transition states include Normal, Transaction, Commit, Abort, and Suspended. The Normal state indicates that there is no active transaction to process. The Transaction state indicates that a transaction is in progress. The Commit state indicates that a transaction has successfully completed but the transaction is in the process of being cleaned. The Abort state indicates that a transaction has been aborted but the transaction is still in the process of being cleaned. The Suspended state indicates that a transaction has been aborted and cleaned but the processor has not executed a Commit or Abort instruction.
- In order to support transactions, the processor provides support for tracking access to memory locations involved in a transaction and state information for recording the processor's transaction state. To track transaction states, each processor maintains a Transaction Record as well as a mechanism (such as a pointer to a free list) to obtain memory locations for storage of additional transaction state information. In addition, the primary data cache state field is expanded to include the states of Invalid (I), Shared (S), Exclusive (E), Dirty (D), Shared Transactional (ST), Exclusive Transactional (ET), and Dirty Transactional (DT). Each cache tag also includes two added bits, TV and TVE, to indicate that transaction data formerly resided in that line and has been evicted. The TV bit indicates that data was evicted from the ST state. The TVE bit indicates that data was evicted from the ET or DT state. These bits are persistent through changes to the tag but are cleared when the transaction state is cleaned up during the Abort or Commit states.
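As a rough illustration of the expanded cache tag, the sketch below models the seven line states and the two added bits in Python. The class and method names (`LineState`, `CacheTag`, `evict`, `retag`, `clean_up`) are hypothetical, and representing the tag as a full address is an assumption made for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class LineState(Enum):
    I = "Invalid"
    S = "Shared"
    E = "Exclusive"
    D = "Dirty"
    ST = "Shared Transactional"
    ET = "Exclusive Transactional"
    DT = "Dirty Transactional"


@dataclass
class CacheTag:
    address: int                      # assumption: tag modelled as a full address
    state: LineState = LineState.I
    tv: bool = False                  # data was evicted from the ST state
    tve: bool = False                 # data was evicted from the ET or DT state

    def evict(self) -> None:
        """Evict the line, recording which kind of transactional data left it."""
        if self.state is LineState.ST:
            self.tv = True
        elif self.state in (LineState.ET, LineState.DT):
            self.tve = True
        self.state = LineState.I

    def retag(self, address: int, state: LineState) -> None:
        """Reuse the line for another address; TV and TVE persist across this."""
        self.address = address
        self.state = state

    def clean_up(self) -> None:
        """Clear both bits, as happens during Abort or Commit cleanup."""
        self.tv = False
        self.tve = False
```

Keeping the bits on the tag rather than on the data lets a later coherency request learn that evicted transactional data once lived in the line, even after the line has been reused.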
- FIG. 2 shows the implementation of a Transaction Record maintained by the processor. The Transaction Record is a set of hardware registers in the processor storing the following fields: TState {Normal, Transaction, Commit, Abort, Suspended}, WBPtr {pointer to WBRrecord}, and EvictPtr {pointer to evicted shared addresses}. Other information may be included to support additional functionality. When in the Normal state, the processor begins a transaction with the execution of any Transactional Memory Reference instruction (see following section for description of these instructions). This causes transition 1 in the state diagram of FIG. 1 and causes the processor to set the Transaction Record to the Transaction state. As long as the processor remains in the Normal state, it is not involved in a transaction and its actions obey the conventional coherency protocols.
- Upon entering the Transaction state, the processor's behavior changes: it is now engaged in a transaction and, from that point until a successful Commit, it does nothing that would change the state of memory visible to other processors in the system. The processor's cache holds the changes it makes, and any data evicted from the primary data cache is copied into an eviction list instead of being sent back to its normal memory location. Upon entering the Commit state, all changes to memory performed during the transaction are made globally visible. If, instead, the transaction aborts, the locations in the cache containing changes made during the transaction and the evicted writebacks are discarded, restoring the state of memory (as viewed by all processors) to what it was at the beginning of the transaction.
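A minimal software model of the Transaction Record might look as follows. It is a sketch under stated assumptions: Python lists stand in for the linked structures reached through WBPtr and EvictPtr, the function names are hypothetical, and the routing of evicted lines (ST addresses to the eviction list, ET and DT lines to the writeback list, with data kept only for DT) is one reading of the eviction behaviour described in this document.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Tuple


class TState(Enum):
    NORMAL = auto()
    TRANSACTION = auto()
    COMMIT = auto()
    ABORT = auto()
    SUSPENDED = auto()


@dataclass
class TransactionRecord:
    tstate: TState = TState.NORMAL
    # Stand-in for WBPtr: writeback entries as (address, data) pairs.
    writeback_list: List[Tuple[int, Optional[bytes]]] = field(default_factory=list)
    # Stand-in for EvictPtr: addresses of evicted shared transactional lines.
    evict_list: List[int] = field(default_factory=list)


def on_transactional_reference(rec: TransactionRecord) -> None:
    """A Transactional Memory Reference in Normal starts a transaction."""
    if rec.tstate is TState.NORMAL:
        rec.tstate = TState.TRANSACTION


def on_eviction(rec: TransactionRecord, address: int, line_state: str,
                data: Optional[bytes] = None) -> bool:
    """Divert an evicted transactional line into the record's lists.

    Returns True if the eviction was diverted, False if the ordinary
    eviction path should handle it instead.
    """
    if rec.tstate is not TState.TRANSACTION:
        return False
    if line_state == "ST":
        rec.evict_list.append(address)
        return True
    if line_state in ("ET", "DT"):
        # Only a DT line carries modified data that must be preserved.
        rec.writeback_list.append((address, data if line_state == "DT" else None))
        return True
    return False
```

Because evicted lines are parked in these lists rather than written to memory, the footprint of a transaction in this model is not bounded by the cache size.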
- While in the Transaction state, any transactional load instruction to a new address adds that address to the transaction's Read Set, and any transactional load exclusive or transactional store instruction adds that address to the transaction's Write Set. Any attempt by another processor to write to an address in the Read Set, or to read from or write to an address in the Write Set, will cause the current transaction to abort (transition 3 in the state diagram of FIG. 1). An abort will also be caused by any exception during the transaction or by the execution of an Abort instruction. Certain simple exceptions, especially Translation Lookaside Buffer (TLB) misses (if these are still handled in software), may be permitted to occur without causing an abort. An ABORT instruction may be added at the beginning of the exception handlers instead of doing the abort in hardware.
- While in the Transaction state, the processor's response to incoming coherency messages (Invalidate, Update, and Intervention requests) is modified as follows. Invalidate and Update requests are processed normally, except that if the primary cache line a request targets has a TV or TVE bit set, the coherency address is also checked against all addresses in the Evicted or Writeback list, respectively. If both bits are set, both lists will be checked. If the coherency address matches any address in one of these lists, or if it hits a line in the ST, ET, or DT states, the transaction aborts (see below for details of the abort operation). Intervention requests that match the tag of a line in the DT state will be processed as if the line were in the ET state: the processor responds with a message indicating that the contents of memory should be used. If the TVE bit for the line is set, the Intervention address is also checked against the Writeback list. If the Intervention address matches a tag or a list address, the transaction aborts.
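The conflict rules above reduce to a membership test on the Read Set and Write Set. The Python sketch below is a simplification under stated assumptions: each set is flattened into a plain set of addresses (covering both resident transactional lines and the corresponding evicted list), the TV/TVE bits that decide whether the lists need to be searched are omitted, and the function name and request labels are hypothetical.

```python
def causes_abort(kind: str, address: int, read_set: set, write_set: set) -> bool:
    """Decide whether an incoming coherency request aborts the transaction.

    Invalidate and Update requests signal a write by another processor, so
    they conflict with any address the transaction has read or written.
    An Intervention signals a read of a line this processor holds
    exclusively, so it conflicts only with the Write Set.
    """
    if kind in ("invalidate", "update"):
        return address in read_set or address in write_set
    if kind == "intervention":
        return address in write_set
    raise ValueError(f"unknown coherency request kind: {kind!r}")
```

For instance, with `read_set = {0x100}` and `write_set = {0x200}`, an Invalidate for `0x100` aborts the transaction, while an Intervention for `0x100` does not.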
- Other than an abort, the only way to exit the Transaction state is the execution of a Commit instruction, which causes the transaction state machine to go to the Commit state (transition 2 in the state diagram of FIG. 1). Upon execution of a COMMIT instruction while in the Transaction state, the processor enters the Commit state. In this state, all changes to memory performed during the committed transaction are made visible to the rest of the system. To accomplish this, the following actions are performed:
- The Evicted Address list is discarded and the tokens in the list are attached to the end of the free list. The Evict Pointer is set to null.
- All writebacks stored in the Writeback list are converted to WEack messages and written to their home node. All tokens in the Writeback list are attached to the end of the free list. The Writeback Pointer is set to null. The L2 cache is invalidated at the address of the writeback if that address is currently stored in the L2 cache.
- All TV and TVE bits in the primary cache are set to zero.
- All cache lines in the ST state transition to the S state. All cache lines in the ET state transition to the E state. All cache lines in the DT state transition to the D state.
- Upon completion of the above actions, the processor transitions to the Normal state (transition 4 in the state diagram of FIG. 1).
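- The commit actions listed above can be sketched as a single Python function. This is an illustrative model only: the function name `commit`, the dictionary layout of the cache, and the representation of list tokens as plain Python list entries are assumptions made for the example. Setting the Evict Pointer and Writeback Pointer to null is modeled by emptying the lists.

```python
def commit(cache, evicted, writeback, free_list, l2):
    """Apply the Commit-state actions; return the messages sent to home nodes.

    cache:     dict addr -> {"state": str, "tv": bool, "tve": bool}
    evicted:   list of addresses (tokens of the Evicted Address list)
    writeback: list of (addr, data) pairs (tokens of the Writeback list)
    free_list: list that receives the released tokens
    l2:        dict addr -> data modeling the L2 cache
    """
    messages = []
    # Discard the Evicted Address list; its tokens go to the end of the free list.
    free_list.extend(evicted)
    evicted.clear()
    # Convert each held writeback to a WEack message and invalidate L2 at that address.
    for addr, data in writeback:
        messages.append(("WEack", addr, data))
        l2.pop(addr, None)
    free_list.extend(writeback)
    writeback.clear()
    # Clear all TV/TVE bits and drop the transactional marking from each line.
    to_normal = {"ST": "S", "ET": "E", "DT": "D"}
    for line in cache.values():
        line["tv"] = line["tve"] = False
        line["state"] = to_normal.get(line["state"], line["state"])
    return messages
```

- After the call, dirty transactional lines remain in the cache as ordinary D lines, so the committed data stays local until a normal writeback occurs.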
- While in the Commit state, incoming Intervention, Invalidate, and Update requests are held until the processor exits this state. It may be feasible to handle these requests in this state as a performance optimization by taking the actions needed to produce the same result as would occur after the Commit state is complete. Any transactional memory reference instruction that is issued stalls until the processor exits the Commit state. Commit and Abort instructions are treated as no operation instructions (NOPs) if executed when the processor is not in the Transaction state. In some implementations, these instructions trap if an attempt is made to execute them when already in the Commit state.
- When in the Transaction state, the following situations will cause a transition to the Abort state (transition 3 in the state diagram of FIG. 1), aborting the current transaction:
- Execution of an Abort instruction.
- The processor takes an exception.
- An Invalidate or Update Request is received whose address matches any cache line that is part of the Read Set.
- An Intervention is received whose address matches any cache line that is part of the Write Set.
- Upon an abort, the processor enters the Abort state. In this state, all changes to memory performed during the aborted transaction are discarded, restoring the contents of the Write Set to their state prior to the start of the transaction. To accomplish this, the following actions are performed:
- Eliminate messages may be sent to the directory for all addresses in the Evicted Address list (this is a performance optimization which is optional). The Evicted Address list is discarded and the tokens in the list are attached to the end of the free list. The Evict Pointer is set to null.
- Eliminate messages may be sent to the directory for all addresses in the Writeback list (this is a performance optimization which is optional). All writebacks stored in the Writeback list are discarded. All tokens in the Writeback list are attached to the end of the free list. The Writeback Pointer is set to null. The L2 cache is invalidated at the address of the writeback if that address is currently stored in the L2 cache.
- All TV and TVE bits in the primary cache are set to zero.
- All cache lines in the ST state transition to the S state. All cache lines in the ET state transition to the E state. All cache lines in the DT state transition to the I state. Eliminate messages may be sent to the directory for all cache lines transitioned to the I state.
- Upon completion of the above actions, the processor transitions to the Suspended state (transition 5 in the state diagram of FIG. 1), where it remains until a Commit instruction is executed (Commit instructions stall if dispatched while in the Abort state and execute as soon as the transition to the Suspended state occurs).
- While in the Abort state, incoming Intervention, Invalidate, and Update requests are held until the processor exits this state. It may be feasible to handle these requests in this state as a performance optimization by taking the actions needed to produce the same result as would occur after the abort operation is complete. Any transactional memory reference instruction that is issued stalls until the processor exits the Abort state.
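- The abort cleanup can be sketched in the same style. The points that differ from a commit are that held writebacks are dropped rather than sent, DT lines are invalidated rather than kept, and Eliminate messages are optional. The function name `abort`, the `send_eliminate` flag, and the data layout are assumptions made for this illustration.

```python
def abort(cache, evicted, writeback, free_list, l2, send_eliminate=True):
    """Apply the Abort-state actions; return any Eliminate messages sent.

    cache:     dict addr -> {"state": str, "tv": bool, "tve": bool}
    evicted:   list of addresses (tokens of the Evicted Address list)
    writeback: list of (addr, data) pairs (tokens of the Writeback list)
    """
    messages = []
    if send_eliminate:  # optional performance optimization
        messages += [("Eliminate", addr) for addr in evicted]
        messages += [("Eliminate", addr) for addr, _ in writeback]
    free_list.extend(evicted)
    evicted.clear()
    # Held writebacks are discarded, never sent; L2 copies are invalidated.
    for addr, _ in writeback:
        l2.pop(addr, None)
    free_list.extend(writeback)
    writeback.clear()
    # Speculative data in DT lines is thrown away by invalidating the line.
    restore = {"ST": "S", "ET": "E", "DT": "I"}
    for addr, line in cache.items():
        line["tv"] = line["tve"] = False
        if line["state"] == "DT" and send_eliminate:
            messages.append(("Eliminate", addr))
        line["state"] = restore.get(line["state"], line["state"])
    return messages
```

- Because no WEack message is ever produced on this path, memory keeps the values it held before the transaction started.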
- The processor enters the Suspended state as soon as it completes the cleanup of the aborted transaction in the Abort state. While in the Suspended state, the processor executes as in the Normal state except that all transactional memory reference instructions are treated as NOPs. Upon executing a Commit instruction, the processor transitions to the Normal state, making it ready to begin another transaction.
- The following new processor instructions are added:
- TEST T (R)—Sets register R to a non-zero Reason Code (reason codes to be defined) if the processor is currently in the Abort or Suspended states; sets R to zero otherwise. This instruction is used to test to see whether the current transaction has been aborted to allow skipping the execution of useless instructions.
- ABORT—Aborts the current transaction—If the processor is in the Transaction state, sets the Transaction State to the Abort state thereby initiating the actions described above. If the current transaction has already aborted, or the processor is in any state other than the Transaction state, this instruction acts as a NOP.
- COMMIT (R)—Attempts to commit the current transaction—If the processor is in the Transaction state, sets the Transaction state to the Commit state, performing the commit of the current transaction, as described above. If the current transaction has already aborted (the processor being in the Suspended state), the COMMIT instruction causes a transition to the Normal state. If the current state is the Abort state, the COMMIT instruction stalls until transaction cleanup completes and the processor transitions to the Normal state. Register R is set to a non-zero Reason Code (reason codes to be defined) if the processor is currently in the Abort or Suspended states; R is set to zero otherwise. If executed while in the Normal or Commit states, a COMMIT instruction acts as a NOP or may cause an exception.
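- The interaction of TEST, ABORT, and COMMIT with the processor states can be sketched as a small state machine. This is a simplified model under stated assumptions: the class and method names are hypothetical, the Commit and Abort states are treated as completing immediately (so stalls are not modeled), a COMMIT in the Normal state is treated as a NOP rather than an exception, and the reason code value 1 is a placeholder because the text leaves reason codes undefined.

```python
from enum import Enum

class State(Enum):
    NORMAL = "Normal"
    TRANSACTION = "Transaction"
    COMMIT = "Commit"
    ABORT = "Abort"
    SUSPENDED = "Suspended"

REASON_ABORT_INSTRUCTION = 1  # hypothetical value; reason codes are undefined in the text

class Processor:
    def __init__(self):
        self.state = State.NORMAL
        self.reason = 0

    def tx_ref(self):
        """Model any transactional memory reference; return True if it is performed."""
        if self.state is State.SUSPENDED:
            return False                      # treated as a NOP after an abort
        if self.state is State.NORMAL:
            self.state = State.TRANSACTION    # first reference starts a transaction
        return True

    def abort(self, reason=REASON_ABORT_INSTRUCTION):
        if self.state is not State.TRANSACTION:
            return                            # NOP outside the Transaction state
        self.state = State.ABORT              # transition 3
        self.reason = reason
        self.state = State.SUSPENDED          # transition 5, after cleanup

    def test(self):
        """TEST: non-zero reason code if aborted, zero otherwise."""
        return self.reason if self.state in (State.ABORT, State.SUSPENDED) else 0

    def commit(self):
        """COMMIT: return the value placed in register R."""
        if self.state is State.TRANSACTION:
            self.state = State.COMMIT         # transition 2
            self.state = State.NORMAL         # transition 4, after commit actions
            return 0
        if self.state is State.SUSPENDED:
            reason, self.reason = self.reason, 0
            self.state = State.NORMAL
            return reason
        return 0                              # NOP in the Normal state
```

- In this model, software can call `test()` in the middle of a long transaction to skip remaining work, and must still execute `commit()` to return to the Normal state and learn why the transaction failed.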
- For the following group of Transactional Memory Reference instructions, if the processor's state is Normal, executing these sets the processor state to Transaction. These instructions may be in single and double word, integer, and floating point forms.
- LT (Load Transactional)—Performs a Load for read access only and adds the referenced memory location to the Read Set of the current transaction. This instruction acts exactly like an ordinary Load instruction, except that it sets the cache state to the ST state instead of the S state. If the cache is already in the S or E state, it transitions to ST; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to ST. If the cache is already in any *T state, the state remains unchanged.
- LTX (Load Transactional Exclusive)—Performs a Load for write access and adds the referenced memory location to the Write Set of the current transaction. This instruction acts exactly like an ordinary Load instruction, except that it issues a read exclusive request to the directory and sets the cache state to the ET state instead of the S state. If the cache is already in the S, ST, or E states, it sends an Upgrade request to the directory and transitions to ET; if already in the D state it performs an ordinary Writeback with Data Retained and transitions to the ET state. If the cache is already in ET or DT state, the state remains unchanged. This instruction may replace a LL instruction.
- STX (Store Transactional)—Performs a Store and adds the referenced memory location to the Write Set of the current transaction. This instruction acts exactly like an ordinary Store instruction, except that it sets the cache state to the DT state instead of the D state. If the cache is already in the S, ST, or E states, it sends an Upgrade request to the directory and transitions to the DT state; if already in the D state it performs an ordinary Writeback with data retained and transitions to the DT state; if already in the ET state, the cache transitions to the DT state. If the cache is already in the DT state, the state remains unchanged.
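- The cache state changes described for LT, LTX, and STX can be collected into a lookup table. The sketch below is illustrative: the table name, the `next_state` function, and the action strings are assumptions made for the example. The action for an STX that misses (line in the I state) is assumed to be a read exclusive request, since the text only says that STX behaves like an ordinary Store.

```python
# (new_state, directory_action) for each instruction and current line state.
# Action None means no request or writeback is needed.
TRANSITIONS = {
    "LT": {
        "I": ("ST", "read"),
        "S": ("ST", None),
        "E": ("ST", None),
        "D": ("ST", "writeback with data retained"),
        "ST": ("ST", None), "ET": ("ET", None), "DT": ("DT", None),
    },
    "LTX": {
        "I": ("ET", "read exclusive"),
        "S": ("ET", "upgrade"), "ST": ("ET", "upgrade"), "E": ("ET", "upgrade"),
        "D": ("ET", "writeback with data retained"),
        "ET": ("ET", None), "DT": ("DT", None),
    },
    "STX": {
        "I": ("DT", "read exclusive"),  # assumption: ordinary store miss behavior
        "S": ("DT", "upgrade"), "ST": ("DT", "upgrade"), "E": ("DT", "upgrade"),
        "D": ("DT", "writeback with data retained"),
        "ET": ("DT", None),
        "DT": ("DT", None),
    },
}

def next_state(instr, state):
    """Return (new_state, action) for a transactional reference to a line in `state`."""
    return TRANSITIONS[instr][state]
```

- The writeback on a D line matters for abort handling: it pushes the pre-transaction value to memory before the line becomes transactional, so invalidating a DT line later loses nothing that was committed.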
- FIG. 3 shows the cache state transitions due to instruction execution. The following shows the system behavior for the various cache states under the extended coherency model needed to support the functions described above.
- Invalid (I)—Cache line is not in use and contains no valid data. The directory may be in any state.
- Shared (S)—Cache line contains a copy of data which is the same as the contents of memory and the contents of other caches also in S or ST states. The directory will be in the S state and its sharing vector will point at this node.
- Shared Transactional (ST)—Cache line contains a copy of data that is the same as the contents of memory (and the same as the contents of other caches also in the S or ST states). The collection of all cache lines in the ST state plus all of the cache lines in the Eviction List constitutes the Read Set of a transaction. The directory will be in the S state and its sharing vector will point at this node. When a cache line is in the ST state and the processor is in the Transaction state, an eviction of the line from the processor's cache will cause the evicted address to be added to the Eviction List and the TV bit for that cache tag to be set.
- Exclusive (E)—Cache line contains a copy of data that is the same as the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions. The directory will be in the E state and its pointer will point at this node.
- Exclusive Transactional (ET)—Cache line contains a copy of data that is the same as the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions. The directory will be in the E state and its pointer will point at this node. When a cache line is in the ET state and the processor is in the Transaction state, an eviction of the line from the processor's cache will cause the evicted address to be added to the Writeback List and the TVE bit for that cache tag to be set.
- Dirty (D)—Cache line contains modified data that is different from the contents of memory. No other cache in the system contains a copy of this data and the processor may write to this line without performing any coherency transactions. The directory will be in the E state and its pointer will point at this node.
- Dirty Transactional (DT)—Cache line contains modified data that is different from the contents of memory. The directory will be in the E state and its pointer will point at this node. When a cache line is in DT state and the processor is in the Transaction state, an eviction of the line from the processor's cache will cause the evicted address and data to be added to the Writeback List and the TVE bit for that cache tag to be set.
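- The eviction rules for the three transactional states, and the resulting definition of the Read Set, can be sketched as follows. The function names `evict` and `read_set`, the `tag_bits` dictionary, and the use of `None` for an ET entry that carries no data are assumptions made for this illustration.

```python
def evict(addr, state, data, tag_bits, eviction_list, writeback_list):
    """Record the eviction of a line while the processor is in the Transaction state.

    tag_bits is the {"tv": bool, "tve": bool} pair of the cache tag being vacated.
    """
    if state == "ST":
        eviction_list.append(addr)            # address only: data matches memory
        tag_bits["tv"] = True
    elif state == "ET":
        writeback_list.append((addr, None))   # address only: line was never written
        tag_bits["tve"] = True
    elif state == "DT":
        writeback_list.append((addr, data))   # address and the modified data
        tag_bits["tve"] = True
    # Lines in I, S, E, or D follow the ordinary eviction path (not modeled here).

def read_set(cache_states, eviction_list):
    """Read Set: all lines in the ST state plus all addresses in the Eviction List."""
    return {a for a, s in cache_states.items() if s == "ST"} | set(eviction_list)
```

- Spilling evicted transactional lines into these lists is what lets a transaction touch more memory than the cache can hold, at the cost of a list search whenever a request arrives for a slot whose TV or TVE bit is set.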
- In summary, the state of the processor during memory transactions is maintained in a transaction record of the processor. The coherency protocol for the cache lines is extended to include additional states. By providing support for memory transactions along with an expanded cache state implementation, an improved cache coherency protocol is achieved. The processing discussed above may be incorporated entirely in computer software code, on a computer readable medium, or be incorporated into a combined software/hardware implementation.
- One of the advantages provided by the present invention is that the cache coherency protocol does not need to be changed. Moreover, the directory structures are unchanged on the memory modules. Another important advantage is that the footprint of a transaction is not limited by the size of the cache within a processor module. A sequence of instructions can be treated as a single transaction that is either atomically executed with respect to other sequences of instructions or is not executed. The number of distinct memory locations referenced by an instruction sequence as a single transaction, in a system having a processor module with a processor and a cache, is not limited by the size of the cache.
- Thus, it is apparent that there has been provided, in accordance with the present invention, a system and method for performing memory operations in a computing system that satisfies the advantages set forth above. Although the present invention has been described in detail, it should be understood that various changes, substitutions, and alterations may be readily ascertainable by those skilled in the art and may be made herein without departing from the spirit and scope of the present invention as defined by the following claims. Moreover, the present invention is not intended to be limited in any way by any statement made herein that is not otherwise reflected in the appended claims.
Claims (14)
1. A method of performing memory operations in a computing system, comprising:
transitioning a cache line associated with a memory location from a conventional coherency protocol to one of a plurality of extended coherency protocol states associated with an operating state of a processor;
performing an update to the cache line associated with the memory location in accordance with the operating state of the processor, the update to the cache line not being visible to other processors in the computing system; and
tracking access to a memory location by identifying the cache line with the extended coherency protocol state according to the update performed.
2. The method of claim 1, wherein the conventional coherency protocol includes a MESI coherency protocol.
3. The method of claim 1, wherein the plurality of extended coherency protocol states is associated with a Transaction operating state of the processor.
4. The method of claim 1, wherein the plurality of extended coherency protocol states includes a Shared Transactional state characterized by the cache line having a copy of data that is the same as the corresponding contents of the memory and one or more other cache lines also in a Shared Transactional state.
5. The method of claim 4, when the cache line is in the Shared Transactional state and in response to an eviction of an address from the cache line, further comprising:
adding the evicted address to an Eviction List; and
setting one of two cache tag constituent elements.
6. The method of claim 1, wherein the plurality of extended coherency protocol states include an Exclusive state characterized by the cache line having an exclusive copy of data that is the same as the corresponding contents of the memory, such that no other cache has a copy of said data.
7. The method of claim 6, when the cache line is in the Exclusive state, further comprising writing to the cache line without performing a coherency transaction.
8. The method of claim 1, wherein the plurality of extended coherency protocol states include an Exclusive Transactional state characterized by the cache line having an exclusive copy of data that is the same as the corresponding contents of the memory, such that no other cache has a copy of said data.
9. The method of claim 8, when the cache line is in the Exclusive Transactional state and in response to an eviction of an address from the cache line, further comprising:
adding the evicted address to a Writeback List; and
setting one of two cache tag constituent elements.
10. The method of claim 8, further comprising writing to the cache line without performing a coherency transaction when the cache line is in the Exclusive Transactional state.
11. The method of claim 1, wherein the plurality of extended coherency protocol states include a Dirty state characterized by the cache line having modified data that is different from the corresponding contents of the memory, and wherein no other cache has a copy of the modified data.
12. The method of claim 11, further comprising writing to the cache line without performing a coherency transaction when the cache line is in the Exclusive Transactional state.
13. The method of claim 1, wherein the plurality of extended coherency protocol states include a Dirty Transactional state characterized by the cache line having modified data that is different from the corresponding contents of the memory.
14. The method of claim 13, when the cache line is in the Dirty Transactional state and in response to an eviction of an address from the cache line, further comprising:
adding the evicted address and data to a Writeback List; and
setting one of two cache tag constituent elements.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/683,367 US20130080709A1 (en) | 2003-04-30 | 2012-11-21 | System and Method for Performing Memory Operations In A Computing System |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US46701903P | 2003-04-30 | 2003-04-30 | |
US10/836,932 US7398359B1 (en) | 2003-04-30 | 2004-04-30 | System and method for performing memory operations in a computing system |
US12/168,689 US7925839B1 (en) | 2003-04-30 | 2008-07-07 | System and method for performing memory operations in a computing system |
US13/084,280 US8321634B2 (en) | 2003-04-30 | 2011-04-11 | System and method for performing memory operations in a computing system |
US13/683,367 US20130080709A1 (en) | 2003-04-30 | 2012-11-21 | System and Method for Performing Memory Operations In A Computing System |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/084,280 Continuation US8321634B2 (en) | 2003-04-30 | 2011-04-11 | System and method for performing memory operations in a computing system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130080709A1 (en) | 2013-03-28 |
Family
ID=39589685
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/836,932 Active 2025-05-08 US7398359B1 (en) | 2003-04-30 | 2004-04-30 | System and method for performing memory operations in a computing system |
US12/168,689 Expired - Lifetime US7925839B1 (en) | 2003-04-30 | 2008-07-07 | System and method for performing memory operations in a computing system |
US13/084,280 Expired - Lifetime US8321634B2 (en) | 2003-04-30 | 2011-04-11 | System and method for performing memory operations in a computing system |
US13/683,367 Abandoned US20130080709A1 (en) | 2003-04-30 | 2012-11-21 | System and Method for Performing Memory Operations In A Computing System |
Country Status (1)
Country | Link |
---|---|
US (4) | US7398359B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150378939A1 (en) * | 2014-06-27 | 2015-12-31 | Analog Devices, Inc. | Memory mechanism for providing semaphore functionality in multi-master processing environment |
WO2017052764A1 (en) * | 2015-09-25 | 2017-03-30 | Intel Corporation | Memory controller for multi-level system memory having sectored cache |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010052799A1 (en) * | 2008-11-10 | 2010-05-14 | 富士通株式会社 | Information processing device and memory control device |
WO2013085518A1 (en) * | 2011-12-08 | 2013-06-13 | Intel Corporation | A method, apparatus, and system for efficiently handling multiple virtual address mappings during transactional execution |
US10437602B2 (en) | 2012-06-15 | 2019-10-08 | International Business Machines Corporation | Program interruption filtering in transactional execution |
US9436477B2 (en) | 2012-06-15 | 2016-09-06 | International Business Machines Corporation | Transaction abort instruction |
US9317460B2 (en) | 2012-06-15 | 2016-04-19 | International Business Machines Corporation | Program event recording within a transactional environment |
US8682877B2 (en) | 2012-06-15 | 2014-03-25 | International Business Machines Corporation | Constrained transaction execution |
US9367323B2 (en) | 2012-06-15 | 2016-06-14 | International Business Machines Corporation | Processor assist facility |
US9740549B2 (en) | 2012-06-15 | 2017-08-22 | International Business Machines Corporation | Facilitating transaction completion subsequent to repeated aborts of the transaction |
US8880959B2 (en) * | 2012-06-15 | 2014-11-04 | International Business Machines Corporation | Transaction diagnostic block |
US8966324B2 (en) | 2012-06-15 | 2015-02-24 | International Business Machines Corporation | Transactional execution branch indications |
US9442737B2 (en) | 2012-06-15 | 2016-09-13 | International Business Machines Corporation | Restricting processing within a processor to facilitate transaction completion |
US9772854B2 (en) | 2012-06-15 | 2017-09-26 | International Business Machines Corporation | Selectively controlling instruction execution in transactional processing |
US20130339680A1 (en) * | 2012-06-15 | 2013-12-19 | International Business Machines Corporation | Nontransactional store instruction |
US9448796B2 (en) | 2012-06-15 | 2016-09-20 | International Business Machines Corporation | Restricted instructions in transactional execution |
US9384004B2 (en) | 2012-06-15 | 2016-07-05 | International Business Machines Corporation | Randomized testing within transactional execution |
US8688661B2 (en) | 2012-06-15 | 2014-04-01 | International Business Machines Corporation | Transactional processing |
US9361115B2 (en) | 2012-06-15 | 2016-06-07 | International Business Machines Corporation | Saving/restoring selected registers in transactional processing |
US9336046B2 (en) | 2012-06-15 | 2016-05-10 | International Business Machines Corporation | Transaction abort processing |
US9348642B2 (en) | 2012-06-15 | 2016-05-24 | International Business Machines Corporation | Transaction begin/end instructions |
US20210318961A1 (en) * | 2021-06-23 | 2021-10-14 | Intel Corporation | Mitigating pooled memory cache miss latency with cache miss faults and transaction aborts |
US20240303206A1 (en) * | 2023-03-07 | 2024-09-12 | International Business Machines Corporation | Using a transient cache list and prolonged cache list to manage tracks in cache based on a demotion hint with a track access request |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030093624A1 (en) * | 2001-10-16 | 2003-05-15 | International Business Machines Corp. | Symmetric multiprocessor systems with an independent super-coherent cache directory |
US20040117554A1 (en) * | 2002-12-17 | 2004-06-17 | International Business Machines Corporation | Adaptive shared data interventions in coupled broadcast engines |
US6948035B2 (en) * | 2002-05-15 | 2005-09-20 | Broadcom Corporation | Data pend mechanism |
US7502917B2 (en) * | 2002-12-05 | 2009-03-10 | International Business Machines Corporation | High speed memory cloning facility via a lockless multiprocessor mechanism |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4412303A (en) | 1979-11-26 | 1983-10-25 | Burroughs Corporation | Array processor architecture |
US4403286A (en) | 1981-03-06 | 1983-09-06 | International Business Machines Corporation | Balancing data-processing work loads |
JPH09138716A (en) * | 1995-11-14 | 1997-05-27 | Toshiba Corp | Electronic computer |
US5892962A (en) | 1996-11-12 | 1999-04-06 | Lucent Technologies Inc. | FPGA-based processor |
US6098156A (en) * | 1997-07-22 | 2000-08-01 | International Business Machines Corporation | Method and system for rapid line ownership transfer for multiprocessor updates |
US6256753B1 (en) | 1998-06-30 | 2001-07-03 | Sun Microsystems, Inc. | Bus error handling in a computer system |
US6526481B1 (en) | 1998-12-17 | 2003-02-25 | Massachusetts Institute Of Technology | Adaptive cache coherence protocols |
US6823516B1 (en) * | 1999-08-10 | 2004-11-23 | Intel Corporation | System and method for dynamically adjusting to CPU performance changes |
US6640289B2 (en) * | 2000-11-15 | 2003-10-28 | Unisys Corporation | Software controlled cache line ownership affinity enhancements in a multiprocessor environment |
JP4306152B2 (en) * | 2001-06-26 | 2009-07-29 | 株式会社日立製作所 | Web system with clustered application server and database structure |
US6899276B2 (en) | 2002-02-15 | 2005-05-31 | Axalto Sa | Wrapped-card assembly and method of manufacturing the same |
US6912612B2 (en) * | 2002-02-25 | 2005-06-28 | Intel Corporation | Shared bypass bus structure |
US6877056B2 (en) * | 2002-06-28 | 2005-04-05 | Sun Microsystems, Inc. | System with arbitration scheme supporting virtual address networks and having split ownership and access right coherence mechanism |
Also Published As
Publication number | Publication date |
---|---|
US7398359B1 (en) | 2008-07-08 |
US7925839B1 (en) | 2011-04-12 |
US8321634B2 (en) | 2012-11-27 |
US20110191545A1 (en) | 2011-08-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| AS | Assignment | Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SILICON GRAPHICS INTERNATIONAL CORP.;REEL/FRAME:044128/0149 Effective date: 20170501 |