CN108255777B - Embedded floating point type DSP hard core structure for FPGA - Google Patents

Publication number
CN108255777B
Authority
CN
China
Prior art keywords
unit
input
floating
dsp
adder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810056827.2A
Other languages
Chinese (zh)
Other versions
CN108255777A (en
Inventor
赵赫
杨海钢
黄志洪
魏星
李小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Electronics of CAS
Original Assignee
Institute of Electronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Electronics of CAS filed Critical Institute of Electronics of CAS
Priority to CN201810056827.2A priority Critical patent/CN108255777B/en
Publication of CN108255777A publication Critical patent/CN108255777A/en
Application granted granted Critical
Publication of CN108255777B publication Critical patent/CN108255777B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7828Architectures of general purpose stored program computers comprising a single central processing unit without memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7896Modular architectures, e.g. assembled from a number of identical packages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Nonlinear Science (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

The present disclosure provides an embedded floating-point DSP hard core structure for an FPGA, comprising: a first input unit, composed of an input register group and a dedicated adder for floating-point multiplication, which selects between registering and bypassing the input data according to a corresponding configuration bit; a multiplier unit, connected to the first input unit, which receives the registered input data from the previous stage; a second input unit, including a second input register group connected to the output of the multiplier unit; a multiplexer group unit, whose inputs are connected to the output of the second input unit and the output of the first input unit; an ALU unit, comprising an adder and a logic operation unit, which is used for addition, subtraction, and multiplication of floating-point and fixed-point numbers and provides logic operations for fixed-point numbers; and an output unit. Because data processing and operation are completed inside the structure, its operation efficiency is markedly higher than that of floating-point operations implemented with a soft core.

Description

Embedded floating point type DSP hard core structure for FPGA
Technical Field
The present disclosure relates to the field of FPGAs, and in particular to an embedded floating-point DSP hard core structure for an FPGA.
Background
Thanks to advantages such as programmability, high parallelism, and good flexibility, FPGAs are widely used in communications, aerospace, military, and other fields. Digital signal processing is an important application area, and mainstream FPGA products in the industry now generally integrate programmable digital signal processing modules. For example, the Xilinx Virtex-7 contains 3600 DSP48E1 units and supports operations such as multiply-add, multiply-subtract, and multiply-accumulate. The Altera Stratix-V contains 532 DSP units; a single DSP IP core can be split according to application requirements so that most functions are implemented with minimal resources, and it supports operations such as multiply-add, multiply-subtract, and multiply-accumulate, but it does not support addition and accumulation. When processing digital signals, an FPGA often needs to call multiple DSP modules to perform various mathematical operations on the signals. As data volumes grow, however, the signals to be processed are moving from the original fixed-point representation to floating-point numbers, which cover a larger numerical range; radar and navigation signals, for example, are represented as floating-point numbers.
Floating-point numbers are widely used in practice. For example, collected radar signals are sent to a computer for signal processing in floating-point form. To meet such application requirements, the FPGA products of Altera and Xilinx provide IP soft cores for floating-point operations. Taking Xilinx as an example, the company has developed a floating-point operation IP soft core that is called through the IP Catalog function in Vivado. It supports a variety of floating-point operations and, in addition to the basic operations, provides operations such as exponential, logarithm, and square root; the specific functions provided are listed in Table 1 below. Generating a hard core circuit structure by modeling the algorithm in a hardware description language is feasible, but it is complicated and has a long development cycle. Most FPGA manufacturers therefore adopt the traditional implementation of floating-point operations, namely the IP soft core approach, in which the operations are implemented with logic resources or DSP blocks.
TABLE 1 Xilinx Floating-Point IP supported operations
[Table 1 is available only as an image in the source: Figure BDA0001552906450000021]
The DSP modules in current FPGA products use a fixed-point DSP structure. In Altera's EDA software Quartus II and Xilinx's EDA software Vivado, the logic-control part of a floating-point operation is mapped to FPGA logic resources such as LUTs, while the multiplications and additions are mapped to the fixed-point multipliers and adders of the DSP. Although this approach is convenient, it occupies too many resources, and the operation efficiency of the IP soft core is low. In response to the demand for high-speed floating-point operation, Intel has embedded a floating-point hard core DSP module in its latest products to improve FPGA support for floating-point calculation, but no chip has been provided so far.
To solve the above problems, the present disclosure provides a hard core floating-point DSP structure that improves the efficiency of floating-point operations and reduces the use of logic resources in the FPGA.
Brief Summary of the Present Disclosure
(I) Technical problem to be solved
The present disclosure provides an embedded floating-point DSP hard core structure for an FPGA to at least partially solve the technical problems set forth above.
(II) Technical solution
According to one aspect of the present disclosure, there is provided an embedded floating-point DSP hard core structure for an FPGA, comprising: a first input unit, composed of an input register group and a dedicated adder for floating-point multiplication, which selects between registering and bypassing the input data according to a corresponding configuration bit; a multiplier unit, connected to the first input unit, which receives the registered input data from the previous stage; a second input unit, including a second input register group connected to the output of the multiplier unit; a multiplexer group unit, consisting of a plurality of selectors, whose inputs are connected to the output of the second input unit and the output of the first input unit; an ALU unit, comprising an adder and a logic operation unit, wherein the adder is used for addition, subtraction, and multiplication of floating-point and fixed-point numbers, and logic operations are also provided for fixed-point numbers; and an output unit for outputting the operation result.
In some embodiments of the present disclosure, the ALU unit further comprises an adjusting circuit, a rounding unit, an encoding module, a detection tree module, a preliminary shifting module, and a shift correction module, wherein the adjusting circuit comprises a leading zero detection circuit and a one-bit error adjusting circuit.
In some embodiments of the present disclosure, in the floating-point number multiplication operation, the pre-adder unit of the first input unit is configured to sum an exponent part of an input floating-point number, the multiplier unit is configured to multiply a mantissa part, and the ALU unit is configured to perform adjustment, normalization, and rounding operations on the floating-point number.
In some embodiments of the present disclosure, when performing floating-point addition and subtraction, the two input floating-point numbers are sent along two paths in the ALU unit. On one path, the adder adds or subtracts the two floating-point numbers, the leading zero detection circuit counts the leading zeros in the mantissa result, and a preliminary shift and exponent adjustment are performed. On the other path, the inputs are encoded and sent to the detection tree, which generates a signal indicating whether the preliminarily shifted result needs further adjustment. Together these yield the result of the floating-point addition or subtraction.
In some embodiments of the present disclosure, the output unit includes: an output register group, which provides a register for the adder in the preceding ALU unit so that the adder's result can be registered and used in accumulation operations; and a mode detector, which is a configurable module: the user configures a pattern in the mode detector, and the detector checks whether the output result matches that pattern, so that the DSP outputs the specific data required by the user.
In some embodiments of the present disclosure, the multiplier unit multiplies the operands using Booth encoding, which reduces the number of partial products; a tree adder combined with the multiplier unit further compresses the partial products, and the result is further corrected in combination with the leading zero detection circuit in the ALU unit.
In some embodiments of the present disclosure, a pipeline structure is introduced into the multiplier unit during design.
In some embodiments of the present disclosure, the second input unit further receives: the multiplexer group select signal OPMODE, the carry signal CARRYIN, the data input of port C, and the configuration signal ALUMODE for the ALU operation mode.
In some embodiments of the present disclosure, the inputs of the multiplexer group unit are further connected to a cascade signal PCIN carrying a DSP result, a CARRYINSEL signal for selecting the carry input source, and the output feedback signal PCOUT; the selectors in the multiplexer group unit are controlled by the corresponding select signal OPMODE to switch between functions and/or to change the data source fed to the adder of the next stage.
In some embodiments of the present disclosure, the first input unit and/or the output unit reserves a port used when the DSPs are cascaded.
(III) Advantageous effects
According to the above technical solution, the embedded floating-point DSP hard core structure for an FPGA of the present disclosure has at least one of the following beneficial effects:
(1) Because data processing and operation are completed inside the structure, it is more efficient than a circuit in which floating-point adjustment and similar steps are mapped onto FPGA logic resources, and its operation efficiency is markedly better than that of the Xilinx 7 series, which implements floating-point operations with a soft core.
(2) Using a dedicated floating-point hard core DSP structure for floating-point operations reduces the consumption of logic resources in the FPGA.
(3) Compared with a soft core implementation, the floating-point DSP hard core structure consumes less power for the same floating-point operations.
Drawings
FIG. 1 is a schematic diagram of an embedded floating-point DSP hard core structure for an FPGA according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the principle by which the embedded floating-point DSP hard core for an FPGA implements floating-point multiplication according to an embodiment of the present disclosure.
FIG. 3 is a block diagram of the DSP structure that implements floating-point multiplication according to an embodiment of the present disclosure.
FIG. 4 is a diagram of the zero-padding operation applied to the mantissa for mantissa multiplication according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of the principle by which the embedded floating-point DSP hard core for an FPGA implements floating-point addition and subtraction according to an embodiment of the present disclosure.
FIG. 6 is a block diagram of the DSP structure that implements floating-point addition according to an embodiment of the present disclosure.
Detailed Description
The present disclosure provides an embedded floating-point DSP hard core structure for an FPGA. To make the objects, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to the accompanying drawings.
Certain embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the disclosure are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.
In a first exemplary embodiment of the present disclosure, an embedded floating-point DSP hard core structure for an FPGA is provided. FIG. 1 is a schematic structural diagram of the embedded floating-point DSP hard core structure for an FPGA according to the first embodiment of the present disclosure. As shown in FIG. 1, the structure includes: a first input unit, a multiplier unit, a second input unit, a multiplexer group unit, an ALU unit, and an output unit.
Each component of the embedded floating-point DSP hard core structure for an FPGA is described in detail below.
The first input unit mainly consists of an input register group and a dedicated adder for floating-point multiplication (a pre-adder), and selects between registering and bypassing the input data A and B according to a corresponding configuration bit. During floating-point multiplication, the pre-adder adds the exponents of the floating-point numbers, and the result is sent to the next stage for further operation. A multistage pipeline is introduced into this unit to split the critical path, which shortens the critical path and safeguards the operating frequency of the DSP. A port for DSP cascading is also reserved, so that more operations can be completed by cascading, which improves the flexibility of the DSP.
The multiplier unit is connected to the first input unit and receives the registered input data A and B from the previous stage. In this embodiment, the multiplier unit is a two-input multiplier (A × B). It multiplies the operands using Booth encoding, which reduces the number of partial products, and further compresses the partial products with a tree adder, thereby reducing area overhead and increasing computation speed. Preferably, a pipeline structure can be introduced into the multiplier unit during design to raise the operating frequency.
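As a rough illustration of how Booth encoding reduces the number of partial products, the following Python sketch recodes a multiplier into radix-4 Booth digits. The radix, the 26-bit default width (a 25-bit signed operand sign-extended to an even width), and the function names are assumptions made for this sketch; the text does not state which Booth variant the multiplier uses.

```python
def booth_radix4_digits(y, width):
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and the digits satisfy
    y == sum(d * 4**k for k, d in enumerate(digits)).
    `width` must be even; it is an assumption of this sketch.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    if not -(1 << (width - 1)) <= y < (1 << (width - 1)):
        raise ValueError("y does not fit in the given width")
    bits = y & ((1 << width) - 1)     # two's-complement bit pattern of y
    digits = []
    prev = 0                          # implicit bit to the right of bit 0
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, width=26):
    """Multiply x by y by summing one partial product per Booth digit."""
    partial_products = [
        (d * x) * (1 << (2 * k))
        for k, d in enumerate(booth_radix4_digits(y, width))
    ]
    return sum(partial_products)
```

With a 26-bit multiplier the recoding yields 13 digits, so 13 partial products are summed instead of one per multiplier bit. In hardware these partial products would be reduced by the tree adder rather than by `sum`.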
The second input unit comprises a second input register group connected to the output of the multiplier unit. It also receives the multiplexer group select signal OPMODE, the carry signal CARRYIN, the data input of port C, and the configuration signal ALUMODE for the ALU operation mode. This unit registers its inputs and at the same time splits the critical path from the multiplier to the ALU unit, which raises the operating frequency.
The inputs of the multiplexer group unit are connected to the output of the second input unit and the output of the first input unit, and also include the cascade signal PCIN carrying a DSP result, a CARRYINSEL signal for selecting the carry input source, and the output feedback signal PCOUT of the DSP module.
The main component of the ALU unit is an adder, which is used for addition, subtraction, and multiplication of floating-point and fixed-point numbers; the unit also performs logic operations on fixed-point numbers. To support floating-point operation, the structure of the ALU unit is extended: in addition to the arithmetic structure, a leading zero detection circuit (LZD), a one-bit error adjustment circuit, and a rounding unit are added, so that the ALU unit normalizes the output while completing the floating-point operation and produces a floating-point output that conforms to the standard.
The output unit comprises an output register group and a mode detector. The output register group adds a pipeline stage to the output unit, which raises the operating frequency of the DSP. The register also serves as a storage element for the adder of the preceding stage: the adder's result can be registered and then used in accumulation operations. The output unit also reserves a cascade port, so that more operations can be completed by cascading DSPs. The mode detector is a user-configurable module: the user configures a pattern in the mode detector, and the detector checks whether the output result matches that pattern, so that the DSP outputs the specific data required by the user.
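The behavior of the output register and the mode detector can be sketched in Python as follows. This is only a behavioral model for fixed-point values: the class name, the 32-bit default register width, and the optional mask are hypothetical, since the text only says that the user configures a pattern and that the detector checks the output against it.

```python
class OutputUnit:
    """Behavioral model of an output register with accumulation and a mode detector.

    The register width and the mask are assumptions of this sketch.
    """

    def __init__(self, pattern, mask=0, width=32):
        self.full = (1 << width) - 1
        self.pattern = pattern & self.full
        self.mask = mask & self.full   # bits set here are ignored in the comparison
        self.p = 0                     # contents of the output register

    def clock(self, alu_result, accumulate=False):
        """Register a new result; with accumulate=True, add it to the stored value."""
        value = self.p + alu_result if accumulate else alu_result
        self.p = value & self.full     # wrap to the register width
        return self.p

    def mode_detect(self):
        """Return True when the registered output matches the configured pattern."""
        return ((self.p ^ self.pattern) & ~self.mask & self.full) == 0
```

For example, with `unit = OutputUnit(pattern=10)`, calling `unit.clock(4, accumulate=True)` and then `unit.clock(6, accumulate=True)` leaves 10 in the register, at which point `unit.mode_detect()` reports a match.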
The principle of floating-point multiplication is shown in FIG. 2. To implement this calculation, the structure adds a pre-adder unit in the first input unit that sums the exponent parts of the input floating-point numbers A and B; the multiplier unit multiplies the mantissa parts; and the ALU unit completes the adjustment, normalization, and rounding of the floating-point result. The block diagram of the DSP structure implementing this algorithm is shown in FIG. 3, where the pre-adder is in the first input unit, and the adder after the multiplier and the adjustment circuit are in the ALU unit. The pre-adder acts as a dedicated exponent adder: it adds the exponent parts of the 32-bit input data, namely bits 24 to 31 of each word (counting the least significant bit as bit 1, which is where IEEE-754 single precision places the exponent field), and the result is used as the exponent part of the floating-point product.
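The exponent pre-addition can be sketched in Python as below. In IEEE-754 single precision the exponent field occupies bits 23 to 30 when counting from 0 (bits 24 to 31 when counting from 1) and is stored with a bias of 127. Where the hardware removes the extra bias is not stated in the text, so subtracting it inside the pre-adder is an assumption of this sketch, as are the function names.

```python
import struct

BIAS = 127  # IEEE-754 single-precision exponent bias


def f32_bits(x):
    """Return the 32-bit IEEE-754 single-precision pattern of x as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def exponent_field(bits):
    """Extract the 8-bit biased exponent (bits 23..30, counting from 0)."""
    return (bits >> 23) & 0xFF


def pre_add_exponents(a_bits, b_bits):
    """Sum two biased exponents and remove one bias so the sum stays biased."""
    return exponent_field(a_bits) + exponent_field(b_bits) - BIAS
```

For 2.0 and 8.0 the biased exponents are 128 and 130, so the pre-adder yields 131, which is the biased exponent of 16.0. This value is still provisional: the later normalization step may adjust it.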
According to the IEEE-754 standard, the mantissa of any normal single-precision floating-point number is 23 bits, preceded by a hidden bit of value 1 that is not stored in the floating-point number. The hidden bit must be restored when the number takes part in floating-point operations. In addition, to remain compatible with 25 × 25-bit fixed-point multiplication, a 0 is padded in front of the hidden bit as the sign bit of the mantissa so that the mantissa meets the bit-width requirement, as shown in FIG. 4. Padding the mantissa with 0 treats every floating-point number as positive, so this bit is not the true sign of the result. To obtain the true sign bit, the signs of the two floating-point operands are extracted and combined separately with a logical exclusive-OR. The mantissas are then multiplied to obtain the product of the mantissa parts.
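A minimal Python sketch of this operand preparation follows. It builds the 25-bit multiplier operand (a 0 sign bit, the hidden 1, and the 23 stored mantissa bits) and computes the result sign with an exclusive-OR. The function names are illustrative, and only normal single-precision inputs are considered.

```python
def mantissa_operand(bits):
    """Build the 25-bit operand: 0 (sign) | 1 (hidden bit) | 23 stored mantissa bits."""
    fraction = bits & 0x7FFFFF        # the 23 stored mantissa bits
    return (1 << 23) | fraction       # bit 24, the padded sign bit, stays 0


def result_sign(a_bits, b_bits):
    """The true sign of the product is the XOR of the operand sign bits."""
    return ((a_bits >> 31) ^ (b_bits >> 31)) & 1


# Single-precision bit patterns used as examples
ONE_AND_HALF = 0x3FC00000      # 1.5
ONE_AND_QUARTER = 0x3FA00000   # 1.25
MINUS_TWO = 0xC0000000         # -2.0
THREE = 0x40400000             # 3.0

product = mantissa_operand(ONE_AND_HALF) * mantissa_operand(ONE_AND_QUARTER)
```

Because each operand lies in the range [2^23, 2^24), the product lies in [2^46, 2^48): it is either 47 or 48 bits long, which is why the later normalization of a product only ever needs a shift of at most one position.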
After the values of the exponent part and the mantissa part are obtained, both must be adjusted to normalize the floating-point result. During normalization, a leading zero detection circuit (LZD) counts the leading zeros in the mantissa result so that the mantissa can be shifted. Once the leading-zero count is known, the mantissa result is shifted accordingly and the count is subtracted from the exponent value. The final sign bit, the 8-bit exponent part, and the 23-bit mantissa part are then combined to form the final floating-point product.
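The normalization and packing step can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements from the text: the product is viewed in a 48-bit window (hence the +1 offset applied before the leading-zero count is subtracted), rounding is round-to-nearest-even, and only normal results are handled.

```python
def normalize_round_pack(sign, exponent, product):
    """Normalize a mantissa product, round it, and pack a single-precision word.

    sign     : 0 or 1, the XOR of the operand signs
    exponent : biased exponent sum (exp_a + exp_b - 127)
    product  : product of two 24-bit mantissas with hidden bits,
               so it lies in [2**46, 2**48)
    """
    if not (1 << 46) <= product < (1 << 48):
        raise ValueError("product must come from two normal 24-bit mantissas")

    # Leading zero detection in a 48-bit window: the count is 0 or 1 here.
    lz = 48 - product.bit_length()
    product <<= lz                   # shift the leading 1 up to bit 47
    exponent = exponent + 1 - lz     # window offset, then subtract the count

    kept = product >> 24             # hidden bit plus 23 mantissa bits
    rest = product & 0xFFFFFF        # the 24 bits that are discarded
    half = 1 << 23
    if rest > half or (rest == half and (kept & 1)):
        kept += 1                    # round to nearest, ties to even
        if kept >> 24:               # rounding carried out of the mantissa
            kept >>= 1
            exponent += 1

    if not 1 <= exponent <= 254:
        raise OverflowError("result is not a normal single-precision number")
    return (sign << 31) | (exponent << 23) | (kept & 0x7FFFFF)
```

As a check, 1.5 × 1.5 has mantissa product 0xC00000 × 0xC00000 and exponent sum 127; the function returns 0x40100000, the encoding of 2.25. For 1.5 × 1.25 the product has one leading zero, and the function returns 0x3FF00000, the encoding of 1.875.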
The principle by which the embedded floating-point DSP hard core for an FPGA implements floating-point addition and subtraction is shown in FIG. 5, and the structure shown in FIG. 6 is used to operate on the floating-point numbers and obtain the result. The two floating-point numbers are input to the ALU unit and sent along two paths. On one path, the adder adds or subtracts the two numbers, and the result passes through the LZD unit for a preliminary shift and exponent adjustment. On the other path, the inputs are encoded and sent to a detection tree, which generates a signal indicating whether the preliminarily shifted result needs further adjustment. The result of the floating-point addition or subtraction is then obtained. All of these operations are completed in the ALU unit.
In actual operation, the addition of numbers with the same sign does not need to pass through the encoding and detection tree units, because no error arises in that case. The addition of numbers with opposite signs, or the subtraction of numbers with the same sign, does require the result to be adjusted, because a 1-bit error can occur in either case. The two inputs are encoded and then examined by the detection tree, which finally determines whether a one-bit error adjustment is needed.
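The first of the two paths (adder, leading zero detection, preliminary shift, and exponent adjustment) can be sketched in Python as below. The sketch assumes that the two 24-bit mantissas are already aligned to a common exponent, that the larger one is passed first for subtraction, and that a bit shifted out on a carry is simply truncated. It does not model the encoding and detection tree path or the one-bit correction; the names and the 25-bit window are illustrative.

```python
def leading_zeros(value, width):
    """Count leading zeros of a non-negative value within a width-bit window."""
    return width - value.bit_length()


def add_sub_normalize(man_a, man_b, exponent, subtract):
    """Add or subtract aligned mantissas, then normalize using the LZD count.

    man_a, man_b : 24-bit mantissas including the hidden bit, man_a >= man_b
    exponent     : common biased exponent of both operands
    Returns (mantissa, exponent, leading_zero_count).
    """
    raw = man_a - man_b if subtract else man_a + man_b
    lz = leading_zeros(raw, 25)       # 25-bit window: carry bit plus 24 bits
    if raw == 0:
        return 0, 0, lz               # exact cancellation
    if lz == 0:
        # Carry out of the addition: shift right once, raise the exponent.
        return raw >> 1, exponent + 1, lz
    shift = lz - 1                    # lz == 1 means already normalized
    return raw << shift, exponent - shift, lz
```

With mantissas of the same exponent, 1.5 + 1.5 (0xC00000 + 0xC00000) produces a carry and needs a single right shift, whereas 1.5 - 1.25 leaves three leading zeros and needs a two-position left shift. Subtracting nearly equal values such as 0x800001 - 0x800000 leaves 24 leading zeros, so effective subtraction is the case in which large normalization shifts arise.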
The floating-point hard core DSP structure provided by the present disclosure supports floating-point addition, multiplication, and accumulation. The circuit was implemented with the DC tool using the standard SMIC 28 nm CMOS process library at a voltage of 0.945 V and a temperature of 125 °C; after place and route, the total area is 11091 μm².
For comparison, a Xilinx 7-series FPGA at the same process node was used, model xc7v585tffg1157-3. With the same latency (the present disclosure uses a 6-stage pipeline), the floating-point addition/subtraction unit was called with the speed-priority structural optimization selected, which maps the unit to one DSP48E1 plus logic resources. Performance and resources were each set as the optimization target and compared with the structure of the present disclosure, and the floating-point multiplication unit was called in the same way. The comparison of floating-point addition and multiplication is shown in the table below:
TABLE 2 comparison of the results
[Table 2 is available only as an image in the source: Figure BDA0001552906450000081]
The comparison shows that the floating-point addition performance of the floating-point hard core DSP structure proposed by the present disclosure is 1.8 times that of the IP soft core under the performance-priority setting, and its floating-point multiplication performance is 1.43 times that of the IP soft core under the same setting. Data processing and operation in the proposed structure are completed inside the structure itself, whereas mapping circuits such as floating-point adjustment onto FPGA logic resources is less efficient than a dedicated circuit design. The floating-point operation efficiency of the proposed structure is therefore markedly better than that of the Xilinx 7 series, which implements floating-point operations with a soft core.
Certainly, the hardware structure should further include functional modules such as a power module (not shown), which can be understood by those skilled in the art, and those skilled in the art may also add corresponding functional modules according to the functional requirements, which are not described herein.
This completes the description of the floating-point hard core DSP structure of the first embodiment of the present disclosure.
So far, the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings. It is to be noted that, in the attached drawings or in the description, the implementation modes not shown or described are all the modes known by the ordinary skilled person in the field of technology, and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the various specific structures, shapes or arrangements of parts mentioned in the examples, which may be easily modified or substituted by those of ordinary skill in the art.
Unless otherwise indicated, the numerical parameters set forth in the specification and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by the present disclosure. In particular, all numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about". Generally, the expression is meant to encompass variations of ± 10% in some embodiments, 5% in some embodiments, 1% in some embodiments, 0.5% in some embodiments by the specified amount.
Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
The use of ordinal numbers such as "first," "second," "third," etc., in the specification and claims to modify a corresponding element does not by itself connote any ordinal number of the element or any ordering of one element from another or the order of manufacture, and the use of the ordinal numbers is only used to distinguish one element having a certain name from another element having a same name.
In addition, unless steps are specifically described or must occur in sequence, the order of the steps is not limited to that listed above and may be changed or rearranged as desired by the desired design. The embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e., technical features in different embodiments may be freely combined to form further embodiments.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The above-mentioned embodiments illustrate the objects, aspects and advantages of the present disclosure in further detail. It should be understood that they are only illustrative and are not intended to limit the present disclosure; any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in its scope.

Claims (9)

1. An embedded floating-point DSP hard core structure for an FPGA, comprising:
a first input unit, composed of an input register group and a pre-adder dedicated to floating-point multiplication, which selects between registering and bypassing the input data according to corresponding configuration bits;
a multiplier unit, connected to the first input unit, which receives the input data from the preceding stage after it passes through the registers;
a second input unit, including a second input register group connected to the output of the multiplier unit;
a multiplexer group unit, consisting of a plurality of selectors, whose inputs are connected to the output of the second input unit and the output of the first input unit;
an ALU unit, comprising an adder and a logic operation unit, wherein the adder is used for addition, subtraction and multiplication operations on floating-point and fixed-point numbers, and logic operations are also provided for fixed-point numbers;
the ALU unit further comprising an adjusting circuit, a rounding unit, an encoding module, a detection tree module, a preliminary shift module and a shift correction module, wherein the adjusting circuit comprises a leading zero detection circuit and a one-bit error adjusting circuit; and
an output unit for outputting the operation result.
2. The embedded floating-point DSP hard core structure of claim 1, wherein, when performing a floating-point multiplication operation, the pre-adder of the first input unit sums the exponent parts of the input floating-point numbers, the multiplier unit multiplies the mantissa parts, and the ALU unit performs the adjustment, normalization and rounding of the floating-point result.
3. The embedded floating-point DSP hard core structure of claim 1, wherein, when performing a floating-point addition or subtraction operation, the two input floating-point numbers are sent along two paths in the ALU unit: on one path, the adder adds or subtracts the two floating-point numbers, the leading zero detection circuit counts the leading zeros in the mantissa part of the result, and a preliminary shift and exponent adjustment are performed; on the other path, the operands are encoded and sent into the detection tree structure, which generates a signal indicating whether the preliminary shift needs further adjustment; the result of the floating-point addition or subtraction is thereby obtained.
4. The embedded floating-point DSP hard core structure of claim 1, wherein the output unit comprises:
an output register group, which provides registers for the adder in the preceding-stage ALU unit and registers the adder's result so that it can be used in accumulation operations; and
a pattern detector, which is a configurable module: by configuring a pattern in the pattern detector, the user can detect whether the output result matches that pattern, so that the DSP outputs the specific data the user requires.
5. The embedded floating-point DSP hard core structure of claim 1, wherein
the multiplier unit multiplies the operands using Booth encoding to reduce the number of partial products, further compresses the partial products using the tree adder of the multiplier unit, and the result is further corrected in combination with the leading zero detection circuit in the ALU unit.
6. The embedded floating-point DSP hard core structure of claim 1, wherein
a pipeline structure is introduced into the design of the multiplier unit.
7. The embedded floating-point DSP hard core structure of claim 1, wherein
the second input unit is further connected to: the multiplexer group select signal OPMODE, the carry signal CARRYIN, the data input of port C, and the configuration signal ALUMODE for the ALU operation mode.
8. The embedded floating-point DSP hard core structure of claim 1, wherein
the inputs of the multiplexer group unit are further connected to the cascade signal PCIN, which carries a DSP result, and to the signal CARRYINSEL, which selects the source of the carry input, and a feedback signal PCOUT is output; the selectors in the multiplexer group unit are controlled by the corresponding select signal OPMODE to switch between different functions and/or change the source of the data input to the adder of the next stage.
9. The embedded floating-point DSP hard core structure of claim 1, wherein
the first input unit and/or the output unit reserves ports for use when DSPs are cascaded.
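
For illustration only, the division of labor recited in claim 2 (exponent sum in the pre-adder, mantissa product in the multiplier, adjustment, normalization and rounding in the ALU) can be sketched in software. The sketch below is not the patented circuit: it assumes IEEE 754 single precision, round-to-nearest-even, and normal (non-zero, non-subnormal, finite) operands and results, none of which is stated in the claims. The function names `float_to_bits`, `bits_to_float` and `fp32_multiply` are hypothetical.

```python
import struct

BIAS = 127        # assumed: IEEE 754 single-precision exponent bias
FRAC_BITS = 23    # assumed: stored fraction width of single precision


def float_to_bits(x: float) -> int:
    """Return the 32-bit single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Decode a 32-bit single-precision encoding."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fp32_multiply(a_bits: int, b_bits: int) -> int:
    """Multiply two normal single-precision values given as bit patterns."""
    sign = (a_bits ^ b_bits) >> 31
    exp_a = (a_bits >> FRAC_BITS) & 0xFF
    exp_b = (b_bits >> FRAC_BITS) & 0xFF
    if not (0 < exp_a < 255 and 0 < exp_b < 255):
        raise ValueError("sketch handles normal operands only")

    # Restore the hidden leading 1 to obtain 24-bit significands.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    # Pre-adder role: sum the biased exponents and remove one bias.
    exp = exp_a + exp_b - BIAS

    # Multiplier role: 24 x 24 -> 48-bit product, value in [1, 4).
    prod = man_a * man_b

    # ALU role, normalization: a product in [2, 4) needs one extra right shift.
    if prod >> 47:
        shift = FRAC_BITS + 1
        exp += 1
    else:
        shift = FRAC_BITS
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # ALU role, rounding: round to nearest, ties to even.
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant >> (FRAC_BITS + 1):   # rounding carried out of the significand
            mant >>= 1
            exp += 1

    if not 0 < exp < 255:
        raise OverflowError("result outside the normal range; not handled here")
    return (sign << 31) | (exp << FRAC_BITS) | (mant & 0x7FFFFF)


print(bits_to_float(fp32_multiply(float_to_bits(1.5), float_to_bits(2.5))))  # 3.75
```

Keeping the exponent sum separate from the mantissa product mirrors the way the claim lets a fixed-point multiplier be reused for the mantissa while a small adder handles the exponents; special values (zero, subnormals, infinity, NaN) would need extra handling that this sketch deliberately omits.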
CN201810056827.2A 2018-01-19 2018-01-19 Embedded floating point type DSP hard core structure for FPGA Active CN108255777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810056827.2A CN108255777B (en) 2018-01-19 2018-01-19 Embedded floating point type DSP hard core structure for FPGA

Publications (2)

Publication Number Publication Date
CN108255777A CN108255777A (en) 2018-07-06
CN108255777B CN108255777B (en) 2021-08-06

Family

ID=62726925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810056827.2A Active CN108255777B (en) 2018-01-19 2018-01-19 Embedded floating point type DSP hard core structure for FPGA

Country Status (1)

Country Link
CN (1) CN108255777B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825439B (en) * 2018-08-10 2021-03-09 北京百度网讯科技有限公司 Information processing method and processor
US10790830B1 (en) 2019-05-20 2020-09-29 Achronix Semiconductor Corporation Fused memory and arithmetic circuit
US11256476B2 (en) 2019-08-08 2022-02-22 Achronix Semiconductor Corporation Multiple mode arithmetic circuit
CN113746471B (en) * 2021-09-10 2024-05-07 中科寒武纪科技股份有限公司 Arithmetic circuit, chip and board card
CN114296682A (en) * 2021-12-31 2022-04-08 上海阵量智能科技有限公司 Floating point number processing device, floating point number processing method, electronic equipment, storage medium and chip
CN117891430B (en) * 2024-03-18 2024-05-14 中科亿海微电子科技(苏州)有限公司 Floating point multiplication and addition structure applied to FPGA embedded DSP
CN117931123B (en) * 2024-03-25 2024-06-14 中科亿海微电子科技(苏州)有限公司 Low-power-consumption variable-precision embedded DSP hard core structure applied to FPGA

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1818851A (en) * 2005-02-09 2006-08-16 国际商业机器公司 System and method for carrying out a floating point arithmetic operation
CN1831753A (en) * 2005-03-08 2006-09-13 中国科学院计算技术研究所 Floating-point multiplicator and method of compatible double-precision and double-single precision computing
CN101706712A (en) * 2009-11-27 2010-05-12 北京龙芯中科技术服务中心有限公司 Operation device and method for multiplying and adding floating point vector
CN101847087A (en) * 2010-04-28 2010-09-29 中国科学院自动化研究所 Reconfigurable transverse summing network structure for supporting fixed and floating points
CN102819520A (en) * 2011-05-09 2012-12-12 阿尔特拉公司 DSP block with embedded floating point structures
CN104246690A (en) * 2012-04-20 2014-12-24 华为技术有限公司 System and method for signal processing in digital signal processors

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10042606B2 (en) * 2016-05-03 2018-08-07 Altera Corporation Fixed-point and floating-point arithmetic operator circuits in specialized processing blocks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
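Design of an FPGA embedded DSP IP supporting efficient addition; Wang Nan, Huang Zhihong, Yang Haigang, Ding Jian; Journal of Terahertz Science and Electronic Information Technology; 2017-09-30; pp. 867-873 *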

Also Published As

Publication number Publication date
CN108255777A (en) 2018-07-06

Similar Documents

Publication Publication Date Title
CN108255777B (en) Embedded floating point type DSP hard core structure for FPGA
CN110221808B (en) Vector multiply-add operation preprocessing method, multiplier-adder and computer readable medium
US9098332B1 (en) Specialized processing block with fixed- and floating-point structures
US20190042191A1 (en) Reduced floating-point precision arithmetic circuitry
JPH02196328A (en) Floating point computing apparatus
CN110688086A (en) Reconfigurable integer-floating point adder
CN116400883A (en) Floating point multiply-add device capable of switching precision
KR102481418B1 (en) Method and apparatus for fused multiply-add
Lyu et al. PWL-based architecture for the logarithmic computation of floating-point numbers
CN109298848B (en) Dual-mode floating-point division square root circuit
CN114860193A (en) Hardware operation circuit for calculating Power function and data processing method
Burud et al. Design and Implementation of FPGA Based 32 Bit Floating Point Processor for DSP Application
Shirke et al. Implementation of IEEE 754 compliant single precision floating-point adder unit supporting denormal inputs on Xilinx FPGA
CN116627379A (en) Reconfigurable method and system for supporting multi-precision floating point or fixed point operation
Zhang et al. Low-Cost Multiple-Precision Multiplication Unit Design For Deep Learning
Chong et al. Flexible multi-mode embedded floating-point unit for field programmable gate arrays
Kuang et al. A multi-functional multi-precision 4D dot product unit with SIMD architecture
CN110506255A (en) Energy-saving variable power adder and its application method
US7047271B2 (en) DSP execution unit for efficient alternate modes for processing multiple data sizes
Bokade et al. CLA based 32-bit signed pipelined multiplier
Li et al. A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC
Saini et al. Efficient Implementation of Pipelined Double Precision Floating Point Multiplier
US10127013B1 (en) Specialized processing blocks with fixed-point and floating-point structures
Kumar et al. Simulation And Synthesis Of 32-Bit Multiplier Using Configurable Devices
Saini et al. Efficient implementation of pipelined double precision floating point unit on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant