#BEGIN_LEGAL
#
#Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#END_LEGAL
// This file does not contain any code
// it just contains additional information for
// inclusion with doxygen
// ===========================================================================
/*!
@mainpage Intel® X86 Encoder Decoder User Guide
April 2024
@section INTRO Introduction
Intel® XED is an acronym for X86 Encoder Decoder. The
latter part is pronounced like the (British) English "z".
Intel® X86 Encoder Decoder (Intel® XED) is a software library
(and associated headers) written in C for encoding and decoding X86
(IA-32 instruction set and Intel® 64 instruction set) instructions.
The decoder takes sequences of 1-15 bytes along with machine mode information
and produces a data structure describing the opcode, operands, and flags.
The generic encoder takes a similar data structure and produces a sequence
of 1 to 15 bytes.
There is another encoder called "enc2" available that is much faster than
the generic encoder mentioned above. Rather than using a generic
interface, in enc2, instruction encoding is done by calling one of a
very large number of functions, passing as arguments the registers and
constants that would be used in the assembly language description of the
instruction. There are two interfaces to the enc2 encoder:
unchecked and checked. The unchecked version is faster and assumes
that the arguments passed in are in the correct ranges. The checked
version validates that the arguments passed in are in the correct
ranges and if that succeeds, it calls the corresponding unchecked
version of the function. The checking can be skipped if desired using
a runtime setting. The enc2 encoder is available in builds with
the "--enc2" option. Due to the large amount of code generated, that
build takes longer.
Intel® XED is multi-thread safe.
Intel® XED was designed to be very fast and extensible.
Intel® XED compiles with the following compilers:
<ul>
<li> GNU GCC
<li> Microsoft Visual Studio
<li> LLVM/Clang
</ul>
Intel® XED works with the following operating systems:
<ul>
<li> Linux
<li> Microsoft Windows (with and without cygwin)
<li> FreeBSD
</ul>
The Intel® XED examples (@ref EXAMPLES) also include binary image readers for
Windows PECOFF, ELF, and Mac OS X* MACHO binary file formats for 32b and
64b. These allow Intel® XED to be used as a simple (not symbolic)
disassembler. The Intel® XED disassembler supports 3 output formats: Intel,
ATT SYSV, and a more detailed internal format describing all resources
read and written.
@section TOC Table of Contents
- @ref BUILD "Building" Building your program with Intel® XED
- @ref EXTERN "External" External Requirements
- @ref TERMS "Terms" Terminology
- @ref OVERVIEW "Overview" Overview of the Intel® XED approach
- @ref API_REF "API reference" Detailed descriptions of the API
- @ref EXAMPLES "Examples" Examples
- @ref LEGAL "Disclaimer and Legal Information"
@section BUILD Building your program using Intel® XED.
This section describes the requirements for compiling with Intel® XED and
linking the libxed.a library. It assumes you are building from an
Intel® XED kit and not directly from the sources. (See the "install"
option in the Intel® XED build manual for information on making kits).
The structure of an Intel® XED kit is as follows:
@code
|-bin------
|-doc------|-html-
|-examples-
|-xed-kit-name-|-include--
|-lib------
|-misc-----
@endcode
To use Intel® XED your sources should include the top-most header file: xed-interface.h.
Your compilation statement must include:
@code
-Ixed-kit-name/include
@endcode
where "xed-kit-name" is where you've unpacked the Intel® XED kit.
Your Linux or Mac OS X* link statement must reference the libxed library:
@code
-Lxed-kit-name/lib -lxed
@endcode
(or link against xed.lib for Windows).
Intel® XED uses base types with the following names: xed_uint8_t,
xed_uint16_t, xed_uint32_t, xed_uint64_t, xed_int8_t, xed_int16_t,
xed_int32_t, and xed_int64_t. Intel® XED also defines a "xed_uint_t" type
that is shorthand for "unsigned int".
Please see the section @ref INIT for more information about using
Intel® XED, and also the examples in @ref EXAMPLES.
@section EXTERN External Requirements
Intel® XED was designed to have minimal external requirements. Intel® XED makes no
system calls. Intel® XED allocates no memory. (The examples are
different). The following external functions/symbols are required for
linking a program with libxed, with one caveat: The functions fprintf
and abort and the data object stderr are optional. If users register
their own abort handler using #xed_register_abort_function () , then
fprintf, stderr, and abort are not required and can be stubbed out to
satisfy the linker.
Required:
<ul>
<li>memcmp
<li>memcpy
<li>memset
<li>strcmp
<li>strlen
<li>strncat
</ul>
Optional:
<ul>
<li>abort
<li>fprintf
<li>stderr
</ul>
@section TERMS Terminology
X86 instructions are 1-15 byte values. They consist of several
well-defined components:
<ul>
<li> Prefix bytes.
<ul>
<li> Legacy prefix bytes used for many purposes (described further below).
<li> REX prefix byte, available only in 64b mode. It has four 1-bit
fields: W, R, X, and B. The W bit modifies the operation
width. The R, X and B fields extend the register
encodings. The REX byte must immediately precede the opcode
bytes; otherwise it is ignored.
<li> REX2 prefix, a 2-byte variant of the REX prefix, introduced with Intel® APX extensions (see @ref APX),
adds 16 Extended General Purpose Registers (EGPRs) across the legacy instruction set.
It has eight 1-bit fields: M0, R4, X4, B4, W, R3, X3 and B3.
The W, R3, X3 and B3 bits are the same as the W, R, X and B bits in the REX prefix.
The R4, X4, and B4 bits are additional bits that extend the register encodings so that all 32 general-purpose registers can be addressed.
M0 bit selects between legacy maps 0 and 1 (1-byte opcodes no escape and 2-byte opcodes escape 0x0F respectively).
<li> VEX prefix byte sequence. The VEX prefix is used
mostly for AVX1 and AVX2 instructions as well as BMI1/2
instructions and mask operations in Intel® AVX512. The VEX prefix
comes in two forms. The 2-byte sequence begins with an
0xC5 byte. The 3-byte sequence begins with an 0xC4 byte.
<li> EVEX prefix. The EVEX 4-byte sequence is used for
encoding Intel® AVX512 instructions and begins with an 0x62 byte. Intel® APX provides
an extended version of the prefix, where the semantics of several payload bits are redefined.
The extension is essentially used to provide Intel® APX features for legacy instructions that cannot be provided
by other prefixes, such as support for the new data destination (see @ref APX) or status flags update suppression
"no flags" which are represented by the ND and NF bits respectively in the third payload byte.
Note that the byte following the extended EVEX prefix is always interpreted as the main opcode byte.
</ul>
There are somewhat complex rules about which prefixes are
allowed, in what order, and in what modes. Intel® XED handles that
complexity.
<li> 1-3 opcode bytes. When more than one opcode byte is required
the leading bytes (called escapes) are either 0x0F, 0x0F 0x38, or
0x0F 0x3A. With VEX and EVEX prefixes, the escape bytes are
encoded differently.
<li> MODRM byte. Used for addressing memory, refining opcodes,
and specifying registers. Optional, but common. It has three fields: the
2-bit "mod", the 3-bit "reg" and 3-bit "r/m" fields.
<li> SIB byte. Used for specifying memory addressing, optional.
It has three fields: the 2-bit scale, 3-bit index, and 3-bit base.
<li> Displacement bytes. Used for specifying memory offsets, optional.
<li> Immediate bytes. Optional
</ul>
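The bit layouts of the REX, MODRM and SIB bytes described above can be sketched in plain C. This is an illustrative sketch only: the type and function names (modrm_fields_t, split_modrm, and so on) are hypothetical helpers and are not part of the Intel® XED API, which performs this field extraction internally.

```c
#include <stdint.h>

// Hypothetical helper types for illustration; they are not Intel XED types.
typedef struct { uint8_t mod, reg, rm; } modrm_fields_t;
typedef struct { uint8_t scale, index, base; } sib_fields_t;
typedef struct { uint8_t w, r, x, b; } rex_fields_t;

// MODRM: mod = bits 7:6, reg = bits 5:3, r/m = bits 2:0.
modrm_fields_t split_modrm(uint8_t byte) {
    modrm_fields_t f;
    f.mod = (uint8_t)(byte >> 6);
    f.reg = (uint8_t)((byte >> 3) & 0x7);
    f.rm  = (uint8_t)(byte & 0x7);
    return f;
}

// SIB: scale = bits 7:6 (a shift amount, so the factor is 1 << scale),
// index = bits 5:3, base = bits 2:0.
sib_fields_t split_sib(uint8_t byte) {
    sib_fields_t f;
    f.scale = (uint8_t)(byte >> 6);
    f.index = (uint8_t)((byte >> 3) & 0x7);
    f.base  = (uint8_t)(byte & 0x7);
    return f;
}

// A REX prefix is any byte in the range 0x40..0x4F (64b mode only).
int is_rex(uint8_t byte) {
    return (byte & 0xF0) == 0x40;
}

// REX low nibble: W = bit 3, R = bit 2, X = bit 1, B = bit 0.
rex_fields_t split_rex(uint8_t byte) {
    rex_fields_t f;
    f.w = (uint8_t)((byte >> 3) & 1);
    f.r = (uint8_t)((byte >> 2) & 1);
    f.x = (uint8_t)((byte >> 1) & 1);
    f.b = (uint8_t)(byte & 1);
    return f;
}
```

For example, the MODRM byte 0xC3 splits into mod=3, reg=0, r/m=3, and the REX byte 0x48 has only the W bit set.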
Immediates and displacements are usually limited to 4 bytes, but several
variants of the MOV instruction can take 8B values.
The AMD 3DNow ISA extension uses the immediate field to
provide additional opcode information.
The legacy prefix bytes are used for:
<ul>
<li> operand size overrides (1 prefix),
<li> address size overrides (1 prefix),
<li> atomic locking (1 prefix),
<li> default segment overrides (6 prefixes),
<li> repeating certain instructions (2 prefixes), and
<li> opcode refinement.
</ul>
There are 11 distinct legacy prefixes. Three of them (operand size
and the two repeat prefixes) have different meanings in different
contexts. Sometimes they are used for opcode refinement and do not
have their default meaning. Less frequently, two of the segment
overrides can be used for conditional branch hints.
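The eleven legacy prefix bytes and their default roles can be summarized with a small classifier. This is a sketch under stated assumptions: the enum and function names are hypothetical and not part of Intel® XED, and the classifier reports only the default meaning of each byte, not the opcode-refinement or branch-hint uses mentioned above.

```c
#include <stdint.h>

// Hypothetical classification of legacy prefix bytes by default meaning.
typedef enum {
    PFX_NONE,          // not a legacy prefix
    PFX_OPERAND_SIZE,  // 0x66
    PFX_ADDRESS_SIZE,  // 0x67
    PFX_LOCK,          // 0xF0
    PFX_SEGMENT,       // ES, CS, SS, DS, FS, GS overrides
    PFX_REPEAT         // 0xF2 (REPNE), 0xF3 (REP/REPE)
} legacy_prefix_kind_t;

legacy_prefix_kind_t classify_legacy_prefix(uint8_t byte) {
    switch (byte) {
    case 0x66: return PFX_OPERAND_SIZE;
    case 0x67: return PFX_ADDRESS_SIZE;
    case 0xF0: return PFX_LOCK;
    case 0x26: case 0x2E: case 0x36:
    case 0x3E: case 0x64: case 0x65: return PFX_SEGMENT;
    case 0xF2: case 0xF3: return PFX_REPEAT;
    default:   return PFX_NONE;
    }
}

// Counts how many leading bytes of an instruction are legacy prefixes.
// It does not validate ordering or duplicate-prefix rules.
unsigned count_legacy_prefixes(const uint8_t *bytes, unsigned len) {
    unsigned n = 0;
    while (n < len && classify_legacy_prefix(bytes[n]) != PFX_NONE) {
        n++;
    }
    return n;
}
```

A real decoder must also apply the ordering and mode rules noted earlier; this sketch only identifies the bytes.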
There are also multiple ways to encode certain instructions, with the
same or differing length.
For additional information on the instruction semantics and encodings:
<ul>
<li> <a href="http://www.intel.com/sdm">http://www.intel.com/sdm</a> The Intel® 64 and IA-32 Architectures Software Developers Manuals
<li> <a href="http://www.intel.com/software/isa">http://www.intel.com/software/isa</a> Information on future ISA extensions.
</ul>
@subsection APX Intel® APX
Intel® Advanced Performance Extensions (Intel® APX) expands the Intel® 64 instruction set architecture with
access to more registers and adds various new features that improve general-purpose performance. The
extensions are designed to provide efficient performance gains across a variety of workloads without
significantly increasing the silicon area or power consumption of the core.
The main features of Intel® APX include:
<ul>
<li> Extended GPRs, also known as EGPRs (see @ref APX_OPERANDS)
<li> Three-operand instructions with a new data destination (NDD); legacy integer instructions can now use EVEX to encode a dedicated
destination register operand – turning them into three-operand instructions and reducing the need for extra register move instructions.
The NDD receives the result of the computation, and all other operands (including the original destination operand) become read-only source operands
<li> Legacy-promoted instructions that support status flag update suppression "no flags" (NF); an option for the compiler to suppress the status flags writes
of common instructions (no CSPAZO flags, such as Parity, Overflow...)
<li> Conditional ISA improvements: New conditional load, store, and compare instructions
<li> Optimized register state save/restore operations
<li> A new 64-bit absolute direct jump instruction
<li> Zero Upper (ZU) support for several APX-Promoted instructions, which zero the upper bits of a destination GPR. The destination GPR will get the
instruction’s result in bits [OSIZE-1:0] and, if OSIZE < 64b, have its upper bits [63:OSIZE] zeroed
</ul>
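The Zero Upper (ZU) behavior in the last item can be contrasted with ordinary legacy register writes in a few lines of C. This is a toy model of the destination-register value only; the function names are hypothetical, and the legacy rule used for comparison (8-bit and 16-bit writes preserve the upper bits, 32-bit writes zero them) is the standard Intel® 64 behavior.

```c
#include <stdint.h>

// Mask covering bits [osize-1:0]; osize is the operand size in bits.
uint64_t low_mask(unsigned osize) {
    return osize >= 64 ? UINT64_MAX : ((UINT64_C(1) << osize) - 1);
}

// ZU write: result lands in bits [osize-1:0], bits [63:osize] are zeroed.
uint64_t write_gpr_zu(uint64_t result, unsigned osize) {
    return result & low_mask(osize);
}

// Legacy write, for comparison: 8b and 16b writes merge with the old
// register value, while 32b writes zero the upper half.
uint64_t write_gpr_legacy(uint64_t old_value, uint64_t result, unsigned osize) {
    uint64_t m = low_mask(osize);
    if (osize == 8 || osize == 16) {
        return (old_value & ~m) | (result & m);
    }
    return result & m;
}
```

With a 16-bit operand size, the legacy write keeps the old upper 48 bits, whereas the ZU write leaves them zero.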
Intel® APX instructions are defined by Intel® XED as follows:
Legacy:
- Instructions with REX2 prefix are not defined with new iforms or new ISA-SETs
- The APXLEGACY extension group includes new APX-F instructions
EVEX:
- Existing (non-APX) EVEX instructions with EGPRs are not defined with new iforms or new ISA-SETs
- Promoted and new instructions are defined with new iforms using the '_APX' suffix
- Promoted new data destination instructions with the 'APX_NDD' attribute
- Promoted no flags instructions with the 'APX_NF' attribute
- The APXEVEX extension group includes new and promoted APX-F instructions
@subsection AVX10 Intel® AVX10
Intel® Advanced Vector Extensions 10 (Intel® AVX10) establishes a common, converged vector instruction set across all Intel® architectures, incorporating the modern
vectorization aspects of Intel® AVX-512.
The Intel® AVX10 architecture introduces several new features and capabilities:
<ul>
<li> Introduces a version-based instruction set enumeration
<li> Allows a converged implementation supported on all Intel® CPUs to include all the existing Intel® AVX-512 capabilities such
as EVEX encoding, 32 vector registers and 8 32-bit opmask registers at a maximum vector length of 256 bits (Intel® AVX10/256)
<li> Allows an implementation to include support for 512-bit vector and 64-bit opmask registers on P-Core CPUs (Intel® AVX10/512) for
heavy vector compute applications that can leverage the additional vector length
<li> Introduces embedded rounding and Suppress All Exceptions (SAE) control for YMM versions of the instructions
</ul>
@section OVERVIEW Overview of Intel® XED approach
Intel® XED has two fundamental interfaces: encoding and decoding. Supporting
these interfaces are many data structures, but the two starting points
are the #xed_encoder_request_t and the #xed_decoded_inst_t . The
#xed_decoded_inst_t has more information than the
#xed_encoder_request_t , but both types are derived from a set of
common fields called the #xed_operand_values_t.
The output of the decoder, the #xed_decoded_inst_t , includes additional
information that is not required for encoding but provides more
information about the instruction resources.
The common operand fields, used by both the encoder and decoder, hold
the operands and the memory addressing information.
The decoder has an operands array that holds the order of the decoded
operands. This array indicates whether or not the operands are read or
written.
The encoder has an operand array where the encoder user must specify
the order of the operands used for encoding.
@subsection CPUID CPUID
Intel® XED ISA-SETs can be mapped to one or more CPUID groups, each being mapped to one or more CPUID records.
The CPUID record contains information about the register containing the bits to be set, the leaf, subleaf and bit indices.
When the leaf and subleaf values are loaded into the EAX and ECX registers, respectively, the CPUID instruction sets the specified
bits of the specified register, indicating support for the ISA or the feature, which is often the CPUID name field.
Intel® AVX10 introduced a versioned approach for enumeration that ensures that all Intel® CPUs support the same features
and instructions at a given Intel® AVX10 version number. This approach also reduced the required number of CPUID feature flags
to be checked to determine feature support. As a result, usually only three fields need to be checked:
1. A CPUID feature bit indicating that the Intel® AVX10 ISA is supported
2. A version number to ensure that the supported version is greater than or equal to the desired version
3. A vector length bit indicating the maximum supported vector length
Determining whether an ISA-SET is supported by a chip:
For ISA-SETs with a single CPUID group, all of its CPUID records must be set in order to be supported by the chip.
For ISA-SETs with multiple CPUID groups, at least one CPUID group must be satisfied. In order to match one group, all of its CPUID records
must be set. To simplify things, we can transform it into a logical expression:
@code
"CPUID GROUP A" OR "CPUID GROUP B" OR ...
("CPUID RECORD A.A" AND "CPUID RECORD A.B" AND ... ) OR ("CPUID RECORD B.A" AND "CPUID RECORD B.B" AND ... ) OR ...
@endcode
If one CPUID group is satisfied, the whole expression will be satisfied ("OR" relationship), thus indicating chip support for the ISA.
Since the CPUID group itself is an "AND" expression between all of its CPUID records, all CPUID records must be set (satisfied)
in order to satisfy the sub-expression.
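The OR-of-ANDs rule can be sketched as two small functions. The types below (cpuid_record_t, cpuid_group_t) are hypothetical stand-ins invented for this illustration; they are not the Intel® XED CPUID types, and each record is reduced to a single flag saying whether the chip reports it as set.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical types for illustration; not the Intel XED API.
typedef struct {
    bool is_set;                    // does the chip report this record?
} cpuid_record_t;

typedef struct {
    const cpuid_record_t *records;  // all must be set (AND)
    size_t num_records;
} cpuid_group_t;

// A group is satisfied only when every one of its records is set.
// An empty group is treated as not satisfied (a choice made here).
bool group_satisfied(const cpuid_group_t *g) {
    if (g->num_records == 0) {
        return false;
    }
    for (size_t i = 0; i < g->num_records; i++) {
        if (!g->records[i].is_set) {
            return false;
        }
    }
    return true;
}

// An ISA-SET is supported when at least one group is satisfied (OR).
bool isa_set_supported(const cpuid_group_t *groups, size_t num_groups) {
    for (size_t i = 0; i < num_groups; i++) {
        if (group_satisfied(&groups[i])) {
            return true;
        }
    }
    return false;
}
```

The early returns mirror the logic directly: one unset record fails its group, and one satisfied group is enough for the ISA-SET.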
For instance, the ISA-SET AVX512F_512 has the following CPUID groups:
The Intel® AVX10 CPUID group with three CPUID records:
<ul>
<li> CPUID name avx10_enabled, leaf 0x7, sub-leaf 0x1, register EDX, bit 19
<li> CPUID name avx10_ver1, leaf 0x24, sub-leaf 0x0, register EBX, bits 0 to 7
<li> CPUID name avx10_512vl, leaf 0x24, sub-leaf 0x0, register EBX, bit 18
</ul>
The feature group with a single CPUID record:
<ul>
<li> CPUID name avx512f, leaf 0x7, sub-leaf 0x0, register EBX, bit 16
</ul>
This means that a chip supports AVX512F_512 ISA if at least one of the two groups has a match.
In order to match one CPUID group, all of its records must be set. So either the first group's three CPUID records or the second
group's single CPUID record must be set.
To provide further insight on Intel® AVX10 CPUID, let's discuss the first CPUID group of AVX512F_512:
The first record ("AVX10 Converged Vector ISA Enable" bit) is indicative of processor support of Intel® AVX10 ISA. The second CPUID record
specifies the processor's minimal required Intel® AVX10 version (in this case, AVX10.1). The last CPUID record is the vector length bit indicating
the maximum supported VL (512).
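As a worked instance of the AVX512F_512 example, the following sketch applies the bit positions listed above to raw CPUID register values. It is an illustration only: the cpuid_snapshot_t structure and function names are hypothetical, the register values are assumed to have been read elsewhere (this sketch does not execute the CPUID instruction), and a real program would also need to confirm that the queried leaves exist on the processor.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical snapshot of the CPUID outputs named in the text.
typedef struct {
    uint32_t leaf7_sub0_ebx;   // CPUID.(EAX=0x7,  ECX=0x0):EBX
    uint32_t leaf7_sub1_edx;   // CPUID.(EAX=0x7,  ECX=0x1):EDX
    uint32_t leaf24_sub0_ebx;  // CPUID.(EAX=0x24, ECX=0x0):EBX
} cpuid_snapshot_t;

// AVX10 group: enable bit, version >= min_version, and 512b VL bit.
bool avx10_512_group_matches(const cpuid_snapshot_t *c, unsigned min_version) {
    bool enabled     = ((c->leaf7_sub1_edx >> 19) & 1u) != 0;   // avx10_enabled
    unsigned version = c->leaf24_sub0_ebx & 0xFFu;              // bits 0 to 7
    bool vl512       = ((c->leaf24_sub0_ebx >> 18) & 1u) != 0;  // avx10_512vl
    return enabled && version >= min_version && vl512;
}

// Feature group: the single avx512f record.
bool avx512f_group_matches(const cpuid_snapshot_t *c) {
    return ((c->leaf7_sub0_ebx >> 16) & 1u) != 0;
}

// AVX512F_512 is supported if either group matches.
bool avx512f_512_supported(const cpuid_snapshot_t *c) {
    return avx10_512_group_matches(c, 1) || avx512f_group_matches(c);
}
```

Note that the version record is a numeric comparison rather than a single-bit test, which is what distinguishes the AVX10 group from classic feature-bit groups.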
For the recommended usage of the Intel® XED CPUID APIs, see @ref SMALLEXAMPLES .
// ===========================================================================
@section ICLASS Instruction classes
The #xed_iclass_enum_t class describes the instruction names. The
names are (mostly) taken from the Intel manual, with exceptions only
for certain ambiguities. This is what is typically thought of as the
instruction mnemonic. Note, Intel® XED does not typically distinguish
instructions based on width unless the ISA manuals do so as well. For
example, #xed_iclass_enum_t's are not suffixed with "w", "l" or "q"
typically. There are instructions whose #xed_iclass_enum_t ends in a
"B" or a "Q" (including all byte operations and certain string
operations) and those names are preserved as described in the Intel
programmers' reference manuals.
@subsection SPECIAL Special Cases
There are many special cases that must be accounted for in attempting
to handle all the nuances of the ISA. This is an attempt to explain
the nonstandard handling of certain instruction names.
The FAR versions of 3 opcodes (really 6 distinct opcodes) are given
the opcode names CALL_FAR, JMP_FAR, and RET_FAR. The AMD documentation
lists the far return as RETF. Intel® XED calls that RET_FAR to be consistent
with the other far operations.
To distinguish the SSE2 MOVSD instruction from the base string
instruction MOVSD, Intel® XED calls the SSE version MOVSD_XMM.
In March 2015, a change was made to certain Intel® XED iclasses to simplify
the implementation. The changes are as follows:
<ul>
<li> XED_ICLASS_JRCXZ was split into three distinct iclasses:
XED_ICLASS_JCXZ, XED_ICLASS_JECXZ and XED_ICLASS_JRCXZ.
<li> The REP-prefixed (0xF2, 0xF3) string instructions were split
into new iclasses making them distinct from the underlying
non-REP-prefixed instructions. For example XED_ICLASS_REP_STOSW
is distinct from XED_ICLASS_STOSW. The CMPS{B,W,D,Q} and
SCAS{B,W,D,Q} instructions have "REPE_" or "REPNE_" prefixes to
correspond to REPE (0xF3) or REPNE (0xF2).
<li> LOCK-prefixed (0xF0) atomic read-modify-write memory
instructions were split into separate iclasses that contain the
substring "_LOCK". LOCK-prefixed instructions have the attribute
XED_ATTRIBUTE_LOCKED. Memory instructions that could have a lock
prefix added to them when encoding have the attribute
XED_ATTRIBUTE_LOCKABLE. For example, XED_ICLASS_CMPXCHG16B_LOCK
has a lock prefix, but XED_ICLASS_CMPXCHG16B does not have a lock
prefix. As always, XCHG is atomic with or without a LOCK prefix
as per the rules of the ISA, so XED_ICLASS_XCHG does not have a
_LOCK suffix in the xed_iclass_enum_t name.
</ul>
@subsection NOPs
NOPs are very special. Intel® XED allows for encoding NOPs of 1 to 9 bytes
through the use of the XED_ICLASS_NOP (the one-byte nop), and
XED_ICLASS_NOP2 ... XED_ICLASS_NOP9. These use the recommended NOP
sequences from the Intel® 64 and IA-32 Architectures Software Developers Manual.
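For reference, the recommended multi-byte NOP sequences can be written out as a lookup table. The byte values below are the ones given in the NOP entry of the Software Developers Manual as recalled here, so they should be checked against the manual before being relied on; the table and the emit_nop helper are hypothetical and do not show how Intel® XED stores or emits these encodings.

```c
#include <stddef.h>
#include <stdint.h>

// Row i holds the (i+1)-byte recommended NOP; unused bytes are zero.
static const uint8_t nop_table[9][9] = {
    { 0x90 },
    { 0x66, 0x90 },
    { 0x0F, 0x1F, 0x00 },
    { 0x0F, 0x1F, 0x40, 0x00 },
    { 0x0F, 0x1F, 0x44, 0x00, 0x00 },
    { 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 },
    { 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 },
    { 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
    { 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

// Copies the n-byte NOP into out (which must hold at least n bytes).
// Returns n on success, or 0 when n is outside 1..9.
size_t emit_nop(uint8_t *out, size_t n) {
    if (n < 1 || n > 9) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = nop_table[n - 1][i];
    }
    return n;
}
```

The 3-byte and longer forms all use the 0F 1F opcode with a MODRM byte, and the longer ones add SIB and displacement bytes to pad the length.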
The instruction 0x90 is very special in the instruction set because it
gets special treatment in 64b mode. In 64b mode, 32b register writes
normally zero the upper 32 bits of a 64b register. Not so for 0x90. If
it did zero the upper 32 bits, it would not be a NOP.
There are two important NOP categories. XED_CATEGORY_NOP and
XED_CATEGORY_WIDENOP. The XED_CATEGORY_NOP applies only to the 0x90
opcode. The WIDENOP category applies to the NOPs in the two-byte table
row 0F19...0F1F. The WIDENOPs take MODRM bytes, and optional SIB and
displacements.
// ===========================================================================
// @section X86-OPERANDS Operands
Intel® XED uses the operand order documented in the Intel Programmers'
Reference Manual. In most cases, the first operand is a source and
destination (read and written) and the second operand is just a source
(read).
For decode requests (#xed_decoded_inst_t), the operands array is
stored in the #xed_inst_t structure once the instruction is
decoded. The request's operand order is stored in the #xed_encoder_request_t
for encode requests.
There are several types of operands:
<ul>
<li> registers (#xed_reg_enum_t)
<li> branch displacements
<li> memory operations (which include base, index, segment and memory displacements)
<li> immediates
<li> pseudo resources (which are listed in the #xed_reg_enum_t)
</ul>
Each operand has two associated attributes: the R/W action and a
visibility. The R/W actions (#xed_operand_action_enum_t) indicate
whether the operand is read, written or both read-and-written, or
conditionally read or written. The visibility attribute
(#xed_operand_visibility_enum_t) is described in the next subsection.
The memory operation operand is really a pointer to separate fields
that hold the memory operation information. The memory operation information is comprised of the following:
<ul>
<li> a segment register
<li> a base register
<li> an index register
<li> a displacement
</ul>
There are several important things to note:
<ul>
<li> There can only be two memory operations, MEM0 and MEM1.
<li> MEM0 could also be an AGEN, which stands for "Address
Generation". AGEN is a special operand that uses memory
information but does not actually read memory. This is only
used for the LEA instruction.
<li> There can only be an index and displacement associated with
MEM0 (or AGEN).
<li> There is just one displacement associated with the common
fields. It could be associated with either the AGEN/MEM0 or
with a branch or call instruction.
</ul>
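To make the difference between a memory operation and an AGEN concrete, the sketch below computes an effective address from the memory operation components. The mem_operand_t structure is a hypothetical stand-in that holds register values that were already read, not an Intel® XED type; it adds the SIB scale factor to the fields listed above and leaves out the segment base for brevity.

```c
#include <stdint.h>

// Hypothetical, already-resolved memory operand: register *values*,
// not register names. Unused base or index registers are passed as 0.
typedef struct {
    uint64_t base;    // value of the base register
    uint64_t index;   // value of the index register
    unsigned scale;   // 1, 2, 4 or 8
    int64_t  disp;    // signed displacement
} mem_operand_t;

// Address generation: base + index * scale + displacement.
// An AGEN operand (LEA) stops here and never touches memory;
// a MEM0 or MEM1 operand would go on to load or store at this address.
uint64_t effective_address(const mem_operand_t *m) {
    return m->base + m->index * m->scale + (uint64_t)m->disp;
}
```

Unsigned arithmetic is used so that negative displacements wrap modulo 2^64, matching 64-bit address computation.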
@subsection AVX512_OPERANDS Intel® AVX512 Operands
Intel® AVX512 adds write masking, merging, and zeroing to the
instruction set via the EVEX encodings. Write masking, merging, and
zeroing are properties of the instruction encoding and are not visible
by looking at individual operands. Write masking with merging makes it
possible for values of the destination register to live on from prior
to the execution of the instruction. Write masking with merging
results in an extra register read of the destination operand. In
contrast write masking with zeroing always completely overwrites the
destination operand, either with values computed by the instruction or
with zeros for elements that are "masked off".
For most operands, to learn if the operand reads or writes its
associated resource, one can use #xed_operand_rw(const xed_operand_t*
p). However, because masking, merging and zeroing are properties of the
instruction, and not just the operand, use of a different function is
required.
To handle this, Intel® XED has a new interface function
#xed_decoded_inst_operand_action(), which takes a #xed_decoded_inst_t
pointer and an operand index and indicates how the read/write behavior
is modified in the presence of masking with merging or masking with
zeroing.
The following list attempts to summarize how the value returned from
xed_operand_rw() is internally modified for the 0th operand, except
for stores:
<ul>
<li> no masking: no change.
<li> masking with zeroing: no change.
<li> masking with merging : destination register operands
that are nominally "rw" or "w" become "rcw" indicating
a read with a conditional write.
</ul>
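The three rules above amount to a small mapping from the nominal read/write action to the masked one. The sketch below models that mapping with hypothetical enums; they are simplified stand-ins, not the Intel® XED operand action enumeration, and the function is meant only for the 0th operand when it is a destination register (stores are excluded, as noted above).

```c
// Hypothetical, simplified action and masking kinds for illustration.
typedef enum { ACT_R, ACT_W, ACT_RW, ACT_RCW } action_t;
typedef enum { MASK_NONE, MASK_ZEROING, MASK_MERGING } masking_t;

// Returns the effective action for a destination register operand.
// Only merging changes anything: "rw" and "w" become "rcw"
// (read, conditionally written). Everything else passes through.
action_t dest_action_with_masking(action_t nominal, masking_t masking) {
    if (masking == MASK_MERGING && (nominal == ACT_RW || nominal == ACT_W)) {
        return ACT_RCW;
    }
    return nominal;
}
```

The extra read under merging reflects that masked-off elements keep their previous destination values.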
@subsection APX_OPERANDS Intel® APX Operands
2023 saw the introduction of Intel® Advanced Performance Extensions (Intel® APX),
which expands the entire x86 instruction set with access to more registers.
Intel® APX doubles the number of general-purpose registers (GPRs) from 16 to 32; the 16 added registers are the Extended GPRs (EGPRs).
New and promoted APX-F instructions are defined in one of the following Intel® XED extension groups:
- XED_EXTENSION_APXLEGACY: For new APX-F instructions within the Legacy encoding space
- XED_EXTENSION_APXEVEX: For new and promoted APX-F instructions within the EVEX encoding space
CCMP and CTEST are two new sets of instructions for conditional CMP and TEST, respectively. These instructions
introduce a new 4-bit pseudo-register for "Default Flags Values" called DFV (EVEX.[OF, SF, ZF, CF]).
The register index represents the bits for the default flags, for example, DFV10.index == 10 == 0b1010 -> OF=1, SF=0, ZF=1, CF=0.
The DFV pseudo-register should be explicitly defined in an encoder request.
The xed_decoded_inst_get_dfv_reg() API can be used to retrieve a DFV register enumeration from a decoded instruction.
The xed_flag_dfv_get_default_flags_values() API can be used to get the default flags values given a DFV register enumeration.
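The mapping from a DFV register index to the four default flag values follows directly from the bit order given above (OF, SF, ZF, CF from most to least significant). The sketch below is a stand-alone illustration with hypothetical names; it is not the implementation of the Intel® XED APIs just mentioned.

```c
// Hypothetical container for the four default flag values.
typedef struct {
    unsigned of;
    unsigned sf;
    unsigned zf;
    unsigned cf;
} dfv_flags_t;

// Splits a 4-bit DFV index into its flags:
// bit 3 = OF, bit 2 = SF, bit 1 = ZF, bit 0 = CF.
dfv_flags_t dfv_index_to_flags(unsigned idx) {
    dfv_flags_t f;
    f.of = (idx >> 3) & 1u;
    f.sf = (idx >> 2) & 1u;
    f.zf = (idx >> 1) & 1u;
    f.cf = idx & 1u;
    return f;
}
```

For DFV10 this yields OF=1, SF=0, ZF=1, CF=0, matching the example in the text.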
Developers can dynamically disable Intel® APX architecture encoder support using the 'NO_APX' API xed3_operand_set_no_apx().
The xed3_operand_set_must_use_evex() API can also be used for APX promoted instructions in order to force EVEX space upon the encoding request.
Developers wishing to encode No-Flags Intel® APX instructions should set the NF Intel® XED operand.
@subsection OPERAND_VISIBILITY Operand Resource Visibilities
See #xed_operand_visibility_enum_t .
There are three basic types of resource visibilities:
<ul>
<li> EXPLICIT (EXPL),
<li> IMPLICIT (IMPL), and
<li> IMPLICIT SUPPRESSED (SUPP) (usually referred to as just "SUPPRESSED").
</ul>
Explicit resources are what you think they are: resources that
are required for the encoding. For each explicit resource, there is
a field in the corresponding instruction encoding. The implicit and
suppressed resources are more subtle.
SUPP operands are:
<ul>
<li> not used in picking an encoding,
<li> not printed in disassembly,
<li> not represented using operand bits in the encoding.
</ul>
IMPL operands are:
<ul>
<li> used in picking an encoding,
<li> expressed in disassembly, and
<li> not represented using operand bits in the encoding (like SUPP).
</ul>
The implicit resources are required for selecting an encoding but do
not show up as a specific field in the instruction
representation. Implicit resources do show up in a conventional
instruction disassembly. In the IA-32 instruction set or Intel64
instruction set, there are many instructions that use EAX or RAX
implicitly, for example. Sometimes, the CL or RCX register is
implicit. Also, some instructions have an implicit 1 immediate. The
opcode you choose fixes your choice of implicit register or immediate.
The suppressed resources are a form of implicit resource, but they are
resources not required for encoding. The suppressed operands are not
normally displayed in a conventional disassembly. The suppressed
operands are emitted by the decoder but are not used when
encoding. They are ignored by the encoder. Examples are the stack
pointer for PUSH and POP operations. There are many others, like
pseudo resources.
The explicit and implicit resources are expressed resources -- they show
up in disassembly and are required for encoding.
The suppressed resources are considered a kind of implicit
resources that are not expressed in ATT System V or Intel disassembly formats.
The suppressed operands are always after the implicit and explicit operands
in the operand order.
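The three visibilities differ along three yes/no properties, which can be tabulated as follows. The enum and structure here are hypothetical summaries of the text above and are not the Intel® XED visibility enumeration.

```c
#include <stdbool.h>

// Hypothetical stand-ins for the three visibility kinds.
typedef enum { VIS_EXPLICIT, VIS_IMPLICIT, VIS_SUPPRESSED } visibility_t;

typedef struct {
    bool used_to_pick_encoding;  // must the encoder user supply it?
    bool printed_in_disassembly; // shown in Intel or ATT SYSV output?
    bool has_encoding_bits;      // does it occupy a field in the bytes?
} visibility_props_t;

visibility_props_t visibility_props(visibility_t v) {
    visibility_props_t p = { false, false, false };
    switch (v) {
    case VIS_EXPLICIT:   // e.g. a register named in MODRM
        p.used_to_pick_encoding = true;
        p.printed_in_disassembly = true;
        p.has_encoding_bits = true;
        break;
    case VIS_IMPLICIT:   // e.g. a register fixed by the opcode
        p.used_to_pick_encoding = true;
        p.printed_in_disassembly = true;
        break;
    case VIS_SUPPRESSED: // e.g. the stack pointer for PUSH and POP
        break;
    }
    return p;
}
```

Read this way, implicit operands behave like explicit ones for the encoder user and like suppressed ones for the byte layout.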
@subsection X87_REG_STACK x87 Register stack popping
The Intel® 64 and IA-32 Architectures Software Developers Manual indicates that "FADDP st2"
reads st0 and st2, writes st2, and pops the x87 stack. The result ends up
in st1 after the instruction executes. That is not how Intel® XED represents
the operation. Intel® XED will say that "FADDP st2" reads st0 and st2 and
writes st2. The output register that Intel® XED provides is essentially "pre
pop". The pop occurs afterward, conceptually. The actual result ends
up in the st1 register after the stack pop operation. Intel® XED also lists
the pseudo resources indicating that a stack pop has occurred. This
behavior affects the output register of the following instructions: FADDP,
FMULP, FSUBRP, FSUBP, FDIVRP, FDIVP.
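The "pre-pop" convention can be illustrated with a toy model of the
register stack. This is a sketch only: the x87_stack_t type and the
helper functions are hypothetical, are not part of Intel® XED, and ignore
details such as the tag word.

```c
// Toy model of the x87 register stack (hypothetical; not part of Intel XED).
// st(i) lives at regs[(top + i) % 8], mirroring the TOP-relative naming.
typedef struct {
    double regs[8];
    unsigned top;
} x87_stack_t;

static double* st(x87_stack_t* s, unsigned i) {
    return &s->regs[(s->top + i) % 8];
}

// FADDP st(i): st(i) = st(i) + st(0), then pop the stack.
static void faddp(x87_stack_t* s, unsigned i) {
    *st(s, i) = *st(s, i) + *st(s, 0);  // the "pre-pop" write to st(i)
    s->top = (s->top + 1) % 8;          // the pop renames st(i) to st(i-1)
}

// Returns the value found in st1 after "FADDP st2" with st0=1.0, st2=10.0.
static double faddp_st2_demo(void) {
    x87_stack_t s = { {0}, 0 };
    *st(&s, 0) = 1.0;
    *st(&s, 1) = 5.0;
    *st(&s, 2) = 10.0;
    faddp(&s, 2);
    return *st(&s, 1);
}
```

In this model the sum is written to st2 before the pop and is found in
st1 afterward, which is the distinction the text draws.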
@subsection PSEUDO_RESOURCES Pseudo Resources
Some instructions reference machine registers or perform interesting
operations that we need to represent. For example, the IDTR and GDTR
are represented as pseudo resources. Operations that pop the x87
floating point register stack can have an X87POP or X87POP2 "register"
to indicate if the x87 register stack is popped once or twice. These
are part of the #xed_reg_enum_t.
@subsection IMM_DIS Immediates and Displacements
Use the API functions to set immediates, memory displacements,
and branch displacements. Immediates and displacements are stored in
normal integers internally, but they are stored endian swapped and
left justified. The API functions take care of all the endian
swapping and positioning so you don't have to worry about that detail.
Immediates and displacements are different things in the ISA. They can
be 1, 2, 4 or 8 bytes. Branch displacements (1, 2 or 4 bytes) and
memory displacements (1, 2, 4 or 8 bytes) refer to the signed
constants that are used for relative distances or memory "offsets"
from a base register (including the instruction pointer) or start of a
memory region.
Immediates are signed or unsigned and are used for numerical
computations and shift distances. They also hold things like segment
selectors for far pointers for certain jump or call instructions.
There is also a second 1-byte immediate used only for the ENTER
instruction.
Intel® XED will try to use the shortest allowed width for a displacement or
immediate. You can control Intel® XED's selection of allowed widths using a
notion of "legal widths". A "legal width" is a binary number where
each bit represents a legal desired width. For example, when you have
a valid base register in 32 or 64b addressing, and a displacement is
required, your displacement must be either 1 byte or 4 bytes
long. This is expressed by OR'ing 1 and 4 together to get 0101 (base
2) or 5 (base 10).
If a four-byte displacement is required even though the value is
representable in fewer than four bytes, then the legal width should be
set to 0100 (base 2) or 4 (base 10).
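The selection of the shortest allowed width can be sketched as follows.
The helper shortest_legal_width() is hypothetical and is not an Intel® XED
API; it only illustrates how a "legal widths" bit mask (1, 2, 4 and 8
OR'ed together) constrains the width chosen for a signed value.

```c
#include <stdint.h>

// Hypothetical helper, not an Intel XED API: returns the shortest width in
// bytes that is allowed by legal_widths and can hold the signed value,
// or 0 if no legal width can represent it.
static unsigned shortest_legal_width(int64_t value, unsigned legal_widths) {
    static const unsigned widths[] = { 1, 2, 4, 8 };
    for (unsigned i = 0; i < 4; i++) {
        unsigned w = widths[i];
        if ((legal_widths & w) == 0)
            continue;  // this width is not permitted
        if (w == 8)
            return 8;  // any int64_t fits in 8 bytes
        int64_t lo = -((int64_t)1 << (8 * w - 1));
        int64_t hi = ((int64_t)1 << (8 * w - 1)) - 1;
        if (value >= lo && value <= hi)
            return w;
    }
    return 0;
}
```

With a mask of 5 (1 or 4 bytes) a small displacement such as 0x10 gets one
byte, while a mask of 4 forces four bytes for the same value.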
@section API_REF API Reference
- @ref INIT "INIT" Initialization
- @ref DEC "DEC" Decoding instructions
- @ref ENC "ENC" Generic API for encoding instructions
- @ref ENCHL "ENCHL" High level API for the generic encoder
- @ref ENCHLPATCH "ENCHLPATCH" Patching instructions
- @ref ENC2 "ENC2" Fast encoder for specific instructions
- @ref OPERANDS "OPERANDS" Operand storage fields
- @ref IFORM "IFORM" Iforms
- @ref ISASET "ISASET" ISA-sets and chips
- @ref PRINT "PRINT" Printing (disassembling) instructions
- @ref REGINTFC "REGINTFC" Register interface functions
- @ref FLAGS "FLAGS" Flags interface functions
- @ref AGEN "AGEN" Address generation calculation support
- @ref ENUM "ENUM" Enumerations
- @ref EXAMPLES "Examples" Examples
@section LEGAL Disclaimer and Legal Information
The information in this manual is subject to change without notice and
Intel Corporation assumes no responsibility or liability for any
errors or inaccuracies that may appear in this document or any
software that may be provided in association with this document. This
document and the software described in it are furnished under license
and may only be used or copied in accordance with the terms of the
license. No license, express or implied, by estoppel or otherwise, to
any intellectual property rights is granted by this document. The
information in this document is provided in connection with Intel
products and should not be construed as a commitment by Intel
Corporation.
EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL
PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A
PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not
intended for use in medical, life saving, life sustaining, critical
control or safety systems, or in nuclear facility applications.
Designers must not rely on the absence or characteristics of any
features or instructions marked "reserved" or "undefined." Intel
reserves these for future definition and shall have no responsibility
whatsoever for conflicts or incompatibilities arising from future
changes to them.
The software described in this document may contain software defects
that may cause the product to deviate from published
specifications. Current characterized software defects are available
on request.
Intel, the Intel logo, Intel SpeedStep, Intel NetBurst, Intel
NetStructure, MMX, Intel386, Intel486, Celeron, Intel Centrino, Intel
Xeon, Intel XScale, Itanium, Pentium, Pentium II Xeon, Pentium III
Xeon, Pentium M, and VTune are trademarks or registered trademarks of
Intel Corporation or its subsidiaries in the United States and other
countries.
Other names and brands may be claimed as the property of others.
Copyright (c) 2002-2023 Intel Corporation. All Rights Reserved.
*/
// =============================================================
/*! @defgroup DEC Decoding Instructions
To decode an instruction you are required to provide
<ul>
<li> a machine state (operating mode and stack addressing width)
<li> a pointer to the instruction text array of bytes
<li> a length of the text array
</ul>
The machine state is passed to the decoder via the
#xed_state_t structure. That state is set when each
#xed_decoded_inst_t is initialized. The #xed_decoded_inst_t
contains the results of decoding after a successful decode.
The #xed_decoded_inst_t includes an array of #xed_operand_values_t
and that is where most of the information about the operands,
resources etc. is stored. See the @ref OPERANDS interface. The
array is indexed by the #xed_operand_enum_t enumeration. Do not
access it directly though; use the interface functions in the @ref
OPERANDS interface for portability.
After decoding, the #xed_decoded_inst_t contains a pointer to the
#xed_inst_t which acts like a kind of template giving static
information about the decoded instruction: what are the types of
the operands, the iclass, category, extension, etc. The #xed_inst_t
is accessed via the
#xed_decoded_inst_inst(const xed_decoded_inst_t* xedd) function.
Before every decode, you must call one of the initialization
functions. The most common case would be to use
#xed_decoded_inst_zero_keep_mode() or maybe
#xed_decoded_inst_zero_set_mode().
*/
/*! @defgroup ENC Encoding Instructions
When you call xed_encode() to encode an instruction you must pass:
<ul>
<li> an encode structure that includes a machine state ( #xed_state_t )
<li> a pointer to the array that receives the encoded instruction text
<li> the length of that array
<li> a pointer to an unsigned integer that receives the encoded length
</ul>
The #xed_encoder_request_t structure includes a #xed_operand_values_t and
that is where most of the information about the operands,
resources etc. is stored.
To get nondefault width operands, during encoding, you have to
call #xed_encoder_request_set_effective_operand_width() .
To set nondefault addressing widths, you must call
#xed_encoder_request_set_effective_address_size().
To encode instructions you must set the following
in the #xed_encoder_request_t.
<ol>
<li> the machine mode (machine width, and stack addressing width)
<li> the effective operand width
<li> the iclass
<li> for some instructions you need to specify prefixes (like REP,
REPNE or LOCK).
<li> the operands:
<ol>
<li>operand kind
(XED_OPERAND_{AGEN,MEM0,MEM1,IMM0,IMM1,RELBR,ABSBR,PTR,REG0...REG15})
<li>operand order <BR>
xed_encoder_request_set_operand_order(&req,operand_index, XED_OPERAND_*);
where the operand_index is a sequential index starting at zero.
<li>operand details
<ol>
<li> FOR MEMOPS: base, segment, index, scale, displacement
<li> FOR REGISTERS: register name
<li> FOR IMMEDIATES: immediate values
</ol>
</ol>
</ol>
See @ref ENCODE_EXAMPLE for an example of using the encoder.
*/
/*! @defgroup ENCHL High Level API for Encoding Instructions
This is a higher level API for encoding instructions.
A full example is present in examples/xed-ex5-enc.c.
In the following example we create one instruction template that can
be passed to the encoder.
@code
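// Fragment: assumes xed_tables_init() has already been called.
xed_encoder_instruction_t x;
xed_encoder_request_t enc_req;
xed_state_t dstate;
xed_uint8_t itext[XED_MAX_INSTRUCTION_BYTES];  // output buffer
unsigned int ilen = XED_MAX_INSTRUCTION_BYTES; // size of the output buffer
unsigned int olen = 0;                         // receives the encoded length
xed_bool_t convert_ok;
xed_error_enum_t xed_error;

// 32b legacy mode with a 32b stack addressing width.
dstate.mmode=XED_MACHINE_MODE_LEGACY_32;
dstate.stack_addr_width=XED_ADDRESS_WIDTH_32b;

// ADD EAX, [EDX+0x11223344]; 0 selects the default effective operand width.
xed_inst2(&x, dstate, XED_ICLASS_ADD, 0,
          xed_reg(XED_REG_EAX),
          xed_mem_bd(XED_REG_EDX, xed_disp(0x11223344, 32), 32));

xed_encoder_request_zero_set_mode(&enc_req, &dstate);
convert_ok = xed_convert_to_encoder_request(&enc_req, &x);
if (!convert_ok) {
    fprintf(stderr,"conversion to encode request failed\n");
    continue; // skip to the next instruction in the enclosing loop
}
xed_error = xed_encode(&enc_req, itext, ilen, &olen);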
@endcode
The high-level encoder interface allows passing the effective operand
width for the xed_inst*() function as 0 (zero) when the effective
operand width is the default.
The default width in 16b mode is 16b. The default width in 32b or 64b
modes is 32b. So if you do a 16b operation in 32b/64b mode, you must
set the effective operand width. If you do a 64b operation in 64b
mode, you must set it (the default is 32b). If you do the rarer
32b operation in 16b mode, you must also set it.
When all the operands are "suppressed" operands, then the effective
operand width must be supplied for nondefault operation widths.
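The default-width rule can be sketched as below for 16b, 32b and 64b
operations. Both helpers are hypothetical, are not Intel® XED APIs, and
assume that the effective operand width argument is given in bits.

```c
// Hypothetical helpers, not Intel XED APIs. Modes and widths are in bits.
static unsigned default_operand_width(unsigned mode_bits) {
    return (mode_bits == 16) ? 16 : 32;  // 32b and 64b modes default to 32b
}

// Value to pass as the effective operand width: 0 when the operation uses
// the default width for the mode, otherwise the explicit width.
static unsigned effective_operand_width_arg(unsigned mode_bits,
                                            unsigned op_bits) {
    return (op_bits == default_operand_width(mode_bits)) ? 0 : op_bits;
}
```

For example, a 64b operation in 64b mode yields 64, while a 32b operation
in the same mode yields 0 because 32b is already the default.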
*/
/*! @defgroup ENCHLPATCH Patching instructions
These functions are useful for JITs and other uses where one must
modify certain fields of instructions after encoding. To modify an
instruction, one must encode it (creating an itext array of bytes) and
then decode it (so that the patching routines know where the various
fields are located). Once the itext and the decoded instruction are
available, certain fields can be modified.
The decode step required to create patchable instructions obviously
takes additional time so it is suggested one only create patchable
instructions once as templates and re-use them as needed.
See examples/xed-ex9-patch.c for an example.
*/
/*! @defgroup ENC2 Fast Encoder for Specific Instructions
The basic idea for the ENC2 fast encoder is that there is one encode
function per variant of every instruction. The instructions are
encoded in 3 encoding spaces (legacy, VEX and EVEX). We need to have
different function names for every variation as well. To come up with
unique names, ENC2 uses a few function naming conventions. For legacy
encoded instructions, we often have 3 variations in 64b mode (2 in
other modes) to handle 16-bit, 32-bit and 64-bit operands. Those 3
sizes are usually differentiated with "_o16", "_o32" and "_o64" in the
ENC2 function names. Having unique names is complicated as there are
often multiple encodings for the same operation in the instruction
set. To disambiguate alias encodings, the function names include the
substring "_vrN" where N is an integer. Similarly, VEX and EVEX
encodings for related instructions often need to be distinguished when
their instruction name and operands are the same. To accomplish that,
all ENC2 EVEX encoding function names contain the substring "_e".
The checked interface functions end with "_chk".
For instructions that take conventional x86 memory operands, there are
6 functions generated depending on the addressing mode required. The 6
functions are denoted: b, bd8, bd32, bis, bisd8, and bisd32 where:
<ul>
<li> "b" indicates a base register,
<li> "d8" indicates an 8-bit displacement,
<li> "d32" indicates a 32-bit displacement,
<li> "i" indicates an index register, and
<li> "s" indicates a scale factor (1,2,4,8) for the index register.
</ul>
The idea behind having different functions for the different addressing
modes is to make the encode functions simpler and more straight-line code.
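The mapping from the shape of a memory operand to one of the six tags can
be sketched as follows. The helper enc2_addr_mode_tag() is hypothetical
and is not part of Intel® XED; it only shows how the presence of an index
register and the displacement size select a tag.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper, not part of Intel XED: picks the addressing-mode tag
// used in ENC2 function names from the shape of a memory operand.
// disp_bits must be 0 (no displacement), 8, or 32; returns NULL otherwise.
static const char* enc2_addr_mode_tag(bool has_index, unsigned disp_bits) {
    switch (disp_bits) {
    case 0:  return has_index ? "bis"    : "b";
    case 8:  return has_index ? "bisd8"  : "bd8";
    case 32: return has_index ? "bisd32" : "bd32";
    default: return NULL;
    }
}
```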
Memory instructions also indicate their effective addressing width
with one of "_a16", "_a32" or "_a64" substrings.
The libraries for the ENC2 encoder are built when the "--enc2" switch
is included during the build process. There is one set of
libraries and headers generated for each supported
configuration. Currently Intel® XED ENC2 supports 64b mode with 64b addressing
(m64,a64) and 32b mode with 32b addressing (m32,a32). The build
process creates an enc2-m64-a64 directory and an enc2-m32-a32
directory, each with two libraries for the checked and unchecked
interfaces. There are 2 headers as well, one for each version of each
library in the hdr/xed subdirectory of their respective enc2-*
directory. On Linux, for a static build, you'd see:
@code
enc2-m64-a64/
    libxed-chk-enc2-m64-a64.a
    libxed-enc2-m64-a64.a
    hdr/
        xed/
            xed-chk-enc2-m64-a64.h
            xed-enc2-m64-a64.h
@endcode
Given the large size of the generated ENC2 headers, doxygen
documentation is not created for those header files. Please view the
headers directly in your editor.
Even with the unchecked interface, some register checking is done for the
addressing registers. In the x86 encoding system, some choices of
base register require that an 8-bit or 32-bit displacement is also
used. In those cases, the ENC2 encoder is capable of supplying a
zero-valued displacement.
Intel® XED also offers the capability to test ENC2 with either the "--enc2-test-checked" flag or
the "--enc2-operands-checked" flag. Building Intel® XED with either of these flags leads to a longer build.
The former flag tests the ENC2 checked interface in a lighter manner: each encoded
instruction is decoded and its IFORM is validated. The latter flag offers more rigorous testing: each
instruction is decoded and then its IFORM and all operands involved in the encoding are validated.
Users can install their own error handler by calling
#xed_enc2_set_error_handler() passing a function pointer that takes
stdarg variable arguments. See examples/xed-enc2-2.c for an example.
When using the checked interface, one can disable the checking at
runtime by calling
#xed_enc2_set_check_args() with an integer value 0.
With a nonzero argument, the argument checking can be re-enabled.
To minimize copying, ENC2 users are required to supply a pointer to an
output buffer where the encoding bytes will be placed. That buffer is
required to be 15 bytes in length. Valid x86 encodings are at most
15 bytes long and only reach that length if redundant legacy prefixes
are employed. Intel® XED ENC2 does not generate redundant legacy prefixes.
Here is an example of creating an LEA instruction using the checked
interface and several fixed registers:
@code
xed_uint32_t create_lea_64b(xed_uint8_t* output_buffer)
{
xed_reg_enum_t dest, base, index;