Support CST_CODE_INLINEASM constant codes added in LLVM 13 and 14 #184

Closed
RyanGlScott opened this issue Apr 20, 2022 · 3 comments · Fixed by #202

Comments

@RyanGlScott
Contributor

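Prior to LLVM 13, there were two constant codes (18 and 23) for inline assembly statements. There are now two more such codes as of LLVM 14: code 28 (CST_CODE_INLINEASM_OLD3), added in LLVM 13, and code 30 (CST_CODE_INLINEASM), added in LLVM 14.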

Currently, llvm-pretty-bc-parser does not support either of these. Here is a test case:

int main() {
  asm("nop");
  return 0;
}

When compiled with ~/Software/clang+llvm-13.0.1/bin/clang test.c -c -emit-llvm, this produces the following parse error:

λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["CONSTANTS_BLOCK","@main","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "Unknown constant record code 28\nAre you sure you're using a supported compiler?\nCheck here: https://github.com/GaloisInc/llvm-pretty-bc-parser\n"})

The error is much the same when compiled with LLVM 14, except that the code will be 30 instead of 28.

With some effort, we should be able to adapt the existing llvm-pretty-bc-parser code for code 18 (now named CST_CODE_INLINEASM_OLD as of LLVM 14):

18 -> label "CST_CODE_INLINEASM_OLD" $ do
  let field = parseField r
  ty <- getTy
  flags <- field 0 numeric
  let sideEffect = testBit (flags :: Int) 0
      alignStack = (flags `shiftR` 1) == 1
  alen <- field 1 numeric
  asm <- UTF8.decode <$> parseSlice r 2 alen char
  clen <- field (2+alen) numeric
  cst <- UTF8.decode <$> parseSlice r (3+alen) clen char
  return (getTy, Typed ty (ValAsm sideEffect alignStack asm cst):cs)

And for code 23 (now named CST_CODE_INLINEASM_OLD2 as of LLVM 14):

23 -> label "CST_CODE_INLINEASM" $ do
  let field = parseField r
  mask <- field 0 numeric
  let test = testBit (mask :: Word32)
      hasSideEffects = test 0
      isAlignStack = test 1
      _asmDialect = mask `shiftR` 2
  asmStrSize <- field 1 numeric
  Assert.recordSizeGreater r (1 + asmStrSize)
  constStrSize <- field (2 + asmStrSize) numeric
  Assert.recordSizeGreater r (2 + asmStrSize + constStrSize)
  asmStr <- fmap UTF8.decode $ parseSlice r 2 asmStrSize char
  constStr <- fmap UTF8.decode $ parseSlice r (3 + asmStrSize) constStrSize char
  ty <- getTy
  let val = ValAsm hasSideEffects isAlignStack asmStr constStr
  return (getTy, Typed ty val : cs)
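
Both handlers above read the same layout after the flags field: a length, that many characters of assembly text, a second length, and that many characters of constraint text. As a rough illustration of that shared step, here is a minimal standalone sketch that works on a plain list of record fields. The names `splitAsmStrings` and `takeString` are hypothetical and are not part of llvm-pretty-bc-parser, and the sketch treats each field as a single ASCII character rather than going through `UTF8.decode` as the real code does.

```haskell
import Data.Char (chr)
import Data.Word (Word64)

-- | Split the fields that follow the flags field into the assembly string
-- and the constraint string.
-- Layout assumed (from the handlers for codes 18 and 23):
--   [asmLen, asmChars..., constraintLen, constraintChars...]
splitAsmStrings :: [Word64] -> Maybe (String, String)
splitAsmStrings fields = do
  (asmStr, rest) <- takeString fields
  (cstStr, _)    <- takeString rest
  pure (asmStr, cstStr)

-- | Read one length-prefixed string and return it with the leftover fields.
-- Returns Nothing if the record is too short, which plays the role of the
-- Assert.recordSizeGreater checks in the code 23 handler.
-- Assumption: every character field is an ASCII code point.
takeString :: [Word64] -> Maybe (String, [Word64])
takeString []           = Nothing
takeString (len : rest)
  | length chars == n = Just (map (chr . fromIntegral) chars, drop n rest)
  | otherwise         = Nothing
  where
    n     = fromIntegral len :: Int
    chars = take n rest
```

For a record tail holding the text `nop` followed by an empty constraint string, `splitAsmStrings` yields the pair of those two strings; a tail whose declared length runs past the end of the record yields `Nothing` instead of crashing.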

@RyanGlScott
Contributor Author

@eddywestbrook also notes that this bug is triggered when compiling the following program with rustc-1.60.0 (bundled with LLVM 14):

/* A function that immediately panics */
pub fn get_out () -> ! {
    panic!("Uh oh!")
}

This is because the generated LLVM bitcode will use inline assembly, somewhat surprisingly:

; Function Attrs: inlinehint uwtable
define void @_ZN4core4hint9black_box17haf534fee2d513d5fE() unnamed_addr #0 {
start:
  call void asm sideeffect "", "r,~{memory}"({}* undef), !srcloc !7
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}

@RyanGlScott
Contributor Author

Note that:

  • Code 28 (CST_CODE_INLINEASM_OLD3) adds an unwind flag
  • Code 30 (CST_CODE_INLINEASM) adds a function type

In order to support this new information, we will need to augment the ValAsm data constructor in llvm-pretty with additional fields.
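
To make the two differences concrete, here is a minimal sketch of how the leading fields of all four record codes might be decoded in one place. Everything named here (`AsmFlags`, `AsmHeader`, `parseAsmHeader`) is hypothetical and is not the llvm-pretty API. The sketch also rests on two assumptions that should be checked against LLVM's bitcode reader: that the unwind flag lives in bit 3 of the flags field, and that code 30 stores the function type ID as an extra field in front of the flags.

```haskell
import Data.Bits (testBit)
import Data.Word (Word64)

-- | Hypothetical flag set that an augmented ValAsm could carry.
data AsmFlags = AsmFlags
  { hasSideEffects :: Bool
  , isAlignStack   :: Bool
  , canUnwind      :: Bool  -- only meaningful for codes 28 and 30
  } deriving (Eq, Show)

-- | Hypothetical result of decoding the start of an inline asm record.
data AsmHeader = AsmHeader
  { fnTypeId     :: Maybe Word64  -- function type ID, code 30 only
  , flags        :: AsmFlags
  , stringFields :: [Word64]      -- length-prefixed asm and constraint strings
  } deriving (Eq, Show)

-- | Decode the leading fields for record codes 18, 23, 28, and 30.
-- Returns Nothing for other codes or for records that are too short.
parseAsmHeader :: Int -> [Word64] -> Maybe AsmHeader
parseAsmHeader code fields = case (code, fields) of
  (18, m : rest)      -> Just (AsmHeader Nothing   (decode False m) rest)
  (23, m : rest)      -> Just (AsmHeader Nothing   (decode False m) rest)
  (28, m : rest)      -> Just (AsmHeader Nothing   (decode True  m) rest)
  (30, ty : m : rest) -> Just (AsmHeader (Just ty) (decode True  m) rest)
  _                   -> Nothing
  where
    -- Bits 0 and 1 follow the existing code 23 handler.
    -- Assumption: bit 3 is the unwind flag; older codes never set canUnwind.
    decode withUnwind m =
      AsmFlags (testBit m 0) (testBit m 1) (withUnwind && testBit m 3)
```

Keeping the function type optional lets the three older codes share one result type with code 30, at the cost of callers having to handle the `Nothing` case.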

@langston-barrett
Contributor

These constants also appear when compiling an x86_64 "tinyconfig" Linux kernel 5.17 with LLVM 13 like so. Bitcode: vmlinux-5.17-llvm-13.bc.zip.

RyanGlScott added a commit that referenced this issue Mar 16, 2023
One code (`CST_CODE_INLINEASM_OLD3`) was introduced in LLVM 13, and another
(`CST_CODE_INLINEASM`) was introduced in LLVM 14. For the most part, they are
parsed identically to previous inline `asm` codes, but with some minor
differences. I have consolidated the logic for parsing all inline `asm` codes
into a single `parseInlineAsm` function.

Fixes #184.
RyanGlScott added a commit that referenced this issue Mar 26, 2023
One code (`CST_CODE_INLINEASM_OLD3`) was introduced in LLVM 13, and another
(`CST_CODE_INLINEASM`) was introduced in LLVM 14. For the most part, they are
parsed identically to previous inline `asm` codes, but with some minor
differences. I have consolidated the logic for parsing all inline `asm` codes
into a single `parseInlineAsm` function.

The existing `disasm-test/tests/callbr.ll` test case ensures that we handle all
of these different `inline asm` codes correctly. As an added bonus, it's
portable across multiple LLVM versions!

Fixes #184.