MACOS - handling of A5World references #654

Open
gbody opened this issue Jun 15, 2018 · 6 comments

@gbody
Contributor

gbody commented Jun 15, 2018

@uxmal
I have been looking through the code generated after Analyze Dataflow has run.
The routine I looked at is get_name in the second code segment, which should return a pointer into the global variables in the A5World. I know you sorted out JSRs to A5World addresses so that they are redirected to the correct procedure address. Is there any support for memory accesses and address computations that use A5 references?
get_name checks a value/index passed on the stack and returns, in register D0, a pointer into the A5World global variable space.
The RTL code looks OK, but the analyzed code has no information about the addresses in the A5World that are referenced via A5.

Disassembly of 68K code segment for get_name

0010999E 4E56 0000 link a6,#$0000
001099A2 202E 0008 move.l $0008(a6),d0
001099A6 6700 00B8 beq $00109A60
001099AA 5380 subq.l #$01,d0
001099AC 6700 00BC beq $00109A6A
001099B0 5380 subq.l #$01,d0
001099B2 6700 00C0 beq $00109A74
001099B6 5380 subq.l #$01,d0
001099B8 6700 00C4 beq $00109A7E
001099BC 5380 subq.l #$01,d0
001099BE 6700 00C8 beq $00109A88
001099C2 5380 subq.l #$01,d0
001099C4 6700 00CC beq $00109A92
001099C8 5380 subq.l #$01,d0
001099CA 6700 00D0 beq $00109A9C
001099CE 5380 subq.l #$01,d0
001099D0 6700 00D4 beq $00109AA6
001099D4 5380 subq.l #$01,d0
001099D6 6700 00D8 beq $00109AB0
001099DA 5380 subq.l #$01,d0
001099DC 6700 00DC beq $00109ABA
001099E0 5380 subq.l #$01,d0
001099E2 6700 00E0 beq $00109AC4
001099E6 5380 subq.l #$01,d0
001099E8 6700 00E4 beq $00109ACE
001099EC 5380 subq.l #$01,d0
001099EE 6700 00E8 beq $00109AD8
001099F2 5380 subq.l #$01,d0
001099F4 6700 00EC beq $00109AE2
001099F8 5380 subq.l #$01,d0
001099FA 6700 00EE beq $00109AEA
001099FE 5380 subq.l #$01,d0
00109A00 6700 00F0 beq $00109AF2
00109A04 5380 subq.l #$01,d0
00109A06 6700 00F2 beq $00109AFA
00109A0A 5380 subq.l #$01,d0
00109A0C 6700 00F4 beq $00109B02
00109A10 5380 subq.l #$01,d0
00109A12 6700 00F6 beq $00109B0A
00109A16 5380 subq.l #$01,d0
00109A18 6700 00F8 beq $00109B12
00109A1C 5380 subq.l #$01,d0
00109A1E 6700 00FA beq $00109B1A
00109A22 5380 subq.l #$01,d0
00109A24 6700 00FC beq $00109B22
00109A28 0480 0000 006B subi.l #$0000006B,d0
00109A2E 6700 00FA beq $00109B2A
00109A32 5380 subq.l #$01,d0
00109A34 6700 00FC beq $00109B32
00109A38 5380 subq.l #$01,d0
00109A3A 6700 00FE beq $00109B3A
00109A3E 5380 subq.l #$01,d0
00109A40 6700 0100 beq $00109B42
00109A44 5380 subq.l #$01,d0
00109A46 6700 0102 beq $00109B4A
00109A4A 5380 subq.l #$01,d0
00109A4C 6700 0104 beq $00109B52
00109A50 5380 subq.l #$01,d0
00109A52 6700 0106 beq $00109B5A
00109A56 5380 subq.l #$01,d0
00109A58 6700 0108 beq $00109B62
00109A5C 6000 010A bra $00109B68
00109A60 41ED EE12 lea -$11EE(a5),a0
00109A64 2008 move.l a0,d0
00109A66 6000 0100 bra $00109B68
00109A6A 41ED EE18 lea -$11E8(a5),a0
00109A6E 2008 move.l a0,d0
00109A70 6000 00F6 bra $00109B68
00109A74 41ED EE1C lea -$11E4(a5),a0
00109A78 2008 move.l a0,d0
00109A7A 6000 00EC bra $00109B68
00109A7E 41ED EE20 lea -$11E0(a5),a0
00109A82 2008 move.l a0,d0
00109A84 6000 00E2 bra $00109B68
00109A88 41ED EE24 lea -$11DC(a5),a0
00109A8C 2008 move.l a0,d0
00109A8E 6000 00D8 bra $00109B68
00109A92 41ED EE28 lea -$11D8(a5),a0
00109A96 2008 move.l a0,d0
00109A98 6000 00CE bra $00109B68
00109A9C 41ED EE2C lea -$11D4(a5),a0
00109AA0 2008 move.l a0,d0
00109AA2 6000 00C4 bra $00109B68
00109AA6 41ED EE30 lea -$11D0(a5),a0
00109AAA 2008 move.l a0,d0
00109AAC 6000 00BA bra $00109B68
00109AB0 41ED EE36 lea -$11CA(a5),a0
00109AB4 2008 move.l a0,d0
00109AB6 6000 00B0 bra $00109B68
00109ABA 41ED EE3A lea -$11C6(a5),a0
00109ABE 2008 move.l a0,d0
00109AC0 6000 00A6 bra $00109B68
00109AC4 41ED EE3E lea -$11C2(a5),a0
00109AC8 2008 move.l a0,d0
00109ACA 6000 009C bra $00109B68
00109ACE 41ED EE42 lea -$11BE(a5),a0
00109AD2 2008 move.l a0,d0
00109AD4 6000 0092 bra $00109B68
00109AD8 41ED EE46 lea -$11BA(a5),a0
00109ADC 2008 move.l a0,d0
00109ADE 6000 0088 bra $00109B68
00109AE2 41ED EE4A lea -$11B6(a5),a0
00109AE6 2008 move.l a0,d0
00109AE8 607E bra $00109B68
00109AEA 41ED EE4E lea -$11B2(a5),a0
00109AEE 2008 move.l a0,d0
00109AF0 6076 bra $00109B68
00109AF2 41ED EE52 lea -$11AE(a5),a0
00109AF6 2008 move.l a0,d0
00109AF8 606E bra $00109B68
00109AFA 41ED EE56 lea -$11AA(a5),a0
00109AFE 2008 move.l a0,d0
00109B00 6066 bra $00109B68
00109B02 41ED EE5A lea -$11A6(a5),a0
00109B06 2008 move.l a0,d0
00109B08 605E bra $00109B68
00109B0A 41ED EE5E lea -$11A2(a5),a0
00109B0E 2008 move.l a0,d0
00109B10 6056 bra $00109B68
00109B12 41ED EE64 lea -$119C(a5),a0
00109B16 2008 move.l a0,d0
00109B18 604E bra $00109B68
00109B1A 41ED EE68 lea -$1198(a5),a0
00109B1E 2008 move.l a0,d0
00109B20 6046 bra $00109B68
00109B22 41ED EE6E lea -$1192(a5),a0
00109B26 2008 move.l a0,d0
00109B28 603E bra $00109B68
00109B2A 41ED EE74 lea -$118C(a5),a0
00109B2E 2008 move.l a0,d0
00109B30 6036 bra $00109B68
00109B32 41ED EE78 lea -$1188(a5),a0
00109B36 2008 move.l a0,d0
00109B38 602E bra $00109B68
00109B3A 41ED EE7E lea -$1182(a5),a0
00109B3E 2008 move.l a0,d0
00109B40 6026 bra $00109B68
00109B42 41ED EE84 lea -$117C(a5),a0
00109B46 2008 move.l a0,d0
00109B48 601E bra $00109B68
00109B4A 41ED EE8A lea -$1176(a5),a0
00109B4E 2008 move.l a0,d0
00109B50 6016 bra $00109B68
00109B52 41ED EE90 lea -$1170(a5),a0
00109B56 2008 move.l a0,d0
00109B58 600E bra $00109B68
00109B5A 41ED EE96 lea -$116A(a5),a0
00109B5E 2008 move.l a0,d0
00109B60 6006 bra $00109B68
00109B62 41ED EE9C lea -$1164(a5),a0
00109B66 2008 move.l a0,d0
00109B68 4E5E unlk a6
00109B6A 4E75 rts
00109B6C 88 67 65 74 .get
00109B70 5F 6E 61 6D 65 00 00 00 _name
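
For anyone checking the listing by hand, the following is a minimal Python sketch of the two pieces of arithmetic the disassembly relies on: sign-extending the 16-bit extension word of `lea d16(a5),a0`, and computing the target of a branch with a 16-bit displacement. The helper names are made up for illustration and are not part of Reko.

```python
def sign_extend_16(word: int) -> int:
    """Interpret a 16-bit extension word as a signed displacement."""
    return word - 0x10000 if word & 0x8000 else word


def branch_target(instr_addr: int, disp16: int) -> int:
    """Target of a 68k Bcc/BRA with a 16-bit displacement.

    The displacement is relative to the address of the extension word,
    i.e. the instruction address plus 2.
    """
    return instr_addr + 2 + sign_extend_16(disp16)


# 00109A60 41ED EE12   lea -$11EE(a5),a0
assert sign_extend_16(0xEE12) == -0x11EE
# 001099A6 6700 00B8   beq $00109A60
assert branch_target(0x001099A6, 0x00B8) == 0x00109A60
```

Both assertions use values copied from the listing above, so the disassembler output is consistent with the raw encodings.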

RTL code
void get_name()
{
get_name_entry:
l0010999E:
a7 = fp
a5 = a5world
a7 = a7 - 0x04
Mem0[a7:word32] = a6
a6 = a7
a7 = a7 + 0x00
d0 = Mem0[a6 + 0x08:word32]
CVZN = cond(d0)
branch Test(EQ,Z) l00109A60
l001099AA:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109A6A
l001099B0:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109A74
l001099B6:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109A7E
l001099BC:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109A88
l001099C2:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109A92
l001099C8:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109A9C
l001099CE:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AA6
l001099D4:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AB0
l001099DA:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109ABA
l001099E0:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AC4
l001099E6:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109ACE
l001099EC:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AD8
l001099F2:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AE2
l001099F8:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AEA
l001099FE:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AF2
l00109A04:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109AFA
l00109A0A:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B02
l00109A10:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B0A
l00109A16:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B12
l00109A1C:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B1A
l00109A22:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B22
l00109A28:
d0 = d0 - 0x6B
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B2A
l00109A32:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B32
l00109A38:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B3A
l00109A3E:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B42
l00109A44:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B4A
l00109A4A:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B52
l00109A50:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B5A
l00109A56:
d0 = d0 - 0x01
CVZNX = cond(d0)
branch Test(EQ,Z) l00109B62
l00109A5C:
goto l00109B68
l00109A60:
a0 = a5 + -0x000011EE
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109A6A:
a0 = a5 + -0x000011E8
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109A74:
a0 = a5 + -0x000011E4
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109A7E:
a0 = a5 + -0x000011E0
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109A88:
a0 = a5 + -0x000011DC
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109A92:
a0 = a5 + -0x000011D8
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109A9C:
a0 = a5 + -0x000011D4
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AA6:
a0 = a5 + -0x000011D0
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AB0:
a0 = a5 + -4554
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109ABA:
a0 = a5 + -0x000011C6
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AC4:
a0 = a5 + -0x000011C2
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109ACE:
a0 = a5 + -0x000011BE
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AD8:
a0 = a5 + -0x000011BA
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AE2:
a0 = a5 + -0x000011B6
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AEA:
a0 = a5 + -0x000011B2
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AF2:
a0 = a5 + -0x000011AE
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109AFA:
a0 = a5 + -0x000011AA
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B02:
a0 = a5 + -0x000011A6
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B0A:
a0 = a5 + -0x000011A2
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B12:
a0 = a5 + -0x0000119C
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B1A:
a0 = a5 + -0x00001198
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B22:
a0 = a5 + -0x00001192
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B2A:
a0 = a5 + -0x0000118C
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B32:
a0 = a5 + -0x00001188
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B3A:
a0 = a5 + -0x00001182
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B42:
a0 = a5 + -0x0000117C
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B4A:
a0 = a5 + -0x00001176
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B52:
a0 = a5 + -4464
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B5A:
a0 = a5 + -0x0000116A
d0 = a0
CVZN = cond(d0)
goto l00109B68
l00109B62:
a0 = a5 + -0x00001164
d0 = a0
CVZN = cond(d0)
l00109B68:
a7 = a6
a6 = Mem0[a7:word32]
a7 = a7 + 0x04
return
get_name_exit:
}

Analyzed and rewritten RTL code

void get_name(word32 dwArg04)
{
get_name_entry:
l0010999E:
branch dwArg04 == 0x00 l00109A60
l001099AA:
branch dwArg04 == 0x01 l00109A6A
l001099B0:
branch dwArg04 == 0x02 l00109A74
l001099B6:
branch dwArg04 == 0x03 l00109A7E
l001099BC:
branch dwArg04 == 0x04 l00109A88
l001099C2:
branch dwArg04 == 0x05 l00109A92
l001099C8:
branch dwArg04 == 0x06 l00109A9C
l001099CE:
branch dwArg04 == 0x07 l00109AA6
l001099D4:
branch dwArg04 == 0x08 l00109AB0
l001099DA:
branch dwArg04 == 0x09 l00109ABA
l001099E0:
branch dwArg04 == 0x0A l00109AC4
l001099E6:
branch dwArg04 == 11 l00109ACE
l001099EC:
branch dwArg04 == 0x0C l00109AD8
l001099F2:
branch dwArg04 == 0x0D l00109AE2
l001099F8:
branch dwArg04 == 0x0E l00109AEA
l001099FE:
branch dwArg04 == 0x0F l00109AF2
l00109A04:
branch dwArg04 == 0x10 l00109AFA
l00109A0A:
branch dwArg04 == 0x11 l00109B02
l00109A10:
branch dwArg04 == 0x12 l00109B0A
l00109A16:
branch dwArg04 == 0x13 l00109B12
l00109A1C:
branch dwArg04 == 0x14 l00109B1A
l00109A22:
branch dwArg04 == 0x15 l00109B22
l00109A28:
branch dwArg04 == 0x80 l00109B2A
l00109A32:
branch dwArg04 == 0x81 l00109B32
l00109A38:
branch dwArg04 == 0x82 l00109B3A
l00109A3E:
branch dwArg04 == 131 l00109B42
l00109A44:
branch dwArg04 == 0x84 l00109B4A
l00109A4A:
branch dwArg04 == 133 l00109B52
l00109A50:
branch dwArg04 == 0x86 l00109B5A
l00109A56:
branch dwArg04 == 0x87 l00109B62
l00109A5C:
goto l00109B68
l00109A60:
goto l00109B68
l00109A6A:
goto l00109B68
l00109A74:
goto l00109B68
l00109A7E:
goto l00109B68
l00109A88:
goto l00109B68
l00109A92:
goto l00109B68
l00109A9C:
goto l00109B68
l00109AA6:
goto l00109B68
l00109AB0:
goto l00109B68
l00109ABA:
goto l00109B68
l00109AC4:
goto l00109B68
l00109ACE:
goto l00109B68
l00109AD8:
goto l00109B68
l00109AE2:
goto l00109B68
l00109AEA:
goto l00109B68
l00109AF2:
goto l00109B68
l00109AFA:
goto l00109B68
l00109B02:
goto l00109B68
l00109B0A:
goto l00109B68
l00109B12:
goto l00109B68
l00109B1A:
goto l00109B68
l00109B22:
goto l00109B68
l00109B2A:
goto l00109B68
l00109B32:
goto l00109B68
l00109B3A:
goto l00109B68
l00109B42:
goto l00109B68
l00109B4A:
goto l00109B68
l00109B52:
goto l00109B68
l00109B5A:
goto l00109B68
l00109B62:
l00109B68:
return
get_name_exit:
}
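
For comparison, here is a rough Python model of what get_name computes according to the disassembly, which is the information missing from the analyzed code above. The offsets are copied from the `lea` instructions; the function name `get_name_offset` and the use of `None` for unmatched indices are assumptions made for this sketch (in the real routine, d0 simply keeps the leftover counter value on the fall-through path).

```python
from typing import Dict, Optional

# A5-relative offsets for indices 0x00..0x15, in order of the lea instructions.
_LOW = [
    -0x11EE, -0x11E8, -0x11E4, -0x11E0, -0x11DC, -0x11D8, -0x11D4, -0x11D0,
    -0x11CA, -0x11C6, -0x11C2, -0x11BE, -0x11BA, -0x11B6, -0x11B2, -0x11AE,
    -0x11AA, -0x11A6, -0x11A2, -0x119C, -0x1198, -0x1192,
]
# A5-relative offsets for indices 0x80..0x87 (reached after the subi.l #$6B).
_HIGH = [
    -0x118C, -0x1188, -0x1182, -0x117C, -0x1176, -0x1170, -0x116A, -0x1164,
]

A5_OFFSETS: Dict[int, int] = {i: off for i, off in enumerate(_LOW)}
A5_OFFSETS.update({0x80 + i: off for i, off in enumerate(_HIGH)})


def get_name_offset(n: int) -> Optional[int]:
    """Return the A5-relative offset selected by index n, or None if unmatched."""
    return A5_OFFSETS.get(n)


def get_name(a5: int, n: int) -> Optional[int]:
    """Return the absolute address a5 + offset, mirroring 'lea off(a5),a0; move.l a0,d0'."""
    off = get_name_offset(n)
    return None if off is None else a5 + off
```

With a hypothetical A5 value of `0x20000`, `get_name(0x20000, 0)` yields `0x20000 - 0x11EE`.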

@uxmal
Owner

uxmal commented Jun 15, 2018

The issue here is that Reko can't prove that d0 is live-out from get_name. There is only one caller, illegal:

	call get_name (retsize: 4;)
	a7 = a7 + 4
	a7 = a7 - 4
	v18 = d0
	Mem0[a7:word32] = v18
	CVZN = cond(v18)
	v19 = Mem0[a5 + -1592:word32]
	a7 = a7 - 4
	v20 = v19
	Mem0[a7:word32] = v20
	CVZN = cond(v20)
	a7 = a7 - 0x00000004
	Mem0[a7:word32] = a5 + -396
	call fprintf (retsize: 4;)

get_name returns its value in d0, which is then pushed as a parameter to the following call to fprintf. fprintf is variadic, but Reko doesn't know this. Nor does Reko know how to interpret the parameters of fprintf to discover that the pushed value of d0 is used. d0 is therefore dead-out from get_name, which means that all those assignments to a0 in get_name are dead and are eliminated.

To get this to work, you could force d0 alive by telling Reko the signature of get_name:

char * get_name(int n)
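
The effect described above can be illustrated with a toy dead-code pass. This is only a sketch over a single straight-line block, written for this thread; it is not how Reko's dataflow analysis is implemented, and the statement representation is an assumption.

```python
from typing import List, Set, Tuple

# A statement is (register defined, registers used).
Stmt = Tuple[str, Tuple[str, ...]]


def eliminate_dead(stmts: List[Stmt], live_out: Set[str]) -> List[Stmt]:
    """Drop assignments whose result is never used, walking the block backwards."""
    live = set(live_out)
    kept: List[Stmt] = []
    for dest, uses in reversed(stmts):
        if dest in live:
            # The value is needed: keep the statement and mark its inputs live.
            live.discard(dest)
            live.update(uses)
            kept.append((dest, uses))
        # Otherwise the assignment is dead and is dropped.
    kept.reverse()
    return kept


# One arm of get_name: a0 = a5 + C; d0 = a0; CVZN = cond(d0)
arm: List[Stmt] = [("a0", ("a5",)), ("d0", ("a0",)), ("CVZN", ("d0",))]

no_signature = eliminate_dead(arm, live_out=set())      # d0 assumed dead-out
with_signature = eliminate_dead(arm, live_out={"d0"})   # d0 forced live by the signature
```

With an empty live-out set the whole arm disappears, which matches the empty `goto l00109B68` blocks in the analyzed code; with d0 live-out, the a0 and d0 assignments survive.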

@gbody
Contributor Author

gbody commented Sep 20, 2018

Just a follow-up on the A5World handling. Would a pointer at a negative A5 offset that points to a string be resolved to the string itself and shown in the fprintf statement?
i.e. could the first line below be rendered as the second?
 fprintf(&a5world->wFFFFFE5C + 0x0C, a5world->ptrFFFFF8A4, tLoc14);
 fprintf("String1", "String2", tLoc14);

@gbody
Contributor Author

gbody commented Sep 20, 2018

@uxmal Just a follow-up on the A5World handling. Would a pointer at a negative A5 offset that points to a string be resolved to the string itself and shown in the fprintf statement?
i.e. could the first line below be rendered as the second?
 fprintf(&a5world->wFFFFFE5C + 0x0C, a5world->ptrFFFFF8A4, tLoc14);
 fprintf("String1", "String2", tLoc14);

@uxmal
Owner

uxmal commented Sep 21, 2018

There are two things stopping this from happening now.

  1. Currently the MacOS platform has no definition of fprintf's signature, so Reko has no hope of determining that the second parameter to fprintf is a pointer to a character string. Adding an int fprintf(FILE *, const char * format, ...); signature to MacOS would fix this.

  2. Currently, the a5world identifier is treated as a variable, so that references to global memory via the A5 register stand out clearly in decompiled code. Because it is a variable, it has an indeterminate value, so Reko can't treat a5world->ptrFFFFF8A4 as a constant pointer. This prevents Reko from chasing the pointer to find the string.

There are a couple of alternatives to fix (2). One is to aggressively replace all a5 = a5world statements in all procedures with a5 = 0x<init>, where <init> is the address that ResourceFork.AddResourcesToImageMap allocates for the A5 world segment. This would cause expressions like a5->ptr<constant-offset> to be flattened to <constant-addr>, which can definitely be chased to find a string constant. However, you lose the a5world-relative expressions in the output, which may or may not be desired.

The other approach is to somehow mark the a5world variable as "special", with a known address. Reko would then be able to evaluate constant expressions such as a5world + C without flattening them. Implementing such a solution would be more time-consuming, but would preserve the a5world identifier.
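
To make the pointer chasing concrete, here is a small Python sketch of what becomes possible once A5 has a known value. Everything in it is invented for illustration: the A5 value, the image base, the demo image contents, and the helper names are assumptions, and it does not use Reko's actual API. The two offsets correspond to the names in the fprintf example (0xFFFFFE5C is -0x1A4 and 0xFFFFF8A4 is -0x75C as signed 32-bit values).

```python
import struct

A5 = 0x2000          # hypothetical address assigned to the A5 register
IMAGE_BASE = 0x1000  # hypothetical address of the first byte of the image


def build_demo_image() -> bytes:
    """Build a fake 4 KiB memory image ending at A5, for demonstration only."""
    image = bytearray(0x1000)

    def poke(addr: int, data: bytes) -> None:
        off = addr - IMAGE_BASE
        image[off:off + len(data)] = data

    poke(A5 - 0x1A4 + 0x0C, b"String1\x00")        # &a5world->wFFFFFE5C + 0x0C
    poke(0x1100, b"String2\x00")                   # target of the stored pointer
    poke(A5 - 0x75C, struct.pack(">I", 0x1100))    # a5world->ptrFFFFF8A4 (big-endian)
    return bytes(image)


def read_u32(image: bytes, addr: int) -> int:
    """Read a big-endian 32-bit value, as the 68k stores it."""
    return struct.unpack_from(">I", image, addr - IMAGE_BASE)[0]


def read_c_string(image: bytes, addr: int) -> str:
    """Read a NUL-terminated string starting at addr."""
    start = addr - IMAGE_BASE
    end = image.index(b"\x00", start)
    return image[start:end].decode("mac_roman")


image = build_demo_image()
fmt = read_c_string(image, A5 - 0x1A4 + 0x0C)           # address computed directly
arg = read_c_string(image, read_u32(image, A5 - 0x75C)) # pointer loaded, then chased
```

The first argument only needs address arithmetic, while the second needs a memory read followed by a second lookup. Either way, both steps require a5 to evaluate to a constant, which is the point of the two alternatives above.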

How important is it to preserve the a5world identifier?

@gbody
Contributor Author

gbody commented Sep 22, 2018

For the application I have been poking about in, I have tried setting signatures for what I know or think are library procedures, to see how Reko handles the later stages of decompilation.

I was thinking the only real use for the A5 constant offsets would be a different view of the A5World. This would require collecting all the unique A5 constant offsets and sorting the list. Then, walking the list, display each constant offset as the address/label, followed by the data up to the next constant offset, where a new address/label starts.

e.g. A5World displayed as offsets.
The first column is the constant offset, followed by the bytes up to the next constant offset.
-3210 00 00 00 00
-320C 41 42 43 44 45 46 47 48
-3204 00 00 00 00
..
00 00 00 00
-3100 AA BB CC DD
..
-0004 00 00 00 00
0000
..
0020 - start of jump table

It might be good to have the A5World identifier available, at least while developing/debugging.
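
A rough Python sketch of the view described above, covering only the globals below A5 (negative offsets), might look like the following. The function name and the way the bytes are passed in are assumptions for illustration; nothing here reflects an existing Reko renderer.

```python
from typing import Iterable, List


def render_a5world(offsets: Iterable[int], below_a5: bytes) -> List[str]:
    """Render the bytes below A5, starting a new line at each referenced offset.

    below_a5 holds the bytes from (a5 - len(below_a5)) up to, but not
    including, a5. Offsets outside that range are ignored.
    """
    size = len(below_a5)
    # Collect the unique offsets and sort them, lowest address first.
    marks = sorted({o for o in offsets if -size <= o < 0})
    lines: List[str] = []
    for cur, nxt in zip(marks, marks[1:] + [0]):
        chunk = below_a5[size + cur:size + nxt]
        data = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"-{-cur:04X} {data}")
    return lines


# Sixteen bytes of made-up data and three referenced offsets.
demo = bytes(4) + b"ABCDEFGH" + bytes(4)
view = render_a5world([-0x0C, -0x10, -0x04, -0x0C], demo)
```

Each entry in `view` is one line in the same layout as the example, with the offset first and then the bytes that run up to the next referenced offset.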

@uxmal
Owner

uxmal commented Sep 23, 2018

Let me mull this over a little to figure out what is going to work best. It shouldn't be too hard to make a special renderer just for a5world.
