Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a bug in the Lua 5.1 bytecode pattern #298

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Guest257351
Copy link

Lua 5.1 bytecode upvalue fields were incorrectly defined as a Vector<u32> instead of Vector<LuaString>.

Lua 5.1 bytecode upvalue fields were incorrectly defined as a `Vector<u32>` instead of `Vector<LuaString>`.
@paxcut
Copy link
Contributor

paxcut commented Aug 27, 2024

I know very little about Lua byte code, but having separate patterns for each version of Lua makes dealing with problems like this a pain. Instead of checking every other Lua to see if the error exists there I unified all the Lua versions into one. The resulting pattern has this definition for the upValues

struct UpValues {
    if (minorVersion == 1)
        Vector<u32> upvalues;
    else
        Vector<LuaString> upvalues;
}[[inline]];

Not only shows a potential error but also suggests what the fix would be and all of this without having to visit each and every version there is. I am attaching the unified version (change the extension that github cant deal with to .hexpat) I wrote which I tested successfully with all the tests for lua versions ImHex has, but probably needs more rigorous tests given that this error was not found.
lua.txt

@Guest257351
Copy link
Author

I know very little about Lua byte code, but having separate patterns for each version of Lua makes dealing with problems like this a pain. Instead of checking every other Lua to see if the error exists there I unified all the Lua versions into one. The resulting pattern has this definition for the upValues

struct UpValues {
    if (minorVersion == 1)
        Vector<u32> upvalues;
    else
        Vector<LuaString> upvalues;
}[[inline]];

Not only shows a potential error but also suggests what the fix would be and all of this without having to visit each and every version there is. I am attaching the unified version (change the extension that github cant deal with to .hexpat) I wrote which I tested successfully with all the tests for lua versions ImHex has, but probably needs more rigorous tests given that this error was not found. lua.txt

As far as I'm aware lua does not use u32s or integers of any kind for it's upvalue vectors. I don't know of any lua version that does. The upvalue vector is only there for debug information.
As for having multiple version being a pain, I might look into how lua creates bytecode for versions 5.1 onward to see if they can easily be put into a single pattern.

Lua 5.1 bytecode string sizes were incorrectly parsed as a 64-bit integer, instead of a 32-bit integer.
@Guest257351
Copy link
Author

Guest257351 commented Aug 29, 2024

Just added a commit for a separate similar bug. I forgot to include this in the initial pull request.
It's also worth noting that the number of bytes in the string size is not static. By default lua uses 32-bit integers and just about every implementation of it also does since there's no good reason to change it. The number of bytes is controlled by size_of_size_t in LuaBinaryHeader so it might be worth changing the pattern to support this value being anything other than 4.

@paxcut
Copy link
Contributor

paxcut commented Aug 29, 2024

As for having multiple version being a pain, I might look into how lua creates bytecode for versions 5.1 onward to see if they can easily be put into a single pattern.

I put a link to the unified version of the lua pattern. Did you look at it? Ill post it here in case you cant get the link

#pragma description Lua 5.1, 5.2, 5.3 and 5.4 bytecode

import std.io;
import std.mem;
import type.base;

u8 minorVersion = std::mem::read_unsigned(4,1)&15;

namespace impl {
    fn transform_Size_array(auto array) {
        u128 res = 0;
        for(u8 i = 0, (array[i] & 0x80 == 0) && (i < 9), i+=1) {
            res <<= 7;
            res |= array[i] & 0x7f;
        }
        res <<= 7;
        res |= array[sizeof(array)-1] & 0x7f;
        return res;
    };
    
    fn format_Size(auto leb128) {
        if (minorVersion == 4) {
            u128 res = impl::transform_Size_array(leb128.array);
            return std::format("{} ({:#x})", res, res);
        } else
            return std::format("{}",leb128.size);
    };

    fn format_StringSize(auto leb128) {
        if (minorVersion == 4) {
            u128 res = impl::transform_Size_array(leb128.size.array);
            return std::format("{} ({:#x})", res, res);
        } else
            return std::format("{}",leb128.size);
    };
    fn transform_Size(auto leb128) {
        if (minorVersion == 4) 
            return impl::transform_Size_array(leb128.array);
         else
            return leb128.size;
    };

    fn transform_StringSize(auto leb128) {
         if (minorVersion == 4) 
            return impl::transform_Size_array(leb128.size.array);
         else
            return leb128.size;
    };

    fn format_LuaString(auto string) {
       if (string.size == 0) {
           return "None";
        }
        return std::format("\"{}\"", string.data);
    };
    
    fn format_Constant(auto constant) {
       return constant.data;
    };
    
    fn format_Version(auto ver) {
       return std::format("Ver. {}.{}", ver.major, ver.minor);
    };
}

using LuaFunction;

struct Size {
    if (minorVersion == 4)
        u8 array[while(addressof(this) == $ || ((addressof(this)-$ < 9) && (std::mem::read_unsigned($-1, 1) & 0x80 == 0)))] [[hidden]];
    else
        u32 size[[hidden]];
} [[sealed, format("impl::format_Size"), transform("impl::transform_Size")]];
    
    
struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else
        u64 size;
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];



bitfield Version {
    minor : 4;
    major : 4;
} [[format("impl::format_Version")]];

struct LuaBinaryHeader {
    char magic[4];
    Version version_number;
    u8 format_version;
    if (minorVersion == 3 || minorVersion == 4)
        char LUAC_DATA[6];
    else
        u8 endianness;

    u8 size_of_int;
    u8 size_of_size_t;
    match (minorVersion) {
        (1 | 2) : {
            u8 size_of_Instruction;
            u8 lua_Number;
            u8 is_lua_Number_integral;
            if (minorVersion==2)
                char LUAC_TAIL[6];
        }
        (3 | 4) : { 
            if (minorVersion == 3) {
                u8 size_of_Instruction;
                u8 size_of_lua_Integer;
            }
            u8 sizeof_lua_Number;
            type::Hex<u64> LUAC_INT;
            double LUAC_NUM;
        }
    }
};

struct LuaString {
    LuaStringSize size;
    bool correction = (minorVersion == 3 || minorVersion == 4);
    if (size > 0) 
        char data[size-correction];
}[[format("impl::format_LuaString")]];

struct SourceOnDebug {
    if (minorVersion == 2)
        LuaString source;
}[[inline,format("format_source_on")]];

struct SourceOnFunction {
    if (minorVersion != 2)
        LuaString source;
}[[inline,format("format_source_on")]];

fn format_source_on(auto v) {
    return std::format("{}",v.source);
};

enum LUA : u8 {
    TNIL = 0,
    TBOOLEAN,
    TNUMFLT = minorVersion==4 ? 3 : 0x13,
    TSHRSTR = 4,
    TNUMINT = minorVersion==4 ? 0X13 : 3,
    TLNGSTR = 0x14
};


struct LuaConstant {
    LUA type;
    match (type) {
        //(LUA::TNIL)             : NULL
        (LUA::TBOOLEAN) : u8 data;
        (LUA::TNUMFLT)  : double data;
        (LUA::TNUMINT)  : u64 data;
        (LUA::TSHRSTR | LUA::TLNGSTR)   : LuaString data;
    }
}[[format("impl::format_Constant")]];

struct LuaUpvalue {
    u8 instack;
    u8 idx;
    if (minorVersion == 4)
        u8 kind;
};

struct Vector<T> {
    Size size;
    if (size > 0)
        T values[size];
};

struct AbsLineInfo {
    Size pc;
    Size line;
};

struct LocalVar {
    LuaString varname;
    Size startpc;
    Size endpc;
};

struct LineInfo {
    if (minorVersion == 4) {
        Vector<s8> lineInfo;
        Vector<AbsLineInfo> abslineinfo;
    } else
        Vector<u32> lineInfo;
}[[inline]];

struct UpValues {
    if (minorVersion == 1)
        Vector<u32> upvalues;
    else
        Vector<LuaString> upvalues;
}[[inline]];

struct LuaDebugInfo {
    SourceOnDebug source;
    LineInfo lineInfo;
    Vector<LocalVar> local_vars;
    UpValues upvalues;
};

struct LuaCUP{ //Constants,Upvalues and Protos
    Vector<LuaConstant> constants;
    if (minorVersion == 1 || minorVersion==2) {
        u32 sizep;  // size of proto
        if (sizep > 0) {
            LuaFunction protos[sizep];
        }
    }
    if (minorVersion != 1)
        Vector<LuaUpvalue> upvalues;
    if (minorVersion == 3 || minorVersion == 4)
        Vector<LuaFunction> protos;
}[[inline]];

struct LuaFunction {
    SourceOnFunction source;
    Size line_defined;
    Size last_line_defined;
    if (minorVersion == 1)
        u8 nups;  // number of upvalues
    u8 number_of_parameters;
    u8 is_vararg;
    u8 maxstacksize;
    Vector<u32> code;
    LuaCUP cup; 
    LuaDebugInfo debugInfo;
};

struct LuaFile {
    LuaBinaryHeader header;
    if (minorVersion == 3 || minorVersion == 4)
        u8 sizeupvalues;
    LuaFunction func;
};

LuaFile file @ 0;

@paxcut
Copy link
Contributor

paxcut commented Aug 29, 2024

I just checked the sample Lua 5.1 sample that is used in unit tests and the string is preceded by a 64 bit integer holding its size. The change that you propose turns a successful test into a failing one of reading past the end of the file. Make sure to test the changes on actual files and if you think the ones we use are invalid provided valid ones.

image

@Guest257351
Copy link
Author

I just checked the sample Lua 5.1 sample that is used in unit tests and the string is preceded by a 64 bit integer holding its size. The change that you propose turns a successful test into a failing one of reading past the end of the file. Make sure to test the changes on actual files and if you think the ones we use are invalid provided valid ones.

image

I did test the patterns before making these changes. After looking into how lua defines size_t, the test samples were created from a lua 64 bit compiler, where as I was using a 32 bit compiler. The unified pattern you sent does seem to account for size_t's variable size, so I guess this was an issue specific to the lua 5.1 pattern. I'll update my pull request to account for size_t's variable size.

@paxcut
Copy link
Contributor

paxcut commented Aug 29, 2024

The unified pattern you sent does seem to account for size_t's variable size, so I guess this was an issue specific to the lua 5.1 pattern. I'll update my pull request to account for size_t's variable size.

I don't follow. The unified pattern is exactly equivalent to each of the separate versions, or at least , it aims to be. If one of the versions has a problem, then the unified version will have the exact same problem. For example, for string size type it defines:

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else
        u64 size;
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

So both versions 1 and 2 use u64 like the 5.1 and 5.2 versions do. Im looking into getting better samples so we can test things better.

@Guest257351
Copy link
Author

Guest257351 commented Aug 29, 2024

The unified pattern you sent does seem to account for size_t's variable size, so I guess this was an issue specific to the lua 5.1 pattern. I'll update my pull request to account for size_t's variable size.

I don't follow. The unified pattern is exactly equivalent to each of the separate versions, or at least , it aims to be. If one of the versions has a problem, then the unified version will have the exact same problem. For example, for string size type it defines:

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else
        u64 size;
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

So both versions 1 and 2 use u64 like the 5.1 and 5.2 versions do. Im looking into getting better samples so we can test things better.

Sorry I've made a mistake again.😅The unified version does not account for 32-bit vs 64-bit compilers, and assumes 64-bit. I probably should have read through it more thoroughly.

@Guest257351
Copy link
Author

Guest257351 commented Aug 29, 2024

Also I'm not sure how I'd easily make either the unified version or the 5.1 version account for this. Ideally LuaString should use a u32 if LuaBinaryHeader.size_of_size_t is 4, and a u64 if it's 8. However I'm not experienced enough with the pattern language to implement this.

Updated the pattern to allow for 32-bit and 64-bit integers for the size of the string length field.
@paxcut
Copy link
Contributor

paxcut commented Aug 29, 2024

something like this should work:
create a global

u8 global_size_of_size_t=4;

we assign a default just in case. Then inside header, right after size_of_size_t is created

global_size_of_size_t = size_of_size_t;

finally on StringSize

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else {
        if (global_size_of_size_t == 4 )
            u32 size;
        else
            u64 size;
    }
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

I am showing the code in unified pattern but it should be the same for 5.1. What about the other versions?

@Guest257351
Copy link
Author

something like this should work: create a global

u8 global_size_of_size_t=4;

we assign a default just in case. Then inside header, right after size_of_size_t is created

global_size_of_size_t = size_of_size_t;

finally on StringSize

struct LuaStringSize {
   if (minorVersion == 4)
        Size size;
    else if (minorVersion == 3)
        u8 size;
    else {
        if (global_size_of_size_t == 4 )
            u32 size;
        else
            u64 size;
    }
}[[sealed, format("impl::format_StringSize"), transform("impl::transform_StringSize")]];

I am showing the code in unified pattern but it should be the same for 5.1. What about the other versions?

That could work, I've managed to implement it differently. You can review that and see which works better.

@paxcut
Copy link
Contributor

paxcut commented Aug 29, 2024

Why are you only fixing lua 5.1? Are the other Luas also not possibly erroneous? Considering that we have to deal with 4 versions, there should be an attempt to keep them similar to each other to help debug issues that deal with more than just one version, so breaking structures that are kept together in the other versions doesnt seem like a good idea.

The main issue is why don't you think it is a good idea to get rid of the version files and use one instead? That avoids code duplication, reduces code management, makes it more clear what is the same and what isn't across versions and helps showcasing the pattern language ability to deal with special cases among other things. What are the downsides?

@paxcut
Copy link
Contributor

paxcut commented Aug 29, 2024

A better solution than both

u8 stringSize  =  std::mem::read_unsigned((minorVersion>2) ? 13 : 8,1);

eliminates the need to copy from the pattern to a global or the need to break the main placed variable into pieces.

@Guest257351
Copy link
Author

Guest257351 commented Aug 29, 2024

Why are you only fixing lua 5.1? Are the other Luas also not possibly erroneous? Considering that we have to deal with 4 versions, there should be an attempt to keep them similar to each other to help debug issues that deal with more than just one version, so breaking structures that are kept together in the other versions doesnt seem like a good idea.

The main issue is why don't you think it is a good idea to get rid of the version files and use one instead? That avoids code duplication, reduces code management, makes it more clear what is the same and what isn't across versions and helps showcasing the pattern language ability to deal with special cases among other things. What are the downsides?

We should combine the versions, I didn't do that because it's not what the original issue was trying to fix. But yeah absolutely we should. I also have lots of experience with lua 5.1 bytecode, but haven't looked into the other versions much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants