Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement functions for working with binary keyvalues #3

Closed
3 tasks done
rossengeorgiev opened this issue Nov 23, 2015 · 8 comments
Closed
3 tasks done

Implement functions for working with binary keyvalues #3

rossengeorgiev opened this issue Nov 23, 2015 · 8 comments

Comments

@rossengeorgiev
Copy link
Contributor

Those are used in richpresence and other places, would be useful for the steam module eventually.

More info:
SteamRE/SteamKit#108

  • implement binary dump/load functions
  • verify against real data from steam
  • document usage and type hinting in README
@SleepProgger
Copy link
Contributor

Also used in the PICSProductInfoResponse for the package data.
I'll port the version pysteamkit uses and submit a pull request in a bit.

@SleepProgger
Copy link
Contributor

I finished the first version of the parse_binary function.
826a5d2
It works so far for every package data i tested it with.
Some test data (base64 encoded) at http:https://pastebin.com/mf16sHQh .
Let me know what you think.

Some notes:

  • node-steam.js uses an int32 for the Color and Pointer types. I guess we should do the same. https://github.com/seishun/node-steam/blob/master/lib/VDF.js At the moment i raise an exception on those types.
  • There seem to be an Widestring type but no implementation i saw used it, so i guess we can assume it isn't used widely in steam.
  • pysteamkit (and it seems node-steam.js, too) is using the native byte order. I am not sure if that is correct and steam knows about the byte order of the client(?). I used explicit little endian, that looks correct, but we should check if it is.
  • _read_till_seperator is a bit inefficient and we could improve the performance by caching all separator indices we find when reading a chunk, but i am not sure the increased complexity is really worth the performance benefit.

Additionally, i am not sure if creating the load_binary and loads_binary functions or just supplying an optional binary parameter to load/s would be better..

Edit:
Added experimental version of dump_binary.
b1f96ab
It needs the scalars in the form (data_type, value).
IMHO, that is a bit strange, but we NEED to know as which type to save the data (esp numeric values).
Additionally there is en/decoding missing, plus the same types as in the parse_binary function are missing (BIN_POINTER, BIN_WIDESTRING, BIN_COLOR).

@rossengeorgiev
Copy link
Contributor Author

Let's just deal with bytes strings. Then we can use indexing and slicing to parse it. I suggest find method for slicing out bin_string.

>>> x = b'bbb\x00aaaa\x00'
>>> x[0: x.find(b'\x00', 0)]
b'bbb'
>>> x[4: x.find(b'\x00', 4)]
b'aaaa'

If we want to make load/dump symmetric then the only way is to use objects for point, widestring, and color.

@rossengeorgiev
Copy link
Contributor Author

I think that covers everything. I haven't tested it on any real data yet.

@SleepProgger
Copy link
Contributor

I am almost finished with my version, too. IMHO it is a bit cleaner.
https://github.com/SleepProgger/vdf/blob/parse_binary_impl/vdf/vdf_binary.py

Changes:

  • I am using some type wrapper now to contain the types (int, uint, ...). This allows us to reserialize data pretty nicely.
  • There was a bug in _dump_gen_binary that is now fixed.
  • I changed Pointer and Color to an UINT32, because that is the most reasonable format for them i guess.
  • Everything works now with python 2.7 and 3 (at least the two versions i tested)
    • I moved all the stuff to another file, to not clutter the vdf namespace

So far i tested it with all the packages i could get from steam, and deserializeing them and serializing them again returned the correct results.
The problem is, so far all the data i tested was of the types String and int32. I am looking for other testdata (especially with widestrings in it).

If you want i could merge your changes with mine ?

@rossengeorgiev
Copy link
Contributor Author

Did you see my implementation? It's feature complete and fully tested.

Here is quick perf comparison.

In [1]: import vdf, vdf_binary
In [2]: raw = vdf.binary_dumps(dict(map(lambda x: (str(x), x),
range(5000))))
In [3]: %timeit vdf.binary_loads(raw)
100 loops, best of 3: 15.1 ms per loop
In [4]: %timeit vdf_binary.parse(raw)
10 loops, best of 3: 54.1 ms per loop
In [5]: raw = vdf.binary_dumps(dict(map(lambda x: (str(x), str(x)*100),
range(5000))))
In [6]: %timeit vdf.binary_loads(raw)
100 loops, best of 3: 15 ms per loop
In [7]: %timeit vdf_binary.parse(raw)
10 loops, best of 3: 103 ms per loop

@SleepProgger
Copy link
Contributor

Well, your parsing version doesn't support streams (and therefore is ofc faster with memory data ;)), but i can live with that.
After rereading your code, they are pretty similar in most cases.

Some stuff i noticed:

  • You declared the struct instances, but aren't using them in _binary_dump_gen (~ int32.pack(value))
  • I guess POINTER and COLOR should be unsigned ints (at least for pointer that sounds pretty right)
    • IMHO it would be wise to conserve if the value was a WIDESTRING, too just in case.

@SleepProgger
Copy link
Contributor

Good news everybody,
I tested your implementation against all current existing productInfo i could get (about 3000 objects), and everything went well, and all data is correctly parsed and dumped ( except the minor issue mentioned in #5 ).
All the test data can be found here. I will update that repo with other files (from richpresence and other sources which may exist)

Every single binary vdf blob i tested only contained int32 and string types btw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants