Skip to content

rasberry/TuckBytesInCode

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TuckBytesInCode

Experimenting with ways to stuff binary data into source code - Ended up creating a general purpose text-encoded base conversion program

math

There seems be some unerlying math for any base converison formula.

The next table shows that bytes in and characters out can be used to calculate the base using the formula: ceiling(2 ^ (8 * bytes in / chars out)) Ceiling is used since we need a whole number of characters in a base

base name bytes-in chars-out calculated base
base 64 3 4 64
base 85 4 5 85
base 91 13 16 91
base 14938 26 15 14938

This next table shows adjusting the chars-out to fit a multiple of 8. i'm choosing 8 so that things fit nicely within bytes

base co-LCM mult bi * mult 2 ^ bi * mult log2(base)
64 8 2 6 64 6
85 40 8 32 4294967296 6.40939093
91 16 8 13 8192 6.50779464
14938 120 1 208 4.113761E+062 13.8666993
  • co-LCM: least common multiple between chars-out and 8
  • mult: co-LCM / co - what we would need to multiply bytes-in to preserve the ratio
  • bi * mult: bytes-in * multiplier
  • 2 ^ bi * mult: number of characters needed to represent the adjusted bytes-in
  • log2(base): number of bits needed to fit the number base

What's notable about this table is that base 64 fits exacly - so we could read 6 bits and directly map to the output character map - no base conversion needed.

ratios

What if we wanted to determine the bytes-in to chars-out ratio for any base ? We can use the already esablished formula base = 2 ^ (8 * bytes in / chars out) replacing the bytes in / chars out with just r we get base = 2 ^ (8 * r) solving for r we get r = log(base) / (8 * log(2))

Here's the ratios for the bases being used so far compared to the bytes-in / chars-out ratio

base log(base)/8*log2 ratio decimal base no ceiling
64 0.75 3 / 4 0.75 64
85 0.801173867017213 4 / 5 0.8 84.4485062894653
91 0.813474330024837 13 / 16 0.8125 90.5096679918781
14938 1.733337422860250 26 / 15 1.73333333333333 14937.6612525378

You can see that the fractions that were picked produce decimals that are just slightly below the target (r) ratio. Also notice that when we don't ceiling the ratio to base calculation, that the base number is close to the whole number base.

The difficulty with finding a bytes-in / chars-out fraction starting from just the ratio is that it's somewhat difficult to find a fraction that fits - we want a ratio that as close as possible without going over but that still uses small-ish numbers.

There are several algorithms for finding fractions that fit our decimal. I've included some links with more information. One method I've been using with excel is a brute force search which is described below.

links related to converting decimal to fraction

brute force fraction using a spread sheet

These are the formulas use in the spread sheet. formulas #3-6 are meant to be copied to each row in the spread sheet. Most spread sheets support coping entries by highlighting the cells on row 4 and 'draggin down' using the small box in the bottom-right corner.

# name formula
#1 ratio from base =LOG(A2)/(8*LOG(2))
#2 min matcher =MATCH(MIN(D4:D33),D4:D33,0)+3
#3 calc numerator =FLOOR(B4*$B$2)
#4 ratio =A4/B4
#5 difference =ROUND(1000*ABS(C4-$B$2),2)
#6 closest base =CEILING(POWER(2,8*A4/B4))

here's an example of the spread sheet using base 91 (in A2). The best row turns out to be 19 with the fraction 13 / 16 which matches the optimum values from before.

A B C D E
1 base base ratio #1 minimum #2
2 91 0.81347433002 19
3 numerator #3 denominator ratio #4 difference #5 closest base #6
4 0 1 0 813.47 1
5 1 2 0.5 313.47 16
6 2 3 0.66666666666 146.81 41
7 3 4 0.75 63.47 64
8 4 5 0.8 13.47 85
9 4 6 0.66666666666 146.81 41
10 5 7 0.71428571428 99.19 53
11 6 8 0.75 63.47 64
12 7 9 0.77777777777 35.7 75
13 8 10 0.8 13.47 85
14 8 11 0.72727272727 86.2 57
15 9 12 0.75 63.47 64
16 10 13 0.76923076923 44.24 72
17 11 14 0.78571428571 27.76 79
18 12 15 0.8 13.47 85
19 13 16 0.8125 0.97 91
20 13 17 0.76470588235 48.77 70
21 14 18 0.77777777777 35.7 75
22 15 19 0.78947368421 24 80
23 16 20 0.8 13.47 85

TuckBytes.ChangeBase

In my first attempt at base conversion I started to write encode and decode functions, but I found that most of the code was the similar. On second try, I was able to combine both into a single function. TuckBytes.ChangeBase is a binary-to-text style converter and is not an intended for general-purpose numeric base conversion. Currently it really only works well if one of the sides is base256 (binary). It's possible that there might be a way to go directly say from base64 to base85, but I have not determined if that's possible without first going through base256.

TODO

  • base91 doesn't seem to encode the same way as base64 or base85. Need to investigate further.
  • add support for variants (see RFC-4648)
  • take a look at https://en.wikipedia.org/wiki/Binary-to-text_encoding
  • take a look at https://www.edn.com/design/systems-design/4460458/2/Convert-binary-number-to-any-base
  • maybe implement normal number base conversions
    • for integers only ? might be difficult to parse fractional non-base10 inputs
  • maybe add a way to input a custom base
    • provide an alphabet (ascii or utf8)
    • determine in/out based on calculation
      • denominator (loop from 1 to 40 (or more))
      • numerator = FLOOR(den * base)
      • calculate closest base =CEILING(POWER(2,8 * num / den))
      • search until closest base matches base - num/dev is the in/out

Releases

No releases published

Packages

No packages published

Languages