Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differentiate the current from_str_radix into signed and unsigned versions #25

Open
AaronKutch opened this issue May 22, 2018 · 9 comments

Comments

@AaronKutch
Copy link
Contributor

AaronKutch commented May 22, 2018

I was looking at the from_str_radix implementation and several things look wrong to me.

-No option for signed ApInts

-The input should be a &str not a String for the from_str_radix type.

-How is the width of the ApInt determined? There should be an input for the width of the ApInt produced. Maybe the functions should have an extra width in the name (and order the words in the order of the inputs). An example function signature could be:

pub fn unsigned_from_str_radix_width(s: &str, radix: Into<Radix>, width: Into<BitWidth>) -> Result<ApInt>

I am making a rough draft for it now.

As an aside, almost everything in this code snippet has something that I did not know Rust could do, I might use this technique for something.

    let d = match b {
        b'0'..=b'9' => b - b'0',
        b'a'..=b'z' => b - b'a' + 10,
        b'A'..=b'Z' => b - b'A' + 10,
        b'_' => continue,
        _ => ::std::u8::MAX
    };
@Robbepop
Copy link
Owner

The implementation of from_str_radix was never finished and is highly unstable at the moment. I needed division for long apint instances before I was able to implement and test this code properly. So the interface might be sub-optimal but I have more or less taken it over from num::bitint.

@AaronKutch
Copy link
Contributor Author

So if I finish the division implementation, do you have all the necessary stuff to fix the function?

@Robbepop
Copy link
Owner

Robbepop commented May 22, 2018

With the division operation we could easily look at num::biguint how they implemented it and should have all tools available to us that we need to implement it for apint.

@AaronKutch
Copy link
Contributor Author

AaronKutch commented Aug 15, 2019

I have been using this crate to serialize strings into a fixed point format for a fractional integers crate I have not made public yet. I have been thinking about this for several days now, and I have arrived at two functions (with signed and unsigned variants) that could be called by all our other functions, and be used directly by end users for more complex and custom serialization. All of our other string serialization functions are only concerned with the integer part, so the truncated_repr field can be used directly without any rounding complexities.
The new unsigned_from_str_radix, for example, could use
ApInt::signed_from_floating_point_str(bw, 0, false, str, &"", radix, 0).truncated_repr

Here is the rough draft documentation (note that this is not ready for compilation yet, and apologies for the lack of screen wrapping):

    /// Construct an `ApInt` from a bitwidth, fixed point position, sign, integer part, fractional part, radix, and exponent.
    /// 
    /// Purely integral fixed point representation (`fixed_point == 0`) and purely fractional
    /// representation (`fixed_point == bitwidth`) is supported. Any `_` characters in the `&str`
    /// inputs are ignored.
    /// 
    /// # Example
    /// 
    /// The base 10 string "-6.28318530718*10^3" can be converted to an `ApInt` fixed point
    /// representation with 3 bits in the integer part and 39 bits in the fractional part by using
    /// ```
    /// let bw = NonZeroUsize::new(3 + 39).unwrap();
    /// let mut x = ApInt::signed_from_floating_point_str(bw, 39, true, &"6", &"28318530718", 10, 3).unwrap().truncated_result;
    /// // If we use the plain integer display functions, the result will be unrecognizable due to
    /// // the fixed point representation, but we can see the "-628" of the integer part by shifting
    /// // out the fractional bits:
    /// assert_eq!(
    ///     x.clone().into_wrapping_ashr(39).unwrap().to_string(),
    ///     "-628"
    /// );
    /// // We can serialize using the functions which handle fixed point:
    /// assert_eq!(
    ///     x.truncated_result.signed_to_floating_point_string(39, 11, 11, 10, 3),
    ///     SerializationReslults {true,"6","2831853072",TODO}
    /// );
    /// // We can serialize using a different radix or exponent
    /// assert_eq!(
    ///     x.truncated_result.signed_to_floating_point_string(39, 11, 11, 10, 0),
    ///     SerializationReslults {true,"628","3185307180",TODO}
    /// );
    /// ```
    ///
    /// # Rounding
    /// 
    /// To simplify the cases this function has to handle and to reduce the information needed to
    /// return exact values for rounding purposes, this function will return an error if the input
    /// overflows the fixed point representation specified.
    /// 
    /// For radixes that are not a power of 2, it is possible for the fractional part to be
    /// impossible to be represented exactly, regardless of how many bits are allocated for the
    /// fractional part. For example, 0.1 in base 10 will result in the bits of the fixed point
    /// representation repeating forever as a sequence of "1100":
    /// ```
    /// let bw = NonZeroUsize::new(32).unwrap();
    /// // deserialize 0.1 and store it in a fixed point representation that is purely fractional
    /// let s = ApInt::from_signed_floating_point_str(bw, 32, &"", &"1", 10, 0).unwrap();
    /// assert_eq!(format!("{:b}", s), &"1100110011001100");
    /// ```
    ///
    /// One option is to round the `ApInt` to the closest approximation to the string possible, but
    /// there are multiple decisions that should be made on the user side, and information they
    /// might want:
    /// What rounding should be applied when the string is equally far from two `ApInt` values?
    /// How are minimum and maximum values handled?
    /// What is the exact error of this function?
    /// 
    /// Instead, what this function does is return the truncated fixed point representation
    /// with no rounding applied, and an unsimplified rational number of two `ApInt`s together
    /// representing the remainder of this function (in units of the ULP for the given fixed point).
    /// If the rational number simplifies to 0, for example, then the fixed point representation
    /// exactly matches the inputs. If the rational number simplifies to 1/2, then the inputs
    /// specify a value that is exactly between two fixed point representations. The rational
    /// number is not equal to or greater than 1, because a value of 1 corresponds to an increment
    /// of the unit in the last place. The bitwidths of the `remainder_numerator` and
    /// `remainder_denominator` are equal, but they may not be equal to the bitwidth of the
    /// `truncated_repr`, due to no limits being placed on the length of the input `str`s. The
    /// bitwidths of `remainder_numerator` and `remainder_denominator` are at least one more than
    /// neccessary, for ease of use in custom rounding functions.
    ///
    /// This example has a function which rounds to the closest fixed point representation, rounds
    /// away from zero when the remainder is 1/2 of an ULP, and does not round if rounding causes
    /// overflowing.
    /// ```
    /// fn f(
    ///     bitwidth: NonZeroUsize,
    ///     fixed_point: usize,
    ///     sign: bool,
    ///     integer_part: &str,
    ///     fractional_part: &str,
    ///     radix: u32,
    ///     exponent: isize,
    /// ) -> Result<ApInt> {
    ///     let r = ApInt::signed_from_floating_point_str(bitwidth,fixed_point,sign,integer_part,fractional_part,radix,exponent)?;
    ///     if r.truncated_repr.is_max_signed_value() || r.truncated_repr.is_min_signed_value() {
    ///         // prevent overflow corner cases
    ///         return Ok(r.truncated_repr)
    ///     }
    ///     // Finding the halfway point between two ULPs.
    ///     // Note: if the rounding is done differently, then the least significant bit discarded
    ///     // here may need to be accounted for, to handle odd radixes correctly.
    ///     r.remainder_denominator.wrapping_lshr_assign(1).unwrap();
    ///     if r.remainder_numerator.checked_uge(&r.remainder_denominator).unwrap() {
    ///         if sign {
    ///             r.truncated_repr.wrapping_neg();
    ///         } else {
    ///             r.truncated_repr.wrapping_inc();
    ///         }
    ///     }
    ///     return Ok(r.truncated_repr)
    /// }
    ///
    /// TODO show the rounding example
    /// ```
    ///
    /// # Errors
    /// An error is returned if:
    ///  - `fixed_point` is not in the range `0..=bitwidth`
    ///  - both `integer_part` and `fractional_part` are empty strings
    ///  - `radix` is not in the range `2..=36`
    ///  - `integer_part` or `fractional_part` contains characters not supported by
    ///    `std::char::to_digit` in the given `radix`, except for `_` characters which are ignored
    ///  - Overflow of the fixed point representation occurs
    pub signed_from_floating_point_str(
        bitwidth: NonZeroUsize,
        fixed_point: usize,
        sign: bool,
        integer_part: &str,
        fractional_part: &str,
        radix: u32,
        exponent: isize,
    ) -> Result<DeserializationResults> {
}
...
pub struct SerializationResults {
    pub sign: bool,
    pub integer_part: String,
    pub fractional_part: String,
    pub remainder_numerator: ApInt,
    pub remainder_denominator: ApInt,
}

    pub signed_to_floating_point_str(
        fixed_point: usize,
        integer_part_len: usize,
        fractional_part_len: usize,
        radix: u32,
        exponent: isize,
    ) -> Result<SerializationResults> {

@AaronKutch
Copy link
Contributor Author

I will use Into<Radix> for the radix, instead of u32.

@Robbepop
Copy link
Owner

The whole from_str serialization parts are super old and have never even been finished. Ideally I shouldn't have pushed them to master but it happened. So please feel free to completely revamp them as needed.

@AaronKutch
Copy link
Contributor Author

I think my new goal is to fix this issue with robust string serialization and deserialization functions before university starts again. I have a fixed point way of removing the libm dependency right now:

/// Binary logarithms of the numbers 2..=36 in I3F13 fixed point format and rounded up.
/// This is used for robustly calculating the maximum number of bits needed for a string
/// representation of a number in some radix to be represented. This may give more bits
/// than needed, but is guaranteed to never underestimate the number of bits needed.
/// For example,
/// When the radix 10 string "123456789" is going to be converted to an `ApInt`,
/// `LB_2_36_I3F13` is indexed by 10 - 2 which gives 27214. 27214 multiplied by the length
/// of the string plus 1 is 27214 * (9 + 1) = 272140. This is shifted right by 13 for the
/// fixed point which produces 33. The actual number of bits needed is 27. The plus 1
/// added to the string length is needed to handle strings with all the digits being the
/// maximum for the given radix (e.g. "999999999" which needs 30 bits).
const LB_2_36_I3F13: [u16; 35] = [
    8192, 12985, 16384, 19022, 21177, 22998, 24576, 25969, 27214, 28340, 29369, 30315,
    31190, 32006, 32768, 33485, 34161, 34800, 35406, 35982, 36532, 37058, 37561, 38043,
    38507, 38953, 39382, 39797, 40198, 40585, 40960, 41324, 41677, 42020, 42353,
];

This is completely robust if I use checked arithmetic (so that string lengths close to usize::MAX cause panics instead of causing the expression to overflow). I think it is definitely superior to using floating point for getting a lower bound on the number of bits needed for a deserialization.

I have been updating my fixed point deserialization function which will serve as the base deserialization function for all the others. The signed and unsigned from_str_radix functions will call this function and use banker's rounding based off of the returned error.

@Robbepop
Copy link
Owner

I think my new goal is to fix this issue with robust string serialization and deserialization functions before university starts again. I have a fixed point way of removing the libm dependency right now:

/// Binary logarithms of the numbers 2..=36 in I3F13 fixed point format and rounded up.
/// This is used for robustly calculating the maximum number of bits needed for a string
/// representation of a number in some radix to be represented. This may give more bits
/// than needed, but is guaranteed to never underestimate the number of bits needed.
/// For example,
/// When the radix 10 string "123456789" is going to be converted to an `ApInt`,
/// `LB_2_36_I3F13` is indexed by 10 - 2 which gives 27214. 27214 multiplied by the length
/// of the string plus 1 is 27214 * (9 + 1) = 272140. This is shifted right by 13 for the
/// fixed point which produces 33. The actual number of bits needed is 27. The plus 1
/// added to the string length is needed to handle strings with all the digits being the
/// maximum for the given radix (e.g. "999999999" which needs 30 bits).
const LB_2_36_I3F13: [u16; 35] = [
    8192, 12985, 16384, 19022, 21177, 22998, 24576, 25969, 27214, 28340, 29369, 30315,
    31190, 32006, 32768, 33485, 34161, 34800, 35406, 35982, 36532, 37058, 37561, 38043,
    38507, 38953, 39382, 39797, 40198, 40585, 40960, 41324, 41677, 42020, 42353,
];

This is completely robust if I use checked arithmetic (so that string lengths close to usize::MAX cause panics instead of causing the expression to overflow). I think it is definitely superior to using floating point for getting a lower bound on the number of bits needed for a deserialization.

I have been updating my fixed point deserialization function which will serve as the base deserialization function for all the others. The signed and unsigned from_str_radix functions will call this function and use banker's rounding based off of the returned error.

The question is what interface the from_str_radix should have.
Imo ideally the user provide the required bit width together with the string-based representation of the Apint instance.
Then your routine would quickly check if the required (wished for) bit width can be meet by the given target bitwidth and if not we bail out early.

@AaronKutch
Copy link
Contributor Author

The signature will probably end up like:

    pub fn unsigned_from_str_radix_width(
        src: &str,
        radix: Radix,
        bitwidth: BitWidth
    ) -> Result<ApInt> {}

For the signed version, I don't know if we pass the sign through src or if we add a bool input. Or, we don't make a signed version at all and make the user do .wrapping_neg()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants