Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: add support for comparisons against 32-bit arguments #384

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jhenstridge
Copy link

@jhenstridge jhenstridge commented May 11, 2022

This is intended to fix #383.

The basic implementation strategy was:

  1. add an SCMP_CMP_32BIT flag that can be bitwise ORed with the existing comparison operators, indicating that a 32-bit comparison is requested even on 64-bit platforms. This should be ABI backward compatible, and should also cleanly return an error should an application link against an old libseccomp.
  2. extract the "per argument comparison" logic from _db_rule_gen_64 and _db_rule_gen_32 into _db_rule_gen_arg_64 and _db_rule_gen_arg_32 helper functions that share the same signature.
  3. merge the two _db_rule_gen_* functions now that they are basically the same except for which helper they invoke in the loop.
  4. have _db_rule_gen unconditionally generate 32-bit comparison tree nodes if the argument comparison had the SCMP_CMP_32BIT flag set.

@coveralls
Copy link

coveralls commented May 11, 2022

Coverage Status

Coverage decreased (-0.1%) to 89.454% when pulling 518493f on jhenstridge:32-bit-comparisons into 72198d9 on seccomp:main.

@jhenstridge
Copy link
Author

I think this is ready for review now. I've added tests to demonstrate that the 32-bit comparisons work and ignore data in the high bits, and also made sure rule_add() will return -EINVAL if unknown flags are set in the scmp_compare operation for better forward compatibility.

One thing I haven't done is change the SCMP_A?_32() macros to automatically set SCMP_CMP_32BIT in the comparison operation. I think it is probably what most people using those macros would actually want, but is a change in behaviour.

@jhenstridge
Copy link
Author

I've rebased against main, and added a Python version of the new test. It all looked okay locally, but I hadn't built with --enable-python.

@pcmoore
Copy link
Member

pcmoore commented May 19, 2022

I want to apologize @jhenstridge, you've put a lot of work in on this and we haven't had a chance to look at it yet. I just want to let you know that this PR hasn't gone unnoticed or been forgotten, we just happen to have a lot on our plates and this is not only tricky code but an API addition so it takes some time to review.

Thank you for your help!

Copy link
Member

@drakenclimber drakenclimber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jhenstridge!

I looked through the changes pretty closely, and it looks solid to me. I took a couple notes along the way as I worked through the patches, and I'll leave them in there for now in case they help others, but they don't affect the overall outcome of the review. Thanks for the refactor of the _db_rule_gen_* logic into a single function.

As a small nitpick, I am used to more descriptive commit comments, but I honestly can't fault what you've added. They explained each commit.

It feels like too significant of a change to be part of v2.5.5, and thus I would lean toward it going in v2.6.0. I'm open to discussion to hear others' thoughts here though.

Thanks!

Acked-by: Tom Hromatka <[email protected]>

@@ -1944,6 +1944,72 @@ static struct db_sys_list *_db_rule_gen_64(const struct arch_def *arch,
return NULL;
}

static int _db_rule_gen_arg_32(struct db_sys_list *s_new,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self as much as anything - I don't see this function having any code coverage even though it's getting called 500,000+ times.

https://coveralls.io/builds/49252361/source?filename=src%2Fdb.c#L1918

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think something is broken for me and coveralls.io, I'm not able to see the marked-up source anymore ... :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it looks like I just needed to re-auth to coveralls.io.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to the original comment ... yes, something is odd here, both the 64-bit and 32-bit variants of this function appear to have zero code coverage, which is not right ...

@@ -1959,8 +2025,8 @@ static struct db_sys_list *_db_rule_gen_32(const struct arch_def *arch,
unsigned int iter;
struct db_sys_list *s_new;
const struct db_api_arg *chain = rule->args;
struct db_arg_chain_tree *c_iter = NULL, *c_prev = NULL;
bool tf_flag;
struct db_arg_chain_tree *prev_nodes[4] = { NULL, };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note to self - the prev_nodes[] change will become important in later commits. This commit is (slightly) more than just a refactoring of code to a separate function.

Copy link
Member

@pcmoore pcmoore Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixed 4 is very "magic" too, which makes me uneasy. At the very least that needs a comment (why 4?), but a much better approach would be a macro definition and and comment with the macro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I now understand where the 4 comes from, and to be fair we have similar constants in the current _db_rule_gen_64() function, but the important difference is that in the current implementation the magic numbers are contained as local variables within the function, here they are part of the function's signature and are thus much more important.

Copy link
Member

@pcmoore pcmoore Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping that as I get farther into this review we can revisit how this code was extracted, because I'm not really liking this right now.

@cyphar
Copy link

cyphar commented May 23, 2022

I wonder if it would be possible (or even preferable) to merge this PR with the requirements of #310 and just have a set of MASKED_* comparators so that you can do any masked comparison you like? It would result in the same BPF output for 32-bit but would be more general?

@jhenstridge
Copy link
Author

@cyphar: I think handling 32-bit arguments is important enough to have on its own, given how many syscalls have 32-bit arguments. And if libseccomp ever leans how to perform signed comparisons, the argument width will be important due to different positions for the sign bit. It also has the benefit of generating smaller BPF for 32-bit argument checks compared to generalised masked 64-bit comparisons.

If the part of my proposal to add flag bits to scmp_compare is accepted though, it might provide a better basis for generalised mask comparisons. Rather than adding new scmp_compare values for the other comparisons like was done with SCMP_CMP_MASKED_EQ, you could have a flag that can be ORed to indicate that the argument should be masked before performing the comparison.

@@ -94,6 +94,9 @@ enum scmp_compare {
SCMP_CMP_GT = 6, /**< greater than */
SCMP_CMP_MASKED_EQ = 7, /**< masked equality */
_SCMP_CMP_MAX,

SCMP_CMP_OPMASK = 0xFFFF,
SCMP_CMP_32BIT = 1 << 16, /**< operation is 32-bit */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working my way through this PR, but I think we want to move SCMP_CMP_OPMASK and SCMP_CMP_32BIT out of the enum and have them be standalone preprocessor macros/constants.

Copy link
Member

@pcmoore pcmoore May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we do that we need to add a "filler" value to cause the enum to be a certain size, which feels a little icky but I think it's okay if we make sure to represent all of the flag space, e.g.

enum scmp_compare {
	_SCMP_CMP_MIN = 0,
	SCMP_CMP_NE = 1,
	SCMP_CMP_LT = 2,
	SCMP_CMP_LE = 3,
	SCMP_CMP_EQ = 4,
	SCMP_CMP_GE = 5,
	SCMP_CMP_GT = 6,
	SCMP_CMP_MASKED_EQ = 7,
	_SCMP_CMP_MAX,
	_SCMP_CMP_OPMASK = 0x0000FFFF,
	_SCMP_CMP_FLAGMASK = 0xFFFF0000,
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... which somewhat questions the value of moving the flag definitions out of the enum and into their own macros/constants, but that still seems like a goodish idea to me at this point.

@@ -38,6 +38,7 @@ struct db_api_arg {
scmp_datum_t datum;

bool valid;
bool is_32bit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to drop the is_32bit flag and just preserve the SCMP_CMP_XXX flags in the db_api_arg::op field; there aren't that many places that check the comparison op so masking them shouldn't be too terrible. If necessary we can always wrap it in a preprocessor macro, e.g.

#define CMP_OP(x) (x & __SCMP_CMP_OPMASK)
#define CMP_FLAGS(x) (x & __SCMP_CMP_FLAGMASK)

@pcmoore
Copy link
Member

pcmoore commented May 27, 2022

My apologies @jhenstridge, I just got pinged that I've run out of time today, if I don't get a chance to finish this over the weekend I'll pick it up next week.

@pcmoore pcmoore added this to the v2.6.0 milestone Jul 11, 2022
if (*tf_flag)
prev_nodes[iter]->nxt_t = _db_node_get(c_iter[0]);
else
prev_nodes[iter]->nxt_f = _db_node_get(c_iter[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know braces ("{ ... }") aren't strictly required for this for-loop body, but please add them to help readability and prevent silly mistakes in the future.

@pcmoore
Copy link
Member

pcmoore commented Jul 11, 2022

A quick comment independent of any one particular commit - all of the commits in this PR appear pretty substantial, yet there are no commit descriptions in the first few commits I've reviewed. Please add some meaningful commit descriptions to each patch explaining what the patch does, and why this change is important.

We often let trivial patches go into the tree without commit descriptions, but that should be seen as a special case and not something that should be taken as a general rule.

@@ -1944,6 +1944,72 @@ static struct db_sys_list *_db_rule_gen_64(const struct arch_def *arch,
return NULL;
}

static int _db_rule_gen_arg_32(struct db_sys_list *s_new,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function also needs a comment block at the top like all the other functions in libseccomp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with the 64-bit variant, it looks like the comment block is added in 0936a72 ("db: merge _db_rule_gen_64 and _db_rule_gen_32"); it would be best to add the comment block in the first appearance of the function.

static int _db_rule_gen_arg_32(struct db_sys_list *s_new,
const struct arch_def *arch,
const struct db_api_arg *api_arg,
struct db_arg_chain_tree *prev_nodes[4],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking out-loud here, but I wonder if the readability would be better if the prev_nodes[] array was expanded into individual variables with more descriptive names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this function only appears to ever use one prev_nodes[] array slot, right?

*/
static struct db_sys_list *_db_rule_gen_64(const struct arch_def *arch,
const struct db_api_rule_list *rule)
static int _db_rule_gen_arg_64(struct db_sys_list *s_new,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment block describing the function, see the others for an example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so it looks like it is added in 0936a72 ("db: merge _db_rule_gen_64 and _db_rule_gen_32"); it would be best to add the comment block in the first appearance of the function.

static int _db_rule_gen_arg_64(struct db_sys_list *s_new,
const struct arch_def *arch,
const struct db_api_arg *api_arg,
struct db_arg_chain_tree *prev_nodes[4],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other comments about prev_nodes[] apply here.

if (*tf_flag)
(*node)->nxt_t = _db_node_get(c_iter);
else
(*node)->nxt_f = _db_node_get(c_iter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know braces ("{ ... }") aren't strictly required for this for-loop body, but please add them to help readability and prevent silly mistakes in the future.

prev_nodes[iter]->nxt_t = _db_node_get(c_iter[0]);
else
prev_nodes[iter]->nxt_f = _db_node_get(c_iter[0]);
prev_nodes[0] = prev_nodes[1] = prev_nodes[2] = prev_nodes[3] = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal nit-pick: please don't chain assignments together like this, put them each on their own line.

{
unsigned int iter;
struct db_sys_list *s_new;
const struct db_api_arg *chain = rule->args;
struct db_arg_chain_tree *prev_nodes[4] = { NULL, };
bool is_32bit = arch->size == ARCH_SIZE_32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since is_32bit is only used once below, why not just drop the dedicated local variable and just do the arch->size comparison where needed?

@@ -94,7 +94,8 @@ check_PROGRAMS = \
56-basic-iterate_syscalls \
57-basic-rawsysrc \
58-live-tsync_notify \
59-basic-empty_binary_tree
59-basic-empty_binary_tree \
60-sim-32b_args_on_64b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the new tests :)

@@ -2331,6 +2331,11 @@ int db_col_rule_add(struct db_filter_col *col,
rc = -EINVAL;
goto add_return;
}
/* Check that no unknown flags are specified in the op */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to check this up near the top of the if-block once the once the chain entry is considered valid (the fail early idea).

@@ -2331,6 +2331,11 @@ int db_col_rule_add(struct db_filter_col *col,
rc = -EINVAL;
goto add_return;
}
/* Check that no unknown flags are specified in the op */
if ((arg_data.op & ~(SCMP_CMP_OPMASK | SCMP_CMP_32BIT)) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may argue for a SCMP_CMP_FLAGMASK or similar, see the comments on the first patch in the PR.

@@ -155,7 +155,8 @@ EXTRA_DIST_TESTPYTHON = \
56-basic-iterate_syscalls.py \
57-basic-rawsysrc.py \
58-live-tsync_notify.py \
59-basic-empty_binary_tree.py
59-basic-empty_binary_tree.py \
60-sim-32b_args_on_64b.tests.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double bonus points for the Python tests :)

@pcmoore
Copy link
Member

pcmoore commented Jul 12, 2022

First off, I'm really sorry @jhenstridge that it took me so long to give this a proper review, that was pretty crappy and you deserved better considering the time and effort that no doubt went into this PR. That said, I'm finally done going through the PR, but please don't feel that my comments are "you MUST do this!", I think most of my comments were intended more as starting points for some discussion between you, @drakenclimber, myself, and anyone else who is interested in this functionality. Put another way, please don't rush out to change a lot of your code just yet, let's make sure we are all in agreement before you spend a bunch more time revising this patchset :)

@pcmoore
Copy link
Member

pcmoore commented Oct 14, 2022

Hi @jhenstridge, it's been a while so I just wanted to check in on this to see where things were at?

@pcmoore pcmoore modified the milestones: v2.6.0, v2.7.0 May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE: add support for comparisons against 32-bit arguments
5 participants