Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMTChecker: Inline let expressions in solvers' models #15336

Merged
merged 1 commit into from
Aug 19, 2024
Merged

Conversation

blishko
Copy link
Contributor

@blishko blishko commented Aug 15, 2024

Models computed by CHC solvers (especially Z3) can contain SMT-LIB's let expressions. These are used to name a subterm and use the name instead of repeating the subterm possibly many times.

The easiest (but not necessary the best) way to deal with let terms is to inline (expand) them. In our current experiments inlining did not pose any problem. If, in the future, this would turn out to cause problems, we can devise a way to process let terms without expanding the term to its full tree-like representation.

This is part of the preparation to use Z3 as an external solver. It allows us to properly process models returned by Z3.
It has been separated from #15252.

@ekpyron
Copy link
Member

ekpyron commented Aug 15, 2024

We potentially also have quantifiers and thus otherwise bound variables in the smt expressions, right?
Could stuff like

  let x = 0 in (forall x . x < 0)

(I'm not fluent in writing smtlib, so however you'd actually write this properly)

happen?

If so, the current code looks like it'd still just substitute x, even if it's bound in a subexpression of the let-body. (So you'd get something like forall x . 0 < 0 or even forall 0 . 0 < 0, which is either no longer equivalent or even invalid...)
Will this only ever be run on smt expressions that don't contain other binders like quantifiers? Or do we have some unique-name property that will prevent a nested binder from using the same name that was previously bound by a let?

If this can happen, then I guess, the let-body (the part in which the let-bound things occur I mean by that) would need to be checked for other binders of the let-bound names and they'd need to be prevented from being replaced in further subexpressions of such nested binders or such...

But yeah, I haven't thought about what the expressions could look like that this runs on, so maybe this will just never happen and it's fine - but for arbitrary smtlib2 expressions I'd guess this could be an issue...
But scopedParser.toSMTUtilExpression that is run on the result of all this in the end at least does have quantifier cases :-).

@ekpyron
Copy link
Member

ekpyron commented Aug 15, 2024

If I'm right, maybe something along the lines of this is safer?

void inlineLetExpressions(SMTLib2Expression& _expr, std::unordered_map<std::string, std::reference_wrapper<SMTLib2Expression>> const& _substitutions = {})
{
	if (isAtom(_expr))
	{
		if (auto* bound = util::valueOrNullptr(_substitutions, asAtom(_expr)))
			_expr = *bound;
		return;
	}
	auto& subexprs = asSubExpressions(_expr);
	solAssert(!subexprs.empty());
	auto const& first = subexprs.at(0);
	if (isAtom(first))
	{
		if (asAtom(first) == "let")
		{
			solAssert(subexprs.size() == 3);
			solAssert(!isAtom(subexprs[1]));
			auto& bindingExpressions = asSubExpressions(subexprs[1]);
			auto newSubstitutions = _substitutions;
			for (auto& binding: bindingExpressions)
			{
				solAssert(!isAtom(binding));
				auto& bindingPair = asSubExpressions(binding);
				solAssert(bindingPair.size() == 2);
				solAssert(isAtom(bindingPair.at(0)));
				inlineLetExpressions(bindingPair.at(1), _substitutions);
				if (
					auto [it, newlyInserted] = newSubstitutions.emplace(asAtom(bindingPair.at(0)), bindingPair.at(1));
					!newlyInserted
				)
					// overwrite existing bindings
					it->second = bindingPair.at(1);
			}
			
			inlineLetExpressions(subexprs.at(2), newSubstitutions);

			// update the expression
			auto tmp = std::move(subexprs.at(2));
			_expr = std::move(tmp);
			return;
		}
		else if (asAtom(first) == "forall" || asAtom(first) == "exists") // any other binders?
		{
			auto newSubstitutions = _substitutions;
			smtAssert(subexprs.size() == 3);
			for (auto const& sortedVar: asSubExpressions(subexprs.at(1)))
				// remove substitutions for the quantifier-bound names
				newSubstitutions.erase(asAtom(asSubExpressions(sortedVar).at(0)));
			inlineLetExpressions(subexprs.at(2), newSubstitutions);
			return;
		}
	}

	// not a let expression, just process all arguments recursively
	for (auto& subexpr: subexprs)
		inlineLetExpressions(subexpr, _substitutions);
}

?

(I've rather blindly written that - the snippet stores references to SMTLib2Expressions and I haven't fully thought through whether what they reference stays alive long enough - and not sure how expensive it'd be to copy instead)

auto const& first = subexprs.at(0);
if (isAtom(first) && asAtom(first) == "let")
{
solAssert(subexprs.size() >= 3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
solAssert(subexprs.size() >= 3);
solAssert(subexprs.size() == 3);

or why can the rest be ignored, if there can be any more subexpressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to assert exactly 3 subexpressions.

@blishko
Copy link
Contributor Author

blishko commented Aug 16, 2024

You are right in the sense that the expression could contain quantifiers and theoretically it could bind the same variable.
In such cases, it would indeed not work correctly.
I was counting on the solver being sane and not doing that.
In general, AFAIK, solvers behave reasonably and use unique names when introducing let terms.

I will check your suggestion.

@ekpyron
Copy link
Member

ekpyron commented Aug 16, 2024

You are right in the sense that the expression could contain quantifiers and theoretically it could bind the same variable. In such cases, it would indeed not work correctly. I was counting on the solver being sane and not doing that. In general, AFAIK, solvers behave reasonably and use unique names when introducing let terms.

I will check your suggestion.

Yeah, ok - this is part of parsing the solver-generated counterexamples, right? It stands to reason that they internally use unique names and then also serialize them sanely, yeah. So if my suggestion doesn't work due to some life-time issue (or you think it copying the substitutions more often is too expensive for the kinds of expressions we expect here), I'd be fine with just documenting it - but that would be nice at least then, just to make sure that nobody ever tries to use this on code for which we couldn't assume sane/unique names. Conversely, if we assume unique names anyways, I guess that'd also go for the same let-name not being "re-bound", which could simplify the code a bit further... unfortunately, asserting against non-unique names in binders there is almost as much of a complication as treating them properly :-).

But yeah, your call, I'd approve this as is, if this part is noted in a comment or such.

Models computed by CHC solvers (especially Z3) can contain SMT-LIB's let
expressions. These are used to name a subterm and use the name instead
of repeating the subterm possibly many times.

The easiest (but not necessary the best) way to deal with let terms is
to inline (expand) them. In our current experiments inlining did not pose
any problem. If, in the future, this would turn out to cause problems,
we can devise a way to process let terms without expanding the term to
its full tree-like representation.

This is part of the preparation to use Z3 as an external solver.
It allows us to properly process models returned by Z3.
@blishko
Copy link
Contributor Author

blishko commented Aug 19, 2024

@ekpyron, I added a case to handle quantifiers based on your suggestion, but I kept my original data structure.
I think your approach would work, but I wanted to avoid all the copying (and for very deep nested lets, keeping all copies around might consume unexpectedly large amounts of memory).
I can use my data structure by explicitly rebinding names of quantified variables to the variable itself.
This way, if a let contains a quantifier with the same variable name as the let expression, inside the quatified expression the variable would stay as is, and not be expanded according to its let definition.

@blishko
Copy link
Contributor Author

blishko commented Aug 19, 2024

And yes, this is only for parsing the solver-generated counterexamples. I could even hide the method in the cpp file and not expose it, if that is preferable.

@blishko blishko merged commit 98b22f8 into develop Aug 19, 2024
72 checks passed
@blishko blishko deleted the smt-lets branch August 19, 2024 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🟡 PR review label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants