Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic OR predicate on multiple columns #32092

Closed
clement911 opened this issue Oct 19, 2023 · 21 comments
Closed

Dynamic OR predicate on multiple columns #32092

clement911 opened this issue Oct 19, 2023 · 21 comments

Comments

@clement911
Copy link
Contributor

In EF core, is it possible to have WHERE with a multi-column OR predicate that is based on a dynamic list?

Let me explain with some simple pseudo code.

Assume I need to fetch a list of orders, based of a list of order ids.

I can do as follows:

public class Order
{
    [Key]
    public long Id {get;set;}

    //other properties...
}

long[] orderIds = //array of ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains(o.Id)).ToArrayAsync();

That works great.

Now, imagine that the order class has a composite key made of multiple columns.

I would like to write something like this:

public class Order
{
     [Key]
     public long TenantId {get;set;}

    [Key]
     public long Id {get;set;}

    //other properties...
}

(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync(); //this doesn't work

However, this doesn't work.
Is there an alternative?

As of now, we generate sql manually and use FromSqlRaw. That's not ideal though.

@ilmax
Copy link
Contributor

ilmax commented Oct 19, 2023

I was also running into this the other day, I think it's a missing piece in Linq. It would be nice to have some sort of composable Any so we could so something like the following pseudo code:

var query = something...
foreach (var item in list)
{
query = query.Any(x => x.Id = item.Id && x.TenantId = item.TenantId)
}

As of today we're forced to take some workarounds to build such queries like for example manually constructing expression trees.

The code above doesn't work today because Any doesn't return an IQuerybale<T> but a bool, that's why to me it looks like something missing in Linq itself.

Maybe I'm missing something obvious but I'm not aware of how we can express a list of OR conditions in Linq today.

@clement911
Copy link
Contributor Author

I've seen people mention this PredicateBuilder but it's not clear if it's supported.
I'd prefer to see a native EF solution.
It seems like a pretty basic scenario.

@roji
Copy link
Member

roji commented Oct 19, 2023

Duplicate of #11799

@roji roji marked this as a duplicate of #11799 Oct 19, 2023
@roji
Copy link
Member

roji commented Oct 19, 2023

We've made various progress around Contains that goes in this direction, but full support for tuples (which the above requires) isn't yet supported.

As of now, we generate sql manually and use FromSqlRaw.

@clement911 what SQL are you currently using for the above?

@ilmax I'm not sure what's missing in LINQ here. First, the above - Contains with tuple - is already the right LINQ expression for the operation; the problem is that EF doesn't yet support translating it (because support is lacking for tuples in general).

When we implement support for this, the array of tuples parameter ((long tenantId, long orderId)[]) would likely be encoded as a JSON string and send to the database, to be unpacked into a relational table using e.g. SQL Server OPENJSON. At that point it can be treated like a regular table, using the regular SQL means to translate the operation.

@ilmax
Copy link
Contributor

ilmax commented Oct 19, 2023

I'm not sure what's missing in LINQ here

@roji I think I described my wish in the comment above. I'd like the ability to construct multiple conditions and then OR- them together, same thing we can do with the Where but using OR instead of AND

The contains with tuple to me just doesn't feels good. It's difficult to read compared with my pseudo proposal above (but this is my personal opinion) and it's also more limited than what I was proposing above.

@roji
Copy link
Member

roji commented Oct 19, 2023

@ilmax it sounds like you're asking for #15670.

If so, that approach generates different SQLs for arrays of different size (since there are more ORs); this is bad, since it defeats query plan caching at various levels and hurts performance considerably (see #15670 (comment) for more on this).

For that reason, I definitely think that's not the right solution here. Packing the parameter as a JSON parameter and expanding it to a table in the database is the better way to do this.

The contains with tuple to me just doesn't feels good. It's difficult to read compared with my pseudo proposal above (but this is my personal opinion) and it's also more limited than what I was proposing above.

Well, "doesn't feel good" isn't a very strong argument... The contains with tuple works exactly the same way as the contains without a tuple, so I'm not sure why anyone would find it difficult to read. More importantly, your proposal doesn't show what the query would actually look like, in code. If you're asking for us to add some new, unknown LINQ construct for this, I definitely don't see us doing that when a perfectly suitable LINQ construct already exists.

@ilmax
Copy link
Contributor

ilmax commented Oct 19, 2023

isn't a very strong argument

I agree, that's why I said it's just my personal opinion, the second point about limitation is more important and it should have been added as first.

Having said that and ignoring the perf side of thing for now, I was saying that to me looks like a Linq limitation because I feel (but I may be wrong) that's not easy with Linq today to compose several query with OR because Any has no deferred execution semantics so we can't compose query with it.

So what I feel like is missing is the ability to compose multiple conditions together with an OR. this is possible today in Linq to object but not always properly translated by various providers (not talking only about EF core here and that's my point about exposing a simpler method in Linq and risking to go O.T. here).

Take for example this trivial linq 2 object code that's just for illustration purposes:

var customers = new List<Customer>();
for (int i = 0; i < 20; i++)
{
	customers.Add(new Customer(i, i % 5, i*2));
}

var customersToFind = new List<(int Id, int TenantId)>
{
	{(8, 4)},
	{(9, 4)},
	{(14, 4)}
};

var found = customers.Where(i => customersToFind.Any(x => i.TenantId == x.TenantId && (x.Id == i.Id || x.Id == i.AlternateId))).ToArray();

public record Customer(int Id, int TenantId, int AlternateId);

If we want to translate the same query to EF or other Linq provider (looking at you, Cosmos SDK) out there, we're mostly out of luck.

EF today (tested with the latest 8.0-RC2 version available, see gist here) can't translate a similar query.
It would be nice to be able to do something like the following and have the same semantics of the linq to object query expressed above (name Or is just the first one that came to my mind):

var customerQuery = customers.AsQueryable();
foreach (var custId in customersToFind)
{
	customerQuery = customerQuery.Or(c => c.TenantId == custId.TenantId && (c.Id == custId.Id || c.AlternateId == custId.Id));
}

In order to express such functionality today and not pulling in 3rd parties libraries (unless I'm missing something) we have to take a stroll into manually creating expression trees and that's not trivial and, if memory serves me right, it was discouraged by the EF team itself.

We can argue that's probably not the best way to query perf wise and that you should have a better db design to avoid such queries and everything else but, every now and then, those query came up and today I always end up with manually created expression trees.

I hope I expressed myself in a clearer way now :)

@roji
Copy link
Member

roji commented Oct 19, 2023

So again, I think you're really conflating two things:

  1. The ability to express something with LINQ.
  2. The ability of a specific LINQ provider to translate that LINQ.

I'm seeing nothing in this issue that can't be expressed with LINQ - via Contains and tuples; this works perfectly with LINQ to Objects, and IMHO is also the ideal and compact way to express the operation. Then there's the question of specific LINQ providers (and/or EF providers) actually translating that construct; that's really a separate question. Note that there are various other unrelated LINQ constructs that EF can't translate to SQL.

I've already written above why generating dynamic SQL is a bad idea (bad perf because of how SQL plan caching works). I suggest fully understanding that; it should make clear why your proposal wouldn't be a good way forward.

Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product).

@ilmax
Copy link
Contributor

ilmax commented Oct 19, 2023

So again, I think you're really conflating two things:

  1. The ability to express something with LINQ.
  2. The ability of a specific LINQ provider to translate that LINQ.

I'm well aware of how Linq and EF works, that was why I was thinking about exposing it in Linq itself (even though I'm aware that the Linq library has a very high bar to accept any changes so I knew my idea would go nowhere, I just wanted to see if there's a need of such functionality)

I'm seeing nothing in this issue that can't be expressed with LINQ - via Contains and tuples; this works perfectly with LINQ to Objects, and IMHO is also the ideal and compact way to express the operation. Then there's the question of specific LINQ providers (and/or EF providers) actually translating that construct; that's really a separate question. Note that there are various other unrelated LINQ constructs that EF can't translate to SQL.

I'm not sure I agree on the ideal part. What I propose has far superior readability (subjective point) and can be composed easily, I also think that's probably easier for a linq provider to translate since it doesn't have to understand the reference a collection and translating the Any()

I've already written above why generating dynamic SQL is a bad idea (bad perf because of how SQL plan caching works). I suggest fully understanding that; it should make clear why your proposal wouldn't be a good way forward.

I've read the comments and understand the drawbacks, but we don't live in a perfect world and sometimes those kind of trade-offs have to be made

Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product).

I agree, everything can be done today but it's impractical. What I was hoping was to simplify what to me seems like a common enough case to deserve to be treated as a first class citizen.

Given that I've failed to prove my case, I will live with my manually constructed expression trees :)

@clement911
Copy link
Contributor Author

@roji We use the following SQL work around.
(I simplified the code but you should get the idea).

(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
 string orPredicate = orderIds.Select(i => $"({nameof(Order.TenantId)} = {i.tenantId} AND {nameof(Order.Id)} = {i.orderId})")
                          .Join(" OR ");
 var orders = await ctx.Set<Order>()
                                   .FromSqlRaw($"""
                                                     SELECT *
                                                     FROM [{SCHEMA_PUBLIC}].[{nameof(Order)}]
                                                     WHERE ({orPredicate})
                                                     """)
                                  .ToArrayAsync();

But of course we don't like because it's brittle, verbose, and we lose many of the benefits of querying with EF.
Hence why we would prefer if EF had a built-in feature to support this properly.

Using a linq query like below would be ideal.

var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync();

My understanding is that Postgres natively supports tuple comparison.
It supports not only (in)equality operators but also other comparison operators such as greater than, lower than, etc...
For example, it is valid to write a postgres query like this:

select * from books WHERE 
(title, id) > ('KVFNdl5F', 994364)

Unfortunately, we use sql server and sql server doesn't have native support for tuples.
That's why we have to create a query with an OR clause.

You mentioned that EF would likely encode the array of IDs as a JSON string and send to the database, to be unpacked with OPENJSON?
That seems a lot more complicated to me that generating a plain OR clause.
I agree that this means that the query will not benefit from query plan caching since the sql will be different each time.
However, this is no different that the regular single-value contains, right?
i.e if today I write await ctx.Orders.Where(o => orderIds.Contains(o.Id)), EF will generate an IN statement with all the values, which means that the sql will be different each time.
So it would make sense to me that the tuple version uses an OR, which is the closest equivalent to an IN.

But to take a step back, we don't care too much about the actual generated SQL, as long as it performs well.

It hadn't occur to me to use OPENJSON.
I do think that @ilmax direction is more flexible though.
Indeed, Contains is good for equality and !Contains for inequality but sometimes we need more complicated logic.

For example, imagine that the user provides a list of prefixes and the query needs to return all orders for which the product name starts with any of the supplied prefixes. Contains will not help but it would be great to be able to write something like this.

var orPredicate = EF.Or<Order>();
foreach (string prefix in prefixes)
  orPredicate = orPredicate.Or(o => o.ProductName.StartsWith(prefix);
var orders = ctx.Orders.Where(o => o.TenantId = tenantId && orPredicate).ToArrayAsync();

I know that we can build expression trees by hand but who does?
It's very tricky to do. If EF provided a built-in utility such as PredicateBuilder, it would add a ton of flexibility to build arbitrary complex tree of AND/OR condition trees that can operate on multiple columns using the full power of EF.

If I understand correctly, the EF sql generation pipeline wouldn't even need to change, since it wouldn't care that the expression tree was built using a utility rather than with a 'normal' query. So it seems like it would be a quick win.

While I agree that Contains on tuples is a different feature to an OR/AND predicate builder utility, I do think that the latter would allow us to handle the former, and more.

@roji
Copy link
Member

roji commented Oct 22, 2023

You mentioned that EF would likely encode the array of IDs as a JSON string and send to the database, to be unpacked with OPENJSON?
That seems a lot more complicated to me that generating a plain OR clause.
I agree that this means that the query will not benefit from query plan caching since the sql will be different each time.
However, this is no different that the regular single-value contains, right?
i.e if today I write await ctx.Orders.Where(o => orderIds.Contains(o.Id)), EF will generate an IN statement with all the values, which means that the sql will be different each time.
So it would make sense to me that the tuple version uses an OR, which is the closest equivalent to an IN.

The Contains behavior is changing in 8.0, precisely because of the query plan issues inherent in varying SQL, see this blog post for more details. I think that in principle, it's best if EF never generate varying SQLs (like it does for Contains over a parameterized list prior to 8.0).

Regard whether it's more or less complicated, I don't think that should be something the user cares much about... As long as the query works correctly and has good perf, the underlying mechanisms used to translate the LINQ query are implementation details.

For example, imagine that the user provides a list of prefixes and the query needs to return all orders for which the product name starts with any of the supplied prefixes. Contains will not help but it would be great to be able to write something like this.

There's no need for dynamic ORs here either; in fact, EF Core 8.0 supports this kind of query using the same OPENJSON technique used for Contains. The following LINQ query:

var prefixes = new[] { "foo", "bar" };
_ = await ctx.Blogs.Where(b => prefixes.Any(p => b.Name.StartsWith(p))).ToListAsync();

Translates in 8.0 as follows:

SELECT [b].[Id], [b].[Flag], [b].[Name]
FROM [Blogs] AS [b]
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON(@__prefixes_0) WITH ([value] nvarchar(max) '$') AS [p]
    WHERE [p].[value] IS NOT NULL AND LEFT([b].[Name], LEN([p].[value])) = [p].[value])

In other words, OPENJSON (or the equivalent in other databases) is used to unpack a JSON list into a table, at which point the regular relational tools can be used. This is extremely powerful/flexible and allows translating a wide array of LINQ constructs which weren't translatable prior to 8.0.

If EF provided a built-in utility such as PredicateBuilder, it would add a ton of flexibility to build arbitrary complex tree of AND/OR condition trees that can operate on multiple columns using the full power of EF.

If PredicateBuilder works and satisfies your needs, then why does it need to be a built-in utility in EF Core? We very much don't believe that all possible functionality needs to be in-the-box inside EF, and are always happy to see 3rd-party utilities integrate with EF. It's simply not practical to expect the EF team to do everything.

@clement911
Copy link
Contributor Author

clement911 commented Oct 23, 2023

Thanks @roji
Assuming OPENJSON performs similarly to IN/OR then I think it's fantastic that EF is trying to avoid varying SQL and ensure query plans are cached and reused.

There's no need for dynamic ORs here either; in fact, EF Core 8.0 supports this kind of query using the same OPENJSON technique used for Contains.

That's really cool.
So if I understand, you're saying that this way of formulating the query will work single column predicates but not with multi-column/tuple predicates?
e.g.

Works:

long[] orderIds= ...;
await ctx.Orders.Where(o => orderIds.Any(id => o.Id == id).ToListAsync();

Doesn't work:

(long tenantId, long orderId)[] orderIds = ...;
await ctx.Orders.Where(o => orderIds.Any(id => o.TenantId == id.tenantId && o.Id == id.orerId).ToListAsync();

Right?

Also I'm curious if using OPENJSON works properly when using various collations?
We have a lot of columns with non-default collations.
My understanding is that n(var)char parameters always have the default DB collation.
If you compare to a column that has a different collation, sql server will reject the query unless a COLLATE clause is added.

I understand that EF cannot realistically implement and support every possible feature.
I will review existing third party solutions to build OR predicates.

@roji
Copy link
Member

roji commented Oct 23, 2023

Assuming OPENJSON performs similarly to IN/OR [...]

Take a look at this comment and at that issue in general. I haven't benchmarked dynamic ORs specifically, but I have compared with the previous Contains behavior (i.e. embed constants in the SQL). The bottom line is that if you concentrate solely on the specific query performance, we've seen the OPENJSON-based technique perform slightly worse than constant embedding; however, once you consider the overall perf impact on other queries (i.e. other query plans getting evicted because of varying SQLs), paying that relatively small constant perf hit should make sense.

So if I understand, you're saying that this way of formulating the query will work single column predicates but not with multi-column/tuple predicates?

Yes - arrays of tuples aren't yet supported (as opposed to arrays of e.g. ints). This is something I hope we'll be able to improve in the near future.

If you compare to a column that has a different collation, sql server will reject the query unless a COLLATE clause is added.

AFAIK that's not how it works - have you tested this? Parameters - just like nvarchar constants - do not have a collation (by default); when you compare them to a column, the column's collation is used for the comparison operation. Similarly, the OPENJSON-generated table has a column with no collation, so OPENJSON shouldn't make any difference compared to sending a simple string parameter. If you're seeing another behavior please open a new issue with some code so we can investigate.

@roji
Copy link
Member

roji commented Oct 23, 2023

I think that the discussion here has run its course. To summarize my point of view:

  • We want to avoid translating LINQ queries in a way which produces dynamic SQL, since that's quite bad for perf. While there are scenarios where dynamic SQL is required, I think it's important that EF not do this automatically, to make sure that the user is well-aware of what's happening (and the potential perf problems that introduces).
  • Specifically for the query above, I believe the right translation would make use of OPENJSON; this continues the direction we've started in EF Core 8.0 primitive collections (requiring adding tuple support). This technique avoids dynamic SQL.
  • It makes a lot of sense for dynamic SQL to require dynamic LINQ, i.e. dynamically constructing the LINQ expression tree. This preserves the symmetry between the two sides, and makes it extremely clear in .NET code that query dynamicity is being introduced.
  • At the same time, it's trivial to wrap the dynamic LINQ complexity via a package like PredicateBuilder. Since this is an area that's easily extensible by 3rd-party packages, we don't believe we need to provide a dynamic predicate builder within EF.

Given the above I'll go ahead and close this issue, but of course feel free to post back here with any other thoughts and continue the conversation.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Oct 23, 2023
@clement911
Copy link
Contributor Author

@roji I appreciate your fast and clear comments.
Your points all make sense.

However, I did some quick testing and I think that the collation thing will be an issue that needs to be addressed otherwise it'll break queries with a .Contains that operate on char columns with a non default collation.

@roji
Copy link
Member

roji commented Oct 24, 2023

@clement911 can you post your quick testing so we can investigate?

@clement911
Copy link
Contributor Author

@roji Yes, I will.
Do I have to install .net 8 preview and vs 2022 preview to use ef8?

@Meligy
Copy link

Meligy commented Oct 24, 2023

@clement911 you need the preview SDK. Then you can use VS Code to write the code or get the right VS version.

@clement911
Copy link
Contributor Author

I raised a bug issue here: #32147

@n0099
Copy link

n0099 commented Jul 10, 2024

For anyone who have to do #32092 (comment)

Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product).

and copy-pasted so many PredicateBuilders then want to reuse them, here's an extension method:

/// <see>https://github.com/dotnet/efcore/issues/32092#issuecomment-2221633692</see>
/// <see>https://stackoverflow.com/questions/70744232/query-where-multiple-columns-have-to-match-a-value-set-simultaneously-in-ef-core/78732959#78732959</see>
public static IQueryable<TEntity> WhereOrContainsValues<TEntity, TToCompare>(
    this IQueryable<TEntity> queryable,
    IEnumerable<TToCompare> valuesToCompare,
    IEnumerable<Func<TToCompare, Expression<Func<TEntity, bool>>>> comparatorExpressionFactories) =>
    queryable.Where(valuesToCompare.Aggregate(
        LinqKit.PredicateBuilder.New<TEntity>(),
        (outerPredicate, valueToCompare) => outerPredicate.Or(
            comparatorExpressionFactories.Aggregate(
                LinqKit.PredicateBuilder.New<TEntity>(),
                (innerPredicate, expressionFactory) =>
                    innerPredicate.And(expressionFactory(valueToCompare))))));

Taking the example from #32092 (comment)

public class Order
{
     [Key]
     public long TenantId {get;set;}

    [Key]
     public long Id {get;set;}

    //other properties...
}

(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync(); //this doesn't work

it now would be:

var orders = await ctx.Orders.WhereOrContainsValues(orderIds,
[
    orderId => order => orderId.tenantId == order.TenantId,
    orderId => order => orderId.orderId == order.Id 
]).ToArrayAsync();

and translated to SQL like:

SELECT fields FROM Orders
WHERE (TenantId = $1 AND Id = $2)
   OR (TenantId = $3 AND Id = $4)
-- keep go on for each item in orderIds

n0099 added a commit to n0099/open-tbm that referenced this issue Jul 10, 2024
…mment) to simplify multiple usages of `LinqKit.PredicateBuilder` @ ExtensionMethods.cs

@ c#/crawler
@n0099
Copy link

n0099 commented Jul 10, 2024

Also try out a more generic one: https://stackoverflow.com/questions/67934374/ef-core-5-contains-with-several-columns/67935339#67935339
with some modifications:

/// <see>https://stackoverflow.com/questions/67666649/lambda-linq-with-contains-criteria-for-multiple-keywords/67666993#67666993</see>
public static IQueryable<T> FilterByItems<T, TItem>(
    this IQueryable<T> query,
    IEnumerable<TItem> items,
    Expression<Func<T, TItem, bool>> filterPattern,
    bool isOr = true)
{
    var predicate = items.Aggregate<TItem, Expression?>(null, (current, item) =>
    {
        var itemExpr = Expression.Constant(item);
        var itemCondition = FilterByItemsExpressionReplacer
            .Replace(filterPattern.Body, filterPattern.Parameters[1], itemExpr);

        return current == null
            ? itemCondition
            : Expression.MakeBinary(isOr ? ExpressionType.OrElse : ExpressionType.AndAlso,
                current,
                itemCondition);
    }) ?? Expression.Constant(false);
    var filterLambda = Expression.Lambda<Func<T, bool>>(predicate, filterPattern.Parameters[0]);

    return query.Where(filterLambda);
}

private sealed class FilterByItemsExpressionReplacer(IDictionary<Expression, Expression> replaceMap) : ExpressionVisitor
{
    private readonly IDictionary<Expression, Expression> _replaceMap
        = replaceMap ?? throw new ArgumentNullException(nameof(replaceMap));

    public static Expression Replace(Expression expr, Expression toReplace, Expression toExpr) =>
        new FilterByItemsExpressionReplacer(new Dictionary<Expression, Expression> {{toReplace, toExpr}})
            .Visit(expr);

    [return: NotNullIfNotNull(nameof(node))]
    public override Expression? Visit(Expression? node) =>
        node != null && _replaceMap.TryGetValue(node, out var replacement)
            ? replacement
            : base.Visit(node);
}
var orders = await ctx.Orders.FilterByItems(orderIds, (ids, o) => ids.Contains((o.TenantId, o.Id)).ToArrayAsync();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants