BulkRead erroring with " An item with the same key has already been added" on one-to-many relation #733

Closed
Tyler-V opened this issue Jan 19, 2022 · 7 comments

Tyler-V commented Jan 19, 2022

Hi @borisdj

We are attempting to implement bulk services that read a parent table by Id, then read the child relation (one-to-many) by the joining/navigational property (the parent Id), and then assign those children to the parent to return the complete object.

The problem is that the child relation is one-to-many, so many child rows can reference the same parent entity (Package).

System.Private.CoreLib: Exception while executing function: Package-Reorder. System.Private.CoreLib: An item with the same key has already been added. Key: 19.

This is the error we are encountering. Is there a workaround other than querying for the ids of the child table with a Where/Contains and then materializing the child query?

            var packageAssignmentIds = await _packageReservationContext.PackageAssignment
                .Where(pa => packageIds.Contains(pa.PackageId))
                .Select(pa => pa.Id)
                .ToListAsync();

This is the code we are using that triggered the error:

        public async Task<List<Package>> GetPackagesAsync(List<int> packageIds)
        {
            var packages = packageIds.Select(id => new Package() { Id = id }).ToList();
            await _packageReservationContext.BulkReadAsync(packages, new BulkConfig
            {
                UpdateByProperties = new List<string> { nameof(Package.Id) },
                BatchSize = 10000
            });
            
            var packageAssignments = packageIds.Select(id => new PackageAssignment() { PackageId = id }).ToList();
            await _packageReservationContext.BulkReadAsync(packageAssignments, new BulkConfig // <----- ERROR
            {
                UpdateByProperties = new List<string> { nameof(PackageAssignment.PackageId) },
                BatchSize = 10000
            });

Entities below:

    public abstract class BaseEntity
    {
        [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
        [Key]
        [Column(Order = 0)]
        public int Id { get; set; }

        public string CreatedBy { get; set; }

        public string UpdatedBy { get; set; }

        public DateTimeOffset? DateCreated { get; set; }

        public DateTimeOffset? DateModified { get; set; }

        public string Notes { get; set; }
    }
    public class Package : BaseEntity
    {
        public int AccumulationGroupId { get; set; }
        public string NdcNumber { get; set; }
        public int? OrderDetailId { get; set; }
        public DateTimeOffset? ReplenishmentCompleteDate { get; set; }
        public OrderDetail OrderDetail { get; set; }
        public int? InvoiceDetailId { get; set; }
        public InvoiceDetail InvoiceDetail { get; set; }
        public int? WholesalerAccountId { get; set; }
        public PackageStatus Status { get; set; } = PackageStatus.Open;
        public List<PackageAssignment> PackageAssignments { get; set; } = new List<PackageAssignment>();
    }
    public class PackageAssignment : BaseEntity
    {
        public int PackageId { get; set; }
        public Package Package { get; set; }
        public int AccumulationFactId { get; set; }
        public int CaptureHistoryId { get; set; }
        public decimal PercentOfPackage { get; set; }
        public decimal Quantity { get; set; }
    }



borisdj commented Jan 20, 2022

Interesting use case, or rather problem.
BulkRead needs a unique column to merge by, so in this situation you need Contains.
However, a regular Contains will not work for very large lists, more than around 50K records in EF 6 (TimeOut exception).
The workaround, for lists of that size or more, is to create a custom temp table using BulkConfig.CustomDestinationTableName and raw SQL.
Example:

// create a temp table copy of the target table and make its columns nullable
TableInfo tableInfo = TableInfo.CreateInstance(context, typeof(Package), new List<Package>(), OperationType.Insert, null);
var db = context.Database;
db.ExecuteSqlRaw(SqlQueryBuilder.CreateTableCopy(tableInfo.FullTableName, tableInfo.FullTempTableName, tableInfo));
db.ExecuteSqlRaw(SqlQueryBuilder.AlterTableColumnsToNullable(tableInfo.FullTempTableName, tableInfo));
string destinationTableName = tableInfo.FullTempTableName.Replace("[", "").Replace("]", "");

// bulk insert the lookup values into the temp table
context.BulkInsert(packageAssignmentIds, new BulkConfig() { CustomDestinationTableName = destinationTableName });

// read all rows whose key is IN the temp table ('Items' is the DbSet being read), then drop the temp table
string pkColumn = nameof(Package.Id);
string joinColumn = nameof(PackageAssignment.PackageId);
var records = context.Items.FromSqlRaw(
    $"SELECT * FROM {tableInfo.FullTableName} " +
    $"WHERE {pkColumn} IN " +
    $"(SELECT {joinColumn} FROM {tableInfo.FullTempTableName})"
).AsNoTracking().ToList();
db.ExecuteSqlRaw(SqlQueryBuilder.DropTable(tableInfo.TempTableName, false));

PS
Take the latest v.6.3.1, since there was a bug with CustomDestinationTableName that is now fixed.

More related info: IN() list queries

borisdj closed this as completed Jan 25, 2022

Tyler-V commented Jan 26, 2022

Thank you @borisdj, we'll give it a shot.


Tyler-V commented May 3, 2022

@borisdj any plans to make this extensible and a permanent part of the library's API?

I found this library, which would be an improvement, but I think the approach of using a bulk-inserted temp table would yield better results than what he has graphed:

https://github.com/yv989c/BlazarTech.QueryableValues


borisdj commented May 3, 2022

Did not plan it, but can consider it.
Still, this use case is very specific and not easy to generalize, since in the broad case there could be more than one child entity type, sub-children at further depths, and even a composite PK made of several columns.

I guess what could be done, just for the simple case (support for only one type of child), is to encapsulate this in a method
using generics: BulkReadDual<Parent, Child>(entities)
or with 2 lists as arguments: BulkReadDual(entities, subEntities)

If you are up for a challenge, try to implement it and make a PR.
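
A rough sketch of what such a BulkReadDual-style helper could look like for the Package / PackageAssignment case from this issue (the name and shape are hypothetical, not library API; the child read here uses a plain Contains rather than a bulk read):

public static async Task<List<Package>> BulkReadDualAsync(DbContext context, List<int> packageIds)
{
    // 1. read the parents by their unique key with BulkRead
    var packages = packageIds.Select(id => new Package { Id = id }).ToList();
    await context.BulkReadAsync(packages, new BulkConfig
    {
        UpdateByProperties = new List<string> { nameof(Package.Id) }
    });

    // 2. read the children by the (non-unique) foreign key with a regular Contains,
    //    or with the temp-table approach above for very large lists
    var assignments = await context.Set<PackageAssignment>()
        .Where(pa => packageIds.Contains(pa.PackageId))
        .AsNoTracking()
        .ToListAsync();

    // 3. attach the children to their parents
    var byPackage = assignments.ToLookup(pa => pa.PackageId);
    foreach (var package in packages)
    {
        package.PackageAssignments = byPackage[package.Id].ToList();
    }
    return packages;
}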


Tyler-V commented May 5, 2022

@borisdj I will see if I can properly submit a PR. I'm not sure I have the internal knowledge of the library required, but this is the extension method we came up with that works quite well for our use case:

    public static class BulkExtensionsExtensions
    {
        public static async Task<List<T>> BulkReadWhereContainsAsync<T, TA>(this DbSet<T> dbSet, string propertyName, List<TA> values) where T : class
        {
            var context = dbSet.GetDbContext();

            // build stub entities carrying only the lookup value in the given property
            var entities = CreateEntities<T, TA>(propertyName, values);

            // create a temp table copy of the target table and bulk insert the stubs into it
            var tableInfo = TableInfo.CreateInstance(context, typeof(T), new List<T>(), OperationType.Insert, null);
            var db = context.Database;
            await db.ExecuteSqlRawAsync(SqlQueryBuilder.CreateTableCopy(tableInfo.FullTableName, tableInfo.FullTempTableName, tableInfo));
            await db.ExecuteSqlRawAsync(SqlQueryBuilder.AlterTableColumnsToNullable(tableInfo.FullTempTableName, tableInfo));
            var tempTableName = tableInfo.FullTempTableName.Replace("[", "").Replace("]", "");
            await context.BulkInsertAsync(entities, new BulkConfig() { CustomDestinationTableName = tempTableName });

            // read all rows whose property value is IN the temp table, then drop the temp table
            var readTableInfo = TableInfo.CreateInstance(context, typeof(T), new List<T>(), OperationType.Read, null);
            var records = await dbSet.FromSqlRaw(
                $"SELECT * FROM {readTableInfo.FullTableName} " +
                $"WHERE {propertyName} IN " +
                $"(SELECT {propertyName} FROM {tableInfo.FullTempTableName}) "
            ).AsNoTracking().ToListAsync();
            await db.ExecuteSqlRawAsync(SqlQueryBuilder.DropTable(tableInfo.TempTableName, false));
            return records;
        }

        private static DbContext GetDbContext<T>(this DbSet<T> dbSet) where T : class
        {
            var infrastructure = dbSet as IInfrastructure<IServiceProvider>;
            var serviceProvider = infrastructure.Instance;
            var currentDbContext = serviceProvider.GetService(typeof(ICurrentDbContext))
                as ICurrentDbContext;
            return currentDbContext?.Context;
        }

        private static List<T> CreateEntities<T, TA>(string propertyName, List<TA> values) where T : class
        {
            var entities = values
                .Select(v =>
                {
                    var entity = (T)Activator.CreateInstance(typeof(T));
                    entity?.GetType().GetProperty(propertyName)?.SetValue(entity, v, null);
                    return entity;
                })
                .ToList();
            return entities;
        }
    }

EFCore 6.0 Contains

                await _packageReservationContext.PackageAssignment
                    .Where(r => ids.Contains(r.PackageId))
                    .ToListAsync();

BlazarTech.QueryableValues

                await _packageReservationContext.PackageAssignment
                   .Where(r => _packageReservationContext
                       .AsQueryableValues(ids)
                       .Contains(r.PackageId))
                   .ToListAsync();

BulkReadWhereContainsAsync

                await _bulkService.BulkReadWhereContainsAsync(
                    _packageReservationContext.PackageAssignment,
                    nameof(PackageAssignment.PackageId),
                    ids);

Test 1
Scenario: find entities whose property value matches one of 50,000 values, returning a list of 150,000 entities
EFCore 6.0 Contains: 6250ms
BlazarTech.QueryableValues: 1418ms
BulkReadWhereContainsAsync: 1036ms

Test 2
Scenario: find entities whose property value matches one of 60,000 values, returning a list of 180,000 entities
EFCore 6.0 Contains: (Fails) Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.
BlazarTech.QueryableValues: 1546ms
BulkReadWhereContainsAsync: 1156ms

Test 3
Scenario: find entities whose property value matches one of 100,000 values, returning a list of 300,000 entities
EFCore 6.0 Contains: (Fails) Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.
BlazarTech.QueryableValues: 2481ms
BulkReadWhereContainsAsync: 1937ms


borisdj commented May 7, 2022

Will see if it can be added / integrated into the library.


borisdj commented May 25, 2023

This can now be done directly using config ReplaceReadEntities:

When set to True, the result of the BulkRead operation will be provided using replace instead of update. The Entities list parameter of the BulkRead method will be repopulated with the obtained data. This enables Contains/IN functionality: all entities matching the criteria are returned, not only the first (the match does not have to be by unique columns).
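
A minimal usage sketch for the child read from this issue, assuming the same entity and context names used above:

var packageAssignments = packageIds.Select(id => new PackageAssignment { PackageId = id }).ToList();
await _packageReservationContext.BulkReadAsync(packageAssignments, new BulkConfig
{
    UpdateByProperties = new List<string> { nameof(PackageAssignment.PackageId) },
    ReplaceReadEntities = true // the list is repopulated with all matching rows, not merged by a unique key
});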

borisdj closed this as completed May 25, 2023