Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(memorymanager) Replace the Memorylist used in the MemoryManager by a MemoryHashTable #1701

Merged
merged 22 commits into from
Jun 14, 2024

Conversation

Manangka
Copy link
Contributor

@Manangka Manangka commented Apr 3, 2024

This commit replaces the the List container in the the MemoryList by a HashTable. To aid in the way a HashTableworks the KeyValueList class has been introduced. This is the internal container used by the HashTable. This list differs from a regular list in that it also assigns a key to a node. This helps in retrieving the correct node from a list.

Due to the way a HashTable works the accompanying iterator don't have to return the values of the HashTable in the same order as in which they are added. This is different than a regular List that was used before. This resulted in some errors when deallocating memory. This has been solved by adding the key when deallocating the memory. For example:
call mem_deallocate(this%alh)
becomes
call mem_deallocate(this%alh, 'ALH', trim(this%memoryPath))

Tests have been performed on the new data container and show that a HashTable out performs when it comes to initializing an simulation as well as performing value lookup. The results can be found below.

The test simulation contains 105743 variables. A worst case scenario is tested where we try to obtain the list variable in the list.
Each simulation has been tested with a debug and release version of ModFlow. The values displayed below are the average values after running each test case 5 times

List (Debug) HashTable (Debug) List (Release) HashTable (Release)
Elapsed init time, avg (s) 492,1802948 155,4768408 328,6092749 94,18164263
Elapsed get time, avg (s) 0,044770432 0,00027034 0,025319958 0,00016108

This PR is still in draft waiting for #1765 to be merged

@Manangka Manangka marked this pull request as draft April 3, 2024 14:35
@Manangka Manangka force-pushed the MemoryManangerHashTable branch 2 times, most recently from a454d11 to b53ca71 Compare April 9, 2024 12:53
@emorway-usgs
Copy link
Contributor

Hello @Manangka, @mjr-deltares - some weeks ago I mentioned that I had a version of a FloPy script that attempted to generate a version of the "Barends" model that was a series of interconnected 1D models (setup in a manner similar to what's shown in the image).
multi-model

Overall, there were 2,002 models: 1 GWF and 1 GWE model representative of the saturated zone, and 1,000 GWF and 1,000 GWE 1-dimensional, vertically oriented, models representative of the "overburden". I eventually abandoned this approach in favor of a DISU approach; however, I'm attaching the script here in case it may be useful for your purposes. The large number of models within the simulation caused both FloPy and MF6 to run excessively slow. I never got to the point of running the model successfully (i.e., convergence), but as I recall the script would fill-out the simulation directory with the necessary model input (took over an hour as I recall) and then MF6 would attempt to run it, but just reading in the input took over an hour, as best I can recall.
test_gwe_thermal_bleed_2.txt

@Manangka
Copy link
Contributor Author

Manangka commented May 8, 2024

Hello @Manangka, @mjr-deltares - some weeks ago I mentioned that I had a version of a FloPy script that attempted to generate a version of the "Barends" model that was a series of interconnected 1D models (setup in a manner similar to what's shown in the image). multi-model

Overall, there were 2,002 models: 1 GWF and 1 GWE model representative of the saturated zone, and 1,000 GWF and 1,000 GWE 1-dimensional, vertically oriented, models representative of the "overburden". I eventually abandoned this approach in favor of a DISU approach; however, I'm attaching the script here in case it may be useful for your purposes. The large number of models within the simulation caused both FloPy and MF6 to run excessively slow. I never got to the point of running the model successfully (i.e., convergence), but as I recall the script would fill-out the simulation directory with the necessary model input (took over an hour as I recall) and then MF6 would attempt to run it, but just reading in the input took over an hour, as best I can recall. test_gwe_thermal_bleed_2.txt

@emorway-usgs Thank you for the example. I used it to compare this branch against the develop branch and the performance difference is quite impressive. I first used your script to produce the the simulation results then I measured how long it took MODFLOW to run it before encountering an error:

branch container time
develop list 1h33m41s
MemoryManagerHashTable hashtable 5m30s

# Conflicts:
#	autotest/TestListIterator.f90
#	autotest/TestMemoryContainerIterator.f90
#	autotest/meson.build
#	autotest/tester.f90
#	make/makefile
#	msvs/mf6lib.vfproj
#	src/Utilities/Iterator.f90
#	src/Utilities/ListIterator.f90
#	src/Utilities/Memory/MemoryContainerIterator.f90
#	src/Utilities/Memory/MemoryList.f90
#	utils/mf5to6/make/makefile
#	utils/mf5to6/msvs/mf5to6.vfproj
#	utils/mf5to6/pymake/extrafiles.txt
@Manangka Manangka marked this pull request as ready for review June 13, 2024 09:19
@Manangka Manangka changed the title feat(memorymanager) Replace the Memorylist used in the MemoryManager by a MemoryHasTable feat(memorymanager) Replace the Memorylist used in the MemoryManager by a MemoryHashTable Jun 13, 2024
@@ -1771,7 +1771,7 @@ subroutine gwf_gwf_da(this)
call mem_deallocate(this%condsat)
call mem_deallocate(this%idxglo)
call mem_deallocate(this%idxsymglo)
call mem_deallocate(this%simvals)
call mem_deallocate(this%simvals, 'SIMVALS', this%memoryPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me why these particular changes are now necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mem_deallocate method has 2 ways of finding the variable to deallocate. If a name and path is provided it first tries to find the variable using that. If it fails or if no name and path is provided it tries to find the variables by matching the pointer (e.g. the this%simvals with the pointer in a memorytype)

The second thing that is important is that there are 2 types of memory types , a master record and copies.

As mentioned before, when no name and path is provided it tries to find the variable by matching the pointer. When iterating through a regular list it will go through the items in the order they are added. As master items are added before copies this will be matched and returned.

However the iterator of a HashTable doesn't have the same ordering characteristics of a List iterator. This means that a copy can be hit before the master record. This caused an error because the master record is never deallocated.

A master record and a copy do have different names. Therefor by matching the memory types by name and path will result in the correct memory type to be deallocated

@Manangka
Copy link
Contributor Author

@mjr-deltares @langevin As discussed on the call I renamed the MemoryList to MemoryStore.
I will create a separate PR in which I will cleanup/rename all references to memorylist. The reason for a separate PR is because there are over 180 references to a memorylist and I don't want to pollute this PR.

Further we also talked about throwing a warning when an item is added to the MemoryStore with a name and path that has already been added (duplicate item). I thought I didn't add it but I was wrong. I added it to the HastTable class

@langevin-usgs
Copy link
Contributor

This is great, @Manangka. I'd like to get this in. There is an unrelated failure (I think), which I triggered to rerun. When it passes, I'll push the button to bring this in. Thanks! Really nice work.

@langevin-usgs langevin-usgs merged commit df1f389 into MODFLOW-USGS:develop Jun 14, 2024
18 checks passed
Manangka added a commit that referenced this pull request Jun 19, 2024
…rystore (#1876)

In PR #1701 the MemoryList type has been renamed to MemoryStore.
This commit contains some cleanup actions. It renames all references to the memorylist to memorystore.
It doesn't contain any functional changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants