Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce freelist interface #773

Closed
tjungblu opened this issue Jun 24, 2024 · 8 comments · Fixed by #775
Closed

Introduce freelist interface #773

tjungblu opened this issue Jun 24, 2024 · 8 comments · Fixed by #775

Comments

@tjungblu
Copy link
Contributor

tjungblu commented Jun 24, 2024

bbolt features two types of freelist: "array" and "hashmap". They are currently implemented in one monolithic struct, with their differences in functions separated into two go files.

Testing has been a pain so far, with many of the freelist details (i.e.releasing pages and serde to disk) leaking outside of the freelist struct. This makes reasoning about the implementations very difficult, too. Adding a new type of list is becoming increasingly cumbersome without a proper interface.

Quickly pulling out the existing functions into an interface could look like this:

type Freelist interface {

	// Init initializes this freelist with the given list of pages. 
        // TODO currently is "readIDs", maybe should be called Read(page *common.Page)
	Init(ids common.Pgids) 

	// Allocate returns the starting page id of a contiguous block of a given size in number of pages.
	// If a contiguous block cannot be found then 0 is returned.
	Allocate(txid common.Txid, numPages int) common.Pgid

	// Count returns the number of free pages.
	Count() int

	// FreePageIds returns all free pages.
	FreePageIds() common.Pgids

	// MergeSpans merges the given pages into the existing freelist.
	MergeSpans(ids common.Pgids) // TODO should probably named "Release" or "Free"

	// Write writes the freelist into the given page. The freelist MUST fit into the given page.
	Write(page *common.Page) error
}

Ideally we would move this into its own internal package in order to avoid leaking any more internals in the future. HashMap and Array should be two implementation as its own struct. Some shared functions (ie read/write) could be implemented in a common struct for now. Even though I expect those to become implementation dependent at some point, it seems quite inefficient to write low-digit uint64 integers without vint compression (or as a bitset), for example.

@ahrtr
Copy link
Member

ahrtr commented Jun 24, 2024

High level makes sense. Eventually we should consider to deprecate the Array free list.

@tjungblu
Copy link
Contributor Author

tjungblu commented Jun 24, 2024

Admittedly, this interface was too idealistic for such an old codebase :) Turns out we're leaking implementation details everywhere and for some optimization also have introduced bad coupling. The tests are especially fun to decouple 🥇

Also the coupling of the transaction to the pages should not go into the freelist (imho) and we should have another mapping in the Tx that contains what pages have been allocated for it.

Below seems like a functional interface, but I think we would need to cut this down a bit further to be practical.

type Freelist interface {

	// Init initializes this freelist with the given list of pages.
	Init(ids common.Pgids)

	// Allocate returns the starting page id of a contiguous block of a given size in number of pages.
	// If a contiguous block cannot be found then 0 is returned.
	Allocate(txid common.Txid, numPages int) common.Pgid

	// Count returns the number of free pages.
	Count() int

	// PendingCount returns the number of pending pages.
	// TODO(thomas): implementation detail?
	PendingCount() int

	// FreePageIds returns all free pages.
	FreePageIds() common.Pgids

	// Release moves all page ids for a transaction id (or older) to the freelist.
	Release(txid common.Txid)

	// ReleaseRange moves pending pages allocated within an extent [begin,end] to the free list.
	ReleaseRange(begin, end common.Txid)

	// Free releases a page and its overflow for a given transaction id.
	// If the page is already free then a panic will occur.
	Free(txid common.Txid, p *common.Page)

	// Rollback removes the pages from a given pending tx.
	Rollback(txid common.Txid)

	// Freed returns whether a given page is in the free list.
	Freed(pgId common.Pgid) bool

	// Reload reads the freelist from a page and filters out pending items.
	// TODO(thomas): could be just Read?
	Reload(p *common.Page)

	// NoSyncReload reads the freelist from Pgids and filters out pending items.
	// TODO(thomas): could be just Init?
	NoSyncReload(Pgids []common.Pgid)

	/*
	 * SerDe Section
	 */

	// Read calls Init with the page ids stored in te given page.
	Read(page *common.Page)

	// EstimatedWritePageSize returns the size of the page after serialization in Write.
	// TODO(thomas): implementation detail?
	EstimatedWritePageSize() int

	// Write writes the freelist into the given page.
	Write(page *common.Page)
}

source (not everything is implemented/updated yet):
tjungblu@d3831cb

Do you want to continue through a PR? Happy to adjust and refactor the remaining pieces.

tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 25, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 25, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
@tjungblu
Copy link
Contributor Author

opened #775, let's focus on the interface first. I think we also need @ivanvc's benchmark for it :)

tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 25, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 25, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
@ahrtr
Copy link
Member

ahrtr commented Jun 26, 2024

I suggest to move all array related methods into a separate file freelist_array.go as the very first step, so as to simplify the freelist.go. Please feel free to deliver a PR for that, thx.

@ahrtr
Copy link
Member

ahrtr commented Jun 26, 2024

After that we can continue to discuss how to refactor the freelist management in your PR #775.

Also as your pointed out above, we need the benchmark (#750) workflow check ready beforehand. @ivanvc can you take that as a higher priority task? thx

@tjungblu
Copy link
Contributor Author

there you go @ahrtr
#777

@ivanvc
Copy link
Member

ivanvc commented Jun 26, 2024

@ivanvc can you take that as a higher priority task? thx

Yes, I was able to work on it yesterday. I'll try to have something by the EOD.

tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 27, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 27, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jun 27, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
@ahrtr
Copy link
Member

ahrtr commented Jul 1, 2024

As mentioned in #772 (comment), the motivation of the refactoring is to achieve better testability and understandability (and accordingly better maintainability).

Just submitted a PR #783, we still need to think more about it from test perspective. (test driven).

We may also need to draft a document to summarize the data structure and the algorithms based on it, and also how we will test it.

tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/bbolt that referenced this issue Jul 2, 2024
This introduces an interface for the freelist, splits it into two concrete
implementations.

fixes etcd-io#773

Signed-off-by: Thomas Jungblut <[email protected]>
@ahrtr ahrtr closed this as completed in 62e81c0 Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants