perama-v/min-know

Min-know

An implementation of the ERC-time-ordered-distributable-database (TODD) as a generic library. It can be used to make data TODD-compliant to facilitate peer-to-peer distribution.

Status: prototype

Why does this library exist?

To test out a new database design, where user participation makes the entire database more available.

Questions for you:

  • Do you have data that grows over time and that you would like users to host?
  • Are you providing data as a public good and wondering how to hand hosting over to the community?

Min-know makes data into an append-only structure that anyone can publish to. Distribution happens like a print publication, where users obtain Volumes as they are released. A user becomes a distributor too.

Volumes contain Chapters that can be obtained separately. This effectively divides the database, making large databases manageable for resource-constrained users.

Principles

📘🔍🐟

To make any database TODD-compliant so that data-users become data-providers.

TODD-compliance is about:

  1. Delivering a user the minimum knowledge that is useful to them.
  2. Delivering a user some extra data.
  3. Making it easy for a user to become a data provider for the next user.

A minnow is a small fish 🐟 that can be part of a larger collective.

End Users

Data is published in Volumes.

📘 - A Volume

Volumes are added over time:

📘 📘 📘 📘 📘 ... 📘 <--- 📘 - All Volumes (published so far).

Volumes have Chapters for specific content. Chapters can be obtained individually.

  • 📘 An example volume with 256 Chapters
    • 📕 0x00 First chapter (1st)
    • ...
    • ...
    • 📙 0xff Last Chapter (256th)

A Manifest 📜 exists that lists all Chapters for all Volumes. A manifest simply contains IPFS hashes for the data (see example manifests). A user can check the manifest to find which Chapter is right for them and ignore the IPFS hashes that don't match their needs.

📜🔍🐟

The user starts with something they know (a key), for example, an address. For every key, only one Chapter will be important.

  • User (🐟) key is an address: 0xf154...f00d.
  • Data is divided into Chapters using the first two hex characters of the address (Chapter = 0xf1)

Visually:

  • 📕 0x00
  • ...
  • ...
  • 📗 0xf1 <--- 🐟 0xf154...f00d (user only needs this Chapter)
  • ...
  • ...
  • 📙 0xff
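
The key-to-Chapter mapping above can be sketched in a few lines of Rust. This is a minimal illustration, assuming the Chapter is named by the first two hex characters of the address; `chapter_id_for_address` is a hypothetical helper, not a function from the Min-know API.

```rust
/// Returns the Chapter identifier (e.g. "0xf1") for a hex address key.
/// Hypothetical helper for illustration; not part of the Min-know API.
fn chapter_id_for_address(address: &str) -> Option<String> {
    // Accept addresses with or without the "0x" prefix.
    let hex = address.strip_prefix("0x").unwrap_or(address);
    // The first two hex characters select one of 256 Chapters.
    let prefix = hex.get(..2)?;
    if prefix.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(format!("0x{}", prefix.to_ascii_lowercase()))
    } else {
        // Not a hex key, so it cannot be assigned to a Chapter.
        None
    }
}
```

Calling `chapter_id_for_address("0xf154...f00d")` yields `Some("0xf1")`, matching the example above. Two hex characters give 16 × 16 = 256 possible Chapters, 0x00 through 0xff.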

For every published Volume, the user only downloads the right Chapter for their needs. The Min-know library automates this by using the CIDs in the manifest to find files on IPFS.
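
As a rough sketch of that lookup, the snippet below filters a manifest down to the CIDs for a single Chapter. The `ManifestEntry` struct, its field names, and `cids_for_chapter` are assumptions made for illustration; the real manifest schema and the library's internals may differ.

```rust
/// One manifest entry: the Volume and Chapter a file belongs to, plus its IPFS CID.
/// Hypothetical layout for illustration; the real manifest schema may differ.
#[derive(Debug, Clone, PartialEq)]
struct ManifestEntry {
    volume: String,
    chapter: String,
    cid: String,
}

/// Returns the CIDs for the given Chapter across every Volume in the manifest.
/// Entries for all other Chapters are ignored.
fn cids_for_chapter<'a>(manifest: &'a [ManifestEntry], chapter: &str) -> Vec<&'a str> {
    manifest
        .iter()
        .filter(|entry| entry.chapter == chapter)
        .map(|entry| entry.cid.as_str())
        .collect()
}
```

For a user whose key maps to Chapter 0xf1, `cids_for_chapter(&manifest, "0xf1")` returns one CID per published Volume. Those are the only files that user would need to fetch from IPFS.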

This means obtaining one Chapter from every Volume that has ever been published. Hence, the user 🐟 only needs 1/256th of the entire database.

Once downloaded, the Chapters can be queried for useful information that the database contains.

Optionally, they can also pin their Chapters to IPFS, which makes the data available from more sources.

Interface

Interaction with the library occurs through the Todd struct ([database::types::Todd]), using the following methods:

  • For users:
    • obtain_relevant_data()
    • check_completeness()
    • find()
  • For maintainers:
    • full_transformation()
    • extend()
    • repair_from_raw()
    • generate_manifest()
    • manifest()

Architecture

See ./ARCHITECTURE.md for how this library is structured.

Examples

All examples can be seen with the following command:

cargo run --example

See ./examples/README.md for more information.

Databases

See ./DATABASES.md for the different databases that have been implemented in this library.

Database Maintainers

The maintainer methods in the examples are used to create and extend a TODD-compliant database.

This requires having a local "raw" source, which will be different for every data type. The library will use the methods in the ./extraction module to convert the data.

For example:

  • The address-appearance-index is created and maintained by having locally available Unchained Index chunk files (produced by trueblocks-core, https://github.com/TrueBlocks/trueblocks-core). They are parsed and reorganised to form the TODD-compliant format.
  • The nametags database is created and maintained by having individual files (one per address) that contain JSON-encoded names and tags.
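
The reorganisation step can be pictured as grouping raw records by Chapter. The sketch below is only an illustration, assuming each raw record is an (address, payload) pair and that Chapters are keyed by the first two hex characters of the address. `group_into_chapters` is a hypothetical function, not one of the library's extraction methods, and the payload is treated as an opaque string.

```rust
use std::collections::BTreeMap;

/// Groups raw (address, payload) records by the first two hex characters
/// of the address. Hypothetical helper; not part of the Min-know API.
fn group_into_chapters(records: &[(&str, &str)]) -> BTreeMap<String, Vec<(String, String)>> {
    let mut chapters: BTreeMap<String, Vec<(String, String)>> = BTreeMap::new();
    for &(address, payload) in records {
        // Accept addresses with or without the "0x" prefix.
        let hex = address.strip_prefix("0x").unwrap_or(address);
        // Keys too short to assign to a Chapter are skipped.
        if let Some(prefix) = hex.get(..2) {
            chapters
                .entry(format!("0x{}", prefix.to_ascii_lowercase()))
                .or_default()
                .push((address.to_string(), payload.to_string()));
        }
    }
    chapters
}
```

A `BTreeMap` keeps the Chapters in sorted order (0x00 first, 0xff last), mirroring how Chapters are listed within a Volume. A real extraction would also validate the keys and serialise each group to its own file.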

Other raw formats might be flat files containing data of various kinds.

Extend the library for your data