
CIP-0073? | Oracle Datum Standard #357

Draft
wants to merge 7 commits into base: master

Conversation

codybutz

This CIP proposes a standard for Oracle Datum formats. The goal of this initial CIP is to provide the standard for KYC and Price feeds. However, this Oracle CIP is intended to be a living standard, allowing other forms of Datums to be available to be standardized. For example, this CIP could be extended to include standards for weather data, population data, etc.

Charli3 is finalizing an open-source PlutusTX library for this standard.

@codybutz marked this pull request as draft on October 20, 2022 18:32
@KtorZ changed the title from "CIP-XXXX | Oracle Datum Standard" to "Oracle Datum Standard" on Oct 21, 2022

peterVG commented Oct 24, 2022

This CIP is of obvious interest to the Orcfax oracle project as well. We support the rationale for an oracle datum standard and would like to make it compatible with our Cardano Open Oracle Protocol, which is a slightly broader, opinionated design for requesting and consuming oracle data on Cardano. We have just learned of this CIP draft update and will need time to schedule our own review before we can contribute any meaningful input, if that is useful/desirable.

Comment on lines 31 to 32
, ? 1 : posixtime ; unix timestamp related to when the price data was created
, ? 2 : posixtime ; unix timestamp related to when the price data is expired


Consider using a single time to denote when the value was 'observed'. In FRP terminology, price is a function over 'continuous' time, and representing it that way simplifies working with it. So one could just have a field ? 1 : posixtime ; unix timestamp of when the price was observed.

Any script running onchain can make their own decision on how much 'interval' to put around an observation by simply saying contains someIntervalAroundObservation observedAt
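
For illustration, a minimal on-chain sketch of that check (assuming plutus-ledger-api's Interval API; the function name and the tolerance parameter are illustrative, not part of this CIP):

import Plutus.V1.Ledger.Interval (Interval, contains, interval)
import Plutus.V1.Ledger.Time     (POSIXTime)

-- | Accept an observation if the transaction's validity range sits inside the
-- window the consumer is willing to place around 'observedAt'.
freshEnough :: POSIXTime -> POSIXTime -> Interval POSIXTime -> Bool
freshEnough tolerance observedAt txValidityRange =
  interval (observedAt - tolerance) (observedAt + tolerance) `contains` txValidityRange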

{ ? 0: oracle_provider_id
, ? 1: data_source_count
, ? 2: data_signatories_count
, ? 3: oracle_provider_signature


Not sure what this signature will represent? The oracle signs some X, what is X?

@bladyjoker

Hey @codybutz, nice work! I'm glad that structured data representation is being discussed, as it directly relates to the project we're running funded by Catalyst (https://cardano.ideascale.com/c/idea/421376).

Allow me to paint a different picture of how this could work, one that wouldn't require any adoption at the Cardano level via a CIP; CBOR/CDDL wouldn't even have to be mentioned. It would all rely on a standardized 'PlutusData' encoding for any type specified in a configuration file.

Perhaps it's useful to start by listing related technologies in this space and the engineering principles they embody:

  1. Google Protobuf
  2. Apache Thrift
  3. ASN.1
  4. And indeed CDDL

These technologies enable a very important separation of concerns between what's called an 'Abstract syntax' (i.e. your types) and a 'Concrete syntax' (i.e. values of your types, encodings).
CDDL is of course one such technology, but it is in many ways inappropriate due to its complexity and very limited tooling support; it is also highly coupled with its 'Concrete syntax', which is CBOR.

When specifying 'shared types' we must consider the broader contexts in which these types are going to be shared: for instance, backend-frontend communication, storage, databases and, of course, building Cardano transactions. Each of these settings, if not accounted for, will require a lot of manual effort, and this proposal imo could have nicer gains if a different route altogether is considered.

In our project we'd aim to support definitions of types much like you're used to writing in Haskell...

oracle.lambuff

-- | NOTE: Not Haskell! But an IDL language inspired by Haskell
module Cardano.Oracle where

import Plutus (AssetClass, POSIXTime)

-- | OracleDatum is a generic Oracle envelope
data OracleDatum a = OracleDatum {
  provenance :: OracleProvenance,
  datum :: a
}

-- | Provenance information provided with each datum
data OracleProvenance = OracleProvenance {
  providerId :: Nat,
  dataSourceCount :: Nat,
  dataSignatoriesCount :: Nat,
  oracleSignature :: Bytes
}

data Price = Price {
  ratio :: Decimal,
  at :: POSIXTime,
  baseAsset :: AssetClass,
  quoteAsset  :: AssetClass
}

type PriceOracleDatum = OracleDatum Price
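
For a sense of what such codegen output could look like on the Haskell side, here is a hand-written sketch of a standardized PlutusData encoding (purely illustrative, not the project's actual output; module names follow plutus-ledger-api/plutus-tx and may differ by version, and representing Decimal as an integer ratio is an assumption):

{-# LANGUAGE TemplateHaskell #-}
-- Hypothetical generated module; written by hand here for illustration only.
module Cardano.Oracle.Generated where

import qualified PlutusTx
import Plutus.V1.Ledger.Time  (POSIXTime)
import Plutus.V1.Ledger.Value (AssetClass)
import PlutusTx.Builtins      (BuiltinByteString)

data OracleProvenance = OracleProvenance
  { providerId           :: Integer
  , dataSourceCount      :: Integer
  , dataSignatoriesCount :: Integer
  , oracleSignature      :: BuiltinByteString
  }

data Price = Price
  { ratioNumerator   :: Integer   -- Decimal modelled as a ratio here (an assumption)
  , ratioDenominator :: Integer
  , at               :: POSIXTime
  , baseAsset        :: AssetClass
  , quoteAsset       :: AssetClass
  }

data OracleDatum a = OracleDatum
  { provenance :: OracleProvenance
  , datum      :: a
  }

type PriceOracleDatum = OracleDatum Price

-- Fixed constructor indices keep the PlutusData encoding stable across languages.
PlutusTx.makeIsDataIndexed ''OracleProvenance [('OracleProvenance, 0)]
PlutusTx.makeIsDataIndexed ''Price            [('Price, 0)]
PlutusTx.makeIsDataIndexed ''OracleDatum      [('OracleDatum, 0)]

Pinning the constructor indices is what would let Purescript, Haskell and on-chain code reproduce byte-identical encodings from the same schema.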

Now specifying types is of course only one part of the story; the second part is where the actual value comes in:

  1. Codegen Purescript and Haskell type libraries with runtime support for standardized PlutusData and JSON encodings
  2. Codegen tailored for common PAB backends such as cardano-transaction-lib, and common Plutus frontends such as PlutusTx and Plutarch
  3. Backwards compatibility management
  4. Specified and thoroughly tested cross-language compatibility

What this would mean, in short, is that Charli3 and Orcfax could agree on an 'Oracle schema' they publish as a configuration file in a repository, and would have to provide zero code. After that, users can leverage our tooling to have these types communicable in any of the aforementioned scenarios (transaction building, Plutus programming, databases and storage, analytics, frontend-backend communication).

Apologies for the huge rant, I just wanted to take the opportunity to advertise our project and illustrate its goals.

@KtorZ changed the title from "Oracle Datum Standard" to "CIP-???? | Oracle Datum Standard" on Oct 25, 2022
Contributor

@GeorgeFlerovsky left a comment


I agree with the motivations of this CIP, and think that oracle projects should do more over the coming year to improve the experience for projects that consume multi-provider oracle data.

My main objection to the CIP is that using integer CBOR tags inevitably leads to global/central registries, which could become centralised gatekeepers. There are alternative approaches to coordinating the consumer-provider interaction that would not lead to this centralisation -- in particular, tagging content with a schema hash would avoid the need to look at a global registry. (See my specific review comment)

More narrowly, this CIP standard is unfortunately not compatible with the datum structure used by Cardano Open Oracle Protocol (COOP) and Orcfax. In the medium-longer term, I'm sure we can find a way to reconcile them, but it's not feasible in the short term (i.e. Indigo launch / COOP release).

In general, I think that this CIP may be too ambitious right now. It would probably be more beneficial to have more domain-specific standards for several particular use cases, and then see the unifying themes / best practices emerge over the course of the following year.

For now, I would recommend that Indigo proceed with what it's already committed to, and perhaps even to simplify the CIP to remove the generalisation features, as I suspect this and other standards will be rather transient over the course of the coming year. Particularly, I would hold off on launching the global Cardano tag registry and oracle provider registry.

Comment on lines 38 to 46
### Data Format

The basis for this standard is the Concise Binary Object Representation (CBOR) data format, for which more information can be found here: [https://www.rfc-editor.org/rfc/rfc8949.html](https://www.rfc-editor.org/rfc/rfc8949.html)

A central aspect of the CBOR standard that is used extensively is the concept of a Tag, which allows data consumers to identify and consume the types of data that they expect, and gracefully ignore other pieces of data that they are not interested in.

Similar to the CBOR standard this CIP will also propose a CBOR Tags Registry. Data providers are able to provide data according to the formats provided in this registry, and data consumers can discern the different available formats to pick the exact kind of data they need out of a feed.

To maintain compatibility with the main CBOR Tags Registry, the tags proposed here are defined as an offset from the Base Oracle Datum CIP Tag Number, which is defined later in this document. For example, Tag +5 would correspond to the actual number OracleDatumCipTagNumber + 5.
Contributor


Using an integer to identify content is inherently centralising, because it implies that there is a global central data structure into which the integer indexes.

By contrast, tagging a data object with metadata about the hash of a schema does not require such global info -- as long as the consumer has the schema definition that corresponds to the schema hash, he can interpret any data that the provider gives the consumer under that schema, or else reject it as invalid data (i.e. incompatible with the schema).

An opt-in global cache of schema definitions can help consumers discover schemas that they want to support, but crucially, membership of a schema in that cache is not required for a consumer to use it. Any consumer can unilaterally declare that they support a particular schema, and then any providers can choose to provide data under that schema to that consumer, without interacting with any other third parties.
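
To make that concrete, a minimal sketch of what a schema-addressed envelope could look like (all names here are illustrative and not taken from COOP):

import PlutusTx.Builtins (BuiltinByteString)

-- | A datum envelope identified by the hash of its schema rather than by a
-- registry-assigned integer tag.
data SchemaTaggedDatum = SchemaTaggedDatum
  { dataSchemaHash :: BuiltinByteString  -- e.g. a blake2b-256 hash of the schema definition
  , dataObject     :: BuiltinByteString  -- the data object encoded under that schema
  }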



Are you talking about transaction metadata, or another kind of metadata? It is my understanding that transaction metadata is not available in any trusted way to the on-chain validators, so they cannot use it to parse the data in a trusted way.

In my view having data be verifiable on-chain by a validator is a key feature of any useful oracle datum, which is why this pre-defined schema approach was chosen.

Contributor


No, I'm not talking about transaction metadata here. I'm talking about adding something like a "data_schema_hash" field to the data object that gets put into the oracle utxo datum, which identifies the schema that corresponds to the data object.

The consumer of the data object is able to properly interpret the data object as long as he is aware of a data schema definition that hashes to that data schema hash. The consumer can obtain such data schema definitions from any convenient source, and has full confidence that a data schema definition obtained from any source will correspond to the data object of interest, as long as the hashes match.
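
Off-chain, the consumer-side check could be as simple as the following sketch (assuming the cryptonite library and blake2b-256; both choices are assumptions here, not part of any proposal):

import Crypto.Hash (Blake2b_256 (..), hashWith)
import Data.ByteArray (convert)
import qualified Data.ByteString as BS

-- | A schema definition obtained from any source can be trusted for the data
-- object as long as its hash matches the hash carried in the datum.
schemaMatches :: BS.ByteString -> BS.ByteString -> Bool
schemaMatches schemaDefinition expectedHash =
  convert (hashWith Blake2b_256 schemaDefinition) == expectedHash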



It seems what you're proposing here is an entirely different approach, not sharing very much, if anything, with the proposal given here. While that approach may have advantages and disadvantages, I'm not sure this proposal is the correct venue to get into that discussion. Is there a CIP proposal or forum write-up of this somewhere where that discussion could be had?

Comment on lines 58 to 62
### CIP CBOR Tags Registry

As additional use cases and data fields become necessary, this Tags Registry should be amended and updated to add new Tags and fields inside specific tags. Care should be taken to not change or remove existing Tags and fields inside specific tags, so as to maintain backwards compatibility to as great an extent as possible.

Tags are optional; data providers may provide multiple different tags covering the same data. The standard permits oracle feed providers to be flexible about what data they provide, while adhering to the standard. This allows feed consumers to programmatically discover what information is on the feed, while ignoring information that is not relevant to their use case.
Contributor


I think that a CBOR Tags Registry would be a centralising influence on Cardano. An oracle provider should not have to apply to the registry maintainers to register a new tag, every time that the oracle provider wants to publish a new data series.

CBOR integer tags make it impossible to coordinate the consumer/provider consensus on what a particular integer means, without both referring to the global central registry. Whereas, under schema-addressing, all they need is a schema definition to agree that they will use the corresponding schema hash. A global cache of schemas then becomes a useful place to discover/obtain schema definitions, but not strictly necessary for consumers to interact with providers.


@ipo Feb 10, 2023


Such a tag registry could probably be created in a decentralized way as well, perhaps governed by an on-chain DAO, some sort of staking-to-add-entry, or even outright paying a nominal ADA amount to add entries, which could then be forwarded to a charity or something to prevent a profit motive.

I'm not sure the benefit of decentralizing this aspect would be worth the squeeze. Additionally I think that if someone tried to unfairly assert control through the tag registry, the open nature of the standard would allow it to be forked under more democratically-minded leadership, ultimately.

There are 3 major categories of Tags, described in further detail later on: SharedData, GenericData and ExtendedData. In the list the ordering is defined as [ ? SharedData, 1* GenericData, ? ExtendedData ]. Or in plain English, a SharedData may optionally come first, then 1 or more GenericData in no particular order, and finally an ExtendedData may optionally come last.


### CIP CBOR Tags Registry
Contributor


Unfortunately, the current datum structure used in Cardano Open Oracle Protocol (COOP) and in Orcfax is incompatible with this CBOR-tag-based approach, and we are not able to support it in the short term.

Theoretically, in the medium-longer term, it should be possible to adapt COOP and/or Orcfax to support oracle providers that publish data under this CIP standard, or to provide adapters to convert between the two approaches.


@ipo Dec 5, 2022


Orcfax were invited to participate in the process, and we were very open, but there didn't seem to be a great deal of interest at the time; as far as I am aware, they have not participated at any point.

Contributor

@GeorgeFlerovsky Jan 13, 2023


I'm not a representative of Orcfax. As an MLabs dev/manager, I collaborated with Orcfax on a (now concluded) Catalyst-funded project to implement the Cardano Open Oracle Protocol (COOP).

It is unfortunate that there wasn't more active discussion earlier in your CIP process to potentially converge on a single standard. My comment here merely states the fact that your CIP is incompatible with COOP because you identify oracle providers via integer ID pointers to an external registry, whereas COOP identifies oracle providers via the minting policy ID of the tokens the oracle uses to authenticate its published data. The two approaches are fundamentally not compatible (unless you allow oracle providers to be identified by either an integer ID or a minting policy ID).

Contributor


The two approaches can certainly co-exist in the Cardano ecosystem, and I'm curious to see which one will be more beneficial in practice.

Contributor


Please believe that I am making my comments in good faith and hoping that they are constructive for you. 🙏

Comment on lines 17 to 25
## Motivation

Oracles provide a way for real-world data to be provided to a blockchain or smart contract, allowing interaction with external data sources. Oracles query, verify, and authenticate external data and then relay it to the blockchain. Oracles can provide multiple types of data feeds, for example: asset spot prices, access to indexes, and statistical data for a particular blockchain.

Oracles act as data providers to the blockchain. As such, it makes sense for the data feeds to be standardized so that they can be reused by multiple different projects, reducing fee overhead and decreasing feed setup overhead.

Data consumers, who are reading the price data from an oracle, want to be able to support multiple data feeds from different providers for fall-back mechanisms and additional decentralization. If each oracle solution is providing a different datum/metadata structure, these particular data structures have to be supported by the data consumers' smart contracts and/or applications. By having a standard data structure for each particular data feed type, as a data consumer I can write my smart contracts/application to support that standard and expect the oracle to support that particular standard.

Data consumers that share a common format for the data feeds they consume may also be able to collaborate on better-secured and more frequently updated shared data feeds than they would otherwise be able to accomplish by each using their own standard.
Contributor


100% agree with this motivation. Oracle projects so far have mostly taken a provider-centric perspective, perhaps neglecting some of these issues that come up for consumers seeking to use info from multiple providers.

Consumers and oracle providers need to meet in the middle with standards that work for both sides to publish and use data.

Comment on lines 31 to 36
Such a standard should accomplish a number of somewhat contradictory objectives:

* It should be possible to alter the data provided by the feed over time, adding more detailed data or removing parts of the data, without major disruption to existing users of the data feed
* Data provided in the feed should be stored in an efficient manner, to avoid data duplication and excessive transaction costs
* The standard should be flexible enough to capture the majority of use-cases, even those that may not yet be conceived of at the writing of this document
* The standard should provide room for individual data providers to add their own data fields to a feed, without interfering with the common standard
Contributor


I agree with the first two objectives.

I think that it is, to some extent, premature to attempt a standard as general/flexible as suggested by the third objective, as oracle providers are realistically only just starting to prototype the provision of data for various domains. I don't think we're in a position yet to develop a grand unified data standard for oracles on Cardano.

I agree with the fourth objective, but I think it must mean that the CIP itself cannot prescribe domain-specific data standards (e.g. data types in specific CBOR tags, or specific json schemas to be used for price feeds, etc). It must defer to schemas defined outside the CIP standard, though it can prescribe a standard way of referring to those schemas.



We will always know more in the future than we do now about how the standards are shaping up, but Charli3 (which I represent, in the interest of full disclosure) has already been running free public feeds for several months. It is in the Cardano community's interest that these feeds are provided in as broadly useful and standardized a way as possible, which is why we discussed this standard for a long time prior to submitting this proposal, and are now requesting input from everyone else as to what is good and what needs improvement.

The sooner we begin this process, the sooner we can start to move towards something that is good. I don't imagine this standard will be perfect from the first draft and I think we should all be open to it evolving over time. Ideally in such a way that it remains backwards-compatible, that would be my biggest wish for the future.

; but indices below 100 are "reserved" by the spec.
extended_data = #6.122(extended_map)
extended_map =
{ ? 0: oracle_provider_id
Contributor


I think that a global Cardano-wide registry of oracle providers is too centralised. An oracle provider should not have to apply to some registry maintainers to obtain the right to provide data to the Cardano ecosystem.

It's fine for individual consumer dApps to whitelist the providers that they individually accept (as they would already do under this proposal, too), but providers should not have to apply to a global registry beforehand.

I think that oracle providers should be identified by the currency symbol / asset class / pubkey (method still to be standardised across the oracle industry) that they provide to consumers to verify their fact statements.



If there is no registry of which IDs represent which providers, but dApp consumers are "fine to whitelist the providers", how would the dApp consumers know which IDs represent which providers, in such a way that collisions don't happen, without a registry?

I think you are reading too much "control" into a registry; it's not there to accept and deny, it's there to keep obvious spam out and prevent collisions.

Contributor

@GeorgeFlerovsky Jan 13, 2023


Suppose that there is a sports betting protocol and an oracle provider for horse-racing results. The sports betting protocol wants to whitelist the oracle provider.

Under your proposal, the oracle provider would have to apply to the Cardano-wide oracle registry to obtain an integer ID, so that the sports betting protocol could whitelist that oracle provider using its integer ID.


Alternatively, if the oracle were instead identified by either an oracle signing key or a minting policy ID, then the sports betting protocol could whitelist it simply by referring to that signing key or minting policy ID, without waiting for that oracle to be added to some registry.

A registry could still be useful in that case for dApps to discover potential oracle providers that they could whitelist, but dApps would also be free to whitelist any oracle providers outside the registry if they so choose.
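
A minimal sketch of that whitelisting approach (the function and parameter names are illustrative; an on-chain version would use the PlutusTx prelude equivalents):

import Plutus.V1.Ledger.Value (CurrencySymbol, Value, symbols)

-- | Accept an oracle output only if it carries a token minted under a currency
-- symbol (minting policy ID) that the dApp has chosen to trust.
fromTrustedOracle :: [CurrencySymbol] -> Value -> Bool
fromTrustedOracle whitelist oracleOutputValue =
  any (`elem` whitelist) (symbols oracleOutputValue)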



If this registry were decentralized to an on-chain registry, as I touched on in a previous comment, this issue could perhaps be avoided. The sports betting protocol could then simply reserve their ID in advance and go ahead and implement without asking for permission, while at the same time dealing with a pull request to the standard.

That way, if there was a stall in the CIP process, the providers and any consumers could still go ahead and use their ID with full confidence that it will be 'theirs'. The standards process might still leave the final standardization for that ID different from what the premature implementations used, but that's just a problem inherent to any consensus-driven standards process and not really related to the ID registry.

Comment on lines 64 to 65
, ? 1: data_source_count
, ? 2: data_signatories_count
Contributor


Simply providing the number of data sources and the number of data provider nodes as integers is insufficient. Much more info is needed to properly represent the oracle pool consensus and the data triangulation from sources, for it to be useful for a consumer to assess the centralisation risk of the data point.


@ipo Dec 5, 2022


As per the specification, the part you are now criticizing is individually customized by oracle providers to provide data they deem relevant, allowing provider-specific extensions. This is the part where each data provider would be putting things specific to their implementation, so a provider with more specific data source metrics could add detailed fields specific to their application.

Do you have suggestions for generally applicable additions that are missing?

Contributor


Oh, I see. So that part of your CDDL is just an example of how an oracle could specify the relevant properties of how its consensus was reached on the data point?

I think my point still stands that the particular example (data source count and data signatories count) is insufficient, but it's not a blocker for your CIP if your CIP isn't mandating anything specific for oracle providers to put in there.



Yes, that's correct. Our expectation would be that each oracle provider would be able to (through some reasonable process) simply declare "these are our custom properties" and it would basically be merged outright.

It would probably be subject to a low bar such as "being used in practice somewhere" just to keep things relevant, but the idea is to give each provider free rein here.

The underlying idea is that this would allow oracle providers to do their own thing in areas where they disagree with the standard, while still being able to conform to the standard in the areas where they do agree. It acts like a safety release valve for disagreements around the standard, so that implementers are minimally constrained while still allowing maximum format sharing where consensus exists.


## Rationale

This CIP is ready to be made active; since it is a process standard, it requires no implementation.
Member

@KtorZ Nov 8, 2022


Actually, it does require projects adopting the standard. This is usually a good requirement for the Path to Active section: have several projects adopting the standard specified in the document. If there are already some known projects that intend to, they can also explicitly be listed here.

Note also that this covers the Path to Active section, but not the Rationale section. The Rationale is about providing justifications for the various design decisions made in the proposed document. For example, it can show how the solution compares to the state of the art in other blockchain ecosystems, or explain the rationale behind the choice of a particular technology. The rationale may also explain why some solutions that could have been considered were rejected.



I think that is a fair expectation. As has been mentioned before, Charli3 is currently implementing this standard in publicly accessible, partially Catalyst-funded community feeds, which we hope can help wider adoption of the standard.

@KtorZ changed the title from "CIP-???? | Oracle Datum Standard" to "CIP-0073? | Oracle Datum Standard" on Nov 8, 2022
Collaborator

rphair commented Dec 6, 2022

@ipo thanks for edits... I've resolved conversations relating to preamble format.

@KtorZ added the "Category: Metadata" label (Proposals belonging to the 'Metadata' category) on Mar 18, 2023

7 participants