Next generation jams #208

Open
bmcfee opened this issue Apr 22, 2020 · 6 comments

bmcfee commented Apr 22, 2020

This issue is intended to consolidate many of the long-standing issues and offline discussions we've had around revising the jams specification for a variety of applications and use cases.


Goals of revising the schema

  1. Migrate to a fully json-schema compliant spec (#178: "RFC: more rigid, but simpler schema validation") instead of hybrid / dynamic namespaces.
  2. Add versioning to the schema definitions. This way, old files can still validate according to their specified jams version. This in turn makes it easier to evolve the schema without breaking compatibility.
  3. Simplify (and accelerate) the validation code from the python side.

Revision phase 1: full json-schema definition

The first step is to move all namespace definitions into full jsonschema definitions. Under this proposal, a namespace definition becomes a secondary schema for the Annotation object.

Annotation objects must validate against both the template schema (our current annotation schema definition) and exactly one of the pre-defined namespace schemas. Each namespace schema defines an exact match on the Annotation.namespace field, in addition to whatever constraints are placed on the value and confidence fields.
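A minimal sketch of what this composition could look like, using the Python jsonschema package. The template and the tag_open namespace schema below are illustrative stand-ins, not the final definitions:

import jsonschema

# Illustrative template schema: the structure shared by all Annotations.
template = {
    "type": "object",
    "properties": {
        "namespace": {"type": "string"},
        "data": {"type": "array"},
    },
    "required": ["namespace", "data"],
}

# Illustrative namespace schema: pins the namespace field to an exact
# value and constrains the observation value/confidence fields.
tag_open = {
    "properties": {
        "namespace": {"const": "tag_open"},
        "data": {
            "items": {
                "properties": {
                    "value": {"type": "string"},
                    "confidence": {"type": ["number", "null"]},
                }
            }
        },
    }
}

# An Annotation must satisfy the template AND exactly one namespace schema.
annotation_schema = {"allOf": [template, {"oneOf": [tag_open]}]}

ann = {"namespace": "tag_open",
       "data": [{"value": "guitar", "confidence": None}]}
jsonschema.validate(ann, annotation_schema)  # raises ValidationError on failure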

The is_sparse flag will be removed, as this is not part of jsonschema. (We'll come back to this later).

This phase will complete #178.

Revision phase 2: hosted and versioned schema

Completing phase 1 will result in a fully json-schema compatible implementation of our specification, against which all current JAMS files should validate.

The next step (phase 2) is to place this schema under version control and host it remotely (e.g. `jams.github.io/schema/v0.3/schema.json` or similar). We can then revise the schema to include a version number in its definition, so that jams files can self-identify which version they are valid under.

With the remote schema implementation, it should be straightforward to promote all jams definitions to top-level objects, so that an Annotation or FileMetadata object can be validated independently, without belonging to a full JAMS file.
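For illustration, here is how a hosted, versioned schema with top-level definitions might be used; the URL is just the placeholder from above, and all field names are assumptions rather than final definitions:

import jsonschema

# Hypothetical hosted schema that self-identifies its version via $id.
jams_schema = {
    "$id": "https://jams.github.io/schema/v0.3/schema.json",
    "definitions": {
        "FileMetadata": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "duration": {"type": "number"},
            },
        },
        # ... Annotation, Curator, JAMS, etc. would live here as well
    },
}

# With every definition promoted to a top-level object, a bare
# FileMetadata can be validated on its own, outside a full JAMS file:
metadata = {"title": "My Song", "duration": 180.0}
jsonschema.validate(metadata, jams_schema["definitions"]["FileMetadata"])

# A JAMS file would then self-identify the version it validates under:
# {"jams_version": "0.3", "file_metadata": {...}, "annotations": [...]}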

This phase will complete #86 and facilitate #40 by allowing partial storage.

Revision phase 3: extending the Annotation class

As mentioned in #24 , the current annotation structure might be a bit too rigid for more general media objects. @justinsalamon and I discussed this offline, and arrived at the following proposal:

  • Rename Annotation def to IntervalAnnotation, in which observations are (time, duration, value, confidence) tuples
  • Add new annotation types
    • StaticAnnotation: just (value, confidence)
    • BoundingBoxAnnotation: (x, y, width, height, value, confidence)
    • TimeBoundingBoxAnnotation: (time, x, y, duration, width, height, value, confidence)
    • possibly others: polygons, instantaneous samples, etc...
  • Annotation validation now becomes and(oneOf([Interval, Static, BoundingBox, ...]), oneOf([namespaces])) (see the sketch below this list)

This provides maximal flexibility in combining different annotation contents (tags, etc.) with annotation extents (time intervals, bounding boxes, etc.). Including a StaticAnnotation type also provides a way to resolve #206.
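A rough sketch of that combinator in jsonschema terms, shown at the level of a single observation for brevity; every extent and namespace schema here is an illustrative placeholder:

import jsonschema

# Illustrative extent schemas, one per proposed annotation type.
interval = {"type": "object",
            "required": ["time", "duration", "value", "confidence"]}
bounding_box = {"type": "object",
                "required": ["x", "y", "width", "height",
                             "value", "confidence"]}
static = {"type": "object",
          "required": ["value", "confidence"],
          "properties": {"value": {}, "confidence": {}},
          "additionalProperties": False}

# Illustrative namespace schemas constraining the value field.
tag_namespace = {"properties": {"value": {"type": "string"}}}
pitch_namespace = {"properties": {"value": {"type": "number"}}}

# Validation becomes and(oneOf([extents]), oneOf([namespaces])).
observation_schema = {
    "allOf": [
        {"oneOf": [interval, bounding_box, static]},
        {"oneOf": [tag_namespace, pitch_namespace]},
    ]
}

obs = {"time": 0.0, "duration": 3.5, "value": "guitar", "confidence": 0.8}
jsonschema.validate(obs, observation_schema)  # one extent, one namespace match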

Phase 3 completes the proposed changes to the schema.

Alongside the schema changes, we also want to generalize some things about the python implementation. Notably, it would be good to extend the search function to also support annotation contents. This way, we could find and excerpt annotations by value (e.g. time intervals labeled as "guitar", or bounding boxes containing "face"). This isn't a huge change from what the search function already does, but it will take a bit more implementation work.
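As a rough illustration (search_contents is a hypothetical helper, not part of the current jams API):

import re

def search_contents(annotation, value_pattern):
    """Yield observations whose value matches a regular expression.

    Hypothetical sketch: assumes observations expose a .value field,
    as sparse observations currently do.
    """
    for obs in annotation.data:
        if re.search(value_pattern, str(obs.value)):
            yield obs

# e.g., excerpt all intervals labeled as guitar:
# guitar_spans = [(o.time, o.duration) for o in search_contents(ann, "guitar")]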

bmcfee added the question, schema, management, and API change labels on Apr 22, 2020

bmcfee commented Oct 14, 2020

Had a chat with @rabitt about some of this at ISMIR, and she pointed out that we currently have a bit of a blind spot when it comes to annotations of symbolic data. Concretely, objects like a score or a midi file may not have a fixed "duration" (in seconds), but may have similar extent specifications in terms of beats or ticks.

This seems soluble in the proposed framework by introducing extent types for symbolic data. We may need to wiggle a bit on the top-level schema (JAMS object) to make this work, but I think it would be worth doing in the long run.
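For example, a hypothetical tick-based extent might look like the following; the field names and unit declaration are purely illustrative:

# Hypothetical observation for symbolic data, measured in MIDI ticks
# rather than seconds.
tick_observation = {
    "time": 480,       # onset, in ticks
    "duration": 960,   # extent, in ticks
    "value": "C4",
    "confidence": None,
}
# The enclosing annotation would also need to declare the resolution,
# e.g. ticks per quarter note, for the extent to be interpretable.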

@urinieto

Nice, I like this idea. ISMIR always sparking great conversations! <3

@justinsalamon

Related: following our music source separation tutorial where we use Scaper (which relies on JAMS) to generate training data, people were asking if it would be possible to beat-align stems (e.g. from different songs). One way to achieve this would be to support time in e.g. beats rather than seconds.

@bmcfee by extent types are you referring to how time is represented more generally? I.e. currently we support time/duration in seconds. Would the idea be to support time/duration in units other than seconds?


bmcfee commented Oct 21, 2020

> One way to achieve this would be to support time in e.g. beats rather than seconds.

I'm not sure how that would help / work? You'd still need some mapping of beats to time in that case, right?

> Would the idea be to support time/duration in units other than seconds?

Yep, but as a separate extent type. Either beats or ticks, possibly both depending on how much need there is for it.


MCMcCallum commented Jan 2, 2021

I've been taking a crack at this over the break. I'm most of the way there, though I've realized that to make this work we may have to alter the JAMS schema a little, resulting in some currently existing valid jams data becoming invalid in the latest version.

Currently we have a list of annotations, each of which contains a list of observations, and that list of observations can be either a sparse type or a dense type. I'm proposing that we change this to always be a list of observations (i.e., no sparse/dense distinction), where each observation therein can be either a single observation (the sparse case) or an observation containing lists of values (the dense case). This will move all current jams dense observations down one level, to the observation type, rather than being a different Annotation type overall.

This way the Annotation type itself has all the non-data-dependent properties (e.g., Curator, sandbox, etc.) and it is only its data attribute that is defined by the observation type (both the data and namespace attributes will be defined by the namespace). This data attribute is always an array of observations, and in the case of current DenseObservation types that exist out in the wild, it will be a single-element array with the observation itself containing value, confidence, time, and duration arrays.

This greatly simplifies the code and schema, but will change the schema for dense observations from something like:

{
    "annotations": [
        {
            "data": {
                "value": [ 1.0, 0.5 ],
                "time": [ 1.0, 2.0 ],
                "confidence": [ 0.9, 0.9 ],
                "duration": [ 1.0, 1.0 ]
            }
        }
    ]
}

to something like:

{
    "annotations": [
        {
            "data": [
                {
                    "values": [ 1.0, 0.5 ],
                    "times": [ 1.0, 2.0 ],
                    "confidences": [ 0.9, 0.9 ],
                    "durations": [ 1.0, 1.0 ]
                }
            ]
        }
    ]
}
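For illustration, migrating an existing dense annotation into this layout is a mechanical re-nesting. This sketch assumes the key names shown in the two examples above, which are not final:

def migrate_dense(old_annotation):
    """Re-nest a current-style dense annotation (dict of parallel arrays)
    into a one-element list of array-valued observations."""
    old = old_annotation["data"]
    return {
        "data": [{
            "values": old["value"],
            "times": old["time"],
            "confidences": old["confidence"],
            "durations": old["duration"],
        }]
    }

old = {"data": {"value": [1.0, 0.5], "time": [1.0, 2.0],
                "confidence": [0.9, 0.9], "duration": [1.0, 1.0]}}
assert migrate_dense(old)["data"][0]["times"] == [1.0, 2.0]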

This has the added benefit of allowing a single Annotation to contain multiple dense observations. E.g., in the case of pitch contours: multiple contours beginning and ending according to a vocal activity detector, or, in an annotation application where the annotator is able to draw contours over a waveform, each drawn contour could be sampled and represented as a single DenseObservation.

At phase 3 of this issue, we can then further include a dense sampled observation type, e.g.:

{
    "annotations": [
        {
            "data": [
                {
                    "values": [ 1.0, 0.5, 0.3 ],
                    "start_time": 1.0,
                    "sample_rate": 1000.0,
                }
            ]
        }
    ]
}
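Under this sampled layout, per-value times are implied rather than stored. A minimal sketch of the reconstruction, assuming the field names above:

obs = {"values": [1.0, 0.5, 0.3], "start_time": 1.0, "sample_rate": 1000.0}
times = [obs["start_time"] + i / obs["sample_rate"]
         for i in range(len(obs["values"]))]
# times == [1.0, 1.001, 1.002]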


bmcfee commented Mar 10, 2021

> I've been taking a crack at this over the break. I'm most of the way there, though I've realized that to make this work we may have to alter the JAMS schema a little, resulting in some currently existing valid jams data becoming invalid in the latest version.

I'm okay with that in the long run -- schemas should change and improve! But I think any changes we make should come after we translate the existing schema into something that can be properly validated in jsonschema and version-stamped. This will make forward-migration much easier, and cut down on friction.

> I'm proposing that we change this to always be a list of observations (i.e., no sparse/dense distinction), where each observation therein can be either a single observation (the sparse case) or an observation containing lists of values (the dense case).

This is an interesting suggestion, and I'm trying to noodle out all the downstream consequences. For background, the sparse/dense idea is really just a storage optimization hack: from the object model (python implementation) perspective, all we have are sparse observations, and this uniformity is extremely helpful in a lot of cases (eg abstract data augmentation).

If I understand the proposal correctly, it basically amounts to making everything dense, and annotations that we currently treat as sparse are just special cases where the length of the array is 1. Do I have that right?

> This has the added benefit of allowing a single Annotation to contain multiple dense observations. E.g., in the case of pitch contours: multiple contours beginning and ending according to a vocal activity detector, or, in an annotation application where the annotator is able to draw contours over a waveform, each drawn contour could be sampled and represented as a single DenseObservation.

That's pretty nice! Right now, we hack around it by forcing contour ids into the observations, which requires some post-filtering to extract out.

> At phase 3 of this issue, we can then further include a dense sampled observation type, e.g.:

Hm. Are you thinking of this as something like a mixin type, eg (SampledAnnotation + Pitch_Hz)? Or something that would be coded into the namespace directly?

In general, I'm less keen on adding variable fields this deep into the schema because it will break uniformity of representation across namespaces. This might be necessary at times, I'm not sure. But if possible, I'd like to keep things uniform because it significantly simplifies downstream abstract code, eg jams.display and jams.eval.
