profile: new command to extract and infer metadata from a dataset, given an optional spec file, and create a .qsv file #1705

Open
jqnatividad opened this issue Mar 29, 2024 · 3 comments
Labels
CKAN (interoperability with CKAN Data Management System), DCAT3, enhancement (New feature or request), qsv pro (requires backend/cloud services)

Comments

@jqnatividad
Owner

jqnatividad commented Mar 29, 2024

Now that DCAT-US v3 has reached recommendation snapshot status:

https://doi-do.github.io/dcat-us/

The metadata command will extract/infer the metadata and save it to a .metadata file.

If a spec file is given, the command will use it to guide metadata inference (the default spec file will be based on the DCAT-US v3 profile).

When uploading to CKAN, the uploader (qsv pro, DP++) will use the .metadata file to prepopulate metadata fields and set the package to DRAFT mode so the metadata can be curated by the Data Publisher.

The .metadata file will also be used to create an expanded data dictionary for the CKAN resource.
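
To make the proposal concrete, here is a minimal Python sketch of the flow described above: infer a small data dictionary from a CSV, save it next to the data as a .metadata file, and build a draft package payload for an uploader. This is not qsv's implementation. The JSON layout, the key names (title, row_count, columns, empty_count), the function names, and the sidecar naming are all assumptions made for illustration. The "state": "draft" field follows CKAN's dataset state convention, but whether a given CKAN instance honors it depends on the caller's permissions.

```python
import csv
import io
import json
from pathlib import Path


def infer_type(values):
    """Guess a coarse type from a column's non-empty string values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "null"
    for name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text, title):
    """Build a metadata dict with a per-column data dictionary (hypothetical layout)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = []
    for field in reader.fieldnames or []:
        values = [row[field] or "" for row in rows]
        columns.append({
            "name": field,
            "type": infer_type(values),
            "empty_count": sum(1 for v in values if v == ""),
        })
    return {"title": title, "row_count": len(rows), "columns": columns}


def write_sidecar(metadata, data_path):
    """Write the metadata as JSON next to the data file (assumed naming: data.csv -> data.metadata)."""
    sidecar = Path(data_path).with_suffix(".metadata")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


def draft_package(metadata, name):
    """Build a payload an uploader could prepopulate; the package starts as a draft for curation."""
    return {"name": name, "title": metadata["title"], "state": "draft"}


sample = "id,name,score\n1,ann,3.5\n2,bob,\n"
metadata = profile_csv(sample, title="Sample scores")
package = draft_package(metadata, name="sample-scores")
```

A real profile would carry many more DCAT-US v3 fields, and the spec file would decide which ones to infer. The sketch only shows how inferred values and curated draft fields could share one file.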

@jqnatividad jqnatividad added the enhancement (New feature or request) and CKAN (interoperability with CKAN Data Management System) labels on Mar 29, 2024
@jqnatividad
Owner Author

For an extended discussion about the data dictionary format, see DOI-DO/dcat-us#138

@jqnatividad jqnatividad changed the title from "metadata: new command to extract and infer metadata from a dataset, given an optional spec file, and create a .metadata file" to "profile: new command to extract and infer metadata from a dataset, given an optional spec file, and create a .metadata file" on Sep 4, 2024
@jqnatividad
Owner Author

Call this command profile instead. This is consistent with the verb convention for command names.

Also, the file it produces uses a .profile extension instead of .metadata.

In addition, the DCAT3 spec file will be able to call either luau or python scripts to compute and infer metadata.

@jqnatividad jqnatividad changed the title from "profile: new command to extract and infer metadata from a dataset, given an optional spec file, and create a .metadata file" to "profile: new command to extract and infer metadata from a dataset, given an optional spec file, and create a .qsv file" on Sep 5, 2024
@jqnatividad jqnatividad added the qsv pro (requires backend/cloud services) label on Sep 11, 2024
@jqnatividad
Owner Author

Because profiling a dataset is interactive and needs LLMs and other web services, this will need to be done in qsv pro.
