Current Behavior
Currently, there are several areas in SCRAPI where we export and import DataFrames, and their schemas are misaligned.
This makes it hard to streamline a pipeline, because columns have to be renamed or extra ETL steps added between stages.
Examples:
Intents.intent_proto_to_dataframe exports the columns display_name and training_phrase in basic mode.
In advanced mode of the same method, the utterance column is called text instead.
This is a mismatch of schema and semantics within the same method.
Step 3 of the workflow listed under Steps to Reproduce will break due to the misaligned schema.
Export and import should always align "like for like" (i.e. basic export and basic import should match 100%).
The modes should also align semantically (i.e. basic and advanced have different schemas, but the columns they share are named identically).
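The alignment rules above can be checked mechanically. The following is a minimal sketch, not SCRAPI code: the dictionaries record only the column names mentioned in this issue, and the field labels (intent, utterance) and the naming_conflicts helper are hypothetical names introduced for illustration.

```python
# Column name used for each semantic field, per schema.
# Only the columns named in this issue are listed; the field labels
# ("intent", "utterance") are hypothetical, chosen for illustration.
SCHEMAS = {
    "export_basic": {"intent": "display_name", "utterance": "training_phrase"},
    "export_advanced": {"intent": "display_name", "utterance": "text"},
    "import_basic": {"intent": "display_name", "utterance": "text"},
}


def naming_conflicts(schemas):
    """Return the fields that are given more than one column name."""
    names_by_field = {}
    for columns in schemas.values():
        for field, column in columns.items():
            names_by_field.setdefault(field, set()).add(column)
    return {f: n for f, n in names_by_field.items() if len(n) > 1}


conflicts = naming_conflicts(SCHEMAS)
print(conflicts)
```

With the names quoted in this issue, only the utterance field is reported, since it is called both training_phrase and text.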
Expected Behavior
All DataFrame schemas within the same Resource type (e.g. Intents, Entity Types) should be in alignment.
Possible Solution
Centralize the creation and validation of all schema types to a file outside of the class that is using them.
Introduce core/schemas.py or similar to maintain a central schema repository.
Then each class can pull its schema and schema validation rules from that central module, ensuring continuity across DataFrame resources.
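One way such a central module might look is sketched below. Everything in it is an assumption made for illustration: the Schema class, the get_schema helper, the choice of training_phrase as the shared column name, and the part_index placeholder column are not taken from SCRAPI.

```python
"""Hypothetical core/schemas.py: one place for DataFrame schemas."""
from dataclasses import dataclass

import pandas as pd

# Shared column names are defined once so every mode reuses them.
# Picking "training_phrase" over "text" is an arbitrary choice here.
DISPLAY_NAME = "display_name"
TRAINING_PHRASE = "training_phrase"


@dataclass(frozen=True)
class Schema:
    columns: tuple

    def validate(self, df: pd.DataFrame) -> None:
        """Raise ValueError if df lacks any column of this schema."""
        missing = [c for c in self.columns if c not in df.columns]
        if missing:
            raise ValueError(f"missing columns: {missing}")


INTENTS_BASIC = Schema(columns=(DISPLAY_NAME, TRAINING_PHRASE))
# "part_index" is a placeholder; the real advanced columns are not
# listed in this issue.
INTENTS_ADVANCED = Schema(columns=INTENTS_BASIC.columns + ("part_index",))

SCHEMAS = {
    ("intents", "basic"): INTENTS_BASIC,
    ("intents", "advanced"): INTENTS_ADVANCED,
}


def get_schema(resource: str, mode: str) -> Schema:
    """Look up the schema that exporters and importers should share."""
    return SCHEMAS[(resource, mode)]
```

Because the advanced schema is built from the basic one, the shared columns cannot drift apart, and an exporter and importer that both call get_schema("intents", "basic") agree by construction.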
Steps to Reproduce
Try the following:
1. Intents to DataFrame
2. DataFrame to Sheet
3. Sheet to DataFrame (without modifying your sheet; leave it as-is)
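The three steps can be simulated without an agent or a Google Sheet. This is a rough sketch under stated assumptions: plain pandas stands in for the SCRAPI calls, an in-memory CSV stands in for the Sheet, the sample rows are made up, and the column lists are the ones quoted in this issue.

```python
import io

import pandas as pd

# Column names quoted in this issue for basic mode.
EXPORT_COLUMNS = ["display_name", "training_phrase"]
IMPORT_COLUMNS = ["display_name", "text"]

# Step 1: stand-in for the Intents export (made-up sample rows).
exported = pd.DataFrame(
    [["greeting", "hi"], ["greeting", "hello"]],
    columns=EXPORT_COLUMNS,
)

# Step 2: stand-in for writing the DataFrame to a Sheet.
sheet = io.StringIO()
exported.to_csv(sheet, index=False)

# Step 3: read the unmodified "sheet" back and check the import schema.
sheet.seek(0)
reloaded = pd.read_csv(sheet)
missing = [c for c in IMPORT_COLUMNS if c not in reloaded.columns]
print(missing)  # ['text']

# The workaround today is a manual rename before importing.
fixed = reloaded.rename(columns={"training_phrase": "text"})
```

The rename at the end is the kind of per-pipeline patch that aligned schemas would make unnecessary.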
@MRyderOC I found this bug / issue when prepping for my SCRAPI demo today.
More of a minor annoyance than a bug, but I think we should be able to easily enforce this across the entire library.
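Another example: in DataframeFunctions.bulk_update_intents_from_dataframe, the basic mode expects input columns of display_name and text.
This is misaligned with the schemas of the DataFrames generated by the Intents class above.
So a workflow of Intents to DataFrame, DataFrame to Sheet, then Sheet to DataFrame does not round-trip without renaming columns.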