Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: request contextualisation - core functionality #65

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

jcierocki
Copy link
Collaborator

@jcierocki jcierocki commented Jun 26, 2024

Closes #54

Jakub Cierocki and others added 3 commits June 21, 2024 16:36
@jcierocki jcierocki added the feature New feature or request label Jun 26, 2024
@jcierocki jcierocki self-assigned this Jun 26, 2024
Copy link

github-actions bot commented Jun 26, 2024

badge

Code Coverage Summary

Filename                                                  Stmts    Miss  Cover    Missing
------------------------------------------------------  -------  ------  -------  ---------------------------------------------------------------------
dbally/_main.py                                              13       1  92.31%   10
dbally/_types.py                                              8       1  87.50%   24
dbally/exceptions.py                                          1       0  100.00%
dbally/assistants/base.py                                    24       0  100.00%
dbally/assistants/openai.py                                  59       2  96.61%   59-76
dbally/audit/event_tracker.py                                36       8  77.78%   38-40, 53, 64, 74, 91, 97
dbally/audit/events.py                                       28       0  100.00%
dbally/audit/spans.py                                         7       0  100.00%
dbally/audit/event_handlers/base.py                          15       0  100.00%
dbally/audit/event_handlers/buffer_event_handler.py           8       8  0.00%    1-28
dbally/audit/event_handlers/cli_event_handler.py             52      33  36.54%   11-13, 43-44, 47-55, 65-66, 79-90, 108-115, 126-130
dbally/audit/event_handlers/langsmith_event_handler.py       29      25  13.79%   6-106
dbally/collection/collection.py                              86       0  100.00%
dbally/collection/exceptions.py                              13       0  100.00%
dbally/collection/results.py                                 14       0  100.00%
dbally/context/_utils.py                                     32       5  84.38%   38-39, 43, 69, 75
dbally/context/context.py                                    16       1  93.75%   40
dbally/context/exceptions.py                                  6       0  100.00%
dbally/embeddings/base.py                                     5       0  100.00%
dbally/embeddings/exceptions.py                              15       6  60.00%   10-11, 20, 29-30, 39
dbally/embeddings/litellm.py                                 28      12  57.14%   7-8, 44, 68-84
dbally/gradio/gradio_interface.py                           111     111  0.00%    1-300
dbally/iql/_exceptions.py                                    25       1  96.00%   35
dbally/iql/_processor.py                                     94       9  90.43%   76, 93, 99, 105, 114, 120, 143, 146, 152
dbally/iql/_query.py                                         19       1  94.74%   12
dbally/iql/_type_validators.py                               47       4  91.49%   26, 30, 74, 83
dbally/iql/syntax.py                                         36       9  75.00%   6-9, 27, 36, 60, 63-66
dbally/iql_generator/iql_generator.py                        26       0  100.00%
dbally/iql_generator/prompt.py                               16       1  93.75%   33
dbally/llms/base.py                                          28       1  96.43%   34
dbally/llms/litellm.py                                       24      10  58.33%   8-9, 48-54, 61, 78
dbally/llms/local.py                                         18      18  0.00%    1-60
dbally/llms/clients/base.py                                  23       2  91.30%   46-47
dbally/llms/clients/exceptions.py                            15       6  60.00%   10-11, 20, 29-30, 39
dbally/llms/clients/litellm.py                               44      21  52.27%   8-9, 65-71, 97-120
dbally/llms/clients/local.py                                 33      33  0.00%    1-95
dbally/nl_responder/nl_responder.py                          24       4  83.33%   71-82
dbally/nl_responder/prompts.py                               19       4  78.95%   64-67
dbally/prompt/elements.py                                    25       1  96.00%   29
dbally/prompt/template.py                                    65       7  89.23%   33, 41, 49, 110, 127, 153, 205
dbally/similarity/chroma_store.py                            37       0  100.00%
dbally/similarity/elastic_vector_search.py                   19      16  15.79%   5-102
dbally/similarity/elasticsearch_store.py                     22      19  13.64%   5-107
dbally/similarity/faiss_store.py                             38      35  7.90%    5-103
dbally/similarity/fetcher.py                                  5       0  100.00%
dbally/similarity/index.py                                   26       0  100.00%
dbally/similarity/sqlalchemy_base.py                         44      19  56.82%   35-37, 46, 68, 77, 86-89, 99-105, 123-126
dbally/similarity/store.py                                    7       0  100.00%
dbally/view_selection/base.py                                 7       0  100.00%
dbally/view_selection/llm_view_selector.py                   17       0  100.00%
dbally/view_selection/prompt.py                               9       0  100.00%
dbally/view_selection/random_view_selector.py                10      10  0.00%    1-36
dbally/views/base.py                                         18       1  94.44%   58
dbally/views/decorators.py                                    6       0  100.00%
dbally/views/exposed_functions.py                            50       4  92.00%   28, 47, 51, 56
dbally/views/methods_base.py                                 35       2  94.29%   76, 84
dbally/views/pandas_base.py                                  33       1  96.97%   64
dbally/views/sqlalchemy_base.py                              37       7  81.08%   48, 63-65, 83-87
dbally/views/structured.py                                   39       0  100.00%
dbally/views/freeform/text2sql/config.py                     21       1  95.24%   47
dbally/views/freeform/text2sql/exceptions.py                  7       3  57.14%   12-14
dbally/views/freeform/text2sql/prompt.py                     11       0  100.00%
dbally/views/freeform/text2sql/view.py                       94      22  76.60%   50, 53, 56, 59, 62, 74, 154, 158-161, 164, 194, 206-207, 215, 228-233
dbally_cli/main.py                                            5       5  0.00%    1-13
dbally_cli/text2sql.py                                       94      94  0.00%    1-248
dbally_codegen/autodiscovery.py                             122      18  85.25%   54-57, 241-243, 264-278, 281-284, 353-354, 450-455
dbally_codegen/generator.py                                 175       7  96.00%   81, 91, 314, 342, 360, 374, 420
TOTAL                                                      2175     609  72.00%

Diff against main

Filename                                  Stmts    Miss  Cover
--------------------------------------  -------  ------  --------
dbally/collection/collection.py              +1       0  +100.00%
dbally/context/_utils.py                    +32      +5  +84.38%
dbally/context/context.py                   +16      +1  +93.75%
dbally/context/exceptions.py                 +6       0  +100.00%
dbally/iql/_processor.py                    +15      +2  -0.71%
dbally/iql/_query.py                         +2       0  +0.62%
dbally/iql/_type_validators.py               +9      +2  -3.25%
dbally/iql_generator/iql_generator.py        +1       0  +100.00%
dbally/views/base.py                         +2       0  +0.69%
dbally/views/exposed_functions.py           +17      +3  -4.97%
dbally/views/methods_base.py                 +1       0  +0.17%
dbally/views/structured.py                   +1       0  +100.00%
dbally/views/freeform/text2sql/view.py       +1       0  +0.26%
TOTAL                                      +104     +13  +0.78%

Results for commit: 1294a9c

Minimum allowed coverage is 60%

♻️ This comment has been updated with latest results

@jcierocki jcierocki changed the title Request contextualisation - core functionality feat: request contextualisation - core functionality Jun 26, 2024
@jcierocki jcierocki force-pushed the jc/issue-54-request-context branch from 0ded684 to 9ba89e5 Compare July 3, 2024 12:58
Copy link

github-actions bot commented Jul 4, 2024

Trivy scanning results.

@jcierocki jcierocki force-pushed the jc/issue-54-request-context branch from 7a49b4d to 3c87020 Compare July 4, 2024 18:31
@jcierocki jcierocki force-pushed the jc/issue-54-request-context branch from 3c87020 to c0b0522 Compare July 4, 2024 21:20
@jcierocki jcierocki force-pushed the jc/issue-54-request-context branch from 44a2fdc to 2d0ef4b Compare July 5, 2024 09:08
@jcierocki jcierocki marked this pull request as ready for review July 5, 2024 09:32
CustomContext: TypeAlias = "BaseCallerContext"


class BaseCallerContext(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pydantic is not a required dependency at the moment, and we should avoid it if possible (v1 vs v2 issues). Maybe a bare dataclass is enough here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still support pydantic models as context, but we shouldn't enforce it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks :)

@@ -74,6 +74,10 @@ def __init__(
"You MUST use only these methods:\n"
"\n{filters}\n"
"It is VERY IMPORTANT not to use methods other than those listed above."
"Finally, if a called function argument value is not directly specified in the query but instead requires knowledge of some additional context, than substitute that argument value by: BaseCallerContext()."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just Context()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I will adjust it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks :)

@jcierocki jcierocki force-pushed the jc/issue-54-request-context branch from f78c101 to 3423033 Compare July 8, 2024 09:42
filter_method_: type_ext.Callable, hidden_args: Sequence[str]
) -> Tuple[Sequence[MethodParamWithTyping], ContextClass]:
"""
Processes the MethodsBaseView filter method signauture to extract the args and type hints in the desired format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typos: "signauture", "claases", "are getting excluded the returned"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

a class implementing this interface, selected based on the filter method signature (type hints).
"""

_alias: str = "Context"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of providing a default value, I would require inheriting classes to define their own (throw exception otherwise or even better use an abstract method definition), to ensure that they don't use the default value by accident. Especially since it might be easy for people to assume that the name of the class itself is used as an alias automatically.

Copy link
Collaborator

@micpst micpst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the docs on how contexts are defined and how they work.

)
except IQLError as exc:
# TODO handle the possibility of variable `response` being not initialized
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it be not initialized at this point?


Returns:
Generated IQL query.
"""
prompt_format = IQLGenerationPromptFormat(
question=question,
filters=filters,
examples=examples,
examples=examples or [],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled in the PromptFormat initializator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All dbally exceptions should inherit from dbally.exceptions.DbAllyError. To be consistent with the rest of the code base:

  • BaseContextException should inherit from DbAllyError
  • rename BaseContextException -> ContextError
  • use inheritance for ContextNotAvailableError and ContextualisationNotAllowed

@@ -74,6 +74,12 @@ def __init__(
"You MUST use only these methods:\n"
"\n{filters}\n"
"It is VERY IMPORTANT not to use methods other than those listed above."
"To specify a reference to the query execution context, as an API function argument, please call: AskerContext()."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, although for me including an "if" statement would also be okay, as long as it directly references user's question (and not some other hidden parts of the system). For example:

"If the question does not include an explicit value for a function argument and instead references the execution context, use the ASKER_CONTEXT constant as a placeholder."

(may be modified for the "AskerContext()" function-like syntax, although I like the simplicity of asking the LLM to use a constant - asking it to "call" something seems a bit more complicated, potentially causing more varied outputs).

Anyway, we should ultimately test multiple versions anyway. What's important for me is that we test for false-positives (i.e., we have solid test cases for various situations when a filter supports context in general but it should not be used for the particular query) - false-positives is the thing that worries me the most at the moment.

…, b] -> a | b; removed typing & typing_extensions module name prefixes
@jcierocki jcierocki force-pushed the jc/issue-54-request-context branch from b5827c9 to 1294a9c Compare July 9, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

feat: request contextualization
4 participants