[BUG] Chunker Config does not work as described in doc strings #251

Closed
cachho opened this issue Jul 12, 2023 · 1 comment
Labels
bug Something isn't working

Comments

Contributor

cachho commented Jul 12, 2023

My code:

from embedchain import App
from embedchain.config import AddConfig, InitConfig
from embedchain.config.AddConfig import ChunkerConfig

app = App(config=InitConfig(log_level="DEBUG"))

chunker_config = ChunkerConfig(chunk_size=1, chunk_overlap=1, length_function=1)
config = AddConfig(chunker_config)
app.add("text", "lorem_ipsum", config)

raises:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    app.add("text", "lorem_ipsum", config)
  File "/home/c/code/embedchain/embedchain/embedchain.py", line 53, in add
    data_formatter = DataFormatter(data_type, config)
  File "/home/c/code/embedchain/embedchain/data_formatter/data_formatter.py", line 26, in __init__
    self.chunker = self._get_chunker(data_type, config.chunker)
  File "/home/c/code/embedchain/embedchain/data_formatter/data_formatter.py", line 59, in _get_chunker
    "youtube_video": YoutubeVideoChunker(config),
  File "/home/c/code/embedchain/embedchain/chunkers/youtube_video.py", line 21, in __init__
    text_splitter = RecursiveCharacterTextSplitter(**config)
TypeError: ABCMeta object argument after ** must be a mapping, not ChunkerConfig

I think it's because `RecursiveCharacterTextSplitter(**config)` expects a mapping (a dict), but `ChunkerConfig` is a class instance.
I'm not sure why the chunker config should be a dict when everything else is a config class.
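
For anyone trying to reproduce the `TypeError` without embedchain installed, here is a minimal sketch of the same failure. Both `ChunkerConfig` and `make_splitter` below are hypothetical stand-ins written for this illustration, not embedchain's or LangChain's actual code, and `as_dict` is an assumed helper. It only shows that `**` unpacking needs a mapping, and that turning the object into a dict first avoids the error.

```python
class ChunkerConfig:
    """Hypothetical stand-in for a config class (not embedchain's real one)."""

    def __init__(self, chunk_size, chunk_overlap, length_function):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.length_function = length_function

    def as_dict(self):
        # Assumed helper: copy the instance attributes into a plain dict.
        return dict(vars(self))


def make_splitter(chunk_size, chunk_overlap, length_function):
    """Stand-in for a constructor that only accepts keyword arguments."""
    return (chunk_size, chunk_overlap, length_function)


config = ChunkerConfig(chunk_size=1, chunk_overlap=1, length_function=len)

# Unpacking the object itself fails, because it is not a mapping.
try:
    make_splitter(**config)
    unpack_error = None
except TypeError as exc:
    unpack_error = str(exc)

# Unpacking a dict built from the same attributes works.
splitter_args = make_splitter(**config.as_dict())

print(unpack_error)
print(splitter_args)
```

If the library wants to keep a config class here, one option along these lines would be to convert it to a dict before passing it on to the text splitter.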

cachho added the bug label Jul 12, 2023

Contributor Author

cachho commented Jul 12, 2023

The README example works:

app = App(config=InitConfig(log_level="DEBUG"))

add_config = {
        "chunker": {
                "chunk_size": 1,
                "chunk_overlap": 1,
                "length_function": len,
        }
}
app.add("text", "lorem ipsum", AddConfig(**add_config))

returns:

2023-07-12 21:27:30,807 [clickhouse_connect.driver.ctypes] [INFO] Successfully imported ClickHouse Connect C data optimizations
2023-07-12 21:27:30,807 [clickhouse_connect.driver.ctypes] [DEBUG] Successfully import ClickHouse Connect C/Numpy optimizations
2023-07-12 21:27:30,811 [clickhouse_connect.json_impl] [INFO] Using python library for writing JSON byte strings
2023-07-12 21:27:30,864 [chromadb.db.duckdb] [INFO] loaded in 164 embeddings
2023-07-12 21:27:30,866 [chromadb.db.duckdb] [INFO] loaded in 1 collections
2023-07-12 21:27:30,868 [chromadb.db.duckdb] [INFO] collection with name embedchain_store already exists, returning existing collection
2023-07-12 21:27:30,876 [openai] [DEBUG] message='Request to OpenAI API' method=post path=https://api.openai.com/v1/engines/text-embedding-ada-002/embeddings
2023-07-12 21:27:30,876 [openai] [DEBUG] api_version=None data='{"input": ["l", "o", "r", "e", "m", " ", "i", "p", "s", "u"], "encoding_format": "base64"}' message='Post details'
2023-07-12 21:27:32,142 [openai] [DEBUG] message='OpenAI API response' path=https://api.openai.com/v1/engines/text-embedding-ada-002/embeddings processing_ms=36 request_id=a930dc3cfcde368195cfa94b24529b13 response_code=200
2023-07-12 21:27:32,156 [chromadb.db.index.hnswlib] [DEBUG] Index saved to db/index/index.bin
Successfully saved lorem ipsum. New chunks count: 10
