Update discord badge in readme.md #361

Merged
merged 27 commits into from
Jul 24, 2023
Changes from 1 commit
Commits
27 commits
1e9301f added SitemapLoader (aaishikdutta, Jun 24, 2023)
da0b8e0 Merge branch 'main' into main (cachho, Jul 6, 2023)
ec95d66 resolved merge conflict (Jul 10, 2023)
abf2559 resolved conflict (Jul 11, 2023)
ad65062 Merge branch 'main' into main (cachho, Jul 11, 2023)
6bbc4e6 Merge branch 'main' of https://github.com/embedchain/embedchain (Jul 11, 2023)
7cb7554 added sitemap modified (Jul 11, 2023)
b4a4eb7 added sitemap modified (Jul 11, 2023)
357162b Merge branch 'main' of https://github.com/embedchain/embedchain (Jul 12, 2023)
49d10b0 added refactor and lint format fixes (Jul 12, 2023)
7b12ca5 Update README.md (deshraj, Jul 12, 2023)
7aedab5 Update embedchain/data_formatter/data_formatter.py (deshraj, Jul 12, 2023)
8063b56 Update README.md (deshraj, Jul 12, 2023)
b50c0b8 Update embedchain/data_formatter/data_formatter.py (deshraj, Jul 12, 2023)
a93d37c incorporated review comments (Jul 12, 2023)
6f77293 Merge branch 'main' of https://github.com/aaishikdutta/embedchain (Jul 12, 2023)
b417434 Merge branch 'main' of https://github.com/embedchain/embedchain (Jul 13, 2023)
1cff684 Merge branch 'embedchain:main' into main (aaishikdutta, Jul 17, 2023)
5dd54c2 added fix for PersonSourceApp not instantiating (Jul 21, 2023)
2ab8ab3 Merge branch 'main' of https://github.com/aaishikdutta/embedchain (Jul 21, 2023)
5ec0aa1 added fix for PersonSourceApp not instantiating (Jul 21, 2023)
e3ae3b2 added dry_run to Person App (Jul 22, 2023)
4101a05 resolved conflicts (Jul 22, 2023)
da77ca1 fixed test case (Jul 22, 2023)
8d30a56 Merge branch 'main' of https://github.com/embedchain/embedchain (Jul 22, 2023)
00e595a Merge branch 'embedchain:main' into main (aaishikdutta, Jul 24, 2023)
d4b93db Update README.md (aaishikdutta, Jul 24, 2023)
added refactor and lint format fixes
Aaishik Dutta authored and Aaishik Dutta committed Jul 12, 2023
commit 49d10b0dcd989b54a2c2ae1e7f78aeaf55440b94
embedchain/config/InitConfig.py: 1 change (1 addition & 0 deletions)
@@ -62,6 +62,7 @@ def _set_db_to_default(self):
         Sets database to default (`ChromaDb`).
         """
         from embedchain.vectordb.chroma_db import ChromaDB
+
         self.db = ChromaDB(ef=self.ef, host=self.host, port=self.port)
 
     def _setup_logging(self, debug_level):
embedchain/loaders/site_map.py: 29 changes (10 additions & 19 deletions)
@@ -7,27 +7,18 @@
 class SitemapLoader:
     def load_data(self, sitemap_url):
         """
-        This method takes a sitemap url as input and retrieves
-        all the urls to use the WebPageLoader to load content
+        This method takes a sitemap URL as input and retrieves
+        all the URLs to use the WebPageLoader to load content
         of each page.
         """
         output = []
         web_page_loader = WebPageLoader()
-
         response = requests.get(sitemap_url)
-
-        if response.status_code == 200:
-            soup = BeautifulSoup(response.text, features="xml")
-            links = [link.text for link in soup.find_all("loc")]
-
-            for link in links:
-                each_load_data = web_page_loader.load_data(link)
-                # WebPageLoader returns a list with single element
-                # which is extracted and appended to the output list
-                # containing data for all pages
-                output.append(each_load_data[0])
-
-            return output
-
-        else:
-            raise response.raise_for_status()
+        response.raise_for_status()
+
+        soup = BeautifulSoup(response.text, "xml")
+        links = [link.text for link in soup.find_all("loc")]
+        for link in links:
+            each_load_data = web_page_loader.load_data(link)
+            output.append(each_load_data)
+        return [data[0] for data in output]
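The refactored `load_data` flow can be checked offline. The following minimal Python sketch is not embedchain's implementation: `HTTPError`, `FakeResponse`, `extract_sitemap_urls`, `load_sitemap`, and `fake_page_loader` are hypothetical names, the network call is replaced by a fake response object, and BeautifulSoup's `find_all("loc")` is swapped for the standard library's `xml.etree.ElementTree`. It assumes, as the removed comment says, that `WebPageLoader.load_data` returns a single-element list per URL. It also shows why the `raise response.raise_for_status()` line was dropped: in requests, `raise_for_status()` returns `None` for status codes below 400, so the old `else` branch would have executed `raise None` (a `TypeError`) for any non-200 status below 400.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


class HTTPError(Exception):
    """Local stand-in for requests.exceptions.HTTPError."""


class FakeResponse:
    """Hypothetical stand-in for requests.Response (no network needed).

    Mirrors requests' contract: raise_for_status() raises only for 4xx/5xx
    status codes and returns None for everything else.
    """

    def __init__(self, status_code: int, text: str = "") -> None:
        self.status_code = status_code
        self.text = text

    def raise_for_status(self) -> None:
        if 400 <= self.status_code < 600:
            raise HTTPError(f"{self.status_code} error")


def extract_sitemap_urls(sitemap_xml: str) -> List[str]:
    """Collect the text of every <loc> element, ignoring XML namespaces.

    Plays the role of `[link.text for link in soup.find_all("loc")]`,
    but uses the standard library and strips surrounding whitespace.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # With a default namespace, tags look like "{http://...}loc".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


def load_sitemap(
    response: FakeResponse,
    load_page: Callable[[str], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Hypothetical sketch of the refactored SitemapLoader.load_data flow."""
    # Raises for 4xx/5xx; any status below 400 falls through to parsing.
    response.raise_for_status()
    # load_page stands in for WebPageLoader.load_data (one-element list per URL).
    output = [load_page(url) for url in extract_sitemap_urls(response.text)]
    # Unwrap that single element, as `[data[0] for data in output]` does.
    return [data[0] for data in output]


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def fake_page_loader(url: str) -> List[Dict[str, str]]:
    # Hypothetical page loader returning a single-element list.
    return [{"url": url, "content": f"content of {url}"}]


pages = load_sitemap(FakeResponse(200, SITEMAP), fake_page_loader)
print([p["url"] for p in pages])  # ['https://example.com/', 'https://example.com/about']
```

Calling `response.raise_for_status()` on its own, as the new code does, still surfaces 4xx/5xx errors while letting any other status proceed to parsing instead of failing with a `TypeError`.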
embedchain/vectordb/chroma_db.py: 2 changes (1 addition & 1 deletion)
@@ -1,5 +1,5 @@
-import os
 import logging
+import os
 
 import chromadb
 from chromadb.utils import embedding_functions