Update discord badge in readme.md #361

Merged
merged 27 commits into from
Jul 24, 2023
Commits
1e9301f
added SitemapLoader
aaishikdutta Jun 24, 2023
da0b8e0
Merge branch 'main' into main
cachho Jul 6, 2023
ec95d66
resolved merge conflict
Jul 10, 2023
abf2559
resolved conflict
Jul 11, 2023
ad65062
Merge branch 'main' into main
cachho Jul 11, 2023
6bbc4e6
Merge branch 'main' of https://github.com/embedchain/embedchain
Jul 11, 2023
7cb7554
added sitemap modified
Jul 11, 2023
b4a4eb7
added sitemap modified
Jul 11, 2023
357162b
Merge branch 'main' of https://github.com/embedchain/embedchain
Jul 12, 2023
49d10b0
added refactor and lint format fixes
Jul 12, 2023
7b12ca5
Update README.md
deshraj Jul 12, 2023
7aedab5
Update embedchain/data_formatter/data_formatter.py
deshraj Jul 12, 2023
8063b56
Update README.md
deshraj Jul 12, 2023
b50c0b8
Update embedchain/data_formatter/data_formatter.py
deshraj Jul 12, 2023
a93d37c
incorporated review comments
Jul 12, 2023
6f77293
Merge branch 'main' of https://github.com/aaishikdutta/embedchain
Jul 12, 2023
b417434
Merge branch 'main' of https://github.com/embedchain/embedchain
Jul 13, 2023
1cff684
Merge branch 'embedchain:main' into main
aaishikdutta Jul 17, 2023
5dd54c2
added fix for PersonSourceApp not instantiating
Jul 21, 2023
2ab8ab3
Merge branch 'main' of https://github.com/aaishikdutta/embedchain
Jul 21, 2023
5ec0aa1
added fix for PersonSourceApp not instantiating
Jul 21, 2023
e3ae3b2
added dry_run to Person App
Jul 22, 2023
4101a05
resolved conflicts
Jul 22, 2023
da77ca1
fixed test case
Jul 22, 2023
8d30a56
Merge branch 'main' of https://github.com/embedchain/embedchain
Jul 22, 2023
00e595a
Merge branch 'embedchain:main' into main
aaishikdutta Jul 24, 2023
d4b93db
Update README.md
aaishikdutta Jul 24, 2023
added sitemap modified
Aaishik Dutta authored and Aaishik Dutta committed Jul 11, 2023
commit 7cb75549d39f7b75cc961630ab4d1cfa4c8eab91
19 changes: 10 additions & 9 deletions embedchain/loaders/site_map.py
@@ -1,14 +1,15 @@
 import requests

 from bs4 import BeautifulSoup

 from embedchain.loaders.web_page import WebPageLoader


 class SitemapLoader:
     def load_data(self, sitemap_url):
         """
-        This method takes a sitemap url as input and retrieves
-        all the urls to use the WebPageLoader to load content
-        of each page.
+        This method takes a sitemap url as input and retrieves
+        all the urls to use the WebPageLoader to load content
+        of each page.
         """
         output = []
         web_page_loader = WebPageLoader()
@@ -17,16 +18,16 @@ def load_data(self, sitemap_url):

         if response.status_code == 200:
             soup = BeautifulSoup(response.text, features="xml")
-            links = [link.text for link in soup.find_all('loc')]
+            links = [link.text for link in soup.find_all("loc")]

             for link in links:
                 each_load_data = web_page_loader.load_data(link)
-                # WebPageLoader returns a list with single element which is extracted and appended to
-                # the output list containing data for all pages
+                # WebPageLoader returns a list with single element
+                # which is extracted and appended to the output list
+                # containing data for all pages
                 output.append(each_load_data[0])

             return output

         else:
             raise response.raise_for_status()
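
Review note: the link-extraction step in this hunk can be exercised without network access. The sketch below is not the code from this commit. It swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so that it runs with no third-party packages, and the names `extract_sitemap_links`, `load_sitemap`, and the `load_page` callable (standing in for `WebPageLoader().load_data`) are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def extract_sitemap_links(sitemap_xml: str) -> List[str]:
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    links = []
    for element in root.iter():
        # ElementTree tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            links.append(element.text.strip())
    return links


def load_sitemap(sitemap_xml: str, load_page: Callable[[str], list]) -> list:
    """Collect the first item returned by load_page for each sitemap URL.

    load_page is a hypothetical stand-in for WebPageLoader().load_data,
    assumed to return a list of results for one URL.
    """
    output = []
    for link in extract_sitemap_links(sitemap_xml):
        page_data = load_page(link)
        if page_data:  # skip empty results instead of indexing [0] blindly
            output.append(page_data[0])
    return output
```

One further observation on the `else` branch of the diff: in `requests`, `Response.raise_for_status()` raises `HTTPError` itself for 4xx and 5xx responses and returns `None` otherwise, so wrapping it in `raise` would fail with a `TypeError` on a non-200 success status such as 204. Calling `response.raise_for_status()` before parsing, without the `raise` keyword, may be the simpler form.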