This project provides a web scraping tool that fetches search results and converts them to Markdown, built on FastAPI, SearXNG, and Browserless. It can route scraping requests through a proxy and converts the fetched HTML content to Markdown.
- FastAPI: A modern, fast web framework for building APIs with Python.
- SearXNG: An open-source internet metasearch engine.
- Browserless: A web browser automation service.
- Markdown Output: Converts HTML content to Markdown format.
- Proxy Support: Utilizes proxies for secure and anonymous scraping.
Ensure you have the following installed:
- Python 3.11
- Virtualenv
- Docker (for SearXNG and Browserless)
- Clone the repository:
    git clone https://github.com/your-username/search-result-scraper-markdown.git
    cd search-result-scraper-markdown
- Create and activate a virtual environment:
    virtualenv venv
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Create a .env file in the root directory with the following content:
    SEARXNG_URL=https://localhost:8888
    BROWSERLESS_URL=https://localhost:3000
    TOKEN=BROWSERLESS_TOKEN
    PROXY_PROTOCOL=http
    PROXY_URL=your_proxy_url
    PROXY_USERNAME=your_proxy_username
    PROXY_PASSWORD=your_proxy_password
    PROXY_PORT=your_proxy_port
    REQUEST_TIMEOUT=300
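The PROXY_* variables above presumably combine into a single proxy URL. How the project actually assembles them is not shown here, so the following is only a minimal sketch under that assumption; `build_proxy_url` is a hypothetical helper name, and percent-encoding the credentials is a choice made for this example.

```python
import os
from urllib.parse import quote


def build_proxy_url(env=os.environ):
    """Assemble protocol://user:pass@host:port from the PROXY_* variables.

    Hypothetical helper: the project's real logic may differ.
    Returns None when no proxy host is configured.
    """
    host = env.get("PROXY_URL")
    if not host:
        return None  # no proxy configured

    protocol = env.get("PROXY_PROTOCOL", "http")
    port = env.get("PROXY_PORT")
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")

    # Percent-encode credentials so characters like "@" or ":" cannot
    # be mistaken for URL delimiters.
    auth = ""
    if username and password:
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"

    netloc = f"{host}:{port}" if port else host
    return f"{protocol}://{auth}{netloc}"
```

Called with no arguments, the function reads from the process environment, so it only sees the .env values if something has already loaded that file into the environment.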
- Run Docker containers for SearXNG and Browserless:
    ./run-services.sh
- Start the FastAPI application:
    uvicorn main:app --host 0.0.0.0 --port 8000
To perform a search query, send a GET request to the root endpoint / with the query parameters q (search query) and num_results (number of results).
Example:
curl "http://localhost:8000/?q=python&num_results=5"
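The same request can be made from Python. This is a rough sketch using only the standard library. It assumes the app is reachable over plain HTTP on port 8000, which is what the uvicorn command above serves when no TLS options are passed. The names `search_url` and `fetch_text` are hypothetical, and the response is assumed to be UTF-8 text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(query, num_results=5, base="http://localhost:8000"):
    """Build the root-endpoint URL with the q and num_results parameters."""
    # urlencode escapes spaces and reserved characters in the query.
    params = urlencode({"q": query, "num_results": num_results})
    return f"{base}/?{params}"


def fetch_text(url, timeout=30):
    """Send a GET request and return the body decoded as UTF-8 (assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the service to be running locally.
    print(fetch_text(search_url("python", num_results=5)))
```

Building the query string with `urlencode` rather than by hand keeps multi-word queries such as "fast api" valid without manual escaping.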
To fetch and convert the content of a specific URL to Markdown, send a GET request to the /r/{url:path} endpoint.
Example:
curl "http://localhost:8000/r/https://example.com"
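When the target URL carries its own query string, a raw "?" would be read as the start of the API request's query rather than as part of the path. One possible way around this, sketched below, is to percent-encode the target before appending it to /r/. Whether the service expects or needs this is an assumption; `reader_url` is a hypothetical helper, and the base URL assumes plain HTTP on port 8000.

```python
from urllib.parse import quote


def reader_url(target, base="http://localhost:8000"):
    """Build a /r/ endpoint URL for the given target page.

    ":" and "/" are left as-is so simple URLs match the curl example;
    characters such as "?", "&" and "=" are percent-encoded so they stay
    part of the path instead of becoming the API request's query string.
    """
    return f"{base}/r/{quote(target, safe=':/')}"
```

For a plain target such as https://example.com the result is identical to the URL in the curl example, so the encoding only changes URLs that contain reserved characters.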
This project uses Geonode proxies for web scraping. You can use my Geonode affiliate link to get started with their proxy services.
This project is licensed under the MIT License. See the LICENSE file for details.
Essa Mamdani - essamamdani.com
Contributions are welcome! Please feel free to submit a Pull Request.