Search Result Scraper with Markdown Output Using FastAPI, SearXNG, and Browserless

License: MIT

Description

This project provides a powerful web scraping tool that fetches search results and converts them into Markdown format using FastAPI, SearXNG, and Browserless. It includes the capability to use proxies for web scraping and handles HTML content conversion to Markdown efficiently.

Features

  • FastAPI: A modern, fast web framework for building APIs with Python.
  • SearXNG: An open-source internet metasearch engine.
  • Browserless: A web browser automation service.
  • Markdown Output: Converts HTML content to Markdown format.
  • Proxy Support: Utilizes proxies for secure and anonymous scraping.
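
As a rough illustration of the Markdown Output feature, the snippet below converts an HTML fragment to Markdown with the html2text package. This is only a sketch; the conversion library and settings the project actually uses may differ.

    # Illustrative HTML-to-Markdown conversion (the project's own
    # conversion code and library may differ).
    import html2text

    converter = html2text.HTML2Text()
    converter.ignore_links = False  # keep hyperlinks in the Markdown output
    converter.body_width = 0        # do not hard-wrap long lines

    html = "<h1>Example</h1><p>Some <a href='https://example.com'>linked</a> text.</p>"
    markdown = converter.handle(html)
    print(markdown)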

Prerequisites

Ensure you have the following installed:

  • Python 3.11
  • Virtualenv
  • Docker (for running SearXNG and Browserless)

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/your-username/search-result-scraper-markdown.git
    cd search-result-scraper-markdown
  2. Create and activate a virtual environment:

    virtualenv venv
    source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create a .env file in the root directory with the following content (a sketch of how these values might be loaded in Python appears after these steps):

    SEARXNG_URL=https://localhost:8888
    BROWSERLESS_URL=https://localhost:3000
    TOKEN=BROWSERLESS_TOKEN
    PROXY_PROTOCOL=http
    PROXY_URL=your_proxy_url
    PROXY_USERNAME=your_proxy_username
    PROXY_PASSWORD=your_proxy_password
    PROXY_PORT=your_proxy_port
    REQUEST_TIMEOUT=300
  5. Run Docker containers for SearXNG and Browserless:

    ./run-services.sh
  6. Start the FastAPI application:

    uvicorn main:app --host 0.0.0.0 --port 8000
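
For reference, the variables from the .env file in step 4 could be read with python-dotenv roughly as shown below. This is only a sketch under that assumption; the project's main.py may load and name its settings differently.

    # Sketch of loading the .env settings with python-dotenv (illustrative;
    # the actual application code may differ).
    import os
    from dotenv import load_dotenv

    load_dotenv()  # read variables from the .env file in the project root

    SEARXNG_URL = os.getenv("SEARXNG_URL", "https://localhost:8888")
    BROWSERLESS_URL = os.getenv("BROWSERLESS_URL", "https://localhost:3000")
    TOKEN = os.getenv("TOKEN", "")
    REQUEST_TIMEOUT = int(os.getenv("REQUEST_TIMEOUT", "300"))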

Usage

Search Endpoint

To perform a search, send a GET request to the root endpoint / with the query parameters q (the search query) and num_results (the number of results to return).

Example:

curl "https://localhost:8000/?q=python&num_results=5"

Fetch URL Content

To fetch and convert the content of a specific URL to Markdown, send a GET request to the /r/{url:path} endpoint.

Example:

curl "https://localhost:8000/r/https://example.com"

Using Proxies

This project uses Geonode proxies for web scraping. You can use my Geonode affiliate link to get started with their proxy services.
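
For illustration, the PROXY_* values from the .env file could be combined into a proxy URL and passed to an HTTP client along the following lines. This is a sketch only; the project's actual request code may assemble and use the proxy differently.

    # Sketch of building a proxy URL from the .env values and using it
    # with requests (illustrative; the real request code may differ).
    import os
    import requests
    from dotenv import load_dotenv

    load_dotenv()

    proxy_url = (
        f"{os.getenv('PROXY_PROTOCOL', 'http')}://"
        f"{os.getenv('PROXY_USERNAME')}:{os.getenv('PROXY_PASSWORD')}@"
        f"{os.getenv('PROXY_URL')}:{os.getenv('PROXY_PORT')}"
    )
    proxies = {"http": proxy_url, "https": proxy_url}

    response = requests.get("https://example.com", proxies=proxies, timeout=30)
    print(response.status_code)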

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Essa Mamdani - essamamdani.com

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
