
Web Scraping App

Overview

This application scrapes data from open browser tabs, aggregates it into a local API, and displays it in a dynamic interface with real-time updates and configuration. The aggregated data can also be used to regenerate web pages with AI providers such as Anthropic, OpenAI, Gemini, or Groq.
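
For a concrete picture of the flow, here is a minimal sketch of a client pushing scraped data into the local API and then requesting an AI regeneration. The endpoint paths, payload fields, and port below are illustrative assumptions, not the project's actual API:

    # Illustrative client for the aggregation flow (endpoint names assumed).
    import requests

    API_BASE = "http://localhost:8000"  # default FastAPI port; adjust as needed

    # 1. Push a scraped payload into the local aggregation API.
    scraped = {"url": "https://example.com/live-data", "html": "<html>...</html>"}
    requests.post(f"{API_BASE}/scrape-results", json=scraped, timeout=10)

    # 2. Ask the AI integration to regenerate a page from the aggregated data.
    resp = requests.post(
        f"{API_BASE}/regenerate",
        json={"url": scraped["url"], "model": "openai"},
        timeout=60,
    )
    print(resp.json())  # e.g. regenerated HTML or a summary of the data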

Setup

  1. Clone the repository:

    git clone https://github.com/yourusername/web_scraping_app.git
    cd web_scraping_app
  2. Run the setup script:

    bash scripts/setup.sh
  3. Configure your environment variables:

    • Edit the .env file created from the .env.example template and add your API keys and settings (see the illustrative example below).
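
A filled-in .env might look like the following. The variable names shown here are placeholders; use the ones actually defined in your .env.example:

    # Illustrative values only -- copy .env.example and fill in real keys.
    OPENAI_API_KEY=sk-...
    ANTHROPIC_API_KEY=sk-ant-...
    GEMINI_API_KEY=...
    GROQ_API_KEY=...
    CACHE_TTL=300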

Usage

  1. Start the FastAPI application:

    bash scripts/start_api.sh
  2. Start the Streamlit interface:

    bash scripts/start_interface.sh
  3. Open your browser and navigate to http://localhost:8501 to access the Streamlit interface.

    • Configure scraping parameters and view data in real time.
    • Use the AI integration to regenerate web pages from the scraped data.
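
For orientation, an aggregation endpoint in api/local_api.py could follow the shape below. This is a minimal sketch; the route names and models are assumptions rather than the project's actual code:

    # Minimal FastAPI sketch of an aggregation endpoint (illustrative only).
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    _results: list[dict] = []  # in-memory store for scraped payloads

    class ScrapeResult(BaseModel):
        url: str
        html: str

    @app.post("/scrape-results")
    def add_result(result: ScrapeResult):
        _results.append(result.model_dump())  # Pydantic v2; use .dict() on v1
        return {"count": len(_results)}

    @app.get("/scrape-results")
    def list_results():
        return _results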

Testing

  1. Run the tests:

    python -m unittest discover tests
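
New tests dropped into tests/ are picked up automatically by discover as long as they follow the standard unittest naming convention, for example:

    # tests/test_example.py -- skeleton that unittest discover will pick up.
    import unittest

    class TestScraperBasics(unittest.TestCase):
        def test_title_is_present(self):
            # Stand-in assertion; replace with a call into scraping/scraper.py.
            html = "<html><head><title>Live Data</title></head></html>"
            self.assertIn("<title>Live Data</title>", html)

    if __name__ == "__main__":
        unittest.main()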

Logging

Logs are stored in logs/scraper.log with detailed information about scraping activities.
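
A logger writing to that file can be configured with the standard library; the format string below is illustrative, as the project may configure its handler differently:

    # Sketch of a file logger targeting logs/scraper.log (format illustrative).
    import logging

    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler("logs/scraper.log")  # logs/ must exist
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)

    logger.info("Scraped %s in %.2fs", "https://example.com", 1.23)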

Caching

A simple caching mechanism is implemented to avoid redundant data fetching, with a default TTL (Time to Live) of 300 seconds.
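
The behaviour described above amounts to something like the following sketch; the project's actual cache implementation may differ:

    # Minimal TTL cache matching the 300-second default (illustrative).
    import time

    class TTLCache:
        def __init__(self, ttl: float = 300.0):
            self.ttl = ttl
            self._store = {}  # key -> (expires_at, value)

        def get(self, key):
            entry = self._store.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if time.monotonic() > expires_at:
                del self._store[key]  # expired: evict and report a miss
                return None
            return value

        def set(self, key, value):
            self._store[key] = (time.monotonic() + self.ttl, value)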

File and Folder Structure

macscrape/
├── ai_integration/
│   ├── __init__.py
│   ├── ai_regenerator.py
│   ├── openai.py
│   ├── anthropic.py
│   ├── claude_opus.py
│   ├── sonnet.py
├── api/
│   ├── __init__.py
│   ├── local_api.py
├── config/
│   ├── __init__.py
│   ├── config_manager.py
├── dynamic_interface/
│   ├── __init__.py
│   ├── app.py
│   ├── static/
│   ├── templates/
│   │   ├── __init__.py
│   │   ├── base.html
│   │   ├── index.html
├── logs/
│   ├── scraper.log
├── scraping/
│   ├── __init__.py
│   ├── scraper.py
│   ├── config_manager.py
├── scripts/
│   ├── __init__.py
│   ├── check_env.py
│   ├── setup.py
│   ├── start.py
│   ├── start_api.py
│   ├── start_interface.py
│   ├── websocket_server.py
├── tests/
│   ├── __init__.py
│   ├── test_scraper.py
├── utils/
│   ├── __init__.py
│   ├── helpers.py
├── venv/
├── .env
├── .env.example
├── main.py
├── README.md
├── requirements.txt

About

The idea of the project is to add URLs, typically ones serving live data, to a watch list. The scrapers perform an initial scrape of each site and send the result, along with a prompt, to an AI model, which regenerates a page that displays charts or explains the retrieved data. The scrapers then keep running and push live scraped data to the AI-generated page.
