Skip to content

Lubmminy/polymath

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Polymath 🕷️

Polymath comes from the Greek πολυμαθής and means "having learned much".

Here, it is a robot that will analyze the Web pages (Crawler) and that will bring a mass of knowledge to the world. Billions of sources will be analyzed to redistribute all the knowledge to the users!

Architecture

flowchart TD
    subgraph "Crawl Webpage"
        Y[Web Server] -->|Produce| Z(Kafka Topic)
    end

    subgraph "Polymath Consumer"
        Z -->|Consume| B(Crawling Consumer)
    end

    subgraph "Crawling Consumer"
        B --> C[(Crawling Queue)]
        C -->|Crawling| D[Fetch Page]
        D -->|Queuing| C
    end

    subgraph "Data Processing"
        D --> E[Data Processing]
        E --> F[Image Processing]
        E --> G[PDF Processing]
        E --> H[HTML Processing]
    end

    subgraph "Indexing"
        H -->|Indexing| J[(Apache Solr)]
        G -->|Indexing| J
    end
Loading

Feature highlights

License

This project is Licensed under Apache License, Version 2.0.