Commit

#10 Fix small errors
Createdd committed Jul 23, 2018
1 parent 73bc71f commit b20e483
Showing 1 changed file with 2 additions and 7 deletions.
9 changes: 2 additions & 7 deletions README.md
@@ -18,11 +18,6 @@ For example, for [arxiv.org](https://arxiv.org/help/api/index).

### Potential tools to implement

## Web Miner

This repository deploys a web spider and documenr miner, that given a specific set of sources (URLs), should locate new documents (web-pages) and save them in the DB for future processing.
When possible, in websites that allow, an API can be used. For example, for [arxiv.org](https://arxiv.org/help/api/index).

We lean heavily on existing tools as well as developing our own new methods.

- [scrapy](https://scrapy.org/) which later we hope to host on [scrapy-cloud](https://scrapinghub.com/scrapy-cloud)
@@ -84,9 +79,9 @@ python manage.py server

We follow the [clean architecture style](https://blog.thedigitalcatonline.com/blog/2016/11/14/clean-architectures-in-python-a-step-by-step-example/) and structure the codebase accordingly.

![ceanArchitecture image](https://cdn-images-1.medium.com/max/1600/1*B7LkQDyDqLN3rRSrNYkETA.jpeg)
![cleanArchitecture image](https://cdn-images-1.medium.com/max/1600/1*B7LkQDyDqLN3rRSrNYkETA.jpeg)

*Image creadit to [Thang Chung under MIT terms](https://github.com/thangchung/blog-core)*
*Image credit to [Thang Chung under MIT terms](https://github.com/thangchung/blog-core)*

## Who are we?

