Usage

example-1.org

...
** [[https://github.com/xeroxcat/org-scrape]] 

...

Running org-scrape-link with the cursor on the link above replaces the link with a copy of the page (fetched with the Python requests library, parsed with BeautifulSoup, and converted to org syntax with pandoc):

...
** GitHub - xeroxcat/org-scrape: scrape contents of a webpage and convert to org-mode markup language, adds as subtree to org document
[[#start-of-content][Skip to content]]
[[/join?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F%3Cuser-name%3E%2F%3Crepo-name%3E&source=header-repo][Sign up]]
- Why GitHub?
  [[/features][Features →]]
  - [[/features/code-review/][Code review]]
  - [[/features/project-management/][Project management]]
  - [[/features/integrations][Integrations]]
...

example-2.org

...
** [[https://github.com/xeroxcat/org-scrape][div#readme]]


...

When the link has a description, the description is interpreted as a CSS selector, and only the first matching element is formatted and inserted.

...
** GitHub - xeroxcat/org-scrape: scrape contents of a webpage and convert to org-mode markup language, adds as subtree to org document: div#readme
**** README.org
     :PROPERTIES:
     :CUSTOM_ID: readme.org
     :CLASS:    Box-title pr-3
     :END:
*** [[#usage][]]Usage
    :PROPERTIES:
    :CUSTOM_ID: usage
    :END:
 =example-1.org=
     ...
...
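
The two examples differ only in whether the link carries a description. As a rough sketch of that rule (not the code in org-scrape.el or scrape.py), the following Python splits an org link into its URL and optional selector and builds the matching scrape.py argument list. The names parse_org_link and scrape_command are hypothetical, and the regular expression assumes the description contains no square brackets.

```python
import re

# Matches an org link: [[target]] or [[target][description]].
# Assumption: neither part contains a literal "]" character.
ORG_LINK = re.compile(r"\[\[([^\]]+)\](?:\[([^\]]*)\])?\]")


def parse_org_link(text):
    """Return (url, selector) for the first org link in text, or None.

    selector is None when the link has no description, which
    corresponds to scraping the whole page.
    """
    match = ORG_LINK.search(text)
    if match is None:
        return None
    url, description = match.group(1), match.group(2)
    return url, (description or None)


def scrape_command(url, selector=None, script="scrape.py"):
    """Build an argument list following the documented usage line.

    script is a placeholder path; point it at the stored scrape.py.
    """
    command = [script, url]
    if selector:
        # The link description is passed on as the -e element option.
        command += ["-e", selector]
    return command


if __name__ == "__main__":
    heading = "** [[https://github.com/xeroxcat/org-scrape][div#readme]]"
    url, selector = parse_org_link(heading)
    print(scrape_command(url, selector))
```

The resulting list could be handed to subprocess.run, provided scrape.py is executable and its shebang line points at a suitable Python 3 environment.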

Designed for easily capturing formatted text from websites into an org file, with a focus on extracting the same field from a set of identically formatted pages.

Setup

scrape.py

Scrape to org-mode formatted text.

Usage:
  scrape.py <url> [-e element] [-n] [-t]

Options:
  -e element  A CSS select string specifying the element to scrape from
  -n          Don't remove blank lines from output
  -t          Don't remove all org mode <<targets>> generated by pandoc
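
The -n and -t options each switch off a cleanup step that otherwise runs on the converted text. A minimal sketch of what such cleanup could look like is shown below; it is an illustration under assumptions, not the implementation in scrape.py. The function name clean_org, its keyword arguments, and the exact target pattern are hypothetical.

```python
import re

# An org target looks like <<name>>. This pattern is an assumption
# about which anchors should be stripped from the converted text.
ORG_TARGET = re.compile(r"<<[^<>\n]+>>")


def clean_org(text, keep_blank_lines=False, keep_targets=False):
    """Apply the two default cleanup steps to org-formatted text.

    keep_blank_lines mirrors the -n option, keep_targets mirrors -t.
    """
    if not keep_targets:
        # Drop every <<target>> anchor, leaving surrounding text intact.
        text = ORG_TARGET.sub("", text)
    if not keep_blank_lines:
        # Keep only lines that contain something other than whitespace.
        lines = [line for line in text.splitlines() if line.strip()]
        text = "\n".join(lines)
    return text


if __name__ == "__main__":
    sample = "* Title\n\n<<usage>>\n** Usage\n\ntext <<anchor>>here\n"
    print(clean_org(sample))
```

Removing targets before blank lines means a line that held only a target is dropped as well, rather than being left behind as an empty line.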

configuration

Edit the shebang line of scrape.py to point to a Python 3 environment that has the libraries listed in requirements.txt installed.

org-scrape.el

An elisp snippet that defines a function which replaces the link at the cursor with a heading in the current document, rendered by scrape.py.

configuration (spacemacs)

Edit the path to scrape.py in the snippet so that it matches where the script is stored.
Paste the snippet into the body of (with-eval-after-load 'org body) in dotspacemacs/user-config.
