Support Obelisk archiving #353

Open
fmartingr opened this issue Feb 10, 2022 · 12 comments · May be fixed by #481

Comments

@fmartingr
Member

fmartingr commented Feb 10, 2022

It seems that shiori depends on warc which is currently archived. We need to find a replacement for warc. Maybe obelisk?

Acceptance criteria

  • Add a migration that records which archiver type each bookmark's content uses (set warc for already existing rows, but use obelisk as the default)
  • Add logic to allow multiple archivers to be used, do not remove Warc logic, just refactor it.
  • Allow the /bookmark/:id/archive handler to load multiple archive types (to load old and new)
  • Allow the POST /api/v1/bookmarks, POST /api/v1/bookmarks/cache, and POST /api/v1/bookmarks/:id/cache endpoints to select which archiver to use (but hardcode/default it to obelisk).
  • Add a documentation page describing the archivers, their available options, and their pros and cons.
  • Determine if different extensions should be used from now on (leave current filename expectations intact)
  • All code logic should be properly tested
  • Swagger documentation should be updated
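
A minimal sketch of how the "multiple archivers" criteria above could fit together, written in Go. Every name here (ArchiverType, Archiver, Registry, Select, stubArchiver) is hypothetical and not taken from shiori's code base; the stub stands in for the real WARC and Obelisk logic. It only illustrates selecting an archiver by type, falling back to obelisk when none is requested.

```go
package main

import (
	"errors"
	"fmt"
)

// ArchiverType identifies the format an archive was written in.
// These identifiers are assumptions made for this sketch.
type ArchiverType string

const (
	ArchiverWARC    ArchiverType = "warc"
	ArchiverObelisk ArchiverType = "obelisk"
	// DefaultArchiver is used when a request does not choose one.
	DefaultArchiver = ArchiverObelisk
)

// Archiver is a hypothetical interface that both implementations would satisfy.
type Archiver interface {
	Type() ArchiverType
	// Archive processes the page at url and returns a short description.
	Archive(url string) (string, error)
}

// stubArchiver stands in for the real WARC and Obelisk implementations.
type stubArchiver struct {
	kind ArchiverType
}

func (s stubArchiver) Type() ArchiverType { return s.kind }

func (s stubArchiver) Archive(url string) (string, error) {
	if url == "" {
		return "", errors.New("empty url")
	}
	return fmt.Sprintf("%s archive of %s", s.kind, url), nil
}

// Registry maps archiver types to their implementations.
type Registry map[ArchiverType]Archiver

// NewRegistry builds a registry keyed by each archiver's own type.
func NewRegistry(archivers ...Archiver) Registry {
	r := make(Registry, len(archivers))
	for _, a := range archivers {
		r[a.Type()] = a
	}
	return r
}

// Select returns the requested archiver, falling back to the default
// when the request leaves the type empty.
func (r Registry) Select(requested ArchiverType) (Archiver, error) {
	if requested == "" {
		requested = DefaultArchiver
	}
	a, ok := r[requested]
	if !ok {
		return nil, fmt.Errorf("unknown archiver %q", requested)
	}
	return a, nil
}

func main() {
	reg := NewRegistry(
		stubArchiver{kind: ArchiverWARC},
		stubArchiver{kind: ArchiverObelisk},
	)

	// A new bookmark with no explicit choice gets the default archiver.
	a, err := reg.Select("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := a.Archive("https://example.com")
	fmt.Println(out)

	// An existing row whose stored type is "warc" still resolves to the old logic.
	old, _ := reg.Select(ArchiverWARC)
	fmt.Println(old.Type())
}
```

With something like this, the archive handler could look up the archiver from the type stored on the bookmark row, while the POST endpoints pass the requested type (or an empty value) to Select.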
@efrecon
Contributor

efrecon commented Feb 18, 2022

obelisk is great, I have just tested the latest release on a few examples and it does a good job at preserving the original layout and content.

@fmartingr
Member Author

I still haven't tested/checked it yet, but the other day I randomly stumbled upon https://github.com/gildas-lormeau/SingleFile and it also seemed quite good (and having just a single HTML file as output is quite useful as well).

@grawlinson

grawlinson commented Feb 19, 2022

I'm in the process of packaging shiori for the AUR, and I strongly recommend staying within the Go ecosystem (obelisk can be imported as a go module!) as relying on external tools (e.g. SingleFile) defeats one of shiori's major selling points.

EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.

@fmartingr
Member Author

fmartingr commented Feb 19, 2022

> I'm in the process of packaging shiori for the AUR, and I strongly recommend staying within the Go ecosystem (obelisk can be imported as a go module!) as relying on external tools (e.g. SingleFile) defeats one of shiori's major selling points.
>
> EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.

Just to clarify (because I didn't express myself very well): I like how SingleFile works (the single HTML file output) but I do not plan to replace warc with it. The plan still is to go for Obelisk. :)

Edit: Yeah, when I made my first comment I didn't realise that Obelisk's output is also a Single HTML file 😅

@grawlinson

Thanks for clarifying that!

A package is now available on the AUR, so if there are any bug reports relating to Arch Linux, tag me and I'll attempt to help out.

@gildas-lormeau

gildas-lormeau commented Oct 13, 2022

> EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.

For the record, this statement is false. SingleFile can work with JSDOM. Anyway, good luck!

@fmartingr
Member Author

> > EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.
>
> For the record, this statement is false. SingleFile can work with JSDOM. Anyway, good luck!

Thanks for the clarification, and even if I love SingleFile (it has helped me a ton while moving out to a new flat!), it would add unnecessary complexity for us. So far, obelisk seems to provide the expected results, and we could use this migration to move that project further in the Go world :)

@gildas-lormeau

Thanks for the feedback! Personally, I think that in 2022, you have to use a web browser for this kind of task. Also, it's really becoming essential when it comes to determining what to really save. This is where SingleFile, generally, stands out. A very large part of the code consists of optimizing the size of the saved page. To do this, a browser is unfortunately required.

@fmartingr fmartingr modified the milestones: 1.6.0, 1.6.1 Jul 4, 2023
@fmartingr fmartingr modified the milestones: 1.6.1, 2.1.0 Jul 21, 2023
@fmartingr fmartingr changed the title from "Replace Warc with Obelisk" to "Support Obelisk archiving" Jul 21, 2023
@ivanrg99

What's the status on this? One of the reasons why we choose to run software like Shiori is for archiving purposes, to prevent link-rot and preserve information/knowledge. Having our bookmarks stored in a binary data format as opposed to plain text hurts data preservation. Do you need any help with the transition to Obelisk? Is anyone working on this at the moment?

@Monirzadeh
Collaborator

Monirzadeh commented Jan 18, 2024

Personally, I'm trying to get it ready to use later. Currently I'm working on go-shiori/obelisk#96 and go-shiori/obelisk#98.
We have some open issues there too. You can work on any aspect that you like.

@fmartingr
Member Author

I need to sit down and pave the way for people to start implementing these features. I started a draft under #481 some time ago but haven't sat down with it again since there were other things that had priority, like the API. I guess the API migration will get faster over time while we refactor the logic in different components, but that's still the main priority now.

For this to work, we will need to isolate the archiving logic in its own domain and provide backwards compatibility, which will require a migration adding a new column specifying which archive format a bookmark is currently in.

What I'm trying to say is that it can be done and is on my radar, but it is not trivial. Once 1.6 is released I need to sit down and work on the roadmap again, defining issues for the several things we need to work on and probably making some PRs to prepare for that to happen.

@dehlen

dehlen commented Feb 26, 2024

Hey,

I am eagerly awaiting the work on this issue. I would like to migrate my catalog of bookmarks saved in Instapaper to shiori and self-host it on my local network. However, what is holding me back is that the current implementation stores the archived bookmark in a bolt database. I am now wondering whether I should wait for obelisk support in shiori or whether it makes sense to migrate right away. I do not want to import all my bookmarks again whenever obelisk is added, and I am wondering how likely it is that there will be a migration path for converting previously archived bookmarks from the bolt db to HTML output created by obelisk.

As I understand it, this is definitely on your radar but it's just something you haven't found time to look at yet. My comment shouldn't pressure you in any way; it's more of a +1 for this feature and a way to be subscribed to the ongoing discussion. Whenever you have new information regarding this issue, I am very keen to hear it :)

Projects
Status: To do

7 participants