Article download stopped working #9

Closed
irgendwienet opened this issue Jul 2, 2017 · 53 comments

@irgendwienet

A few days ago the article download just stopped working.

The plugin will login and fetch the list of articles since I can see it processing all articles in the calibre jobs window.
But it will not download any article and just produces an empty book in calibre with a size of < 0.1 MB.

The logfile is full of entries like

EOS 6D Mark II: Canon stellt neue Spiegelreflexkamera vor from The latest
https://getpocket.com/a/read/1802030952
Traceback (most recent call last):
  File "site-packages\calibre\utils\threadpool.py", line 100, in run
  File "site-packages\calibre\web\feeds\news.py", line 1148, in fetch_obfuscated_article
  File "<string>", line 191, in get_obfuscated_article
  File "<string>", line 176, in get_textview
  File "re.py", line 146, in search
TypeError: expected string or buffer

Maybe Pocket has started hating us (or has just changed something); see the comment in the code: "This function will break when pocket hates us".

Unfortunately I've got no python dev environment available to go deeper into it.

@alvaroreig
Collaborator

I can confirm the issue:
1% Starting download [5 thread(s)]...
Failed to download article: Machine learning como ventaja competitiva from https://getpocket.com/a/read/1804699537
2% Article download failed: Machine learning como ventaja competitiva
Failed to download article: Qué es la autofagia y qué puedes tomar durante un ayuno from https://getpocket.com/a/read/1804789356
4% Article download failed: Qué es la autofagia y qué puedes tomar durante un ayuno
Failed to download article: Just because you can doesn't mean you should from https://getpocket.com/a/read/1805644174
5% Article download failed: Just because you can doesn't mean you should
Failed to download article: El fin de Terra from https://getpocket.com/a/read/1805693620
7% Article download failed: El fin de Terra
Failed to download article: Propuesta de mejora en la función pública Española from https://getpocket.com/a/read/1806643733
8% Article download failed: Propuesta de mejora en la función pública Española
Failed to download article: Cultura pop bajo la bota del franquismo from https://getpocket.com/a/read/1804561122
10% Article download failed: Cultura pop bajo la bota del franquismo
Failed to download article: Your Bob Dylan story from https://getpocket.com/a/read/1804567774
I also receive the empty ebook.

Marcin, I do not have any experience with python and calibre, but let me know if I can help you somehow.

Thanks a lot, regards,

@mmagnus
Owner

mmagnus commented Jul 3, 2017

Thanks, I will look into that in the evening. I hope we can fix it asap.

@mmagnus
Owner

mmagnus commented Jul 3, 2017

Ok, confirmed:

Failed to download article: H2O.ai on Twitter from https://getpocket.com/a/read/1802691317
Traceback (most recent call last):
  File "site-packages/calibre/utils/threadpool.py", line 100, in run
  File "site-packages/calibre/web/feeds/news.py", line 1148, in fetch_obfuscated_article
  File "<string>", line 192, in get_obfuscated_article
  File "<string>", line 177, in get_textview
  File "lib/python2.7/re.py", line 146, in search
TypeError: expected string or buffer

Let's see what I can do...

@dramalho

dramalho commented Jul 5, 2017

Any clues? Enquiring and very appreciative minds want to know :)

@mmagnus
Owner

mmagnus commented Jul 5, 2017

Not yet. I guess Pocket hates us right now... I'll try again today. :(

@dramalho

dramalho commented Jul 5, 2017

I sort of tried the article URL + the ajax postfix (which is how I think you get the articles?), but only in the browser, which just redirects back, so not much help there :) .

@dramalho

dramalho commented Jul 5, 2017

I think they hate everybody, the calibre stock recipe also fails :)

@dramalho

dramalho commented Jul 5, 2017

anyhoo, thank you Marcin :)

@boguszk

boguszk commented Jul 5, 2017

Hi, I can share what I've found so far; maybe it will help somehow.
fc_id = re.search(r"formCheck = \'([\d\w]+)\';", fc_tag).group(1)
is broken because
fc_tag = soup.find('script', text=re.compile("formCheck"))
returns None.
It returns None because in
soup = self.index_to_soup(url)
we don't get a nice article page with var formCheck waiting to be parsed,
only the getpocket login page.
For some reason, get_browser() can no longer log in to Pocket successfully.
reCAPTCHA, maybe?
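
A minimal sketch of guarding against the failure described above, assuming the page HTML is available as a string. The helper name `extract_form_check` and the two sample pages are hypothetical; the regular expression is the one quoted above. Checking for a missing match avoids the `TypeError`, although it does not fix the underlying login problem.

```python
import re

# Pattern quoted in the comment above.
FORM_CHECK_RE = re.compile(r"formCheck = \'([\d\w]+)\';")


def extract_form_check(html):
    """Return the formCheck token found in a page, or None if it is absent.

    Hypothetical helper: a login page has no formCheck variable, so the
    caller receives None instead of an exception.
    """
    match = FORM_CHECK_RE.search(html or "")
    return match.group(1) if match else None


# Made-up sample pages, for illustration only.
article_page = "<script>var formCheck = 'abc123';</script>"
login_page = "<form action='/login'></form>"

print(extract_form_check(article_page))
print(extract_form_check(login_page))
```

A caller could treat a `None` result as "we were served the login page" and log that explicitly instead of failing inside `re.search`.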

I also looked at how the current article download method was created.

Maybe we could switch to using only API methods, replacing the textview call (which Pocket's v3 API forbids), and download full or otherwise parsed articles instead?

@mmagnus
Owner

mmagnus commented Jul 7, 2017

Not successful yet in fixing. Maybe next week ;( ...

@mgreen3

mgreen3 commented Jul 10, 2017

Thanks for all the work Marcin, keep us updated! Really get good use out of and depend on this plugin :D

@mmagnus mmagnus reopened this Jul 13, 2017
@mmagnus
Owner

mmagnus commented Jul 13, 2017

@boguszk It seems that we can log in, because I get the names of my articles. I was able to get fc_id, but I still could not get it to work. I pushed the code with some attempts to solve it. Maybe someone can take it from here. I have to read more to fix it.

	fc_id dd6ccf03f4999b393ef511be8a9047cdddce6bfe29792314c0a5de35caf59504
	('ai', u'936379328')
	('data', 'itemId=936379328&form_check=dd6ccf03f4999b393ef511be8a9047cdddce6bfe29792314c0a5de35caf59504')
	https://getpocket.com/a/x/getArticle.php
	itemId=936379328&form_check=dd6ccf03f4999b393ef511be8a9047cdddce6bfe29792314c0a5de35caf59504
	{'p3p': 'policyref="/w3c/p3p.xml", CP="ALL CURa ADMa DEVa OUR IND UNI COM NAV INT STA PRE"', 'date': 'Thu, 13 Jul 2017 16:33:19 GMT', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'content-type': 'application/json', 'x-frame-options': 'SAMEORIGIN', 'server': 'Apache', 'connection': 'keep-alive', 'set-cookie': 'PHPSESSID=jc4n8akaf2312iv1rah066k7l2; path=/', 'status': '200', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'content-length': '22'}
	{"status":0,"error":1}
	<Response [200]>
	Failed to download article: *args and **kwargs in python explained from https://getpocket.com/a/read/936379328
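
The output above shows an HTTP 200 response whose JSON body still reports a failure. A minimal sketch of checking the body rather than the status code follows; the helper name `pocket_call_failed` is hypothetical, and treating `status` 0 or a truthy `error` field as failure is an assumption, since the thread never shows a successful response.

```python
import json


def pocket_call_failed(body):
    """Return True when a getArticle.php response body signals failure.

    Assumption: status 0 or a truthy "error" field means failure.
    """
    payload = json.loads(body)
    return payload.get("status") == 0 or bool(payload.get("error"))


# Body taken from the debug output above.
print(pocket_call_failed('{"status":0,"error":1}'))
```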

@mmagnus
Owner

mmagnus commented Jul 18, 2017

It's interesting that https://www.crofflr.com works. It could serve as a temporary workaround until we find a fix for this plugin.

@mmagnus
Owner

mmagnus commented Jul 28, 2017

Hey, I'm so sorry. I have a super hard deadline for my PhD at the beginning of September and I can't work on fixing this plugin. Maybe some of you are able to look at it. If not, maybe we can reach out to the original team and ask them for help. Sorry, this is how it looks right now.

@alvaroreig
Collaborator

@mmagnus best of luck with your PhD, and many thanks! I will give it a try.

@alvaroreig
Collaborator

alvaroreig commented Jul 29, 2017

(disclaimer: python & calibre rookie)

I am where @mmagnus was some days ago:

It looks like the formCheck value is (now) located in the original_url_container:

<li class="original_url_container"><a class="original_url" href="https://getpocket.com/redirect?url=http%3A%2F%2Fwww.jotdown.es%2F2017%2F07%2Fnostalgia-remasterizada-recuerdos-alta-definicion%2F&amp;**formCheck=923af088bcb279dd1c3820c502e694c2**" rel=" noopener noreferrer" target="_blank" title="Ver original">jotdown.es</a></li>

However, I still can't get to https://getpocket.com/a/x/getArticle.php?itemId=1837879542&formCheck=923af088bcb279dd1c3820c502e694c2

(where 1837879542 = item id)

I still get {"status":0}

Despite being logged into pocket (I am testing from chrome before anything)

We could contact Pocket. It is true that we are using web scraping, but they already know that: https://getpocket.com/apps/ebooks/
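
A minimal sketch of pulling the formCheck value out of the `original_url` link quoted above, assuming the `href` attribute has already been extracted from the page. The helper name `form_check_from_href` is hypothetical; as this comment notes, having the token was not enough to make getArticle.php respond.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit


def form_check_from_href(href):
    """Return the formCheck query parameter of a redirect link, or None.

    Hypothetical helper: the href is HTML-unescaped first, because the
    attribute in the page source uses &amp; between parameters.
    """
    query = urlsplit(unescape(href)).query
    values = parse_qs(query).get("formCheck")
    return values[0] if values else None


# href from the snippet above, without the emphasis markers.
href = (
    "https://getpocket.com/redirect?url=http%3A%2F%2Fwww.jotdown.es%2F2017"
    "%2F07%2Fnostalgia-remasterizada-recuerdos-alta-definicion%2F"
    "&amp;formCheck=923af088bcb279dd1c3820c502e694c2"
)
print(form_check_from_href(href))
```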

@spinda

spinda commented Aug 16, 2017

It appears the problem is that the script doesn't have the right cookies set to send to getArticle.php... because the login is failing... because they've added a reCAPTCHA challenge to the login form :(

@victorvogelpoel

Pity, just now I wanted to convert my year's supply of Pocket articles to ePUB for reading on an e-reader during vacation...

@Podesta

Podesta commented Aug 23, 2017

For those of you who know what you are doing (which is not my case): you can contact the Pocket dev team directly and ask for their help at [email protected].

@spinda

spinda commented Aug 23, 2017

Personally I've rigged up a version of this which takes an API key with access to Pocket's protected GetText API. If I have the time I'll polish it up and publish it somewhere.

@caglorithm

@spinda let us please have it as well!

@mmagnus
Owner

mmagnus commented Aug 24, 2017

@spinda can you make a pull request or put it up anywhere? I still have two hardcore weeks of PhD writing... I can't do anything before then ;(

@dannyfile

Hey, @spinda. It's been a long time. Maybe you could share your fix with us?

@bluesodium

A fix for this would be great..

@mmagnus
Owner

mmagnus commented Oct 11, 2017

@spinda can you help us? I just submitted my thesis ;-) I should have more time for reading, and I would love to have this plugin working! :-)

@alvaroreig
Collaborator

Congrats @mmagnus!

@caglorithm

caglorithm commented Nov 15, 2017 via email

@rayslava

The same question about the environment. Maybe it's possible to "cut" that basic browser functionality out of Calibre into a separate "browser host" for debugging purposes?

@alvaroreig
Collaborator

@jonathancardonarojas I know how you feel, I used to run this software every night, and even coded a Docker wrapper around it: https://github.com/alvaroreig/pocket2kindle

However, as @mmagnus noted, https://www.crofflr.com still works. It is not as good (all the items are together, you can't split by tag, the images are worse, etc.) but you can still read your news.

Regards,

@vucalur

vucalur commented Nov 15, 2017

To all of you seeking a competitive fallback,
take a second look at Instapaper's recipe.

You can have your content broken down into categories too:
Simply add links to your Instapaper's folders to the feeds list.

@jonathancardonarojas, apart from breaking down by tags and marking as read,
what is it about Pocket's recipe that you find missing in Instapaper's one ?

@spinda Could you please share your recipe ?
As you can see from Instapaper's recipe, it doesn't have to be long and have all the bells and whistles.
Just downloading articles. That's all.

Off-topic:
At the end it all comes down to the parsing capabilities of a service (Instapaper vs. Pocket). The recipe is just a tiny little tool.
2 years ago both services had troubles with parsing some web content.
I experimented a bit and found out that Instapaper does a better job for the content I tend to read.
To this date, it still has problems with, say, 15% of articles I tend to read.
For those 15% I fire up Pocket's recipe.
Therefore, even with a decent fallback in place, I would greatly appreciate having this recipe fixed.
Besides, it seems that parsing in Pocket has improved since I ditched it for Instapaper 2 years ago. Using just one service to keep track of one's reading, instead of bouncing around between two would be great.

@vucalur

vucalur commented Nov 15, 2017

Oh, I forgot to mention pagination. Yeah, you must provide explicit links to all of the pages of a folder.
However, it's not a big deal with the following tips. Besides, you have better control over what's being fetched and what's not.

pro-tip 1

feeds is an array. An array ( […, …, …] ) of tuples ((…, …, …)), to be exact.
The code in the repo is a bit unfortunate for demonstrational purposes, since the array contains only one element - only one folder.
Instead of fetching each folder, and each page of a folder in a separate recipe, you can do all at once:

    feeds = [
            (u'Unread p1', u'https://www.instapaper.com/u/1'),
            (u'Unread p2', u'https://www.instapaper.com/u/2'),
            (u'Unread p3', u'https://www.instapaper.com/u/3'),
            (u'Unread p4', u'https://www.instapaper.com/u/4'),
#            (u'Instapaper Starred', u'https://www.instapaper.com/starred'),
            (u'Philosophy', u'https://www.instapaper.com/u/folder/9327481/philosophy'),
            (u'RNA', u'https://www.instapaper.com/u/folder/8935324/rna'),
            (u'OSX', u'https://www.instapaper.com/u/folder/8374045/osx')
    ]

The end result on a Kindle will be exactly the same as with the Pocket recipe.

pro-tip 2

Try hitting the URL of an arbitrarily high page number in your Instapaper folder: https://www.instapaper.com/u/12345. Instead of returning an HTTP error code (e.g. 404), Instapaper returns a page with no articles. That's why the recipe won't fail if you over-declare folder pages.
In the example above: what if you have only 2 pages in unread, not 4? Nothing bad happens; the fetch will complete successfully, but the resulting ebook will have some empty sections.

This lets you keep some extra pages in the config for folders that easily gather loads of articles, so you don't have to check the config on every single run.

For non-python folks

  • everything after # is a comment, until the end of the line.
    In the example, the Starred folder is commented out for easy activation, should there be a need in the future.

  • u-prefixed strings (u'blah ąłżó') are strings that support unicode.
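
A minimal sketch of building the paginated part of the `feeds` list from pro-tip 1 programmatically rather than by hand. The helper name `unread_pages` is hypothetical; the URL pattern and the folder entry are the ones from the example above.

```python
def unread_pages(count):
    """Return (title, url) feed tuples for the first `count` unread pages.

    Hypothetical helper following the URL pattern used in pro-tip 1.
    """
    return [
        (u'Unread p%d' % page, u'https://www.instapaper.com/u/%d' % page)
        for page in range(1, count + 1)
    ]


# Over-declaring pages is harmless according to pro-tip 2.
feeds = unread_pages(4) + [
    (u'Philosophy', u'https://www.instapaper.com/u/folder/9327481/philosophy'),
]

for title, url in feeds:
    print(title, url)
```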

@vucalur

vucalur commented Nov 15, 2017

  • Apart from the recipe, there's also a download-to-Kindle option built into Instapaper itself.
    I use the recipe anyway for historical reasons.
  • This does not mean that the recipe for Pocket can be ditched. As I've mentioned, proper parsing, or the lack thereof, is what distinguishes the two services.

@caglorithm

caglorithm commented Nov 15, 2017 via email

@mmagnus
Owner

mmagnus commented Nov 24, 2017

I'm also trying to fix it, and trying to learn what I can from other tools. This one works: https://github.com/rakanalh/pocket-cli

@mmagnus mmagnus closed this as completed Nov 24, 2017
@mmagnus mmagnus reopened this Nov 24, 2017
@Podesta

Podesta commented Nov 24, 2017

Hey @mmagnus, I know I've said it already, but a couple of months ago I contacted Pocket support about it, and they said I could talk to the devs directly at [email protected]. From experience, they tend to be quite friendly and helpful.

@mmagnus
Owner

mmagnus commented Nov 24, 2017

@Podesta Thanks. I can contact them, let's see if they can help :-)

@mmagnus
Owner

mmagnus commented Nov 24, 2017

Dear @rayslava @caglorithm, I don't know if you have seen this already, but I just discovered a way to debug the plugin:

[mm] custom_recipes$ ebook-convert Pocket__1004.recipe .epub --password XXXXX --username XXXXXX -vvv --debug-pipeline debug
Conversion options changed from defaults:
  debug_pipeline: u'debug'
  verbose: 3
  test: None
Resolved conversion options
calibre version: 3.12.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
...
{"status":0,"error":1}
<Response [200]>
Failed to download article: Jak to możliwe, że Chomikuj.pl nie jest jeszcze zamknięte? from https://getpocket.com/a/read/169221044

I see the title, so something works. I'm trying to fix parsing.
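
A minimal sketch of assembling the debug command shown above as an argument list, for anyone who wants to script repeated runs. The helper name `debug_command` and its parameters are hypothetical; the flags are the ones used in the command above.

```python
def debug_command(recipe, username, password, debug_dir="debug"):
    """Build the ebook-convert argument list used for debugging a recipe.

    Hypothetical helper: it only assembles the arguments and runs nothing.
    """
    return [
        "ebook-convert", recipe, ".epub",
        "--username", username,
        "--password", password,
        "-vvv",
        "--debug-pipeline", debug_dir,
    ]


# Placeholder credentials, as in the command above.
print(debug_command("Pocket__1004.recipe", "XXXXXX", "XXXXX"))
```

The resulting list can be passed to `subprocess.run` on a machine where calibre's command-line tools are installed.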

@rayslava

@mmagnus, great! Thanks. I think I'll have some time next week to dive into the issue.

@saschalalala

saschalalala commented Nov 26, 2017

I tried to fiddle around with it, and after getting some information about what went wrong, I'm quite sure that you are going to have to implement https://github.com/rakanalh/pocket-api if you want to make it work again. It seems to me that accessing the API without OAuth is not supported anymore.

Edit: For me it looks like there is no way to access the articles themselves via API. The docs say

The Pocket Article View API is currently only open to partners that are integrating Pocket specific features or full-fledged Pocket clients. For example, building a Pocket client for X platform.
If you are looking for a general text parser or to provide "read now" functionality in your app - we do not currently support that. There are other companies/products that provide that type of API, for example: Diffbot. (See https://getpocket.com/developer/docs/v3/article-view)

and the pocket-cli mentioned above also only retrieves metadata and can open the articles in a web browser using the URL of the article, which is always https://getpocket.com/a/read/<article_id>

Another edit for everyone who wants to debug and edit this plugin a little better than directly inside Calibre: You can open the recipes with your preferred editor. On macOS they are stored in ~/Library/Preferences/calibre/custom_recipes/Pocket__1002.recipe. I changed some things inside Visual Studio Code, saved the file, restarted the Job (inside Calibre) and looked at the Job logs afterwards (also inside Calibre).

Another Edit: The following method still seems to work. Could be used as a temporary workaround for anyone who depends on this plugin: https://www.reddit.com/r/kindle/comments/1wcznt/way_to_import_articles_from_pocket_to_kindle/chmnall/ Note: Makes your rss feed publicly available
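
A minimal sketch of locating the custom recipes directory for editing outside Calibre. The helper name `custom_recipes_dir` is hypothetical. The macOS path is the one given above, the other path is the one reported later in this thread (`~/.config/calibre/custom_recipes`) and is assumed here to apply to Linux; Windows is not covered.

```python
import posixpath


def custom_recipes_dir(platform, home):
    """Return the custom_recipes directory reported in this thread.

    Hypothetical helper: `platform` is a sys.platform-style string and
    `home` is the user's home directory.
    """
    if platform == "darwin":
        return posixpath.join(
            home, "Library", "Preferences", "calibre", "custom_recipes")
    # Assumption: other Unix-like systems use the ~/.config location.
    return posixpath.join(home, ".config", "calibre", "custom_recipes")


print(custom_recipes_dir("darwin", "/Users/me"))
print(custom_recipes_dir("linux", "/home/me"))
```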

@Danilka

Danilka commented Jul 5, 2018

Has anyone looked at this? It's still not working :-(

@dlo9

dlo9 commented Jul 27, 2019

For anyone still interested, I've recently created a replacement here to work with Pocket's v3 API. I also have an RSS version here that allows use of RSS feeds as discussed above without making your feeds public.

@dagomar

dagomar commented Jul 27, 2019

Story time. After 5+ years of not using my Kindle I decided to do a digital cleanse this holiday and just bring my Kindle for reading. It didn't turn on. Then this morning, after reading some comments on an ifixit topic, I got it to load again and eventually it turned on. Awesome! So, still this morning, I installed Calibre, went through the ol' reading list and figured I'd like to put my extensive Pocket list onto my Kindle. By now you can guess where this is going: it didn't work. So this afternoon, hours later, I decided to look one more time and found the Pocket Plus plugin. Yet again a no-no. So, I'm thinking, let's see if anyone has experienced this issue as well and found some little hack or something. Then I find dlo9's comment and can't believe my eyes. Perfect timing, sir!

Thanks!

@Monirzadeh

I can't add that to calibre 4.7

@mmagnus
Owner

mmagnus commented Jan 2, 2020

For anyone still interested, I've recently created a replacement here to work with Pocket's v3 API. I also have an RSS version here that allows use of RSS feeds as discussed above without making your feeds public.

wow, I have missed this. YEAH! This works, I will try to merge it into my plugin if I can. Thanks @dlo9 amazing work!

To authorize you might just click on this:

https://getpocket.com/auth/authorize?request_token=88ac5a8d-f42c-16ae-ff93-3fde82&redirect_uri=https://calibre-ebook.com/

in the plugin, it gets buried into a log, at least in my case.
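
A minimal sketch of building an authorization link of the shape shown above, so it can be printed prominently instead of being buried in a log. The helper name `authorize_url` and the placeholder token are hypothetical; obtaining the request token from Pocket is outside the scope of this sketch. Unlike the link above, `urlencode` percent-encodes the redirect URI.

```python
from urllib.parse import urlencode

# Endpoint taken from the link above.
AUTHORIZE_ENDPOINT = "https://getpocket.com/auth/authorize"


def authorize_url(request_token, redirect_uri="https://calibre-ebook.com/"):
    """Return the URL a user opens in a browser to approve a request token.

    Hypothetical helper; the request token must be obtained beforehand.
    """
    query = urlencode({
        "request_token": request_token,
        "redirect_uri": redirect_uri,
    })
    return AUTHORIZE_ENDPOINT + "?" + query


# "TOKEN" is a placeholder, not a real request token.
print(authorize_url("TOKEN"))
```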

@mmagnus
Owner

mmagnus commented Jan 2, 2020

I can't add that to calibre 4.7

Can you elaborate, @Monirzadeh? It works for me, and I do have Calibre 4.7.

@mmagnus mmagnus changed the title Article download stopped working Article download stopped working -- [some dirty solution] Jan 2, 2020
@mmagnus mmagnus changed the title Article download stopped working -- [some dirty solution] Article download stopped working -- [some solution to get Pocket running] Jan 2, 2020
@Monirzadeh

Monirzadeh commented Jan 3, 2020

I can't add that to calibre 4.7

Can you elaborate, @Monirzadeh? It works for me, and I do have Calibre 4.7.

I updated to calibre 4.8 and now I can add it to my sources,
but it downloads nothing.
It tries to fetch the articles, but all of them fail.
UPDATE 1: I get a lot of this error:

Traceback (most recent call last):
  File "site-packages/calibre/utils/threadpool.py", line 102, in run
  File "site-packages/calibre/web/feeds/news.py", line 1151, in fetch_obfuscated_article
  File "<string>", line 236, in get_obfuscated_article
  File "<string>", line 198, in get_textview
UnboundLocalError: local variable 'fc_id' referenced before assignment
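
A minimal sketch of how an `UnboundLocalError` like the one above can arise, assuming `fc_id` is only assigned when the formCheck pattern matches. The recipe's actual code is not shown in this thread, so both functions below are hypothetical illustrations; the regular expression is the one quoted earlier in the thread.

```python
import re

FORM_CHECK_RE = re.compile(r"formCheck = \'([\d\w]+)\';")


def get_fc_id_fragile(page):
    """Assign fc_id only on a match, mirroring the suspected bug."""
    match = FORM_CHECK_RE.search(page)
    if match:
        fc_id = match.group(1)
    return fc_id  # raises UnboundLocalError when nothing matched


def get_fc_id_safe(page):
    """Initialise fc_id first so a missing token yields None."""
    fc_id = None
    match = FORM_CHECK_RE.search(page)
    if match:
        fc_id = match.group(1)
    return fc_id


# A page without the token, such as a login page.
print(get_fc_id_safe("<form action='/login'></form>"))
```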

@alvaroreig
Collaborator

Hi there @Monirzadeh @mmagnus @dlo9

I can confirm that the fix (as merged yesterday by @mmagnus ) works for me in calibre 4.8. I had to manually create the ~/.config/calibre/custom_recipes directory though.

I am missing the old sort by tags functionality. Do you think you will be able to get it back @mmagnus?

Despite that, I think that this fix provides a better ebook than using crofflr, which has been my plan B all this time. Both list every article under a single section ("All articles"), but crofflr only manages to download the first image in every article, while this version does seem to provide every image for all articles.

Thanks a lot!

@mmagnus
Owner

mmagnus commented Jan 5, 2020

@alvaroreig ;-) of course. Fortunately, it was easy to hack into getting tagged articles. It works on my computer, I will try to push changes today, in a few hours! :-) I'm super excited about it. Thanks, @dlo9 for sharing your code!

screenshot_2020_01_04T22_19_37+0100
Fig. These are my articles for the "invest" tag, on investing etc.

@mmagnus mmagnus closed this as completed in b9c2ae3 Jan 5, 2020
@mmagnus mmagnus changed the title Article download stopped working -- [some solution to get Pocket running] Article download stopped working Jan 5, 2020
@blodt

blodt commented May 3, 2020

For anyone still interested, I've recently created a replacement here to work with Pocket's v3 API. I also have an RSS version here that allows use of RSS feeds as discussed above without making your feeds public.

Thank you SO much for making this!! Just installed it and tested it and it seems to be working.
Perfect for pulling down and then transferring long articles to my Nook.

I really appreciate it - thank you!
