Article download stopped working #9

irgendwienet · 2017-07-02T20:11:22Z

A few days ago the article download just stopped working.

The plugin logs in and fetches the list of articles (I can see it processing all of them in the calibre jobs window).
But it does not download any article and just produces an empty book in calibre with a size of < 0.1 MB.

The logfile is full of entries like

EOS 6D Mark II: Canon stellt neue Spiegelreflexkamera vor from The latest
https://getpocket.com/a/read/1802030952
Traceback (most recent call last):
  File "site-packages\calibre\utils\threadpool.py", line 100, in run
  File "site-packages\calibre\web\feeds\news.py", line 1148, in fetch_obfuscated_article
  File "<string>", line 191, in get_obfuscated_article
  File "<string>", line 176, in get_textview
  File "re.py", line 146, in search
TypeError: expected string or buffer

Maybe Pocket has started hating us (or just changed something); see the comment in the code: "This function will break when pocket hates us".

Unfortunately I've got no python dev environment available to go deeper into it.


alvaroreig · 2017-07-03T08:30:22Z

I can confirm the issue:
1% Starting download [5 thread(s)]...
Failed to download article: Machine learning como ventaja competitiva from https://getpocket.com/a/read/1804699537
2% Article download failed: Machine learning como ventaja competitiva
Failed to download article: Qué es la autofagia y qué puedes tomar durante un ayuno from https://getpocket.com/a/read/1804789356
4% Article download failed: Qué es la autofagia y qué puedes tomar durante un ayuno
Failed to download article: Just because you can doesn't mean you should from https://getpocket.com/a/read/1805644174
5% Article download failed: Just because you can doesn't mean you should
Failed to download article: El fin de Terra from https://getpocket.com/a/read/1805693620
7% Article download failed: El fin de Terra
Failed to download article: Propuesta de mejora en la función pública Española from https://getpocket.com/a/read/1806643733
8% Article download failed: Propuesta de mejora en la función pública Española
Failed to download article: Cultura pop bajo la bota del franquismo from https://getpocket.com/a/read/1804561122
10% Article download failed: Cultura pop bajo la bota del franquismo
Failed to download article: Your Bob Dylan story from https://getpocket.com/a/read/1804567774
I also receive the empty ebook.

Marcin, I do not have any experience with python and calibre, but let me know if I can help you somehow.

Thanks a lot, regards,

mmagnus · 2017-07-03T08:50:12Z

Thanks, I will look into that in the evening. I hope we can fix it asap.

mmagnus · 2017-07-03T17:32:42Z

Ok, confirmed:

Failed to download article: H2O.ai on Twitter from https://getpocket.com/a/read/1802691317
Traceback (most recent call last):
  File "site-packages/calibre/utils/threadpool.py", line 100, in run
  File "site-packages/calibre/web/feeds/news.py", line 1148, in fetch_obfuscated_article
  File "<string>", line 192, in get_obfuscated_article
  File "<string>", line 177, in get_textview
  File "lib/python2.7/re.py", line 146, in search
TypeError: expected string or buffer

Let's see what I can do...

dramalho · 2017-07-05T16:46:13Z

Any clues? Enquiring and very appreciative minds want to know :)

mmagnus · 2017-07-05T17:01:54Z

Not yet. I guess Pocket hates us right now. I'll try again today. :(

dramalho · 2017-07-05T17:03:14Z

I sort of tried the article URL plus the ajax postfix (which is how I think you get the articles?), but only in the browser, which just redirects back, so not much help there :)

dramalho · 2017-07-05T17:03:31Z

I think they hate everybody, the calibre stock recipe also fails :)

dramalho · 2017-07-05T17:03:53Z

anyhoo, thank you Marcin :)

boguszk · 2017-07-05T19:57:09Z

Hi, I can share what I have found so far; maybe it will help somehow.

    fc_id = re.search(r"formCheck = \'([\d\w]+)\';", fc_tag).group(1)

is broken because

    fc_tag = soup.find('script', text=re.compile("formCheck"))

returns None. It returns None because in

    soup = self.index_to_soup(url)

we don't get the nice article page with the formCheck variable waiting to be parsed, but only the getpocket login page.
For some reason, get_browser() can no longer log in to Pocket successfully.
reCaptcha maybe?
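The "TypeError: expected string or buffer" in the tracebacks above is exactly what re.search raises when it is handed None instead of a string. A minimal sketch of a guard that fails with a readable message instead, assuming the caller passes the text of the matched script tag, or None when soup.find finds nothing. extract_form_check is a hypothetical helper, not a function in the plugin, and the sketch is Python 3 although calibre ran recipes under Python 2.7 at the time:

```python
import re

# Same pattern as the recipe line quoted above, compiled once.
FORM_CHECK_RE = re.compile(r"formCheck = '([\d\w]+)';")


def extract_form_check(script_text):
    """Return the formCheck token found in script_text.

    script_text is assumed to be the text of the <script> tag found with
    soup.find(...), or None when no such tag exists, e.g. because Pocket
    served its login page instead of the article page.
    """
    if script_text is None:
        # Passing None on to re.search is what produced the TypeError.
        raise RuntimeError(
            "formCheck script not found; the page is probably the Pocket "
            "login page, so the login step failed"
        )
    match = FORM_CHECK_RE.search(script_text)
    if match is None:
        raise RuntimeError("formCheck script found, but no token matched")
    return match.group(1)
```

With this in place, a failed login shows up in the job log as one explicit message per article rather than a bare TypeError.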

I also looked at how the current article download method was created.

Maybe we could switch to using only API methods, replacing the textview call (which Pocket's v3 API forbids), and download the full articles or parse them some other way?

mmagnus · 2017-07-07T21:24:26Z

No success fixing it yet. Maybe next week ;( ...

mgreen3 · 2017-07-10T16:11:04Z

Thanks for all the work Marcin, keep us updated! Really get good use out of and depend on this plugin :D

mmagnus · 2017-07-13T16:44:12Z

@boguszk It seems that we can log in, because I got the names of my articles. I was able to get fc_id, but I still could not get it to run. I pushed the code with some attempts to solve it. Maybe someone is able to take it from here. I have to read more to fix it.

	fc_id dd6ccf03f4999b393ef511be8a9047cdddce6bfe29792314c0a5de35caf59504
	('ai', u'936379328')
	('data', 'itemId=936379328&form_check=dd6ccf03f4999b393ef511be8a9047cdddce6bfe29792314c0a5de35caf59504')
	https://getpocket.com/a/x/getArticle.php
	itemId=936379328&form_check=dd6ccf03f4999b393ef511be8a9047cdddce6bfe29792314c0a5de35caf59504
	{'p3p': 'policyref="/w3c/p3p.xml", CP="ALL CURa ADMa DEVa OUR IND UNI COM NAV INT STA PRE"', 'date': 'Thu, 13 Jul 2017 16:33:19 GMT', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'content-type': 'application/json', 'x-frame-options': 'SAMEORIGIN', 'server': 'Apache', 'connection': 'keep-alive', 'set-cookie': 'PHPSESSID=jc4n8akaf2312iv1rah066k7l2; path=/', 'status': '200', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'content-length': '22'}
	{"status":0,"error":1}
	<Response [200]>
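The log shows two things worth encoding: the form-encoded POST body sent to getArticle.php, and a reply that is HTTP 200 yet carries {"status":0,"error":1}, so checking the HTTP status alone cannot detect the failure. A sketch with hypothetical helper names; treating status 0 or a non-zero error field as failure is inferred from this one response, not from any Pocket documentation:

```python
import json
from urllib.parse import urlencode

GET_ARTICLE_URL = "https://getpocket.com/a/x/getArticle.php"  # from the log above


def build_get_article_body(item_id, form_check):
    """Form-encode the POST body seen in the debug log."""
    # dicts keep insertion order (Python 3.7+), so itemId comes first
    return urlencode({"itemId": item_id, "form_check": form_check})


def get_article_failed(response_text):
    """Return True if getArticle.php's JSON reply signals a failure.

    The endpoint answered HTTP 200 even when it failed, so the JSON body
    has to be inspected instead of the status code.
    """
    payload = json.loads(response_text)
    return payload.get("status") == 0 or bool(payload.get("error"))
```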
	Failed to download article: *args and **kwargs in python explained from https://getpocket.com/a/read/936379328

mmagnus · 2017-07-18T14:58:02Z

Interestingly, https://www.crofflr.com still works. It can serve as a temporary solution until we find a fix for this plugin.

mmagnus · 2017-07-28T10:33:55Z

Hey, I'm so sorry. I have a super hard deadline for my PhD at the beginning of September and I can't work on fixing this plugin. Maybe some of you are able to look at it. If not, maybe we can reach the original team and ask them for help. Sorry, this is how it looks right now.

alvaroreig · 2017-07-29T10:46:12Z

@mmagnus best of luck with your PhD, and many thanks! I will give it a try.

alvaroreig · 2017-07-29T12:35:43Z

(disclaimer: python & calibre rookie)

I am where @mmagnus was some days ago:

It looks like the formCheck value is (now) located in the original_url_container:

<li class="original_url_container"><a class="original_url" href="https://getpocket.com/redirect?url=http%3A%2F%2Fwww.jotdown.es%2F2017%2F07%2Fnostalgia-remasterizada-recuerdos-alta-definicion%2F&**formCheck=923af088bcb279dd1c3820c502e694c2**" rel=" noopener noreferrer" target="_blank" title="Ver original">jotdown.es</a></li>

However, I still can't get to https://getpocket.com/a/x/getArticle.php?itemId=1837879542&formCheck=923af088bcb279dd1c3820c502e694c2

(where 1837879542 = item id)

I still get {"status":0}

This happens even though I am logged into Pocket (I am testing in Chrome before anything else).
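Pulling the token out of that redirect link needs only the standard URL parser. A sketch using the href quoted above (without the bold markers); form_check_from_redirect is a hypothetical helper, and whether getArticle.php accepts this token is exactly what remained unresolved here:

```python
from urllib.parse import parse_qs, urlparse


def form_check_from_redirect(href):
    """Return the formCheck query parameter of a getpocket.com/redirect
    link, or None if the link does not carry one."""
    query = parse_qs(urlparse(href).query)
    values = query.get("formCheck")
    return values[0] if values else None


href = ("https://getpocket.com/redirect?url=http%3A%2F%2Fwww.jotdown.es%2F2017"
        "%2F07%2Fnostalgia-remasterizada-recuerdos-alta-definicion%2F"
        "&formCheck=923af088bcb279dd1c3820c502e694c2")
print(form_check_from_redirect(href))  # 923af088bcb279dd1c3820c502e694c2
```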

We could contact Pocket. It is true that we are using web scraping, but they already know that: https://getpocket.com/apps/ebooks/

spinda · 2017-08-16T17:08:08Z

It appears the problem is that the script doesn't have the right cookies set to send to getArticle.php... because the login is failing... because they've added a reCAPTCHA challenge to the login form :(
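If the login page is what comes back, the recipe could detect that and fail early with a clear message instead of a TypeError. A heuristic sketch only: the marker strings are assumptions ("g-recaptcha" is the class used by Google's standard reCAPTCHA v2 widget, and "formCheck" is what the recipe looks for on an article page), Pocket's actual markup may differ, and looks_like_login_page is a hypothetical helper:

```python
# Assumed markers of an embedded reCAPTCHA widget; not taken from Pocket's page.
RECAPTCHA_MARKERS = ("g-recaptcha", "www.google.com/recaptcha/")


def looks_like_login_page(html):
    """Heuristically decide whether html is a captcha-protected login page
    rather than the article page the recipe expects."""
    has_captcha = any(marker in html for marker in RECAPTCHA_MARKERS)
    has_form_check = "formCheck" in html
    return has_captcha and not has_form_check
```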

victorvogelpoel · 2017-08-23T05:21:17Z

Pity, just now I wanted to convert my year's supply of Pocket articles to ePUB for reading on an e-reader during vacation...

Podesta · 2017-08-23T18:10:26Z

For those of you who know a bit about what you are doing (which is not my case), you can contact the Pocket dev team directly and ask for their help at [email protected].

spinda · 2017-08-23T20:47:44Z

Personally I've rigged up a version of this which takes an API key with access to Pocket's protected GetText API. If I have the time I'll polish it up and publish it somewhere.

caglorithm · 2017-08-24T15:11:55Z

@spinda let us please have it as well!

mmagnus · 2017-08-24T20:52:41Z

@spinda can you make a pull request or put it somewhere? I still have two hardcore weeks of PhD writing... I can't do anything before then ;(

dannyfile · 2017-09-16T23:58:53Z

Hey, @spinda. It's been a long time. Maybe you could share your fix with us?

bluesodium · 2017-09-25T12:49:27Z

A fix for this would be great..

mmagnus · 2017-10-11T18:27:01Z

@spinda can you help us? I just submitted my thesis ;-) I should have more time for reading, and I would love to have this plugin working! :-)

alvaroreig · 2017-10-13T08:13:26Z

Congrats @mmagnus!

caglorithm · 2017-11-15T13:58:27Z

I can’t give up pocket. And my Kindle is basically worthless without this plugin working… :( I have fiddled with the python script myself but I can’t get it to run. Can we test the script in a separate environment instead of running it in calibre? It’s hard to debug it if you need to run it in calibre...

rayslava · 2017-11-15T15:22:21Z

Same question about the environment. Maybe it's possible to "cut" the basic browser functionality out of Calibre into a separate "browser host" for debugging purposes?

alvaroreig · 2017-11-15T16:22:59Z

@jonathancardonarojas I know how you feel, I used to run this software every night, and even coded a Docker wrapper around it: https://github.com/alvaroreig/pocket2kindle

However, as @mmagnus noted, https://www.crofflr.com still works. It is not as good (all the items are together, you can't split by tag, the images are worse, etc.) but you can still read your news.

Regards,

vucalur · 2017-11-15T17:45:37Z

To all of you seeking a competitive fallback,
take a second look at Instapaper's recipe.

You can have your content broken down into categories too:
Simply add links to your Instapaper's folders to the feeds list.

@jonathancardonarojas, apart from breaking down by tags and marking as read,
what do you find missing in Instapaper's recipe compared to Pocket's?

@spinda Could you please share your recipe ?
As you can see from Instapaper's recipe, it doesn't have to be long and have all the bells and whistles.
Just downloading articles. That's all.

Off-topic:
At the end it all comes down to the parsing capabilities of a service (Instapaper vs. Pocket). The recipe is just a tiny little tool.
Two years ago, both services had trouble parsing some web content.
I experimented a bit and found out that Instapaper does a better job for the content I tend to read.
To this date, it still has problems with, say, 15% of articles I tend to read.
For those 15% I fire up Pocket's recipe.
Therefore, even with a decent fallback in place, I would greatly appreciate having this recipe fixed.
Besides, it seems that parsing in Pocket has improved since I ditched it for Instapaper 2 years ago. Using just one service to keep track of one's reading, instead of bouncing around between two, would be great.

vucalur · 2017-11-15T20:41:46Z

Oh, I forgot to mention pagination. Yes, you must provide explicit links to all of the pages of a folder.
However, it's not a big deal with the following tips. Besides, you get better control over what is fetched and what is not.

pro-tip 1

feeds is a list (Python's array type, [… , …, …]) of tuples ((…, …, …)), to be exact.
The code in the repo is a bit unfortunate for demonstration purposes, since the list contains only one element, i.e. only one folder.
Instead of fetching each folder, and each page of a folder, in a separate recipe, you can do it all at once:

    feeds = [
            (u'Unread p1', u'https://www.instapaper.com/u/1'),
            (u'Unread p2', u'https://www.instapaper.com/u/2'),
            (u'Unread p3', u'https://www.instapaper.com/u/3'),
            (u'Unread p4', u'https://www.instapaper.com/u/4'),
#            (u'Instapaper Starred', u'https://www.instapaper.com/starred'),
            (u'Philosophy', u'https://www.instapaper.com/u/folder/9327481/philosophy'),
            (u'RNA', u'https://www.instapaper.com/u/folder/8935324/rna'),
            (u'OSX', u'https://www.instapaper.com/u/folder/8374045/osx')
    ]

The end result on a Kindle will be exactly the same as with the Pocket recipe.
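For many Unread pages, the list can also be generated instead of typed by hand. A sketch assuming the URL patterns from the example above; instapaper_feeds is a hypothetical helper, and a recipe could assign its result to feeds:

```python
BASE = u"https://www.instapaper.com/u"


def instapaper_feeds(unread_pages, folders=()):
    """Build a calibre-style feeds list: one (title, url) tuple per page.

    unread_pages: how many Unread pages to fetch; over-declaring only
        adds empty sections (see pro-tip 2).
    folders: (title, folder_id, slug) triples taken from folder URLs.
    """
    feeds = [(u"Unread p%d" % n, u"%s/%d" % (BASE, n))
             for n in range(1, unread_pages + 1)]
    feeds += [(title, u"%s/folder/%s/%s" % (BASE, folder_id, slug))
              for title, folder_id, slug in folders]
    return feeds


feeds = instapaper_feeds(4, [
    (u"Philosophy", 9327481, u"philosophy"),
    (u"RNA", 8935324, u"rna"),
    (u"OSX", 8374045, u"osx"),
])
```

This produces the same seven entries as the hand-written list (without the commented-out Starred folder).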

pro-tip 2

Try hitting the URL of an arbitrarily high page in your Instapaper folder, e.g. https://www.instapaper.com/u/12345. Instead of returning an HTTP error code (e.g. 404), Instapaper returns a page with no articles. That's why the recipe won't fail if you over-declare folder pages.
In the example above: What if you have only 2 pages in unread, not 4? Nothing, the fetch will complete successfully, but the resulting ebook will have some empty sections.

This lets you keep some extra pages in the config for folders that easily pile up loads of articles, so you don't have to check the config on every single run.

For non-python folks

Everything after # until the end of the line is a comment.
In the example, the Starred folder is commented out for easy activation, should it be needed in the future.
u-prefixed strings (u'blah ąłżó') are Unicode strings.

vucalur · 2017-11-15T20:51:24Z

Apart from the recipe, there's also a "send to Kindle" option built into Instapaper itself.
I use the recipe anyway for historical reasons.
This does not mean that the recipe for Pocket can be ditched. As I've mentioned, proper parsing, or the lack thereof, is what distinguishes the two services.

caglorithm · 2017-11-15T23:42:18Z

I’m using Pocket for a variety of reasons, also because I’m used to the Mac and Android apps, and it works with 95% of the articles I want to read. I appreciate everyone’s support and suggestions for a working alternative to Pocket, but I think we should bring the discussion back to getting the Pocket recipe working. I think the problem is that you can’t get the full article content through Pocket’s API anymore; getting the titles and tags still works. I don’t know how it was in the past, or whether they ditched the API at some point. It would be extremely frustrating if we had to parse Pocket’s web view to get the article contents. Can anyone comment on that?

mmagnus · 2017-11-24T16:10:59Z

I'm also trying to fix it, and trying to learn from other tools. This one works: https://github.com/rakanalh/pocket-cli

Podesta · 2017-11-24T16:31:15Z

Hey @mmagnus , I know I've said it already, but a couple months ago I contacted pocket support about it, and they said I could talk to the devs directly with [email protected] From experience they tend to be quite friendly and helpful.

mmagnus · 2017-11-24T16:45:07Z

@Podesta Thanks. I can contact them; let's see if they can help :-)

mmagnus · 2017-11-24T16:59:53Z

Dear @rayslava @caglorithm, I don't know if you have found this already, but I just discovered a way to debug the plugin:

[mm] custom_recipes$ ebook-convert Pocket__1004.recipe .epub --password XXXXX --username XXXXXX -vvv --debug-pipeline debug
Conversion options changed from defaults:
  debug_pipeline: u'debug'
  verbose: 3
  test: None
Resolved conversion options
calibre version: 3.12.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
...
{"status":0,"error":1}
<Response [200]>
Failed to download article: Jak to możliwe, że Chomikuj.pl nie jest jeszcze zamknięte? from https://getpocket.com/a/read/169221044

I see the title, so something works. I'm trying to fix parsing.
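The same debugging command can be scripted so it is easy to rerun after each edit to the recipe. A sketch that only rebuilds the command line shown above (recipe file name and options as used there) and runs it with subprocess; it assumes calibre's ebook-convert is on PATH, and the helper names are hypothetical. Passing the password as an argument makes it visible in process listings, so this is for local debugging only:

```python
import subprocess


def ebook_convert_debug_cmd(recipe, username, password, debug_dir="debug"):
    """argv for the command above: convert a recipe to EPUB with verbose
    logging and the intermediate pipeline stages written to debug_dir."""
    return [
        "ebook-convert", recipe, ".epub",
        "--username", username,
        "--password", password,
        "-vvv",
        "--debug-pipeline", debug_dir,
    ]


def run_recipe_debug(recipe, username, password, debug_dir="debug"):
    # Raises CalledProcessError if ebook-convert exits with a non-zero status.
    cmd = ebook_convert_debug_cmd(recipe, username, password, debug_dir)
    subprocess.run(cmd, check=True)
```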

rayslava · 2017-11-24T17:42:06Z

@mmagnus, great! Thanks. I think I'll have some time next week to dive into the issue.

saschalalala · 2017-11-26T17:03:39Z

I tried to fiddle around with it, and after getting some information about what went wrong, I'm quite sure that you are going to have to implement https://github.com/rakanalh/pocket-api if you want to make it work again. It seems to me that accessing the API without OAuth is no longer supported.

Edit: To me it looks like there is no way to access the articles themselves via the API. The docs say

The Pocket Article View API is currently only open to partners that are integrating Pocket specific features or full-fledged Pocket clients. For example, building a Pocket client for X platform.
If you are looking for a general text parser or to provide "read now" functionality in your app - we do not currently support that. There are other companies/products that provide that type of API, for example: Diffbot. (See https://getpocket.com/developer/docs/v3/article-view)

and the pocket-cli mentioned above also only retrieves metadata and can open the articles in a web browser using the URL of the article, which is always https://getpocket.com/a/read/<article_id>
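Since the web reader URL is just the item id appended to a fixed prefix, mapping between API item ids and these URLs is a one-liner each way. A sketch with hypothetical helper names, assuming the URL shape quoted above:

```python
import re

READ_URL = "https://getpocket.com/a/read/{}"
READ_URL_RE = re.compile(r"^https://getpocket\.com/a/read/(\d+)$")


def read_url(item_id):
    """Build the web reader URL for a Pocket item id."""
    return READ_URL.format(item_id)


def item_id_from_read_url(url):
    """Return the item id of a web reader URL, or None for other URLs."""
    match = READ_URL_RE.match(url)
    return match.group(1) if match else None
```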

Another edit for everyone who wants to debug and edit this plugin more comfortably than directly inside Calibre: you can open the recipes in your preferred editor. On macOS they are stored in ~/Library/Preferences/calibre/custom_recipes/Pocket__1002.recipe. I changed some things in Visual Studio Code, saved the file, restarted the job (inside Calibre), and looked at the job logs afterwards (also inside Calibre).

Another edit: the following method still seems to work and could be used as a temporary workaround by anyone who depends on this plugin: https://www.reddit.com/r/kindle/comments/1wcznt/way_to_import_articles_from_pocket_to_kindle/chmnall/ Note: this makes your RSS feed publicly available.

Danilka · 2018-07-05T07:16:46Z

Has anyone looked at this? It's still not working :-(

dlo9 · 2019-07-27T09:06:05Z

For anyone still interested, I've recently created a replacement here to work with Pocket's v3 API. I also have an RSS version here that allows use of RSS feeds as discussed above without making your feeds public.

dagomar · 2019-07-27T12:50:59Z

Story time. After 5+ years of not using my Kindle, I decided to do a digital cleanse this holiday and bring only my Kindle for reading. It didn't turn on. Then this morning, after reading some comments on an iFixit topic, I got it to load again, and eventually it turned on. Awesome! Still this morning, I installed Calibre, went through the ol' reading list, and figured I'd like to put my extensive Pocket list on my Kindle. By now you can guess where this is going: it didn't work. This afternoon, hours later, I decided to look one more time and found the Pocket Plus plugin. Yet again, a no-no. So I thought, let's see if anyone else has experienced this issue and found some little hack. Then I found dlo9's comment and couldn't believe my eyes. Perfect timing, sir!

Thanks!

Monirzadeh · 2020-01-02T11:59:48Z

I can't add that to Calibre 4.7.

mmagnus · 2020-01-02T19:47:24Z

For anyone still interested, I've recently created a replacement here to work with Pocket's v3 API. I also have an RSS version here that allows use of RSS feeds as discussed above without making your feeds public.

wow, I have missed this. YEAH! This works, I will try to merge it into my plugin if I can. Thanks @dlo9 amazing work!

To authorize, you can just click this link:

https://getpocket.com/auth/authorize?request_token=88ac5a8d-f42c-16ae-ff93-3fde82&redirect_uri=https://calibre-ebook.com/

In the plugin, this link gets buried in the log, at least in my case.
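The authorization link above has two query parameters, request_token and redirect_uri. A sketch that rebuilds such a link so it can be printed prominently instead of getting lost in the log; it assumes a request token was already obtained in an earlier step of Pocket's OAuth flow (not shown in this thread), and authorize_url is a hypothetical helper:

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://getpocket.com/auth/authorize"


def authorize_url(request_token, redirect_uri):
    """Build the link the user must open to authorize the app."""
    # safe=":/" keeps the redirect URI unescaped, matching the link above.
    query = urlencode(
        {"request_token": request_token, "redirect_uri": redirect_uri},
        safe=":/",
    )
    return "%s?%s" % (AUTHORIZE_URL, query)
```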

mmagnus · 2020-01-02T20:08:45Z

I can't add that to Calibre 4.7.

Can you elaborate, @Monirzadeh? It works for me, and I do have Calibre 4.7.

Monirzadeh · 2020-01-03T11:49:31Z

I can't add that to Calibre 4.7.

Can you elaborate, @Monirzadeh? It works for me, and I do have Calibre 4.7.

I updated to Calibre 4.8 and now I can add it to my sources,
but it downloads nothing:
it tries to fetch the articles, but all of them fail.
UPDATE 1: I get a lot of this error:

Traceback (most recent call last):
  File "site-packages/calibre/utils/threadpool.py", line 102, in run
  File "site-packages/calibre/web/feeds/news.py", line 1151, in fetch_obfuscated_article
  File "<string>", line 236, in get_obfuscated_article
  File "<string>", line 198, in get_textview
UnboundLocalError: local variable 'fc_id' referenced before assignment
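An UnboundLocalError like this is the classic Python symptom of a variable that is only assigned inside a branch. A minimal reproduction with hypothetical function names (an illustration of the pattern, not the recipe's actual code), together with one way to keep the name bound on every path:

```python
import re

FORM_CHECK_RE = re.compile(r"formCheck = '([\d\w]+)';")


def get_fc_id_buggy(script_text):
    # fc_id is only bound inside the if-branch...
    if script_text:
        fc_id = FORM_CHECK_RE.search(script_text).group(1)
    # ...so this line raises UnboundLocalError whenever the branch is skipped.
    return fc_id


def get_fc_id_fixed(script_text):
    fc_id = None  # bound on every path
    if script_text:
        match = FORM_CHECK_RE.search(script_text)
        if match:
            fc_id = match.group(1)
    return fc_id  # None means "no token"; the caller must handle it
```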

alvaroreig · 2020-01-05T09:10:27Z

Hi there @Monirzadeh @mmagnus @dlo9

I can confirm that the fix (as merged yesterday by @mmagnus ) works for me in calibre 4.8. I had to manually create the ~/.config/calibre/custom_recipes directory though.
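A small sketch for creating the custom recipes folder, assuming the Linux location mentioned here and the macOS location mentioned earlier in the thread (~/Library/Preferences/calibre/custom_recipes); other platforms are not covered, and the helper names are hypothetical:

```python
import os
import sys


def custom_recipes_dir(home=None, platform=sys.platform):
    """Return the calibre custom_recipes path for Linux or macOS."""
    home = home or os.path.expanduser("~")
    if platform == "darwin":
        return os.path.join(home, "Library", "Preferences", "calibre",
                            "custom_recipes")
    return os.path.join(home, ".config", "calibre", "custom_recipes")


def ensure_custom_recipes_dir(home=None, platform=sys.platform):
    """Create the folder (and missing parents) if needed; return its path."""
    path = custom_recipes_dir(home, platform)
    os.makedirs(path, exist_ok=True)
    return path
```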

I am missing the old sort by tags functionality. Do you think you will be able to get it back @mmagnus?

Despite that, I think that this fix provides a better ebook than crofflr, which has been my plan B all this time. Both list every article under a single section ("All articles"), but crofflr only manages to download the first image of every article, while this version does seem to provide every image for all articles.

Thanks a lot!

mmagnus · 2020-01-05T15:35:23Z

@alvaroreig ;-) of course. Fortunately, it was easy to hack in support for tagged articles. It works on my computer; I will try to push the changes today, in a few hours! :-) I'm super excited about it. Thanks, @dlo9, for sharing your code!

Fig.: my articles for the "invest" tag (on investing, etc.).
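The tag support builds on @dlo9's v3 API code. For readers who want to see the shape of such a call, a minimal sketch of a tag-filtered request to Pocket's v3 Retrieve endpoint (/v3/get), following Pocket's published API documentation rather than the plugin's code: the parameter names (consumer_key, access_token, state, tag, detailType) and response fields read below (list, resolved_title, resolved_url, given_title, given_url) are assumptions to verify against the docs, the consumer key and access token come from the OAuth step discussed above, and all helper names are hypothetical:

```python
import json
import urllib.request

GET_URL = "https://getpocket.com/v3/get"


def build_get_payload(consumer_key, access_token, tag=None, state="unread"):
    """JSON body for /v3/get; tag restricts results to one tag."""
    payload = {
        "consumer_key": consumer_key,
        "access_token": access_token,
        "state": state,
        "detailType": "complete",  # per the docs, includes tags per item
    }
    if tag is not None:
        payload["tag"] = tag
    return payload


def fetch_items(payload):
    """POST the payload and return the decoded JSON reply (network call)."""
    request = urllib.request.Request(
        GET_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8",
                 "X-Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def articles_from_response(data):
    """Turn the 'list' mapping of a /v3/get reply into (title, url) pairs."""
    items = data.get("list") or {}  # an empty result may come back as []
    articles = []
    for item in items.values():
        title = item.get("resolved_title") or item.get("given_title") or ""
        url = item.get("resolved_url") or item.get("given_url")
        articles.append((title, url))
    return articles
```

Usage would be articles_from_response(fetch_items(build_get_payload(key, token, tag="invest"))); only the offline helpers are exercised without credentials.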

blodt · 2020-05-03T16:49:16Z

For anyone still interested, I've recently created a replacement here to work with Pocket's v3 API. I also have an RSS version here that allows use of RSS feeds as discussed above without making your feeds public.

Thank you SO much for making this!! Just installed it and tested it and it seems to be working.
Perfect for pulling down and then transferring long articles to my Nook.

I really appreciate it - thank you!

mmagnus closed this as completed in 6b44fd2 Jul 13, 2017

mmagnus reopened this Jul 13, 2017

mmagnus closed this as completed Nov 24, 2017

mmagnus reopened this Nov 24, 2017

mmagnus changed the title ~~Article download stopped working~~ Article download stopped working -- [some dirty solution] Jan 2, 2020

mmagnus changed the title ~~Article download stopped working -- [some dirty solution]~~ Article download stopped working -- [some solution to get Pocket running] Jan 2, 2020

mmagnus closed this as completed in b9c2ae3 Jan 5, 2020

mmagnus mentioned this issue Jan 5, 2020

Failed to download article #12

Closed

mmagnus changed the title ~~Article download stopped working -- [some solution to get Pocket running]~~ Article download stopped working Jan 5, 2020

mmagnus mentioned this issue Jan 5, 2020

some changes to try to fix it #10

Closed
