This repository has been archived by the owner on Jul 26, 2023. It is now read-only.

Pocket's TextView system has changed, breaking book generation #5

Closed
Wiltons opened this issue Nov 27, 2012 · 14 comments

@Wiltons

Wiltons commented Nov 27, 2012

I noticed recently that the recipe is now fetching the "Looks like you're kicking it old school" page when trying to fetch items.

From a brief look, I'm not sure the URL for the article is correct: it returns the format http://getpocket.com/textview//?t=unread&p= which leads to a blank page if pasted into the browser.

@Wiltons Wiltons closed this as completed Nov 27, 2012
@onlyhavecans
Owner

They have recently changed everything and I will need to fix this asap

@onlyhavecans onlyhavecans reopened this Nov 27, 2012
@Wiltons
Author

Wiltons commented Nov 27, 2012

Sorry about that, the subject is wrong - I was messing around with trying to fix it myself and the getpocket.com/a/read/id causes the browser update page!

@ghost ghost assigned onlyhavecans Nov 28, 2012
@onlyhavecans
Owner

Yes, they've deprecated the old-style text view (mainly because they wanted people like us to stop scraping it) and they also have some checks in place to help prevent scrapers from using the new Pocket web page. I can work around it, but I assume this may quickly turn into a very unpleasant cat-and-mouse game.

HOWEVER, their new v3 API that they want everyone to use would be nothing short of awesome IF not for the privatized text view and the required OAuth scheme. I have found a way around it and am working on an API version now...

@mdirik

mdirik commented Jan 13, 2013

how is the api version going? next week i will have free time and i'm interested to work on it.

@onlyhavecans
Owner

I've got a branch going https://github.com/tbunnyman/ReadItLater-Calibre-Plugin/tree/api-v3
Everything BUT the actual text view works.

What still needs to be done: they now return the text view via AJAX, with some CSS tricks to obfuscate it, and that needs to be reversed so that we can pull the saved article text directly. There is a method you can override from super to change the way it fetches the article, which will probably be the cleanest way of modifying the pull method to match theirs.

@mdirik

mdirik commented Jan 13, 2013

sorry, ajax is not my strong point.
would it be feasible to get the original url of each item and use that directly, or to send each url to some other service such as readability to get a text view?

@onlyhavecans
Owner

Well, yes and no.
The problem is that most of these services aren't open and free. They require private API keys and restrict how many requests you can make per period. If you know of a reader service that restricts its usage based on IP and not an API key, we would be in business!
Otherwise, at the least we would be in violation of some random TOS, and at worst the plugin would work in spurts, first come, first served, until the reset period. This is why Pocket has been unwilling to work with me on using theirs: I would be sharing out an API key with special abilities, and that is a no-no.

@Hofferic

well if the problem with their api is the requirement for the user to authorize an app in order to give that app a personalized api key, you could make that key a variable to be filled in by the user, as are username and sometimes password in other scripts. that would require users to edit the script but i think it may be the best way to get the text view without having to play catch up with the web view

@onlyhavecans
Owner

I've gotten around that part actually.
I cheated and used the fact that there are undocumented parts of the API, like the ability to just pass &username= and &password= instead of establishing an OAuth token. This may or may not work in the future, but it at least makes the script easier for everyone to use for now.

@onlyhavecans
Owner

Hit comment too soon, sorry.
As I mentioned in the above post, I have access to the article, but they now heavily obfuscate its display and rendering. The API-level access to the text view requires special permissions attached to the application's key:
http://getpocket.com/developer/docs/v3/article-view

According to them, and I quote:
"""
The Pocket Article View API is currently only open to partners that are integrating Pocket specific features or full-fledged Pocket clients. For example, building a Pocket client for X platform.

If you are looking for a general text parser or to provide "read now" functionality in your app - we do not currently support that.
"""

I feel this plugin fits that, but they have not been amicable to me about getting access. While they won't say it, I believe the reason is that the API key is posted openly in the script. That, or they just hate me and all the friends I have had write to them about it as well.

@Hofferic

Well, that could certainly be (seeing as they want to restrict that full feature set to a couple of apps).

So the way to go is scraping the page for the text view. Looking at the Ajax, it's actually rather simple if you have already circumvented the authorisation problem:
The page is loaded and then the JavaScript takes over to determine what to display. It then queries the server for all the info it needs. And here's the kicker: everything you need (tags, title, author, original URL and the text view in HTML formatting) is transmitted in one request. The answer is in perfect unscrambled JSON ;)

In order to get this (assuming you are already logged in and transmitting the correct cookies) you load the article page as you would in the browser (http://getpocket.com/a/read/ID), then scrape that page for the unique ajax authorisation key (formCheck) and THEN fetch the ajax-interface-page (http://getpocket.com/a/x/getArticle.php) using POST to transmit itemId (the article ID) and formCheck (the ajax authorisation key) which can be readily found in the unevaluated source code of the original page:

<!--app-->
<script type="text/javascript">
    var formCheck = 'd5f7401174a8252c3ff84f129911ae13';

You are then presented with the text view and all the metadata in JSON format ;)
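
The two scraping steps described above could be sketched roughly as follows. This is only an illustration using the Python 3 standard library, not the plugin's actual code: the function names are made up for this example, the regular expression assumes formCheck always appears exactly as in the snippet above, and the request is built but never sent, so the login cookies still have to be supplied by whatever opener ends up sending it.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint named in the comment above; not verified to still exist.
GET_ARTICLE_URL = "http://getpocket.com/a/x/getArticle.php"

# Assumes the inline assignment looks like: var formCheck = '<hex key>';
FORM_CHECK_RE = re.compile(r"var\s+formCheck\s*=\s*'([0-9a-f]+)'")


def extract_form_check(page_html):
    """Return the formCheck key embedded in the article page, or None."""
    match = FORM_CHECK_RE.search(page_html)
    return match.group(1) if match else None


def build_article_request(item_id, form_check):
    """Build (but do not send) the POST request for the article JSON."""
    body = urlencode({"itemId": item_id, "formCheck": form_check})
    # Passing data= makes urllib issue a POST instead of a GET.
    return Request(GET_ARTICLE_URL, data=body.encode("ascii"))


def parse_article_response(raw_bytes):
    """Decode the JSON reply; its field names are not given in this thread."""
    return json.loads(raw_bytes.decode("utf-8"))


# Hypothetical page source, copied from the snippet quoted above.
sample_page = """<!--app-->
<script type="text/javascript">
    var formCheck = 'd5f7401174a8252c3ff84f129911ae13';
"""
key = extract_form_check(sample_page)
request = build_article_request("12345", key)
```

To actually fetch the article, the request would be sent through an opener that carries the logged-in session cookies, and the response body handed to parse_article_response.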

@onlyhavecans
Owner

Well then! Sounds like you just gave us exactly what we have been looking for! (or should have been looking for, I have been working on a release of another one of my projects)

Later today when I am free to shift gears fully I will mod the script to do this and try it out!

@Hofferic

I'm glad to be of help, after all I'm looking forward to using the script again and utterly incapable of writing anything in python myself :D

@onlyhavecans
Owner

Fixed by #8
