Merge branch 'bugfix/remcaptcha' into develop
MaxKuehn committed Sep 3, 2019
2 parents 1189ca1 + f4a8c32 commit 9e38ac9
Showing 14 changed files with 92 additions and 47 deletions.
13 changes: 13 additions & 0 deletions Notes.md
@@ -56,3 +56,16 @@
- the "Switch to single field" button saves the author name in the field "lastName"
- are you fucking shitting me? why the fuck would you do that?
- I guess I just can't assume that a field named "lastName" contains the author's last name m(

## Debug
- Zotero is basically a (firefox-like) browser?
- in Zotero: **Tools** > **Developer** > **Run JavaScript**
- then execute whatever JavaScript you want
- the console `toString`s the return values in the right pane
- might have to `JSON.stringify()` them before that
- Friends
- `window`, `Zotero`, `ZoteroPane` Objects
- `alert(…)` works
- `Object.keys()` to show (enumerable) properties
### Gotchas
- you can easily crash that console and then you have to restart Zotero :(
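The Debug notes say the Run JavaScript pane `toString`s return values, that `JSON.stringify()` may be needed first, and that the console is easy to crash. One relevant detail: `JSON.stringify()` throws a `TypeError` on circular structures, which live application objects often contain. A minimal sketch of a defensive dump helper, assuming it is pasted into that window; `dumpObject` is a hypothetical name, not a Zotero API:

```javascript
// Hypothetical helper for Zotero's "Run JavaScript" window (not part of Zotero).
// Builds a plain copy of `obj` up to `maxDepth` levels, replacing true cycles
// with "[Circular]" so JSON.stringify() does not throw.
function dumpObject(obj, maxDepth = 2) {
  const ancestors = new WeakSet(); // objects on the current recursion path

  function walk(value, depth) {
    if (value === null || typeof value !== 'object') {
      // Functions become a marker; primitives pass through unchanged.
      return typeof value === 'function' ? '[Function]' : value;
    }
    if (ancestors.has(value)) return '[Circular]';
    if (depth >= maxDepth) return '[Object]';

    ancestors.add(value);
    const out = Array.isArray(value) ? [] : {};
    // Object.keys() only lists own enumerable properties.
    for (const key of Object.keys(value)) {
      try {
        out[key] = walk(value[key], depth + 1);
      } catch (e) {
        out[key] = '[Throws]'; // some getters throw when read
      }
    }
    ancestors.delete(value); // shared (non-circular) references stay visible
    return out;
  }

  return JSON.stringify(walk(obj, 0), null, 2);
}
```

In the Run JavaScript window, something like `dumpObject(ZoteroPane, 1)` would return a string listing `ZoteroPane`'s enumerable properties one level deep instead of the default `toString` output.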
55 changes: 27 additions & 28 deletions README.md
@@ -1,36 +1,21 @@
# Zotero Scholar Citations (ZSC)

This is an add-on for Zotero, a research source management tool. The add-on automatically fetches numbers of citations of your Zotero items from Google Scholar and makes it possible to sort your items by the citations. Moreover, it allows batch updating the citations, as they may change over time.

When updating multiple citations in a batch, it may happen that citation queries are blocked by Google Scholar for multiple automated requests. If a blockage happens, the add-on opens a browser window and directs it to http://scholar.google.com/, where you should see a Captcha displayed by Google Scholar, which you need to enter to get unblocked and then re-try updating the citations. It may happen that Google Scholar displays a message like the following "We're sorry... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now." In that case, the only solution is to wait for a while until Google unblocks you.

Currently, Zotero doesn't have any special field for the number of citations, that's why it is stored in the "Extra" field. To sort by this field you have to add it in the source listing table.

*IMPORTANT:* in version 1.8 the field for storing the number of citations has been changed from "Call Number" to "Extra" -- please update your column configuration.

The add-on supports both versions of Zotero:

1. Download the latest version of the add-on from [the release page](https://github.com/MaxKuehn/zotero-scholar-citations/releases). It's an ".xpi" file.
1. In Zotero (Standalone) go to Tools -> Add-ons -> click the settings button in the top-right corner -> Install Add-on From File -> select the downloaded file and restart Zotero.

Read about how the add-on was made: http://blog.beloglazov.info/2009/10/zotero-citations-from-scholar-en.html

## Why the Fork

The original maintainer [Anton Beloglazov](https://github.com/beloglazov) seems semi-active.

[Texot](https://github.com/tete1030) fixed some stuff that needed fixing BADLY, that is

- Fix detection of google robot checking
- Show `No Citation Data` in failure cases instead of `00000`

**But there's more that should be done!**

## RoadMap
## Batching & CAPTCHAs
When updating multiple citations in a batch, it may happen that citation queries are blocked by Google Scholar for multiple automated requests. If a blockage happens, the add-on opens a browser window and directs it to http://scholar.google.com/, where you should see a Captcha displayed by Google Scholar, which you need to enter to get unblocked and then re-try updating the citations. It may happen that Google Scholar displays a message like the following "We're sorry... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now." In that case, the only solution is to wait for a while until Google unblocks you.

The [RoadMap can be found here](https://github.com/MaxKuehn/zotero-scholar-citations/blob/master/RoadMap.md).
## Installation
The add-on supports Zotero Standalone:
1. Download the latest version of the add-on from [the release page](https://github.com/MaxKuehn/zotero-scholar-citations/releases). It's an ".xpi" file.
1. In Zotero (Standalone) go to Tools -> Add-ons -> click the settings button in the top-right corner -> Install Add-on From File -> select the downloaded file and restart Zotero.

## Extra Column Info
Currently, Zotero doesn't have any special field for the number of citations, that's why it is stored in the "Extra" field. To sort by this field you have to add it in the source listing table.

### New Format in 1.8
In version 1.8 the field for storing the number of citations has been changed from "Call Number" to "Extra" -- please update your column configuration.

### New Format in 2.0.x
Version 2.0.0 introduced a new format for storing the citation count, e.g. `ZSCC: 0000001`. Unfortunately that means existing pre-2.0.0 entries are incompatible in terms of sorting and you have to update them.
@@ -50,11 +35,11 @@ The format of the staleness counter allows you to search for items with stale ci
### Existing "Extra"-Column Content
ZSC will
- update legacy ZSC "extra"-content, i.e. 5-digit citation counts and "No Citation Data" entries
- respect content that is already in the "Extra"-field
- ZSC will simply prepend the citation count to any existing content, so you can sort by the extra field to get the most cited items
- respect content that is already in the "Extra"-field by simply prepending the citation count to any existing content
- this allows you to sort by the extra field to easily get the most/least cited items
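The 2.0.x format and the prepend behaviour described above can be sketched as follows. The constants are copied from the `zsc` object in this commit (`_extraPrefix`, `_citeCountStrLength`, `_extraEntrySep`, `_noData`); the helper names, the `ZSCC: NoCitationData` form for missing data and the regex used to drop a previous entry are assumptions, since the real `zsc._extraRegex` is cut off in this diff.

```javascript
// Values copied from the zsc object in this commit.
const EXTRA_PREFIX = 'ZSCC';
const CITE_COUNT_LENGTH = 7;
const EXTRA_ENTRY_SEP = ' \n';
const NO_DATA = 'NoCitationData';

// Hypothetical helper: "ZSCC: 0000042", or "ZSCC: NoCitationData" (assumed
// form) when no count is available. Zero-padding keeps a lexicographic
// sort of the column in numeric order.
function buildCiteCountString(count) {
  const value = Number.isInteger(count) && count >= 0
    ? String(count).padStart(CITE_COUNT_LENGTH, '0')
    : NO_DATA;
  return EXTRA_PREFIX + ': ' + value;
}

// Hypothetical helper: prepend the entry to existing "Extra" content,
// replacing a ZSCC entry already at the top instead of stacking a new one.
function prependCiteCount(extra, count) {
  const previousEntry = new RegExp('^' + EXTRA_PREFIX + ': [^\\n]*\\n?');
  const rest = (extra || '').replace(previousEntry, '');
  const entry = buildCiteCountString(count);
  return rest ? entry + EXTRA_ENTRY_SEP + rest : entry;
}
```

For example, `prependCiteCount('my note', 42)` yields `'ZSCC: 0000042 \nmy note'`, and running it again with 43 replaces the count rather than adding a second line. Legacy 5-digit entries are not handled in this sketch.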

#### When Updates fail
Consider temporary cutting out/deleting the "Extra" content. ZSC will update the citation count. After that you can simply append the previously removed.
Consider temporary cutting out/deleting the "Extra" content. ZSC will update the citation count. After that you can simply append the previously removed information.

## Why is ZSC unable to retrieve the citation count for item X?
The most likely culprit is that ZSC's search is too precise :^). Some items do not have as complete an author list on Google Scholar as they do in Zotero.
@@ -71,6 +56,20 @@ One combination of authors will certainly yield the correct search.

You can also temporarily recreate that combination in Zotero. ZSC will then successfully query that item. Once you re-add the author however, updates will fail again. :(

## RoadMap
The [RoadMap can be found here](https://github.com/MaxKuehn/zotero-scholar-citations/blob/master/RoadMap.md).

## Why the Fork

The original maintainer [Anton Beloglazov](https://github.com/beloglazov) seems semi-active.

[Texot](https://github.com/tete1030) fixed some stuff that needed fixing BADLY, that is

- Fix detection of google robot checking
- Show `No Citation Data` in failure cases instead of `00000`

**But there's more that should be done!**

# License

Copyright (C) 2011-2013 Anton Beloglazov
2 changes: 1 addition & 1 deletion RoadMap.md
@@ -4,7 +4,7 @@
- improve captcha handling & introduce request batching
- if you update 200 papers you prob get captchas starting at 100 or so
- all remaining requests will run into a captcha and result in a prompt
- even when the captcha situation is resolved, those items won't be update unless another update is requested
- even when the captcha situation is resolved, those items won't be updated unless another update is requested
- **solution/workaround**
- can't get around some sort of batching/sequencing
- if you throw 100 requests into the event loop, they'll happen no matter what
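The RoadMap's workaround (batching/sequencing instead of throwing every request into the event loop at once) could look roughly like this. It is a sketch, not ZSC's code: `updateInBatches` and the `fetchCount(item)` callback, assumed to resolve to `{ captcha, count }`, are hypothetical. Requests run concurrently only within one batch; on the first CAPTCHA the loop stops and hands back every item not yet updated, so those can be re-queued once the CAPTCHA is solved instead of waiting for another manual update.

```javascript
// Hypothetical sketch of batched updates that stop at the first CAPTCHA.
// fetchCount(item) is an assumed async function resolving to
// { captcha: boolean, count: number }.
async function updateInBatches(items, fetchCount, batchSize = 10, delayMs = 0) {
  const updated = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Only this batch is in flight at any time.
    const results = await Promise.all(batch.map(item => fetchCount(item)));

    const pending = [];
    results.forEach((res, j) => {
      if (res.captcha) pending.push(batch[j]);
      else updated.push({ item: batch[j], count: res.count });
    });

    if (pending.length > 0) {
      // Stop sending requests; return everything not yet updated for a retry.
      return { updated, pending: pending.concat(items.slice(i + batchSize)) };
    }
    if (delayMs > 0 && i + batchSize < items.length) {
      await new Promise(resolve => setTimeout(resolve, delayMs)); // pause between batches
    }
  }
  return { updated, pending: [] };
}
```

The batch size and optional delay trade throughput against how many requests are already in flight when Google Scholar starts answering with CAPTCHAs.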
4 changes: 2 additions & 2 deletions chrome.manifest
@@ -1,8 +1,8 @@
content zoteroscholarcitations chrome/content/

overlay chrome://zotero/content/zoteroPane.xul chrome://zoteroscholarcitations/content/overlay.xul

locale zoteroscholarcitations en-GB chrome/locale/en-GB/
locale zoteroscholarcitations en-US chrome/locale/en-US/
locale zoteroscholarcitations ru-RU chrome/locale/ru-RU/
locale zoteroscholarcitations it-IT chrome/locale/it-IT/

overlay chrome://zotero/content/zoteroPane.xul chrome://zoteroscholarcitations/content/overlay.xul
17 changes: 8 additions & 9 deletions chrome/content/overlay.xul
@@ -1,21 +1,20 @@
<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE overlay SYSTEM "chrome://zoteroscholarcitations/locale/overlay.dtd">

<overlay
id="zoteroscholarcitations"
id="zoteroscholarcitations-overlay"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

<script
type="application/x-javascript"
src="chrome://zoteroscholarcitations/content/scripts/zoteroscholarcitations.js"/>

<stringbundleset id="stringbundleset">
<stringbundleset>
<stringbundle
id="zoteroscholarcitations-bundle"
src="chrome://zoteroscholarcitations/locale/zoteroscholarcitations.properties"/>
src="chrome://zoteroscholarcitations/locale/zsc.properties"/>
</stringbundleset>

<script
type="application/x-javascript"
src="chrome://zoteroscholarcitations/content/zsc.js"/>

<popup id="zotero-collectionmenu">
<menuitem
id="zotero-collectionmenu-scholarcitations"
@@ -1,10 +1,12 @@
let zsc = {
_captchaString: '',
_citedPrefixString: 'Cited by ',
_citeCountStrLength: 7,
_extraPrefix: 'ZSCC',
_extraEntrySep: ' \n',
_noData : 'NoCitationData',
_searchblackList: new RegExp('[-+~*":]', 'g')
_searchblackList: new RegExp('[-+~*":]', 'g'),
_baseUrl : 'https://scholar.google.com/'
};

zsc._extraRegex = new RegExp(
@@ -201,8 +203,8 @@ zsc.retrieveCitationData = function(item, cb) {
if (isDebug()) Zotero.debug('[scholar-citations] '
+ 'could not retrieve the google scholar data. Server returned: ['
+ xhr.status + ': ' + xhr.statusText + ']. '
+ 'GS wants you to wait for ' + this.getResponseHeader("Content-Type")
+ 'seconds before sending further requests.');
+ 'GS wants you to wait for ' + this.getResponseHeader("Retry-After")
+ ' seconds before sending further requests.');

} else if (this.readyState == 4) {
if (isDebug()) Zotero.debug('[scholar-citations] '
@@ -216,8 +218,7 @@ zsc.retrieveCitationData = function(item, cb) {
};

zsc.generateItemUrl = function(item) {
let baseUrl = 'https://scholar.google.com/';
let url = baseUrl
let url = this._baseUrl
+ 'scholar?hl=en&as_q='
+ zsc.cleanTitle(item.getField('title')).split(/\s/).join('+')
+ '&as_epq=&as_occt=title&num=1';
@@ -263,7 +264,7 @@ zsc.buildStalenessString = function(stalenessCount) {
};

zsc.getCiteCount = function(responseText) {
let citePrefix = '>Cited by ';
let citePrefix = '>' + this._citedPrefixString;
let citePrefixLen = citePrefix.length;
let citeCountStart = responseText.indexOf(citePrefix);
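The `getCiteCount` hunk above only shows the prefix lookup, now built from `_citedPrefixString`. A standalone sketch of how the count could be read after that prefix; the function signature, the `-1` sentinel and the tolerance for thousands separators are assumptions, not the body elided from this diff.

```javascript
// Hypothetical standalone version of the lookup; the real zsc.getCiteCount
// body is elided in this diff. Returns -1 when no count can be read.
function getCiteCount(responseText, citedPrefixString = 'Cited by ') {
  const citePrefix = '>' + citedPrefixString; // e.g. ">Cited by "
  const citeCountStart = responseText.indexOf(citePrefix);
  if (citeCountStart === -1) return -1;

  // Read the run of digits right after the prefix; commas are tolerated in
  // case a thousands separator shows up (an assumption).
  const tail = responseText.slice(citeCountStart + citePrefix.length);
  const match = /^[\d,]+/.exec(tail);
  if (!match) return -1;

  const count = parseInt(match[0].replace(/,/g, ''), 10);
  return Number.isNaN(count) ? -1 : count;
}
```

Passing the prefix in as a parameter mirrors the move from the hard-coded `'>Cited by '` to `this._citedPrefixString`.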

3 changes: 3 additions & 0 deletions chrome/locale/en-GB/overlay.dtd
@@ -0,0 +1,3 @@
<!ENTITY zotero.scholarcitations.update.label "Update citation(s)">
<!ENTITY zotero.scholarcitations.updateCol.label "Update citations">
<!ENTITY zotero.scholarcitations.updateAll.label "Update all citations">
File renamed without changes.
2 changes: 2 additions & 0 deletions chrome/locale/en-US/zsc.properties
@@ -0,0 +1,2 @@
captchaString=Please enter the Captcha on the page that will now open and then re-try updating the citations, or wait a while to get unblocked by Google if the Captcha is not present.
citedPrefixString=Cited by
File renamed without changes.
File renamed without changes.
14 changes: 14 additions & 0 deletions test/http/429.js
@@ -0,0 +1,14 @@
let http = require('http');

let port = 8080;

http.createServer(function(req, res) {
console.log('incoming request!');
console.log('method: ', req.method);
console.log('url: ', req.url);
console.log('header: ', req.headers);
res.writeHead(429, {'Content-Type': 'text/plain', 'Retry-After': 3600});
res.end('Yikes! You\'re blocked!');
}).listen(port);

console.log('Starting super simple http server on localhost:' + port + '!');
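The 429 server above answers every request with `Retry-After: 3600`, the header the corrected debug message in `zsc.js` now reads. Because HTTP also allows that header to carry a date instead of a number of seconds, a minimal sketch of normalizing it could look like this; `retryAfterSeconds` is a hypothetical name, not something ZSC defines.

```javascript
// Hypothetical helper: convert a Retry-After header value into seconds.
// The value is either a non-negative integer (delay in seconds) or an
// HTTP-date; anything unparseable yields null.
function retryAfterSeconds(headerValue, now = Date.now()) {
  if (headerValue == null) return null; // header absent
  const value = String(headerValue).trim();
  if (/^\d+$/.test(value)) return parseInt(value, 10); // delay-seconds form

  const date = Date.parse(value); // HTTP-date form
  if (Number.isNaN(date)) return null;
  return Math.max(0, Math.ceil((date - now) / 1000)); // dates in the past mean "now"
}
```

With the test server running, `retryAfterSeconds(xhr.getResponseHeader('Retry-After'))` would give `3600`.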
14 changes: 14 additions & 0 deletions test/http/captcha.js
@@ -0,0 +1,14 @@
let http = require('http');

let port = 8080;

http.createServer(function(req, res) {
console.log('incoming request!');
console.log('method: ', req.method);
console.log('url: ', req.url);
console.log('header: ', req.headers);
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Such Captcha! Much Protec! www.google.com/recaptcha/api.js Wow!');
}).listen(port);

console.log('Starting super simple http server on localhost:' + port + '!');
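The two test servers simulate two different failure modes: an HTTP 429 with `Retry-After`, and an HTTP 200 whose body loads Google's reCAPTCHA script. A sketch of telling them apart, assuming (from the captcha server's response body) that the robot check can be recognized by the `www.google.com/recaptcha/api.js` URL; `classifyResponse` is hypothetical and the diff does not show how ZSC performs this check.

```javascript
// Assumed marker, taken from the body the captcha test server sends.
const CAPTCHA_MARKER = 'www.google.com/recaptcha/api.js';

// Hypothetical classifier for a Google Scholar response.
function classifyResponse(status, body) {
  if (status === 429) return 'rate-limited'; // wait for Retry-After
  if (status === 200 && body.includes(CAPTCHA_MARKER)) return 'captcha'; // user must solve it
  if (status === 200) return 'ok'; // parse the citation count
  return 'error';
}
```

Keeping the two cases apart matters because a CAPTCHA can be cleared by the user right away, while a 429 only goes away after the server-specified wait.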
2 changes: 1 addition & 1 deletion test/test.js
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
let zsc = require('../chrome/content/scripts/zoteroscholarcitations.js');
let zsc = require('../chrome/content/zsc.js');
let assert = require('assert');
let sinon = require('sinon');
let request = require('sync-request');
