Script to automate the process of updating a wiki page with the remaining amount of Transkribus credits left for the Wikimedia account

Parthiv-M/tr-stat-update

Transkribus Credits Update Automation

This Go script automates the process of updating a wiki page with the remaining amount of Transkribus credits left for the Wikimedia account. It fetches the value of remaining credits from the Transkribus dashboard and updates the Data:Wikimedia_OCR,_Transkribus_quota.tab page by adding a new row at the bottom of the table.

Local setup

Prerequisites

  • Install Go and confirm that it works on your system. The official Go documentation covers installation.
  • Run a local instance of MediaWiki core, so that you have your own wiki to test against and never need to modify pages on hosted wikis.
  • Install the JsonConfig extension, which the bot requires because it edits a .tab page. The MediaWiki documentation on tabular data explains .tab pages in more detail.

Clone the repository

Clone the repository by running the following command:

git clone https://github.com/parthiv-m/tr-stat-update

If you wish to fork and then clone the repository, you are welcome to do so!

Set environment variables

The environment variables required to run the script are provided in the .example.env file.

How to obtain Transkribus credentials

This script is strictly for the Transkribus account managed by Wikimedia. However, it can be generalised for any Transkribus account.

How to obtain bot credentials

  • Navigate to Special:BotPasswords on your local wiki.
  • You will be prompted for details such as the bot name and the grants the bot needs. This bot only requires permission to edit existing pages.
  • The next page shows the bot username, in the form username@bot_name, and a password. Enter both in the .env file.

Install packages

The script has no major dependencies apart from the godotenv package, which handles the .env file. Install the packages listed in the go.mod file by running go get . from the repository root.

Once this is done, you are all set to run the script!

Running the script

In general, the command to run a Go program is go run <filename>.go. In our case, this becomes go run main.go.

For the dev environment

When run without any arguments, the script runs in development mode. This is indicated by the logging statement:

Running in development...

For the actual wiki page hosted on Commons

Warning

This will modify the publicly available page. Only run if you are sure of what you are doing!

To update the actual wiki page on Commons, run the script as follows:

go run main.go production

This will produce a logging statement that says:

Running in production...
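
The mode selection described above can be sketched in Go as follows. This is an illustration, not the script's actual code: modeFromArgs is a hypothetical helper, and treating every argument other than production as development mode is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// modeFromArgs returns "production" only when the first argument after the
// program name is exactly "production". Any other input, including no
// argument at all, is assumed to mean development mode.
func modeFromArgs(args []string) string {
	if len(args) > 1 && args[1] == "production" {
		return "production"
	}
	return "development"
}

func main() {
	fmt.Printf("Running in %s...\n", modeFromArgs(os.Args))
}
```

Defaulting to development keeps an accidental run without arguments away from the public page.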

Long term usage

If you are not a developer and do not want to modify the script, but would still like to run it from time to time, download a binary from the releases section.

Note

Currently, binaries are available only for Linux.

Extracting the downloaded .tar.gz file with the tar -xvf <file_name> command should produce an executable named tr-stat-update. You will still need to set the appropriate environment variables in the same directory as the downloaded file.

To run the executable, use:

./tr-stat-update production

Logging

All logs for the script are stored in a debug.log file in the same directory as the script. If you run into any trouble, you might want to check the logs!

What does the script do?

The script follows a linear workflow as outlined below:

  • First, it authenticates itself to the Transkribus API using the login credentials provided by the user
  • Next, it makes a request to fetch the total credits left in the user's Transkribus dashboard
  • It then fetches the Data:Wikimedia_OCR,_Transkribus_quota.tab page using the MediaWiki Action API
  • Once the contents of the page are available, the script authenticates itself using the credentials of the bot generated by the user
  • After the bot has logged in successfully, the script requests a CSRF token for the bot so that it can safely edit the wiki page
  • Finally, the script adds a new row to the page, along with an appropriate edit summary containing the date and time of the update
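
The wiki-side steps above can be sketched in Go as follows. This is a hedged illustration rather than the script's actual code: the helper names (tokenParams, appendRow, editParams), the two-column row layout, and the summary wording are assumptions. The request parameters follow the MediaWiki Action API (action=query with meta=tokens, and action=edit), and the page is treated as JSON with a top-level "data" array, as tabular data pages are. No HTTP requests are made here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"time"
)

// tokenParams builds the Action API query for a token; kind is "login" or "csrf".
func tokenParams(kind string) url.Values {
	return url.Values{
		"action": {"query"},
		"meta":   {"tokens"},
		"type":   {kind},
		"format": {"json"},
	}
}

// appendRow adds one row to the "data" array of a .tab page and returns the
// updated JSON. Every other top-level key is carried over unchanged.
func appendRow(page []byte, row []any) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(page, &doc); err != nil {
		return nil, fmt.Errorf("parse page: %w", err)
	}
	var rows []json.RawMessage
	if raw, ok := doc["data"]; ok {
		if err := json.Unmarshal(raw, &rows); err != nil {
			return nil, fmt.Errorf("parse data: %w", err)
		}
	}
	encoded, err := json.Marshal(row)
	if err != nil {
		return nil, fmt.Errorf("encode row: %w", err)
	}
	rows = append(rows, encoded)
	data, err := json.Marshal(rows)
	if err != nil {
		return nil, fmt.Errorf("encode data: %w", err)
	}
	doc["data"] = data
	return json.Marshal(doc)
}

// editParams builds the form body for an action=edit request.
func editParams(title, text, summary, csrfToken string) url.Values {
	return url.Values{
		"action":  {"edit"},
		"format":  {"json"},
		"title":   {title},
		"text":    {text},
		"summary": {summary},
		"token":   {csrfToken},
	}
}

func main() {
	// Hypothetical page content: one date column and one credits column.
	page := []byte(`{"schema":{"fields":[{"name":"date","type":"string"},{"name":"credits","type":"number"}]},"data":[["2024-01-01",1500]]}`)
	now := time.Now().UTC()
	updated, err := appendRow(page, []any{now.Format("2006-01-02"), 1432})
	if err != nil {
		fmt.Println("append row:", err)
	} else {
		summary := "Update Transkribus credits on " + now.Format(time.RFC1123)
		params := editParams("Data:Wikimedia_OCR,_Transkribus_quota.tab", string(updated), summary, "<csrf token>")
		fmt.Println(tokenParams("csrf").Encode())
		fmt.Println(params.Get("text"))
	}
}
```

Keeping the existing rows as raw JSON avoids reformatting values the script did not write, so the edit only adds the new row.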

Further information

  • Transkribus is a platform for the text recognition, image analysis, and structure recognition of historical documents. By means of its web interface and a desktop client, it provides users access to a rich set of features to transcribe texts and train custom handwritten text recognition models.
  • Wikimedia OCR is a web service and interface for providing OCR text from images hosted on MediaWiki wikis. Transkribus is the newest addition to the set of OCR engines available on the tool. Try it out now!
