Modifications to database construction scripts. Forked from https://github.com/Bookworm-project/BookwormDB
There are 4 major "moving parts" for the bookworm:
- The bookworm data
  - This is handled by the code in the `force_align` repo.
  - Briefly, this entails downloading the audio, matching it to its transcript, transcribing it phonetically, and then organizing the two transcriptions into a bookworm-readable format.
- The bookworm database
  - This is handled by the code in the `bookworm_db` repo.
  - This code takes the bookworm data produced via the `force_align` code and organizes it into a SQL database for use with the bookworm API.
  - Important pieces of code here are:
    - `Makefile` -> High-level overview of database construction
    - `bookworm/tokenizer.py` -> Contains the regexes used for tokenizing the bookworm data
    - `bookworm/CreateDatabase.py` -> Includes rules and SQL calls for constructing the tables in the database
    - `OneClick.py` -> Calls the functions in `CreateDatabase` during db construction
- The bookworm API
  - This is handled by the code in the `bookworm_api` repo.
  - This code is the interface between the bookworm browser/GUI and the bookworm database as constructed using the code in `bookworm_db`.
  - Important pieces of code here are:
    - `dbbindings.py` -> This is the script that receives queries from the front-end, sends them along to the API, and returns the results
    - `bookworm/general_API.py` -> The general API for organizing and parsing database queries. Makes use of the `userquery` class in `SQLAPI.py` to actually query the database.
    - `bookworm/SQLAPI.py` -> Defines the `userqueries` class for querying the bookworm database and parsing the response
- The bookworm GUI
  - This is handled by the code in the `bookworm_gui` repo.
  - This is the front-end for the bookworm browser. The majority of the processing is handled in `bookworm_gui/static/js/a.js`.
  - Important pieces of code here are:
    - `index.html`
    - `static/js/a.js` -> This is where the calls to the API are constructed; it also handles the look and feel of the interface, as well as query highlighting and phoneme vs. word database selection (for now; this should eventually be moved server-side).
    - `static/options.json` -> The config file containing the default values for the front-end, as well as lookup tables for translating database ids into display names.
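
To make the tokenizing step in the database component a little more concrete, here is a minimal Python sketch of regex-based tokenization. The pattern and the `tokenize` function are assumptions chosen for illustration; they are not copied from `bookworm/tokenizer.py`, whose actual regexes may differ (for example, in how phonetic transcriptions are split).

```python
import re

# Illustrative pattern only (an assumption, not taken from bookworm/tokenizer.py):
# either a run of word characters with optional apostrophe-joined parts,
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text):
    """Return the word and punctuation tokens of `text`, in order of appearance."""
    return TOKEN_PATTERN.findall(text)
```

With this pattern, `tokenize("Don't stop, OK?")` keeps the contraction together and emits each punctuation mark as its own token.
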
- Construct a bookworm data zip
  - For the formatting requirements, refer to: https://bookworm-project.github.io/Docs/Requirements.html
- Initialize a server (I usually use the AWS EC2 Ubuntu free tier). Ensure that permissions are set to allow unrestricted access to the HTTP and HTTPS ports. If the bookworm is large, make sure to allocate an appropriately sized swapfile to avoid segfaults during database construction.
- SSH in to the server and clone the `bookworm_db` repo into `/var/www/`:

      sudo apt-get install git  # if you're using Ubuntu
      cd /var/www/
      sudo git clone https://github.com/ddbourgin/bookworm_db.git

- Make a directory `files` in `bookworm_db` and rename the `bookworm_db` directory to your bookworm database name. For example, if your bookworm DB is named `My_BW_DB_Name`, you would run

      sudo mkdir /var/www/bookworm_db/files
      sudo mv /var/www/bookworm_db /var/www/My_BW_DB_Name

- Run the script `deploy_bw.sh` found in the renamed database directory (the command below assumes you are still in `/var/www/`). This will install the necessary bookworm dependencies and set up the MySQL server/config files for bookworm access.

      sudo sh My_BW_DB_Name/deploy_bw.sh

- From the `/var/www/` directory, download the zip file containing the bookworm data you created in the first step. I typically upload the file to Dropbox and use `wget` to download it:

      cd /var/www/
      sudo wget Link_to_Bookworm_Data_Zip
      sudo unzip *.zip
      sudo rm *.zip

- Move the `texts` and `metadata` folders in your unzipped `Bookworm_Data_Folder` to the `files` directory.
  - We assume here that your data folder is organized as

        Bookworm_Data_Folder/
        |-- texts/
        |   |-- input.txt
        |-- metadata/
        |   |-- jsoncatalog.txt
        |   |-- field_descriptions.json

  - If this is so, then you can simply run the following from the `/var/www/` directory

        sudo mv Bookworm_Data_Folder/texts My_BW_DB_Name/files/
        sudo mv Bookworm_Data_Folder/metadata My_BW_DB_Name/files/
        sudo rm -rf Bookworm_Data_Folder

- To actually construct the database, run

      cd /var/www/My_BW_DB_Name/
      sudo make all

- Follow the on-screen instructions. If all has gone well, this will result in a completed Bookworm database.
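
Before running `make all`, it can help to confirm that the data files ended up where the steps above put them. The following is a minimal Python sketch and is not part of the `bookworm_db` repo: the function name `check_bookworm_files` is hypothetical, the expected paths are taken from the folder layout described above (with `texts` and `metadata` under `files`), and the assumption that `jsoncatalog.txt` holds one JSON object per line comes from the Bookworm formatting requirements linked earlier rather than from this repo's code.

```python
import json
from pathlib import Path

# Files the steps above are expected to place under <database dir>/files/.
EXPECTED_FILES = (
    "texts/input.txt",
    "metadata/jsoncatalog.txt",
    "metadata/field_descriptions.json",
)


def check_bookworm_files(db_dir):
    """Return a list of problems found under db_dir/files (empty if none)."""
    files_dir = Path(db_dir) / "files"
    problems = [
        f"missing: {files_dir / rel}"
        for rel in EXPECTED_FILES
        if not (files_dir / rel).is_file()
    ]
    if problems:
        return problems

    # Assumption: jsoncatalog.txt holds one JSON object per non-empty line.
    catalog = files_dir / "metadata" / "jsoncatalog.txt"
    with catalog.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as error:
                problems.append(
                    f"jsoncatalog.txt line {line_number}: invalid JSON ({error.msg})"
                )

    # field_descriptions.json is parsed as a single JSON document.
    descriptions = files_dir / "metadata" / "field_descriptions.json"
    try:
        json.loads(descriptions.read_text(encoding="utf-8"))
    except json.JSONDecodeError as error:
        problems.append(f"field_descriptions.json: invalid JSON ({error.msg})")

    return problems
```

Calling `check_bookworm_files("/var/www/My_BW_DB_Name")` and printing each returned string gives a quick list of anything to fix; an empty list only means these basic checks passed, not that the data meets every formatting requirement.
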
##TODO:
- Add code for creating pause and word:pronunciation tables to `CreateDatabase.py`