Update README.md
kbxkb committed May 21, 2017
1 parent c28aec3 commit 2d28783
Showing 1 changed file with 15 additions and 9 deletions.
24 changes: 15 additions & 9 deletions README.md
@@ -12,12 +12,12 @@ Let us see how you can, step by step, set up your environment and run this app a

1. Create a GCE VM in your desired region
* Use Ubuntu 14.04
- * 1 vCPU and 3.75 GB of memory will suffice. I used 16 GB of persistent Stanbdard Disk as I was dealing with large datasets, you can go less if you just restrict yourself to the bestbuy dataset that this demo uses
+ * 1 vCPU and 3.75 GB of memory will suffice. I used 16 GB of persistent Standard Disk as I was dealing with large datasets, you can go less if you just restrict yourself to the bestbuy dataset that this demo uses.
* You will need access to the Cloud Datastore API, so turn on "Allow full access to all Cloud APIs" right now; it will save some time later (see number 2 in the prerequisite list here: https://cloud.google.com/datastore/docs/datastore-api-tutorial)
* Allow HTTP and HTTPS traffic
* Use an SSH key if you have one
2. Create a project on GCP cloud console
- * Download service account cradentials JSON file for your project from https://console.developers.google.com, and save it somewhere on the VM you created
+ * Download service account credentials JSON file for your project from https://console.developers.google.com, and save it somewhere on the VM you created. You might need to SCP it into your VM, or just copy-paste the contents into a new file on your VM
* Create an App Engine instance and deploy it. It does not matter what's in it; you need this app enabled to access the Datastore APIs. (See number 3 in the prerequisite list here: https://cloud.google.com/datastore/docs/datastore-api-tutorial)
3. If you have completed the above steps successfully, you should be all set to start tinkering with the VM you just now created
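For readers who prefer the command line to the cloud console, the VM described above could be sketched roughly as follows. This is an assumption-laden illustration rather than what the README does: the instance name and zone are placeholders, the `ubuntu-1404-lts` image family may no longer be offered, and the function only prints the `gcloud` command so you can review it before running anything.

```shell
# Sketch only: print a gcloud command for review instead of running it.
# Instance name, zone, and image family are assumptions, not from the README.
print_create_vm_command() {
  vm_name="${1:-datastore-demo-vm}"   # hypothetical instance name
  vm_zone="${2:-us-central1-a}"       # replace with your desired zone
  cat <<EOF
gcloud compute instances create ${vm_name} \\
  --zone=${vm_zone} \\
  --machine-type=n1-standard-1 \\
  --image-family=ubuntu-1404-lts \\
  --image-project=ubuntu-os-cloud \\
  --boot-disk-size=16GB \\
  --boot-disk-type=pd-standard \\
  --scopes=cloud-platform \\
  --tags=http-server,https-server
EOF
}

print_create_vm_command
```

Here `n1-standard-1` is the 1 vCPU / 3.75 GB machine type, and `--scopes=cloud-platform` corresponds to "Allow full access to all Cloud APIs". The `http-server` and `https-server` tags only open traffic if matching firewall rules exist in the project; the console checkboxes create those rules for you, the command above does not.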

@@ -27,10 +27,10 @@ Let us see how you can, step by step, set up your environment and run this app a
2. Run "sudo apt-get update", and then "sudo apt-get install apache2 php5"
3. Test if apache is working - browse to the external IP address of the VM from your favorite browser
4. Run "sudo apt-get install git", you will need to clone this repo
- 5. Run "https://github.com/kbxkb/google-cloud-datastore-php.git" from anywhere meaningful, cd into google-cloud-datastore-php
+ 5. Run "git clone https://github.com/kbxkb/google-cloud-datastore-php.git" from anywhere meaningful, cd into google-cloud-datastore-php
6. Run "sudo chmod -R 777 /var/www" - see the security warning in the initial paragraph of this README
7. Copy calldatastore.php, form.html, loaddatastore.php, loaddatastore.sh, products.json into /var/www/html, use sudo as needed
- 8. Test access - browse to {your IP address}/form.html - you are seeing the rudimentary front end of my PHP application. Go ahead and type something in the textbox, the output area will show error, as it will try to access Datastore, but we have not set it up yet
+ 8. Test access - browse to {your IP address}/form.html. You will see the rudimentary front end of my PHP application. Go ahead and type something in the textbox, the output area will show error, as it will try to access Datastore, but we have not set it up yet
9. Now, you need to add an environment variable to this VM. A good, permanent way to do this is to edit /etc/environment and add a line to it. The line should be (without the quotes): "GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json" (remember you downloaded this file in step 2 of the previous section?). If the path has spaces, put the value of the variable (i.e., the path) in double quotes. Once you save this file, you will have to log out and back in for it to take effect
10. cd into /var/www/html
11. Now, we will install composer so that we can install other PHP libraries and dependencies. Inside /var/www/html, Run: "sudo curl -sS https://getcomposer.org/installer | php"
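The environment-variable step above can be sketched as a small helper that appends the line only if it is not already there. `add_credentials_line` is a hypothetical name, not something in this repository, and the example writes to a scratch file rather than the real /etc/environment so it is safe to try.

```shell
# Sketch: add GOOGLE_APPLICATION_CREDENTIALS to an environment file once.
# add_credentials_line is a hypothetical helper, not part of this repo.
add_credentials_line() {
  env_file="$1"    # on the VM this would be /etc/environment (needs root)
  cred_path="$2"   # wherever you saved the credentials JSON
  # Always quote the path, which also covers paths containing spaces
  line="GOOGLE_APPLICATION_CREDENTIALS=\"${cred_path}\""
  # Skip the append if the variable is already defined in the file
  if grep -q '^GOOGLE_APPLICATION_CREDENTIALS=' "$env_file" 2>/dev/null; then
    echo "already set in $env_file"
  else
    printf '%s\n' "$line" >> "$env_file"
    echo "added to $env_file"
  fi
}

# Try it on a scratch file first
demo_env="$(mktemp)"
add_credentials_line "$demo_env" "/path/to/credentials.json"
cat "$demo_env"
```

Running the helper a second time against the same file leaves it unchanged. On the VM itself you would still need to log out and back in before the variable is visible.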
@@ -43,10 +43,11 @@ Let us see how you can, step by step, set up your environment and run this app a

### Load test data into Cloud Datastore

- 1. Obviously, you have to do this once. I have used the publicly available BestBuy dataset here: https://github.com/BestBuyAPIs/open-data-set. In fact, the only file I have used from this dataset is products.json, and I have included that file in my repository, so you have it copied inside your /var/www/html folder right now
- 2. You will have to run this command to load the data into Cloud datastore, the script and the file are already in /var/www/html by now, just run this command from that directory: "./loaddatastore.sh products.json". Before you attempt this, here is something you should know:
- * There are almost 52K records in products.json. This command will take around 5 hours to complete
- * The script that loads it (both the sh file and the php file that it calls repeatedly for each line) is **poorly optimized**. It is horrible to be precise. It just connects to datastore for every line on the JSON file, and runs a tight loop writing the entities into Datastore. I have used sed to extract the SKU and the Product Name fields only, that is all I write. The intent is to demo auto-complete on the Product name, hence
+ 1. Obviously, you have to do this once. I have used the publicly available BestBuy dataset here: https://github.com/BestBuyAPIs/open-data-set. In fact, the only file I have used from this dataset is products.json, and I have included that file in my repository, so you have it copied inside your /var/www/html folder right now (if you have followed the above steps)
+ 2. You will have to run this command to load the data into Cloud datastore, the script and the file are already in /var/www/html by now, just run this command from that directory: "./loaddatastore.sh products.json". Before you attempt this, here is something you should note:
+ * Make sure that the sh file is an executable, set +x on it if needed
+ * There are almost 52K records in products.json. This command will take around 4-5 hours to complete
+ * The script that loads it (both the sh file and the php file that it calls repeatedly for each line) is **poorly optimized**. It is horrible to be precise. It just connects to datastore for every line on the JSON file, and runs a tight loop writing the entities into Datastore. I have used sed to extract the SKU and the Product Name fields only, that is all I write. The intent is to only demo auto-complete on the Product name, hence...
* Run this command and take a break, it will take a while
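To make the sed-based extraction mentioned above concrete, here is a minimal sketch of pulling a SKU and a lower-cased product name out of one JSON record. The record is made up for illustration and the patterns are assumptions; the expressions actually used are in loaddatastore.sh and may differ.

```shell
# Sketch: extract the SKU and the lower-cased name from one JSON record.
# The sample record and both patterns are illustrative assumptions.
extract_sku() {
  printf '%s\n' "$1" | sed -E 's/.*"sku":([0-9]+).*/\1/'
}

extract_name_lower() {
  # Capture the text between the quotes after "name":, then lower-case it
  printf '%s\n' "$1" | sed -E 's/.*"name":"([^"]*)".*/\1/' | tr '[:upper:]' '[:lower:]'
}

record='{"sku":12345,"name":"Example AAA Batteries","type":"HardGood"}'
echo "sku:  $(extract_sku "$record")"
echo "name: $(extract_name_lower "$record")"
```

Patterns this simple break on names that contain escaped double quotes, which is one more reason to treat the loader as a demo rather than a robust importer.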

### Demo auto-complete!
@@ -55,6 +56,11 @@ After loading is complete, go back to the form.html on the browser, and start ty

Feeling sluggish? No wonder! Performance optimization is **very poor** in this demo as of now
1. As you type, every key-press results in a call to datastore, but instead of connection-pooling, the code creates a new connection every time. That is not good, especially if you care about the end user's experience for auto-complete
- 2. The solution does not use any cache. As this is an overwhelmingly a read-heavy operation, we should use a service like memcache from GCP, so that it does not have to make a round-trip to datastore every time you type the same letter or sequence of letters
+ 2. The solution does not use any cache. As this is an overwhelmingly read-heavy operation, we should use a service like memcache from GCP, so that it does not have to make a round-trip to datastore every time you type the same letter or sequence of letters
+ 3. GQL queries used against datastore are case-sensitive. That is why I use strtolower(...) in the file loaddatastore.php. That means all those 52K records are stored in datastore in lowercase. So if there is a product called "Battery", it will match if you start typing "battery". However, it will *not* match if you start typing "Battery". This is a known deficiency at this point. It is easy to fix this, I will leave this to others.
+ * Hint: Please do not try to run multiple queries for each key press, one each with every upper-case/ lower-case combination. That will totally kill it. Try something creative. Remember that for datastore, storage is cheap as dirt - its pricing model is proportional to number of reads made against the service. Why not load the data twice, once with strtolower(), and once without? if that seems abominable to your programmer's instincts, then please feel free to solve it the *super-right* way!
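One possible direction for the case-sensitivity problem, offered as a sketch and not as what the demo currently does: since the stored names are already lower-case, normalize whatever the user typed to lower-case before it is used as the query prefix. That keeps it to a single query per key press. `normalize_prefix` is a hypothetical helper name.

```shell
# Sketch: lower-case the typed text so it matches the lower-cased stored names.
# normalize_prefix is a hypothetical helper, not part of this repo.
normalize_prefix() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Every casing of the same input maps to one query prefix
for typed in "battery" "Battery" "BATT"; do
  echo "$typed -> $(normalize_prefix "$typed")"
done
```

In the PHP front end the same idea would presumably be a strtolower() on the incoming text. The trade-off is that suggestions come back in lower-case unless the original name is also stored as a separate property for display.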

+ ### Clean-up

+ Do not forget to stop the VM, remove the GAE App and clean up Datastore. This is a metered platform, treat it like your own electricity bill, even if you are using an account with credits!
