-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[Clojure] Add Fine Tuning Sentence Pair Classification BERT Example #14769
[Clojure] Add Fine Tuning Sentence Pair Classification BERT Example #14769
Conversation
|
||
if [ ! -d "$data_path" ]; then | ||
mkdir -p "$data_path" | ||
curl https://s3.us-east-2.amazonaws.com/mxnet-scala/scala-example-ci/BertQA/vocab.json -o $data_path/vocab.json | ||
curl https://s3.us-east-2.amazonaws.com/mxnet-scala/scala-example-ci/BertQA/static_bert_qa-0002.params -o $data_path/static_bert_qa-0002.params | ||
curl https://s3.us-east-2.amazonaws.com/mxnet-scala/scala-example-ci/BertQA/static_bert_qa-symbol.json -o $data_path/static_bert_qa-symbol.json | ||
curl https://media.githubusercontent.com/media/gigasquid/mxnet-data/master/static_bert_base_net-symbol.json -o $data_path/static_bert_base_net-symbol.json | ||
curl https://media.githubusercontent.com/media/gigasquid/mxnet-data/master/static_bert_base_net-0000.params -o $data_path/static_bert_base_net-0000.params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lanking520 Would it be possible to host the BERT Base params on on the same bucket as the other static-bert-qa
params too rather than my github?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gigasquid I just added them to the bucket for you. You can now find them with the equivalent paths:
https://s3.us-east-2.amazonaws.com/mxnet-scala/scala-example-ci/BertQA/static_bert_base_net-symbol.json
https://s3.us-east-2.amazonaws.com/mxnet-scala/scala-example-ci/BertQA/static_bert_base_net-0000.params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to using it!
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
Thanks for the feedback @Chouffe - I'm working on implementing it today :) |
43f3bad
to
8e8b936
Compare
I think this PR is ready for review. Please take a look @Chouffe @kedarbellare @hellonico when you get a chance 👀 |
I haven’t reviewed the PR completely but hopefully I can get to it this weekend |
(defproject bert "0.1.0-SNAPSHOT" | ||
:description "BERT Examples" | ||
:plugins [[lein-cljfmt "0.5.7"] | ||
[lein-jupyter "0.1.16" :exclusions [org.clojure/tools.nrepl org.clojure/clojure org.codehaus.plexus/plexus-utils org.clojure/tools.reader]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI I had to comment out the lein-jupyter
plugin in order to have a working REPL with CIDER version 0.19.0 Raleigh. We may want to consider a note, either as a comment here or in the README or both, so that people don't get tripped up by their mutual incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into the some issues as well with cider and the older version of lein-jupyter playing well together. I added a note in the README and in the project.clj
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
contrib/clojure-package/examples/bert/src/bert/bert_sentence_classification.clj
Outdated
Show resolved
Hide resolved
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two sentences are clearly not semantically equivalent, and are in fact parts of two separate examples that have been accidentally merged. The double quotes in the source file are messing with the CSV parsing. Investigating now; hope to have a fix shortly.
The data/dev.tsv file has it correct:
Quality #1 ID #2 ID #1 String #2 String
1 1355540 1355592 He said the foodservice pie business doesn 't fit the company 's long-term growth strategy . " The foodservice pie business does not fit our long-term growth strategy .
Note that strict CSV parsing throws an error:
(csv/parse-csv (slurp "data/dev.tsv") :delimiter \tab :strict true)
=>
Unhandled java.lang.Exception
Double quote present in unquoted field.
core.clj: 107 clojure-csv.core/read-unquoted-field
core.clj: 93 clojure-csv.core/read-unquoted-field
core.clj: 181 clojure-csv.core/parse-csv-line
core.clj: 153 clojure-csv.core/parse-csv-line
core.clj: 192 clojure-csv.core/parse-csv-with-options/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
Cons.java: 39 clojure.lang.Cons/next
RT.java: 706 clojure.lang.RT/next
core.clj: 64 clojure.core/next
dispatch.clj: 68 clojure.pprint/pprint-simple-list/fn
dispatch.clj: 67 clojure.pprint/pprint-simple-list
dispatch.clj: 77 clojure.pprint/pprint-list
dispatch.clj: 76 clojure.pprint/pprint-list
MultiFn.java: 229 clojure.lang.MultiFn/invoke
pprint_base.clj: 194 clojure.pprint/write-out
pprint_base.clj: 249 clojure.pprint/pprint/fn
pprint_base.clj: 248 clojure.pprint/pprint
pprint_base.clj: 241 clojure.pprint/pprint
pprint_base.clj: 245 clojure.pprint/pprint
pprint_base.clj: 241 clojure.pprint/pprint
Var.java: 381 clojure.lang.Var/invoke
pprint.clj: 67 cider.nrepl.middleware.pprint/handle-pprint-fn/fn
pprint.clj: 85 cider.nrepl.middleware.pprint/pprint-reply/fn
pprint.clj: 77 cider.nrepl.middleware.pprint/pprint-reply
pprint.clj: 75 cider.nrepl.middleware.pprint/pprint-reply
pprint.clj: 95 cider.nrepl.middleware.pprint/pprint-transport/reify
interruptible_eval.clj: 109 nrepl.middleware.interruptible-eval/evaluate/fn/fn
main.clj: 244 clojure.main/repl/read-eval-print
main.clj: 261 clojure.main/repl/fn
main.clj: 261 clojure.main/repl
main.clj: 177 clojure.main/repl
RestFn.java: 1523 clojure.lang.RestFn/invoke
interruptible_eval.clj: 83 nrepl.middleware.interruptible-eval/evaluate/fn
AFn.java: 152 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
core.clj: 657 clojure.core/apply
core.clj: 1965 clojure.core/with-bindings*
core.clj: 1965 clojure.core/with-bindings*
RestFn.java: 425 clojure.lang.RestFn/invoke
interruptible_eval.clj: 81 nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj: 50 nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj: 221 nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
interruptible_eval.clj: 189 nrepl.middleware.interruptible-eval/run-next/fn
AFn.java: 22 clojure.lang.AFn/run
ThreadPoolExecutor.java: 1128 java.util.concurrent.ThreadPoolExecutor/runWorker
ThreadPoolExecutor.java: 628 java.util.concurrent.ThreadPoolExecutor$Worker/run
Thread.java: 834 java.lang.Thread/run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just removing quotes seems to work on this example. I'm spot-checking the rest.
(csv/parse-csv (string/replace (slurp "data/dev.tsv") "\"" "")
:delimiter \tab
:strict true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I submit a fix in this PR: gigasquid#5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
…ing-csv-fix Clojure BERT finetuning example: fix CSV parsing
Thanks for the feedback @daveliepmann - plan to address it shortly :) |
…id/incubator-mxnet into new-bert-example-with-finetuning
…pache#14769) * Rework Bert examples to include QA infer and finetuning * update notebook example and exported markdown * add integration test for the classification * fix tests * add RAT * add another RAT * fix all the typos * Clojure BERT finetuning example: fix CSV parsing * update readme and gitignore * add fix from @davliepmann’s notebook parsing * feedback from @daveliepmann * fix running of example and don’t show very first batch on callback speedometer * rerun the notebook and save results * remove bert stuff from main .gitignore * re-putting the license back on after regen * fix integration test
…pache#14769) * Rework Bert examples to include QA infer and finetuning * update notebook example and exported markdown * add integration test for the classification * fix tests * add RAT * add another RAT * fix all the typos * Clojure BERT finetuning example: fix CSV parsing * update readme and gitignore * add fix from @davliepmann’s notebook parsing * feedback from @daveliepmann * fix running of example and don’t show very first batch on callback speedometer * rerun the notebook and save results * remove bert stuff from main .gitignore * re-putting the license back on after regen * fix integration test
Description
This adds a fine tuning example for BERT based off of the following Gluon NLP example
https://gluon-nlp.mxnet.io/examples/sentence_embedding/bert.html
There is a PR in the GluonNLP repo for the export script dmlc/gluon-nlp#672
The original BERT infer example directory has been changed from
bert-qa
tobert
to accomodate both the QA example and this one.A walkthrough in the form of a Jupyter notebook has been provided for the sentence pair classification based off of the Gluon NLP one. A markdown version of the notebook has also been provided for those who don't have the
lein-jupyter
plugin.Example results for sentence pair classification after 3 epochs:
which is comparable to the sentence embedding results in the gluon nlp notebook
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments