Haystack with Albert is awesome! XLNet question #23
I am familiar with Transformers' ... With FARMReader, what is the best indicator of a "No Answer" output when there is no "reasonable" probability of an answer? Is the argument "no_ans_threshold" for FARMReader the means to accomplish this? I do not see any change or pattern in the answers when varying this arg from -100 (default) to +100. Must I implement "No Answer" functionality by analyzing the 'probability' & 'score' of the returned answers?
Hi @ahotrod, nice example of searching Porsche wikis via XLNet.
It seems like there's a difference in naming the FF module (= "QA head") between the two available XLNet implementations:
Regarding usage of this model in Haystack: b) TransformersReader:
However, I guess Transformers' No Answer
Yes, that's the right argument to change. However, you are totally right that the current implementation is missing a step to include the no answer option in the final result - so you can't see any difference. Working on this now in #24.
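For readers following along, here is a minimal sketch of the argument under discussion, assuming a SQuAD2-style model; the model name, the exact parameter name (it appears as both no_ans_threshold and no_ans_boost in this thread), and its semantics are assumptions rather than confirmed API:

```python
# Hedged sketch: nudging the "no answer" option in FARMReader.
# Model name and parameter semantics are assumptions, not confirmed behavior.
from haystack.reader.farm import FARMReader  # import path from haystack of that era

reader = FARMReader(
    model_name_or_path="deepset/bert-base-cased-squad2",  # any SQuAD2 QA model
    no_ans_boost=0,  # presumably: positive favors "no answer", negative suppresses it
    use_gpu=True,
)
```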
Hello @tholor, thanks for looking into these issues. Yes, the latest fine-tuned XLNet model I have was fine-tuned with Transformers v2.1.1 and is shared at https://huggingface.co/ahotrod/xlnet_large_squad2_512. Fine-tuning this XLNet LM with Transformers v2.3.0 thru v2.4.1 is currently not possible, perhaps related to the statement that Transformers' ... Loading my shared XLNet model, fine-tuned with Transformers v2.1.1, as you suggested, with:
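(The load call itself did not survive in this transcript; presumably it was something along these lines, a reconstruction rather than the verbatim snippet:)

```python
# Assumed reconstruction of the suggested loading step.
from haystack.reader.farm import FARMReader

reader = FARMReader(model_name_or_path="ahotrod/xlnet_large_squad2_512")
```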
is successful; however, subsequent inferencing with
Don't sweat the XLNet issues at this juncture on my account. I'm good with Albert xxlarge v1 fine-tuned on SQuAD2, which is a sufficient model for the foreseeable future. Perhaps the XLNet issues will sort out with subsequent Transformers releases, as their ... Thanks for your PR #24 - you're all over it! I've enjoyed and learned a good bit going thru FARM & Haystack code the last few weeks. If you need any particular feedback or whatnot, be sure to ask. Maybe limited, but I will contribute where I can. Best regards!
Hey @ahotrod, ok, sounds good! We are currently discussing a few options on how to aggregate the
Yes, this is a bug on Transformers' side. Should work once their
Ok, great! I am still a bit curious: did you observe any performance differences in training an
We just merged #24. This now allows returning "no_answers" from the
Haven't tried training either in FARM, only with Transformers. Will do so when time permits. Thanks, starting to go thru the merged #24 PR changes now and run some examples. Will post comments after more experimentation.
I have been testing with the single Porsche 911 wiki (which is similar size-wise to my domain app) to ID limitations, plus trying to "fool" FARMReader. Looking good so far! Here's an example with basic structure & elements for my app:
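The example code was lost in this transcript; what follows is a hedged reconstruction of the basic structure, with module paths and the model name assumed from the Tutorial1 notebook of that period (they have changed in later haystack releases):

```python
# Sketch of a Tutorial1-style QA pipeline on the Porsche 911 wiki article.
# Module paths and signatures assumed from early-2020 haystack.
from haystack import Finder
from haystack.reader.farm import FARMReader
from haystack.retriever.tfidf import TfidfRetriever

# ... the Porsche 911 wiki text is written to the document store beforehand ...

retriever = TfidfRetriever()
reader = FARMReader(model_name_or_path="ahotrod/albert_xxlargev1_squad2_512",
                    use_gpu=True)
finder = Finder(reader=reader, retriever=retriever)

prediction = finder.get_answers(question="Who developed the Porsche 911?",
                                top_k_retriever=1, top_k_reader=3)
```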
Output:
Reducing the ... BTW, a next step is architecting a cloud solution. I'm leaning towards GCP or Azure as the best platforms for Speech-to-Text and NLP/ML hosting.
Hey @ahotrod, thanks for using the cutting-edge haystack version. It is really rewarding to see that it works and people are using it already!

Concerning the execution time: I have seen you already increased the batch_size parameter in the FARMReader to 48. Is this really the maximum value that fits onto your 1080Ti during inference? Could you try setting it to 100 or even 200, so that all data coming from the retriever fits into one batch? GPU memory consumption is very different in training vs. inference, so even these high batch sizes should work.

We will merge the new changes into master today and will likely create more breaking changes around FARM inference + haystack interaction in the coming days. So please be prepared that some of your old code won't work with upcoming haystack versions.

Concerning a cloud solution: yes, something to scale inference automatically like AWS SageMaker would be interesting. Do you know any good solutions like that on GCP or Azure?
Yes, maximum batch size on the 1080Ti GPU with Albert_xxlarge is 50, which uses 11GB of the available 11.2GB of its memory. Inferencing in the cloud for production will require more capable resources.
Using today's new master changes:
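(The configuration snippet was stripped; a sketch of a CPU-only setup consistent with the description below, with the model name and exact arguments assumed:)

```python
# Assumed CPU-only configuration: use_gpu=False forces inference onto the CPU.
from haystack.reader.farm import FARMReader

reader = FARMReader(
    model_name_or_path="ahotrod/albert_xxlargev1_squad2_512",  # assumed model
    use_gpu=False,
    batch_size=50,  # the GPU maximum reported above; illustrative here
)
```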
This CPU-only config inferences with one pass occupying a peak of 18GB CPU memory, with 100% execution on six of 12 CPU threads. The inferencing times increase about 16-17x compared to running on the GPU. The interesting thing is the ...

Unfortunately, the "no_answer" functionality doesn't work for me now. I tried
Yes, GCP & Azure both have provisions to scale with demand.
Good morning @ahotrod. Thanks for posting the inference times. Having one batch take that long is strange. We will look into PyTorch inference in more detail soon because it seems to be very different from normal training. We will update you once we find a good solution.
Guessed correctly: the code doesn't work on GPU. This is mainly due to working with strings to get the actual answers. PyTorch doesn't support strings, so a pure GPU solution seems difficult. There are nevertheless many operations in this function that we could improve upon.

Concerning the no_ans_boost on the newest master: very good point, we need to update the requirements for FARM, since we need the latest FARM master installed, too.
Just in passing, as I noted some time ago on Transformers, RAPIDS offers a CUDA stand-alone string library, cuStrings, with the Python wrapper nvStrings: https://github.com/rapidsai/custrings (high-speed data loading & processing of textual dataframes on GPU with CUDA). Moving pandas dfs to GPU takes several lines of code, or perhaps data can be loaded straight to the GPU. Might be applicable for "pure GPU solutions" and for GPU-accelerated word tokenization as touched on in this basic example:
Nice, thanks for the links. Didn't know about RAPIDS, though they seem super active in bringing all kinds of useful code to GPUs. I will move this conversation to mail, since I don't think it will be too useful for the community.
Following up on latest developments:
Yes, batch_size=160 now accommodates 73kB of context/paragraphs, easily fitting on a single GPU. Total prediction times have dropped from about 42-46 seconds per question to 27 seconds.
Nice! Thanks for the kudos - I forwarded it to the team : ) We ran some experiments ourselves and got roughly the same speedup.
Will keep you updated.
"AI developers can now easily productionize large transformer models with high performance across both CPU and GPU hardware". To get started:
https://onnx.ai/get-started.html Install the ONNX runtime locally: |
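(The install commands were stripped; the standard ONNX Runtime package names are:)

```
pip install onnx onnxruntime       # CPU runtime
pip install onnxruntime-gpu        # GPU (CUDA) runtime
```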
Interesting! We will definitely have a look and see if this could be applicable for inference in haystack and FARM.
It looks very promising indeed. In the table they report 3-layered Bert performance with very low batch sizes. These settings seem rather atypical.
Plan to go thru that PyTorch-based notebook today. Will try some iterations with my more common settings. There are several Bert-based model ONNX examples out there: I noticed the earlier model has the dependency run_onnx_squad.py, which will be interesting to go thru for comparison to FARM's & Transformers' ...
Went with onnx-ecosystem, which is a recent release (a couple of weeks old). Found nvidia-cuda-docker was not initializing, so I ditched Docker for now and ran this notebook from an environment with PyTorch v1.4.0, Transformers v2.5.1, and ONNX runtimes v1.2.1 (CPU & GPU). With the variables (max_seq_length=128, etc.) as originally specified, here is the result on GPU:
With max_seq_length=384, everything else the same, here is the result:
Should have more time tomorrow to examine these preliminary results and to further iterate & characterize the differences, including the notebook's variables. At this point I am more familiar with ...

Here's another max_seq_length=384 run:
Very interesting results. Looking forward to more results, especially batch size and per-GPU batch size - I had the impression that multi-GPU utilization at inference is not really optimal in PyTorch.
I am in the midst of evaluating Haystack with Albert and so far it looks awesome. Loving it, thanks for sharing.
I missed the whole Game of Thrones fantasy/drama phenomenon, so for a tutorial I could understand and relate to, I went looking for other content to use with your Tutorial1_Basic_QA_Pipeline.ipynb notebook. Being a Porschephile, I settled on:

I can relate to the above content and ask relevant questions of it "all day long". All other code in your notebook remains the same, except I use my Albert model for QA, and it works well:
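(The working snippet was stripped; presumably just the reader model was swapped, along these lines, where the local checkpoint path is hypothetical:)

```python
# Assumed: pointing FARMReader at the author's fine-tuned Albert checkpoint.
from haystack.reader.farm import FARMReader

reader = FARMReader(model_name_or_path="../models/albert_xxlargev1_squad2_512",
                    use_gpu=True)
```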
For my application/project, I would like to also evaluate XLNet performance with Haystack, but I am having trouble loading my XLNet model:
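(The failing call was stripped as well; presumably analogous to the Albert load above, with a hypothetical local path; this is what triggers the error quoted below:)

```python
# Assumed: same load pattern with the XLNet checkpoint; raises the
# AttributeError on 'qa_outputs' reported below.
from haystack.reader.farm import FARMReader

reader = FARMReader(model_name_or_path="../models/xlnet_large_squad2_512")
```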
Attached is the complete terminal output text, but bottom-line the error I get is:
AttributeError: 'XLNetForQuestionAnswering' object has no attribute 'qa_outputs'
output_term.txt
This XLNet model was fine-tuned with Transformers v2.1.1 and is the best I have, because I and others are having problems fine-tuning XLNet_large under Transformers v2.4.1 (huggingface/transformers#2651).
Perhaps this XLNet model, fine-tuned with Transformers v2.1.1, is incompatible or missing the attribute mentioned in the error message?
Looking forward to additional FARM/Haystack QA capabilities you have in the works, thanks for your efforts!