[Bug]: Unexpected Number of Questions Generated When Requesting FAQ Generation #10694
Comments
🤖 Hey @stephanedebove, great to cross paths again on here! Hope your code's been more friend than foe since our last chat. 🚀

Thank you for providing the detailed steps to reproduce the issue. Based on the context provided, it seems that the `num_questions_per_chunk` parameter is only interpolated into the prompt text rather than being enforced programmatically. Here is the relevant code snippet:

```python
self.question_gen_query = (
    question_gen_query
    or f"You are a Teacher/Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination. The questions should be diverse in nature \
across the document. Restrict the questions to the \
context information provided."
)
```

To strictly limit the number of questions generated, you would need to modify the relevant code, which you can find in the `llama_index/evaluation/dataset_generation.py` file. Please note that this is a workaround and might not be the best solution. It's possible that there's a better way to handle this that I wasn't able to find. If you have any further questions or need more clarification, feel free to ask.
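Since the limit only lives in the prompt text and the LLM is free to ignore it, one defensive workaround is to post-process the model's raw output and truncate to the requested count. The sketch below is a hypothetical helper, not part of llama_index's API; the question-parsing heuristic (lines ending in "?") is an assumption about how the model formats its answer:

```python
import re

def truncate_questions(raw_output: str, num_questions_per_chunk: int) -> list[str]:
    """Parse question lines from an LLM response and keep at most
    num_questions_per_chunk of them (hypothetical helper, not llama_index API)."""
    questions = []
    for line in raw_output.splitlines():
        line = line.strip()
        # Drop numbering like "1." or "1)" that models often prepend.
        line = re.sub(r"^\d+[.)]\s*", "", line)
        if line.endswith("?"):
            questions.append(line)
    return questions[:num_questions_per_chunk]

raw = "1. What is X?\n2. What is Y?\n3. Why does Z happen?"
print(truncate_questions(raw, 1))  # → ['What is X?']
```

This caps the questions per chunk, but note it does not address the separate problem of the generator being called once per chunk (see the discussion below).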
Any advances on this issue? Yesterday I ran into this bug: it created 8932 Q&A pairs from 97 chunks.
Thank you for the update, @elsatch! We appreciate your response. Based on the information provided, we'll go ahead and close this issue now. If you have any more questions or encounter further issues, feel free to open a new one.
Bug Description
When attempting to generate a set of FAQ questions and answers from a document, specifying num_questions_per_chunk=1 unexpectedly results in multiple questions being generated, exceeding the specified limit.
Expected Behavior
With num_questions_per_chunk=1, I expect to generate exactly one question per document or document chunk processed.
Actual Behavior
Despite setting num_questions_per_chunk=1, multiple questions (7 in my case) are generated for a single document or document chunk, indicating that the limit is not being respected or that the document is being split in an unexpected manner.
Edit: from my LLM logs (Mixtral 8x7B running on Replicate) I can see that the LLM is called 8 times even with num_questions_per_chunk=1, so the problem is probably not with the num_questions_per_chunk parameter itself but with the fact that the request to generate questions/answers is sent multiple times. Could this be due to how the async functions work?
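This pattern is consistent with the document being split into several nodes before generation: the generator issues one request per node, so eight LLM calls would simply mean the document was split into eight chunks. A minimal illustration of the arithmetic, in plain Python and independent of llama_index (the fixed-size splitter and the 512-character chunk size are made-up stand-ins for the real sentence-aware node parser):

```python
def count_chunks(text: str, chunk_size: int) -> int:
    """Number of fixed-size chunks a naive splitter would produce
    (stand-in for llama_index's node parser, which splits by sentences)."""
    return max(1, -(-len(text) // chunk_size))  # ceiling division

document = "x" * 4000  # one "document" of 4000 characters
chunks = count_chunks(document, 512)  # hypothetical splitter setting
# With num_questions_per_chunk=1, expect `chunks` LLM calls and roughly
# `chunks` questions, even though only one document was passed in.
print(chunks)  # → 8
```

Under this reading, the parameter controls questions *per chunk*, while the total is (number of chunks) × (questions the model actually emits per call), which also matches the 97-chunk report above.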
Version
0.9.45
Steps to Reproduce
Prepare a document or a list of documents to be processed.
Use the provided code snippet to generate FAQ questions and answers, ensuring num_questions_per_chunk is set to 1.
Observe that the output contains more questions than expected.
Relevant Logs/Tracebacks
No response