
Language set to English in COPRO and MIPRO #729

Closed
excubo-jg opened this issue Mar 28, 2024 · 7 comments

Comments

@excubo-jg

COPRO (and MIPRO) explicitly require English:
```python
class BasicGenerateInstruction(Signature):
    """You are an instruction optimizer for large language models. I will give you a ``signature`` of fields (inputs and outputs) **in English**. Your task is to propose an instruction that will lead a good language model to perform the task well. Don't be afraid to be creative."""
```
Is there a reason for this limitation? What optimizer should be used if the data is not in English?

@arnavsinghvi11
Collaborator

Hi @excubo-jg, this specification is not a limitation on your data; it is there to ensure that the outputted dspy.Signatures remain compatible with the DSPy library, which is written in English.

You can use any of the optimizers with non-English data.
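For example, here is a minimal sketch of running COPRO over a French dataset (the metric, program, and `french_trainset` are hypothetical placeholders; the COPRO arguments shown are the commonly documented ones):

```python
import dspy
from dspy.teleprompt import COPRO

# Hypothetical metric; replace with whatever fits your task.
def exact_match(example, prediction, trace=None):
    return example.answer == prediction.answer

qa = dspy.ChainOfThought("question -> answer")

# COPRO can optimize a program over non-English examples; only its internal
# metaprompt (the instruction-proposal prompt) is written in English.
teleprompter = COPRO(metric=exact_match, breadth=10, depth=3)
compiled_qa = teleprompter.compile(
    qa,
    trainset=french_trainset,  # dspy.Example(question=..., answer=...) pairs in French
    eval_kwargs={"num_threads": 4},
)
```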

@excubo-jg
Author

Many thanks for your response. I checked with a sample in French, and in the saved JSON the signature_instructions were in English. I would have expected these not to be in English, as I thought mixing languages is detrimental. Is there a way to receive the prompt in the language of the samples? This would also make sharing the results with users easier.

Another question is why the "signature_prefix" just says "Answer:". I recently ran another test with English text and Phi as the model, which put a lot of instructions into the signature_prefix and only some text into the signature_instructions. Is there a difference in how these fields are used?
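For reference, the two fields in question look roughly like this in the saved JSON (the instruction text here is illustrative, not what any particular run produces):

```json
{
  "signature_instructions": "Answer questions with short factoid answers.",
  "signature_prefix": "Answer:"
}
```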

@arnavsinghvi11
Collaborator

Hi @excubo-jg,

To work with a language of your choice, the instruction optimizer would likely need a bit of an internal refactor to ensure that all instructions, prefixes, descriptions, etc. are in a predetermined language. This is a great point, though, about how we are currently skewed toward English, and we'd love the library to support all languages and data. Feel free to push a PR based on your investigations to abstract out the language used in the optimizers!

tagging @XenonMolecule and @klopsahlong for any other thoughts!
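As a starting point for such a PR, one hypothetical approach (a sketch, not anything in the current library) is a translated variant of COPRO's metaprompt signature, with the docstring and field descriptions carrying the target language — here a German translation of the metaprompt quoted at the top of the thread:

```python
import dspy

# Hypothetical German counterpart of COPRO's BasicGenerateInstruction.
# The docstring is a German translation of the English metaprompt above.
class BasicGenerateInstructionDE(dspy.Signature):
    """Du bist ein Instruktionsoptimierer für große Sprachmodelle. Ich gebe dir
    eine ``signature`` von Feldern (Eingaben und Ausgaben) auf Deutsch. Deine
    Aufgabe ist es, eine Anweisung vorzuschlagen, die ein gutes Sprachmodell
    dazu bringt, die Aufgabe gut zu lösen."""

    basic_instruction = dspy.InputField(desc="die ursprüngliche Anweisung vor der Optimierung")
    proposed_instruction = dspy.OutputField(desc="die verbesserte Anweisung für das Sprachmodell, auf Deutsch")
    proposed_prefix_for_output_field = dspy.OutputField(
        desc="die Zeichenkette am Ende des Prompts, die dem Modell den Einstieg in die Aufgabe erleichtert"
    )
```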

@MarkusOdenthal

MarkusOdenthal commented May 3, 2024

I have a similar experience with German. It would be very interesting to see if this affects performance, especially with the ChainOfThought module. What's intriguing is what GPT-3.5 generates for the rationale:

"rationale": [
          "classify the relevant categories. We need to identify the key information provided in the dialogue:\n\n1. Lauftechnik: Der Nutzer landet auf der Ferse.\n2. Untergrund: Der Nutzer läuft hauptsächlich auf der Straße.\n3. Schuhstabilität: Der Nutzer sucht nach Neutralschuhen.\n4. Verwendungszweck: Die Schuhe sind für das tägliche Training gedacht."
        ],

The first part is in English and then it switches to German (the German lines list the extracted running technique, surface, shoe stability, and intended use). This is without compiling; next I will test how it changes when I compile my program. Being able to set the language would be a very important feature for me, as I mostly have German clients.

The question is, what could an implementation look like? Are there any ideas?

@XenonMolecule
Collaborator

This is a super interesting proposal and I think it would be awesome to adapt the MIPRO and COPRO optimizers to more languages besides English. I think there are two interesting directions to explore here: one scientific and one engineering.

(1) From the scientific perspective, I'm interested in whether prompting an LLM in German leads to higher performance on German data, or if English instructions are equal or better. I've seen a bit of discussion about this online, here and here among other threads, but I'm not sure there are definitive conclusions.

(2) Regardless, I agree the option should be available to the user, which brings me to the engineering direction. I think the immediate solution would be to hand-translate the metaprompt in the MIPRO and COPRO optimizers into several languages and expose a flag that lets the user choose which language their optimizer should work in. Ultimately we could do something more sophisticated, like running langid on your data to pick the metaprompt automatically, but the hand-translated flag is a better initial solution since it puts the selection in the hands of the user. This would entail translating the metaprompts for MIPRO and COPRO into a few supported languages; a sketch of what that flag could look like follows.
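A minimal sketch of that flag (everything here is hypothetical; COPRO does not currently take a `language` argument, and the registry contents are placeholders):

```python
import dspy
from dspy.teleprompt import COPRO

# Hypothetical registry of hand-translated metaprompts, keyed by language code.
METAPROMPTS = {
    "en": "You are an instruction optimizer for large language models. ...",
    "de": "Du bist ein Instruktionsoptimierer für große Sprachmodelle. ...",
    "fr": "Vous êtes un optimiseur d'instructions pour les grands modèles de langage. ...",
}

class MultilingualCOPRO(COPRO):
    """Hypothetical COPRO variant that selects a translated metaprompt."""

    def __init__(self, *args, language="en", **kwargs):
        super().__init__(*args, **kwargs)
        if language not in METAPROMPTS:
            raise ValueError(f"No metaprompt available for language {language!r}")
        # A real PR would rewrite the instructions of the internal
        # instruction-generation signatures with this text.
        self.metaprompt = METAPROMPTS[language]

# Usage: teleprompter = MultilingualCOPRO(metric=my_metric, language="de")
```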

@MarkusOdenthal

Thanks for your comment, @XenonMolecule. Really interesting thoughts.

For the short term, do you have any experience with what works better in a multilingual setup? For instance, when I have a German task like attribute extraction, is it better to set up the task in English and only mention that the language we want to extract the attributes from is German? From my perspective, that approach seems better for now, since mixing languages within the instructions doesn't seem like a good idea (roughly the setup sketched below).
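Concretely, that English-instructions-over-German-data setup might look like this (a sketch; the signature and field names are made up for illustration):

```python
import dspy

class ExtractShoeAttributes(dspy.Signature):
    """Extract the running-shoe attributes from the customer dialogue.
    The dialogue is written in German; keep the extracted values in German."""

    dialogue = dspy.InputField(desc="customer dialogue, in German")
    attributes = dspy.OutputField(desc="extracted attributes, one 'Kategorie: Wert' pair per line, values in German")

extract = dspy.Predict(ExtractShoeAttributes)
result = extract(dialogue="Ich lande auf der Ferse und laufe hauptsächlich auf der Straße.")
```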

@mikeedjones

> Hi @excubo-jg,
>
> To work with a language of your choice, the instruction optimizer would likely need a bit of an internal refactor to ensure that all instructions, prefixes, descriptions, etc. are in a predetermined language. This is a great point, though, about how we are currently skewed toward English, and we'd love the library to support all languages and data. Feel free to push a PR based on your investigations to abstract out the language used in the optimizers!
>
> tagging @XenonMolecule and @klopsahlong for any other thoughts!

I think this would be enabled with #1090
