EleutherAI · haileyschoelkopf · Jun 3, 2024 · May 8, 2024 · May 8, 2024 · May 8, 2024
@@ -107,6 +107,46 @@ Using this decorator results in the class being added to an accounting of the us
 
 We also recommend that new model contributions be accompanied by short tests of their 3 core functionalities, at minimum. To see an example of such tests, look at https://github.com/EleutherAI/lm-evaluation-harness/blob/35bdecd379c0cefad6897e67db892f4a6026a128/tests/test_ggml.py .
 
+## Chat Templating
+
+Many models are fine-tuned with a [Chat Template](https://huggingface.co/docs/transformers/main/en/chat_templating) in order to enable back-and-forth interaction between a "User"'s queries and the model (often called "Assistant")'s responses. It can be desirable to evaluate fine-tuned models on evaluation tasks while wrapped in the conversational format they expect.
+
+In order to make your model optionally compatible with a chat format, two additional methods must be implemented:
+
+```python
+class MyCustomLM(LM):
+ #...
+ @property
+ def tokenizer_name(self) -> str:
+ # should return a string denoting the name of the model's tokenizer and/or the accompanying chat template.
+
+ def apply_chat_template(self, chat_history: List[Dict[str, str]]) -> str:
+ # responsible for taking as input a chat history that would be fed into the model, and
+ # rendering it as a string that can be then tokenized and input into the model.
+ #...
+```
+
+- `apply_chat_template`
+ - This method performs the bulk of the work required for chat-formatting.
+ - As input, a `chat_history: List[Dict[str, str]]` is passed in. This is a transcript of a conversation of a form similar to
+ ```
+ [
+ {"system": <user-provided system message such as "You are a helpful math-focused chatbot">},
+ {"user": <task example - a few-shot example 'input'>}
+ {"assistant": <correct response to the above example>},
+ # ... more few-shot examples, potentially
+ {"user": <test set query--response on which we will evaluate>},
+ ]
+ ```
+ which can then be converted into a string input.
+ - The output is a string representing this conversation that can be fed into the model.
+ - For example, this consists of simply calling `tokenizer.apply_chat_template` for HFLM--see the implementation there for reference.
+- `tokenizer_name`
+ - LM Eval Harness supports [caching requests](https://github.com/EleutherAI/lm-evaluation-harness/blob/4902aaaf1f374682f95ac25fe2e13b23faddc91a/lm_eval/__main__.py#L140) that are sent to a model, for faster setup when repeating an already-performed evaluation.
+ - However, we don't want to use the cache of chat transcripts rendered using one chat template or system prompt to send to a model with a different template! So, we use this `lm.tokenizer_name()` string to distinguish caches for a given model (and chat template) from one another.
+
+If not implemented, the flags `--system_instruction`
+
 ## Other
 
 **Pro tip**: In order to make the Evaluation Harness overestimate total runtimes rather than underestimate it, HuggingFace models come in-built with the ability to provide responses on data points in *descending order by total input length* via `lm_eval.utils.Reorderer`. Take a look at `lm_eval.models.hf_causal.HFLM` to see how this is done, and see if you can implement it in your own model!