Dataset destroys Example.input_keys values #898

jsleight · 2024-04-24T15:53:13Z

Minimal example (on dspy v2.4.0):

```python
import dspy
examples = [dspy.Example(foo=f, bar=b).with_inputs("foo") for f, b in zip("abcd", "1234")]
print(examples)  # [Example({'foo': 'a', 'bar': '1'}) (input_keys={'foo'}), Example({'foo': 'b', 'bar': '2'}) (input_keys={'foo'}), Example({'foo': 'c', 'bar': '3'}) (input_keys={'foo'}), Example({'foo': 'd', 'bar': '4'}) (input_keys={'foo'})]

from dspy.datasets.dataset import Dataset

class MyDataset(Dataset):
    def __init__(self, examples):
        super().__init__(train_size=1, dev_size=1, test_size=1)
        self._train = [examples[0]]
        self._dev = [examples[1]]
        self._test = [examples[2]]

dataset = MyDataset(examples)
print(dataset.train)  # [Example({'foo': 'a', 'bar': '1'}) (input_keys=None)]
print(dataset.dev)    # [Example({'foo': 'b', 'bar': '2'}) (input_keys=None)]
print(dataset.test)   # [Example({'foo': 'c', 'bar': '3'}) (input_keys=None)]
```

I expected the input_keys to persist through the Dataset object. This line seems to be the problem.
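The mechanism can be reproduced without dspy. This is a minimal sketch using a hypothetical, simplified `ToyExample` class (not dspy's real `Example`) that keeps field values in a dict and the input keys in a separate attribute. Assuming the Dataset copies each example by keyword unpacking (`**example`), as discussed later in this thread, only the field values survive the copy: `**` unpacking reads the object through `keys()` and `__getitem__`, so the separate input-keys attribute is never passed to the new object.

```python
class ToyExample:
    """Hypothetical stand-in for dspy.Example: field values live in a dict,
    input keys live in a separate attribute."""

    def __init__(self, **fields):
        self._store = dict(fields)
        self._input_keys = None  # a freshly constructed object has no input keys

    def with_inputs(self, *keys):
        # Return a copy that records which fields are inputs.
        copy = ToyExample(**self._store)
        copy._input_keys = set(keys)
        return copy

    # keys() and __getitem__ are what make `**example` unpacking work.
    def keys(self):
        return self._store.keys()

    def __getitem__(self, key):
        return self._store[key]

    @property
    def input_keys(self):
        return self._input_keys


original = ToyExample(foo="a", bar="1").with_inputs("foo")
copied = ToyExample(**original)  # hypothetical stand-in for the Dataset's copy step

print(original.input_keys)           # {'foo'}
print(copied.input_keys)             # None: ** only carried the field values
print(copied["foo"], copied["bar"])  # a 1
```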


arnavsinghvi11 · 2024-04-27T22:50:41Z

Hi @jsleight, thanks for raising this. Currently, the intended workflow is to create your Dataset first and then set the inputs on its splits. Here is an example from intro.ipynb:

```python
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)
```

That said, it does make sense to me to have input_keys persist if they exist. Feel free to push a PR for this change!

jsleight · 2024-04-29T15:20:17Z

I might have some time to make a PR. I can envision a couple of approaches, so I'm interested to see which you'd prefer:

1. Just change the line in Dataset that creates copies of the examples to also call with_inputs.
2. Make a more fundamental change to Example so that Example(**example) persists the input_keys. This would make the Dataset class persist the input_keys while adding a bit more functionality to the Example class, but I don't know whether you'd like Example to work this way.
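A minimal sketch of the first approach, again using a hypothetical simplified `ToyExample` rather than dspy's real classes. The copy step re-applies `with_inputs` from the source example, and only when the source actually has input keys, in line with the suggestion to persist them "if they exist". `copy_example` is an illustrative name for the Dataset's copy step, not a dspy function.

```python
class ToyExample:
    """Hypothetical stand-in for dspy.Example (field dict + separate input keys)."""

    def __init__(self, **fields):
        self._store = dict(fields)
        self._input_keys = None

    def with_inputs(self, *keys):
        copy = ToyExample(**self._store)
        copy._input_keys = set(keys)
        return copy

    def keys(self):
        return self._store.keys()

    def __getitem__(self, key):
        return self._store[key]

    @property
    def input_keys(self):
        return self._input_keys


def copy_example(example):
    """Hypothetical Dataset copy step, patched as in approach 1."""
    copied = ToyExample(**example)  # copies field values only
    if example.input_keys:          # persist input keys only if they exist
        copied = copied.with_inputs(*example.input_keys)
    return copied


examples = [ToyExample(foo=f, bar=b).with_inputs("foo") for f, b in zip("abcd", "1234")]
train = [copy_example(x) for x in examples[:1]]
print(train[0].input_keys)  # {'foo'}

plain = copy_example(ToyExample(foo="z", bar="9"))
print(plain.input_keys)     # None: nothing to persist
```

With plain `**` unpacking, only what `keys()` exposes reaches the constructor, which is why this variant re-applies the input keys explicitly after the copy instead of relying on the constructor.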


arnavsinghvi11 added a commit that referenced this issue on Jun 17, 2024:

1e29689 Merge pull request #1086 from pedramsalimi/main
Fix the issue of handling input_keys using Dataset class (Issue #898)

arnavsinghvi11 closed this as completed Jun 17, 2024
