check is streamable #166

Closed
wants to merge 7 commits into from

AlexTMallen
Collaborator

This also removes hard-coded checks for datasets

@AlexTMallen AlexTMallen marked this pull request as draft April 4, 2023 19:14
@AlexTMallen AlexTMallen marked this pull request as ready for review April 4, 2023 19:16
Review thread on elk/utils/data_utils.py (outdated, resolved)
@norabelrose
Member

This seems to just hang when you try elicit on imdb with --stream on

@AlexTMallen
Collaborator Author

AlexTMallen commented Apr 11, 2023

> This seems to just hang when you try elicit on imdb with --stream on

The hanging is caused by it streaming 750 examples to check if they all have the same label. I found that streaming the data necessarily takes this much time (in my case, about 8 seconds). This should only impact performance when it throws an Exception for the dataset not being streamable.
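
To make the check concrete, here is a minimal sketch of what such a prefix check could look like. It is not the code from this PR: the function name `assert_labels_vary`, the `label_column` parameter, and the error message are hypothetical, and the examples are assumed to be dicts yielded by some iterable. Only the 750-example figure comes from the comment above.

```python
from itertools import islice
from typing import Any, Dict, Iterable


def assert_labels_vary(
    examples: Iterable[Dict[str, Any]],
    num_to_check: int = 750,
    label_column: str = "label",  # hypothetical column name
) -> None:
    """Raise if the first `num_to_check` streamed examples share one label."""
    seen = set()
    # islice pulls at most `num_to_check` items, so the stream stays lazy.
    for example in islice(examples, num_to_check):
        seen.add(example[label_column])
        if len(seen) > 1:
            return  # Two distinct labels found; no need to read further.
    # Reached only after the whole prefix was read (or the stream was empty).
    raise ValueError(
        f"The first {num_to_check} streamed examples do not contain "
        "two distinct labels, so the dataset is treated as not streamable."
    )
```

Because the loop returns as soon as a second label appears, a well-shuffled stream costs only a few examples. The full prefix is consumed only on the failure path, which is consistent with the slowdown showing up only when the exception is raised.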

@norabelrose
Member

> The hanging is caused by it streaming 750 examples to check if they all have the same label. I found that streaming the data necessarily takes this much time (in my case, about 8 seconds). This should only impact performance when it throws an Exception for the dataset not being streamable.

It's possible that I'm misremembering, but I distinctly remember going through with a print statement to see what was going on, and it stayed frozen even after it had hit the appropriate number of examples.

@AlexTMallen
Collaborator Author

> It's possible that I'm misremembering, but I distinctly remember going through with a print statement to see what was going on, and it stayed frozen even after it had hit the appropriate number of examples.

I can't replicate this when I use print statements. If max_examples / world_size is less than 100, it checks the first 100 streamed examples as a minimum. As soon as it reaches the appropriate iteration, it raises an exception for me.
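
The floor described above can be written as a one-line rule. This is a sketch under assumptions, not the PR's code: the helper name `num_examples_to_check` is hypothetical, and integer division is a guess at how `max_examples / world_size` is computed.

```python
def num_examples_to_check(
    max_examples: int, world_size: int, minimum: int = 100
) -> int:
    """Per-process count of streamed examples to inspect, never below `minimum`."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    # Assumption: each process handles an equal integer share of max_examples.
    return max(minimum, max_examples // world_size)


print(num_examples_to_check(max_examples=1000, world_size=1))  # 1000
print(num_examples_to_check(max_examples=200, world_size=4))   # 100, since 50 < 100
```

With this rule, a small per-process share still reads 100 examples before the check can fail, so the exception would be expected at iteration 100 and not earlier.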

@CLAassistant

CLAassistant commented Apr 23, 2023

All committers have signed the CLA.

@norabelrose
Member

We don't even support streaming datasets anymore

@norabelrose norabelrose closed this May 3, 2023

3 participants