[AIR/Tune/Docs] Mass documentation testing failure #26410

Closed
richardliaw opened this issue Jul 9, 2022 · 21 comments · Fixed by #26821
Labels: bug, P0, release-blocker

@richardliaw
Contributor

richardliaw commented Jul 9, 2022

What happened + What you expected to happen

Our Jupyter notebook testing has been broken for months, which has led us to check in dozens of broken documentation pages and tests.

This mainly affects RL, AIR, and Tune.

Relevant PRs:
#26409

A full list of test failures can be found here:

| Test | Status | Failure |
| --- | --- | --- |
| //doc/source/data/examples:nyc_taxi_basic_processing FAILED in 3 out of 3 in 9.6s | BROKEN | Could not read schema from 'ursa-labs-taxi-data/2009/01/data.parquet |
| //doc/source/ray-air/examples:torch_incremental_learning FAILED in 3 out of 3 in 93.3s | BROKEN | dataset type |
| //doc/source/ray-air/examples:rl_offline_example FAILED in 3 out of 3 in 76.0s | BROKEN | ValueError: Received data of type: <class 'list'>, but expected it to be one of typing.Union[ForwardRef('numpy.ndarray'), ForwardRef('pandas.DataFrame'), ForwardRef('pyarrow.Table'), typing.Dict[str, ForwardRef('numpy.ndarray')]] |
| //doc/source/ray-air/examples:rl_online_example FAILED in 3 out of 3 in 51.4s | BROKEN | ValueError: Received data of type: <class 'list'>, but expected it to be one of typing.Union[ForwardRef('numpy.ndarray'), ForwardRef('pandas.DataFrame'), ForwardRef('pyarrow.Table'), typing.Dict[str, ForwardRef('numpy.ndarray')]] |
| //doc/source/ray-air/examples:tfx_tabular_train_to_serve FAILED in 3 out of 3 in 49.9s | BROKEN | TypeError: Object of type ndarray is not JSON serializable |
| //doc/source/tune/examples:sigopt_example FAILED in 3 out of 3 in 0.8s | WONTFIX | API key, tracked here - #26567 |
| //doc/source/ray-air/examples:feast_example FAILED in 3 out of 3 in 0.7s | WONTFIX | bad import |
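
For the `TypeError: Object of type ndarray is not JSON serializable` failure listed for tfx_tabular_train_to_serve, here is a minimal sketch of one common workaround: giving `json.dumps` a `default` hook that converts array-like objects to lists. This is an assumption about the kind of fix that applies, not the change that was actually made to the notebook. `to_jsonable` and `FakeArray` are hypothetical names, and `FakeArray` only stands in for `numpy.ndarray` so the snippet runs without NumPy.

```python
import json


def to_jsonable(obj):
    """Fallback for json.dumps: convert array-like objects to plain lists."""
    # numpy.ndarray exposes .tolist(); duck-typing avoids importing numpy here.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class FakeArray:
    """Hypothetical stand-in for numpy.ndarray, used only in this sketch."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


payload = {"predictions": FakeArray([0.1, 0.9])}
encoded = json.dumps(payload, default=to_jsonable)
print(encoded)  # {"predictions": [0.1, 0.9]}
```

With a real `numpy.ndarray` in place of `FakeArray`, the same hook should apply, since `ndarray.tolist()` returns nested Python lists.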

Versions / Dependencies

Master

Reproduction script

N/A

Issue Severity

High: It blocks me from completing my task.

@richardliaw added the bug, release-blocker, P0, and 2.0.0beta-blocker labels on Jul 9, 2022
@richardliaw changed the title from "[AIR/Tune/Docs] Mass documentation failure" to "[AIR/Tune/Docs] Mass documentation testing failure" on Jul 9, 2022
@richardliaw
Contributor Author

My recommendation moving forward:

  1. Fix the AIR issues, as there are mainly two of them (S3 and RL datatype handling).
  2. Fix the low-hanging fruit.
  3. Disable the other tests that require larger testing harness changes (horovod, pip installation), and file P0s for component owners to fix.

@richardliaw
Contributor Author

Merged #26409, which disables tests and reverts some of the Pandas changes (#26009). This is a regression and should be addressed before beta.

@xwjiang2010
Contributor

xwjiang2010 commented Jul 11, 2022

Question:
I noticed that all bash commands are skipped when the notebooks are converted to Python.
For example, in feast_example a number of commands are incorrectly commented out during conversion:

# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.14.0
#   kernelspec:
#     display_name: Python 3 (ipykernel)
#     language: python
#     name: python3
# ---

# %% [markdown]
# # Integrate Ray AIR with Feast feature store

# %%
# # !pip install feast==0.20.1 ray[air]>=1.13 xgboost_ray

# %% [markdown] id="INyNIaeB1Kza"
# In this example, we showcase how to use Ray AIR with Feast feature store, leveraging both historical features for training a model and online features for inference.
#
# The task is adapted from [Feast credit scoring tutorial](https://github.com/feast-dev/feast-aws-credit-scoring-tutorial). In this example, we train a xgboost model and run some prediction on an incoming loan request to see if it is approved or rejected.

# %% [markdown] id="sBC9CCrpzQLF"
# Let's first set up our workspace and prepare the data to work with.

# %% id="DcPIskZlzSal"
import os
WORKING_DIR = os.path.expanduser("~/ray-air-feast-example/")
# %env WORKING_DIR=$WORKING_DIR

# %% colab={"base_uri": "https://localhost:8080/"} id="BcyCKjV3zTCK" outputId="afdfa24d-e5ce-49db-c904-e961e1eb910c"
# ! mkdir -p $WORKING_DIR
# ! wget --no-check-certificate https://github.com/ray-project/air-sample-data/raw/main/air-feast-example.zip
# ! unzip air-feast-example.zip
# ! mv air-feast-example/* $WORKING_DIR
# %cd $WORKING_DIR

# %% colab={"base_uri": "https://localhost:8080/"} id="iNbC-Qqi3Lq_" outputId="99576086-12dd-4f96-fb51-de40b77b15ce"
# ! ls

# %% [markdown] id="c_3wlEus4dYO"
# There is already a feature repository set up in `feature_repo/`. It isn't necessary to create a new feature repository, but it can be done using the following command: `feast init -t local feature_repo`.
#
# Now let's take a look at the schema in Feast feature store, which is defined by `feature_repo/features.py`. There are mainly two features: zipcode_feature and credit_history, both are generated from parquet files - `feature_repo/data/zipcode_table.parquet` and `feature_repo/data/credit_history.parquet`.

# %% colab={"base_uri": "https://localhost:8080/"} id="5VGLhPLLzlGW" outputId="a3f3499e-c140-4ceb-a66d-2f1a6b8a2142"
# !pygmentize feature_repo/features.py

# %% [markdown] id="HQmrfEV33_SM"
# Deploy the above defined feature store by running `apply` from within the feature_repo/ folder.

# %% colab={"base_uri": "https://localhost:8080/"} id="SbL_EbMC2MFS" outputId="13b07f1f-d52a-4c4e-a73f-f5478c0304de"
# ! cd feature_repo && feast apply
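
To make the effect in the listing above concrete, here is a minimal sketch of the transformation: lines that start with a shell escape (`!`) or an IPython magic (`%`) get a `# ` prefix, so they never run when the file is executed as plain Python. This is an illustration only, not jupytext's actual implementation, and `comment_out_magics` is a hypothetical name.

```python
def comment_out_magics(cell_source: str) -> str:
    """Prefix shell escapes (!) and IPython magics (%) with '# '."""
    converted = []
    for line in cell_source.splitlines():
        if line.lstrip().startswith(("!", "%")):
            # Not valid Python, so it is turned into a comment and skipped.
            converted.append("# " + line)
        else:
            converted.append(line)
    return "\n".join(converted)


cell = "import os\n! mkdir -p $WORKING_DIR\n%cd $WORKING_DIR"
print(comment_out_magics(cell))
# import os
# # ! mkdir -p $WORKING_DIR
# # %cd $WORKING_DIR
```

Any setup step that lives only in such a line (downloading data, `feast apply`, changing directory) is silently dropped, which would explain later cells failing on missing files or imports.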

@Yard1 Yard1 self-assigned this Jul 11, 2022
@xwjiang2010 xwjiang2010 assigned jiaodong and xwjiang2010 and unassigned Yard1 Jul 11, 2022
@Yard1
Member

Yard1 commented Jul 11, 2022

I'll pick up the tune/examples ones & huggingface

@jiaodong
Member

Synced with @xwjiang2010 offline. I will start from the bottom of the list, with the AIR examples that have S3-related issues first.

@jiaodong
Member

jiaodong commented Jul 11, 2022

//doc/source/ray-air/examples:upload_to_comet_ml and //doc/source/ray-air/examples:upload_to_wandb both actually worked for me after I set up Comet and wandb credentials.

======== Working on repro-ci ============

//doc/source/ray-air/examples:xgboost_example PASSED

@Yard1
Member

Yard1 commented Jul 11, 2022

Comet would be @krfricke I think?

@xwjiang2010
Contributor

@jiaodong could you take a look at setup_credentials.py? I believe this is run to set up the right credentials to talk to either wandb or comet-ml backend. Maybe something is not working there.
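
One quick way to rule out missing credentials is to check the environment before running the notebooks. The sketch below is not what setup_credentials.py does. It only assumes the keys are exposed as the `WANDB_API_KEY` and `COMET_API_KEY` environment variables, and `missing_credentials` is a hypothetical helper.

```python
import os

# Assumed environment variable names; the actual setup script may use others.
REQUIRED_VARS = {"wandb": "WANDB_API_KEY", "comet_ml": "COMET_API_KEY"}


def missing_credentials(environ=None):
    """Return the services whose API key variable is unset or empty."""
    env = os.environ if environ is None else environ
    return sorted(name for name, var in REQUIRED_VARS.items() if not env.get(var))


# Example with an explicit mapping instead of the real environment.
print(missing_credentials({"WANDB_API_KEY": "dummy"}))  # ['comet_ml']
```

Calling `missing_credentials()` with no argument inspects the real process environment.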

@xwjiang2010
Contributor

@Yard1 Can you mark fixed tests in the master table? Thanks!

krfricke pushed a commit that referenced this issue Jul 13, 2022
Fixes the Tune-Pytorch-CIFAR notebook example as found in #26410

Signed-off-by: Antoni Baum <[email protected]>
krfricke pushed a commit that referenced this issue Jul 13, 2022
Fixes the BOHB notebook example as found in #26410

Signed-off-by: Antoni Baum <[email protected]>
@jiaodong
Member

feast_example is not a blocker (confirmed with Richard), and the sigopt one fails only because we don't know how to set up permissions.

Everything else now runs on CI per commit and is green. Only one PR requires a cherry-pick, and it is already tagged v2.0.0-pick.

@xwjiang2010
Contributor

Should we have separate trackers for the two remaining low-priority ones?

@jiaodong
Member

#27203 for sigopt
#27204 for feast

Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this issue Aug 18, 2022
Fixes the Tune-Pytorch-CIFAR notebook example as found in ray-project#26410

Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this issue Aug 18, 2022
Fixes the BOHB notebook example as found in ray-project#26410

Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this issue Aug 18, 2022
Fixes the Horovod notebook example as found in ray-project#26410 by installing Horovod in doc tests jobs.

Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this issue Aug 18, 2022
Fixes failing hyperopt notebook in CI (as found in ray-project#26410). The cause was a mismatch between keys in points to evaluate and the search space - now, an informative exception will be raised.

Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
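
The hyperopt commit message above describes a mismatch between the keys in the points to evaluate and the search space. A rough sketch of that kind of check follows. It is not the code from the actual fix: `validate_points` is a hypothetical helper, and requiring an exact key match (rather than allowing partial points) is an assumption.

```python
def validate_points(points_to_evaluate, search_space):
    """Raise an informative error if any point's keys differ from the search space."""
    expected = set(search_space)
    for i, point in enumerate(points_to_evaluate):
        keys = set(point)
        if keys != expected:
            missing = sorted(expected - keys)
            unexpected = sorted(keys - expected)
            raise ValueError(
                f"points_to_evaluate[{i}] does not match the search space: "
                f"missing keys {missing}, unexpected keys {unexpected}"
            )


space = {"lr": (0.001, 0.1), "momentum": (0.1, 0.9)}
validate_points([{"lr": 0.01, "momentum": 0.5}], space)  # passes silently

try:
    validate_points([{"lr": 0.01, "activation": "relu"}], space)
except ValueError as err:
    print(err)
```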
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this issue Aug 18, 2022
Fixes the tune-sklearn notebook example as found in ray-project#26410

Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>

8 participants