Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

11 import datasptensor check dimensions on import #207

Merged
merged 15 commits into from
Sep 6, 2023

Conversation

DeepBlockDeepak
Copy link
Collaborator

@DeepBlockDeepak DeepBlockDeepak commented Jul 28, 2023

Tensor data with invalid indices relative to shape, in file sptensor2.tns:

sptensor
3 
3 3 1 
3
1 1 1 1
1 3 2 22
2 2 2 3

New Behavior when loading faulty tensor data:

$ python
>>> import pyttb as ttb
>>> S = ttb.import_data('sptensor2.tns')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jmeds/code/pyttb/pyttb/import_data.py", line 52, in import_data
    return ttb.sptensor.from_data(subs, vals, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmeds/code/pyttb/pyttb/sptensor.py", line 167, in from_data
    raise ValueError(f"Index out of bounds: {idx} for dimension: {i}")
ValueError: Index out of bounds: 1 for dimension: 2

📚 Documentation preview 📚: https://pyttb--207.org.readthedocs.build/en/207/

DeepBlockDeepak added 2 commits July 28, 2023 12:17
… if an index is found that is less than zero or gte to the size of the dimension it indexes, a ValueError is raised.
@DeepBlockDeepak DeepBlockDeepak linked an issue Jul 28, 2023 that may be closed by this pull request
@DeepBlockDeepak DeepBlockDeepak self-assigned this Jul 28, 2023
@ntjohnson1
Copy link
Collaborator

Can we use one of our existing validation utilities?

def tt_subscheck(subs: np.ndarray, nargout: bool = True) -> bool:

Can we add a test to make sure we maintain this behavior? Also does this catch the situation where the shape provided is the wrong number of dimensions?

sptensor
3
3 3 1
2
1 1 1 1 1
2 2 2 2 2

Additional note, iterating in python is slow. The reason we use the invalidated from_data is for speed. from_aggregation will already do validation and adjustment to account for accident duplicates etc. Instead of looping over every sub (row) then index (col). You should be able to compare all the subs against the shape via a broadcast as a single numpy call then check if anything fails. Here is where we do something similar in from_aggregation to find the minimum acceptable shape to fit provided entries

shape = tuple(np.max(subs, axis=0) + 1)

@dmdunla
Copy link
Collaborator

dmdunla commented Jul 31, 2023

This only checks that the shape and type of the subs index array:

def tt_subscheck(subs: np.ndarray, nargout: bool = True) -> bool:

Nick's idea of using vectorized methods would be an improvement. Something like:

all(a <= b for a, b in zip(tuple(np.max(subs, axis=0) + 1), shape)

So, a combination of tt_subscheck and a comparison of the input shape and the shape derived from the subs array of indices would be a more complete check.

@dmdunla
Copy link
Collaborator

dmdunla commented Aug 10, 2023

This seems to be broken. Any updates?

@DeepBlockDeepak
Copy link
Collaborator Author

Nothing so far but I am planning on taking a look soon.

DeepBlockDeepak and others added 2 commits August 14, 2023 18:16
…bscheck() returns True, meaning empty subscript arrays are considered valid in the sptensor constructor.
@dmdunla
Copy link
Collaborator

dmdunla commented Aug 22, 2023

Waiting to land #213 before this, as this is a small change to address error checking and will be impacted by the major refactor in #213.

@dmdunla
Copy link
Collaborator

dmdunla commented Aug 22, 2023

@DeepBlockDeepak Can you update this now that the default sptensor constructor has moved to __init__?

ntjohnson1 and others added 2 commits August 22, 2023 15:58
* tensor Update __init__ and remove from_data:
* Update all usages

* tensor add explicit copy and deepcopy dunder

* sptensor update __init__ and remove from_data

* sptensor add copy and deepcopy dunder

* ttensor: Rename u to factor_matrices to match ktensor.

* ruff: Add minimal coverage to tests dir:
* We can now run ruff check . from root
* Fix imports unused, unsorted in tests
* Fix fstring errors, unused variables, etc
* All test changes via ruff check . --fix

* ttensor: update __init__ and remove from_data

* ttensor: add copy and dunder deepcopy

* ttensor: Remove from_tensor_type
* Only worked as copy constructor before so explicit copy replaces

* tensor: Remove from_tensor_type:
* Push to_tensor to all callers

* ruff: Add bugbear to catch bugs and better align with pylint

* pylint: Disable false positive if we don't remove pylint first

* sptensor: Remove from_tensor and push conversion to callers

* ttensor: Add to_tensor

* tutorials: Update to use new interfaces

* Updated doc and test to capture or ruff support of tests directory:
* Basically just keep imports reasonable and avoid logic errors (unused variables etc)

* Missed a pylint disable statement in the rebase. A search over the project no longer matches pylint

* Documentation consistency:
* Remove undocumented methods/members
 * No longer need to explicitly exclude members by name
* Move all tensor types to use slots
* Don't skip __init__ since we were dropping that entire doc string

* Documentation style:
* Merges __init__ with class definition instead of a separate __init__ entry

* Add forgotten arguments to docstring
@dmdunla
Copy link
Collaborator

dmdunla commented Aug 23, 2023

This now looks like there is no change from main. I'm not sure what happened. Is this still an issue? I do not see the changes from @DeepBlockDeepak any longer.

@ntjohnson1
Copy link
Collaborator

It doesn't look like this PR ever added a test for the expected behavior. I don't think I fixed anything in #213 so I assume this is still an issue (could check with the repro in the PR description), and the non-change is just a bad conflict resolution.

@ntjohnson1
Copy link
Collaborator

@dmdunla @DeepBlockDeepak it didn't look like this was moving so I added the original test described above, confirmed it failed, adapted the code that reported the error, and pushed that to this branch. I snuck in a few minor cleanups I hit after merging main onto this branch.

@dmdunla
Copy link
Collaborator

dmdunla commented Sep 5, 2023

@ntjohnson1 Is this ready for review? There was a lingering review request from @DeepBlockDeepak, so it wasn't clear to me if your changes are ready for the merge now.

@ntjohnson1
Copy link
Collaborator

@dmdunla yes it is ready. There were no changes on the branch after the bad merge. I mostly just mentioned him since he opened the PR initially

@dmdunla dmdunla merged commit 054b721 into main Sep 6, 2023
6 checks passed
@dmdunla dmdunla deleted the 11-import_datasptensor-check-dimensions-on-import branch September 6, 2023 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import_data/sptensor: Check dimensions on import
3 participants