
Is it possible to add a registerable api at the beginning of torch.save #117840

Closed
CLiqing opened this issue Jan 19, 2024 · 19 comments
Labels
enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
module: PrivateUse1 - private use
module: serialization - Issues related to serialization (e.g., via pickle, or otherwise) of PyTorch objects
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

CLiqing (Contributor) commented Jan 19, 2024

🚀 The feature, motivation and pitch

Similar to #99808, our tensors may contain some device-specific data. We want our tensors to be loadable by other devices such as CPU, so we need to transform them before torch.save.

Currently we can only do that with monkey patches, because the only registerable step in torch.save comes after the data (the storage) has been saved.

[screenshot of the relevant torch.save serialization code]

I wonder if there could be a registerable API for the privateuse1 backend at the beginning of torch.save, so that we can convert our data from its device-specific layout to a general form before the actual data is saved.

Alternatives

Add a registerable API at the beginning of torch.save, for example prepare_for_save.

Additional context

No response

cc @mruberry @mikaylagawarecki

CLiqing changed the title from "Is it possible to add one registerable api at the beginning of torch.save" to "Is it possible to add a registerable api at the beginning of torch.save" on Jan 19, 2024
malfet added the module: serialization, module: PrivateUse1, triaged, and enhancement labels on Jan 19, 2024
malfet (Contributor) commented Jan 19, 2024

@jbschlosser is PrivateUse1 the right label for such issues?

CLiqing (Contributor, Author) commented Jan 23, 2024

@ezyang could you please take a look at this? Thanks!

ezyang (Contributor) commented Jan 23, 2024

Can you explain to me how torch.save works today? IIUC, wouldn't we convert the tensor to cpu first before writing it out?

CLiqing (Contributor, Author) commented Jan 25, 2024

> Can you explain to me how torch.save works today? IIUC, wouldn't we convert the tensor to cpu first before writing it out?

During torch.save, the tensor and its storage are saved separately.
As part of the tensor's metadata, storage_numel is saved along with the rest:

return ('storage',
        storage_type,
        storage_key,
        location,
        storage_numel)

After that, the storage is converted to CPU and saved:

if storage.device.type != 'cpu':
    storage = storage.cpu()
# Now that it is on the CPU we can directly copy it into the zip file
num_bytes = storage.nbytes()
zip_file.write_record(name, storage.data_ptr(), num_bytes)

However, this means torch saves storage_numel first, and for our privateuse1 backend it covers some device-specific data and is therefore larger than the real data's size. Here "real data" means the storage's data after it has been converted from the privateuse1 backend in the code above.

storage_numel and the storage's data do not match, which causes a bug while loading. So we want to add a registerable API that allows the privateuse1 backend to preprocess the data at the beginning of torch.save. @ezyang
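
For reference, the two parts can be seen in any checkpoint: the zipfile-based format stores the pickled tensor metadata (which carries storage_numel) and each storage's raw bytes as separate records, so the two have to agree. A small CPU-only illustration:

import torch, zipfile

x = torch.arange(10, dtype=torch.uint8)
torch.save(x, "t.pt")

# torch.save's zipfile format is an ordinary zip archive: one record holds
# the pickled metadata, and each storage's bytes live in their own record,
# so the recorded storage_numel must match the bytes written for the storage.
with zipfile.ZipFile("t.pt") as z:
    for info in z.infolist():
        print(info.filename, info.file_size)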

ezyang (Contributor) commented Jan 25, 2024

Naively, why doesn't the storage conversion to cpu preserve the metadata?

CLiqing (Contributor, Author) commented Jan 27, 2024

> Naively, why doesn't the storage conversion to cpu preserve the metadata?

Sorry, I may not have been clear at first. Our storage data may be larger than it would be on CPU. Here "storage data" refers to the part pointed to by data_ptr. For performance we may add some padding, for example aligning the data to 16 bytes. Although we strip the padding when converting back to CPU, storage_numel has already been saved by then, so the two no longer match.
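
A rough sketch of the kind of alignment involved (the helper and numbers below are illustrative, not our real implementation):

def padded_nbytes(nbytes: int, align: int = 16) -> int:
    # Round up to the next multiple of `align`, the padding described above.
    return (nbytes + align - 1) // align * align

# The device storage reports the padded size, so 16 is recorded as
# storage_numel, but after .cpu() strips the padding only 10 bytes are written.
print(padded_nbytes(10))  # 16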

So we want to add a registerable API that allows our privateuse1 backend to preprocess the data (convert it to CPU) at the beginning of torch.save. @ezyang

ezyang (Contributor) commented Jan 29, 2024

Why not preserve the padding when you convert to cpu?

CLiqing (Contributor, Author) commented Jan 30, 2024

> Why not preserve the padding when you convert to cpu?

Because we want the saved storage data to be loadable and processable correctly by other devices such as CPU or CUDA.
For example, if the storage data has size 10 (on CPU), our PrivateUse1 device may add padding that changes the size to 16. This padding is unique to our device and cannot be processed by others. We want the data to be saved with size 10 so that other devices never have to think about our padding. @ezyang

ezyang (Contributor) commented Jan 31, 2024

This doesn't seem like a big deal to me? If you have padding, but the tensor metadata (e.g., storage offset, size) is set appropriately, CPU tensor would handle it correctly. And if you actually convert to cpu e.g., with .cpu() we wouldn't preserve the padding in this case.

CLiqing (Contributor, Author) commented Feb 2, 2024

> This doesn't seem like a big deal to me? If you have padding, but the tensor metadata (e.g., storage offset, size) is set appropriately, CPU tensor would handle it correctly. And if you actually convert to cpu e.g., with .cpu() we wouldn't preserve the padding in this case.

1. In fact, we can't set storage.nbytes() "appropriately". We need storage.nbytes() to include the padding. For example, THPStorage_get uses it to resolve storage[-1], taking the padding into account; if nbytes excluded the padding, we would read wrong data and could not use index -1 to reach the padded part.

    static PyObject* THPStorage_get(THPStorage* self, PyObject* index) {
      HANDLE_TH_ERRORS
      THPStorage_assertNotNull(self);
      const auto& storage = THPStorage_Unpack(self);
      int64_t len = static_cast<int64_t>(storage.nbytes());
      /* Integer index */
      if (THPUtils_checkLong(index)) {
        int64_t nindex = THPUtils_unpackLong(index);
        if (nindex < 0)
          nindex += len;

2. When we convert to CPU, we don't preserve the padding, and we want to save the values without padding so that other devices are able to process our data.

At present, we can only work around this mismatch with patches, like:

torch.storage.UntypedStorage.nbytes = get_nbytes_without_paddings
result = torch.serialization.save(obj, f, pickle_module, pickle_protocol, True, _disable_byteorder_record)
torch.storage.UntypedStorage.nbytes = get_nbytes
return result
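
A slightly safer shape for the same workaround is a context manager, so the patch is always undone even if saving raises (get_nbytes_without_paddings here is our backend helper from the snippet above, not a PyTorch API):

import contextlib
import torch

@contextlib.contextmanager
def patched_nbytes(replacement):
    # Temporarily swap UntypedStorage.nbytes and restore the original method
    # even if torch.save fails partway through serialization.
    original = torch.storage.UntypedStorage.nbytes
    torch.storage.UntypedStorage.nbytes = replacement
    try:
        yield
    finally:
        torch.storage.UntypedStorage.nbytes = original

# Usage, with the backend helper from above:
# with patched_nbytes(get_nbytes_without_paddings):
#     torch.save(obj, f)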

We hope there can be such an API so that we can do some preprocessing of the data. @ezyang

CLiqing (Contributor, Author) commented Feb 6, 2024

@ezyang, what do you think? Can you give me some suggestions?

ezyang (Contributor) commented Feb 7, 2024

In CPU land, I can have a CPU tensor that points to a storage buffer with padding before/after the tensor data. In this case, I have a non-zero storage offset. When I save this tensor, by default I include the padding. You can observe this in the following example:

import torch

x = torch.randn(1024)
torch.save(x[512:], 'foo.pt')
print(torch.load('foo.pt').storage_offset())

Nothing you have told me thus far has therefore convinced me that what you want to do isn't expressible without a hook.
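
For concreteness, reusing the foo.pt written above (the whole 1024-element storage is serialized by default, so the reloaded view keeps its non-zero offset):

import torch

y = torch.load('foo.pt')
print(y.storage_offset())            # 512
print(y.untyped_storage().nbytes())  # 4096: all 1024 float32 elements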

CLiqing (Contributor, Author) commented Feb 7, 2024

> In CPU land, I can have a CPU tensor that points to a storage buffer with padding before/after the tensor data. In this case, I have a non-zero storage offset. When I save this tensor, by default I include the padding. You can observe this in the following example:
>
> import torch
>
> x = torch.randn(1024)
> torch.save(x[512:], 'foo.pt')
> print(torch.load('foo.pt').storage_offset())
>
> Nothing you have told me thus far has therefore convinced me that what you want to do isn't expressible without a hook.

In fact, our padding is more complex. We may modify the structure of the data in a variety of ways depending on the scenario, not just by adding padding at the end.

[figures: two device-specific padding layouts, with the padding regions shown in blue]

As the figures above show, these layouts are complex and make it hard to get at the actual data. Therefore, we want an entry point so that we can restore the data when using torch.save. For now we can only use patches, @ezyang.

ezyang (Contributor) commented Feb 8, 2024

The first padding scenario can be accurately expressed using strides which make you skip two blue widths.

The second padding scenario can also be expressed by using strides to express transposition, and also expanding the stride to skip past the blue padding.

I can give concrete code examples if you still don't get it.
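
A minimal sketch of the first scenario along these lines (the sizes and the as_strided call below are purely illustrative, not the backend's actual layout):

import torch

rows, cols, padded_cols = 4, 10, 16          # 6 elements of padding per row
buf = torch.arange(rows * padded_cols, dtype=torch.float32)  # padded buffer

# Strides make the logical view skip the padding at the end of each row.
logical = buf.as_strided(size=(rows, cols), stride=(padded_cols, 1))
print(logical.shape)     # torch.Size([4, 10])
print(logical.stride())  # (16, 1)

# The transposed scenario can be expressed the same way by swapping strides:
# buf.as_strided(size=(cols, rows), stride=(1, padded_cols))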

CLiqing (Contributor, Author) commented Feb 9, 2024

> The first padding scenario can be accurately expressed using strides which make you skip two blue widths.
>
> The second padding scenario can also be expressed by using strides to express transposition, and also expanding the stride to skip past the blue padding.
>
> I can give concrete code examples if you still don't get it.

I know these layouts can be expressed in such ways, but we don't want to increase the learning cost for our clients. We would rather handle these device-specific operations implicitly: whichever kind of padding we use, nobody else should have to add explicit strides or skips.
In torch.save, we would convert the data to CPU, remove the padding, and record the special format in metadata. When the checkpoint is loaded onto our PrivateUse1 device we convert it back with the padding, and when it is loaded onto CPU nothing extra is needed. @ezyang

CLiqing (Contributor, Author) commented Feb 16, 2024

@ezyang, so what's your opinion now?

ezyang (Contributor) commented Feb 16, 2024

I don't think having accurate strides makes client code more complicated, IMO

CLiqing (Contributor, Author) commented Feb 17, 2024

> I don't think having accurate strides makes client code more complicated, IMO

@ezyang, you're familiar with torch, so it isn't complicated for you, but it may be for our clients. We want to take on the adaptation work ourselves rather than push it onto clients: they should only have to set the device to PrivateUse1 to run code that already works on CPU or CUDA, instead of adding strides everywhere in their code.

# Client code in ``CUDA``
torch.cuda.set_device(i)
...
b = a.clone()
...


# Client code in ``PrivateUse1``
# Only a few times
torch.privateuse1.set_device(i)
...
# Many times
b = a[xxx:yyy].clone()
...

It would not affect existing torch logic; it would merely provide an interface for the PrivateUse1 device:

prepare_for_save = None

def register_prepare_for_save(prepare_for_save_fp):
    global prepare_for_save  # rebind the module-level hook, not a local
    prepare_for_save = prepare_for_save_fp

def save(
    obj: object,
    f: FILE_LIKE,
    pickle_module: Any = pickle,
    pickle_protocol: int = DEFAULT_PROTOCOL,
    _use_new_zipfile_serialization: bool = True,
    _disable_byteorder_record: bool = False
) -> None:
    if prepare_for_save is not None:
        prepare_for_save(obj)
    ...
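
A hypothetical registration from the backend side might then look like this (all names below are illustrative; none of this is an existing PyTorch API):

def _privateuse1_prepare_for_save(obj):
    # Illustrative no-op: a real backend would walk obj here and convert its
    # tensors to an unpadded, CPU-compatible layout before anything is written.
    return obj

register_prepare_for_save(_privateuse1_prepare_for_save)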

ezyang (Contributor) commented Feb 19, 2024

In general, users are expected not to deal explicitly with strides. If I add two values with unusual striding, I will typically preserve this striding (but make the result contiguous). If you are doing work to automatically insert padding when you predict it is necessary, this would be the same as just making functions output non-standard strides differently than pytorch on cpu/cuda.

Look, if you tell me, "Edward, I understand this is hypothetically better, but we already wrote lots of code managing different storage padding at the storage level, and we'd have to refactor all of it which we don't have time to do" I'd be like, OK, fine, this seems like a decent tradeoff to make if you've painted yourself into this situation, but then, what's so bad about an extra monkey patch on top. If we're putting something into PyTorch core, we're going to do it right, because we're going to have to maintain it forever afterwards.

@CLiqing CLiqing closed this as completed Jun 21, 2024