
Is it possible to add a registerable api at the beginning of torch.save #117840

Closed
CLiqing opened this issue Jan 19, 2024 · 19 comments
Labels
enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
module: PrivateUse1 - private use
module: serialization - Issues related to serialization (e.g., via pickle, or otherwise) of PyTorch objects
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

CLiqing (Contributor) commented Jan 19, 2024

🚀 The feature, motivation and pitch

Similar to #99808, our tensors may contain some device-specific data. We want our tensors to be loadable by other devices such as CPU, so we need to transform them before torch.save.

Currently we can only do that with monkey patches, because the only registerable step in torch.save comes after the data (the storage) has been saved.

[screenshot of the relevant torch.save serialization code]

I wonder if there could be a registerable API for the privateuse1 backend at the beginning of torch.save, so that we can convert our data from its device-specific layout to a general form before the actual data is saved.

Alternatives

Add a registerable API at the beginning of torch.save, for example prepare_for_save.

Additional context

No response

cc @mruberry @mikaylagawarecki

CLiqing changed the title from "Is it possible to add one registerable api at the beginning of torch.save" to "Is it possible to add a registerable api at the beginning of torch.save" on Jan 19, 2024
malfet added the module: serialization, module: PrivateUse1, triaged, and enhancement labels on Jan 19, 2024
malfet (Contributor) commented Jan 19, 2024

@jbschlosser is PrivateUse1 the right label for such issues?

CLiqing (Contributor, Author) commented Jan 23, 2024

@ezyang could you please take a look at this? Thanks!

ezyang (Contributor) commented Jan 23, 2024

Can you explain to me how torch.save works today? IIUC, wouldn't we convert the tensor to cpu first before writing it out?

CLiqing (Contributor, Author) commented Jan 25, 2024

> Can you explain to me how torch.save works today? IIUC, wouldn't we convert the tensor to cpu first before writing it out?

During torch.save, the tensor and its storage are saved separately.
As part of the tensor's metadata, storage_numel is saved along with the rest:

return ('storage',
        storage_type,
        storage_key,
        location,
        storage_numel)

After that, the storage is converted to CPU and saved:

if storage.device.type != 'cpu':
    storage = storage.cpu()
# Now that it is on the CPU we can directly copy it into the zip file
num_bytes = storage.nbytes()
zip_file.write_record(name, storage.data_ptr(), num_bytes)

However, this means torch saves storage_numel first, and for our privateuse1 backend it covers some device-specific data and is therefore larger than the real data's size. Here "real data" means the storage's data after it has been converted from the privateuse1 backend in the code above.

storage_numel and the storage's data do not match, which causes a bug while loading. So we want to add a registerable API that allows the privateuse1 backend to preprocess the data at the beginning of torch.save. @ezyang
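
For reference, the two parts can be seen in any checkpoint: the zipfile-based format stores the pickled tensor metadata (which carries storage_numel) and each storage's raw bytes as separate records, so the two have to agree. A small CPU-only illustration:

import torch, zipfile

x = torch.arange(10, dtype=torch.uint8)
torch.save(x, "t.pt")

# torch.save's zipfile format is an ordinary zip archive: one record holds
# the pickled metadata, and each storage's bytes live in their own record,
# so the recorded storage_numel must match the bytes written for the storage.
with zipfile.ZipFile("t.pt") as z:
    for info in z.infolist():
        print(info.filename, info.file_size)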

ezyang (Contributor) commented Jan 25, 2024

Naively, why doesn't the storage conversion to cpu preserve the metadata?

CLiqing (Contributor, Author) commented Jan 27, 2024

> Naively, why doesn't the storage conversion to cpu preserve the metadata?

Sorry, I may not have been clear at first. Our storage data may be larger than it would be on CPU. Here "storage data" refers to the part pointed to by data_ptr. For performance we may add some padding, for example aligning the data to 16 bytes. Although we strip the padding when converting back to CPU, storage_numel has already been saved by then, so the two no longer match.
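
A rough sketch of the kind of alignment involved (the helper and numbers below are illustrative, not our real implementation):

def padded_nbytes(nbytes: int, align: int = 16) -> int:
    # Round up to the next multiple of `align`, the padding described above.
    return (nbytes + align - 1) // align * align

# The device storage reports the padded size, so 16 is recorded as
# storage_numel, but after .cpu() strips the padding only 10 bytes are written.
print(padded_nbytes(10))  # 16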

So we want to add a registerable API that allows our privateuse1 backend to preprocess the data (convert it to CPU) at the beginning of torch.save. @ezyang

ezyang (Contributor) commented Jan 29, 2024

Why not preserve the padding when you convert to cpu?

CLiqing (Contributor, Author) commented Jan 30, 2024

> Why not preserve the padding when you convert to cpu?

Because we want the saved storage data to be loadable and processable correctly by other devices such as CPU or CUDA.
For example, if the storage data has size 10 (on CPU), our PrivateUse1 device may add padding that changes the size to 16. This padding is unique to our device and cannot be processed by others. We want the data to be saved with size 10 so that other devices never have to think about our padding. @ezyang

ezyang (Contributor) commented Jan 31, 2024

This doesn't seem like a big deal to me? If you have padding, but the tensor metadata (e.g., storage offset, size) is set appropriately, CPU tensor would handle it correctly. And if you actually convert to cpu e.g., with .cpu() we wouldn't preserve the padding in this case.

CLiqing (Contributor, Author) commented Feb 2, 2024

> This doesn't seem like a big deal to me? If you have padding, but the tensor metadata (e.g., storage offset, size) is set appropriately, CPU tensor would handle it correctly. And if you actually convert to cpu e.g., with .cpu() we wouldn't preserve the padding in this case.

1. In fact, we can't set storage.nbytes() "appropriately". We need storage.nbytes() to include the padding. For example, THPStorage_get uses it to resolve storage[-1], taking the padding into account; if nbytes excluded the padding, we would read wrong data and could not use index -1 to reach the padded part.

    static PyObject* THPStorage_get(THPStorage* self, PyObject* index) {
      HANDLE_TH_ERRORS
      THPStorage_assertNotNull(self);
      const auto& storage = THPStorage_Unpack(self);
      int64_t len = static_cast<int64_t>(storage.nbytes());
      /* Integer index */
      if (THPUtils_checkLong(index)) {
        int64_t nindex = THPUtils_unpackLong(index);
        if (nindex < 0)
          nindex += len;

2. When we convert to CPU, we don't preserve the padding, and we want to save the values without padding so that other devices are able to process our data.

At present, we can only work around this mismatch with patches, like:

torch.storage.UntypedStorage.nbytes = get_nbytes_without_paddings
result = torch.serialization.save(obj, f, pickle_module, pickle_protocol, True, _disable_byteorder_record)
torch.storage.UntypedStorage.nbytes = get_nbytes
return result
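
A slightly safer shape for the same workaround is a context manager, so the patch is always undone even if saving raises (get_nbytes_without_paddings here is our backend helper from the snippet above, not a PyTorch API):

import contextlib
import torch

@contextlib.contextmanager
def patched_nbytes(replacement):
    # Temporarily swap UntypedStorage.nbytes and restore the original method
    # even if torch.save fails partway through serialization.
    original = torch.storage.UntypedStorage.nbytes
    torch.storage.UntypedStorage.nbytes = replacement
    try:
        yield
    finally:
        torch.storage.UntypedStorage.nbytes = original

# Usage, with the backend helper from above:
# with patched_nbytes(get_nbytes_without_paddings):
#     torch.save(obj, f)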

We hope there can be such an API so that we can do some preprocessing of the data. @ezyang

CLiqing (Contributor, Author) commented Feb 6, 2024

@ezyang, what do you think? Can you give me some suggestions?

ezyang (Contributor) commented Feb 7, 2024

In CPU land, I can have a CPU tensor that points to a storage buffer with padding before/after the tensor data. In this case, I have a non-zero storage offset. When I save this tensor, by default I include the padding. You can observe this in the following example:

import torch

x = torch.randn(1024)
torch.save(x[512:], 'foo.pt')
print(torch.load('foo.pt').storage_offset())

Nothing you have told me thus far has therefore convinced me that what you want to do isn't expressible without a hook.
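
For concreteness, reusing the foo.pt written above (the whole 1024-element storage is serialized by default, so the reloaded view keeps its non-zero offset):

import torch

y = torch.load('foo.pt')
print(y.storage_offset())            # 512
print(y.untyped_storage().nbytes())  # 4096: all 1024 float32 elements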

CLiqing (Contributor, Author) commented Feb 7, 2024

> In CPU land, I can have a CPU tensor that points to a storage buffer with padding before/after the tensor data. In this case, I have a non-zero storage offset. When I save this tensor, by default I include the padding. You can observe this in the following example:
>
> import torch
>
> x = torch.randn(1024)
> torch.save(x[512:], 'foo.pt')
> print(torch.load('foo.pt').storage_offset())
>
> Nothing you have told me thus far has therefore convinced me that what you want to do isn't expressible without a hook.

In fact, our padding is more complex. We may modify the structure of the data in a variety of ways depending on the scenario, not just by adding padding at the end.

[figures: two device-specific padding layouts, with the padding regions shown in blue]

As the figures above show, these layouts are complex and make it hard to get at the actual data. Therefore, we want an entry point so that we can restore the data when using torch.save. For now we can only use patches, @ezyang.

ezyang (Contributor) commented Feb 8, 2024

The first padding scenario can be accurately expressed using strides which make you skip two blue widths.

The second padding scenario can also be expressed by using strides to express transposition, and also expanding the stride to skip past the blue padding.

I can give concrete code examples if you still don't get it.
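
A minimal sketch of the first scenario along these lines (the sizes and the as_strided call below are purely illustrative, not the backend's actual layout):

import torch

rows, cols, padded_cols = 4, 10, 16          # 6 elements of padding per row
buf = torch.arange(rows * padded_cols, dtype=torch.float32)  # padded buffer

# Strides make the logical view skip the padding at the end of each row.
logical = buf.as_strided(size=(rows, cols), stride=(padded_cols, 1))
print(logical.shape)     # torch.Size([4, 10])
print(logical.stride())  # (16, 1)

# The transposed scenario can be expressed the same way by swapping strides:
# buf.as_strided(size=(cols, rows), stride=(1, padded_cols))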

CLiqing (Contributor, Author) commented Feb 9, 2024

> The first padding scenario can be accurately expressed using strides which make you skip two blue widths.
>
> The second padding scenario can also be expressed by using strides to express transposition, and also expanding the stride to skip past the blue padding.
>
> I can give concrete code examples if you still don't get it.

I know these layouts can be expressed in such ways, but we don't want to increase the learning cost for our clients. We would rather handle these device-specific operations implicitly: whichever kind of padding we use, nobody else should have to add explicit strides or skips.
In torch.save, we would convert the data to CPU, remove the padding, and record the special format in metadata. When the checkpoint is loaded onto our PrivateUse1 device we convert it back with the padding, and when it is loaded onto CPU nothing extra is needed. @ezyang

CLiqing (Contributor, Author) commented Feb 16, 2024

@ezyang, so what's your opinion now?

ezyang (Contributor) commented Feb 16, 2024

I don't think having accurate strides makes client code more complicated, IMO

CLiqing (Contributor, Author) commented Feb 17, 2024

> I don't think having accurate strides makes client code more complicated, IMO

@ezyang, you're familiar with torch, so it isn't complicated for you, but it may be for our clients. We want to take on the adaptation work ourselves rather than push it onto clients: they should only have to set the device to PrivateUse1 to run code that already works on CPU or CUDA, instead of adding strides everywhere in their code.

# Client code in ``CUDA``
torch.cuda.set_device(i)
...
b = a.clone()
...


# Client code in ``PrivateUse1``
# Only a few times
torch.privateuse1.set_device(i)
...
# Many times
b = a[xxx:yyy].clone()
...

It would not affect existing torch logic; it would merely provide an interface for the PrivateUse1 device:

prepare_for_save = None

def register_prepare_for_save(prepare_for_save_fp):
    global prepare_for_save  # rebind the module-level hook, not a local
    prepare_for_save = prepare_for_save_fp

def save(
    obj: object,
    f: FILE_LIKE,
    pickle_module: Any = pickle,
    pickle_protocol: int = DEFAULT_PROTOCOL,
    _use_new_zipfile_serialization: bool = True,
    _disable_byteorder_record: bool = False
) -> None:
    if prepare_for_save is not None:
        prepare_for_save(obj)
    ...
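
A hypothetical registration from the backend side might then look like this (all names below are illustrative; none of this is an existing PyTorch API):

def _privateuse1_prepare_for_save(obj):
    # Illustrative no-op: a real backend would walk obj here and convert its
    # tensors to an unpadded, CPU-compatible layout before anything is written.
    return obj

register_prepare_for_save(_privateuse1_prepare_for_save)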

ezyang (Contributor) commented Feb 19, 2024

In general, users are expected not to deal explicitly with strides. If I add two values with unusual striding, I will typically preserve this striding (but make the result contiguous). If you are doing work to automatically insert padding when you predict it is necessary, this would be the same as just making functions output non-standard strides differently than pytorch on cpu/cuda.

Look, if you tell me, "Edward, I understand this is hypothetically better, but we already wrote lots of code managing different storage padding at the storage level, and we'd have to refactor all of it which we don't have time to do" I'd be like, OK, fine, this seems like a decent tradeoff to make if you've painted yourself into this situation, but then, what's so bad about an extra monkey patch on top. If we're putting something into PyTorch core, we're going to do it right, because we're going to have to maintain it forever afterwards.

@CLiqing CLiqing closed this as completed Jun 21, 2024