Feature request: ability to re-upload a file from given url #33

dzek69 · 2021-09-25T10:20:08Z

I want to use, for example, sendMediaGroup with a URL, but I don't want grammY to pass the URL to Telegram. Instead, grammY should download the file in the background and upload it as a new file.

Why?

Telegram fetches the media information from the URL and caches it. Small enough MP4 files with no audio are considered Animations. MP4 files with audio are considered Videos.

I want to send an album with two MP4 files, one with audio and one without, so they are a Video and an Animation. Telegram does not allow me to send them as a media group, because media groups cannot contain Animations.

If I upload the files from disk with InputFile, it works:

ctx.replyWithMediaGroup([
    {
        type: "video",
        media: new InputFile("/tmp/evv/no-audio.mp4"),
    },
    {
        type: "video",
        media: new InputFile("/tmp/evv/yes-audio.mp4"),
    },
])

Telegram won't use its cache for uploads and will consider my MP4 files Videos.

If I use URLs:

ctx.replyWithMediaGroup([
    {
        type: "video",
        media: "https://dev.nitra.pl/no-audio.mp4",
    },
    {
        type: "video",
        media: "https://dev.nitra.pl/yes-audio.mp4",
    },
])

Telegram will return:

{
  "ok": false,
  "error_code": 400,
  "description": "Bad Request: wrong file identifier/HTTP URL specified"
}

The proposed change could be useful in other cases as well, for example when my bot has access to some URL but Telegram's servers don't. This is useful with local HTTP-based microservices.

I can of course download the file manually, but it would be a much smoother experience to use something like this:

ctx.replyWithMediaGroup([
    {
        type: "video",
        media: new ReuploadFile("https://dev.nitra.pl/no-audio.mp4"),
    },
    {
        type: "video",
        media: new ReuploadFile("https://dev.nitra.pl/yes-audio.mp4"),
    },
])

KnorpelSenf · 2021-09-29T12:30:48Z

That sounds reasonable. It will not be a second type (ReuploadFile) but rather a new way to instantiate InputFile. Ideas:

Permit URL objects in constructor (branch over file:// vs. https?:// protocol!)
Permit { url: string } objects in constructor
Add static methods such as InputFile.download(url: string) and the like
more?

We could also have several supported ways. Uploading files should be easy and I don't mind giving people a choice here.
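
To illustrate the first two ideas, here is a minimal TypeScript sketch of how a constructor could tell local and remote sources apart. The names FileSource, ResolvedSource and resolveSource are made up for this example and are not part of grammY, and the file:// handling assumes POSIX-style paths.

```typescript
// Hypothetical sketch: none of these names are part of grammY's actual API.
type FileSource = string | URL | { url: string };

type ResolvedSource =
  | { kind: "local"; path: string }
  | { kind: "remote"; url: string };

function resolveSource(source: FileSource): ResolvedSource {
  // Plain strings keep their existing meaning: a path on the local disk.
  if (typeof source === "string") {
    return { kind: "local", path: source };
  }
  // { url: string } objects are parsed into a URL object first.
  const url = source instanceof URL ? source : new URL(source.url);
  switch (url.protocol) {
    case "file:":
      // file:// URLs still point at the local disk (POSIX paths assumed).
      return { kind: "local", path: decodeURIComponent(url.pathname) };
    case "http:":
    case "https:":
      // Remote files would be downloaded and re-uploaded.
      return { kind: "remote", url: url.href };
    default:
      throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
}
```

Keeping plain strings as local paths means existing code such as new InputFile("/tmp/evv/no-audio.mp4") would not change its behavior.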

dzek69 · 2021-09-29T12:42:40Z

I don't mind the exact implementation; it was just a suggestion. Passing an instance of URL or { url: string } sounds best to me.

Also, I think it's important to not start the download process until the message is actually being sent. For example, if I use a throttler to slow down sending messages and I "send" 20 of them (so they get queued), I don't want the files to be downloaded immediately when I queue them, but only when the queue reaches the message with the files.

KnorpelSenf · 2021-09-29T13:19:06Z

it's important to not start the download process until the message is actually being sent

This will be achieved automatically because grammY features a custom multipart/form-data implementation that collects all resources lazily when they are needed. For example, if you want to send ten files in a media group with 50 MB each (so 500 MB in one request), then grammY will not pull all files into memory at the same time, and then start streaming them up. Instead, only one file at a time is downloaded, and only a small portion of the file is kept in a buffer at all times.
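
The laziness described above can be sketched with async generators. This is only an illustration under assumptions, not grammY's actual multipart code: streamSequentially, ChunkSource and fakeDownload are hypothetical names, and fakeDownload stands in for a real HTTP download.

```typescript
// Hypothetical sketch of lazy, one-at-a-time streaming; not grammY's real code.
type ChunkSource = () => AsyncIterable<Uint8Array>;

// Yields the chunks of every source in order. A source is only opened
// once the consumer has finished reading the previous one.
async function* streamSequentially(
  sources: ChunkSource[],
): AsyncGenerator<Uint8Array> {
  for (const open of sources) {
    yield* open();
  }
}

// Stand-in for a download: logs when it starts and ends, yields tiny chunks.
function fakeDownload(name: string, chunks: number, log: string[]): ChunkSource {
  return async function* () {
    log.push(`open ${name}`);
    for (let i = 0; i < chunks; i++) {
      yield new Uint8Array([i]);
    }
    log.push(`close ${name}`);
  };
}
```

Nothing is opened when streamSequentially is called. The first source starts only when the first chunk is requested, and the second source starts only after the first one is exhausted, so a queued request costs nothing until it is actually sent.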

KnorpelSenf · 2021-09-29T13:21:43Z

I think we have to accept that files can be downloaded automatically several times if a request needs to be retried. This does cause redundant traffic, but I don't see any good alternative.

dzek69 · 2021-09-29T13:50:20Z

This doesn't sound like an unsolvable problem from a general programming perspective, but it may be hard to implement due to grammY internals. I know nothing about grammY's internal code.

I think most users will accept that downside and maybe something could be done to fix that in the future.

KnorpelSenf · 2021-09-29T13:56:33Z

If the grammY internals did not matter, what would your solution be? Write the file to disk? Keep it in RAM? We somehow have to remember the already downloaded data without OOMing the server.

dzek69 · 2021-10-17T09:18:56Z

I somehow missed your reply earlier. I guess writing to disk is the best solution. I personally would like to re-upload ~100 MB files (a few of them, when sent as an album). Wasting RAM on that doesn't sound good.

In RAM we can keep just the file paths.

Meanwhile, since this is still at an early idea stage, we should think about how flexible this should be. Some servers deny access to non-browser User-Agents, and some deny access unless a Referer from the same domain is set. Some files may only be accessible via a POST request.

Are we gonna make this flexible and allow setting method and headers and maybe body, which would make it uglier to use, or will we allow just simple GET links with default headers, so that for any other use case the user has to download the file manually anyway?

KnorpelSenf · 2021-10-17T10:09:02Z

I guess writing to disk is the best solution.

That's totally possible to do. Either way, caching the file is optional and explicit, i.e. the file path must be specified.

How are you going to handle it if a user creates a single instance of an InputFile, and passes it to several API calls? Does this reuse the cached file? If so, that means we cannot clear the cache after a successful request. We can't even do it after a successful update handling. Should the file be deleted again automatically, and if so, when?

Are we gonna make this flexible and allow setting method and headers and maybe body

We could expose the same type that's used for the base fetch configuration. That would allow users to simply pass their fetch config object, and we'll forward it to fetch. That way, we automatically get support for proxies etc. I agree that it starts to look more verbose for the user at this point, but I think that's acceptable because the complexity of the problem increases. Nonetheless, it should still look pretty if you just want a simple GET request without config.
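
A rough sketch of what such a pass-through could look like. FetchConfig, RemoteSource and toFetchArgs are hypothetical names, and FetchConfig is a deliberately minimal stand-in rather than the real fetch configuration type.

```typescript
// Hypothetical types: a minimal stand-in for a fetch configuration object.
interface FetchConfig {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

// Either a bare URL (simple GET) or a URL plus a config to forward.
type RemoteSource = string | { url: string; config?: FetchConfig };

// Normalizes both forms into the arguments a fetch-like function expects.
function toFetchArgs(source: RemoteSource): [string, FetchConfig] {
  if (typeof source === "string") {
    // The simple case stays simple: just a URL, fetched with GET.
    return [source, { method: "GET" }];
  }
  // Defaults come first so that the caller's config overrides them.
  return [source.url, { method: "GET", ...source.config }];
}
```

With this shape, a plain URL string stays a one-liner, and the extra verbosity only appears when a custom method, headers or body is actually needed.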

dzek69 · 2021-11-08T09:29:29Z

Sorry for being late again @KnorpelSenf. I looked at the merged code and I see no fetch config feature (so no POST, no headers), so I guess we can keep this open in the meantime?

dzek69 · 2021-11-08T09:29:55Z

I will take a look at the questions you asked in your last comment this evening.

dzek69 · 2021-11-09T11:40:18Z

How are you going to handle it if a user creates a single instance of an InputFile, and passes it to several API calls? Does this reuse the cached file? If so, that means we cannot clear the cache after a successful request. We can't even do it after a successful update handling. Should the file be deleted again automatically, and if so, when?

We can skip the cache for now, but if we want to use it, I guess we should count how many times it was used: bump the counter each time some function like sendPhoto receives the file, and decrease it after the request succeeds or fails. Files should stay cached until the counter is zero.
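
A minimal sketch of this counting idea, assuming the cache is keyed by file path. RefCountedFiles and its methods are hypothetical and do not exist in grammY.

```typescript
// Hypothetical reference counter for cached files, keyed by file path.
class RefCountedFiles {
  private counts = new Map<string, number>();

  // Called when an API method such as sendPhoto receives the file.
  acquire(path: string): void {
    this.counts.set(path, (this.counts.get(path) ?? 0) + 1);
  }

  // Called after a request finishes, whether it succeeded or failed.
  // Returns true when no user is left and the file may be deleted.
  release(path: string): boolean {
    const current = this.counts.get(path);
    if (current === undefined) {
      throw new Error(`release() without acquire() for ${path}`);
    }
    if (current > 1) {
      this.counts.set(path, current - 1);
      return false;
    }
    this.counts.delete(path);
    return true;
  }
}
```

The sketch only tracks usage. It does not decide whether a cached file is still the right one to send on a retry.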

KnorpelSenf · 2021-11-09T11:55:43Z

so no POST, no headers

This is described in #80. Please read the PR description. It includes a simple workaround.

we should count how many times it was used

I do not see how this logic adds up. If a user tries to send a file, and the request fails, and then does not want to try again, we are polluting the cache with more and more files. We cannot guess how grammY will be used.

Another point is that the transfer may fail because the server returned a wrong file, e.g. a photo that is too large. When the bot tries the request again, grammY must fetch the new file. It must not use the wrong cached version. How are we going to detect this?

It is much easier to leave caching to user land code. Remember that we already have a files plugin (https://grammy.dev/plugins/files) that can store files locally. This makes it trivially easy to download a file, and to try several requests based on the file path.
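
As an illustration of caching in user land, here is a minimal Node.js sketch that downloads a URL to a temporary directory and returns the path. It does not use the files plugin. downloadToDisk and Downloader are hypothetical names, and the actual HTTP request is injected so that custom methods or headers remain the caller's choice.

```typescript
import { mkdtemp, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper type: resolves a URL to the file's bytes.
// A real bot would implement this with its HTTP client of choice.
type Downloader = (url: string) => Promise<Uint8Array>;

// Downloads the file once and stores it in a fresh temporary directory.
async function downloadToDisk(
  url: string,
  download: Downloader,
): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), "reupload-"));
  // Use the last path segment as the file name, with a generic fallback.
  const name = new URL(url).pathname.split("/").pop() || "file";
  const path = join(dir, name);
  await writeFile(path, await download(url));
  return path;
}
```

The returned path can then be passed to new InputFile(path), as in the disk upload example at the top of this thread, and reused across retries. Deleting the temporary file afterwards is up to the caller.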

dzek69 · 2021-11-09T12:03:20Z

Oh, I missed the description.

I guess you are right about the caching. Let's leave that to the user.

KnorpelSenf mentioned this issue Nov 6, 2021

Support creating InputFile instances from URLs #80

Merged

KnorpelSenf closed this as completed in #80 Nov 8, 2021
