Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: ability to re-upload a file from given url #33

Closed
dzek69 opened this issue Sep 25, 2021 · 13 comments · Fixed by #80
Closed

Feature request: ability to re-upload a file from given url #33

dzek69 opened this issue Sep 25, 2021 · 13 comments · Fixed by #80

Comments

@dzek69
Copy link
Contributor

dzek69 commented Sep 25, 2021

I want to use for example sendMediaGroup with an URL but I don't want grammy to pass the URL to Telegram but instead in the background download the file and upload it as a new file

why?

Telegram fetches the media information from URL and caches it. Small enough MP4 files with no audio are considered Animations. MP4 with audio will be a Video.

I want to send an album with two mp4 files, one has audio and one has not, so they are Video + Animation. And Telegram does not allow me to send them as media group as Media Groups cannot contain Animations.

If i upload files from disk with InputFile then it's ok:

ctx.replyWithMediaGroup([
    {
        type: "video",
        media: new InputFile("/tmp/evv/no-audio.mp4"),
    },
    {
        type: "video",
        media: new InputFile("/tmp/evv/yes-audio.mp4"),
    },
])

Telegram won't use it's cache on uploads and will consider my mp4 files a Videos.

If I use URL:

ctx.replyWithMediaGroup([
    {
        type: "video",
        media: "https://dev.nitra.pl/no-audio.mp4",
    },
    {
        type: "video",
        media: "https://dev.nitra.pl/yes-audio.mp4",
    },
])

Telegram will return:

{
  "ok": false,
  "error_code": 400,
  "description": "Bad Request: wrong file identifier/HTTP URL specified"
}

Proposed change could be useful in other cases as well - like my bot has access to some url, but telegram servers don't - useful with local http-based microservices.

I can of course download the file manually but it would be a much smoother experience to use something like that:

ctx.replyWithMediaGroup([
    {
        type: "video",
        media: new ReuploadFile("https://dev.nitra.pl/no-audio.mp4"),
    },
    {
        type: "video",
        media: new ReuploadFile("https://dev.nitra.pl/yes-audio.mp4"),
    },
])
@KnorpelSenf
Copy link
Member

KnorpelSenf commented Sep 29, 2021

That sounds reasonable. It will not be a second type (ReuploadFile) but rather a new way to instantiate InputFile. Ideas:

  • Permit URL objects in constructor (branch over file:https:// vs. https?:https:// protocol!)
  • Permit { url: string } objects in constructor
  • Add static methods such as InputFile.download(url: string) and the like
  • more?

We could also have several supported ways. Uploading files should be easy and I don't mind giving people a choice here.

@dzek69
Copy link
Contributor Author

dzek69 commented Sep 29, 2021

I don't mind the exact implementation, it was just a suggestion. Passing instance of URL or { url: string } sound the best to me.

Also I think it's important to not start the download process until the message is actually being send. Like if I use throttler to slow down sending messages and I "sent" like 20 of them (and they get queued) I don't want files to be downloaded immediately when I set them to be sent but when the queue gets to the message with files.

@KnorpelSenf
Copy link
Member

KnorpelSenf commented Sep 29, 2021

it's important to not start the download process until the message is actually being send

This will be achieved automatically because grammY features a custom multipart/form-data implementation that collects all resources lazily when they are needed. For example, if you want to send ten files in a media group with 50 MB each (so 500 MB in one request), then grammY will not pull all files into memory at the same time, and then start streaming them up. Instead, only one file at a time is downloaded, and only a small portion of the file is kept in a buffer at all times.

@KnorpelSenf
Copy link
Member

I think we have to accept that files can be downloaded automatically several times if a request needs to be retried. This does cause redundant traffic, but I don't see any good alternative.

@dzek69
Copy link
Contributor Author

dzek69 commented Sep 29, 2021

This doesn't sound like an unsolvable problem from general programming perspective, but may be hard to implement due to grammY internals, I know nothing about grammY internal code.

I think most users will accept that downside and maybe something could be done to fix that in the future.

@KnorpelSenf
Copy link
Member

If the grammY internals would not matter, what would your solution be? Write the file to disk? Keep in RAM? We somehow have to remember the already downloaded data without OOMing the server.

@dzek69
Copy link
Contributor Author

dzek69 commented Oct 17, 2021

I somehow missed your reply earlier. I guess writing to disk is the best solution. I personally would like to reupload ~100MB files (few of them, when send as album). Wasting RAM for that doesn't sound good.

In RAM we can keep just the pointers to file path.

Meanwhile, while this is an early idea stage: we may think about how flexible this would be. Some servers will deny access for non-browser User Agents, some will deny access without Referer from the same domain being set. Some files may be accessible via POST request.

Are we gonna make this flexible and allow setting method and headers and maybe body, which will make it more ugly to use or we will allow just simple GET links with default headers and for any other use case user would have to manually download the file anyway?

@KnorpelSenf
Copy link
Member

I guess writing to disk is the best solution.

That's totally possible to do. Either way, caching the file is optional and explicit, i.e. the file path must be specified..

How are you going to handle it if a user creates a single instance of an InputFile, and passes it to several API calls? Does this reuse the cached file? If so, that means we cannot clear the cache after a successful request. We can't even do it after a successful update handling. Should the file be deleted again automatically, and if so, when?

Are we gonna make this flexible and allow setting method and headers and maybe body

We could expose the same type that's used for the base fetch configuration. That would allow users to simply pass their fetch config object, and we'll forward it to fetch. That way, we automatically get support for proxies etc. I agree that it starts to look more verbose for the user at this point, but I think that's acceptable because the complexity of the problem increases. Nonetheless, it should still look pretty if you just want a simple GET request without config.

@dzek69
Copy link
Contributor Author

dzek69 commented Nov 8, 2021

Sorry for being late again @KnorpelSenf, I looked at the merged code and I see no fetch config feature (so no POST, no headers), so I guess we can keep this open meanwhile?

@dzek69
Copy link
Contributor Author

dzek69 commented Nov 8, 2021

I will take a look at the questions you asked in last comment in the evening

@dzek69
Copy link
Contributor Author

dzek69 commented Nov 9, 2021

How are you going to handle it if a user creates a single instance of an InputFile, and passes it to several API calls? Does this reuse the cached file? If so, that means we cannot clear the cache after a successful request. We can't even do it after a successful update handling. Should the file be deleted again automatically, and if so, when?

We can skip the cache for now, but if we want to use it I guess we should count how many times it was used (bump the counter each time some function like sendPhoto receives it) and after successfull (or failed) request we should decrease the counter. Files should be cached until counter is zero.

@KnorpelSenf
Copy link
Member

so no POST, no headers

This is described in #80. Please read the PR description. It includes a simple workaround.

we should count how many times it was used

I do not see how this logic adds up. If a user tries to send a file, and the request fails, and then does not want to try again, we are polluting the cache with more and more files. We cannot guess how grammY will be used.

Another point is that the transfer may fail because the server returned a wrong file, e.g. a photo that is too large. When the bot tries the request again, grammY must fetch the new file. It must not used the wrong cached version. How are we going to detect this?

It is much easier to leave caching to user land code. Remember that we already have a files plugin (https://grammy.dev/plugins/files) that can store files locally. This makes it trivially easy to download a file, and to try several requests based on the file path.

@dzek69
Copy link
Contributor Author

dzek69 commented Nov 9, 2021

Oh, i missed the description

I guess you are right with the caching - let's leave that to user

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants