Very slow pixel conversion inside glTexImage2D (texture.c) #265
Ah, yes, sorry for not responding to that. Texture upload on GLES hardware is very limited compared to what OpenGL can do, so gl4es has a complicated scheme to handle it.
Cases 2 and 3 will also use a malloc to create a new texture in a compatible format. I can add a "fast path" with source == If the speed issue really is the malloc, that's something else I need to work on (creating a pair of scratch textures in the glstate structure for all conversions, because I sometimes need 2 when the conversion also includes resizing).
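A minimal sketch of the "pair of scratch buffers" idea, assuming a per-context struct that owns two reusable buffers: one for the format conversion and one for the optional resize. The names `scratch_state_t`, `scratch_get` and `scratch_free` are hypothetical, not gl4es's real glstate fields. The buffers only grow, so after the first few uploads no malloc/free happens per texture.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-context scratch state (illustrative names, not gl4es's
 * actual glstate fields): one buffer for the pixel conversion and a second
 * one for when the converted image also has to be resized. */
typedef struct {
    void  *buf[2];
    size_t cap[2]; /* capacity of each buffer in bytes */
} scratch_state_t;

/* Return scratch buffer `idx` (0 or 1) with room for at least `size` bytes.
 * Buffers only grow, so once they are big enough no malloc/free happens
 * per texture upload. Returns NULL on a bad index or allocation failure. */
static void *scratch_get(scratch_state_t *s, int idx, size_t size)
{
    if (idx < 0 || idx > 1)
        return NULL;
    if (s->cap[idx] < size) {
        /* Grow geometrically to keep the number of reallocations small. */
        size_t newcap = s->cap[idx] ? s->cap[idx] : 4096;
        while (newcap < size) {
            if (newcap > SIZE_MAX / 2) { /* avoid overflow on huge requests */
                newcap = size;
                break;
            }
            newcap *= 2;
        }
        void *p = realloc(s->buf[idx], newcap);
        if (!p)
            return NULL; /* the old buffer is still valid and still owned */
        s->buf[idx] = p;
        s->cap[idx] = newcap;
    }
    return s->buf[idx];
}

/* Release both buffers, e.g. when the context is destroyed. */
static void scratch_free(scratch_state_t *s)
{
    for (int i = 0; i < 2; i++) {
        free(s->buf[i]);
        s->buf[i] = NULL;
        s->cap[i] = 0;
    }
}
```

A conversion would ask for buffer 0 for the converted pixels and buffer 1 for the resized copy, and `scratch_free` runs once at context teardown. The contents are overwritten by the next upload, so nothing may keep a pointer into them after the upload call returns.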
Hi, thanks for your answer!
We could try that way first for the sake of testing: if the speed is the same as with GL_UNSIGNED_BYTE, then all is fine. But as I said, it doesn't matter whether the texture is 10 kilobytes or 3 megabytes; the speed drop is the same. That means it's not the actual conversion that takes time, but some "handling" around the conversion (malloc/free, i.e. the heap?).
IMHO, if the "fast path" idea doesn't help and it's still slow, then we can be fairly sure it's malloc/free and can follow that route. In the worst case, Daniel says he can add the conversion to ogles2.library, so no conversion is needed in GL4ES, and he says it will cost nothing in terms of speed. So a speed drop caused by conversion alone probably shouldn't be as big as what we have now, just milliseconds. But of course it's better if things are handled inside GL4ES, as that benefits everyone. If you can create a test branch with the "fast path", I can test it right away :) Then we'll know whether it's the conversion path or the general handling (malloc/free). But the fact that the speed drop is constant and unrelated to texture size suggests it's not the conversion path itself. IMHO.
… GL_UNSIGNED_BYTE conversion (to help #265)
I tried to add that fast path. I'm not sure it will trigger, so please add a printf in line 866 of
Saying pixel.c:853
I meant
It compiles fine, but the printfs didn't show up, so it wasn't triggered, and it's just as slow :)
Do you know exactly which conversion is going on? (The commented-out printf at line 780 would give you that info.)
I uncommented that printf, and for each loaded texture I get:
2 conversions for each texture? Mmmm, I need to check. The texture conversion code is a bit of a mess, because there are many special cases to handle for many games...
Can you activate the debug log in
You mean just
If so, then for each texture I had:
Ok thanks, I'll check how to improve this.
… used 2 pass where 1 would be enough (to help #265)
Ok, I pushed something; it should be better now.
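The commit above says the conversion used 2 passes where 1 would be enough. As a hypothetical illustration only (not the actual gl4es change), this is what fusing two passes looks like for packed `GL_BGRA` / `GL_UNSIGNED_INT_8_8_8_8_REV` pixels going to `GL_RGBA` / `GL_UNSIGNED_BYTE`: the two-pass version unpacks and then swizzles, while the fused one writes each channel straight into its final slot. Function names are made up, and the source is assumed to be 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-pass version: first split each packed GL_BGRA /
 * GL_UNSIGNED_INT_8_8_8_8_REV value (B in bits 0-7, G 8-15, R 16-23,
 * A 24-31) into B,G,R,A bytes, then walk the buffer again to swap B and R.
 * Every pixel is read and written twice. */
static void bgra_rev_to_rgba_two_pass(const uint32_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t p = src[i];
        dst[4 * i + 0] = (uint8_t)(p & 0xFFu);         /* B */
        dst[4 * i + 1] = (uint8_t)((p >> 8) & 0xFFu);  /* G */
        dst[4 * i + 2] = (uint8_t)((p >> 16) & 0xFFu); /* R */
        dst[4 * i + 3] = (uint8_t)(p >> 24);           /* A */
    }
    for (size_t i = 0; i < count; i++) {
        uint8_t tmp = dst[4 * i + 0];
        dst[4 * i + 0] = dst[4 * i + 2];
        dst[4 * i + 2] = tmp;
    }
}

/* Fused version: extract each channel directly into its final RGBA slot,
 * so every pixel is touched once. */
static void bgra_rev_to_rgba_one_pass(const uint32_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t p = src[i];
        dst[4 * i + 0] = (uint8_t)((p >> 16) & 0xFFu); /* R */
        dst[4 * i + 1] = (uint8_t)((p >> 8) & 0xFFu);  /* G */
        dst[4 * i + 2] = (uint8_t)(p & 0xFFu);         /* B */
        dst[4 * i + 3] = (uint8_t)(p >> 24);           /* A */
    }
}
```

Both produce identical bytes; the fused one simply avoids a second walk over the whole image.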
Omg, you did it! Texture loading with EDT_OPENGL is now just as fast as with EDT_SOFTWARE in Irrlicht! Yeah! So it wasn't malloc/free after all? Btw, I noticed another issue with the latest gl4es. It happens at least with Irrlicht, but maybe it's a general one: on loading, it reports that pixel/vertex shader compilation failed at positions 585 and 605 for the vertex shader and 580 and 586 for the pixel shader, saying "invalid token". I need to do more tests to see whether it happens only with Irrlicht or in general, and I'll create a separate topic for it (and btw, no more ARGB errors with Irrlicht, you were right :) )
Ok good 👍
Oh, one more thing before we close this: Daniel found out that Nova supports all of those types natively, so he added them to ogles2 and no conversion is needed. This is what he wrote in the readme:
That means it will be faster than any conversion inside gl4es. And we really need that speed increase: even if it only speeds things up by 20%, that would be pretty cool on our machines :) But this is all untested yet. So, could you please add another flag to the build process, like "AMIGAOS4_NATIVE_TYPES=1", or maybe just an ifdef (no need to worry about build-process changes for now), so that we skip conversion for all those types and pass them directly on AmigaOS4 only? That way I can test ogles2 with them, and if it's faster we can keep it (via ifdefs, of course, so other PPC machines keep what they have now, while AmigaOS4 uses those formats natively with no conversion at all). So it would be really cool to have the types gl4es uses ifdef'd to skip conversion entirely and use our formats directly. Can you do that if it's not too much hassle? Thanks a bunch!
PS. I think at least formats like Or could all of them be redirected, so that none of them need any conversion?
Thinking more about it, IMHO we don't need any new define in CMake or anywhere else; a plain AMIGAOS4 ifdef is enough. I.e. keep BIG_ENDIAN where it is (for other possible PPC machines), but in all the places where you currently do texture conversion, add an AMIGAOS4 ifdef that uses those types directly, so conversion is skipped on AmigaOS4 only. Other PPC machines will still get the conversion. I checked the Irrlicht code in COpenGLTexture.cpp, and these are the types it definitely handles:
So
Yeah ok, I'll add support for those formats, according to the extension string defined by Daniel.
Thanks a bunch! Just, probably, in yesterday's part where we added the BIG_ENDIAN ifdef in pixel.c, there should be something like
No need. Once I add support for those formats, convert will not be called for them anyway.
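A sketch of the "use the format natively if the extension is there, otherwise convert" check, assuming the lib advertises support through the usual space-separated `GL_EXTENSIONS` string. `has_extension` is a hypothetical helper, and the extension names in the example are placeholders, since the real name Daniel defined isn't quoted here. Whole-token matching matters because a plain `strstr` would also match a name that is only a prefix of a longer one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` appears as a whole space-separated token in the
 * extension string `ext` (the format returned by glGetString(GL_EXTENSIONS)).
 * A bare strstr() would also report "GL_foo" as present inside "GL_foo_bar". */
static bool has_extension(const char *ext, const char *name)
{
    if (!ext || !name)
        return false;
    size_t len = strlen(name);
    if (len == 0)
        return false;
    const char *p = ext;
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == ext) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len; /* partial match, keep searching after it */
    }
    return false;
}
```

The upload path would call it once (casting the `const GLubyte *` from `glGetString` to `const char *`), cache the result, and hand format/type through unchanged when it is true, keeping the existing conversion as the fallback.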
Ah, that's better. So if the format is recognized, it's handled natively; if not, it goes the usual PPC way. Good!
Ok, I have added the formats (well, the RGB332 ones are not used by gl4es, but I've never encountered anything using them).
Tested: it builds fine, and on running it gives us:
But when I run the Irrlicht test case on it, everything is fully green. Probably we need to add some debug output to see whether things work as expected on the gl4es side? Maybe enable the debug logs in pixel.c and texture.c again?
I tried an older version of ogles2.library, where those things weren't implemented yet, and everything renders correctly (so it goes the old conversion way). Now we need to figure out whether it's something in gl4es or in ogles2 that needs fixing :) Maybe some big-endian ifdefs still play a role there when they shouldn't?
I added support for those formats: when they are encountered, I don't try to convert them and just use them. So I send those formats to the gles2 lib as-is. I don't know; gl4es doesn't really do anything here.
Aha ok, I'll pass that info on to Daniel (and big thanks for your help!)
Ok, Daniel fixed it; it was just an issue with swizzles, and now everything is fine! Though the speed didn't increase at all, which is kind of strange too. IMHO conversion still takes some time, so without it things should be at least a little faster? I added printfs in pixel.c, where we added the big_endian ifdefs yesterday, and they are definitely not called. So it may be that the speed difference between your conversion and no conversion at all isn't even noticeable, and there's nothing more we can get out of it. But having no conversion is better anyway, and hopefully faster somewhere. Thanks a bunch! The first part of the donation is on its way!
The "fast path" conversion is pretty fast. I assume the actual texture upload to GPU memory takes much more time than the fast-path conversion (your CPU does integer work quite fast), so the time spent converting or not is negligible compared to the upload to GPU memory...
Hi ptitSeb!
As you may have noticed in the closed Irrlicht Engine issue, I found that something very weird and slow happens when we load textures into VRAM. And the size of the textures doesn't matter, 3MB or just 10KB: it's always the same, constantly slow.
Daniel and I spent some hours debugging and found that when we call glTexImage2D() with something like 8-8-8-8-REV or even 1-5-5-5-REV, texture loading slows down by a factor of 6. When we use something like GL_UNSIGNED_BYTE (so no conversion happens), it's fast.
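For context on why 8-8-8-8-REV triggers a conversion at all: with `GL_RGBA` and `GL_UNSIGNED_INT_8_8_8_8_REV`, each pixel is one 32-bit integer with red in the low byte and alpha in the high byte. On a little-endian CPU that is byte-for-byte the same as `GL_UNSIGNED_BYTE`, but on big-endian PPC the bytes sit in memory as A, B, G, R, so they must be reordered before a GLES2 upload. A minimal sketch of that unpack, using shifts so it gives the same result on either endianness (the function name is hypothetical, and the source is assumed to be 4-byte aligned):

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack GL_RGBA / GL_UNSIGNED_INT_8_8_8_8_REV pixels into plain
 * GL_RGBA / GL_UNSIGNED_BYTE bytes. With the _REV type, red lives in
 * bits 0-7, green in 8-15, blue in 16-23 and alpha in 24-31 of each
 * 32-bit value. Using shifts instead of byte access gives the same
 * result on big- and little-endian hosts. */
static void unpack_rgba_8888_rev(const uint32_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t p = src[i];
        dst[4 * i + 0] = (uint8_t)(p & 0xFFu);         /* R */
        dst[4 * i + 1] = (uint8_t)((p >> 8) & 0xFFu);  /* G */
        dst[4 * i + 2] = (uint8_t)((p >> 16) & 0xFFu); /* B */
        dst[4 * i + 3] = (uint8_t)(p >> 24);           /* A */
    }
}
```

Per pixel this is only a few shifts and stores.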
By massive speed drop, I mean: with GL_UNSIGNED_BYTE it takes 1 second to load 10 textures of 10-20KB each, and with GL_UNSIGNED_INT_8_8_8_8_REV the same 10 textures take 6 (!) seconds. Imagine what happens when we load 30 textures :)
What we did next was grab the latest GL4ES and replace all the big_endian ifdefs in texture.c with dummy code so that the conversion isn't called, and the same test code with 8_8_8_8_REV became very fast, as expected.
Now, our wild guess is that this is probably due to heap activity (malloc/free), because the speed drop is constant and doesn't relate at all to the size of the textures (at least between the 10KB and 3MB ones we tested). That's just a guess, of course. But conversions like the ones done in texture.c should by themselves take a matter of milliseconds, not 6 seconds for 10 textures as they do now :) That's why we think heap activity is the issue there.
Probably we can solve it with one of these solutions:
1) create and use a big (possibly growing) per-context scratch buffer for stuff like that.
2) for smaller temporary buffers, use dynamic allocation on the stack.
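A sketch of option 2, with one substitution named plainly: instead of truly dynamic stack allocation (`alloca` or a VLA, which have no failure check and can overflow a small stack), it uses a fixed-size buffer that lives on the stack when the struct is a local variable, and falls back to malloc only for bigger requests. `tempbuf_t`, its helpers and the 4096-byte threshold are all hypothetical. Multi-megabyte textures would still hit the heap, which is where option 1 helps.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-off: requests up to this size never touch the heap. */
#define TEMPBUF_INLINE_BYTES 4096

typedef struct {
    uint8_t  inline_buf[TEMPBUF_INLINE_BYTES]; /* on the stack when tempbuf_t is a local */
    uint8_t *ptr;                              /* buffer handed out by tempbuf_acquire */
} tempbuf_t;

/* Get a temporary buffer of `size` bytes: the inline storage if it fits,
 * otherwise a heap block. Returns NULL only if malloc fails. */
static uint8_t *tempbuf_acquire(tempbuf_t *t, size_t size)
{
    t->ptr = (size <= sizeof t->inline_buf) ? t->inline_buf : malloc(size);
    return t->ptr;
}

/* Release the buffer; only heap blocks need freeing. */
static void tempbuf_release(tempbuf_t *t)
{
    if (t->ptr != t->inline_buf)
        free(t->ptr); /* free(NULL) is a no-op if malloc had failed */
    t->ptr = NULL;
}
```

Typical use: declare `tempbuf_t t;` inside the conversion function, acquire, convert, upload, then release before returning. The inline storage must never outlive that function.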
For now, there's no need to worry about all the types for testing, as 8_8_8_8_REV is one that's already dead slow :)
Of course, I'll gladly donate toward a solution, no problem. And other PPC machines will benefit too (and not only PPC: any platform that needs conversion, though currently that's just PPC) :)
Thanks!