Guest Albertri Posted September 18, 2009 (edited)
Quote: "Chainfire, hm, you're right. I tried to rewrite Screentex with GLES2, and it is not very fast... Though it runs much better than v1! The slowest part in the v1 version, by the way, was rasterization: when the quad is rotated so that the viewer lies in its plane, the speed is quite good, but when it faces the viewer, performance is awful. I know that the Omnia's hardware does blitting and stretching quite well (just try CorePlayer in DirectDraw mode, and compare it to RAW Framebuffer mode), so there definitely seems to be something wrong with the driver. And the v2 version has some hiccups with any quad rotation. On the other hand, Albertri said two pages ago that Screentex (v1) runs smoothly on H3 (and I cannot say that about my G6). And, in the H6 topic, FerdiBorbon reports surprisingly good performance in the Cube interface. I think I should test Screentex (both v1 and v2) on a newer firmware."
Took some vid of Screentex on my O2 :) http://www.youtube.com/watch?v=5vOzuyYvzZI
Edited September 18, 2009 by Albertri
Guest Chainfire Posted September 18, 2009
GinKage, yes, I had suspected this, which is why I am waiting on my new firmwares and devices. It's taking longer than expected. I might try to find RAWs or dump from the downloads of the ROMs posted here on the board. I'm still not sure I have all the correct DLLs though, which is why trying new drivers on my old firmware may still prove a waste of time. Note that in a debugger you can track a number of warning messages from the drivers, though I'm not sure if these are supposed to be there or are really pointing to a problem. The video Albertri posted is inconclusive for me: I can't tell from it whether it is really faster or not. I will do some more stuff, but I'm not going to post anything conclusive until I get my new firmwares and devices.
Guest GinKage Posted September 18, 2009
Nice, Albertri. :) Well, you may want to try v2 as well: http://www.4shared.com/file/133607470/7f3f...Screentex2.html Oh, by the way: I noticed that v2 apps run more smoothly if the power cable is attached. Which is, of course, not a big surprise. Chainfire, is there something that stops you from, well, flashing one of the newer firmwares available in this forum? Even G1 is here, so you will have a chance to revert to it if you want.
Guest Chainfire Posted September 18, 2009
I had not noticed this power thing. Have you tried comparing with and without the cable, with the device set to "high performance" mode through the menu in the latter case, to see if there is still a difference?
Guest GinKage Posted September 18, 2009
Good point. Yes, it seems that in High performance mode GLES2 apps run just as well. It means that in Auto mode with a power cable attached, the device in fact disables power saving and acts as in High mode.
Guest Albertri Posted September 18, 2009
GinKage wrote: "Good point. Yes, it seems that in High performance mode GLES2 apps run just as well. It means that in Auto mode with a power cable attached, the device in fact disables power saving and acts as in High mode."
Screentex v2 still runs smoothly on my Omnia 2 (H3). It feels more rubbery, not as loose as v1, but I did notice the graphics are rendered cleaner (or maybe it's just my imagination). Oh, my device is not plugged in to USB or a power source, but I always have my setting on High (Performance). Sorry for the vid, I used my wife's E63, so the vid res is very low :) http://www.youtube.com/watch?v=d_IlR4gWFHI
Guest Yohng Posted September 18, 2009 (edited)
message deleted
Edited September 18, 2009 by Yohng
Guest NuShrike Posted September 21, 2009
Btw, since the Acer m900 is basically the OmniaPRO, with less ram and polish, these same drivers also work on the Acer m900.
Guest Chainfire Posted September 21, 2009
Can I get FPS figures on different ROMs, please? Attached is the Triangle2 program. Let it run for a minute or so, then tap the screen and it will show the FPS. Note that ActiveSync syncing will definitely slow this down horribly :)
Attachment: Triangle2.zip
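For anyone who wants to reproduce this kind of measurement in their own test app, here is a minimal sketch of an average-FPS counter. It is not Triangle2's code (its source is not posted in this thread); `FpsCounter`, `frame_done` and the injected `clock` are hypothetical names chosen for illustration, and Python is used only to keep the sketch short.

```python
class FpsCounter:
    """Average frames per second since the counter was created."""

    def __init__(self, clock):
        # `clock` is any zero-argument callable returning seconds
        # (for example time.perf_counter). It is injected so that the
        # counter can be checked without real rendering. Hypothetical design.
        self._clock = clock
        self._start = clock()
        self._frames = 0

    def frame_done(self):
        """Call once per presented frame, e.g. right after the buffer swap."""
        self._frames += 1

    def average_fps(self):
        """Frames counted so far divided by the elapsed time in seconds."""
        elapsed = self._clock() - self._start
        if elapsed <= 0:
            return 0.0
        return self._frames / elapsed
```

Averaging over a minute or so, as requested above, smooths out short stalls; a counter like this would be read when the screen is tapped.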
Guest GinKage Posted September 21, 2009 (edited)
G6: 31 fps in Auto mode with power cable attached, 49 in High.
Edited September 21, 2009 by GinKage
Guest Chainfire Posted September 21, 2009 (edited)
E2 + G6 drivers: 50 fps (high). Note that we should be seeing higher values... Looking forward to seeing H3, H6 and H9 values.
Edited September 21, 2009 by Chainfire
Guest bluhound Posted September 21, 2009
G7 ROM. On High Performance: 49 fps. On Auto: 32 fps.
Guest Shadowy Posted September 21, 2009
H5 - 49 with power cable connected, 31 when disconnected.
Guest wakeupneo Posted September 21, 2009
Chainfire wrote: "Can I get FPS figures on different ROMs, please? Attached is the Triangle2 program. Let it run for a minute or so, then tap the screen and it will show the FPS. Note that ActiveSync syncing will definitely slow this down horribly :)"
45 FPS on H3 without any additional drivers.
Guest Chainfire Posted September 21, 2009 (edited)
That's all definitely slow :) Let's hope an H9 user can bring us some good news. Because with the speed I'm seeing, GL is pretty much useless.
Edited September 21, 2009 by Chainfire
Guest GinKage Posted September 21, 2009
By the way... I looked into libGLESv1_CM.dll a bit, and the first two shaders in that file are:

    #-------------------------------------------------
    # ORION - OpenGL ES 2.0 Shading Language Compiler
    # SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
    # Compiler Version : v04.00.00.alpha.01
    # Release Date : 16.04.2008
    # FIMG VERSION : FIMGv1.5
    # Optimizer Options : -Oxp
    #-------------------------------------------------
    ps_3_0
    fimg_version 0x01020000
    dcl_f2_TexDim0 c2.x
    dcl_s2_TexImages0 s0
    dcl_f4_TexCoord0 v1.x
    dcl_f4_FrontColor v0.x
    label start
    mul r0.x, vPos.w, vPos.w
    mad r1.xyzw, vPos.yyxx, d1.wwww, d1.wwww
    mul r1.xyzw, r1.xyzw, c2.xyxy
    maxcomp r1.w, |r1.xyzw|
    mul r1.w, r1.w, r0.x
    log r1.w, r1.w
    mov r1.xyz, v1.xyz # TexCoord0=v1.xyz
    texld r1.xyzw, r1.xyzw, s0 # TexImages0=s0
    mul_sat oColor, v0.xyzw, r1.xyzw # gl_FragColor=oColor, FrontColor=v0.xyzw
    label main_end
    ret
    # 10 instructions

and

    #-------------------------------------------------
    # ORION - OpenGL ES 2.0 Shading Language Compiler
    # SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
    # Compiler Version : v04.00.00.alpha.01
    # Release Date : 16.04.2008
    # FIMG VERSION : FIMGv1.5
    # Optimizer Options : -Oxp
    #-------------------------------------------------
    ps_3_0
    fimg_version 0x01020000
    dcl_f2_TexDim0 c2.x
    dcl_s2_TexImages0 s0
    dcl_f4_TexCoord0 v0.x
    label start
    mul r0.x, vPos.w, vPos.w
    mad r1.xyzw, vPos.yyxx, d0.wwww, d0.wwww
    mul r1.xyzw, r1.xyzw, c2.xyxy
    maxcomp r1.w, |r1.xyzw|
    mul r1.w, r1.w, r0.x
    log r1.w, r1.w
    mov r1.xyz, v0.xyz # TexCoord0=v0.xyz
    texld_sat oColor.xyzw, r1.xyzw, s0 # TexImages0=s0
    label main_end
    ret
    # 9 instructions

This probably doesn't look too weird if you haven't seen what the ORION compiler (I have a slightly newer version from Cube20) generates for, say, a high-level shader like this:

    #ifdef GL_ES
    precision highp float;
    #endif
    uniform sampler2D TexImages0;
    varying vec4 texC;
    void main()
    {
        gl_FragColor = texture2D(TexImages0, texC.xy);
    }

Ready to behold? Watch:

    #-------------------------------------------------
    # ORION - OpenGL ES 2.0 Shading Language Compiler
    # SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
    # Compiler Version : v04.00.03
    # Release Date : 30.06.2008
    # FIMG VERSION : FIMGv1.5
    # Optimizer Options : -Oxp
    #-------------------------------------------------
    ps_3_0
    fimg_version 0x01020000
    dcl_f2_TexDim0 c2.x
    dcl_s2_TexImages0 s0
    dcl_f4_texC v0.x
    def c0, 0.0, 0.0, 0.0, 0.0
    def c1, 1.0, 1.0, 1.0, 1.0
    label start
    label main_
    mul r1.x, vPos.w, vPos.w
    mad r2.xyzw, vPos.yyxx, d0.wwww, d0.wwww
    mul r2.xyzw, r2.xyzw, c2.xyxy
    maxcomp r2.w, |r2.xyzw|
    mul r2.w, r2.w, r1.x
    log r2.w, r2.w
    mov r2.xyz, v0.xyz # texC=v0.xyz
    texld r2.xyzw, r2.xyzw, s0 # TexImages0=s0
    mov r0.xyzw, r2.xyzw # gl_FragColor=r0.xyzw
    label main_end
    mov_sat oColor.xyzw, r0.xyzw
    ret
    # 11 instructions, 3 C regs, 3 R regs

Why?

For another example, if you change the (one and only) statement in the high-level language to this:

    gl_FragColor = texture2D(TexImages0, texC.xy*1.0);

you'll get something way better:

    #-------------------------------------------------
    # ORION - OpenGL ES 2.0 Shading Language Compiler
    # SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
    # Compiler Version : v04.00.03
    # Release Date : 30.06.2008
    # FIMG VERSION : FIMGv1.5
    # Optimizer Options : -Oxp
    #-------------------------------------------------
    ps_3_0
    fimg_version 0x01020000
    dcl_s2_TexImages0 s0
    dcl_f4_texC v0.x
    def c0, 0.0, 0.0, 0.0, 0.0
    def c1, 1.0, 1.0, 1.0, 1.0
    def c2, 1.000000, 0, 0, 0
    def c3, 1.0, 0.0, 0.0, 0.0
    label start
    label main_
    mul r1.xy, v0.xy, c2.xx # texC=v0.xy
    mul r2.xyzw, c3.xxyy, r1.xyyy
    texld r2.xyzw, r2.xyzw, s0 # TexImages0=s0
    mov r0.xyzw, r2.xyzw # gl_FragColor=r0.xyzw
    label main_end
    mov_sat oColor.xyzw, r0.xyzw
    ret
    # 6 instructions, 4 C regs, 3 R regs

But still imperfect. Try disabling "experimental optimizations" and compile with -O instead of -Oxp, and you'll get something nicer and almost clean:

    #-------------------------------------------------
    # ORION - OpenGL ES 2.0 Shading Language Compiler
    # SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
    # Compiler Version : v04.00.03
    # Release Date : 30.06.2008
    # FIMG VERSION : FIMGv1.5
    # Optimizer Options : -O
    #-------------------------------------------------
    ps_3_0
    fimg_version 0x01020000
    dcl_s2_TexImages0 s0
    dcl_f4_texC v0.x
    def c2, 1.000000, 0, 0, 0
    def c1, 1.0, 1.0, 1.0, 1.0
    def c0, 0.0, 0.0, 0.0, 0.0
    def c3, 1.0, 0.0, 0.0, 0.0
    label start
    label main_
    mul r0.xyzw, c3.xxyy, v0.xyyy
    texld_sat oColor.xyzw, r0.xyzw, s0 # TexImages0=s0
    label main_end
    ret
    # 3 instructions, 5 C regs, 1 R regs

I guess that's what is causing our poor v1 performance: a really bad "optimizing" shader compiler. But, of course, it still doesn't excuse bad v2 performance.
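To compare compiler outputs like the ones in this post without counting by hand, a small script can tally the instruction lines. This is only a sketch: the rule for what counts as an instruction (everything except comments and the `ps_3_0`, `fimg_version`, `dcl_*`, `def` and `label` lines) is inferred from the "# N instructions" footers in the listings, not taken from any ORION documentation, and the two sample listings are copied from the post with line breaks assumed.

```python
# Directive tokens that, judging by the listing footers, are not counted.
NON_INSTRUCTION_TOKENS = {"ps_3_0", "fimg_version", "def", "label"}

def count_instructions(listing):
    """Count instruction lines in an ORION-style assembly listing."""
    count = 0
    for raw_line in listing.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        token = line.split()[0]
        if token in NON_INSTRUCTION_TOKENS or token.startswith("dcl_"):
            continue
        count += 1
    return count

# -Oxp output for the plain texture2D(TexImages0, texC.xy) shader.
OXP_LISTING = """
ps_3_0
fimg_version 0x01020000
dcl_f2_TexDim0 c2.x
dcl_s2_TexImages0 s0
dcl_f4_texC v0.x
def c0, 0.0, 0.0, 0.0, 0.0
def c1, 1.0, 1.0, 1.0, 1.0
label start
label main_
mul r1.x, vPos.w, vPos.w
mad r2.xyzw, vPos.yyxx, d0.wwww, d0.wwww
mul r2.xyzw, r2.xyzw, c2.xyxy
maxcomp r2.w, |r2.xyzw|
mul r2.w, r2.w, r1.x
log r2.w, r2.w
mov r2.xyz, v0.xyz # texC=v0.xyz
texld r2.xyzw, r2.xyzw, s0 # TexImages0=s0
mov r0.xyzw, r2.xyzw # gl_FragColor=r0.xyzw
label main_end
mov_sat oColor.xyzw, r0.xyzw
ret
# 11 instructions, 3 C regs, 3 R regs
"""

# -O output for the texC.xy*1.0 variant.
O_LISTING = """
ps_3_0
fimg_version 0x01020000
dcl_s2_TexImages0 s0
dcl_f4_texC v0.x
def c2, 1.000000, 0, 0, 0
def c1, 1.0, 1.0, 1.0, 1.0
def c0, 0.0, 0.0, 0.0, 0.0
def c3, 1.0, 0.0, 0.0, 0.0
label start
label main_
mul r0.xyzw, c3.xxyy, v0.xyyy
texld_sat oColor.xyzw, r0.xyzw, s0 # TexImages0=s0
label main_end
ret
# 3 instructions, 5 C regs, 1 R regs
"""

bloat = count_instructions(OXP_LISTING) / count_instructions(O_LISTING)
print(f"-Oxp output is about {bloat:.1f}x the size of the -O output")
```

Note that the two listings come from slightly different source shaders (the second has the `*1.0`), so the ratio is only a rough indication of how much the fragment shader grows.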
Guest Chainfire Posted September 21, 2009
I have seen those, yes... but the difference in the Triangle2 app between v1 and v2 is very minimal. I don't think it's a shader issue. After some testing, it seems eglSwapBuffers is very slow, and a single triangle halves FPS... something weird is going on :) For something that is supposed to do 9 million triangles/sec, the performance is very poor. Fill rate is also abysmal.
Guest chong01 Posted September 21, 2009
Chainfire wrote: "That's all definitely slow :) Let's hope an H9 user can bring us some good news. Because with the speed I'm seeing, GL is pretty much useless."
Just to get an idea of how slow the Omnia II is... do you have the numbers from other recent devices?
Guest Chainfire Posted September 21, 2009 (edited)
Any MSM7K-based HTC device from the past two years will be 50 to 100% faster. This isn't really a proper benchmark tool, but it should definitely be faster than 50 fps (actually, I was just looking to see if anybody might reach 60 or 70 fps on an H6/H9 ROM... as this would indicate at least something). A proper bench would be glBenchmark, and what it shows in the video (see link in first post) is at least 4x slower than, for example, the Touch HD. But that's the "game render" test. If you go to the fill rate tests (not included in the video), the Touch HD will outperform the Omnia II about 100-fold (so yeah, that's about 10000%). That is why I say something fishy is going on. By the specs, the Omnia II should be at least twice as fast as the Touch HD. Depending on what benchmark you look at, that's a factor of 8 to 200 difference from the expected results. With this speed, porting TF3D, for example, would be useless. The speed would be too slow for a nice user experience. The comment about the Cube having unexpectedly decent performance on H6 sounds hopeful enough, though. Perhaps they fixed an issue with the drivers in that ROM? It just doesn't fit with the speeds we otherwise see the Omnia II reach either. A test with Opera using GL as renderer actually made it slower, hah.
Edited September 21, 2009 by Chainfire
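The "factor of 8 to 200" follows from combining the two ratios quoted in the post: if the device should be 2x faster than the Touch HD but measures 4x (or 100x) slower, the gap to the expected result is the product of the two. A tiny sketch of that arithmetic, with `shortfall_factor` being a hypothetical helper name:

```python
def shortfall_factor(expected_speedup, observed_slowdown):
    """How many times slower a device is than its specs suggest.

    expected_speedup: how many times faster than the reference it should be.
    observed_slowdown: how many times slower than the reference it measured.
    """
    return expected_speedup * observed_slowdown

print(shortfall_factor(2, 4))    # "game render" test: 8
print(shortfall_factor(2, 100))  # fill rate tests: 200
```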
Guest Chainfire Posted September 21, 2009 (edited)
No H9 users around? :) Note, I did try to use the H9 drivers on my ROM, and v2 seems to work a bit better, but v1 doesn't work at all anymore... But I had that same problem with the G7 drivers, which seem to work on a real G7 ROM, so...
Edited September 21, 2009 by Chainfire
Guest inmatrixout Posted September 21, 2009
ROM: H6, without additional drivers
ActiveSync: ON, Performance: Auto, FPS: 49
ActiveSync: Off, Performance: Auto, FPS: 31
ActiveSync: ON, Performance: High, FPS: 49
ActiveSync: Off, Performance: High, FPS: 49
Guest GinKage Posted September 21, 2009
Chainfire wrote: "but the difference in the Triangle2 app between v1 and v2 is very minimal."
That's because you don't use textures. Those were pixel shaders, bloated to three times what they should be. As for the Triangle app, I think the Omnia uses this shader for it (also taken from libGLESv1_CM.dll):

    #-------------------------------------------------
    # ORION - OpenGL ES 2.0 Shading Language Compiler
    # SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
    # Compiler Version : v04.00.00.alpha.01
    # Release Date : 16.04.2008
    # FIMG VERSION : FIMGv1.5
    # Optimizer Options : -Oxp
    #-------------------------------------------------
    ps_3_0
    fimg_version 0x01020000
    dcl_f4_FrontColor v0.x
    label start
    mov_sat oColor, v0.xyzw # gl_FragColor=oColor, FrontColor=v0.xyzw
    label main_end
    ret
    # 2 instructions

Which is kinda neat and clean and, yes, I bet it runs at the same speed in v1 and v2. :)
Chainfire wrote: "I don't think it's a shader issue. After some testing, it seems eglSwapBuffers is very slow, and a single triangle halves FPS..."
Compared to what? To the app that does no drawing? Well, that wouldn't be a big surprise. Or do you mean that 2 triangles spin at 25 fps, and four spin at 12? Again, do you mean triangles of the same size, or any size? I mean, what, in fact, matters: the number of triangles or the fillrate? And, well, eglSwapBuffers is not much of a bottleneck here, as it takes no longer than 10-20 ms. We need to do some more thorough benchmarking...
Guest GinKage Posted September 22, 2009
One more thing I don't really understand is: why is Triangle2 so big? My GLES programs, even in Debug mode (which is, in itself, a really bad idea for a performance-measuring app), do not exceed a couple of hundred KB, and are less than 50 KB in Release mode.
Guest Chainfire Posted September 22, 2009
GinKage wrote: "Compared to what? To the app that does no drawing? Well, that wouldn't be a big surprise. Or do you mean that 2 triangles spin at 25 fps, and four spin at 12? Again, do you mean triangles of the same size, or any size? I mean, what, in fact, matters: the number of triangles or the fillrate? And, well, eglSwapBuffers is not much of a bottleneck here, as it takes no longer than 10-20 ms. We need to do some more thorough benchmarking..."
Compared to, well, HTC. The number of triangles or the fillrate does not matter directly if everything just runs fine (it's just a measurement, after all), but it doesn't :) You can see from the glBenchmark video that it's not exactly fluid. I'm trying to get my hands on glBenchmark 2.0, but so far no luck. I'd disagree with eglSwapBuffers not being much of a bottleneck. In truth it averages about 8 ms (at least on my device / drivers combo), so it's a bit less worrying, but 20 ms limits your fps to 50 even if you don't draw anything at all... Currently I can reach 120 fps with it, which is obviously not bad, though still relatively slow. Note that removing the pixel shading and just drawing a single triangle only improves the situation by a few fps. Obviously you are correct and it is not a proper test, but as I said, it was just to see if different ROM versions got different fps.
GinKage wrote: "One more thing I don't really understand is: why is Triangle2 so big? My GLES programs, even in Debug mode (which is, in itself, a really bad idea for a performance-measuring app), do not exceed a couple of hundred KB, and are less than 50 KB in Release mode."
Just my way of compiling. Doesn't really matter, aside from being inconvenient. BTW, my new devices should be coming in tomorrow or the day after, so perhaps then I can see some more.
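The relation between swap time and the frame-rate ceiling mentioned in this post is just the reciprocal of the per-frame cost. A minimal sketch of that arithmetic follows; `fps_ceiling` and its `draw_ms` parameter are hypothetical names, and the model assumes swap and draw time simply add up with no overlap.

```python
def fps_ceiling(swap_ms, draw_ms=0.0):
    """Upper bound on frame rate when each frame costs swap_ms + draw_ms."""
    frame_ms = swap_ms + draw_ms
    if frame_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_ms

print(fps_ceiling(20))  # a 20 ms swap alone caps the rate at 50 fps
print(fps_ceiling(8))   # an 8 ms average swap allows up to 125 fps
```

The 125 fps bound for an 8 ms swap is in line with the roughly 120 fps reported for an empty frame.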
Guest GinKage Posted September 22, 2009 (edited)
Well, glBenchmark runs WITH texturing, and with all those awfully bloated shaders (imagine FIVE computational instructions, including logarithms and multiplications, for every pixel (!) instead of just ONE texture sampling operation), so I think its performance could be improved dramatically just by replacing those shaders with some "sane" ones. And, if I'm right, the situation won't change until we get a sane version of libGLESv1_CM.dll, which I don't think will happen in the next couple of weeks or months. I think that, after all, we should contact Samsung directly to get some information on this. Of course, it's only my guess.
Update: Tried to look at the vertex shader as well, and... Well, it's big. Really big. So we might later encounter a vertex bottleneck as well. I have also looked at the data in disasm and, yes, found some init function that xref's to a binary-compiled version of the shaders I mentioned. So, editing shaders in-place seems possible. I'll try to do it a bit later.
Update2: I realized I had looked at some old dll (maybe even from G1). So, I downloaded the G7 ROM, dumped it, looked into libGLESv1_CM.dll, felt an urge for some liquor. One binary PS (and a long one, too). One binary VS. Same text asm sources, probably unused. Nothing more. If it really goes this way, we'll need to do some really hard work to make a usable v1...
Edited September 22, 2009 by GinKage