
OpenGL ES 3D drivers, v1 compatibility layer


Guest Albertri
Chainfire,

Hm, you're right, I tried to rewrite Screentex with GLES2, and it is not very fast... Though, it runs much better than v1!

The slowest part in the v1 version, by the way, was rasterization: when the quad is rotated so that the viewer lies in its plane, the speed is quite good, but when it faces the viewer, performance is awful. I know that the Omnia's hardware does blitting and stretching quite well (just try CorePlayer in DirectDraw mode and compare it to RAW Framebuffer mode), so there definitely seems to be something wrong with the driver. And the v2 version has some hiccups at any quad rotation.

On the other hand, Albertri said two pages ago that Screentex (v1) runs smoothly on H3 (and I cannot say that about my G6). And, in the H6 topic, FerdiBorbon reports surprisingly good performance in Cube interface. I think I should test Screentex (both v1 and v2) on a newer firmware.

took some vid of screentex on my O2 :)

http://www.youtube.com/watch?v=5vOzuyYvzZI

Edited by Albertri

Guest Chainfire

GinKage yes I had suspected this, which is why I am waiting on my new firmwares and devices. It's taking longer than expected. I might try to find RAWs or dump from the downloads of the ROMs posted here on the board. I'm still not sure I have all the correct DLLs though, which is why trying new drivers on my old firmware may still prove a waste of time.

Note that in a debugger you can track a number of warning messages from the drivers, though I'm not sure if these are supposed to be there or are really pointing to a problem.

The video Albertri posted is 'inconclusive' for me to see if it is really faster or not. I will do some more stuff, but I'm not going to post anything conclusive until I get my new firmwares and devices.

Link to comment
Share on other sites

Nice, Albertri. :)

Well, you may want to try v2 as well: http://www.4shared.com/file/133607470/7f3f...Screentex2.html

Oh, by the way: I noticed that v2 apps run more smoothly if the power cable is attached. Which is, of course, not a big surprise.

Chainfire, is there something that stops you from, well, flashing one of the newer firmwares available in this forum?

Even G1 is here, so you will have a chance to revert to it if you want.


Guest Chainfire

I had not noticed this power thing. Have you tried comparing with and without cable, but the latter putting the device in "high performance" mode through the menu, and see if there is still a difference?

Link to comment
Share on other sites

Good point. Yes, it seems that in High performance mode gles2 apps run just as well. It means that in Auto mode with a power cable attached, device in fact disables power saving and acts as in High mode.


Guest Albertri
Good point. Yes, it seems that in High performance mode gles2 apps run just as well. It means that in Auto mode with a power cable attached, device in fact disables power saving and acts as in High mode.

Screentex v2 still runs smoothly on my Omnia 2 (H3). It feels more rubbery, not as loose as v1, but I did notice the graphics are rendered more cleanly (or maybe it's just my imagination).

Oh, my device is not plugged in to USB or a power source, but I always have my setting on High (Performance).

Sorry about the video: I used my wife's E63, so the resolution is very low :)

http://www.youtube.com/watch?v=d_IlR4gWFHI


Guest Chainfire

Can I get FPS quotes please on different ROMs ?

Attached is the Triangle2 program. Let it run for a minute or so, then tap the screen and it will show the FPS. Note that ActiveSync syncing will definitely slow this down horribly :)

Triangle2.zip
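
How Triangle2 computes its figure isn't stated here, so the following is only a minimal sketch of one common approach: count frames and divide by the elapsed time of the whole run. `FpsCounter` and its methods are hypothetical names, and the timestamps are passed in explicitly so the arithmetic is easy to check.

```python
class FpsCounter:
    """Average frames per second over a whole run (hypothetical helper)."""

    def __init__(self, start_time: float):
        self.start_time = start_time  # seconds, from any monotonic clock
        self.frames = 0

    def frame_done(self) -> None:
        # Call once after every presented frame.
        self.frames += 1

    def average_fps(self, now: float) -> float:
        elapsed = now - self.start_time
        # Avoid dividing by zero if asked before any time has passed.
        return self.frames / elapsed if elapsed > 0 else 0.0


counter = FpsCounter(start_time=0.0)
for _ in range(3000):
    counter.frame_done()

# 3000 frames over a 60-second run
print(counter.average_fps(now=60.0))
```

Averaging over a minute, rather than over the last few frames, smooths out short stalls such as a background sync kicking in.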


Guest Chainfire

E2 + G6 drivers: 50 fps (high)

Note that we should be seeing higher values...

Looking forward to seeing H3, H6 and H9 values.

Edited by Chainfire

Guest wakeupneo
Can I get FPS quotes please on different ROMs ?

Attached is the Triangle2 program. Let it run for a minute or so, then tap the screen and it will show the FPS. Note that ActiveSync syncing will definitely slow this down horribly :)

45 FPS on H3 without any additional drivers.


Guest Chainfire

That's all definitely slow :) Let's hope an H9 user can bring us some good news, because with the speed I'm seeing, GL is pretty much useless.

Edited by Chainfire
Link to comment
Share on other sites

By the way... I looked into libGLESv1_CM.dll a bit, and the first two shaders in that file are:

#-------------------------------------------------
# ORION - OpenGL ES 2.0 Shading Language Compiler
# SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
# Compiler Version	: v04.00.00.alpha.01
# Release Date		: 16.04.2008
# FIMG VERSION	  : FIMGv1.5
# Optimizer Options :  -Oxp
#-------------------------------------------------

ps_3_0

fimg_version	0x01020000

dcl_f2_TexDim0	c2.x
dcl_s2_TexImages0	s0
dcl_f4_TexCoord0	v1.x
dcl_f4_FrontColor	v0.x

label start
mul r0.x, vPos.w, vPos.w
mad r1.xyzw, vPos.yyxx, d1.wwww, d1.wwww
mul r1.xyzw, r1.xyzw, c2.xyxy
maxcomp r1.w, |r1.xyzw|
mul r1.w, r1.w, r0.x
log r1.w, r1.w
mov r1.xyz, v1.xyz	# TexCoord0=v1.xyz
texld r1.xyzw, r1.xyzw, s0	# TexImages0=s0
mul_sat oColor, v0.xyzw, r1.xyzw	# gl_FragColor=oColor, FrontColor=v0.xyzw
label main_end
ret
# 10 instructions
and
#-------------------------------------------------
# ORION - OpenGL ES 2.0 Shading Language Compiler
# SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
# Compiler Version	: v04.00.00.alpha.01
# Release Date		: 16.04.2008
# FIMG VERSION	  : FIMGv1.5
# Optimizer Options :  -Oxp
#-------------------------------------------------

ps_3_0

fimg_version	0x01020000

dcl_f2_TexDim0	c2.x
dcl_s2_TexImages0	s0
dcl_f4_TexCoord0	v0.x

label start
mul r0.x, vPos.w, vPos.w
mad r1.xyzw, vPos.yyxx, d0.wwww, d0.wwww
mul r1.xyzw, r1.xyzw, c2.xyxy
maxcomp r1.w, |r1.xyzw|
mul r1.w, r1.w, r0.x
log r1.w, r1.w
mov r1.xyz, v0.xyz	# TexCoord0=v0.xyz
texld_sat oColor.xyzw, r1.xyzw, s0	# TexImages0=s0
label main_end
ret
# 9 instructions
This may not look too weird until you see what the ORION compiler (I have a slightly newer version, from Cube20) generates for, say, a high-level shader as simple as this:
#ifdef GL_ES
precision highp float;
#endif

uniform sampler2D TexImages0;

varying vec4 texC;

void main()
{
	gl_FragColor = texture2D(TexImages0, texC.xy);
}
Ready to behold? Watch:
#-------------------------------------------------
# ORION - OpenGL ES 2.0 Shading Language Compiler
# SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
# Compiler Version	: v04.00.03
# Release Date		: 30.06.2008
# FIMG VERSION	  : FIMGv1.5
# Optimizer Options :  -Oxp
#-------------------------------------------------

ps_3_0

fimg_version	0x01020000

dcl_f2_TexDim0	c2.x
dcl_s2_TexImages0	s0
dcl_f4_texC	v0.x

def c0, 0.0, 0.0, 0.0, 0.0
def c1, 1.0, 1.0, 1.0, 1.0

label start
label main_
mul r1.x, vPos.w, vPos.w
mad r2.xyzw, vPos.yyxx, d0.wwww, d0.wwww
mul r2.xyzw, r2.xyzw, c2.xyxy
maxcomp r2.w, |r2.xyzw|
mul r2.w, r2.w, r1.x
log r2.w, r2.w
mov r2.xyz, v0.xyz	# texC=v0.xyz
texld r2.xyzw, r2.xyzw, s0	# TexImages0=s0
mov r0.xyzw, r2.xyzw	# gl_FragColor=r0.xyzw
label main_end
mov_sat oColor.xyzw, r0.xyzw
ret
# 11 instructions, 3 C regs, 3 R regs
Why? For another example, if you change the (one and only) statement in the high-level shader to this: gl_FragColor = texture2D(TexImages0, texC.xy*1.0); you'll get something way better:
#-------------------------------------------------
# ORION - OpenGL ES 2.0 Shading Language Compiler
# SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
# Compiler Version	: v04.00.03
# Release Date		: 30.06.2008
# FIMG VERSION	  : FIMGv1.5
# Optimizer Options :  -Oxp
#-------------------------------------------------

ps_3_0

fimg_version	0x01020000

dcl_s2_TexImages0	s0
dcl_f4_texC	v0.x

def c0, 0.0, 0.0, 0.0, 0.0
def c1, 1.0, 1.0, 1.0, 1.0
def c2, 1.000000, 0, 0, 0
def c3, 1.0, 0.0, 0.0, 0.0

label start
label main_
mul r1.xy, v0.xy, c2.xx	# texC=v0.xy
mul r2.xyzw, c3.xxyy, r1.xyyy
texld r2.xyzw, r2.xyzw, s0	# TexImages0=s0
mov r0.xyzw, r2.xyzw	# gl_FragColor=r0.xyzw
label main_end
mov_sat oColor.xyzw, r0.xyzw
ret
# 6 instructions, 4 C regs, 3 R regs
But it's still imperfect. Try disabling "experimental optimizations" and compiling with -O instead of -Oxp, and you'll get something nicer and almost clean:
#-------------------------------------------------
# ORION - OpenGL ES 2.0 Shading Language Compiler
# SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
# Compiler Version	: v04.00.03
# Release Date		: 30.06.2008
# FIMG VERSION	  : FIMGv1.5
# Optimizer Options :  -O
#-------------------------------------------------

ps_3_0

fimg_version	0x01020000

dcl_s2_TexImages0	s0
dcl_f4_texC	v0.x

def c2, 1.000000, 0, 0, 0
def c1, 1.0, 1.0, 1.0, 1.0
def c0, 0.0, 0.0, 0.0, 0.0
def c3, 1.0, 0.0, 0.0, 0.0

label start
label main_
mul r0.xyzw, c3.xxyy, v0.xyyy
texld_sat oColor.xyzw, r0.xyzw, s0	# TexImages0=s0
label main_end
ret
# 3 instructions, 5 C regs, 1 R regs

I guess that's what's causing our poor v1 performance: a really bad "optimizing" shader compiler. But, of course, it still doesn't excuse the bad v2 performance.
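
To compare the listings without counting by hand, here is a minimal Python sketch that tallies instructions the way the "# N instructions" trailers appear to. The counting rule (skip comments, labels, dcl_/def declarations and the ps_3_0 / fimg_version directives, but count ret) is inferred from the dumps above, not taken from any ORION documentation, and `count_instructions` is a hypothetical helper.

```python
def count_instructions(listing: str) -> int:
    """Count executable instructions in an ORION-style assembly listing."""
    # Lines that declare or annotate rather than execute (inferred rule).
    skipped_prefixes = ("label ", "dcl_", "def ", "ps_", "fimg_version")
    count = 0
    for raw in listing.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, full-line or trailing
        if not line or line.startswith(skipped_prefixes):
            continue
        count += 1
    return count


# Body of the -Oxp output for the plain texture2D shader.
BLOATED = """\
label start
label main_
mul r1.x, vPos.w, vPos.w
mad r2.xyzw, vPos.yyxx, d0.wwww, d0.wwww
mul r2.xyzw, r2.xyzw, c2.xyxy
maxcomp r2.w, |r2.xyzw|
mul r2.w, r2.w, r1.x
log r2.w, r2.w
mov r2.xyz, v0.xyz  # texC=v0.xyz
texld r2.xyzw, r2.xyzw, s0  # TexImages0=s0
mov r0.xyzw, r2.xyzw  # gl_FragColor=r0.xyzw
label main_end
mov_sat oColor.xyzw, r0.xyzw
ret
"""

# Body of the -O output for the texC.xy*1.0 variant.
CLEAN = """\
label start
label main_
mul r0.xyzw, c3.xxyy, v0.xyyy
texld_sat oColor.xyzw, r0.xyzw, s0  # TexImages0=s0
label main_end
ret
"""

print(count_instructions(BLOATED), count_instructions(CLEAN))
```

With this rule the two bodies come out at 11 and 3 instructions, matching the trailers the compiler printed.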


Guest Chainfire

I have seen those yes... but difference in Triangle2 app in v1 or v2 is very minimal. I don't think it's a shader issue. After some testing, it seems eglSwapBuffers is very slow, and a single triangle halves FPS... something weird is going on :)

For something that is supposed to do 9 million triangles/sec, the performance is very poor. The fill rate is also abysmal.

Link to comment
Share on other sites

That's all definitely slow :) Let's hope an H9 user can bring us some good news, because with the speed I'm seeing, GL is pretty much useless.

Just to get an idea of how slow the omnia II is... do u have the numbers from other recent devices?


Guest Chainfire

Any MSM7K-based HTC device from the past two years will be 50 to 100% faster. This isn't really a proper benchmark tool, but it should definitely be faster than 50 fps (actually I was just looking to see if anybody might reach 60 or 70 fps on H6/H9 ROM... as this would indicate at least something)

A proper benchmark would be glBenchmark, and what it shows in the video (see the link in the first post) is at least 4x slower than, for example, the Touch HD. But that's the "game render" test. If you go to the fill rate tests (not included in the video), the Touch HD will outperform the Omnia II about 100-fold (so yeah, that's about 10000%). That is why I say something fishy is going on. By the specs, the Omnia II should be at least twice as fast as the Touch HD. Depending on which benchmark you look at, that's a factor of 8 to 200 off the expected results.
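
The factors quoted above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning: the shortfall is taken as the expected speed-up multiplied by the observed slow-down, and both function names are hypothetical.

```python
def fold_to_percent(fold: float) -> float:
    """Express an N-fold ratio as a percentage (100-fold -> 10000%)."""
    return fold * 100.0


def shortfall_factor(expected_speedup: float, observed_slowdown: float) -> float:
    """How far off the device is: expected to be X times faster,
    measured to be Y times slower, so X * Y away from expectations."""
    return expected_speedup * observed_slowdown


print(fold_to_percent(100))        # fill rate gap as a percentage
print(shortfall_factor(2, 4))      # "game render" test
print(shortfall_factor(2, 100))    # fill rate test
```

This gives 10000%, a factor of 8 for the game render test and a factor of 200 for the fill rate test.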

With this speed, for example porting TF3D would be useless. The speed would be too slow for a nice user experience.

The comment about the Cube having unexpectedly decent performance on H6 sounds hopeful enough, though. Perhaps they fixed an issue with the drivers in that ROM ?

It just doesn't fit with the speeds we otherwise see the Omnia II reach either.

A test with Opera using GL as renderer actually made it slower, hah.

Edited by Chainfire

Guest Chainfire

No H9 users around? :)

Note, I did try to use the H9 drivers on my ROM, and v2 seems to work a bit better, but v1 doesn't work at all anymore... But I had the same problem with the G7 drivers, which seem to work on a real G7 ROM, so...

Edited by Chainfire

Guest inmatrixout

ROM: H6, without additional drivers

ActiveSync ON, Performance Auto: 49 FPS
ActiveSync Off, Performance Auto: 31 FPS
ActiveSync ON, Performance High: 49 FPS
ActiveSync Off, Performance High: 49 FPS

Link to comment
Share on other sites

but difference in Triangle2 app in v1 or v2 is very minimal.

That's because you don't use textures. Those were pixel shaders, bloated to three times what they should be.

As for the Triangle app, I think Omnia uses this shader for it (also taken from libGLESv1_CM.dll):

#-------------------------------------------------
# ORION - OpenGL ES 2.0 Shading Language Compiler
# SAMSUNG INDIA SOFTWARE OPERATIONS PVT. LTD.
# Compiler Version	: v04.00.00.alpha.01
# Release Date		: 16.04.2008
# FIMG VERSION	  : FIMGv1.5
# Optimizer Options :  -Oxp
#-------------------------------------------------

ps_3_0

fimg_version	0x01020000

dcl_f4_FrontColor	v0.x

label start
mov_sat oColor, v0.xyzw	# gl_FragColor=oColor, FrontColor=v0.xyzw
label main_end
ret
# 2 instructions

Which is kinda neat and clean and, yes, I bet it runs at the same speed in v1 and v2. :)

I don't think it's a shader issue. After some testing, it seems eglSwapBuffers is very slow, and a single triangle halves FPS...

Compared to what? To an app that does no drawing? Well, that wouldn't be a big surprise. Or do you mean that two triangles spin at 25 fps, and four spin at 12? Again, do you mean triangles of the same size, or any size? I mean, what in fact matters: the number of triangles or the fill rate?

And, well, eglSwapBuffers is not much of a bottleneck here, as it takes no longer than 10-20 ms.

We need to do some more thorough benchmarking...

Link to comment
Share on other sites

One more thing I don't really understand: why is Triangle2 so big?

My GLES programs, even in Debug mode (which is, in itself, a really bad idea for a performance-measuring app), do not exceed a couple of hundred KB, and are less than 50 KB in Release mode.


Guest Chainfire
Compared to what? To an app that does no drawing? Well, that wouldn't be a big surprise. Or do you mean that two triangles spin at 25 fps, and four spin at 12? Again, do you mean triangles of the same size, or any size? I mean, what in fact matters: the number of triangles or the fill rate?

And, well, eglSwapBuffers is not much of a bottleneck here, as it takes no longer than 10-20 ms.

We need to do some more thorough benchmarking...

Compared to, well, HTC. The number of triangles or the fill rate does not matter directly if everything just runs fine (it's just a measurement, after all), but it doesn't :) You can see from the glBenchmark video that it's not exactly fluid. I'm trying to get my hands on glBenchmark 2.0, but so far no luck.

I'd disagree with eglSwapBuffers not being much of a bottleneck. In truth it averages about 8 ms (at least on my device/driver combo), so it's a bit less worrying, but 20 ms limits your fps to 50 even if you don't draw anything... Currently I can reach 120 fps with it, which is obviously not bad, though still relatively slow.
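
As a rough check of those numbers, here is a small sketch of the frame rate ceiling a blocking swap imposes. It assumes the swap is the only per-frame cost and that frames do not overlap, which is a simplification; `fps_cap` is a hypothetical helper.

```python
def fps_cap(swap_ms: float) -> float:
    """Upper bound on frames per second when every frame spends
    swap_ms milliseconds inside the buffer swap and nothing else."""
    return 1000.0 / swap_ms


for swap_ms in (20.0, 8.0):
    print(f"{swap_ms:.0f} ms per swap -> at most {fps_cap(swap_ms):.0f} fps")
```

A 20 ms swap caps the rate at 50 fps, and an 8 ms swap at 125 fps, which is in line with the roughly 120 fps seen when drawing nothing.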

Note that removing the pixel shading and just drawing a single triangle only improves the situation a few fps. Obviously you are correct and it is not a correct test, but as I said it was just to see if different ROM versions got different fps.

One more thing I don't really understand: why is Triangle2 so big?

My GLES programs, even in Debug mode (which is, in itself, a really bad idea for a performance-measuring app), do not exceed a couple of hundred KB, and are less than 50 KB in Release mode.

Just my way of compiling. Doesn't really matter, aside from being inconvenient.

BTW my new devices should be coming in tomorrow or the day after, so perhaps then I can see some more.

Link to comment
Share on other sites

Well, glBenchmark runs WITH texturing, and with all those awfully bloated shaders (imagine FIVE computational instructions, including logarithms and multiplications, for every pixel (!), instead of just ONE texture sampling operation), so I think its performance could be improved dramatically just by replacing those shaders with some "sane" ones. And, if I'm right, the situation won't change until we get a sane version of libGLESv1_CM.dll, which I don't think will happen in the next couple of weeks or months. I think that, after all, we should contact Samsung directly to get some information on this.

Of course, it's only my guess.
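
To put a rough number on that guess, here is a back-of-the-envelope sketch of the extra per-pixel work. The 480x800 screen size, a single full-screen textured pass per frame and the 50 fps target are all assumptions made for illustration, not measurements from this thread.

```python
# Assumptions (not from the thread): a 480x800 screen, every pixel shaded
# exactly once per frame, and a 50 fps target.
WIDTH, HEIGHT = 480, 800
EXTRA_OPS_PER_PIXEL = 5  # the five extra computational instructions mentioned above
TARGET_FPS = 50


def extra_ops_per_frame(width: int, height: int, ops_per_pixel: int) -> int:
    """Extra shader instructions executed for one full-screen pass."""
    return width * height * ops_per_pixel


per_frame = extra_ops_per_frame(WIDTH, HEIGHT, EXTRA_OPS_PER_PIXEL)
per_second = per_frame * TARGET_FPS
print(per_frame, per_second)
```

Under these assumptions that is about 1.9 million extra instructions per frame, or 96 million per second, before any overdraw is taken into account.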

Update: Tried to look at vertex shader as well, and... Well, it's big. Really big. So we might later encounter a vertex bottleneck as well.

I have also looked at the data in disasm, and, yes, found some init function that xref's to a binary-compiled version of the shaders I mentioned. So, editing shaders in-place seems possible. I'll try to do it a bit later.

Update2: I realized I looked at some old dll (maybe even from G1). So, I downloaded G7 ROM, dumped it, looked into libGLESv1_CM.dll, felt an urge for some liquor. One binary PS (and a long one, too). One binary VS. Same text asm sources, probably unused. Nothing more.

If it really goes this way, we'll need to do some really hard work to make a usable v1...

Edited by GinKage