
hardware performance


Guest muff


just curious

has anyone done any tests on the SPV to get performance figures for things like :-

fixed point math Vs floating point math

div speed Vs mult speed

trig functions Vs lookup tables

video blit speed

data transfer rate from SD card

key response time

audio playback overhead (processor occupancy)

??

I know it's a long shot, but it'll save me writing the tests myself if someone has already done them

I also realise that some of these are supposed to be equivalent to a P120 (or whatever they CLAIM is inside these things), but it would be good to have some hard figures to back that up

at the moment the information on the SPV is still somewhat scarce, and I suspect that is the reason for some of the games not running at their potential speed

the game programming community as a whole would benefit from this type of information

muff


Guest fraser
fixed point math Vs floating point math  

div speed Vs mult speed

As far as I know, the SPV can't do floats or divides in hardware at all; I remember seeing that they recommended using lookup tables if you need to do a lot of them.
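To make the lookup-table idea concrete for division: one common trick is a table of scaled reciprocals, so x / d becomes a multiply and a shift. This is a sketch under stated assumptions, not anything from an SPV SDK: the table `recip` and the helpers `init_recip` and `div_by_table` are hypothetical, the method only covers divisors 1 to 255 and dividends below 2^23 (in that range it returns exactly floor(x / d)), and it relies on a 32x32-to-64-bit multiply, which ARM cores with a long-multiply instruction such as UMULL can do in one instruction. The table is about 1 KB.

```c
#include <stdint.h>

/* Hypothetical reciprocal table: recip[d] = floor(2^31 / d) + 1 for d = 1..255.
   Index 0 is unused. */
static uint32_t recip[256];

/* Build the table once at startup; these 255 divides are the only real ones. */
void init_recip(void)
{
    for (uint32_t d = 1; d < 256; d++)
        recip[d] = (uint32_t)((1u << 31) / d + 1);
}

/* floor(x / d) for 0 <= x < 2^23 and 1 <= d <= 255,
   using one widening multiply and one shift instead of a divide. */
uint32_t div_by_table(uint32_t x, uint32_t d)
{
    return (uint32_t)(((uint64_t)x * recip[d]) >> 31);
}
```

The range limits come from the rounding error of the scaled reciprocal: it stays below one step of the result as long as x times d is under 2^31, which the limits above guarantee.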


Guest revolution.cx
having read through the article I now need to rework the entire quake engine to not use any floating point values, and no divides

ooh err

Quake is already running on the Pocket PC with an ARM processor with the same limitations you'll see on the SPV. I'd start with the Pocket Quake sources if I was you.

The SPV is just not going to have the power to run Quake with any usable framerate.

All the floating point stuff will work if you link to the math lib; it just won't be that fast.
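If the software float library turns out to be too slow, the usual alternative is fixed point. Below is a minimal 16.16 sketch, assuming 32-bit ints and a compiler that provides `int64_t` for the intermediate product. The names `fixed`, `fix_mul`, `FIX_ONE`, `INT_TO_FIX` and `FIX_TO_INT` are hypothetical. Right-shifting a negative value is implementation-defined in C, though typical ARM and x86 compilers do an arithmetic shift.

```c
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits are the integer part, lower 16 the fraction. */
typedef int32_t fixed;

#define FIX_ONE       (1 << 16)
#define INT_TO_FIX(i) ((fixed)((i) * FIX_ONE))
#define FIX_TO_INT(f) ((int)((f) >> 16))   /* rounds toward -infinity with an arithmetic shift */

/* Multiply two 16.16 values; widen to 64 bits so the product can't overflow. */
static inline fixed fix_mul(fixed a, fixed b)
{
    return (fixed)(((int64_t)a * b) >> 16);
}
```

Addition and subtraction need no helpers: plain integer `+` and `-` work directly on 16.16 values.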

I ported Quake to Pocket PC way back when but never released it. As I recall, the floating point issue wasn't really a problem, as the rasterizer doesn't use floats constantly. They still wanted to support the 486, so I think they did the perspective divide every 8 pixels and linearly interpolated the rest.
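The "divide every 8 pixels" scheme can be sketched like this. It is a simplified float illustration of the idea, not Quake's actual span code: `span_s` and `inv_n` are hypothetical, and only the s texture coordinate is shown (t works the same way). s/z and 1/z are linear in screen space, so they can be stepped exactly. The true s is recovered with one divide per 8-pixel chunk and linearly interpolated in between.

```c
#define SUBDIV 8

/* 1/n for n = 0..SUBDIV, so the interpolation step needs no extra divide. */
static const float inv_n[SUBDIV + 1] = {
    0.0f, 1.0f, 1.0f / 2, 1.0f / 3, 1.0f / 4, 1.0f / 5, 1.0f / 6, 1.0f / 7, 1.0f / 8
};

/* Compute texture s coordinates for a span of `count` pixels.
   sdivz and zi are s/z and 1/z at the first pixel; d_sdivz and d_zi are
   their per-pixel steps. Writes one s value per pixel into out_s. */
void span_s(int count, float sdivz, float zi,
            float d_sdivz, float d_zi, float *out_s)
{
    float s = sdivz / zi;                 /* exact value at the span start */
    int i = 0;
    while (i < count) {
        int n = count - i < SUBDIV ? count - i : SUBDIV;
        /* step s/z and 1/z to the end of this chunk, then do the one divide */
        sdivz += d_sdivz * (float)n;
        zi += d_zi * (float)n;
        float s_next = sdivz / zi;
        float ds = (s_next - s) * inv_n[n];   /* linear step inside the chunk */
        for (int k = 0; k < n; k++)
            out_s[i + k] = s + ds * (float)k;
        s = s_next;                       /* resync to the exact value */
        i += n;
    }
}
```

The error of the linear segments grows with chunk length and with how steeply the surface recedes, which is the trade-off behind picking 8 (or 16) pixels.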

The fundamental tenet of performance on a Pocket PC (the SPV might be different, see below) is DON'T TOUCH MEMORY. Once a memory access is outside of the cache you are doomed. So if your look-up table will easily fit in the cache and has lots of redundancy (chances for a cache hit) then go for it, otherwise you will just slow yourself down. A 64K lookup table for some sort of math operation will just suck.

I don't remember my exact trade-off point and it's been a while since I did the tests, but usually 10 machine instructions are faster than one non-cache memory access.
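A lookup table in that spirit stays tiny. As a sketch (`sin_table`, `init_sin_table`, `fast_sin` and `fast_cos` are hypothetical names), a 256-step sine table stored as 16-bit 2.14 fixed point takes 512 bytes, small enough to have a decent chance of staying in the data cache, unlike a 64K table. It is built once at startup by repeatedly rotating a unit vector, so no libm call is needed. Whether it actually beats computing the value is exactly the kind of thing to measure on the device.

```c
#include <stdint.h>

/* 256 steps per full circle; values in 2.14 fixed point (16384 == 1.0).
   256 entries * 2 bytes = 512 bytes. */
static int16_t sin_table[256];

void init_sin_table(void)
{
    /* cos and sin of one step (2*pi/256) from a short Taylor series */
    const double a = 6.283185307179586 / 256.0;
    const double a2 = a * a;
    const double c = 1.0 - a2 / 2 + a2 * a2 / 24 - a2 * a2 * a2 / 720;
    const double s = a * (1.0 - a2 / 6 + a2 * a2 / 120 - a2 * a2 * a2 / 5040);
    double x = 1.0, y = 0.0;              /* unit vector at angle 0 */
    for (int i = 0; i < 256; i++) {
        double v = y * 16384.0;
        sin_table[i] = (int16_t)(v >= 0 ? v + 0.5 : v - 0.5);   /* round to nearest */
        double nx = x * c - y * s;        /* rotate (x, y) by one step */
        y = x * s + y * c;
        x = nx;
    }
}

/* angle 0..255 maps to 0..2*pi; the mask makes it wrap for free */
static inline int16_t fast_sin(unsigned angle) { return sin_table[angle & 255]; }
static inline int16_t fast_cos(unsigned angle) { return sin_table[(angle + 64) & 255]; }
```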

SD card read speed is horrid. You can only access it via the file system, though, so it shouldn't be an issue. If you had visions of paging things in and out during a level, it ain't gonna work.
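If data has to come off the card, one way to keep reads out of the frame loop is to pull each file into RAM in one go when the level loads. Below is a minimal sketch using plain C stdio. `read_all` and `load_level_file` are hypothetical helpers. On the phone you might use the Win32 file calls instead, and you would still need to check that the data fits in available memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire open stream into one malloc'd buffer.
   Returns NULL on failure; the caller frees the buffer. */
unsigned char *read_all(FILE *f, size_t *out_len)
{
    if (fseek(f, 0, SEEK_END) != 0) return NULL;
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) return NULL;
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) return NULL;
    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {   /* one big sequential read */
        free(buf);
        return NULL;
    }
    *out_len = (size_t)size;
    return buf;
}

/* Load a whole file up front, e.g. when a level starts. */
unsigned char *load_level_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    unsigned char *buf = read_all(f, out_len);
    fclose(f);
    return buf;
}
```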

Playing sound doesn't steal much CPU, but having to process the interrupts and mix the sound into the buffer does. Keep the sample rate low if you can.
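To see why the sample rate matters, note that mixing costs roughly one add per active channel per output sample. Here is a sketch of a simple saturating mixer, assuming 16-bit mono channels. `mix_channels` is a hypothetical name, not a platform API.

```c
#include <stdint.h>

/* Mix `nch` 16-bit mono channels into `out`, clamping to the int16 range.
   The inner loop runs nch times per output sample, so the total work is
   proportional to the output sample rate. */
void mix_channels(const int16_t *const *ch, int nch, int16_t *out, int frames)
{
    for (int i = 0; i < frames; i++) {
        int32_t acc = 0;
        for (int c = 0; c < nch; c++)
            acc += ch[c][i];
        if (acc > INT16_MAX) acc = INT16_MAX;        /* saturate instead of wrapping */
        else if (acc < INT16_MIN) acc = INT16_MIN;
        out[i] = (int16_t)acc;
    }
}
```

As an example, a game running at 10fps and mixing at 11025 Hz produces about 1,100 samples per frame, versus about 2,200 at 22050 Hz, so halving the rate roughly halves this loop's work.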

The SPV has a system-on-a-chip design that may have better memory access characteristics than your average Pocket PC.

And as Michael Abrash always said, you have to measure the performance of your optimizations directly otherwise you are just guessing. That's what the performance timer is for.
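Measuring directly can be as simple as running the routine many times between two timer reads. The sketch below uses standard `clock()` so it builds anywhere. On the phone itself you would more likely read the high-resolution counter via `QueryPerformanceCounter` and `QueryPerformanceFrequency`. `time_per_call_us` and `sample_work` are hypothetical: swap in each version of the routine you are comparing and time them the same way.

```c
#include <time.h>

static volatile unsigned sink;   /* volatile store keeps the work from being optimised away */

/* Hypothetical workload standing in for the routine under test. */
static void sample_work(void)
{
    unsigned s = 0;
    for (unsigned i = 1; i <= 1000; i++)
        s += 1000000u / i;
    sink = s;
}

/* Call fn `iterations` times and return the average cost per call in microseconds. */
double time_per_call_us(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / (double)iterations;
}
```

Running many iterations averages out the timer's coarse resolution; comparing two versions under identical conditions is what matters, not the absolute numbers.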


revolution.cx:

thanks for the feedback

I did actually start out with the Pocket Quake port, but as it is completely in C, there is definite room for speed improvements

The SPV is just not going to have the power to run Quake with any usable framerate.

despite my earlier reservations, I do actually believe it's possible now [though some architecture may need simplification]

because the ported code is based on the WinQuake port which came later [which has then had all the remaining asm removed] the code base pretty much assumes an FPU, and as such ignores the phone's limitations [as I've learnt today]

in terms of speed, with the existing quake maps + textures it runs at around 4-5fps (although this slows when there are several monsters on screen, or in areas of complex level geometry)

in some areas where the level geometry is simple, and there are no monsters, it runs at 10+fps

switching off the texturing doubles these figures [sometimes triples them]

[above figures are based on the sound being off at the moment]

at 10fps it's actually very playable - even at 4-5fps it feels only a little slow

SD card read speed isn't actually as big a factor as I suspected, and a new level loads in around 6-7 seconds - which is quicker than the N64 version - and again, this will improve if levels or the textures within them are simplified

as for performance analysis, well that's a problem, as Orange won't let us amateurs run debuggers on our code - the only way of doing it is to include your own output functions in the code - never a good idea as it screws the profiling you are trying to measure

anyway, thanks for the feedback - back to squeezing this phone for all it's worth :)


Guest revolution.cx

I don't think you are going to get quite the speed improvements you are hoping for with assembly. Maybe you will, since the processor only runs at 80 MHz or so; on a faster machine you would still have cycles to burn while you wait for memory. Remember that writing ARM assembly is much more complicated than writing pipelined Pentium 1 assembly.

I think your first order of business is to see how memory-bound your performance is. If you turn off perspective correction and the textures still slow things down that much, you have your answer.

Combining multiple 8 bit and 16 bit reads and writes into a single 32 bit read or write is a great way to improve speed. An 8 or 16 bit access takes just as long as a 32 bit one (sometimes longer on some ARMs) but only gets 1/4 or 1/2 of the work done. This made a huge difference in my Pocket PC GameBoy emulator.
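A sketch of that idea for an 8-bit renderer: four source texels are fetched with one 32-bit load, translated through a 256-entry colormap, and written back with one 32-bit store. The assumptions are loud ones. The CPU is little-endian, so the lowest byte of each word is the leftmost pixel. The buffers are word-aligned and handed over as `uint32_t` arrays. `write_pixels32` and the colormap layout are hypothetical. The colormap lookups themselves are still byte reads, so the saving is on the source and destination traffic.

```c
#include <stddef.h>
#include <stdint.h>

/* Translate 4 * nwords 8-bit texel indices through a 256-entry colormap.
   Each group of four pixels costs one 32-bit load and one 32-bit store
   instead of four 8-bit loads and four 8-bit stores. */
void write_pixels32(uint32_t *dst, const uint32_t *src, size_t nwords,
                    const uint8_t colormap[256])
{
    for (size_t i = 0; i < nwords; i++) {
        uint32_t in = src[i];                         /* one load, four pixels */
        dst[i] = (uint32_t)colormap[in & 0xff]
               | (uint32_t)colormap[(in >> 8) & 0xff] << 8
               | (uint32_t)colormap[(in >> 16) & 0xff] << 16
               | (uint32_t)colormap[in >> 24] << 24;   /* one store, four pixels */
    }
}
```

The same packing works for 16-bit pixels, two per word, if the screen format is 16 bits deep.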

Reducing the render quality is a good way to go as the screen is so small it's hard to even notice.

For debug output, get this, don't ask questions, just use it:

http://www.smartphonedn.com/articles/stlog.html

A work of art in its simplicity - a fabulous way to debug code when you can't use the debugger. If you employ it correctly it won't mess with your timings. Remember you are making relative comparisons, so if the debug output is at the beginning and end of the code in question, then it stays constant across various rewrites.
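The "output only at the beginning and end" advice can be pushed a step further by deferring the output itself: record cheap timestamps while the code runs and format them afterwards. This is a hypothetical sketch, not the stlog API. `mark_time`, `flush_marks` and the fixed-size array are made up for illustration. It uses standard `clock()`; on the device a higher-resolution counter would be a better time source.

```c
#include <stdio.h>
#include <time.h>

#define MAX_MARKS 256

struct timing_mark { const char *label; clock_t when; };
static struct timing_mark marks[MAX_MARKS];
static int mark_count;

/* Cheap enough to leave inside the timed code: one clock read and two stores. */
static void mark_time(const char *label)
{
    if (mark_count < MAX_MARKS) {
        marks[mark_count].label = label;
        marks[mark_count].when = clock();
        mark_count++;
    }
}

/* Do the slow formatting and I/O only after the measured section is done. */
static void flush_marks(FILE *out)
{
    for (int i = 1; i < mark_count; i++)
        fprintf(out, "%s -> %s: %ld ticks\n", marks[i - 1].label, marks[i].label,
                (long)(marks[i].when - marks[i - 1].when));
    mark_count = 0;
}
```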

Allegedly the debugger will work if you de-cert your phone just right, do a search here.

Sounds like you've got a lot of work done already, great work!

Hayes


revolution.cx:

thanks for the heads up on the stlog stuff, I'll give that a shot - looks like it could be useful

at this point I'm in two minds about the whole ARM code aspect

I know there are gains to be made, but I also know there is a lot of work to get the benefits I'm looking for (basically looking to double performance) - even then I'm not 100% sure they are there to get :)

gonna try some simple breakdowns to really nail the problem routines and concentrate on one of those and see what results I can squeeze out of it (probably a small routine so it's not a large waste of time if there's no extra juice to squeeze out)

fingers crossed

thanks again

muff


Guest spacemonkey

A note on the debugger:

To get it working, when you do the de-cert you must include this in the provxml files: in the security policies section.

This enables RAPI config, which allows your PC to push certificates etc. onto the phone. This means that eVC can push your developer certificate into the privileged certificate store on your phone, which will make the debugger work.

The debugger probably isn't much use for performance analysis anyway: a debug build probably runs at about half the speed of one compiled with optimisations. The debugger is good for breaking at a breakpoint and watching variables. Be warned that if you start stepping through code it can often crash and leave your phone hung (which normally requires pulling the battery to recover from).

