Several new devices have an FPU these days (VFP in this case), a coprocessor that can speed up floating-point (fractional number) calculations. Devices that have an FPU include those based on the S3C6410 processor, like the Samsung Omnia II/Pro and Acer M900, and Snapdragon-based devices like the Toshiba TG01 and HTC Leo. However, Windows Mobile does not currently come with support for it. NuShrike and I decided to do something about this, and FPU Enabler is the result.

FPU Enabler is an application that patches coredll in memory and replaces some of the FPU emulation routines with actual FPU routines. All applications will automatically make use of this.
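To make the idea concrete, here is a toy model of "replace some routines, leave the rest alone, and be able to put the originals back". It is only an analogy in Python: the real patcher rewrites code in coredll's memory, and the `RoutineTable` class and the `soft_*` / `vfp_*` functions are hypothetical names invented for this sketch.

```python
# Stand-ins for the software-emulated routines exported by coredll.
# The bodies are placeholders; imagine slow integer-only emulation here.
def soft_adds(a, b):
    return a + b


def soft_muls(a, b):
    return a * b


# Stand-ins for replacement routines that would use the hardware FPU.
def vfp_adds(a, b):
    return a + b


def vfp_muls(a, b):
    return a * b


class RoutineTable:
    """Toy model of a set of named routines that can be redirected at runtime."""

    def __init__(self, routines):
        self.routines = dict(routines)
        self._saved = {}  # originals, kept so the patch can be undone

    def patch(self, replacements):
        # Only redirect routines that exist and are not already patched.
        for name, fn in replacements.items():
            if name in self.routines and name not in self._saved:
                self._saved[name] = self.routines[name]
                self.routines[name] = fn

    def unpatch(self):
        # Restore every original routine that was replaced.
        self.routines.update(self._saved)
        self._saved.clear()

    def call(self, name, *args):
        # Callers use the same name before and after patching.
        return self.routines[name](*args)
```

Callers never change: they keep calling `__adds` by name and simply end up in whichever implementation is currently installed, which is why applications pick up the patch without being rebuilt.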

Now, obviously there are a number of caveats with an application like this. First, the FPU code is not IEEE compliant. This means that in some edge cases calculation results are undefined, which may cause issues. Exceptions are not supported - so for example a divide by 0 will not raise an error, which may be problematic. If your device acts as the control board for a nuclear power plant, we would definitely advise against using this app.
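As a rough illustration of the "no exceptions" caveat, the sketch below contrasts a divide that raises an error with one that quietly returns a value. The `inf`/`nan` results are an assumption made for this example (they follow the usual IEEE default results); the post itself only says that no error is raised, and that some edge-case results are undefined. Both function names are made up.

```python
import math


def div_trapping(a, b):
    # Python's float division raises ZeroDivisionError on a zero divisor.
    return a / b


def div_nontrapping(a, b):
    """Divide without ever raising; return inf or nan for a zero divisor.

    Assumption for illustration only: the values returned here mimic
    IEEE default results, not necessarily what the patched routines return.
    """
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        # copysign also sees the sign of -0.0, so 1.0 / -0.0 gives -inf.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b
```

An application that relies on catching the error (the first function) will carry on with an infinite or not-a-number value instead (the second), which is the kind of silent behaviour change the caveat warns about.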

Not nearly all instructions that can be sped up by using the FPU are supported (yet). They may be in the future. Actual real-world effects will depend heavily on the application used. For most applications, you'd have to look hard to notice a difference. This may change as (if) more instructions become supported. Some say Crayon Physics seems a bit snappier, though.

Because of the way this is patched in, you aren't actually reaching the full speed possible with a hardware FPU. We haven't really found a good way to patch the context switching code yet, so interrupts are disabled during FPU instructions and an extra jump into patch code is required. Disabling interrupts requires KMODE, so devices that are not already running in "ALLKMODE" are patched to run that way.

What we really need is for Microsoft, Samsung, Toshiba, HTC, etc. to simply enable FPU support in their kernel builds. That would solve pretty much all the issues, and be quite a bit faster too. We know CE6 supports it, and it is rumored WM7 will as well, but it would be great if they would put support into new 6.5 builds :)

Instructions

  • Unpack the zip file somewhere
  • Copy the EXE and the DLL to \ on your device
  • Run the EXE
  • Click the Patch button
  • If the Patch app says so, close it, restart it, and click Patch again
  • Wait until it says "Done!"
  • Keep the EXE running. Closing the EXE will "unpatch" the FPU instructions again.

Credits

  • Chainfire - Patcher code
  • NuShrike - FPU code
  • cmonex - Help with patch theory
  • no2chem - Help with patch theory

Devices

  • Samsung S3C6410:
      • Samsung Omnia II - main test device
      • Samsung Omnia Pro - untested
      • Acer M900 - tested and works
      • Acer F900 - untested
      • Acer X960 - untested
  • Qualcomm Snapdragon:
      • Toshiba TG01 - quick test done by cmonex
      • HTC Leo - untested


Patcher notes

  • DirtyBench is not really (or "really not") a reliable benchmark and does not benchmark all functions - hence the name.
  • Instructions are benchmarked and then selected for patching or not. S3C6410 users will likely see about 17 functions patched and 13 functions unpatched (those are very simple functions). The greatest effect will be seen in MUL and DIV instructions.
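The "benchmark, then decide" step could look something like the sketch below. This is not the patcher's actual logic; `select_for_patching`, its `min_speedup` parameter and the sample timings are all invented for illustration. The point is simply that a routine only gets patched if the FPU version measures faster, which very simple routines may not, given the extra jump and interrupt handling.

```python
def select_for_patching(timings, min_speedup=1.0):
    """Split routines into (patched, unpatched) based on measured timings.

    timings maps a routine name to (emulated_seconds, fpu_seconds).
    A routine is patched only when emulated / fpu exceeds min_speedup.
    """
    patched, unpatched = [], []
    for name, (emulated, fpu) in timings.items():
        if fpu > 0 and emulated / fpu > min_speedup:
            patched.append(name)
        else:
            unpatched.append(name)
    return patched, unpatched


# Made-up numbers purely for illustration; not measurements from any device.
example_timings = {
    "__muls": (4.0, 1.0),                 # 4x faster on the FPU: patch it
    "__divd": (9.0, 1.5),                 # 6x faster on the FPU: patch it
    "hypothetical_simple_op": (0.5, 0.6),  # overhead outweighs the gain: skip
}
```

Calling `select_for_patching(example_timings)` puts the first two names in the patched list and leaves the third unpatched.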

Currently patched functions

  • __eqs
  • __ges
  • __gts
  • __les
  • __lts
  • __eqd
  • __ged
  • __gtd
  • __led
  • __ltd
  • __adds
  • __subs
  • __muls
  • __divs
  • __addd
  • __subd
  • __muld
  • __divd
  • __itos
  • __itod
  • __utos
  • __utod
  • __stoi
  • __stou
  • __stod
  • __dtoi
  • __dtou
  • __dtos

Other functions may follow in the future.
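For readers wondering what the names mean, they appear to follow a simple pattern: an operation (or a from/to pair for conversions) followed by a type letter, with `s` for single precision, `d` for double, `i` for signed integer and `u` for unsigned integer. That reading is an assumption based on the names themselves, not something spelled out in this post. A small decoder under that assumption:

```python
OPS = {
    "add": "add", "sub": "subtract", "mul": "multiply", "div": "divide",
    "eq": "equal", "ge": "greater-or-equal", "gt": "greater-than",
    "le": "less-or-equal", "lt": "less-than",
}
TYPES = {
    "s": "single (float)",
    "d": "double",
    "i": "signed int",
    "u": "unsigned int",
}


def describe(name):
    """Turn a helper name such as '__itos' into a readable description."""
    body = name.lstrip("_")
    # Conversions look like <from>to<to>, for example itos or dtou.
    if (len(body) == 4 and body[1:3] == "to"
            and body[0] in TYPES and body[3] in TYPES):
        return f"convert {TYPES[body[0]]} to {TYPES[body[3]]}"
    # Everything else is <operation><s|d>, for example adds or ltd.
    op, kind = body[:-1], body[-1]
    if op in OPS and kind in ("s", "d"):
        return f"{OPS[op]} {TYPES[kind]}"
    raise ValueError(f"unrecognised helper name: {name}")
```

So `describe("__muld")` reads as a double-precision multiply and `describe("__utos")` as an unsigned-integer-to-single conversion.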

Known issues

  • None at the moment

Feel free to report issues. I'm not promising we will fix them, but they're interesting to know about anyway.

Changelog 0.70

  • Separate KMODE and COREDLL patches
  • Use FPUEnabler.dll instead of op_fpu.dll
  • Unpatch FPU calls on exit
  • Fix leak on exit
  • __eqs, __eqd, __ges, __ged, __gts, __gtd, __les, __led, __lts, __ltd, __utos, __utod, __stou, __dtou added

Remember this is proof-of-concept code, it may not actually be very useful, and may have adverse effects. It's "because we can" code. New tricks were learned!

FPUEnabler_0.70.zip

Edited by Chainfire

I ran some tests with the CorePlayer benchmark and didn't observe any speed increase. Is there any real demonstration that shows why this is good? :)


Various benchmarks we have run show results a few percent faster, probably mostly unnoticeable to the naked eye. But that isn't the point. The point is that our devices can be faster with a very simple change to the kernel (from MS' viewpoint), pretty much a 3-liner, and there isn't any good reason it hasn't been made. True FPU support would make a much bigger difference than a patch like this ever can. Also keep in mind that right now we only support a small number of instructions; many are still missing. Before we can see real effects (even with the patch) we'll have to implement more functions. And even then, it depends on what the bottleneck is in an app. We originally started on this for the GL 1.x layer, because it makes very heavy use of floats. How many floats CorePlayer uses, who knows.

Hence the:

Remember this is proof-of-concept code, it may not actually be very useful, and may have adverse effects. It's "because we can" code. New tricks were learned!
Edited by Chainfire


It seems that other side effects include crashes of the Cube and the volume rocker popup after suspend/resume.


Tried it on the Acer M900

And it really pushes floating point through the roof, leaving others behind :P ...

fpu.jpg

Great work guys. Great to see great developments :D

(hope the crashing issue in m900 can be easily fixed :) )


A million thanks to Chainfire, NuShrike, cmonex and no2chem. Your contributions are greatly appreciated. I might not be using this anytime soon, but I sure am happy to know that there are people spending their precious time on enhancements like this.

Once again, thank you. I greatly appreciate all the hard work you guys have put in.... :-)

Edited by mechcool


GinKage, I have been able to replicate the rocker crash only once.. can you elaborate on exactly what happens? I cannot replicate the cube or screen lock crash on my device either.

Nice stats daskalos :) We're still looking into a fix for the M900 specific issue... it may actually solve these crash issues as well.

I ran some tests with the CorePlayer benchmark and didn't observe any speed increase. Is there any real demonstration that shows why this is good? :)
Because most good programs have learned to avoid SLOW floating-point math, you probably won't see much difference in coreplayer or other high-performance programs.

However, programs such as 3D applications can't avoid needing fractional numbers or complex math (matrices, sqrt), and this is where the big boost will be. An example is the Qt4 Framework, whose advanced graphics paths are entirely floating-point based, so the Windows Mobile port will be much faster now.
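For anyone wondering how programs "avoid" floating-point math in the first place, the usual trick is fixed-point arithmetic: fractions are stored as scaled integers so only integer instructions are needed. The sketch below uses a 16.16 layout (16 fractional bits), which is a common choice but an assumption here; the post does not say which format any particular program uses, and the helper names are made up.

```python
FRAC_BITS = 16          # 16.16 fixed point: 16 integer bits, 16 fraction bits
ONE = 1 << FRAC_BITS    # the fixed-point representation of 1.0


def to_fixed(x):
    # Scale a float into an integer with 16 fractional bits.
    return int(round(x * ONE))


def to_float(f):
    # Convert a fixed-point integer back to a float for display.
    return f / ONE


def fixed_mul(a, b):
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


def fixed_div(a, b):
    # Pre-scale the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b
```

For example, `to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25)))` gives 3.375 using nothing but integer multiply and shift. The cost is limited range and precision, which is why matrix-heavy 3D code and similar workloads still end up needing real floats.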

Edited by NuShrike


Alright! It certainly looks like we have located and fixed the suspend/resume problem apparent in some programs on the Omnia II as well as the M900 in general. Test version works, now we just need to fix it up for release (and have you guys test it). GinKage, hope to see you on IRC tomorrow so you can do some last tests as well.

For now, nap time!

Omnia II as well as the M900 in general.
This includes any device running the S3C6410, which covers pretty much all the recent Samsung-CPU-based Acers such as the F900, X960, etc.


Hats off to all you guys making these enhancements and discovering hidden potential in our devices!

Keep up the good work. If you guys need testers, just let us know; I will be one of those queuing up. :)

One quick question: will this patch benefit 3D games? Right now I have Ferrari GT Evolution installed on my O2.


A fix already? :) That was fast, I thought releasing the fix would take another day or two :D

Testing and running it right now. Yup, crashing after suspend/resume is fixed :P I'll keep this running for a while to monitor things...

I notice that upon suspend/resume, the app suspends and resumes too... About closing this: does it still need a soft reset, or does it exit by just tapping "ok"?


Well the fix could have been better, actually, but that is for the future. The app detects suspend/resume, because it has to run some code before the phone suspends and then after resume to fix the problem. If you exit the app, FPU will be disabled and "old/slow" calculations will be put back. But KMODE will still be enabled until you soft-reset.
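To keep the different states straight, here is a tiny model of the lifecycle described above. It only tracks two flags and a log; what the suspend and resume fix-up code actually does is not described in this thread, so those steps are left as opaque log entries. The `PatcherState` class and its method names are hypothetical.

```python
class PatcherState:
    """Toy model of the FPU Enabler lifecycle as described in this thread."""

    def __init__(self):
        self.kmode_enabled = False
        self.fpu_patched = False
        self.log = []

    def patch(self):
        # Patching switches the device to KMODE and installs the FPU routines.
        self.kmode_enabled = True
        self.fpu_patched = True
        self.log.append("patch")

    def on_suspend(self):
        # The app runs some fix-up code before the phone suspends.
        if self.fpu_patched:
            self.log.append("pre-suspend fix-up")

    def on_resume(self):
        # ...and again after resume. Details are not given in the thread.
        if self.fpu_patched:
            self.log.append("post-resume fix-up")

    def exit_app(self):
        # Exiting puts the old emulation routines back, but KMODE stays on.
        self.fpu_patched = False
        self.log.append("unpatch")

    def soft_reset(self):
        # Only a soft reset returns the device to its original mode.
        self.kmode_enabled = False
        self.fpu_patched = False
        self.log.append("soft reset")
```

After `patch()` followed by `exit_app()`, `fpu_patched` is `False` while `kmode_enabled` is still `True`, which is the situation described above until the device is soft-reset.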


Oh wow :) I read the first post from Chainfire and it felt like reading something that someone translated from Greek to Chinese and then to English using the first version of Google Translate :P

I just wanted to say respect, guys. I don't have an Omnia II yet, but it's definitely a good candidate for the future.. and I follow the development out of sheer respect for you guys! So awesome to see you master coding to such extreme levels!

Really RESPECT!!

Figured you guys could use some positive feedback for your great job :D

Oh wow :) I read the first post from Chainfire and it felt like reading something that someone translated from Greek to Chinese and then to English using the first version of Google Translate :P

I feel the same! :D

Is there a relation between this FPU Enabler and your research into OpenGL?

Is there a way for those two projects to fix 3D in SPB MS3.5?

(Could you please try to explain it for dummies?) :D

I feel the same! :P

Is there a relation between this FPU Enabler and your research into OpenGL?

Is there a way for those two projects to fix 3D in SPB MS3.5?

(Could you please try to explain it for dummies?) :D

In my opinion, these guys would not waste their time and effort developing something that will not benefit the devices in the end.

The recent projects do have connections with each other, though I can't explain them from an expert's point of view. :)

Try to read the OpenGL development thread thoroughly and you will realize that if it succeeds, it will not only fix SPB 3.5's 3D but also fix compatibility and performance issues with other 3D apps and games.

As I see it, they are trying their best to explain things in a way that is comprehensible to our non-developer minds, so it's up to us to make the effort to look up terms we may not understand (through Google or other means).

They are doing us all a great favor without asking anything in return... So it is we who have to adjust to things, not them :D


Yeah, they are doing a great job indeed! However, what is the point of having 3D in SPB 3.5? I've seen it on the HTC TP2 and it's no use at all. If it's related to any other useful 3D app that's OK, but if it's only for the SPB 3D carousel view there's no point in developing it, as it's useless.. eye candy only :D I hope they manage to raise the number of triangles per second so the phone is faster; other than that, this phone is phenomenal. However, if they manage to get ROM development to the point where ROMs could be adapted to the Omnia Pro with minimal effort.. I'd be in serious doubt about which phone to buy.

It's a great phone and a lot of promising devs are looking into it right now, I guess, so I think there is a bright future for this phone :) Hope you guys are having fun developing :D :P


For the last few replies, see my post (in a few minutes) in the GL thread. FPU can indeed improve GL speed, as GL uses a lot of float calculations, which is exactly what the FPU helps with. How much difference in the real world? Don't know.


Seems to be working on an F900. Sktools reports the following:

With FPU enabled:

Integer: 342.3579

Floating Point: 39.475

With FPU disabled:

Integer: 341.8036

Floating Point: 7.444

Big difference but I don't see much in actual use right at the moment. I'm sure that will change though. :)


Samsung Omnia Pro B7610

Before:

Integer: 514.3381

Floating Point: 10.915

After:

Integer: 512.8265

Floating Point: 58.243
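To put the two sets of numbers posted in this thread side by side, the snippet below works out the ratio between the patched and unpatched floating-point scores. It assumes that a higher score is better and that "Before"/"After" correspond to FPU disabled/enabled; both are my reading of the posts, not something stated explicitly.

```python
# Floating-point scores as reported in this thread (decimal points used).
results = {
    "Acer F900": {"disabled": 7.444, "enabled": 39.475},
    "Samsung Omnia Pro B7610": {"disabled": 10.915, "enabled": 58.243},
}


def speedup(scores):
    # Ratio of the patched score to the unpatched score.
    return scores["enabled"] / scores["disabled"]


for device, scores in results.items():
    print(f"{device}: {speedup(scores):.2f}x floating-point score")
```

Both devices come out at a little over 5x on this synthetic floating-point test, while the integer scores barely move, which fits the earlier point that only float-heavy code benefits.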
