Jules - is there a standard "iOS 3 vs iOS 4" preprocessor define in Juce? I looked but nothing stood out, and it could help with the audio latency issue discussed in this thread.
The reason I ask is that iOS 4 lets you use vDSP (part of the Accelerate framework), and I've found that on Intel hardware it can be 30%-200% faster than even hand-coded assembly. I've got benchmarks comparing vecLib with Eigen (http://eigen.tuxfamily.org/) for general matrix multiply, and vDSP with the hand-coded NEON assembler from FFmpeg/LibAV. Yes, believe it or not, vDSP is almost twice as fast as the hand-coded assembler... for the FFTs!
Where this may come up is the audio conversion of S1.15 integers to and from floating point. Since that conversion is called so often, it would be really nice to use vDSP_vflt16 (converts an array of signed 16-bit integers to single-precision floating-point values) and vDSP_vfix16/vDSP_vfixr16 (the reverse direction, truncating or rounding respectively) rather than the hand-coded multiplies by 32768 currently in the iOS source.
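A minimal sketch of what that swap could look like, in C++ since that's what Juce uses. Note that vDSP_vflt16 and vDSP_vfixr16 only convert between integer and float without any scaling, so a vDSP_vsmul by 1/32768 (or 32768) is still needed, and out-of-range floats should be clamped (here with vDSP_vclip) before going back to 16 bits. The USE_VDSP macro and the names s15ToFloat/floatToS15 are hypothetical, not anything in Juce; the scalar loops are the portable fallback and the reference the vDSP path should match.

```cpp
// Sketch: S1.15 (Q15) <-> float conversion for audio buffers.
// USE_VDSP is a hypothetical opt-in macro (not a Juce setting); without it
// the portable scalar loops are compiled.
#include <cassert>
#include <cstddef>

#if defined(USE_VDSP)
  #include <Accelerate/Accelerate.h>   // vDSP, iOS 4.0 and later
#endif

// int16 S1.15 -> float in [-1, 1)
inline void s15ToFloat(const short* src, float* dst, std::size_t n)
{
    assert(n == 0 || (src != nullptr && dst != nullptr));
#if defined(USE_VDSP)
    const float scale = 1.0f / 32768.0f;
    vDSP_vflt16(src, 1, dst, 1, (vDSP_Length) n);        // int -> float, no scaling
    vDSP_vsmul(dst, 1, &scale, dst, 1, (vDSP_Length) n); // scale in place
#else
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * (1.0f / 32768.0f);
#endif
}

// float -> int16 S1.15, clamped to the int16 range and rounded to nearest.
// 'scratch' must hold n floats; only the vDSP path uses it.
inline void floatToS15(const float* src, short* dst, float* scratch, std::size_t n)
{
    assert(n == 0 || (src != nullptr && dst != nullptr));
#if defined(USE_VDSP)
    const float scale = 32768.0f, lo = -32768.0f, hi = 32767.0f;
    vDSP_vsmul(src, 1, &scale, scratch, 1, (vDSP_Length) n);
    vDSP_vclip(scratch, 1, &lo, &hi, scratch, 1, (vDSP_Length) n);
    vDSP_vfixr16(scratch, 1, dst, 1, (vDSP_Length) n);  // round to nearest
#else
    (void) scratch;
    for (std::size_t i = 0; i < n; ++i)
    {
        float v = src[i] * 32768.0f;
        if (v > 32767.0f)  v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        // round half away from zero; exact .5 ties may differ from vDSP_vfixr16
        dst[i] = (short) (v < 0.0f ? v - 0.5f : v + 0.5f);
    }
#endif
}
```

Keeping both paths behind one function signature means a patch could switch implementations without touching the callers.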
In my (albeit limited) experience, gcc generates horrible ARM code, so using the vDSP library could save precious microseconds, especially in tight loops like the audio callbacks.
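In a callback the stride arguments are where vDSP can help most: with an input stride equal to the channel count, a single vDSP_vflt16 call pulls one channel out of an interleaved buffer and converts it. A sketch, assuming a 16-bit interleaved S1.15 input buffer and one float output array per channel; deinterleaveS15 and USE_VDSP are hypothetical names, not Juce's or Apple's.

```cpp
// Sketch: split an interleaved S1.15 buffer (L R L R ...) into
// per-channel float arrays in [-1, 1). Hypothetical helper.
#include <cassert>
#include <cstddef>

#if defined(USE_VDSP)
  #include <Accelerate/Accelerate.h>
#endif

inline void deinterleaveS15(const short* interleaved, float* const* channels,
                            int numChannels, std::size_t numFrames)
{
    assert(numChannels > 0);
    const float scale = 1.0f / 32768.0f;

    for (int ch = 0; ch < numChannels; ++ch)
    {
#if defined(USE_VDSP)
        // Input stride = numChannels reads every Nth sample starting at 'ch';
        // output stride 1 writes them contiguously.
        vDSP_vflt16(interleaved + ch, numChannels, channels[ch], 1, (vDSP_Length) numFrames);
        vDSP_vsmul(channels[ch], 1, &scale, channels[ch], 1, (vDSP_Length) numFrames);
#else
        for (std::size_t i = 0; i < numFrames; ++i)
            channels[ch][i] = interleaved[i * numChannels + ch] * scale;
#endif
    }
}
```

For stereo, call it with numChannels = 2 and a two-entry array of output pointers; the vDSP path then makes two conversion calls and two scaling calls per callback instead of a per-sample loop.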
Yes - I'm volunteering to develop and test patches, but first I need to know how Juce distinguishes iOS 3 from iOS 4...
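One way such a switch could be written, assuming Apple's <Availability.h> macros (__IPHONE_OS_VERSION_MIN_REQUIRED and __IPHONE_4_0); APP_HAS_VDSP and canUseVDSP are hypothetical names, not existing Juce defines:

```cpp
// Sketch: compile-time "can we rely on vDSP?" switch.
// APP_HAS_VDSP is a hypothetical macro name, not a Juce define.
#if defined(__APPLE__)
  #include <Availability.h>
#endif

#if defined(__IPHONE_OS_VERSION_MIN_REQUIRED) && defined(__IPHONE_4_0) \
    && __IPHONE_OS_VERSION_MIN_REQUIRED >= __IPHONE_4_0
  #define APP_HAS_VDSP 1   // deployment target is iOS 4.0 or later
#else
  #define APP_HAS_VDSP 0   // iOS 3.x target, or not building for iOS at all
#endif

constexpr bool canUseVDSP() { return APP_HAS_VDSP != 0; }
```

This only looks at the minimum deployment target. A build that still has to run on iOS 3.x devices would need a runtime check instead (for example weak-linking Accelerate and testing a vDSP symbol against NULL), which this sketch doesn't cover.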
