[mythtv] OSD issue?

Fri Feb 13 16:23:19 EST 2004

]> I spent a few hours testing various ways of calculating this in Python.
]> The best I can get it to work is within +/- 1 of the values in pow_lut.
]> All but about 200 of the possible pairs of input values produce exactly
]> the same result.  I can maybe get it a little better, but I don't think
]> perfect is possible, short of doing the division in SSE.  Think this is
]> worth further pursuit, or should I get back to something else? :)

I didn't see the original message here, but division in SSE isn't so bad
if you use the Newton-Raphson method to generate an inverse and then
multiply. Here's a bit of code from my vector library that you are free to
use. The "+=", "-" and "*" operators are implemented as you might expect.
This used to be raw assembly before gcc 3.3.2, so it shouldn't be hard to
go back if need be. These are all floating point, obviously, but the quick
conversions back and forth are single-cycle operations whose latency can
be hidden. (This is written out with the explicit assignment to t0 and the
like to fool gcc into generating good SIMD code.)

  inline v4_float32c invert() const {
    // Newton-Raphson method...
    // Y0 = rcpps(X)
    // Y1 = (Y0+Y0)-X*Y0*Y0
    // (alt Y1=Y0*(2-X*Y0))
    v4_float32c y0=invertInexact();     // rcpps(X) -> y0
    v4_float32c t0=y0;                  // Y0       -> t0
    v4_float32c t1=y0*y0;               // Y0*Y0    -> t1 (hopefully y0 register)
    t0+=t0;                             // t0+t0 (t0=Y0) -> t0=(Y0+Y0)
    t1*=data;                           // X*Y0*Y0  -> t1=X*Y0*Y0
    return t0-t1;                       // Y1 = (Y0+Y0)-X*Y0*Y0
  }

  /** Returns an inexact inverse: just a hardware table lookup (roughly 12
   *  bits of precision), enough for lighting functions but not much else.
   *  Very, very fast.
   * @return _mm_rcp_ps(*this)
   */
  inline v4_float32c invertInexact() const {
    return v4_float32c(_mm_rcp_ps(data));
  }