A fast way of calculating an approximate square root on the PIII is to multiply the reciprocal square root of x by x:
SQRT(x) = x * RSQRT(x)
The instruction RSQRTSS or RSQRTPS gives the reciprocal square root with a precision of 12 bits. You can improve the precision to 23 bits by using the Newton-Raphson formula described in Intel's application note AP-803:
x0 = RSQRTSS(a)
x1 = 0.5 * x0 * (3 - (a * x0) * x0)
where x0 is the first approximation to the reciprocal square root of a, and x1 is a better approximation. The order of evaluation is important. You must apply this formula to the reciprocal square root first, and only then multiply by a to get the square root.
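
The refinement step can be sketched in C with SSE intrinsics. This is a minimal illustration, not code taken from AP-803: the function names rsqrt_estimate, rsqrt_refined and approx_sqrt are hypothetical, and the sketch assumes a compiler that provides <xmmintrin.h> and does not reorder floating-point operations (no fast-math style options).

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

/* Hypothetical helper: first approximation x0 (about 12 bits) via RSQRTSS. */
static float rsqrt_estimate(float a)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)));
}

/* Hypothetical helper: one Newton-Raphson step,
   x1 = 0.5 * x0 * (3 - (a * x0) * x0).
   The parentheses keep the evaluation order given in the formula. */
static float rsqrt_refined(float a)
{
    float x0 = rsqrt_estimate(a);
    return 0.5f * x0 * (3.0f - (a * x0) * x0);
}

/* Hypothetical helper: SQRT(a) = a * RSQRT(a), using the refined value. */
static float approx_sqrt(float a)
{
    return a * rsqrt_refined(a);
}
```

Calling approx_sqrt(2.0f) should give a value close to 1.41421. Note that a = 0 is not handled by this sketch: the reciprocal square root of 0 is infinite, so the final multiplication yields NaN rather than 0.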