Did you know? Toolchain stuff

#1 Post by **technosaurus** » Fri 03 Feb 2017, 17:53

I'm starting this thread for random programming-related things that don't really fit anywhere else but may be interesting. So here goes the first one.

Did you know that it is possible (with gcc, not clang) to compile 32-bit x86 binaries that get several of the benefits of x86_64, such as passing arguments in registers and doing floating-point math in SSE registers, without using the problematic x32 ABI?

Note: if you link against any libraries, they must have been compiled with the same flags, because several of these options change the calling convention (ABI).

-Os -m32 -mlong-double-64 -mfpmath=sse -mregparm=3 -msseregparm -fomit-frame-pointer -mrtd -freg-struct-return -mpush-args -mno-accumulate-outgoing-args
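Because these flags have to be applied consistently, it can help to let the preprocessor confirm they actually took effect. This is a minimal sketch (not from the original post) that relies on GCC's predefined `__i386__` and `__SIZEOF_LONG_DOUBLE__` macros and on C99's `FLT_EVAL_METHOD` from `<float.h>`; the helper names `build_long_double_bytes` and `build_flt_eval_method` are hypothetical.

```c
#include <float.h>

/* Compile-time checks that only apply to 32-bit x86 builds
   (__i386__ is predefined by GCC and clang for 32-bit x86 targets). */
#if defined(__i386__)
#  if __SIZEOF_LONG_DOUBLE__ != 8
#    error "long double is not 64 bits: was -mlong-double-64 applied?"
#  endif
#  if FLT_EVAL_METHOD != 0
#    error "floating point is not evaluated in SSE registers: check -mfpmath=sse and -march"
#  endif
#endif

/* Hypothetical helpers that report the configuration at run time. */
int build_long_double_bytes(void)
{
    return (int)sizeof(long double);
}

/* 0 = each operation is evaluated in its own type (SSE math),
   2 = evaluated in long double precision (x87 excess precision). */
int build_flt_eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

If this file is compiled with the same CFLAGS as the rest of the project, a plain `-m32` build should stop at the first `#error` (long double is 12 bytes there), while native x86_64 builds skip the checks entirely.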

Here is a simple example from gcc.godbolt.org showing the improvements (mostly for floating point):

```c
float squaref(float n){return n * n;}
double square(double n){return n * n;}
long double squarel(long double n){return n*n;}
int squarei(int a){return a*a;}
long long squareil(long x){return (long long)x*x;}
```

Output with -std=c99 -x c -Os -m32 -mlong-double-64 -mfpmath=sse -mregparm=3 -msseregparm -fomit-frame-pointer (some options omitted for brevity):

```asm
squaref:
        mulss   %xmm0, %xmm0
        ret
square:
        mulsd   %xmm0, %xmm0
        ret
squarel:
        mulsd   %xmm0, %xmm0
        ret
squarei:
        imull   %eax, %eax
        ret
squareil:
        imull   %eax
        ret
```

Output with just -std=c99 -x c -Os -m32:

```asm
squaref:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %eax
        flds    8(%ebp)
        fmul    %st(0), %st
        fstps   -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
square:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        fldl    8(%ebp)
        fmul    %st(0), %st
        fstpl   -8(%ebp)
        fldl    -8(%ebp)
        leave
        ret
squarel:
        pushl   %ebp
        movl    %esp, %ebp
        fldt    8(%ebp)
        popl    %ebp
        fmul    %st(0), %st
        ret
squarei:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        popl    %ebp
        imull   %eax, %eax
        ret
squareil:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        popl    %ebp
        imull   %eax
        ret
```

Breakdown:

-Os -m32 -march=pentium-m -mtune=generic :
compile for size on 32-bit x86, using the instructions available on a Pentium M (which includes SSE2), but tune for a generic CPU.
Replace -march=pentium-m with -march=pentium3m to avoid SSE2 instructions. The Pentium III M only has SSE, which handles single precision only, so any code using double or long double instead of float will be suboptimal (you may want to use -mfpmath=both, or a nasty hack like -Ddouble=float).
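The `-Ddouble=float` hack is nasty because the macro also rewrites every `double` in system headers such as `<math.h>`, so those declarations no longer match the library you link against. A less invasive alternative (a suggestion, not from the original post) is a project-local floating-point type switched by your own macro. `real_t`, `REAL_IS_FLOAT`, `R()` and `lerp_real` are hypothetical names.

```c
/* Hypothetical project-wide switch: build with -DREAL_IS_FLOAT to use
   single precision everywhere without redefining the keyword `double`. */
#ifdef REAL_IS_FLOAT
typedef float real_t;
/* Pastes an 'f' suffix: R(0.5) -> 0.5f. Only use it on floating literals
   such as 0.5 or 1e3; R(1) would paste to the invalid token 1f. */
#  define R(lit) lit##f
#else
typedef double real_t;
#  define R(lit) lit
#endif

/* Linear interpolation written once, in terms of real_t. */
real_t lerp_real(real_t a, real_t b, real_t t)
{
    return a + (b - a) * t;
}
```

Only code that opts into `real_t` changes precision; headers and libraries keep their normal `double` prototypes.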

-mregparm=3 -msseregparm -mno-fp-ret-in-387 -mfpmath=sse -freg-struct-return :
pass up to 3 integer arguments in registers (eax, edx, ecx) and up to 3 floating-point arguments in SSE registers (xmm*), don't return floating-point values on the x87 stack, do floating-point math with SSE instructions, and return small structs in registers instead of memory
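GCC also exposes these conventions per function through its x86 function attributes `regparm(N)` and `sseregparm`, which could be handy for trying the convention on a few internal functions before switching the whole build. This is a sketch under stated assumptions: the `FAST_ABI` macro and the function names are hypothetical, the guard assumes GCC (not clang) on 32-bit x86 with SSE2 (GCC refuses `sseregparm` calls when SSE/SSE2 is disabled), and the attribute should be on the prototype in the header as well as the definition so every caller uses the same convention.

```c
/* Apply the -mregparm=3 -msseregparm convention to selected functions only.
   Restricted to GCC on 32-bit x86 with SSE2, matching the flags above. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__i386__) && defined(__SSE2__)
#  define FAST_ABI __attribute__((regparm(3), sseregparm))
#else
/* Expands to nothing elsewhere; x86_64 already passes arguments in registers. */
#  define FAST_ABI
#endif

/* Prototypes, as they would appear in a header, carry the attribute... */
FAST_ABI float  squaref_fast(float n);
FAST_ABI double square_fast(double n);
FAST_ABI int    add3_fast(int a, int b, int c);

/* ...and so do the definitions. */
FAST_ABI float  squaref_fast(float n)          { return n * n; }
FAST_ABI double square_fast(double n)          { return n * n; }
FAST_ABI int    add3_fast(int a, int b, int c) { return a + b + c; }
```

Unlike the global flags, this leaves calls into unmodified libraries alone, since only functions marked `FAST_ABI` change convention.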

-mpush-args -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2 -fomit-frame-pointer -mrtd -mskip-rax-setup :
avoid legacy (usually unnecessary) prologue/epilogue stack manipulation, which decreases code size (-mskip-rax-setup only affects x86-64 code, so it does nothing with -m32)
(Note: all called functions must have a prototype.)
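The prototype requirement comes from `-mrtd`: per the GCC manual, functions with a fixed number of arguments pop their own arguments on return (`ret N`), so every caller must agree on the exact parameter list, and variadic functions such as `printf` must be called through a prototype. A small sketch of what that means for declarations; the function names are illustrative only.

```c
#include <stdarg.h>

/* Full prototypes: with -mrtd the callee cleans up the stack, so the
   caller must agree with it on the exact number and size of arguments. */
int answer(void);                 /* `int answer();` would NOT be a prototype in C */
int sum3(int a, int b, int c);    /* fixed arguments: callee pops them with -mrtd */
int sum_n(int count, ...);        /* variadic: the caller still pops these */

int answer(void) { return 42; }

int sum3(int a, int b, int c) { return a + b + c; }

/* Adds `count` int arguments. */
int sum_n(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

To catch old-style declarations like `int answer();` early, `-Wstrict-prototypes` and `-Werror=implicit-function-declaration` are worth adding alongside `-mrtd`; the GCC manual also notes that the `cdecl` function attribute overrides `-mrtd` for an individual function.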

You may also want to add these to your CFLAGS:
-flto OR -ffunction-sections -fdata-sections (with -Wl,--gc-sections in LDFLAGS) :
these get rid of a lot of unused junk (unreferenced functions and data).

Unless doing a debug build I also use:

-g0 -fno-unwind-tables -fno-asynchronous-unwind-tables -feliminate-dwarf2-dups -fno-dwarf2-cfi-asm :
don't emit DWARF debugging and unwind information that release builds usually don't need

-fno-ident :
don't emit compiler info

-fmerge-all-constants :
duplicate constants are stored in one place (IIRC this can be problematic with loadable modules, though)
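Per the GCC manual, `-fmerge-all-constants` goes further than the default `-fmerge-constants`: it also merges identical constant variables and arrays, even though C requires distinct objects to have distinct addresses, so the result is non-conforming. A minimal sketch of code that relies on that guarantee (hypothetical names):

```c
/* Two distinct objects that happen to have identical contents. */
static const int table_a[4] = { 1, 2, 3, 4 };
static const int table_b[4] = { 1, 2, 3, 4 };

/* Standard C guarantees distinct objects have distinct addresses, so this
   should return 0. With -fmerge-all-constants GCC is allowed to fold the two
   arrays into one, so code relying on that guarantee may break. */
int tables_share_storage(void)
{
    /* volatile pointers keep the compiler from folding the comparison */
    const int *volatile a = table_a;
    const int *volatile b = table_b;
    return a == b;
}
```

Code that uses the address of a constant table as an identity key (for example, to tell two lookup tables apart) is the kind of thing to audit before enabling the flag.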

-fweb :
gives independent uses of a variable their own pseudo-registers, which sometimes helps the optimization of larger programs

-ffast-math -fshort-double -fsingle-precision-constant :
like -mlong-double-64, these reduce code size at the cost of floating-point precision and "standards" compliance. Probably OK for an mp3 decoder, but not for calculating ballistic missile trajectories.
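To get a feel for the precision trade-off, here is what `-fsingle-precision-constant` effectively does to an unsuffixed literal, written out explicitly so it behaves the same without the flag. The function names are hypothetical.

```c
/* Multiply by the literal 0.1 as C normally treats it: a double constant. */
double scale_by_tenth(double x) { return x * 0.1; }

/* The same expression with a float constant, which is what
   -fsingle-precision-constant turns the unsuffixed 0.1 into. */
double scale_by_tenth_sp(double x) { return x * 0.1f; }

/* Absolute difference between the two, without pulling in <math.h>. */
double tenth_error(double x)
{
    double d = scale_by_tenth(x) - scale_by_tenth_sp(x);
    return d < 0 ? -d : d;
}
```

For x = 1.0 the two results differ by roughly 1.5e-9, a relative error of about 1.5e-8, because 0.1f is stored as 0.100000001490116...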

For C++ (CXXFLAGS):
-fno-exceptions -fno-rtti -fvtable-gc :
don't use exceptions or run-time type information, and remove unused virtual method tables

Note: with -msse (but not -msse2), you should avoid math operations on doubles; use float instead.
However, gcc seems to generate SSE2 instructions anyway.

See also:
https://gcc.gnu.org/onlinedocs/gcc/Opti ... tions.html
https://gcc.gnu.org/onlinedocs/gcc/Code ... tions.html
https://gcc.gnu.org/onlinedocs/gcc/Debu ... tions.html
AND
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
OR one of these
https://gcc.gnu.org/onlinedocs/gcc/Subm ... tions.html

Random footnote: -mlong-double-64 was added (for bionic libc) in gcc-4.8, in case anyone would like to patch it into the gcc 4.7 series (the last series written in C):
https://gcc.gnu.org/ml/gcc-patches/2012 ... 01512.html

(old)Puppy Linux Discussion Forum
