Elf file header

wibble
Posts: 76
Joined: Thu 11 Jul 2013, 03:48


#1 Post by wibble »

Bit obscure,

I am trying to write a scanner for compiled code. I know the ELF header functions in a similar way to the PE header.

Looking at the compiled file from a simple helloworld.c program I notice there is a lot of padding in the file. I would postulate that much of this padding can be removed so I can get nice compact binary files.

A side project: what I want to do is scan an executable, check its format and look for weirdness in the ELF, similar to a PE header check for bizarre sparseness etc.
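A minimal sketch of such a header check, in Python for brevity. The function name `describe_elf` and the particular oddities it flags are illustrative assumptions, not an existing tool; the field offsets are the standard ones from elf.h (16-byte e_ident, then e_type, e_machine, e_version).

```python
import struct

ELF_MAGIC = b"\x7fELF"
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}               # e_ident[EI_CLASS]
DATA_NAMES = {1: "little-endian", 2: "big-endian"}   # e_ident[EI_DATA]


def describe_elf(header: bytes) -> dict:
    """Decode the first 24 bytes of an ELF file and note obvious oddities."""
    if len(header) < 24 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    problems = []
    if ei_class not in CLASS_NAMES:
        problems.append(f"unknown EI_CLASS {ei_class}")
    if ei_data not in DATA_NAMES:
        problems.append(f"unknown EI_DATA {ei_data}")
    # Multi-byte fields follow the byte order declared in EI_DATA.
    endian = ">" if ei_data == 2 else "<"
    e_type, e_machine, e_version = struct.unpack_from(endian + "HHI", header, 16)
    if e_version != 1:
        problems.append(f"unexpected e_version {e_version}")
    return {
        "class": CLASS_NAMES.get(ei_class),
        "data": DATA_NAMES.get(ei_data),
        "e_type": e_type,
        "e_machine": e_machine,
        "problems": problems,
    }


# Synthetic 24-byte header: 32-bit, LSB, relocatable (e_type 1), Intel 80386 (e_machine 3).
sample = ELF_MAGIC + bytes([1, 1, 1, 0]) + bytes(8) + struct.pack("<HHI", 1, 3, 1)
info = describe_elf(sample)
print(info)
```

For a real file, the same function could be fed the first 24 bytes read in binary mode. A fuller scanner would go on to walk the program and section headers, which is where sparseness would show up.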

But how is it implemented in Puppy?

I am using the tweak hex editor to look at the files,

I found this article

http://www.linuxjournal.com/node/1060/print

readelf also is nice, been playing with that today.
Last edited by wibble on Tue 13 Aug 2013, 04:03, edited 1 time in total.

Ibidem
Posts: 549
Joined: Wed 26 May 2010, 03:31
Location: State of Jefferson

#2 Post by Ibidem »

Sounds like you want sstrip from
http://www.muppetlabs.com/~breadbox/sof ... ckers.html
or
https://github.com/BR903/ELFkickers

elf.h is the header that defines the ELF format.

wibble
Posts: 76
Joined: Thu 11 Jul 2013, 03:48

#3 Post by wibble »

Thanks that looks like what I need 8)

further update...

OK, so I spent a fun evening encoding the current ELF e_machine (EM_*) list.

I started to freak out thinking about where I was going to get code samples for 200+ device families. Feeling dejected, I thought: well, perhaps I can just tweak the ELF header and make the file look like it comes from a different machine family...

Loading up tweak, I spot the e_machine number (03 for Intel) and change it to 02. Let's make this a file from a SPARC...
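The same edit can be sketched in code instead of a hex editor. This is only an illustration: `patch_e_machine` is a made-up helper, and the header here is a synthetic 24-byte stand-in rather than the real goat.o. The e_machine field sits at offset 18, right after the 16-byte e_ident and the 2-byte e_type.

```python
import struct

E_MACHINE_OFFSET = 18  # 16 bytes of e_ident + 2 bytes of e_type


def patch_e_machine(image: bytes, new_machine: int) -> bytes:
    """Return a copy of an ELF image with only the e_machine field changed."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Write the value using the byte order the file itself declares in EI_DATA.
    fmt = ">H" if image[5] == 2 else "<H"
    patched = bytearray(image)
    struct.pack_into(fmt, patched, E_MACHINE_OFFSET, new_machine)
    return bytes(patched)


# Synthetic header: ELF32, LSB, relocatable, e_machine 3 (Intel 80386).
header = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8) + struct.pack("<HHI", 1, 3, 1)
as_sparc = patch_e_machine(header, 2)  # 2 is EM_SPARC
print(header[18:20].hex(), "->", as_sparc[18:20].hex())
```

Note that nothing else in the header changes, including the class and byte-order bytes in e_ident.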

Success, well partial. however I get this message..

Code:

tiny.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
sh-3.00# tweak goat.o
sh-3.00# file goat.o
goat.o: ELF 32-bit LSB relocatable, SPARC - invalid byte order, version 1 (SYSV), not stripped
sh-3.00# tweak goat.o
sh-3.00# file goat.o
goat.o: ELF 32-bit LSB relocatable, AT&T WE32100 - invalid byte order, version 1 (SYSV), not stripped

What's up with the invalid byte order?

Thinking things through, it was probably a bit newbish to assume I could fool the file program into accepting this as a legit file from another machine type when the other header fields probably didn't make sense for that machine. (For example, perhaps SPARC does not have 32 bits? Perhaps it's 64-bit, etc.?) I don't know...
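One plausible reading of the message: the file still declares itself LSB (little-endian) in e_ident, while SPARC is normally a big-endian architecture, so the two fields disagree. A rough sketch of that kind of cross-check follows; the `USUAL_DATA` table is a deliberately partial assumption covering only a few machines, and `byte_order_consistent` is a hypothetical helper, not how file itself is implemented.

```python
import struct

# Partial, assumed table: e_machine -> usual EI_DATA value (1 = LSB, 2 = MSB).
USUAL_DATA = {
    2: 2,   # EM_SPARC: big-endian
    3: 1,   # EM_386: little-endian
    62: 1,  # EM_X86_64: little-endian
}


def byte_order_consistent(header: bytes):
    """True/False if e_machine agrees with EI_DATA, None if the machine is not in the table."""
    ei_data = header[5]
    endian = ">" if ei_data == 2 else "<"
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    usual = USUAL_DATA.get(e_machine)
    if usual is None:
        return None
    return usual == ei_data


def make_header(ei_data: int, e_machine: int) -> bytes:
    """Build a synthetic 24-byte ELF32 relocatable header for testing."""
    endian = ">" if ei_data == 2 else "<"
    return b"\x7fELF" + bytes([1, ei_data, 1, 0]) + bytes(8) + struct.pack(endian + "HHI", 1, e_machine, 1)


print(byte_order_consistent(make_header(1, 3)))  # LSB + Intel 80386
print(byte_order_consistent(make_header(1, 2)))  # LSB + SPARC, the hand-edited case
```

So a convincing fake would need the byte-order byte (and every multi-byte field after it) changed too, not just e_machine.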

This presents a bit of a chore, as I now need to get the documentation for the 200-odd processor/chip families and figure out the ELF file setup for each.

:cry: just assembling the documentation will take ages...

update...

As I suspected.

Code:

sh-3.00# gcc -Wall tiny.c
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
7122 a.out

becomes...

Code:

sh-3.00# ld -s tiny.o
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
240 a.out

:lol:

And that is without messing too much with the file. OK, I lost portability with this method, but it's now about a 30th of the size of the original file! Hooray!

But once you start it's hard to stop. Optimization, I have come to find, is the programming equivalent of crack.

Looking at the program, there is still a fair bit of fat on this. It can be reduced further, I think: according to the tutorial it's possible to get it down to 45 bytes, or less than a 158th of the original size. Nice. Now I can see how embedded system programs can function in 4096 bytes of memory...
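The size ratios are easy to check from the byte counts quoted above (7122 from gcc, 240 from ld -s, 45 as the tutorial's target). A small sketch, with labels chosen here only for readability:

```python
ORIGINAL = 7122  # bytes, a.out straight from gcc

sizes = {
    "ld -s": 240,
    "tutorial target": 45,
}

# How many times smaller each version is than the gcc output.
ratios = {label: ORIGINAL / size for label, size in sizes.items()}

for label, ratio in ratios.items():
    print(f"{label}: {sizes[label]} bytes, about 1/{ratio:.1f} of the original")
```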


The program, as you can see, was very simple: it just returns the value 42 (the ultimate answer!) as its exit status.

time to mess with the program some more.

Excellent tutorial!

http://www.muppetlabs.com/~breadbox/sof ... eensy.html

The only thing I found was that the comment is already removed in my version of gcc (all good, no need to remove it like in the example).

The only problem I have so far is...

Code:

sh-3.00# gcc -s -nostdlib tiny.s
tiny.s: Assembler messages:
tiny.s:1: Error: no such instruction:

Hmmm... time to go further down the rabbit hole.

In fairness, I can understand why the C libraries are bloated like that: you need a lot of redundant functionality to provide the portability that's necessary for many things.

This is also only a trivial program, with a real application this would be a much more painful process for sure.

Code:

sh-3.00# nasm -f bin -o a.out tiny.asm
sh-3.00# chmod +x a.out
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
91 a.out

91 bytes after rolling my own elf header...
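To show where the 91 bytes go, here is a sketch that assembles an equivalent image by hand: a 52-byte ELF32 header, one 32-byte program header, and 7 bytes of code. The poster's tiny.asm is not shown, so the load address 0x08048000 and the exact instruction bytes are assumptions taken from the usual layout of this kind of tutorial. The script only builds the bytes in memory; it does not write or run anything.

```python
import struct

ORIGIN = 0x08048000          # assumed load address
EHDR_SIZE, PHDR_SIZE = 52, 32

# mov bl, 42 ; xor eax, eax ; inc eax ; int 0x80   (exit with status 42 on 32-bit x86 Linux)
code = bytes([0xB3, 0x2A, 0x31, 0xC0, 0x40, 0xCD, 0x80])
file_size = EHDR_SIZE + PHDR_SIZE + len(code)
entry = ORIGIN + EHDR_SIZE + PHDR_SIZE

e_ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)  # ELF32, LSB, version 1
ehdr = struct.pack(
    "<16sHHIIIIIHHHHHH",
    e_ident,
    2,          # e_type: ET_EXEC
    3,          # e_machine: EM_386
    1,          # e_version
    entry,      # e_entry
    EHDR_SIZE,  # e_phoff: program header follows the ELF header
    0,          # e_shoff: no section headers
    0,          # e_flags
    EHDR_SIZE,  # e_ehsize
    PHDR_SIZE,  # e_phentsize
    1,          # e_phnum
    0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
)
phdr = struct.pack(
    "<8I",
    1,          # p_type: PT_LOAD
    0,          # p_offset: map the file from its start
    ORIGIN,     # p_vaddr
    ORIGIN,     # p_paddr
    file_size,  # p_filesz
    file_size,  # p_memsz
    5,          # p_flags: read + execute
    0x1000,     # p_align
)

image = ehdr + phdr + code
print(len(ehdr), len(phdr), len(code), len(image))
```

Seen this way, 84 of the 91 bytes are headers, which is why the remaining savings have to come from overlapping code and header fields.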

Nice size reduction from the 240 bytes. No real magic yet...

Like everything else in life, the last mile is the hardest, I guess...

Oh yes, from this point on it was serious mangling of the file format. But surprisingly, the current implementation of the Linux kernel does not actually do a whole lot with the ELF data, so you can use the portions it does not care about to store bytes for other uses.

As I suspected from the beginning, the padding could be used to store program code!

But in all honesty, 80% of the work was in getting those last 46 bytes. It's just not worth it.

The thing is, getting to 91 bytes was not the massive slog that I thought it would be. With that kind of size optimization I would be well chuffed even with this.

That said it is extremely cool.

Code:

sh-3.00# nasm -f bin -o a.out tiny.asm
sh-3.00# chmod +x a.out
sh-3.00# ./a.out ; echo $?
42
sh-3.00# wc -c a.out
45 a.out

45 BYTE executable....

But then I got to thinking... the exit system call (reached through the int 0x80 interrupt) was pretty useful. I wonder what the other system calls do...

Look up the file

"/usr/include/asm/unistd.h" 5L, 82C

what the dickens...

Code:

# ifdef __i386__
#  include "unistd_32.h"
# else
#  include "unistd_64.h"
# endif

No big deal, looks like that tutorial was written before 64-bit x86 came along...

Anyway, even a simpleton like myself can see it's just a straight choice between the 64-bit and 32-bit headers. Let's go with the sane choice (as this is Puppy) and go for unistd_32.h.

Code:

#ifndef _ASM_X86_UNISTD_32_H
#define _ASM_X86_UNISTD_32_H

...

#define __NR_restart_syscall      0
#define __NR_exit                 1
#define __NR_fork                 2
#define __NR_read                 3
#define __NR_write                4
#define __NR_open                 5
#define __NR_close                6
#define __NR_waitpid              7
#define __NR_creat                8
#define __NR_link                 9
#define __NR_unlink              10
#define __NR_execve              11
#define __NR_chdir               12
#define __NR_time                13
#define __NR_mknod               14
#define __NR_chmod               15
#define __NR_lchown              16
#define __NR_break               17
#define __NR_oldstat             18
#define __NR_lseek               19
#define __NR_getpid              20
#define __NR_mount               21
#define __NR_umount              22
#define __NR_setuid              23
#define __NR_getuid              24
#define __NR_stime               25
#define __NR_ptrace              26
#define __NR_alarm               27
#define __NR_oldfstat            28
#define __NR_pause               29
#define __NR_utime               30
#define __NR_stty                31
#define __NR_gtty                32
#define __NR_access              33
#define __NR_nice                34
#define __NR_ftime               35
#define __NR_sync                36
#define __NR_kill                37
#define __NR_rename              38
#define __NR_mkdir               39
#define __NR_rmdir               40
#define __NR_dup                 41
#define __NR_pipe                42
...

Wait - that's not such a mind melt, these are system calls! Hmmmmmmm, this might not be so bad!
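Since the header is just a list of `#define __NR_name number` lines, it can be turned into a lookup table mechanically. A sketch, using a few lines copied from the listing above as inline sample text; `parse_syscalls` is a hypothetical helper, and it assumes every value is a plain decimal number, which may not hold for every architecture's header.

```python
import re

# A few lines copied from the unistd_32.h listing, used as sample input.
HEADER_TEXT = """\
#define __NR_restart_syscall      0
#define __NR_exit                 1
#define __NR_fork                 2
#define __NR_read                 3
#define __NR_write                4
"""

# Matches "#define __NR_<name> <decimal number>" on a line of its own.
NR_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)


def parse_syscalls(text: str) -> dict:
    """Map syscall name -> number for every simple __NR_ define in the text."""
    return {name: int(number) for name, number in NR_DEFINE.findall(text)}


table = parse_syscalls(HEADER_TEXT)
print(table)
```

Pointing the same function at the full contents of unistd_32.h would give the complete table for 32-bit x86.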

So here is a question: where can I find the system call lists for other systems? Perhaps I am being naive, but I would assume they follow a similar logic on, say, embedded devices etc.?

As a simple exercise I would like to write a script to enable cross-platform compilation of the code.

All it would need is the table of system call numbers: I pass in the system I want to run the code on, and it writes an asm file for me after checking what the system call number is on that particular platform.
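A rough sketch of that idea for just two targets. One caveat it makes visible: the number is not the only thing that changes per platform, since the trap instruction and argument registers differ too, so a bare number table is not quite enough. The `EXIT_RECIPES` table and `exit_asm` function are illustrative assumptions; the values shown are the exit conventions for 32-bit x86 (number 1 via int 0x80) and x86-64 Linux (number 60 via syscall), and the output is NASM-style text.

```python
# Per-platform recipe for "exit(status)": syscall number, registers and trap instruction.
EXIT_RECIPES = {
    "i386": {"bits": 32, "nr": 1, "nr_reg": "eax", "arg_reg": "ebx", "trap": "int 0x80"},
    "x86_64": {"bits": 64, "nr": 60, "nr_reg": "eax", "arg_reg": "edi", "trap": "syscall"},
}


def exit_asm(platform: str, status: int) -> str:
    """Return NASM source for a program that only calls exit(status)."""
    recipe = EXIT_RECIPES[platform]  # KeyError for platforms not in the table
    lines = [
        f"BITS {recipe['bits']}",
        "global _start",
        "_start:",
        f"    mov {recipe['nr_reg']}, {recipe['nr']}",
        f"    mov {recipe['arg_reg']}, {status}",
        f"    {recipe['trap']}",
    ]
    return "\n".join(lines) + "\n"


print(exit_asm("i386", 42))
print(exit_asm("x86_64", 42))
```

Other architectures (ARM, MIPS and so on) would need their own recipe entries with a different instruction set entirely, so in practice the script would grow a template per architecture rather than a single shared one.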

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#4 Post by technosaurus »

You may find my libc.h useful here
http://murga-linux.com/puppy/viewtopic.php?t=80916

It has examples of how to do syscalls using the __NR_ constants.

With the right compiler options plus strip/sstrip, it can build a statically linked ELF executable in <300 bytes.

I am currently splitting it into libc.c and libc.h, with the object file weighing in at <10kb. Not all functions are implemented, and those that are have been optimized only for size (hopefully being statically linked and small enough to fit entirely in cache helps with speed).
