Puppy Linux Discussion Forum
unsorted C snippets for small/fast static apps
Page 4 of 4 [59 Posts]

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Sat 09 Jan 2016, 16:39

I recently removed most of the __builtin_*(...) wrappers because there is no standardized way to check for them (yet another thing to suggest to the C standards committee ... Clang's __has_builtin() would be a good standard). I do plan on putting them back in, but I wanted to have a fallback for unsupported browsers as well as older versions of compilers like gcc-4.2.1.

In case anyone else wants to do something similar, this is how to grok 90% of them.
1. Create a wrapper around the __builtin_*(...)
Code:
v4hi pmulhrw(v4hi a, v4hi b){return __builtin_ia32_pmulhrw(a,b);}

2. Compile it with -S to get the assembly output (or use gcc.godbolt.org)
Code:
pmulhrw:
        movdq2q %xmm1, %mm0
        movdq2q %xmm0, %mm1
        pmulhrw %mm0, %mm1
        movq2dq %mm1, %xmm0
        ret

3. Translate the relevant line(s) of the assembly into inline asm (it helps to know the platform's calling convention, so you can tell which lines are just there to move the input parameters and return values)
For this case it is really just:
Code:
         pmulhrw %mm0, %mm1
Which becomes this inline asm:
Code:

v4hi __not_builtin_pmulhrw(v4hi a, v4hi b){__asm("pmulhrw %1, %0":"+y"(a):"y"(b));return a;}

Note that the registers get replaced with %0 and %1, which are the operand numbers in order, and that instead of using "r" for a general-purpose register, I used "y" for an MMX register, according to https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Machine-Constraints.html
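
The "check for the builtin, otherwise fall back" idea from the top of this post can be sketched as follows. This is only an illustration: my_popcount and popcount_fallback are made-up names, and it assumes a compiler that either provides __has_builtin (Clang and newer GCC releases do) or simply takes the fallback path.

```c
/* If the compiler has no __has_builtin, pretend every builtin is missing. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Portable fallback: clear the lowest set bit until nothing is left. */
static unsigned popcount_fallback(unsigned x){
    unsigned n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

/* Wrapper that uses the builtin only when the compiler reports it. */
static unsigned my_popcount(unsigned x){
#if __has_builtin(__builtin_popcount)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount_fallback(x);
#endif
}
```

On a compiler that predates __has_builtin this always takes the slow path, even if the builtin exists, which is the safe direction to be wrong in.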

_________________
Web Programming - Pet Packaging 100 & 101

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Thu 28 Jan 2016, 05:59

I have had a few projects where I needed to share code between C and JavaScript. Rather than having to update 2 separate files or run a build process to generate both, I came up with some hacks to allow the code to be valid in both:

http://stackoverflow.com/a/35012334/1162141

Code:

/* C comment ends with the '/' on next line but js comment is open  *\
/ //BEGIN C Block
#define function int
/* This ends the original js comment, but we add an opening '/*' for C  */

/*Most compilers can build K&R style C with parameters like this:*/
function volume(x,y,z)/**\
/int x,y,z;/**/
{
  return x*y*z;
}

/**\
/
#undef function
#define var const char**
#define new (const char*[])
#define Array(...)  {__VA_ARGS__}
/**/

var cars = new Array("Ford", "Chevy", "Dodge");

/* Or a more readable version *\
/// BEGIN C Block
#undef var
#undef new
/* END C Block */


You can do something similar for Java by using the "??/" trigraph for the '\'
and setting up some macros and structs with function pointers as they did here.

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Sat 13 Feb 2016, 21:51

musl libc uses some funky #include + macro hackery to map enums to strings for strerror, and although it is pretty clever, it's not quite obvious what it is doing since the data is in a separate file, so here is the simplified version:

Code:
#define TAG_MAP { \
   _MAP(TAG_BODY,"body"), \
   _MAP(TAG_HEAD,"head"), \
   _MAP(TAG_HTML,"html"), \
   _MAP(TAG_UNKNOWN,"unknown"), \
}

#define _MAP(x,y) x
enum tags TAG_MAP;
#undef _MAP
#define _MAP(x,y) y
const char *tagstrings[] = TAG_MAP;
#undef _MAP
//usage: printf("%s\n",tagstrings[TAG_HTML]);


This could be extended to any amount of tabular data.
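
As a sketch of what "more tabular data" might look like, here is the same idea with a third column and the table passed its row macro as a parameter (the usual X-macro spelling). The TAG_BR row, the is_void column and tag_from_text() are invented for the illustration and are not part of the original snippet.

```c
#include <string.h>

/* Hypothetical table: enum name, tag text, "void element" flag. */
#define TAG_TABLE(X) \
    X(TAG_BODY,    "body",    0) \
    X(TAG_BR,      "br",      1) \
    X(TAG_HTML,    "html",    0) \
    X(TAG_UNKNOWN, "unknown", 0)

/* Each row macro picks one column out of a table row. */
#define AS_ENUM(id, text, is_void) id,
#define AS_TEXT(id, text, is_void) text,
#define AS_VOID(id, text, is_void) is_void,

enum tag { TAG_TABLE(AS_ENUM) TAG_COUNT };
static const char *const tag_text[] = { TAG_TABLE(AS_TEXT) };
static const unsigned char tag_is_void[] = { TAG_TABLE(AS_VOID) };

/* Reverse lookup: text -> enum, falling back to TAG_UNKNOWN. */
static enum tag tag_from_text(const char *s){
    for (int i = 0; i < TAG_COUNT; ++i)
        if (strcmp(s, tag_text[i]) == 0) return (enum tag)i;
    return TAG_UNKNOWN;
}
```

Adding a row to TAG_TABLE updates the enum and both arrays together, so they cannot drift out of sync.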

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Mon 14 Mar 2016, 04:15

... and some more macro hackery

this allows you to reduce multiple 3-line #ifdefs to a single line or even inline them in your functions

Code:
#define PASTE_(x,y) x##y
#define PASTE(x,y) PASTE_(x,y)
#define ENABLED(...) __VA_ARGS__
#define DISABLED(...)
#define NOT_DISABLED ENABLED
#define NOT_ENABLED DISABLED
#define IF_ENABLED(x,...) x(__VA_ARGS__)
#define IF_NOT_ENABLED(x,...) PASTE(NOT_,x)(__VA_ARGS__)
example
Code:

#define PNG_SUPPORT ENABLED
#define JPG_SUPPORT DISABLED
void init(void){
  IF_ENABLED(PNG_SUPPORT, init_png();)
  IF_ENABLED(JPG_SUPPORT, init_jpg();)
  return;
}

int main(void){
   puts("supported types:\n"
      IF_ENABLED(PNG_SUPPORT,     "\tpng supported\n")
      IF_ENABLED(JPG_SUPPORT,     "\tjpeg supported\n")
      IF_NOT_ENABLED(JPG_SUPPORT, IF_NOT_ENABLED(PNG_SUPPORT, "\tnone supported\n"))
   );
}


vs. the traditional way

Code:
#define PNG_SUPPORT
//#define JPG_SUPPORT  (left undefined to match "JPG_SUPPORT DISABLED" above)
void init(void){
#ifdef PNG_SUPPORT
  init_png();
#endif
#ifdef JPG_SUPPORT
  init_jpg();
#endif
  return;
}

int main(void){
  puts("supported types:\n"
#ifdef PNG_SUPPORT
    "\tpng supported\n"
#endif
#ifdef JPG_SUPPORT
    "\tjpeg supported\n"
#endif
#if !defined(JPG_SUPPORT) && !defined(PNG_SUPPORT)
    "\tnone supported\n"
#endif   
  );
}
It works for multiple commands as well:
Code:
IF_ENABLED(PNG_SUPPORT,
int *getRGBfromPNG(void *buf, void *return_data){
  //etc...
})

Ibidem
Joined: 25 May 2010, Posts: 553, Location: State of Jefferson
Posted: Sun 20 Mar 2016, 18:59

Well, I've been poking at bqc.
So far, I've implemented _socketcall() (looking at musl src/internal/syscall.h to figure out how) and almost all the socketcall wrappers.
I've also discovered a small (*cough*) problem.
With GCC 5.3.x (stock for Alpine Linux) on i386 and the standard flags (-nostdlib -nostartfiles), apparently the argc/argv initialization doesn't work; for example, if I run ./get google.com /index.html it thinks argc is 0.
(I hacked a debug line in to check that.)
Hardcoding the host/url seems to result in a 'working' binary.

Attaching a patch (git format-patch) to fix what I can figure out.
Attachment: socketcall.patch.gz (2.08 KB, downloaded 69 times): gzipped _socketcall() implementation, along with socketcall-based networking functions

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Tue 29 Mar 2016, 15:29

Ibidem wrote:
Well, I've been poking at bqc.
So far, I've implemented _socketcall() (looking at musl src/internal/syscall.h to figure out how) and almost all the socketcall wrappers.
I've also discovered a small (*cough*) problem.
With GCC 5.3.x (stock for Alpine Linux) on i386 and the standard flags (-nostdlib -nostartfiles), apparently the argc/argv initialization doesn't work; for example, if I run ./get google.com /index.html it thinks argc is 0.
(I hacked a debug line in to check that.)
Hardcoding the host/url seems to result in a 'working' binary.

Attaching a patch (git format-patch) to fix what I can figure out.

Thanks for the socketcall patch; it has been high on the todo list for a while. Is the patch public-domain/any-license like the rest of the code?

Feel free to submit issues and pull requests on github if you are comfortable with it... it's easier for me to keep track of.

As for the argc issue, I discovered (and documented?) that it needs optimization turned on, or gcc will screw up the stack pointer in _start() before it can be used for argc/argv... I tried an alternative method using a dummy struct parameter to _start(), but it suffered similar problems. I also tried to declare an array of long on the stack and set argc using its address -1 and argv -2, but optimizations screwed with that too. That part of the code is _really_ hard to make fully "portable" because in their infinite wisdom, "they" decided to always pass the _start() parameters on the stack instead of using the system's default calling convention, so you can't just do void _start(long argc, char **argv){...}, and neither gcc nor clang has a builtin way to get the stack pointer AFAIK (I looked).

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Tue 29 Mar 2016, 17:27

here is an update on my anti-ifdef macros to enable boolean logic

Code:
#define PASTE_(x,y) x##y
#define PASTE(x,y) PASTE_(x,y)
#define PASTE3_(x,y,z) x##y##z
#define PASTE3(x,y,z) PASTE3_(x,y,z)
#define Y(...) __VA_ARGS__
#define N(...)
#define IF(x) x //alternate method similar to IFNOT()

#define NOT_N Y
#define NOT_Y N
#define IF_NOT(x) PASTE(NOT_,x)
#define NOT(x) PASTE(NOT_,x)

#define N_OR_N N
#define N_OR_Y Y
#define Y_OR_N Y
#define Y_OR_Y Y
#define OR(x,y) PASTE3(x,_OR_,y)

#define N_AND_N N
#define N_AND_Y N
#define Y_AND_N N
#define Y_AND_Y Y
#define AND(x,y) PASTE3(x,_AND_,y)

#define N_XOR_N N
#define N_XOR_Y Y
#define Y_XOR_N Y
#define Y_XOR_Y N
#define XOR(x,y) PASTE3(x,_XOR_,y)

#define N_NOR_N Y
#define N_NOR_Y N
#define Y_NOR_N N
#define Y_NOR_Y N
#define NOR(x,y) PASTE3(x,_NOR_,y)

#define N_NAND_N Y
#define N_NAND_Y Y
#define Y_NAND_N Y
#define Y_NAND_Y N
#define NAND(x,y) PASTE3(x,_NAND_,y)

#define N_XNOR_N Y
#define N_XNOR_Y N
#define Y_XNOR_N N
#define Y_XNOR_Y Y
#define XNOR(x,y) PASTE3(x,_XNOR_,y)

#define IF2(x,y,z) PASTE3(x,y,z)

//HACK: #error requires its own line and _Pragma support is sketchy
#define ERROR(x) char PASTE(PASTE(ERROR_on_line__,__LINE__),PASTE(__XXX_,x))[-1];
NOTES:
This version uses Y and N instead of ENABLED and DISABLED, so it may have more naming conflicts.
Usage:
Code:
//in your config.h
#define FOO Y
#define BAR N
#define BAZ Y

//in your code
AND(FOO,BAR)(/*do stuff if both FOO and BAR are enabled*/)
//or
IF2(FOO,_AND_,BAR)( /*do stuff if both FOO and BAR are enabled*/ )

Note the parentheses instead of curly braces in both cases.
They can also be combined:
Code:
OR(BAZ,AND(FOO,BAR))(
  /*do stuff if both FOO and BAR are enabled or BAZ is enabled*/
)
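
For anyone who wants to see the logic macros produce something concrete, here is a small sketch that builds a string at compile time. The HAVE_* switches and feature_summary() are made up for the example, and only the subset of the macros it needs is repeated.

```c
#include <string.h>

#define PASTE_(x,y) x##y
#define PASTE(x,y) PASTE_(x,y)
#define PASTE3_(x,y,z) x##y##z
#define PASTE3(x,y,z) PASTE3_(x,y,z)
#define Y(...) __VA_ARGS__
#define N(...)
#define NOT_N Y
#define NOT_Y N
#define NOT(x) PASTE(NOT_,x)
#define N_OR_N N
#define N_OR_Y Y
#define Y_OR_N Y
#define Y_OR_Y Y
#define OR(x,y) PASTE3(x,_OR_,y)
#define N_AND_N N
#define N_AND_Y N
#define Y_AND_N N
#define Y_AND_Y Y
#define AND(x,y) PASTE3(x,_AND_,y)

/* Hypothetical feature switches, as they might appear in a config.h */
#define HAVE_PNG Y
#define HAVE_JPG N
#define HAVE_GIF Y

/* Adjacent string literals are concatenated, so every clause that
   expands to nothing simply drops out of the result. */
static const char *feature_summary(void){
    return ""
        AND(HAVE_PNG,HAVE_GIF)("png+gif ")
        OR(HAVE_JPG,NOT(HAVE_GIF))("jpg-or-no-gif ")
        NOT(HAVE_JPG)("no-jpg");
}
```

With the switches as set here, the middle clause expands to nothing and the other two survive. Because the macros are named Y and N, system headers should be included before they are defined, as done above.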

Ibidem
Joined: 25 May 2010, Posts: 553, Location: State of Jefferson
Posted: Mon 04 Apr 2016, 02:30

technosaurus wrote:
Ibidem wrote:
Well, I've been poking at bqc.
So far, I've implemented _socketcall() (looking at musl src/internal/syscall.h to figure out how) and almost all the socketcall wrappers.
I've also discovered a small (*cough*) problem.
With GCC 5.3.x (stock for Alpine Linux) on i386 and the standard flags (-nostdlib -nostartfiles), apparently the argc/argv initialization doesn't work; for example, if I run ./get google.com /index.html it thinks argc is 0.
(I hacked a debug line in to check that.)
Hardcoding the host/url seems to result in a 'working' binary.

Attaching a patch (git format-patch) to fix what I can figure out.

Thanks for the socketcall patch - it has been high on the todo list for a while, is the patch public-domain/any-license as the rest of the code?

Feel free to submit issues and pull requests on github if you are comfortable with it... its easier for me to keep track of.

PD/any-license, yes.

Unfortunately, github.com doesn't play nicely with Links, which I'm currently using pretty much exclusively.
Quote:
As for the argc issue, I discovered (and documented?) that it needs optimization turned on or gcc will screw up the stack pointer in _start() before it can be used for argc/argv...

Yes, but none of -Os -O[0-3] work in this case, with or without frame-pointers.
I do realize that this is not something that *can* be done portably, but I can't figure out any way.
Quote:
I tried an alternative method using a dummy struct parameter to _start(), but it suffered similar problems. I also tried to declare an array of long on the stack and set argc using its address -1 and argv -2, but optimizations screwed with that too. That part of the code is _really_ hard to make fully "portable" because in their infinite wisdom, "they" decided to always pass the _start() parameters on the stack instead of the system's default calling convention, so you can't just do void _start(long argc, char **argv){...} and neither gcc or clang have a builtin way to get the stack pointer AFAIK (I looked)


By the way: can you add 'ushort' to the typedefs?

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Fri 10 Jun 2016, 12:22

Here is a really stupid trick that I figured out just to see how much I could abuse the preprocessor.

Make a file with the contents:
Code:
#if __COUNTER__ < MAX_COUNT
#include __FILE__
__FILE__
#endif

Then name the file whatever text you would like to repeat (it could be one letter or a long string). You can make symlinks/hardlinks to this file for different strings.

Then in your C file you can do something like this:
Code:
static const char s[] =
#define MAX_COUNT 198 //don't exceed max inclusion depth
#include "0"
#undef MAX_COUNT
#define MAX_COUNT 256
#include "0"
#undef MAX_COUNT
;
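Now you have a string of roughly 256 '0's (255 by a strict count, because the #if that fails at the end of the first pass also consumes a __COUNTER__ value; set the second MAX_COUNT to 257 for exactly 256). If you named the file "fizzbuzz" and included it instead, the string would be 8x longer.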

A more appropriate use of this would be to unroll a loop MAX_COUNT times, but what fun is that - Bonus points to anyone who can implement a compile time fizzbuzz using this method.

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Sat 04 Mar 2017, 11:59

long division in binary
Code:
struct div_t{
  int quot;
  int rem;
};

//divisor must be nonzero; INT_MIN operands overflow when negated
//(rename if <stdlib.h> is included, it already declares div and div_t)
struct div_t div(int dividend, int divisor){
    _Bool dividend_is_negative = (dividend < 0),
        divisor_is_negative = (divisor < 0),
        result_is_negative = divisor_is_negative ^ dividend_is_negative;
    unsigned quotient = 0, shifted;
    int shift; //signed, so the clamp below can see a negative difference
    //if (divisor_is_negative) divisor = -divisor;
    divisor ^= -divisor_is_negative;
    divisor += divisor_is_negative;
    //if (dividend_is_negative) dividend = -dividend;
    dividend ^= -dividend_is_negative;
    dividend += dividend_is_negative;
    shifted = divisor;
    //shift divisor so its MSB is same as dividend's - minimize loops
    //if no builtin clz, then shift divisor until its >= dividend
    //such as: while (shifted<dividend) shifted<<=1;
    //(dividend|1) keeps clz defined when dividend is 0
    shift = __builtin_clz(divisor)-__builtin_clz(dividend|1);
    //clamp shift to 0 to avoid undefined behavior
    shift &= -(shift > 0);
    shifted<<=shift;
    do{
        unsigned tmp;
        quotient <<=1;
        tmp = (shifted <= (unsigned)dividend);
        quotient |= tmp;
        //if (tmp) dividend -= shifted;
        dividend -= shifted & -tmp;
        shifted >>=1;
    }while (shifted >= (unsigned)divisor);
    //if (result_is_negative) quotient=-quotient;
    quotient ^= -result_is_negative;
    quotient += result_is_negative;
    //the remainder takes the sign of the dividend, as with C's / and %
    //if (dividend_is_negative) dividend=-dividend;
    dividend ^= -dividend_is_negative;
    dividend += dividend_is_negative;
    return (struct div_t){quotient, dividend};
}
Since integer division is so slow, I wanted to see if bitops would/could be faster... it's pretty close but not useful for platforms with a native instruction ... maybe for jcore j1 from http://j-core.org
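
The comment in the code about lacking a clz builtin ("shift divisor until its >= dividend") can be sketched as a standalone unsigned routine. This is an illustrative variant and not the code above: udiv_shift_sub and its signature are invented, it handles unsigned operands only, and it assumes a nonzero divisor.

```c
#include <stdint.h>

/* Shift-and-subtract division without __builtin_clz; d must be nonzero. */
static uint32_t udiv_shift_sub(uint32_t n, uint32_t d, uint32_t *rem){
    uint32_t q = 0;
    uint64_t shifted = d;   /* 64-bit so the alignment loop cannot overflow */
    unsigned steps = 1;
    while (shifted < n) { shifted <<= 1; ++steps; }  /* align d with n */
    while (steps--) {
        uint32_t take = (uint32_t)(shifted <= n);    /* 1 if this multiple fits */
        q = (q << 1) | take;
        n -= (uint32_t)shifted & (0u - take);        /* branchless conditional subtract */
        shifted >>= 1;
    }
    *rem = n;
    return q;
}
```

The alignment loop costs up to 33 iterations where clz costs one instruction, which is the trade-off the comment hints at.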

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Sun 12 Mar 2017, 02:09    Post subject: sse2 string functions

Code:
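#include <stddef.h>     //size_t
#include <emmintrin.h>  //SSE2 intrinsics
typedef unsigned long long u64;

//NOTE: these load whole 16-byte blocks, so they can read past the terminating NUL
size_t strlen_sse2(const char *s){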
  const __m128i *vp =((__m128i*)s)-4, all0 = (__m128i){0};
  __m128i v0,v1,v2,v3,v;
  do{
    vp+=4;
    v = v0 = _mm_cmpeq_epi8(_mm_loadu_si128(vp+0),all0);
    v|= v1 = _mm_cmpeq_epi8(_mm_loadu_si128(vp+1),all0);
    v|= v2 = _mm_cmpeq_epi8(_mm_loadu_si128(vp+2),all0);
    v|= v3 = _mm_cmpeq_epi8(_mm_loadu_si128(vp+3),all0);
  }while(!(_mm_movemask_epi8(v)));
  u64 m = (u64)_mm_movemask_epi8(v0) | ((u64)_mm_movemask_epi8(v1)<<16) |
    ((u64)_mm_movemask_epi8(v2)<<32) | ((u64)_mm_movemask_epi8(v3)<<48);
  return (char*)vp - s + __builtin_ctzll(m);
}

int strcmp_sse2(const char *s0, const char *s1){
  const __m128i *lp =(__m128i*)s0, *rp=(__m128i*)s1,
          all0 = (__m128i){0}, all1 = _mm_set1_epi8(~0);
  __m128i l, r, tmp;
  unsigned m=1;
  size_t i = 0;
  do{
    l = _mm_loadu_si128 (lp+i);
    r = _mm_loadu_si128 (rp+i);
    m =_mm_movemask_epi8(_mm_cmpeq_epi8(l,all0)|_mm_xor_si128(_mm_cmpeq_epi8(l,r),all1));
    ++i;
  }while(!m);
  return ((union{__m128i v;char c[16];})(l-r)).c[__builtin_ctz(m)];
}

int strcasecmp_sse2(const char *s0, const char *s1){
  __m128i *l =(__m128i*)s0, *r=(__m128i*)s1,
          all0 = (__m128i){0}, all1 = (__m128i){-1,-1},
          allA = _mm_set1_epi8('A'-1), allZ = _mm_set1_epi8('Z'+1),
          all32 = _mm_set1_epi8(1<<5), lcl, lcr, tmp;
  unsigned m;
  size_t i = 0;
  do{
    lcl = _mm_loadu_si128 (l+i);
    lcr = _mm_loadu_si128 (r+i);
    tmp = _mm_cmpeq_epi8(lcl,all0);
    lcl |= (_mm_cmpgt_epi8(lcl,allA) & _mm_cmplt_epi8(lcl,allZ) & all32);
    lcr |= (_mm_cmpgt_epi8(lcr,allA) & _mm_cmplt_epi8(lcr,allZ) & all32);
    tmp |= (_mm_cmpeq_epi8(lcl,lcr) ^ all1);
    ++i;
  }while(!(m=_mm_movemask_epi8(tmp)));
  return ((union{__m128i v;char c[16];})(lcl-lcr)).c[__builtin_ctz(m)];
}

Moose On The Loose
Joined: 24 Feb 2011, Posts: 773
Posted: Mon 13 Mar 2017, 21:47

technosaurus wrote:
long division in binary
... code that works just fine ...


On a lot of machines, Booth's method written as nested while loops is a bit faster. You subtract without making the test at all. If the remainder goes negative, you drop out of the subtracting loop and into a loop that adds until the remainder goes positive again. It works better on machines where you have to subtract to compare.
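
If I read the description right, this is what textbooks usually call non-restoring division: keep subtracting without testing first, and switch to adding once the partial remainder has gone negative. A bit-at-a-time sketch follows; it is written as a single loop instead of the nested while loops described above, and the name udiv_nonrestoring and its signature are made up for the illustration.

```c
#include <stdint.h>

/* Unsigned non-restoring division; d must be nonzero. */
static uint32_t udiv_nonrestoring(uint32_t n, uint32_t d, uint32_t *rem){
    int64_t r = 0;      /* partial remainder, allowed to go negative */
    uint32_t q = 0;
    for (int i = 31; i >= 0; --i) {
        int64_t bit = (n >> i) & 1;
        /* No compare-then-subtract: subtract while r is non-negative,
           add back while it is negative. */
        r = (r >= 0) ? 2 * r + bit - d : 2 * r + bit + d;
        q = (q << 1) | (uint32_t)(r >= 0);
    }
    if (r < 0) r += d;  /* single correction step at the end */
    *rem = (uint32_t)r;
    return q;
}
```

The sign of r falls out of the subtraction itself, which is why this suits machines that have to subtract in order to compare.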

On an 8051-like processor, you can do a very different sort of longhand divide using the processor's multiply and a little cleverness to correctly guess the next "digit". The code only needs to do the outer loop a little more often than the number of bytes in the numbers. The subtracting away for one digit of the answer takes a loop or can be unrolled.

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Wed 26 Apr 2017, 00:05

This post is mainly so I can refer back to it, probably not of interest to anyone who isn't building a c library or implementing a high speed interpreted language.

In order to make my entry code (_start:) more portable and in preparation for adding setjmp/longjmp, I needed a way to get (and set) specific registers. It only took a few minutes to get it working in gcc. After a couple of days of beating clang with progressively larger and larger sticks, I have come up with these:

Code:
#define getreg(reg) ({ \
     register void *tmp __asm__ (reg); \
     __asm__ __volatile__("":"+r"(tmp)); \
     tmp; \
})

#define setreg(reg,val) do{ \
     register void *tmp __asm__ (reg) = val; \
     __asm__ __volatile__("":"+r"(tmp)); \
} while(0)

#define save_reg(reg, loc) do{ \
     register intptr_t tmp __asm__ (reg); \
     __asm__ __volatile__("":"+r"(tmp)); \
     *(intptr_t*)loc = tmp; \
} while(0)

#define restore_reg(reg, loc) do{ \
     register void *tmp __asm__ (reg) = *(void**)(loc); \
     __asm__ __volatile__("":"+r"(tmp)); \
} while(0)


Now I can easily get the stack pointer using:
void *get_sp(void){return getreg("sp");}

And hopefully I can use save_reg() and restore_reg() along with computed gotos to implement setjmp/longjmp (yes, I know compiler documentation warns about using computed gotos this way, but that's why I needed to save/restore registers)

This will allow me to use X-macros to implement the typically assembly-language-only parts of C for the 40+ architectures supported by some version of linux (mainline, uclinux, various forks...).

technosaurus
Joined: 18 May 2008, Posts: 4745
Posted: Thu 04 May 2017, 02:05

(almost) minimal, tested aes implementation

Code:
#include <stdint.h>
#if defined (_MSC_VER)
#define VEC(x) __declspec(intrin_type,align(16))
#else
#define VEC(x) __attribute__ ((__vector_size__ (16), __may_alias__))
#endif


#ifdef INITSBOX
static uint8_t sbox[256], inv_sbox[256];
#else
static const uint8_t sbox[256] = {
   0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,  0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
   0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,  0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
   0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,  0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
   0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,  0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
   0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,  0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
   0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,  0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
   0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,  0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
   0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,  0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
   0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,  0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
   0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,  0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
   0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,  0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
   0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,  0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
   0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,  0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
   0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,  0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
   0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,  0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
   0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,  0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16
},inv_sbox[256] = {
/*       0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,  0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f*/
/*0x00*/ 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38,  0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
/*0x01*/ 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87,  0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
/*0x02*/ 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d,  0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
/*0x03*/ 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2,  0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
/*0x04*/ 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16,  0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
/*0x05*/ 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda,  0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
/*0x06*/ 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a,  0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
/*0x07*/ 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02,  0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
/*0x08*/ 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea,  0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
/*0x09*/ 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85,  0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
/*0x0a*/ 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89,  0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
/*0x0b*/ 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20,  0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
/*0x0c*/ 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31,  0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
/*0x0d*/ 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d,  0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
/*0x0e*/ 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0,  0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
/*0x0f*/ 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26,  0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
};
#endif
static const uint8_t sh [16] = {0,5,10,15,4,9,14,3,8,13,2,7,12,1,6,11}
,ish[16] = { 0,13,10,7, 4,1,14,11, 8,5,2,15, 12,9,6,3 }
,rcon [] = {0x01, 0x02, 0x04, 0x08,0x10, 0x20, 0x40, 0x80,0x1b, 0x36, 0x6c, 0xd8};


void aes128(void *data, const void *skey){
   union aes128  { uint8_t c[16]; uint8_t rc[4][4]; uint32_t i[4]; uint64_t VEC(16) m; }
      *state = (union aes128*)data, //reuse input memory since we write to it
      key = *(union aes128*)skey,   //but locally copy key (it gets reused)
      tmp; //need a tmp copy of state to avoid code complexity of ShiftRows
   for(unsigned long i=0;;i++){
      //combine SubBytes (sbox[]), AddRoundKey (^) and ShiftRows (sh[j])
      for(unsigned long j = 0; j < 16; ++j)
         tmp.c[j] = sbox[state->c[sh[j]] ^ key.c[sh[j]]];
      //ComputeRoundKey
      for (unsigned long j=1; j<=4; ++j) key.rc[0][j-1] ^= sbox[ key.rc[3][(j&3)] ];
      key.c[0] ^= rcon[i];
      //for (j=4;j<16;j++) key.c[j] ^= key.c[j-4]; //for slow uint32_t
      for (unsigned long j=1;j<4;j++) key.i[j] ^= key.i[j-1];
      if ( i == 9 ) break;
      //mix columns
      for(unsigned long j = 0; j < 4; ++j){
         uint8_t e = 0, s[4];
         for (unsigned long k=0;k<4;++k) e^=tmp.rc[j][k];
         for (unsigned long k=0;k<4;++k) s[k]=tmp.rc[j][((k+1)&3)];
         for (unsigned long k=0;k<4;++k){
            uint8_t t = tmp.rc[j][k]^s[k];
            state->rc[j][k] = tmp.rc[j][k]^e^((t<<1)^(0x1b &-(t>0x7f)));
         }
      }
   }
   //add final round key to tmp for output
   //for(i=0;i<4;i++) state->i[i]=tmp.i[i]^key.i[i];  //fast 32 bit
   //for(i=0;i<16;i++) state->c[i]=tmp.c[i]^key.c[i]; //for 8 bit only
   state->m = tmp.m ^ key.m; //smallest and fastest with simd
}


#define xtime(x) (((x) & 0x80) ? (((x) << 1) ^ 0x1b) : ((x)<<1))
void inv_aes(void *data, const void *skey){
   uint8_t state[16], key[11][16], *in = data;
   for(unsigned long i=0; i < 16; i++)
      key[0][i] = ((uint8_t *)skey)[i];
   for(unsigned long i = 1; i <= 10; i++) {    /*Generate Round Keys*/
      for (unsigned long j=1; j<=4; ++j)
         key[i][j-1] = key[i-1][j-1] ^ sbox[ key[i-1][12+(j&3)] ];
      key[i][0] ^= rcon[i-1];
      for (unsigned long j=4;j<16;j++)
         key[i][j] = key[i-1][j] ^ key[i][j-4];
   }
   for(unsigned long i=0; i < 16; i++) //addRoundKey(10)
      in[i] ^= key[10][i];
   for(unsigned long i = 0; i < 10; i++){ //do rounds
      if (i) for(unsigned long j = 0; j < 16; j+=4){ //inv_mixColumns();
         uint8_t e=0;
         for(unsigned long k=0;k<4;++k)
            e ^= state[j+k];
         for (unsigned long k=0;k<4;++k)
            in[j+k] = e ^ state[j+k] ^ xtime(state[j+k] ^ state[j+((k+1)&3)])
               ^ xtime(xtime(xtime(e) ^ state[j+k] ^ state[j+((k+2)&3)]) );
      }
      for(unsigned long j = 0; j < 16; j++) //inv_shiftRows+inv_subBytes+addRoundKey(9-i)
         state[j] = inv_sbox[ in[ish[j]] ] ^ key[9-i][j];
   }
   //for(unsigned long i=0;i<16;++i)in[i]=state[i];
   *(uint64_t VEC(16) *)in=*(uint64_t VEC(16) *)state;
}
#undef xtime


#ifdef INITSBOX
#define ROTL8(x,shift) ((uint8_t) ((x) << (shift)) | ((x) >> (8 - (shift))))
void init_aes(void) {
   uint8_t p = 1, q = 1;
      do {
      p = p ^ (p << 1) ^ (p & 0x80 ? 0x1B : 0);
      q ^= q << 1;
      q ^= q << 2;
      q ^= q << 4;
      q ^= q & 0x80 ? 0x09 : 0;
      uint8_t xformed = q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3) ^ ROTL8(q, 4);
      sbox[p] = xformed ^ 0x63;
      inv_sbox[xformed ^ 0x63] = p;
   } while (p != 1);
   sbox[0] = 0x63;
   inv_sbox[0x63]=0;
}
#undef ROTL8
#else
#define init_aes(...)
#endif

#ifdef TESTAES
#include <stdio.h>
int main(){
   uint8_t key[16] = { 0x2b,0x7e,0x15,0x16,0x28,0xae,0xd2,0xa6, 0xab,0xf7,0x15,0x88,0x09,0xcf,0x4f,0x3c };
   uint8_t in[16] = { 0x6b,0xc1,0xbe,0xe2,0x2e,0x40,0x9f,0x96, 0xe9,0x3d,0x7e,0x11,0x73,0x93,0x17,0x2a };
   init_aes();
   aes128(in,key);
   size_t i;
   for (i=0;i<16;++i) printf("%02x",0xFF&(unsigned)in[i]);
   puts("");
   inv_aes(in,key);
   for (i=0;i<16;++i) printf("%02x",0xFF&(unsigned)in[i]);
   puts("");
   aes128(in,key);
   for (i=0;i<16;++i) printf("%02x",0xFF&(unsigned)in[i]);
   puts("");
/* Output should be:
3ad77bb40d7a3660a89ecaf32466ef97
6bc1bee22e409f96e93d7e117393172a
3ad77bb40d7a3660a89ecaf32466ef97
*/
}
#endif


I think I can reduce this a bit further, but it's at a good stopping point.

minimal (as yet untested) crc32
Code:
uint32_t crc32( uint32_t crc, const uint8_t *ptr, size_t len){
    const uint32_t lut[] = {
        0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
        0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
        0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
        0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c
    };
    const uint8_t *end = ptr + len;
    for( ; ptr<end; ++ptr ) {
        crc = ( crc >> 4 ) ^ lut[ (crc & 0xf)^(*ptr & 0xf) ];
        crc = ( crc >> 4 ) ^ lut[ (crc & 0xf)^(*ptr >> 4) ];
    }
    return crc;
}
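
Since the crc32 above is marked untested, here is one way it might be checked. As written it only advances the raw CRC register, so the usual initial and final inversion has to happen somewhere. The sketch below repeats the routine under the made-up name crc32_update and wraps it in a hypothetical crc32_ieee() that does the inversion; the widely published CRC-32 check value for the ASCII string "123456789" is 0xCBF43926.

```c
#include <stddef.h>
#include <stdint.h>

/* Same nibble-at-a-time update as above, with the table made static. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *ptr, size_t len){
    static const uint32_t lut[16] = {
        0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
        0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
        0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
        0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c
    };
    const uint8_t *end = ptr + len;
    for ( ; ptr < end; ++ptr) {
        crc = (crc >> 4) ^ lut[(crc & 0xf) ^ (*ptr & 0xf)];  /* low nibble */
        crc = (crc >> 4) ^ lut[(crc & 0xf) ^ (*ptr >> 4)];   /* high nibble */
    }
    return crc;
}

/* Hypothetical convenience wrapper: invert on the way in and out. */
static uint32_t crc32_ieee(const void *buf, size_t len){
    return ~crc32_update(~(uint32_t)0, (const uint8_t *)buf, len);
}
```

Comparing crc32_ieee("123456789", 9) against 0xCBF43926 is a quick sanity check. To continue a checksum across several buffers, feed the un-inverted register from one crc32_update() call into the next and invert once at the end.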
