unsorted C snippets for small/fast static apps
stevenhoneyman1

Joined: 28 Sep 2014
Posts: 4

Posted: Sun 28 Sep 2014, 07:59

technosaurus wrote:
It's a bit disorganized, but I added a ton of stuff to test.
Edit: ok, really disorganized. I will do a bunch of cleanup and testing before the next version and try to complete the string functions; it will be in a new thread.

The primary purpose is to allow for smaller overheads in static binaries so that multicall binaries are not needed so much. To finish this out, I need to make a tool chain that symlinks all of the stdinc files to libc.h and wraps cc with nostdlib nostdinc and fnobuiltin and some optimizations.

If anyone uses it and runs into missing structs or defined constants, please post them (figuring out the basic types used in structs can take a lot of time), which also reminds me, I should define these to the basic types as I find them (rather than typedeffing them) ... Of course that means you should do development against a "real" c library first for sanity checks (any noted differences will help). Currently only targeting x86, but will consider basic arm support (64 bit versions wouldn't make sense, but feel free to fork )

note, linux 3.10 is lts so it will be the basis, thus syscalls may not be available in older kernels... most static binaries will work unless you try to use a newer syscall like finit_module on an older kernel


@technosaurus - I really like these "fake libc" macros, but was wondering (before I spend any more time using/testing the "0.1" version), have you made any updates since this was posted? I had a search but this thread is all that seems to appear.

Thanks :)
technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Tue 25 Nov 2014, 14:39

I am working on libc.h again. To be more exact, I am adding x86_64 and networking. My current approach to this is to replace the cumbersome gethostbyname, getaddrinfo and other bloated tools with a simple function that takes a host name, does a DNS query using a public DNS server and returns an IP address. Not only does it make it way smaller, but there is no need to set up a plethora of structs and buffers just to connect to a server.

I may patch the kernel to use it when dhcp is enabled if I can figure out where to wedge it in ... I recall seeing a function that parsed the x.x.x.x formatted strings to an IP address somewhere but can't find it now.
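For reference, parsing an x.x.x.x string is small enough to sketch directly. The helper below is hypothetical (the name parse_ipv4 is made up for illustration and is not the kernel function mentioned above); it assumes plain decimal octets and stores the bytes in a.b.c.d order in memory, which is the layout sin_addr.s_addr expects.

```c
#include <stdint.h>
#include <string.h>

/* Parse "a.b.c.d" into *out (bytes stored in a.b.c.d order in memory).
 * Returns 1 on success, 0 on malformed input. Hypothetical helper. */
static int parse_ipv4(const char *s, uint32_t *out){
   unsigned char bytes[4];
   int i;
   for (i = 0; i < 4; i++){
      unsigned v = 0, digits = 0;
      while (*s >= '0' && *s <= '9'){
         v = v*10 + (unsigned)(*s++ - '0');
         if (++digits > 3 || v > 255) return 0; /* octet out of range */
      }
      if (!digits) return 0;                    /* empty octet */
      if (*s != (i < 3 ? '.' : '\0')) return 0; /* wrong separator or trailing junk */
      if (i < 3) s++;                           /* step over the '.' */
      bytes[i] = (unsigned char)v;
   }
   memcpy(out, bytes, 4);
   return 1;
}
```

Because the bytes are copied in string order, the result does not depend on the endianness of the machine.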

TIP: echo | gcc -E -dM - | sort
will give you the gcc predefined macros for your platform (note that the list changes when you add some compiler flags; the trailing "-" makes gcc read the empty input from stdin)

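Here is my alternative method to get an IP address from a hostname.
It is meant for a public DNS server whose address is palindromic, such as Google's at 8.8.8.8 (0x08080808) or 4.2.2.4 (0x04020204, which the test main() passes), so the constant reads the same in either byte order and works independent of endianness (arm, mips, x86)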
Code:
#include <stdint.h>
#include <sys/socket.h> //socket, sendto, recvfrom
#include <netinet/in.h>
#include <unistd.h> //close

uint32_t host2ip(char *host, uint32_t dns){
   unsigned char buf[4096]={0}, *bufp=buf,
      *hp=(unsigned char *)"\0\0" "\x01\0" "\0\x01" "\0\0" "\0\0" "\0\0";
   struct sockaddr_in dest = { //CN 0x72727272 RU 0x3E4C4C3E US2 0x08080808
      .sin_family=AF_INET, .sin_port=htons(53), .sin_addr.s_addr=dns
   };
   uint32_t i, j, ans, ip=0;
   socklen_t destsz=sizeof(struct sockaddr_in);
   ssize_t n; //signed, so the error checks below can actually trigger
   int   s=socket(AF_INET , SOCK_DGRAM , IPPROTO_UDP);
   if (s<0) return 0; //nothing to close yet
   for(i=0;i<12;i++) *bufp++=*hp++; //copy header
   i=j=0;
   do{ /* convert www.example.com to 3www7example3com */
      if(host[i]=='.' || !host[i]){ //could use strchrnul() here instead
         *bufp++ = i-j;
         for(;j<i;j++)
            *bufp++=host[j];
         ++j;
      }
   }while(host[i++]);
   *bufp++='\0'; //root label ends the name; DNS names are not padded
   *(bufp++)=0; *(bufp++)=1; *(bufp++)=0; *(bufp++)=1; //extra Q fields
   n=sendto(s, buf, bufp-buf, 0, (struct sockaddr*)&dest, destsz);
   if (n < 0) goto IPV4END;
   n=recvfrom(s,buf,sizeof(buf),0,(struct sockaddr*)&dest,&destsz);
   if (n < 0) goto IPV4END;
   for(i=0;i<buf[7];i++){ //[7] holds num of answers([6] does too but >256?)
      while(*bufp) ++bufp; //skip names
      ans=bufp[1]; //[1] holds the answer type ([0] does too, but >256???)
      bufp += 10;
      if(ans == 1){ uint32_t j=4; // ipv4 address
         unsigned char *ipp=(unsigned char *)&ip;
         while(j--) *ipp++=*bufp++;
         goto IPV4END;
      }else while(*bufp) ++bufp; //skip (alias) names
   }
IPV4END:
   close(s);
   return ip;
}

#ifdef TEST
#include <stdio.h> //printf ... adds ~16k on static musl builds
int main( int argc ,char **argv){
   if (argc < 2) return 1;
   in_addr_t ip=host2ip(argv[1],0x04020204);
   if (!ip){
      perror("host2ip");
      return 1;
   }
   printf("%d.%d.%d.%d\n",((unsigned char*)&ip)[0],((unsigned char*)&ip)[1],((unsigned char*)&ip)[2],((unsigned char*)&ip)[3]);
   return 0;
}
#endif
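The name-encoding step inside host2ip() can be tried on its own without touching the network. This is a minimal sketch with a hypothetical helper name (dns_encode_name) and added bounds checks that the original loop does not have; it rejects empty labels, so a name with a trailing dot is treated as an error here.

```c
#include <stddef.h>

/* Encode "www.example.com" as \3www\7example\3com\0 into out.
 * Returns the encoded length, or 0 if a label is empty, longer than 63 bytes,
 * or the result would not fit in outsz. Hypothetical helper. */
static size_t dns_encode_name(const char *host, unsigned char *out, size_t outsz){
   size_t n = 0, start = 0, i = 0;
   for (;;){
      if (host[i] == '.' || host[i] == '\0'){
         size_t len = i - start;
         if (len == 0 || len > 63 || n + 1 + len + 1 > outsz) return 0;
         out[n++] = (unsigned char)len;          /* length prefix */
         while (start < i) out[n++] = (unsigned char)host[start++];
         start = i + 1;                          /* skip the '.' */
         if (host[i] == '\0') break;
      }
      i++;
   }
   out[n++] = 0; /* zero-length root label terminates the name */
   return n;
}
```

The query type and class fields (the four bytes 0,1,0,1 in host2ip) follow directly after the returned length.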

_________________
Web Programming - Pet Packaging 100 & 101
technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Sun 07 Dec 2014, 17:08

Screw using repeated calls to strcat, strcpy, etc. or bloated sprintf and friends; here is a macro to do it all in one go:
Code:
#define strcpyall(buf, ...) do{ \
   char *bp=buf, *a[] = { __VA_ARGS__,NULL}, **ss=a, *s; \
   while((s=*ss++)) while(*s)*bp++=*s++; *bp=0; \
}while(0)

#include <stdio.h>

int main(int argc, char **argv){
   char buf[4096], *world=" world!\n";
   strcpyall(buf,"hello", world, "this", " ", "is", " ", "a", " ", "great", world);
   printf("%s", buf);
}


or a slightly slower but checked version that requires char[] instead of allowing char* or char[]

Code:
#define strcpyall_checked(buf, ...) do{ size_t l=sizeof(buf); \
   char *bp=buf, *a[] = { __VA_ARGS__,NULL}, **ss=a, *s; \
   /* l>1 keeps one byte for the terminator and cannot wrap below zero */ \
   while((s=*ss++)) while(*s && l>1){ *bp++=*s++; --l; } *bp=0; \
}while(0)


Edit: Cleaned up to remove warnings.
Code:
#include <errno.h>
#define _PASTE(x,y) x##y
#define PASTE(x,y) _PASTE(x,y)

#ifdef __cplusplus
#define ASSERT_ARRAY(x) do { \
   const int PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__)=((void*)&(x)==&(x)[0]); \
   typedef struct{ \
      int a :PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__); \
   }a; \
}while(0)
#else
#define ASSERT_ARRAY(x) do{ \
(void)sizeof(struct { \
   int PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__) : ((void*)&(x) == &(x)[0]); \
}); \
}while(0)
#endif

#define strcpyALL_CHECKED(buf,offset, ...) do{ \
   ASSERT_ARRAY(buf); \
   char *bp=buf+(size_t)offset; /* make it unsigned to prevent underrun */ \
    const char *s, *a[] = { __VA_ARGS__,NULL}, **ss=a; \
   while((s=*ss++)) \
      while((*s)&&(++offset<sizeof(buf)))*bp++=*s++; \
   if (offset<sizeof(buf)) \
      *bp=0; \
/*   else { /* or just leave it alone and check the offset vs sizeof(buf)? */ \
/*      offset=-1; */ \
/*      errno=ERANGE;*/ \
/*   }*/ \
}while(0)

#define strcpyALL_UNCHECKED(buf, ...) do{ \
   char *bp=buf; \
    const char *s, *a[] = { __VA_ARGS__,NULL}, **ss=a; \
   while((s=*ss++)) \
      while((*bp=*s++)) ++bp; /* leave bp on the '\0' so the next string overwrites it */ \
}while(0)
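For comparison, the same idea can be written as a function plus a thin macro, which makes truncation easy to detect from the return value. This is only a sketch; cat_all and CAT_ALL are hypothetical names, and like the checked macro it needs a real array so that sizeof(buf) is the buffer size.

```c
#include <stddef.h>

/* Concatenate a NULL-terminated list of strings into buf (at most size bytes,
 * always NUL-terminated when size > 0). Returns the length the full result
 * would have had, so a return value >= size means truncation. Hypothetical. */
static size_t cat_all(char *buf, size_t size, const char *const *parts){
   size_t total = 0;
   const char *s;
   while ((s = *parts++))
      for (; *s; s++, total++)
         if (total + 1 < size) buf[total] = *s;
   if (size) buf[total < size ? total : size - 1] = '\0';
   return total;
}

#define CAT_ALL(buf, ...) \
   cat_all((buf), sizeof(buf), (const char *const[]){ __VA_ARGS__, NULL })
```

Usage would look like size_t n = CAT_ALL(buf, "hello", " ", "world"); followed by a check of n against sizeof(buf).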


technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Tue 23 Dec 2014, 01:16

Here is a preview of some stuff that is cooking:

Code:

//unfortunately gcc has no builtin for stack pointer, so we use assembly
#if defined __x86_64__
   #define STACK_POINTER "rsp"
#elif defined __i386__
   #define STACK_POINTER "esp"
#elif defined __aarch64__
   #define STACK_POINTER "sp"
#elif defined __arm__
   #define  STACK_POINTER "r13"
#endif
char **environ;
int main();
void _start(void){
   register long *sp __asm__( STACK_POINTER );
//if you don't use argc, argv or envp/environ,  you can just remove them
   long argc = *sp;
   char **argv = (char **)(sp + 1);
   environ = (char **)(sp + argc + 2); //skip argc, argv[0..argc-1] and argv's NULL terminator
   exit(main(argc, argv, environ) );
   __builtin_unreachable(); //or for(;;); to shut up gcc
}


I have a condensed format for adding new architectures with most of the details in a tabular format:

Code:
#define ARCH_TEMPLATE stckptr,syscall,callnum,ret,arg1,arg2,arg3,arg4,arg5,arg6,arg7,"memory",...
#define ARCH_ALPHA sp,syscall,v0,v0,a0,a1,a2,a3,a4,a5,a6,"memory"
#define ARCH_ARM   r13,swi 0x0,r7,r0,r0,r1,r2,r3,r4,r5,r6,"memory"
#define ARCH_ARM64 sp,svc 0,x8,x0,x0,x1,x2,x3,x4,x5,0,"memory", \
   "x7","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18"
#define ARCH_AVR32 ???,scall,
#define ARCH_BFIN  SP,excpt 0x0,P0,R0,R0,R1,R2,R3,R4,R5,0,"memory" //more clobs?
#define ARCH_CRIS  ??,break 13,r9,r9??,r10,r11,r12,r13,mof,srp,0,"memory"
#define ARCH_HPPA  %usp,ble 0x100(%sr2,%r0),%r20,%r28,%r26,%r25,%r24,%r23,%r22,%r21,0, \
   "memory","r1","r2","r20","r29","r31"
#define ARCH_IA64  ???,break 0x100000,r15,r10/r8,out0,out1,out2,out3,out4,out5,0,"memory"
#define ARCH_M68K  %sp,trap &0,%d0,%d0,%d1,%d2,%d3,%d4,%d5,%a0,"memory","%d0","%d1","%a0"
#define ARCH_MBLAZ ???,brki r14, 0x8,r12,r12??,r5,r6,r7,r8,r9,r10,0,"memory","r4"
#define ARCH_MIPS  $sp,syscall,$v0,$v0,$a0,$a1,$a2,$a3,$a4,$a5,$a6,"memory", \
   "$at","$t0","$t1","$t2","$t3","$t4","$t5","$t6","$t7","$t8","$t9","$hi","$lo"
#define ARCH_MIPS64 $sp,syscall,$v0,$v0,$a0,$a1,$a2,$a3,$a4,$a5,$a6,"memory"
   // ,"$at","$t0","$t1","$t2","$t3","$t4","$t5","$t6","$t7","$t8","$t9","$hi","$lo"
#define ARCH_OR1K  ???,l.sys 1,r11,r11,r3,r4,r5,r6,r7,r8,0,"memory","r12","r13","r15","r17","r19","r21","r23","r25","r27","r29","r31"
#define ARCH_PPC   ???,sc,r0,r0,r3,r4,r5,r6,,r7,r8,0,"memory","cr0","ctr","r8","r9","r10","r11","r12"
#define ARCH_PPC64 ???,sc,r0,r0,r3,r4,r5,r6,,r7,r8,0,"memory","cr0","ctr","r8","r9","r10","r11","r12"
#define ARCH_S390  ???,svc 0,r1,r2,r2,r3,r4,r5,r6,r7,0,"memory"
#define ARCH_SH    ???,trapa #,r3,r3??,r4,r5,r6,r7,r0,r1,"memory" //
#define ARCH_SPARC32 ???,t 0x10,g1,o0,o0,o1,o2,o3,o4,o5,0,"memory"
#define ARCH_SPARC64 ???,t 0x6d,g1,o0,o0,o1,o2,o3,o4,o5,0,"memory"
#define ARCH_X8664 rsp,syscall,rax,rax,rdi,rsi,rdx,r10,r8,r9,0,"memory","rcx","r11"
#define ARCH_X86   esp,int $128,eax,eax,ebx,ecx,edx,esi,edi,ebp,0,"memory"
#define ARCH_XTNSA ???,syscall,a2,a2??,a6,a3,a4,a5,a8,a9,0,"memory"

This table is incomplete, and some architectures may be completely wrong. It was started from info at http://man7.org/linux/man-pages/man2/syscall.2.html and various ABI descriptions. If you see something wrong, let me know.

At the moment I have stripped out all the internal C functions and am defining them to use builtins (including atomic and Cilk Plus) until someone finds a need for a separate function. So far I have only needed strstr and thus strncmp (but may be able to use __builtin_memcmp?)

Here is an example:
Code:
#ifdef __clang__
   #define HAS(...) __has_builtin(__VA_ARGS__)
#elif defined __GNUC__ //assume gcc ... (where the list came from)
   #define HAS(...) 1
#else
   #define HAS(...) 0
#endif
#if HAS(__builtin_abort)
   #define abort __builtin_abort
#endif
//....


the example downloader compiles to <2kb (stripped) on x86_64 with
gcc -Os -ffreestanding -nostartfiles -nostdlib -fno-asynchronous-unwind-tables -fomit-frame-pointer -mno-accumulate-outgoing-args -finline-small-functions -finline-functions-called-once -o get get.c -s -Wl,--gc-sections,--sort-common,-s -Wall -Wextra

Note: I still need to implement socketcall for x86 (and others) and map the related calls to it if the syscall is not defined
Attachments:
bqc.h.gz (31.41 KB): added mmx and basic x86 intrinsics since initial upload
get.c.gz (1.17 KB): simple downloader. usage: get host path
get www.puppylinux.com /index.html #note the space after the host

technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Fri 13 Mar 2015, 22:33

compilers are generally bad at creating jump tables for switch() optimization if a case: contains any function call.

this would probably get optimized:
Code:
void putstring(unsigned long i){
const char *s;
switch(i){
  case 0: s="zero";break;
  case 1: s="one";break;
  case 2: s="two";break;
  case 3: s="three";break;
  case 4: s="four";break;
  case 5: s="five";break;
  case 6: s="six";break;
  case 7: s="seven";break;
  case 8: s="eight";break;
  case 9: s="nine";break;
  default: s="error";
}
puts(s);
}

But if you were to replace the s=*; with puts(*); it may compile to the equivalent of a series of if-else statements instead of a jump table. Some compilers will do this anyhow, but you can do the same thing a slightly different way that is optimized on all compilers, just by using an array of const strings.

Here is a basic example:
Code:
enum{ZERO,ONE,TWO,THREE,FOUR,FIVE,SIX,SEVEN,EIGHT,NINE,LASTNUM};
const char *strings[]=
{"zero","one","two","three","four","five","six","seven","eight","nine","error"};
static inline void put_string(size_t x, size_t last){
   puts(strings[ (x < last) ? x : last]);
}
//put_string(ZERO, LASTNUM);

It's not too difficult to follow and can save a lot of code in the long run
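A variant that returns the string instead of printing it is easier to test in isolation. The sketch below uses made-up names (names, name_of) and a shortened table; the extra slot at index LASTNUM plays the role of the default: case.

```c
#include <stddef.h>

enum { ZERO, ONE, TWO, THREE, LASTNUM };

/* One extra slot at index LASTNUM doubles as the "default:" case. */
static const char *const names[LASTNUM + 1] = { "zero", "one", "two", "three", "error" };

/* Returns the name for x, clamping anything out of range to the error slot. */
static const char *name_of(size_t x){
   return names[x < LASTNUM ? x : LASTNUM];
}
```

Since the index is unsigned, a negative value converted to size_t also ends up in the error slot.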

Moose On The Loose


Joined: 24 Feb 2011
Posts: 773

Posted: Mon 16 Mar 2015, 10:19

technosaurus wrote:
Code:

const char *strings[]=
{"zero","one","two","three","four","five","six","seven","eight","nine","error"};
static inline void put_string(size_t x, size_t last){
   puts(strings[ (x < last) ? x : last]);
}
//put_string(ZERO, LASTNUM);


Why the "size_t" for something that will be used as an array index?

BTW: If you need a huge number of strings, there is a way you can compress the strings at the cost of overhead in the display process and work at the compile time.

It is very common to have a bunch of messages with the same words in them. Messages also never have the characters 128 and above in them. You can dribble the string out with a putc(), checking each character for being 128 or above as you go. If you see a value of 128 or above, you recurse with (ThisCharacter-128+DICTIONARY_START)
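A minimal sketch of that scheme, assuming a small hypothetical dictionary and writing into a buffer instead of calling putc() so the result can be checked. The names dict and expand are made up for illustration, and the dictionary entries here are plain ASCII so the recursion always terminates.

```c
#include <stddef.h>

/* Hypothetical dictionary: byte value 128+n in a message stands for dict[n]. */
static const char *const dict[] = { "file ", "not found", "cannot open " };
#define DICT_COUNT (sizeof dict / sizeof dict[0])

/* Expand msg into out (capacity cap), following dictionary references
 * recursively. Returns the number of bytes written, excluding the NUL. */
static size_t expand(const char *msg, char *out, size_t cap){
   size_t n = 0;
   for (; *msg; msg++){
      unsigned char c = (unsigned char)*msg;
      if (c >= 128 && (size_t)(c - 128) < DICT_COUNT)
         n += expand(dict[c - 128], out + n, cap - n); /* recurse into the entry */
      else if (n + 1 < cap)
         out[n++] = (char)c;
   }
   if (cap) out[n] = '\0';
   return n;
}
```

With this dictionary, the 5-byte message "\x82" "\x80" "x " "\x81" expands to a 28-byte string, which is where the saving comes from once the same words appear in many messages.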
technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Mon 16 Mar 2015, 14:50

Yes, it seems odd, but if an index is not the equivalent of size_t, the compiler will add an extra MOV instruction to extend it.

Re string compression. I thought about using 0-X for run length encoding and 128-255 for dictionary entries.

technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Tue 17 Mar 2015, 23:19

I wrote a macro that implements a buffered replacement for *printf() based on my strcpyALL code that allows (forces) you to do away with format strings altogether.

Code:
#include <stdio.h> //fileno, for FPRINTF
#include <unistd.h> //write

int write_chars(int fd, const char **a){
   char buf[4096]; /*alignas(PAGESIZE)?*/
   size_t offset=0;
   int ret=0;
   const char *s;
   while((s=*a++)){
      while(*s){
         buf[offset++]=*s++;
         if (offset==sizeof(buf)){
            ret += write(fd,buf,offset);
            offset=0;
         }
      }
   }
   if (offset) ret+=write(fd,buf,offset);
   return ret;
}

#define FDPRINTF(fd,...) write_chars(fd,(const char *[]){__VA_ARGS__,NULL})
#define FPRINTF(fs,...) FDPRINTF(fileno(fs),__VA_ARGS__)
#define PRINTF(...) FDPRINTF(1,__VA_ARGS__)
#define EPRINTF(...) FDPRINTF(2,__VA_ARGS__)

So the format is significantly different from their lower case non-macro counterparts, but the same things can be accomplished.
Code:
printf("start: %d,%d end\n", 0xFFCF, 999);
//itoa() is not standard C; assumed here to return a separate string for each call
PRINTF("start: ", itoa(0xFFCF), ",", itoa(999), " end\n");

So it is really formatted more like C++ cout
...maybe I should rename it accordingly.

I'm working on a sprintf/snprintf replacement next ... not sure if I will be able to combine them or not yet.
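Since itoa() is not part of standard C, here is one way a number-to-string helper for this style of call could look. It is a sketch under assumptions: utoa_r is a hypothetical name (not the helper from libc.h), it handles unsigned values only, and the caller supplies a buffer of at least 2 bytes (11 is enough for a 32-bit value).

```c
#include <stddef.h>

/* Write the decimal form of v backwards from the end of buf (size bytes)
 * and return a pointer to the first digit. Each call uses its own
 * caller-supplied buffer, so several results can appear in one argument
 * list. High digits are dropped if the buffer is too small. Hypothetical. */
static const char *utoa_r(unsigned v, char *buf, size_t size){
   char *p = buf + size;
   *--p = '\0';
   do { *--p = (char)('0' + v % 10); } while ((v /= 10) && p > buf);
   return p;
}
```

With two local buffers a and b, the earlier example becomes PRINTF("start: ", utoa_r(0xFFCF, a, sizeof a), ",", utoa_r(999, b, sizeof b), " end\n");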

technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Fri 29 May 2015, 20:27

I was working on PDMP3 and found that pow(x,4.0f/3.0f) was considerably faster when converted to cbrt((x*x)*(x*x)), but then I tried to optimize the cbrt part and combined them as follows:

Code:
/* Description: returns x^(4/3)
 * same as cbrt((x*x)*(x*x)), but optimized for the limited cases we handle (integers 0-8209)
 */
static inline float pow43opt2(float x) {
  if (x<2) return x;
  else x*=x,x*=x; //pow(x,4)
  float f3,x2=x+x;
  union {float f; unsigned i;} u = {x};
  u.i = u.i/3 + 0x2a517d3c; //~cbrt(x)
  int accuracy_iterations=2;  //reduce for speed, increase for precision
  while (accuracy_iterations--){ //Lancaster iterations
    f3=u.f*u.f*u.f;
    u.f *= (f3 + x2) / (f3 + f3 + x);
  }
  return u.f;
}

This is roughly 50% faster than using musl's similar cbrtf() function or even gcc's __builtin_cbrtf(), maybe because it doesn't deal with negative values, and over 200% faster if accuracy_iterations=0.
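The update inside the loop has the form of Halley's iteration for f(a) = a^3 - x, which works out to a <- a*(a^3 + 2x)/(2a^3 + x). The sketch below isolates that step so its convergence can be checked without the bit trick; cbrt_halley_step and cbrt_refine are hypothetical names, and the starting guess is supplied by the caller.

```c
/* One Halley step toward cbrt(x): a <- a*(a^3 + 2x)/(2a^3 + x). */
static float cbrt_halley_step(float a, float x){
   float a3 = a*a*a;
   return a * (a3 + x + x) / (a3 + a3 + x);
}

/* Refine a rough positive guess with a fixed number of steps. */
static float cbrt_refine(float guess, float x, int steps){
   while (steps-- > 0) guess = cbrt_halley_step(guess, x);
   return guess;
}
```

Each step roughly cubes the relative error, which is why a guess that is only a few percent off needs very few iterations.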

technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Sat 30 May 2015, 19:01

I took a look at some of the math functions and came up with a way to make some of the functions compile fast or small with the same code using Taylor series approximation.

Code:
float inverse_factorial_f[]={ //1/n! for n=0..9 (expf below reads up to index 9)
  1.0, 1.000000e+00,  5.000000e-01, 1.666667e-01,  4.166667e-02, 8.333333e-03, 1.388889e-03, 1.984127e-04,
  2.480159e-05, 2.755732e-06,
};

float cosf(float x){
  float xx=-(x*x), term=1, res=1;
  int i, max=8;  //taylor series => 1-x^2/2!+x^4/4!-x^6/6!+x^8/8!...
  for (i=2;i<max;i+=2)
    res+=(term*=xx)*inverse_factorial_f[i];
  return res;
}

float sinf(float x){
  float xx=-(x*x), term=x, res=x;
  int i, max=8; //taylor series => x-x^3/3!+x^5/5!-x^7/7!+x^9/9!-...
  for (i=3;i<max;i+=2)
    res+=(term*=xx)*inverse_factorial_f[i];
  return res;
}

float atanf(float x){
  float xx=-(x*x), term=x, res=x;
  int i, max=8; //taylor series => x-x^3/3+x^5/5-x^7/7+x^9/9-...
  for (i=3;i<max;i+=2)
    res+=(term*=xx)/i;
  return res;
}

float expf(float x){
  float term=x, res=1+x;
  int i, max=10; //taylor series => 1+x+x^2/2!+x^3/3!+x^4/4!+x^5/5!...
  for (i=2;i<max;++i)
    res+=(term*=x)*inverse_factorial_f[i];
  return res;
}


Moose On The Loose


Joined: 24 Feb 2011
Posts: 773

Posted: Tue 02 Jun 2015, 21:06

technosaurus wrote:
I took a look at some of the math functions and came up with a way to make some of the functions compile fast or small with the same code using taylor series approximation.


On a Pentiuuuuum, it is often faster to do a multiply or divide than to do a table look up. This is because you can get a cache miss on the first access to a table. If the table straddles a page boundary, you can get two misses.

Way back on a Z80, when coding a game I needed sin() and cos() very inaccurately. I observed that the first half cycle of sin() looks a lot like the shape of X(1-X) from 0 to 1, which worked well enough to look reasonable.
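A minimal sketch of that parabola trick in C. The scaling by 4 (so the peak reaches 1) and the choice of "turns" as the input unit are assumptions made for illustration; rough_sin_turns is a hypothetical name.

```c
/* Very rough sine: t is the phase in turns (0 <= t < 1 is one full cycle).
 * Each half cycle is replaced by the parabola 4*x*(1-x) with x in 0..1. */
static float rough_sin_turns(float t){
   float x;
   if (t < 0.5f){
      x = t + t;            /* first half cycle: positive lobe */
      return 4.0f * x * (1.0f - x);
   }
   x = (t - 0.5f) * 2.0f;   /* second half cycle: mirrored lobe */
   return -4.0f * x * (1.0f - x);
}
```

At an eighth of a turn (45 degrees) this gives 0.75 where the real sine is about 0.707, which is the kind of error that is invisible in a game but not in signal processing.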
technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Wed 03 Jun 2015, 01:09

Moose On The Loose wrote:
On a Pentiuuuuum, it is often faster to do a multiply or divide than to do a table look up. This is because you can get a cache miss on the first access to a table. If the table straddles a page boundary, you can get two misses.

With -O3 these small loops and the lookup tables are unrolled/inlined, so that isn't a problem; with -Os the code is quite a bit smaller. Unlike many implementations that have a ton of compile time options to control which hand optimized implementation to use, I prefer to let the user choose whether speed or size is more important without much effort. Often this is accomplished using a simplified implementation that the compiler can optimize as desired.

A memset that is more optimized for speed:

Code:
//unlike memset, returns the next address after the filled area ... useful in memset
__attribute__ ((optimize("3"))) static inline void *mempset64(void *dest, unsigned long long x,unsigned long len){
   unsigned long long *dp=dest;
  while(len--)*dp++=x; //len counts 8 byte words here, not bytes
  return dp;
}

__attribute__ ((optimize("3"))) static inline void *mymempset(void *dest,  int x,unsigned long len){
  unsigned char *dp = dest;
  while (len && ((unsigned long)dp&7UL)){ //align to 8byte boundary without running past len
    *dp++=x;
    --len;
  }
  //continue from dp (not dest) and replicate only the low byte of x
  if (len>7)  dp = mempset64(dp,(unsigned char)x*0x0101010101010101ULL,len>>3); //set 8 byte chunks
  len &= 7;
  while (len--)
    *dp++=x; //set remaining <8 bytes
  return dp;
}

__attribute__ ((optimize("3"))) static inline void *mymemset(void *dest,  int x,unsigned long len){
   (void)mymempset(dest,x,len);
   return dest;
}
this is for 64 bit arches: a memset32 call should probably use x|x<<8|x<<16|x<<24 instead of the magic multiply (except x should be unsigned)
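The two ways of replicating a byte mentioned here can be compared directly. This is a small sketch with hypothetical helper names (splat32, splat64); taking the argument as unsigned char avoids the sign-extension problem hinted at above.

```c
#include <stdint.h>

/* Replicate one byte across a 32-bit word with shifts and ORs. */
static uint32_t splat32(unsigned char c){
   uint32_t x = c;
   return x | x << 8 | x << 16 | x << 24;
}

/* Same idea for 64 bits using the multiply by 0x0101010101010101. */
static uint64_t splat64(unsigned char c){
   return (uint64_t)c * 0x0101010101010101ULL;
}
```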
technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Mon 10 Aug 2015, 08:10

Next task, convert this proxy to use my get.c code.
http://www.murga-linux.com/puppy/viewtopic.php?p=671246

technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Wed 28 Oct 2015, 17:39

gcc and clang reduce the following rotate right/left macros to a single instruction
Code:
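#include <limits.h> //CHAR_BIT
//for int-sized and wider x, y must not be 0 (that would shift by the full width, which is undefined)
#define ROL(x,y) (((x)<<(y))|((x)>>((sizeof(x)*CHAR_BIT)-(y))))
#define ROR(x,y) (((x)>>(y))|((x)<<((sizeof(x)*CHAR_BIT)-(y))))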


here are some associated functions:
Code:
static inline unsigned char rolb(const unsigned char x,const unsigned char y){
  return ROL(x,y);
}
static inline unsigned char rorb(const unsigned char x,const unsigned char y){
  return ROR(x,y);
}

static inline unsigned short rolw(const unsigned short x,const unsigned char y){
  return ROL(x,y);
}
static inline unsigned short rorw(const unsigned short x,const unsigned char y){
  return ROR(x,y);
}

static inline unsigned roll(const unsigned x,const unsigned char y){
  return ROL(x,y);
}

static inline unsigned rorl(const unsigned x,const unsigned char y){
  return ROR(x,y);
}

static inline unsigned long long rolll(const unsigned long long x,const unsigned char y){
  return ROL(x,y);
}
static inline unsigned long long rorll(const unsigned long long x,const unsigned char y){
  return ROR(x,y);
}
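If the rotate count can be 0 or come from untrusted input, a masked form avoids shifting by the full width. The sketch below is a commonly used idiom for 32-bit values, shown here with hypothetical names (rotl32, rotr32); whether a given compiler still reduces it to a single rotate instruction is worth checking in the generated assembly.

```c
#include <stdint.h>

/* Rotate left/right by any count; the masks keep both shift amounts in 0..31,
 * so a count of 0 or 32 never shifts by the full width. */
static uint32_t rotl32(uint32_t x, unsigned n){
   return (x << (n & 31)) | (x >> (-n & 31));
}
static uint32_t rotr32(uint32_t x, unsigned n){
   return (x >> (n & 31)) | (x << (-n & 31));
}
```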


I also rewrote most of the ctype functions (all except the wide char functions) to be branchless... it's better for compiler optimizations, especially for vectorizing code to simd when things like -mavx2 or non-standard x86_64 instruction sets are enabled

Code:
static inline int isalnum(int c){
   return ((unsigned)c-'0' < 10)|(((unsigned)c|32)-'a' < 26);
}

static inline int isalpha(int c){
   return (((unsigned)c|32)-'a' < 26);
}

static inline int isascii(int c){
   return (unsigned)c<128;
}

static inline int isblank(int c){
   return (c==' ')|(c=='\t');
}

static inline int iscntrl(int c){
   return ((unsigned)c < 0x20) | (c == 0x7f);
}

static inline int isdigit(int c){
   return (unsigned)c-'0' < 10;
}

static inline int isgraph(int c){
   return (unsigned)c-0x21 < 0x5e;
}

static inline int islower(int c){
   return (unsigned)c-'a' < 26;
}

static inline int isprint(int c){
   return (unsigned)c-0x20 < 0x5f;
}

static inline int ispunct(int c){
   return ((unsigned)c-0x21 < 0x5e) & //isgraph
   !(((unsigned)c-'0' < 10)|(((unsigned)c|32)-'a' < 26)); //!isalnum

}

static inline int isspace(int c){
   return ((unsigned)c-'\t' < 5)|(c == ' ');
}

static inline int isupper(int c){
   return (unsigned)c-'A' < 26;
}

static inline int isxdigit(int c){
   return ((unsigned)c-'0' < 10) | (((unsigned)c|32)-'a' < 6);
}

static inline int tolower(int c){
   return c | ((isupper(c))<<5);
}

static inline int toupper(int c){
   return c ^ (((unsigned)c-'a' < 26)<<5); //clear bit 5 only for 'a'..'z'
}
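Branchless versions like these are easy to check against the real <ctype.h> over the ASCII range. The sketch below uses prefixed names (bl_*, hypothetical) so both sets can coexist in one file, and it assumes the default "C" locale.

```c
#include <ctype.h>

/* Prefixed copies so they can sit next to the <ctype.h> versions. */
static int bl_isupper(int c){ return (unsigned)c-'A' < 26; }
static int bl_islower(int c){ return (unsigned)c-'a' < 26; }
static int bl_tolower(int c){ return c | (bl_isupper(c) << 5); } /* set bit 5 */
static int bl_toupper(int c){ return c ^ (bl_islower(c) << 5); } /* clear bit 5 */

/* Count ASCII codes where the branchless versions disagree with <ctype.h>. */
static int bl_mismatches(void){
   int c, bad = 0;
   for (c = 0; c < 128; c++)
      bad += (bl_tolower(c) != tolower(c)) | (bl_toupper(c) != toupper(c));
   return bad;
}
```

The same loop extends naturally to the is*() predicates by comparing !!bl_isX(c) with !!isX(c), since the standard versions only promise a nonzero result.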


Sometimes you can't eliminate all of the branches, but you can minimize them. Take strncpy() for example, where all elements before the null terminator are copied and the rest up to "n" are '\0'.

Code:
char *mystrncpy(char * restrict dest, const char * restrict src, size_t n){
  char * restrict dp=dest;
  if (n) do {
    *dp++=*src;
    src+=!!*src; //only increment src pointer till the '\0' is reached
  } while (--n);
  return dest;
}

technosaurus


Joined: 18 May 2008
Posts: 4745

Posted: Sat 12 Dec 2015, 13:39    Post subject: Moved to github

I am moving development of libc.h to github and renaming it to Brad's Quixotic C.
