Sunday, June 2, 2013

Intercepting/Redirecting Library Calls in Linux

When using third-party applications, sometimes the performance of a given library, or even the way it behaves, does not meet the user's needs.
Instead of delving into the source code (assuming it is open source at all), it is possible to redirect calls to a given library function to a custom version.


Linux makes this very easy through the LD_PRELOAD environment variable. Pointing LD_PRELOAD to a shared library causes it to be loaded before any other library, including the C runtime. To intercept a call to a function from a given library, the shared library pointed to by LD_PRELOAD must define a function with the same name and signature, which then takes precedence over the original implementation.


$ export LD_PRELOAD=/home/user/workspace/newLib/lib.so
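
One way to confirm that the preloaded library was actually picked up is to have it announce itself when it is loaded. The following is only a minimal sketch: preloadValue and announceLoad are hypothetical names, and __attribute__((constructor)) is a GCC/Clang extension, so this assumes one of those compilers.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the current value of LD_PRELOAD, or "" if unset.
std::string preloadValue() {
    const char* v = std::getenv("LD_PRELOAD");
    return v ? std::string(v) : std::string();
}

// GCC/Clang extension: runs when the shared library is loaded, before main().
__attribute__((constructor))
static void announceLoad() {
    std::fprintf(stderr, "[preload] library loaded, LD_PRELOAD=%s\n",
                 preloadValue().c_str());
}
```

If the message does not show up on stderr when App X starts, the library was most likely not loaded (wrong path, or LD_PRELOAD not exported in that shell).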


Example:


Original method called by App X, located in libX.so:

 int sqrtAndDiv(int val, int div){
      return val * val / div;   
 }  

Now, you see that the implementation has a major flaw: it does not report an error if div == 0. Therefore you want to create a better version of sqrtAndDiv:


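 #include <cstdlib>   // abort
 #include <iostream>  // std::cerr

 // extern "C" keeps the name unmangled, assuming libX.so exports it with C linkage
 extern "C" int sqrtAndDiv(int val, int div){
      if(div == 0){
           std::cerr << "Div by Zero" << std::endl;
           abort();
      }
      return val * val / div;
 }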

Compile it as a shared library (with gcc/g++, use the -shared and -fPIC flags), point LD_PRELOAD at the resulting .so, and run App X: it will now print an error and abort execution when asked to divide by zero.


Finally, suppose the library is implemented correctly and does what the user wants, and the user only wants to measure its execution time or perform some other operation around the call. The wrapper can then forward to the original implementation:

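 #include <dlfcn.h>   // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)
 #include <cstdlib>   // abort

 extern "C" int sqrtAndDiv(int val, int div){
   typedef int (*fn_t)(int, int);
   // look up the original implementation once and cache the pointer
   static fn_t ofp = (fn_t) dlsym(RTLD_NEXT, "sqrtAndDiv");
   if(ofp == 0) abort();  // no later library defines the symbol
   //do stuff, (e.g. start measuring execution time)
   int ret = ofp(val, div);
   //do stuff
   return ret;
 }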
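
The "do stuff" placeholders could, for instance, be filled in with a timing measurement. The following is only a sketch under a few assumptions: timedCall and the sqrt_and_div_fn typedef are hypothetical names introduced here, the elapsed time is simply printed to stderr, and the original function is assumed to be exported with C linkage.

```cpp
#include <dlfcn.h>   // dlsym, RTLD_NEXT
#include <chrono>
#include <cstdio>
#include <cstdlib>

typedef int (*sqrt_and_div_fn)(int, int);

// Hypothetical helper: calls fn(val, div) and stores the elapsed time in *ns.
int timedCall(sqrt_and_div_fn fn, int val, int div, long long* ns) {
    auto start = std::chrono::steady_clock::now();
    int ret = fn(val, div);
    auto stop = std::chrono::steady_clock::now();
    *ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return ret;
}

// Interposed version: forwards to the original and reports how long it took.
extern "C" int sqrtAndDiv(int val, int div) {
    // Resolved once, on the first call, and cached afterwards.
    static sqrt_and_div_fn original =
        reinterpret_cast<sqrt_and_div_fn>(dlsym(RTLD_NEXT, "sqrtAndDiv"));
    if (original == nullptr) {
        std::fprintf(stderr, "sqrtAndDiv: original implementation not found\n");
        std::abort();
    }
    long long ns = 0;
    int ret = timedCall(original, val, div, &ns);
    std::fprintf(stderr, "sqrtAndDiv(%d, %d) took %lld ns\n", val, div, ns);
    return ret;
}
```

Keeping the measurement in a separate helper makes it possible to exercise the timing logic without any preloading. On older glibc versions the shared library may also need to be linked with -ldl for dlsym.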


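The dlsym function does not call anything itself: it returns the address of a symbol that it looks up by name. The prototype is not checked, so the cast must match the real signature. With the RTLD_NEXT flag, the search starts at the library that comes after the current one in the load order, which is how the wrapper finds the original implementation. If RTLD_DEFAULT were used instead, the search would start at the beginning of the default order and return the first definition, which here would normally be the preloaded wrapper itself, leading to infinite recursion.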
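
Since dlsym can fail to find a symbol, it may be worth wrapping the lookup in a small checked helper. The sketch below is illustrative only: nextSymbol is a hypothetical name, and reporting the error on stderr is just one possible choice.

```cpp
#include <dlfcn.h>   // dlsym, dlerror, RTLD_NEXT
#include <cstdio>

// Hypothetical helper: returns the address of the next definition of `name`
// after the object containing this function, or nullptr if there is none.
void* nextSymbol(const char* name) {
    dlerror();  // clear any stale error message
    void* addr = dlsym(RTLD_NEXT, name);
    if (addr == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym(RTLD_NEXT, \"%s\") failed: %s\n",
                     name, err ? err : "symbol not found");
    }
    return addr;
}
```

The caller still has to cast the returned pointer to the correct function type, because dlsym knows nothing about prototypes.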


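Regarding the overhead of redirecting the call: on current systems, once the pointer returned by dlsym has been cached, the wrapper adds one extra indirect function call, which is on the order of nanoseconds if there are zero arguments or they are passed by reference. If the arguments are passed by value, there is the additional overhead of copying them again for the forwarded call. Note that this cost is incurred in user space, not in the kernel.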