[angr] DDG with library function

Fish Wang fish at cs.ucsb.edu
Wed Jan 20 14:05:50 PST 2016

Hi Ori,


First of all, sorry for the late response.


The DDG analysis only tracks dependencies between data that is produced at one place and consumed at another place. In other words, the data itself must be used or accessed by a statement/SimProcedure/etc.
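To make the produce/consume idea concrete, here is a minimal toy sketch in plain Python. It is not angr's DDG implementation and uses no angr API: the `Action` tuple, the `build_dependence_edges` helper, and the location strings are all hypothetical names chosen for illustration. It assumes a single straight-line trace, whereas a real analysis has to merge definitions across control-flow paths.

```python
from collections import namedtuple

# Hypothetical record of one data access: where it happened, whether it
# produced ("write") or consumed ("read") data, and which variable it touched.
Action = namedtuple("Action", ["location", "kind", "variable"])

def build_dependence_edges(actions):
    """Link each read to the most recent write of the same variable."""
    last_writer = {}   # variable -> location of its latest producer
    edges = []         # (producer location, consumer location, variable)
    for action in actions:
        if action.kind == "read":
            producer = last_writer.get(action.variable)
            if producer is not None:
                edges.append((producer, action.location, action.variable))
        elif action.kind == "write":
            last_writer[action.variable] = action.location
    return edges

trace = [
    Action("main+0x10", "write", "buf"),    # data is produced here
    Action("main+0x24", "read", "buf"),     # and consumed here
    Action("main+0x30", "read", "count"),   # never written: no edge
]
edges = build_dependence_edges(trace)
```

Note that the read of `count` yields no edge because nothing in the trace ever wrote it. The same thing happens when the producer is a call whose effects were never recorded.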


Right now, DDG relies on the actions collected during CFG recovery (where we actually simulate the execution of all basic blocks in a “blanket execution” manner) and constructs a dependence graph from them. There are two reasons why you don’t see data dependencies between library calls (or SimProcedures):


- First, during CFG recovery we don’t actually execute any SimProcedure, for the sake of performance. Instead, we create a dummy stub (which does nothing) for the SimProcedure in our CFG; please read the CFG code (CFG._get_simrun() in cfg.py) to see what the logic is right now. If the execution of a SimProcedure is not simulated at all, it will not produce or consume any data, so the dependence is certainly missing. To resolve this problem, we would need some sort of “safe and fast” mode for SimProcedure simulation. For example, we could simulate malloc() but refuse to allocate more than 100 KB of memory, since oversized requests can occur during CFG recovery as a result of the inaccurate simulation. This is not implemented, but I can totally see how useful it would be.


- Second, fscanf() and scanf() may have special problems because they involve format string parsing. While our format string parsing function covers most formats, it does not cover all of them. If you see formats that are not supported by angr, you should either submit an issue to us or implement them yourself. Also, symbolic format strings are not supported at all at the moment.
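The two points above can be illustrated together with a toy model of a scanf-like call. This is only a sketch in plain Python, not angr's format string parser or SimProcedure code: `SUPPORTED`, `scanf_writes`, and the argument names are hypothetical. It shows that a simulated call has to report which destination arguments it writes (the "produced" data a dependence graph needs), and that any conversion outside the supported set has to be rejected rather than silently ignored. Length modifiers such as `l` are deliberately not handled, so `%lf` is reported as unsupported.

```python
import re

# Conversions this toy model understands; anything else is rejected.
SUPPORTED = {"d", "u", "x", "s", "c"}

# '%', optional '*' (assignment suppression), optional width,
# then a single conversion character (or '%' for a literal percent sign).
_CONVERSION = re.compile(r"%(\*?)(\d*)([a-zA-Z%])")

def scanf_writes(fmt, dest_args):
    """Return the destination arguments a scanf-like call would write to."""
    writes = []
    remaining = list(dest_args)
    for suppressed, _width, conv in _CONVERSION.findall(fmt):
        if conv == "%":        # '%%' matches a literal '%', stores nothing
            continue
        if conv not in SUPPORTED:
            raise NotImplementedError("unsupported conversion: %" + conv)
        if suppressed:         # '%*d' consumes input but stores nothing
            continue
        if not remaining:
            raise ValueError("more conversions than destination arguments")
        writes.append(remaining.pop(0))
    return writes

written = scanf_writes("%d %s", ["&n", "buf"])
```

A stub that does nothing corresponds to returning an empty list here: no writes are recorded, so no later read of `buf` can be linked back to the call.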


Also, if the library is not loaded and SimProcedures are used, angr will not analyze library code at all. If the library is loaded and certain SimProcedures (the ones you care about, fscanf() for instance) are disabled, angr will analyze the library code and treat it as part of the binary. But you’ll notice that statically analyzing library code as complicated as scanf() is not a good idea: you are almost guaranteed to lose track of data dependence inside those complex library functions.





From: angr [mailto:angr-bounces at lists.cs.ucsb.edu] On Behalf Of Yan
Sent: Thursday, January 14, 2016 2:25 PM
To: ori marcovitch <marcovitch.ori at gmail.com>
Cc: angr <angr at lists.cs.ucsb.edu>
Subject: Re: [angr] DDG with library function


This is something only @fish can help with, when he has a moment :-)


On Wed, Jan 13, 2016 at 6:25 AM, ori marcovitch <marcovitch.ori at gmail.com> wrote:

Hi all,
I'm trying to do some analysis that tracks data dependencies, and I tried to use the DDG analysis, which is amazing, but it doesn't track data dependencies from libraries.
I assume that happens because the library code isn't analyzed, but analyzing the library code isn't feasible because it's <stdio.h>...

Can anyone help me to figure out how to track such dependencies?
Specifically, I'm trying to track dependencies from fscanf and scanf.

Thank you very much,



