[angr] Angr performance

Yan zardus at gmail.com
Mon Jan 18 15:40:30 PST 2016


Hello,

The likely cause of the performance hit you're seeing is that you're
(presumably) running with self-modifying code support.

> Does the performance overhead mainly stem from IR translation at each
> instruction (also with SYMBOLIC_INITIAL_VALUES flag) ?

With self-modifying code support enabled, code has to be retrieved from the
emulated state for translation, which slows everything down. The
translation itself is quite fast, but the retrieval is slow.

SYMBOLIC_INITIAL_VALUES simply states that when uninitialized memory is
read, new symbolic values are returned. I don't think it should come into
play here.
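
To make the flag's meaning concrete, here is a toy sketch in plain Python. It is not angr's memory model: the ToyMemory class, the string "symbols", the zero-fill fallback when the flag is off, and the choice to remember the value after the first read are all assumptions made for illustration.

```python
import itertools


class ToyMemory:
    """Toy byte-addressed memory. Hypothetical; NOT angr's memory model."""

    def __init__(self, symbolic_initial_values=True):
        self.symbolic_initial_values = symbolic_initial_values
        self._cells = {}                 # addr -> concrete int or symbol name
        self._fresh = itertools.count()  # source of unique symbol suffixes

    def store(self, addr, value):
        self._cells[addr] = value

    def load(self, addr):
        if addr in self._cells:
            return self._cells[addr]
        if not self.symbolic_initial_values:
            # Assumption: placeholder behaviour for the flag being off.
            return 0
        # Uninitialized read: mint a new symbolic value. Remembering it so
        # that later reads agree is an assumption of this toy model.
        sym = "mem_%x_%d" % (addr, next(self._fresh))
        self._cells[addr] = sym
        return sym


mem = ToyMemory()
mem.store(0x10, 0x41)
concrete = mem.load(0x10)   # initialized: the stored value comes back
symbolic = mem.load(0x20)   # uninitialized: a fresh symbol comes back
```

Nothing in this path touches code retrieval or translation, which is why the flag should not matter for the slowdown discussed here.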

> TRACK_ACTION_HISTORY flag is not enabled by default. So taint tracking
> (DO_XXX, TRACK_MEMORY_XXX, TRACK_REGISTER_XXX) isn’t the major
> contributor of this overhead. Please comment.

I agree. With the tracking enabled, we see large overhead from copying the
taint around, but the per-block action tracking shouldn't impact
performance to such a noticeable degree.

> Is it TRACK_CONSTRAINT_ACTIONS that causes this overhead?

TRACK_CONSTRAINT_ACTIONS should amount to a handful of list appends per
basic block, which shouldn't be noticeable. I'd be really surprised if
they even showed up in a profile.


My suspicion (correct me if I'm wrong) is that this minute is spent running
code that is *not* generated, before the generated code is actually run.
As self-modifying code support is implemented right now, such code is still
pulled from the emulated state. As a short-term workaround, I just
implemented a block translation cache for you. If you upgrade to angr
4.6.1.18, there should be a "translation_cache" kwarg that you can pass to
Project. This reuses translated blocks instead of pulling them out of the
emulated state (but, of course, it could lead to using "stale" blocks if
the binary really modifies its own code instead of just generating new
code).
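
The trade-off of such a cache can be sketched with a toy model in plain Python. This is not the angr implementation: ToyTranslator, its (addr, size) cache key, and the fetch counter are hypothetical names used only to show why a cache removes the repeated retrieval and how a block can go stale.

```python
class ToyTranslator:
    """Toy stand-in for a block translator. Hypothetical; NOT angr's code."""

    def __init__(self, memory, use_cache=False):
        self.memory = memory        # bytearray standing in for emulated memory
        self.use_cache = use_cache
        self._cache = {}            # (addr, size) -> translated block
        self.fetches = 0            # number of slow reads from the state

    def _fetch(self, addr, size):
        self.fetches += 1
        return bytes(self.memory[addr:addr + size])

    def translate(self, addr, size):
        key = (addr, size)
        if self.use_cache and key in self._cache:
            return self._cache[key]             # reuse, no retrieval
        block = ("IR", self._fetch(addr, size))  # pretend translation
        if self.use_cache:
            self._cache[key] = block
        return block


memory = bytearray(b"\x90\x90\xc3" + b"\x00" * 5)
cached = ToyTranslator(memory, use_cache=True)
uncached = ToyTranslator(memory, use_cache=False)

# A loop that keeps executing the same block, as a decoder would.
for _ in range(1000):
    cached.translate(0, 3)
    uncached.translate(0, 3)

memory[0] = 0xCC                    # the program overwrites its own code
stale = cached.translate(0, 3)      # still the old bytes
fresh = uncached.translate(0, 3)    # sees the modification
```

In this toy run the cached translator reads the state once while the uncached one reads it on every call, and only the uncached one notices the overwritten byte.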

A medium-term solution for this would be to check whether the code actually
needs to be pulled from the state. The long-term solution, which we'll get
to as soon as we have the manpower, is to integrate Unicorn Engine
(http://www.unicorn-engine.org/) into angr so that, when symbolic data isn't
involved, we can execute code by JITing it. With that, these sorts of
operations should speed up very significantly, but we just haven't had the
cycles to implement it yet...
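
One way the medium-term check could look is sketched below as a toy model in plain Python. It is only an assumption about the approach, not a description of angr: ToyState, its per-byte dirty set, and the "image"/"state" labels are hypothetical. A real implementation would more likely track writes at page granularity.

```python
class ToyState:
    """Toy emulated state that records which addresses were written.

    Hypothetical; NOT angr's state or memory implementation.
    """

    def __init__(self, image):
        self.image = bytes(image)       # original bytes from the binary
        self.memory = bytearray(image)  # emulated memory, may be modified
        self.dirty = set()              # addresses written during execution

    def store(self, addr, data):
        self.memory[addr:addr + len(data)] = data
        self.dirty.update(range(addr, addr + len(data)))

    def code_bytes(self, addr, size):
        """Return (source, bytes) for the block at [addr, addr + size)."""
        if self.dirty.isdisjoint(range(addr, addr + size)):
            # Never written: the static image is still accurate (fast path).
            return "image", self.image[addr:addr + size]
        # Written at runtime: the bytes must come from the emulated state.
        return "state", bytes(self.memory[addr:addr + size])


state = ToyState(b"\x90\x90\xc3\x00\x00\x00\x00\x00")
state.store(4, b"\xcc\xc3")           # newly generated code at offsets 4..5
untouched = state.code_bytes(0, 3)    # original code, served from the image
generated = state.code_bytes(4, 2)    # generated code, served from the state
```

Unlike a plain translation cache, this never serves stale bytes, because any write to a block's range forces the slow path for that block.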

There's also the possibility that there's some other slowdown that we're
not aware of. Are you able to share the binary you're analyzing and the
script you're using so that we could profile it?

- Yan

On Sun, Jan 17, 2016 at 3:04 PM, spark at trendmicro.com <spark at trendmicro.com>
wrote:

> My email bounced back. Retrying with the public mailing list.
>
> From: "Sean Park (RD-AU)" <spark at trendmicro.com>
> Date: Monday, 18 January 2016 10:02 am
> To: angr <angr at lists.cs.ucsb.edu>
> Subject: Angr performance
>
> Hi All,
>
> I know angr was not designed specifically for malware analysis. I’m just
> trying to figure out which angr component actually impacts the performance,
> since I saw it take about 1 min to XOR-decode 660 bytes of data with no
> symbolic initial condition set at the entry point. In real life, malware
> unpacks tens or hundreds of kilobytes at runtime in multiple layers at
> arbitrary locations. So this performance overhead is difficult to tolerate
> from a malware analysis standpoint.
>
>    - Does the performance overhead mainly stem from IR translation at
>    each instruction (also with SYMBOLIC_INITIAL_VALUES flag) ?
>    - TRACK_ACTION_HISTORY flag is not enabled by default. So taint
>    tracking (DO_XXX, TRACK_MEMORY_XXX, TRACK_REGISTER_XXX) isn’t the
>    major contributor of this overhead. Please comment.
>    - Is it TRACK_CONSTRAINT_ACTIONS that causes this overhead?
>
> It will be much appreciated if you enlighten me with the cause of
> performance overhead. I’m only trying to understand the problem.
>
> Sean
>
> From: "Sean Park (RD-AU)" <spark at trendmicro.com>
> Date: Wednesday, 13 January 2016 9:35 am
> To: Yan <zardus at gmail.com>, Fish Wang <fish at cs.ucsb.edu>
> Cc: "Sean Park (RD-AU)" <spark at trendmicro.com>, angr <
> angr at lists.cs.ucsb.edu>
> Subject: Re: [angr] CFG for self-modifying code
>
> Thanks for your comment, Yan.
>
> I am trying to create CFG for an arbitrary piece of malware or shellcode,
> in which case you wouldn’t know at which point the code will be unpacked or
> how many layers there are. I would go with Fish’s suggestion since that’s a
> more strategic approach. I will figure it out and let you guys know how I
> go.
>
> Cheers,
> Sean
>
> From: Yan <zardus at gmail.com>
> Date: Tuesday, 12 January 2016 1:54 pm
> To: Fish Wang <fish at cs.ucsb.edu>
> Cc: "Sean Park (RD-AU)" <spark at trendmicro.com>, angr <
> angr at lists.cs.ucsb.edu>
> Subject: Re: [angr] CFG for self-modifying code
>
> If you know when the shellcode is fully unpacked (if such a point exists),
> you can push the memory contents back into CLE (Andrew can probably give
> you some pointers on doing this) and then simply treat it as a separate
> program with a different entry point. It could be cool to have official API
> support for such an action, actually (if you want to get your hands dirty
> and send along a PR!).
>
> - Yan
>
> On Mon, Jan 11, 2016 at 6:51 PM, Fish Wang <fish at cs.ucsb.edu> wrote:
>
>> Hi Sean,
>>
>>
>>
>> CFG does not support self-modifying code right now (since it’s pure
>> static analysis). You might want to use symbolic execution in angr to
>> execute or dump all the shellcode being executed. With that information,
>> it’s very easy to show addresses, instructions, and even states of
>> everything along the path. If you want to generate a CFG for self-modifying
>> code, you really have to faithfully simulate the execution, which is difficult
>> for a static analysis to do.
>>
>>
>>
>> We’ve done it for some CTF challenges (that have some simple unpacking or
>> self-modification mechanisms). They are not included in the angr-doc repo
>> though, sorry :-(
>>
>>
>>
>> Best,
>>
>> Fish
>>
>>
>>
>> *From:* angr [mailto:angr-bounces at lists.cs.ucsb.edu] *On Behalf Of *
>> spark at trendmicro.com
>> *Sent:* Monday, January 11, 2016 7:58 PM
>> *To:* angr at lists.cs.ucsb.edu
>> *Subject:* [angr] CFG for self-modifying code
>>
>>
>>
>> Hi people,
>>
>>
>>
>> I was trying to get CFG for a self-modifying shellcode. I used the
>> following code.
>>
>>
>>
>> import angr
>>
>> project = angr.Project('shellcode.exe', support_selfmodifying_code=True,
>>                        load_options={'auto_load_libs': False})
>>
>> cfg = project.analyses.CFG(keep_state=True,
>>                            enable_symbolic_back_traversal=True)
>>
>>
>>
>> It appears angr creates a CFG for the original code instead of the
>> modified code. Is there any way to get a CFG by symbolically executing the
>> code? Any example code to do this showing address and disassembly for each
>> path will be much appreciated.
>>
>>
>>
>> Regards,
>>
>> Sean
>>
>>
>>
>> TREND MICRO EMAIL NOTICE
>> The information contained in this email and any attachments is confidential
>> and may be subject to copyright or other intellectual property protection.
>> If you are not the intended recipient, you are not authorized to use or
>> disclose this information, and we request that you notify us by reply mail or
>> telephone and delete the original message from your mail system.
>>
>>
>>
>> _______________________________________________
>> angr mailing list
>> angr at lists.cs.ucsb.edu
>> https://lists.cs.ucsb.edu/mailman/listinfo/angr
>>
>>