[angr] Angr performance

spark at trendmicro.com spark at trendmicro.com
Tue Jan 19 02:42:01 PST 2016


Hi Yan,

Find the attached executable (pw: infected). It’s an EXE crafted from real shellcode I’ve seen in the wild. The example code below breaks at the beginning of the unpacked code (0x401017). As you suspected, it takes nearly a minute to run the decryption routine before jumping to the beginning of the unpacked code. I will try ‘translation_cache’ and let you know how the code performs with it.

Malware (and, more often, shellcode) sometimes jumps into the middle of an instruction, which would make translation_cache mode unusable for arbitrary binaries.

Thanks for the pointer to the Unicorn project. It seems like a good candidate to help with my project.


import angr

# Uses the path-group API that angr exposed at the time of this email.
project = angr.Project('shellcode.exe', support_selfmodifying_code=True, load_options={'auto_load_libs': False})

def check(path):
    # Stop once execution reaches the start of the unpacked code.
    return path.state.ip.args[0] == 0x401017

def BreakAtMemoryWrite():
    state = project.factory.entry_state()  # add_options=set(["TRACK_ACTION_HISTORY"]))
    pg = project.factory.path_group(state, immutable=False)

    pg.explore(find=check)
    #found = pg.found[0]

    # Number of paths that reached 0x401017.
    print(len(pg.found))

BreakAtMemoryWrite()

Regards,
Sean

From: Yan <zardus at gmail.com<mailto:zardus at gmail.com>>
Date: Tuesday, 19 January 2016 10:40 am
To: "Sean Park (RD-AU)" <spark at trendmicro.com<mailto:spark at trendmicro.com>>
Cc: "angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>" <angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>>
Subject: Re: [angr] Angr performance

Hello,

The likely cause of the performance hit you're seeing is that you're (presumably) running with self-modifying code support.

> Does the performance overhead mainly stem from IR translation at each instruction
> (also with SYMBOLIC_INITIAL_VALUES flag) ?

With self-modifying code support enabled, code has to be retrieved from the emulated state for translation, which slows everything down. The translation itself is quite fast, but the retrieval is slow.

SYMBOLIC_INITIAL_VALUES simply states that when uninitialized memory is read, new symbolic values are returned. I don't think it should come into play here.

> TRACK_ACTION_HISTORY flag is not enabled by default. So taint tracking
> (DO_XXX, TRACK_MEMORY_XXX, TRACK_REGISTER_XXX) isn’t the major
> contributor of this overhead. Please comment.

I agree. With the tracking enabled, we see large overhead from copying the taint around, but the per-block action tracking shouldn't impact performance to such a noticeable degree.

> Is it TRACK_CONSTRAINT_ACTIONS that causes this overhead?

The effects of TRACK_CONSTRAINT_ACTIONS should result in a handful of list appends per basic block, which shouldn't be noticeable. I'd be really surprised if they'd even show up on a profiling test.


My suspicion (correct me if I'm wrong) is that this minute is taken up by running code that is *not* generated, before the code that was generated is actually run. As self-modifying code support exists right now, such code will still be pulled from the emulated state. As a short term workaround, I just implemented a block translation cache for you. If you upgrade to angr 4.6.1.18, there should be a "translation_cache" kwarg that you can pass to Project. This will reuse translated blocks instead of pulling them out of the emulated state (but, of course, could lead to using "stale" blocks if the binary really modifies its own code instead of just generating new code).
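
To make the trade-off concrete, here is a minimal sketch of the general idea behind a block translation cache. It is not angr's actual implementation: `BlockCache`, `fake_translate`, and the `memory` bytearray are hypothetical stand-ins for the lifter and the emulated state.

```python
# Hypothetical stand-ins: `memory` plays the emulated state and
# `fake_translate` plays the lifter. Neither is angr's real API.
class BlockCache:
    def __init__(self, translate):
        self._translate = translate
        self._blocks = {}   # block start address -> translated block
        self.misses = 0     # how often code had to be pulled from memory

    def get(self, memory, addr, size):
        if addr not in self._blocks:
            self.misses += 1
            code = bytes(memory[addr:addr + size])  # the slow retrieval step
            self._blocks[addr] = self._translate(code)
        return self._blocks[addr]


def fake_translate(code):
    # Placeholder "IR": just the hex string of the block's bytes.
    return code.hex()


memory = bytearray(b"\x90\x90\xc3" + b"\x00" * 13)
cache = BlockCache(fake_translate)

first = cache.get(memory, 0, 3)   # pulled from memory and translated
again = cache.get(memory, 0, 3)   # served from the cache, no retrieval
memory[0] = 0xCC                  # the program overwrites its own code
stale = cache.get(memory, 0, 3)   # still the old translation
```

After the overwrite, `stale` still reflects the original bytes, which is the "stale block" risk described above. Code that is only generated once and then executed never hits that case.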

A medium-term solution for this would be to check if the code actually needs to be pulled from the state. The long-term solution, that we'll get to as soon as we have the manpower, is to integrate Unicorn Engine (http://www.unicorn-engine.org/) into angr so that, when symbolic data isn't involved, we can execute code by JITing it. With that, these sorts of operations should speed up very significantly, but we just haven't had the cycles to implement it yet...
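
One way such a check could look, purely as a sketch: record which pages the program has written to, and only pull code from the state when the block overlaps one of them. `TrackedMemory`, `needs_state_fetch`, and the 4 KiB page size are assumptions made for illustration, not anything angr provides.

```python
PAGE_SIZE = 0x1000  # assumed page granularity for this sketch


class TrackedMemory:
    """Hypothetical emulated memory that records which pages were written."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty_pages = set()

    def write(self, addr, payload):
        self.data[addr:addr + len(payload)] = payload
        first = addr // PAGE_SIZE
        last = (addr + len(payload) - 1) // PAGE_SIZE
        self.dirty_pages.update(range(first, last + 1))


def needs_state_fetch(memory, addr, size):
    """True if any page the block touches has been written since load."""
    first = addr // PAGE_SIZE
    last = (addr + size - 1) // PAGE_SIZE
    return any(page in memory.dirty_pages for page in range(first, last + 1))


mem = TrackedMemory(4 * PAGE_SIZE)
mem.write(0x2000, b"\x90\xc3")  # an unpacker writes generated code into page 2

untouched = needs_state_fetch(mem, 0x1000, 16)  # block in a clean page
generated = needs_state_fetch(mem, 0x2000, 2)   # block in the written page
straddle = needs_state_fetch(mem, 0x1FF8, 16)   # block spanning pages 1 and 2
```

Blocks that lie entirely in unwritten pages could then be translated straight from the original binary, and only blocks touching a written page would pay the cost of retrieval from the state.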

There's also the possibility that there's some other slowdown that we're not aware of. Are you able to share the binary you're analyzing and the script you're using so that we could profile it?

- Yan

On Sun, Jan 17, 2016 at 3:04 PM, spark at trendmicro.com<mailto:spark at trendmicro.com> <spark at trendmicro.com<mailto:spark at trendmicro.com>> wrote:
My email bounced back. Retrying with the public mailing list.

From: "Sean Park (RD-AU)" <spark at trendmicro.com<mailto:spark at trendmicro.com>>
Date: Monday, 18 January 2016 10:02 am
To: angr <angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>>
Subject: Angr performance

Hi All,

I know angr was not designed specifically for malware analysis. I’m just trying to figure out which angr component actually impacts performance, since I saw it take about 1 minute to XOR-decode 660 bytes of data with no symbolic initial condition set at the entry point. In real life, malware unpacks tens or hundreds of kilobytes at runtime, in multiple layers, at arbitrary locations. So this performance overhead is difficult to tolerate from a malware analysis standpoint.
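
For scale, this is roughly what the concrete workload amounts to: one XOR per byte over 660 bytes. The single-byte key and the input bytes below are made up for illustration; the real sample's key and data are not shown in this thread.

```python
def xor_decode(data, key):
    """XOR every byte of `data` with a single-byte key."""
    return bytes(b ^ key for b in data)


KEY = 0x5A  # hypothetical key, not taken from the real sample
encoded = bytes((i * 7 + 3) & 0xFF for i in range(660))  # placeholder payload
decoded = xor_decode(encoded, KEY)
```

Because XOR is its own inverse, applying `xor_decode` to `decoded` with the same key returns the original bytes.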

  *   Does the performance overhead mainly stem from IR translation at each instruction (also with SYMBOLIC_INITIAL_VALUES flag) ?
  *   TRACK_ACTION_HISTORY flag is not enabled by default. So taint tracking (DO_XXX, TRACK_MEMORY_XXX, TRACK_REGISTER_XXX) isn’t the major contributor of this overhead. Please comment.
  *   Is it TRACK_CONSTRAINT_ACTIONS that causes this overhead?

I would much appreciate it if you could explain the cause of the performance overhead. I’m only trying to understand the problem.

Sean

From: "Sean Park (RD-AU)" <spark at trendmicro.com<mailto:spark at trendmicro.com>>
Date: Wednesday, 13 January 2016 9:35 am
To: Yan <zardus at gmail.com<mailto:zardus at gmail.com>>, Fish Wang <fish at cs.ucsb.edu<mailto:fish at cs.ucsb.edu>>
Cc: "Sean Park (RD-AU)" <spark at trendmicro.com<mailto:spark at trendmicro.com>>, angr <angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>>
Subject: Re: [angr] CFG for self-modifying code

Thanks for your comment, Yan.

I am trying to create a CFG for an arbitrary piece of malware or shellcode, in which case you wouldn’t know at which point the code will be unpacked or how many layers there are. I will go with Fish’s suggestion since that’s a more strategic approach. I will figure it out and let you guys know how I go.

Cheers,
Sean

From: Yan <zardus at gmail.com<mailto:zardus at gmail.com>>
Date: Tuesday, 12 January 2016 1:54 pm
To: Fish Wang <fish at cs.ucsb.edu<mailto:fish at cs.ucsb.edu>>
Cc: "Sean Park (RD-AU)" <spark at trendmicro.com<mailto:spark at trendmicro.com>>, angr <angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>>
Subject: Re: [angr] CFG for self-modifying code

If you know when the shellcode is fully unpacked (if such a point exists), you can push the memory contents back into CLE (Andrew can probably give you some pointers on doing this) and then simply treat it as a separate program with a different entry point. It could be cool to have official API support for such an action, actually (if you want to get your hands dirty and send along a PR!).

- Yan

On Mon, Jan 11, 2016 at 6:51 PM, Fish Wang <fish at cs.ucsb.edu<mailto:fish at cs.ucsb.edu>> wrote:
Hi Sean,

CFG does not support self-modifying code right now (since it’s pure static analysis). You might want to use symbolic execution in angr to execute or dump all the shellcode being executed. With that information, it’s very easy to show addresses, instructions, and even states of everything along the path. If you want to generate a CFG for self-modifying code, you really have to faithfully simulate the execution, which is difficult for a static analysis to do.
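
As a rough sketch of that idea, a graph can be assembled from the block addresses visited along an executed path. The trace below is invented for illustration (only 0x401017 appears elsewhere in this thread), and `cfg_from_trace` is a hypothetical helper, not part of angr.

```python
from collections import defaultdict


def cfg_from_trace(trace):
    """Build {block_addr: set(successor_addrs)} from one executed path."""
    edges = defaultdict(set)
    for src, dst in zip(trace, trace[1:]):
        edges[src].add(dst)
    if trace:
        # Make sure the final block shows up as a node even without successors.
        edges.setdefault(trace[-1], set())
    return dict(edges)


# Hypothetical trace: entry block, a decode loop taken three times, then the
# jump to the unpacked code.
trace = [0x401000, 0x401005, 0x401005, 0x401005, 0x401017]
cfg = cfg_from_trace(trace)

for addr in sorted(cfg):
    print(hex(addr), "->", sorted(hex(s) for s in cfg[addr]))
```

A graph built this way only covers the paths that were actually executed, so traces from several paths would need to be merged to approximate a full CFG.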

We’ve done it for some CTF challenges (that have some simple unpacking or self-modification mechanisms). They are not included in the angr-doc repo though, sorry :-(

Best,
Fish

From: angr [mailto:angr-bounces at lists.cs.ucsb.edu<mailto:angr-bounces at lists.cs.ucsb.edu>] On Behalf Of spark at trendmicro.com<mailto:spark at trendmicro.com>
Sent: Monday, January 11, 2016 7:58 PM
To: angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>
Subject: [angr] CFG for self-modifying code

Hi people,

I was trying to get a CFG for self-modifying shellcode. I used the following code.

import angr

project = angr.Project('shellcode.exe', support_selfmodifying_code=True, load_options={'auto_load_libs': False})
cfg = project.analyses.CFG(keep_state=True, enable_symbolic_back_traversal=True)

It appears angr creates a CFG for the original code instead of the modified code. Is there any way to get a CFG by symbolically executing the code? Any example code to do this showing address and disassembly for each path will be much appreciated.

Regards,
Sean



TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential

and may be subject to copyright or other intellectual property protection.

If you are not the intended recipient, you are not authorized to use or

disclose this information, and we request that you notify us by reply mail or

telephone and delete the original message from your mail system.




_______________________________________________
angr mailing list
angr at lists.cs.ucsb.edu<mailto:angr at lists.cs.ucsb.edu>
https://lists.cs.ucsb.edu/mailman/listinfo/angr



-------------- next part --------------
A non-text attachment was scrubbed...
Name: shellcode.zip
Type: application/zip
Size: 1124 bytes
Desc: shellcode.zip
URL: <http://lists.cs.ucsb.edu/pipermail/angr/attachments/20160119/352c317d/attachment-0001.zip>

