Date

2018 12:00 AM

Major

Computer Engineering

Department

Electrical and Computer Engineering

College

Engineering

Project Advisor

Thomas Daniels

Description

This study examines whether a neural-network-based system can translate compiled executable binaries back into a human-readable format. The main goal of the project is to map out possible approaches and pitfalls for subsequent research projects, so its scope was deliberately limited: the source consists of simple C programs, and existing programs and tools were used wherever possible. The system takes C source files and converts them into tokens that a neural network can process. The tokens are then used to train the neural network to translate between the tokens and the associated compiled binary. The trained network then attempts to translate new executables. The source code was drawn from simple programs written for this analysis and compiled using GCC. ANTLR4 was used to parse the programs, and TensorFlow was used to build the neural network. The effort documented many concerns and issues. One major concern for future work is the suitability of existing tools: current machine translation techniques and tools are not applicable to decompiling machine code, and new techniques and tools need to be developed to support further efforts.
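The data preparation step described above, turning source and binary into token sequences a network can consume, might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the project's actual code: the regex tokenizer is a hypothetical stand-in for the ANTLR4-based parser, the machine-code bytes are hand-written rather than real GCC output, and the names `tokenize_c`, `tokenize_binary`, `build_vocab`, and `encode` are invented for this example.

```python
import re

# Hypothetical stand-in for the ANTLR4-based lexer: a small regex
# tokenizer that handles only very simple C.
C_TOKEN = re.compile(r"""
    [A-Za-z_]\w*                    # identifiers and keywords
  | \d+                             # integer literals
  | "(?:\\.|[^"\\])*"               # string literals
  | ==|!=|<=|>=|&&|\|\||\+\+|--     # two-character operators
  | [^\s\w]                         # any other single punctuation character
""", re.VERBOSE)


def tokenize_c(source):
    """Split C source text into a flat list of token strings."""
    return C_TOKEN.findall(source)


def tokenize_binary(blob):
    """Represent each byte of machine code as a two-digit hex token."""
    return [f"{b:02x}" for b in blob]


def build_vocab(sequences):
    """Assign an integer id to every distinct token, reserving 0 and 1."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for tok in seq:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Map tokens to ids, using the <unk> id for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


# One illustrative training pair. The bytes are a hand-written x86
# encoding of `mov eax, 42; ret`, NOT the output of an actual GCC run.
source_tokens = tokenize_c("int main() { return 42; }")
binary_tokens = tokenize_binary(b"\xb8\x2a\x00\x00\x00\xc3")

src_vocab = build_vocab([source_tokens])
bin_vocab = build_vocab([binary_tokens])

# For decompilation the binary ids are the model input and the
# source ids are the target sequence.
model_input = encode(binary_tokens, bin_vocab)
model_target = encode(source_tokens, src_vocab)
```

Pairs of integer sequences like `model_input` and `model_target` are the usual input to a sequence-to-sequence model; how the project actually aligned binaries with source tokens is not specified in the abstract.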

File Format

application/pdf

Jan 1st, 12:00 AM

Decompilation using Neural Networks
