I found a difference between the output of radare2 and xed-interface while running some experiments:
radare2 will output mov byte [rax], 0xb2
xed-interface will output mov byte ptr [rax], 0xb2
This might be a problem.
If you do not use bin2asm.py to generate the data, the assembly code you get elsewhere may not be normalized and may contain small syntactic differences like this one.
Would parsing the assembly code and normalizing it inside the asm2vec library be a better solution?
Another option might be to use keystone and capstone to assemble and then disassemble the code, to obtain a unified representation.
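To illustrate the normalization idea, here is a minimal sketch using only the Python standard library. It is not code from asm2vec: the helper name normalize_operand_size and the list of size keywords are my own assumptions, and it only handles the optional "ptr" keyword and whitespace, not other syntax differences between disassemblers.

```python
import re

# Size keywords that may be followed by an optional "ptr" in Intel syntax.
# This list is an assumption and may be incomplete.
_SIZE = r"(byte|word|dword|qword|tbyte|xmmword|ymmword|zmmword)"
_PTR_RE = re.compile(rf"\b{_SIZE}\s+ptr\b", re.IGNORECASE)
_WS_RE = re.compile(r"\s+")


def normalize_operand_size(line: str) -> str:
    """Hypothetical helper: drop 'ptr' after a size keyword, collapse whitespace."""
    line = _PTR_RE.sub(lambda m: m.group(1), line)
    return _WS_RE.sub(" ", line).strip()


radare2_line = "mov byte [rax], 0xb2"
xed_line = "mov byte ptr [rax], 0xb2"

# Both spellings map to the same string after normalization.
assert normalize_operand_size(radare2_line) == normalize_operand_size(xed_line)
```

A text-level rule like this is cheap but has to be extended for every new difference that shows up, whereas an assemble/disassemble round trip would delegate the canonical form to a single disassembler.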