The original paper seems to treat embedding learning as a multi-label classification problem. For example, `mov rbp, rsp` is split into 3 tokens: `mov`, `rbp`, `rsp`, and training tries to raise the classifier outputs for all 3 tokens. The problem is that the 3 tokens share a single classifier. But we already know that an assembly instruction is split into at most 3 parts: `push rbp` becomes `push`, `rbp`, `<empty>`, and `ret` becomes `ret`, `<empty>`, `<empty>`. So we could use 3 classifiers, one per slot, and treat each slot as an ordinary multi-class classification problem. The network may learn better that way. Just a thought.
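To make the three-slot idea concrete, here is a minimal sketch in plain Python (no framework, so it runs anywhere). It is not the paper's or the repository's code: `split_instruction`, `slot_loss`, and the tiny `vocab` are hypothetical names made up for illustration, and it assumes Intel-style syntax where operands are separated by commas and there are at most two of them.

```python
import math

EMPTY = "<empty>"  # padding token, named as in the post
NUM_SLOTS = 3      # opcode + at most two operands (assumption from the post)


def split_instruction(text):
    """Split one instruction into exactly NUM_SLOTS tokens, padding with EMPTY."""
    opcode, _, operands = text.strip().partition(" ")
    tokens = [opcode] + [op.strip() for op in operands.split(",") if op.strip()]
    if len(tokens) > NUM_SLOTS:
        raise ValueError(f"more than {NUM_SLOTS} tokens in {text!r}")
    return tokens + [EMPTY] * (NUM_SLOTS - len(tokens))


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of one softmax classifier against a single target class."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target_index]


def slot_loss(slot_logits, target_tokens, vocab):
    """Sum of per-slot losses: one independent logit vector (classifier) per slot."""
    return sum(
        softmax_cross_entropy(logits, vocab[token])
        for logits, token in zip(slot_logits, target_tokens)
    )


# Hypothetical toy vocabulary; a real one would be built from the corpus.
vocab = {tok: i for i, tok in enumerate([EMPTY, "mov", "push", "ret", "rbp", "rsp"])}

targets = split_instruction("push rbp")  # slot 3 is padded with EMPTY
uniform = [[0.0] * len(vocab) for _ in range(NUM_SLOTS)]  # untrained classifiers
loss = slot_loss(uniform, targets, vocab)
```

Each slot gets its own softmax over the vocabulary, so the opcode, first operand, and second operand no longer compete inside one shared output layer. In a real model the three logit vectors would come from three separate output heads on top of the shared embedding.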