How to look differently at NFTs

It feels like everyone talks (or, since it is 2023 now, talked) about NFTs and, even worse, everyone is expected to have an opinion about them. Since I am a computer scientist, my friends who are artists of course asked for mine as well. But even though I can explain to them what a Merkle tree is or how NFTs are implemented as smart contracts, I did not have any insight into the space beyond restating the loudly voiced opinions of others.

… So I spent way too much time analyzing blockchain data to see what is actually going on there. In this article, I want to highlight one step of this process: reverse engineering a single smart contract. This might come in handy if you want to understand how your favorite NFT works and whether it actually delivers what it promises.

Smart contracts on the Ethereum blockchain operate on blockchain state, meaning that they can send, receive and own Ether (ETH), Ethereum's native currency. They can also call other smart contracts, which in particular enables them to operate on derived assets such as NFTs. Smart contracts may stand alone or be part of a larger application that interacts with other smart contracts and externally owned accounts (addresses controlled by a private key rather than by code). The behavior of a smart contract is entirely defined by its code and cannot be controlled or manipulated beyond what its public, immutable implementation allows. People typically write Ethereum smart contracts in Solidity, which is, in the grand scheme of programming languages, fairly easy to write and read. What ends up on the Ethereum blockchain, however, is EVM bytecode. To give you an idea of how this looks, here is a side-by-side comparison of a smart contract that stores a single number on the blockchain, written in Solidity and compiled to bytecode. No worries, you don't need to understand the output just yet.

When talking about a smart contract, what you typically start with is the blockchain address of that contract, something like this: 0xC18360217D8F7Ab5e7c516566761Ea12Ce7F9D72. The easiest way to get an initial idea about a smart contract is to enter its address into a block explorer. For the Ethereum blockchain, that is etherscan.io. Sometimes you are in luck and the source code is available on Etherscan. In that case you can be fairly sure that what you are reading is what the smart contract actually does, because Etherscan verifies on its end that the source code compiles to the EVM bytecode stored on the blockchain. Another place to publish smart contract source code is sourcify.dev; however, in my experience it is seldom used, even though it is the official and decentralized solution.
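Conceptually, such a check boils down to recompiling the published source and comparing the result with the bytecode on chain. One subtlety, assuming the contract was compiled with a reasonably recent Solidity version: the compiler appends a CBOR-encoded metadata trailer to the code, and the last two bytes hold that trailer's length. The trailer changes with details like comments or file paths, so verifiers commonly distinguish exact matches from matches that differ only in metadata (Sourcify calls the latter a partial match). The Python sketch below is not Etherscan's implementation; the function names are hypothetical and it ignores complications such as immutable variables.

```python
def strip_solidity_metadata(runtime_hex: str) -> bytes:
    """Remove the CBOR metadata trailer that solc appends to runtime bytecode.

    Assumes the final two bytes are the big-endian length of the CBOR blob.
    If that length cannot fit, the code is returned unchanged.
    """
    code = bytes.fromhex(runtime_hex.removeprefix("0x"))
    if len(code) < 2:
        return code
    cbor_len = int.from_bytes(code[-2:], "big")
    if cbor_len + 2 > len(code):
        return code  # no plausible trailer, leave the code untouched
    return code[: -(cbor_len + 2)]


def same_code_ignoring_metadata(onchain_hex: str, compiled_hex: str) -> bool:
    """Compare two runtime bytecodes while ignoring their metadata trailers."""
    return strip_solidity_metadata(onchain_hex) == strip_solidity_metadata(compiled_hex)
```

Two builds that differ only in their trailing metadata compare equal under `same_code_ignoring_metadata`, while any change to the actual instructions does not.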

If you want to understand a smart contract in order to make an informed buying decision, and no source code is available on a trusted platform, you should stop right here and not buy. Any transparent offering should have well-documented source code available. I personally have no intention of buying anything here, though; I am just curious. If you are too, here is what you can do to understand a smart contract anyway.
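Everything that follows works on the raw bytecode, so the first practical step is getting it. Any Ethereum node exposes it through the standard JSON-RPC method `eth_getCode`. The Python sketch below uses only the standard library; `rpc_url` is a placeholder for whatever node or RPC provider you have access to, and the helper names are hypothetical.

```python
import json
import urllib.request


def get_code_request(address: str, block: str = "latest", request_id: int = 1) -> dict:
    """Build the JSON-RPC payload that asks a node for the code at `address`."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getCode",
        "params": [address, block],
        "id": request_id,
    }


def fetch_code(rpc_url: str, address: str) -> str:
    """POST the request to a node and return the hex-encoded runtime bytecode.

    rpc_url is a placeholder: point it at your own node or an RPC provider.
    """
    payload = json.dumps(get_code_request(address)).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]
```

For example, `fetch_code("http://localhost:8545", "0xC18360217D8F7Ab5e7c516566761Ea12Ce7F9D72")` would ask a local node (8545 is a common default port) for the contract's bytecode. An address without code, such as an ordinary user account, yields just `"0x"`.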

A good first step is to learn the names of the functions the smart contract offers. Unfortunately, these are not written plainly in the contract; it only contains the first four bytes of the Keccak-256 hash of each function's signature (its name followed by its parameter types), the so-called function selector. To be specific, selectors are calculated like this: keccak256("transfer(address,uint256)")[:4]. Because a hash cannot be inverted, it is impossible to know the method names for sure, but with the help of signature databases like 4byte.directory, which map known signatures to their selectors, we can get some likely candidates.
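To make the formula concrete, here is a way to compute a function selector yourself. A pitfall when trying this in Python is that `hashlib.sha3_256` implements the standardized SHA-3, which pads its input differently from the original Keccak-256 that Ethereum uses, so it produces different hashes. The sketch below therefore implements Keccak-256 directly in pure Python. It is slow and meant for illustration only; in practice you would use a maintained library (pycryptodome's `Crypto.Hash.keccak`, for example). The helper names are hypothetical.

```python
# Pure-Python Keccak-256 with the original Keccak padding, as used by Ethereum.
# Note: hashlib.sha3_256 is a different function (SHA-3 padding).

MASK = (1 << 64) - 1

# Round constants for the 24 rounds of Keccak-f[1600].
RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation offsets ROT[x][y] for the rho step.
ROT = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK


def _keccak_f(a: list[int]) -> list[int]:
    """Keccak-f[1600] permutation; lane (x, y) is stored at index x + 5*y."""
    for rc in RC:
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate every lane and move it to its new position
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], ROT[x][y])
        # chi: non-linear mixing along each row
        a = [
            b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
            for i in range(25)
        ]
        # iota: add the round constant to lane (0, 0)
        a[0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256: rate of 136 bytes, padding 0x01 ... 0x80."""
    rate = 136
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        # absorb one block as 17 little-endian 64-bit lanes
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i : off + 8 * i + 8], "little")
        state = _keccak_f(state)
    # squeeze: the first 4 lanes give the 32-byte digest
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def selector(signature: str) -> str:
    """First four bytes of keccak256(signature) as a 0x-prefixed hex string."""
    return "0x" + keccak256(signature.encode()).hex()[:8]
```

For instance, `selector("transfer(address,uint256)")` gives `0xa9059cbb`, and `selector("ownerOf(uint256)")` gives `0x6352211e`, the ownership query defined by the ERC-721 NFT standard. A signature database simply stores many such precomputed pairs so the direction can be reversed by lookup.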

Once you have an idea of the method names, it is often helpful to check how the contract has been used previously. Do the previous calls to the smart contract, if any, match your assumptions about the method names? Can you even find a better-documented smart contract that calls the one you are interested in? In any case, understanding the interactions between smart contracts is important.
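Checking previous calls mostly means looking at their call data: the first four bytes are the function selector (the truncated hash described above), followed by the arguments, which for simple types are each ABI-encoded as one 32-byte word. The Python sketch below assumes a tiny, hand-written selector table standing in for a real signature database, and only decodes `address` and `uint256` arguments; `KNOWN_SIGNATURES` and `decode_calldata` are hypothetical names. If the number of argument words does not fit the guessed signature, the guess is probably wrong.

```python
# Hypothetical mini signature table; a real lookup would query a database such
# as 4byte.directory, which may return several candidates per selector.
KNOWN_SIGNATURES = {
    "a9059cbb": "transfer(address,uint256)",
    "095ea7b3": "approve(address,uint256)",
    "23b872dd": "transferFrom(address,address,uint256)",
    "6352211e": "ownerOf(uint256)",
}


def _decode_word(word: str, abi_type: str):
    """Decode one 32-byte ABI word (64 hex chars) of a simple static type."""
    if abi_type == "address":
        return "0x" + word[-40:]  # addresses are left-padded to 32 bytes
    if abi_type == "uint256":
        return int(word, 16)
    raise NotImplementedError(f"type {abi_type} is not handled in this sketch")


def decode_calldata(calldata_hex: str):
    """Split call data into selector and 32-byte words, decoding known calls."""
    data = calldata_hex.removeprefix("0x")
    sel, args = data[:8], data[8:]
    words = [args[i : i + 64] for i in range(0, len(args), 64)]
    signature = KNOWN_SIGNATURES.get(sel)
    if signature is None:
        return f"0x{sel}", words  # unknown selector: return the raw words
    inner = signature[signature.index("(") + 1 : -1]
    types = inner.split(",") if inner else []
    if len(types) != len(words):
        raise ValueError("argument count does not match the guessed signature")
    return signature, [_decode_word(w, t) for w, t in zip(words, types)]
```

A `transfer` of 1000 units to a made-up address `0x1111…` would decode to `("transfer(address,uint256)", ["0x1111…", 1000])`.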

Speaking of other smart contracts, you might also be able to learn from similar ones. Etherscan shows you smart contracts with exactly the same bytecode, and it is worth checking whether any of those have source code or interesting interactions. Also, if a candidate method name looks uncommon, putting it into a search engine often turns up source code that compiles to similar EVM bytecode. All these techniques can give you good hints about what to expect, but the truth of what the smart contract actually does lies in its EVM bytecode.

To read EVM bytecode, the first thing to do is to take the bytes and disassemble them, for example using https://github.com/Arachnid/evmdis. You can look up the effect of any instruction here: https://ethervm.io/. If you want to dig deeper into reading EVM bytecode, it helps to understand typical code constructs that the Solidity compiler produces, such as the jump table that compares the first four bytes of the call data against each function's selector to dispatch incoming calls. There are also a few decompilers that aim to reconstruct a higher-level representation, one of them being panoramix. While these are certainly helpful, you should always be aware that their output is not always correct.

While this is certainly not a hands-on tutorial, you now have a good first overview of manually reverse engineering a smart contract. Beyond that, I was also interested in doing this automatically: is it possible to look at all smart contracts on the Ethereum blockchain and categorize them by different aspects? For example: How many of them are NFTs? Are there tokens that artificially inflate their price? What are the most popular smart contracts? I wrote about this here: https://wachter-space.de/2023/01/06/ethereum_analysis/. That article also details some more sophisticated ways to analyze smart contract interactions and to find similar smart contracts.
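One simple way to approach the "how many of them are NFTs?" question, assuming you have already extracted candidate function selectors (the four-byte hash prefixes of function signatures) from each contract's bytecode, is to check which standard interfaces those selectors fully cover. The Python sketch below does this for ERC-20 tokens and ERC-721 NFTs. It is a minimal heuristic for illustration, not necessarily the method used in the linked article; it misses proxy contracts, whose logic lives in another contract, as well as NFTs that deviate from the standard. The function names are hypothetical.

```python
from functools import reduce

# Function selectors of the standard ERC-20 and ERC-721 interfaces.
ERC20 = {
    "18160ddd",  # totalSupply()
    "70a08231",  # balanceOf(address)
    "a9059cbb",  # transfer(address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "dd62ed3e",  # allowance(address,address)
}
ERC721 = {
    "70a08231",  # balanceOf(address)
    "6352211e",  # ownerOf(uint256)
    "b88d4fde",  # safeTransferFrom(address,address,uint256,bytes)
    "42842e0e",  # safeTransferFrom(address,address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "a22cb465",  # setApprovalForAll(address,bool)
    "081812fc",  # getApproved(uint256)
    "e985e9c5",  # isApprovedForAll(address,address)
}


def interface_id(selectors: set[str]) -> str:
    """ERC-165 interface ID: the XOR of all function selectors of an interface."""
    return "0x" + format(reduce(lambda acc, s: acc ^ int(s, 16), selectors, 0), "08x")


def classify(contract_selectors: set[str]) -> list[str]:
    """Label a contract by the standard interfaces its selectors fully cover."""
    labels = []
    if ERC721 <= contract_selectors:
        labels.append("ERC-721")
    if ERC20 <= contract_selectors:
        labels.append("ERC-20")
    return labels
```

As a sanity check, `interface_id(ERC721)` gives `0x80ac58cd`, the value an ERC-721 contract is expected to acknowledge through `supportsInterface`.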