For those new to the blog, I am the lead developer of Sia, a blockchain-based cloud storage platform. About a year ago, some members of the Sia team and I started Obelisk, a cryptocurrency ASIC manufacturing company. Our first ASICs are going to ship in about 8 weeks, and our journey with Obelisk has given us a lot of insight into the world of cryptocurrency mining.
One of the reasons we started Obelisk was that we felt coin devs in general had a very poor view into the mining world, and that the best way to understand it would be to get our hands dirty and bring a miner to market.
Since starting Obelisk, we’ve learned a lot about the mining space: GPUs, ASICs, FPGAs, ASIC resistance, mining farms, electricity, and a whole host of other subjects that coin developers should be more aware of. We aren’t able to share everything that we know, but we’ve pulled together information on a set of key topics that I think will be helpful to cryptocurrency designers and other members of the cryptocurrency community.
We’ve been pessimistic about ASIC resistance for a long time, and our journey into the hardware world solidly confirmed our position. Hardware is extremely flexible. General-purpose computational devices like CPUs, GPUs, and even DRAM all make substantial compromises to their true potential in order to be useful for general computation. Even with basic hardware development, most algorithms can see substantial optimization just by stripping away all of that generality and focusing on one specific task.
The vast majority of ASIC-resistant algorithms were designed by software engineers making assumptions about the limitations of custom hardware. These assumptions tend to be incorrect.
Equihash is perhaps the easiest target. A lot of people were quite confident in its ASIC resistance, yet we’ve been saying for close to a year that we know how to make very effective Equihash ASICs.
The key is to build sorting memory: memory that can sort the data it holds. A lot of algorithm designers don’t seem to realize that in an ASIC, you can merge the computational and storage pieces of a chip. When a GPU does Equihash computations, it has to go all the way out to off-chip memory, bring data to the computational cores, manipulate the data, and then send the altered data all the way back out to the off-chip memory.
For Equihash, the manipulations you need to make to the data are simple enough that you can merge the memory and computation together and do most of the manipulation in place. This substantially reduces the energy spent moving data back and forth, and it substantially decreases the time between adjustments to the data. The result is a large increase in both efficiency and speed.
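To make the role of sorting concrete, here is a toy Python sketch of the collision-finding loop from Wagner’s generalized birthday algorithm, which Equihash is built on. It is not Zcash’s Equihash: the truncated BLAKE2b hash, the tiny parameter sizes, and the omission of Equihash’s index-ordering rules are simplifications of ours, and `toy_hash`, `bucket_key`, and `toy_equihash` are hypothetical names. Each round sorts the whole list by a few bits and XORs together entries that collide. That repeated sort over a large list is the data movement a GPU has to push through off-chip memory, and it is the step that sorting memory would handle in place.

```python
import hashlib
from itertools import groupby


def toy_hash(seed: bytes, index: int, n: int) -> int:
    """Hypothetical stand-in for Equihash's hash: BLAKE2b truncated to n bits."""
    data = seed + index.to_bytes(4, "little")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << n) - 1)


def bucket_key(shift: int, mask: int):
    """Key function selecting one collision field of an entry's running XOR."""
    return lambda entry: (entry[0] >> shift) & mask


def toy_equihash(seed: bytes, n: int, k: int) -> list[list[int]]:
    """Toy Wagner-style search for 2**k distinct indices whose hashes XOR to zero."""
    chunk = n // (k + 1)
    mask = (1 << chunk) - 1
    # Each entry is (running XOR of hashes, tuple of original indices).
    entries = [(toy_hash(seed, i, n), (i,)) for i in range(1 << (chunk + 1))]
    for r in range(k):
        key = bucket_key(n - (r + 1) * chunk, mask)
        # On a GPU this sort is the expensive part: every entry travels out to
        # off-chip memory and back just to be reordered by a few bits.
        entries.sort(key=key)
        merged = []
        for _, group in groupby(entries, key=key):
            group = list(group)
            for a in range(len(group)):
                for b in range(a + 1, len(group)):
                    (va, ia), (vb, ib) = group[a], group[b]
                    if set(ia).isdisjoint(ib):
                        merged.append((va ^ vb, ia + ib))
        entries = merged
    # k rounds zero the top k*chunk bits; keep entries whose remaining bits are zero too.
    return [sorted(idx) for value, idx in entries if value == 0]
```

For example, `toy_equihash(b"block header", 12, 2)` returns a (possibly empty) list of 4-index solutions. Real parameter sets make the list millions of entries long, which is why the sort, not the hashing, dominates the cost.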
Needless to say, we weren’t the least bit surprised when Bitmain released powerful ASICs for Equihash. The Bitmain ASICs actually perform substantially worse (by 5x to 10x) than our own internal study suggested they could. There could be many reasons for this, but overall we think it’s pretty reasonable to assume that more powerful Equihash ASICs will be released in the coming months.
We also had loose designs for Ethash (Ethereum’s algorithm). Admittedly, Ethash was not as amenable to ASICs as Equihash, but as we’ve seen from products on the market today, you can still do well enough to obsolete GPUs. Ethash is by far the most ASIC-resistant algorithm we’ve looked at; most of the others have shortcuts that are even more significant than the shortcuts you can take with Equihash.
At the end of the day, you will always be able to create custom hardware that can outperform general purpose hardware. I can’t stress enough that everyone I’ve talked to in favor of ASIC resistance has consistently and substantially underestimated the flexibility that hardware engineers have to design around specific problems, even under a constrained budget. For any algorithm, there will always be a path that custom hardware engineers can take to beat out general purpose hardware. It’s a fundamental limitation of general purpose hardware.
A lot of people believe that computing is broken up into 3 categories: CPU, GPU, and ASIC. While those are the categories that are generally visible to the public, in the chip world there’s really only one type of chip: an ASIC. Internally, Nvidia, Intel, and other companies refer to their products as ASICs. The categories as known to the public are really a statement about how flexible the ASIC is.
I would like to use a 1-to-10 scale to measure flexibility. At one end, a ‘1’, we’ll put an Intel CPU. At the other end, a ‘10’, we’ll put a bitcoin ASIC. Designers have the ability to create chips that fall anywhere on this scale. As you move from a ‘1’ to a ‘10’, you lose substantial flexibility but gain substantial performance. You also decrease the amount of design and development effort required as you sacrifice flexibility. On this scale, a GPU is a ‘2’.
Generally speaking, we don’t see products that fall anywhere between a GPU and a fully inflexible ASIC. Typically, by the time you’ve given up enough flexibility to move away from a GPU, you’ve only got a very specific application in mind, and you are willing to sacrifice every last bit of flexibility to maximize performance. It’s also a lot less costly to design fully inflexible ASICs, which is another reason you don’t see too many things in the middle.
Two examples of products between a GPU and an ASIC would be the Baikal miners and the Google TPU. These are chips that can cover a flexible range of applications with performance substantially better than a GPU’s. The Baikal case is especially interesting, because the same basic chip is good enough to obsolete GPUs for a large number of coins. These chips also appear to be flexible enough to follow hardforks.
The strategy of hardforking ASICs off of a network is going to lose potency the more it happens, because chip designers do have the ability to make chips that are flexible, anywhere from slightly flexible to highly flexible, with each piece of flexibility costing only a bit of performance. The Monero devs have committed to keeping the same general structure for the PoW algorithm, and because of that commitment we believe that you could make a Monero miner capable of surviving hard forks with less than a 5x hit to performance.
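A simple way to see why small per-feature costs add up slowly is a multiplicative model. This is our illustrative assumption, not a reconstruction of how the Monero estimate above was made: if flexibility feature i slows the chip by a factor of (1 + o_i), the total slowdown is the product of those factors, and with a uniform overhead o the number of features that fit inside a slowdown budget B is floor(ln B / ln(1 + o)). The function names and the 10% per-feature overhead used below are hypothetical.

```python
import math
from typing import Iterable


def combined_slowdown(overheads: Iterable[float]) -> float:
    """Total slowdown, assuming each feature independently costs a factor (1 + o)."""
    total = 1.0
    for o in overheads:
        total *= 1.0 + o
    return total


def max_features(budget: float, overhead: float) -> int:
    """How many equal-cost features fit inside a slowdown budget (simple model)."""
    if budget < 1 or overhead <= 0:
        raise ValueError("budget must be >= 1 and overhead must be > 0")
    return math.floor(math.log(budget) / math.log1p(overhead))
```

With a hypothetical 10% cost per feature, `max_features(5, 0.10)` gives 16, since 1.1^16 is about 4.6 while 1.1^17 is about 5.05. Under this model, a chip can carry quite a few independent hooks for tracking algorithm tweaks before it approaches a 5x penalty.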
Equihash is an algorithm that has three parameters. Zcash mining happens with one particular choice for these parameters, and any naive hardfork from Zcash to drop ASICs would likely involve changing one or more of them. We were able to come up with a basic architecture for Equihash ASICs that would be able to successfully follow a hardfork that chose any set of parameters. In other words, a basic hardfork tweaking the algorithm parameters would not be enough to disrupt our chip; a more fundamental change would be needed. Despite this flexibility, we believe our ASIC would have seen massive speedups and efficiency gains over GPUs. We never found funding for the Equihash ASICs, and as a result our designs ended up on the shelf.
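As a rough illustration of what following “any set of parameters” entails, the sketch below derives the problem shape from the two size parameters usually written n and k. Zcash uses n = 200, k = 9; the (144, 5) pair in the usage line is included only as an example of an alternative choice. The formulas follow the Equihash construction (collisions on n/(k+1) bits per round, an initial list of 2^(n/(k+1)+1) hashes, and 2^k indices per solution, each packed into n/(k+1)+1 bits). `equihash_shape` and `EquihashShape` are hypothetical names, and this is not a description of our chip’s architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EquihashShape:
    n: int
    k: int
    collision_bits: int        # bits matched in each collision round
    list_length: int           # number of hashes in the initial list
    indices_per_solution: int  # 2**k indices in a valid solution
    solution_bytes: int        # packed solution size


def equihash_shape(n: int, k: int) -> EquihashShape:
    """Derive the sizes a parameter-agnostic solver would have to handle."""
    if k < 1 or n % (k + 1) != 0:
        raise ValueError("k must be >= 1 and n must be divisible by k + 1")
    collision_bits = n // (k + 1)
    index_bits = collision_bits + 1
    indices = 1 << k
    solution_bits = indices * index_bits
    return EquihashShape(
        n=n,
        k=k,
        collision_bits=collision_bits,
        list_length=1 << index_bits,
        indices_per_solution=indices,
        solution_bytes=(solution_bits + 7) // 8,
    )
```

For Zcash, `equihash_shape(200, 9)` gives 20-bit collisions, a list of 2^21 hashes, and 512-index solutions of 1344 bytes, while `equihash_shape(144, 5)` gives 24-bit collisions and a list of 2^25 hashes. A chip meant to survive parameter hardforks would need sorting logic that works across the whole range of collision widths it supports, plus enough memory for the longest list it might be asked to hold.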
The ultimate conclusion here once again wraps back to the capabilities of ASICs. I think there are a lot of people out there who do not realize that flexible ASICs are possible, and who expect that routinely doing small hardforks to disrupt any ASICs on the network will be sufficient. It may be sufficient sometimes, but just as algorithms can attempt to be ASIC resistant, ASICs can attempt to be hardfork resistant, especially when the changes are relatively minor.