Don’t Trust, Verify: An Overview of Decentralized Inference
Say you want to run a large language model like Llama 2 70B. A model this massive requires more than 140GB of memory at 16-bit precision, which means you can’t run the raw model on your home machine. What are your options? You might jump to a cloud…