Build your own mixture-of-experts model on Google Colab
With the rapid evolution of large language models (LLMs) and the adoption of retrieval-augmented generation (RAG) systems, this framework is used extensively in many use cases. However, when fine-tuning these models is necessary, the computing requirement becomes a major limitation if access to large GPUs is not affordable. In this context, a different approach, called mixture of experts (MoE), is gaining popularity.
What is a mixture-of-experts model?
The main idea of the MoE approach is that instead of having a single foundation model that is supposed to be very good at many tasks (a generalist), we can have many smaller “expert” layers, each specialized in a particular task (specialists). The trick is that, in addition to the experts, we also need a gating function that decides which experts should be assigned to the input at hand, so that only a portion of the network is active at a time. This approach was termed sparsely-gated mixture of experts (MoE). The concept is not new; it has been around for a while. In 1991, Jacobs et al. [1] proposed a…
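To make the idea concrete, here is a minimal sketch of a sparsely-gated MoE layer in PyTorch. The `Expert` and `SparseMoE` classes, the layer sizes, and the top-k routing value are illustrative assumptions for this example, not the exact architecture we build later.

```python
# A minimal sketch of a sparsely-gated MoE layer (illustrative, not the article's final model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward 'specialist' network."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)


class SparseMoE(nn.Module):
    """Routes each token to the top-k experts chosen by a gating network."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(num_experts))
        self.gate = nn.Linear(d_model, num_experts)  # the gating function
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        gate_logits = self.gate(tokens)                       # (num_tokens, num_experts)
        topk_vals, topk_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)                # renormalize over the selected experts
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


# Quick check on a random batch
moe = SparseMoE(d_model=64, d_hidden=128)
y = moe(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Only the top-k experts selected by the gate run for each token, which is what keeps the computation sparse even as the total number of parameters grows.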