Serving MoE Models
Introduction
This guide walks through the steps to serve Mixture of Experts (MoE) models, such as Mixtral 8x7B, using Friendli Containers.
Step 1. Searching for the Optimal Policy
To serve MoE models efficiently, you must first run a policy search to find the optimal execution policy. Learn how to run the policy search at Running Policy Search.
Step 2. Running Friendli Container
Once the policy search completes successfully, the optimal policy is compiled into a policy file, which you can use to create serving endpoints. Learn how to run Friendli Container with the policy file at Starting Serving Endpoint.
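The two steps depend on each other: serving needs the policy file that the search produces. As a minimal sketch of that ordering, the helper below decides which step to run next by checking whether a policy file is already present. The function name `next_step`, the directory layout, and the `*.policy` pattern are all hypothetical placeholders, not part of Friendli Container; substitute the directory and file name that your policy search actually writes.

```python
from pathlib import Path


def next_step(policy_dir: Path, pattern: str) -> str:
    """Return "search" when no policy file is present, otherwise "serve".

    Both arguments are hypothetical placeholders: `policy_dir` is wherever
    your policy search writes its output, and `pattern` is a glob that
    matches the policy file name it produces.
    """
    # A missing directory means the policy search has not been run yet.
    if not policy_dir.is_dir():
        return "search"
    # Any regular file matching the pattern counts as a compiled policy.
    has_policy = any(p.is_file() for p in policy_dir.glob(pattern))
    return "serve" if has_policy else "search"


if __name__ == "__main__":
    # Placeholder path and pattern, for illustration only.
    print(next_step(Path("./policy"), "*.policy"))
```

A check like this can guard a deployment script so that the serving endpoint is started only after Step 1 has produced its output.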