Serving MoE Models

Introduction

This guide walks through the steps for serving Mixture of Experts (MoE) models, such as Mixtral 8x7B, with Friendli Containers.

Step 1. Searching for the Optimal Policy

To serve MoE models efficiently, you must first run a policy search to find the optimal execution policy. To learn how, see Running Policy Search.

Step 2. Running Friendli Container

Once the policy search finishes successfully, the optimal policy is compiled into a policy file, which you can use to create serving endpoints. To learn how to run Friendli Container with the policy file, see Starting Serving Endpoint.
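
Before starting the container, it can help to confirm that the policy search actually produced a file and to prepare the volume mount that exposes it. The following is a minimal sketch under stated assumptions, not part of the Friendli tooling: the helper name `policy_mount_args` and the container path `/policy` are hypothetical, and only the standard Docker `-v host:container` mount syntax is relied on. The Friendli-specific options for pointing the engine at the policy file are covered in Starting Serving Endpoint and are deliberately left out here.

```python
from pathlib import Path


def policy_mount_args(policy_dir: str, container_dir: str = "/policy") -> list[str]:
    """Return Docker `-v` arguments that mount a policy directory into a container.

    `container_dir` is a hypothetical mount point chosen for illustration,
    not a documented Friendli Container path.
    """
    host_dir = Path(policy_dir).resolve()
    if not host_dir.is_dir():
        raise FileNotFoundError(f"policy directory not found: {host_dir}")
    # A directory with no files means the policy search has not written a policy yet.
    if not any(p.is_file() for p in host_dir.iterdir()):
        raise ValueError(f"no policy file found in: {host_dir}")
    # Standard Docker volume syntax: -v <host path>:<container path>
    return ["-v", f"{host_dir}:{container_dir}"]
```

The returned list can be spliced into a `docker run` argument list, so a missing or empty policy directory fails fast on the host instead of surfacing later inside the container.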