friendli model convert
This command is deprecated and will be removed in future releases.
Use the newly created friendli-model-optimizer tool instead.
Usage
friendli model convert [OPTIONS]
Summary
Convert a Hugging Face model checkpoint to the Friendli format.
A checkpoint in the Hugging Face format cannot be served directly; it must first be converted to the Friendli format. The conversion copies the original checkpoint and transforms the copy into a Friendli-format checkpoint (*.h5 by default, or *.safetensors).
This command is available only when the package is installed with pip install "friendli-client[mllib]".
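As a rough illustration of how the required options fit together, the following Python sketch assembles a basic invocation. The model name and output directory are hypothetical placeholders, the helper `build_convert_command` is not part of friendli-client, and passing `--dry-run` as a bare flag is an assumption based on its BOOLEAN type in the options table.

```python
import shlex


def build_convert_command(model, output_dir, data_type, dry_run=False):
    """Assemble the argument list for `friendli model convert` (hypothetical helper)."""
    cmd = [
        "friendli", "model", "convert",
        "--model-name-or-path", model,
        "--output-dir", output_dir,
        "--data-type", data_type,
    ]
    if dry_run:
        # Assumption: BOOLEAN options are passed as bare flags.
        cmd.append("--dry-run")
    return cmd


# Hypothetical model name and output directory, for illustration only.
cmd = build_convert_command("my-org/my-model", "./converted", "fp16", dry_run=True)

# Print a copy-pasteable shell command.
print(shlex.join(cmd))

# To execute it from Python instead, something like this should work
# once friendli-client[mllib] is installed:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string keeps each argument intact even when a path contains spaces.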
Apply quantization
To quantize the model during conversion, pass the --quantize option. You can customize the quantization configuration by describing it in a YAML file and passing the path of that file to the --quant-config-file option. When --quantize is used without --quant-config-file, the following configuration is used by default.
# Default quantization configuration
mode: awq
device: cuda:0
seed: 42
offload: true
calibration_dataset:
  path_or_name: lambada
  format: json
  split: validation
  lookup_column_name: text
  num_samples: 128
  max_length: 512
awq_args:
  quant_bit: 4
  quant_group_size: 64
- mode: Quantization scheme to apply. Defaults to "awq".
- device: Device on which to run the quantization process. Defaults to "cuda:0".
- seed: Random seed. Defaults to 42.
- offload: When enabled, this option significantly reduces GPU memory usage by offloading model layers to CPU RAM. Defaults to true.
- calibration_dataset
  - path_or_name: Path or name of the dataset. Datasets from either the Hugging Face Datasets Hub or the local file system can be used. Defaults to "lambada".
  - format: Format of the dataset. Defaults to "json".
  - split: Which split of the data to load. Defaults to "validation".
  - lookup_column_name: The name of the dataset column to use as calibration inputs. Defaults to "text".
  - num_samples: The number of dataset samples to use for calibration. Note that the dataset is shuffled before sampling. Defaults to 128.
  - max_length: The maximum length of a calibration input sequence. Defaults to 512.
- awq_args (fill in this field only for "awq" mode)
  - quant_bit: Bit width of the integers that represent weights. Possible values are 4 or 8. Defaults to 4.
  - quant_group_size: Group size of quantized matrices. 64 is the only supported value at this time. Defaults to 64.
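To make the configuration workflow concrete, here is a minimal Python sketch that writes a custom quantization configuration file and assembles a command that points to it. The nesting under calibration_dataset and awq_args follows the field descriptions above. The model name, output directory, and the `num_samples: 64` override are hypothetical, and the `--data-type` value is only a placeholder, since this page does not say which data type should accompany `--quantize`.

```python
import tempfile
from pathlib import Path

# Same fields as the default configuration, with a smaller calibration set.
# num_samples: 64 is an illustrative value, not a recommendation.
QUANT_CONFIG = """\
mode: awq
device: cuda:0
seed: 42
offload: true
calibration_dataset:
  path_or_name: lambada
  format: json
  split: validation
  lookup_column_name: text
  num_samples: 64
  max_length: 512
awq_args:
  quant_bit: 4
  quant_group_size: 64
"""

# Write the configuration to a temporary directory for this demo.
config_path = Path(tempfile.mkdtemp()) / "quant_config.yaml"
config_path.write_text(QUANT_CONFIG)

DATA_TYPE = "fp16"  # placeholder: the page does not specify the value to pair with --quantize

cmd = [
    "friendli", "model", "convert",
    "--model-name-or-path", "my-org/my-model",  # hypothetical model name
    "--output-dir", "./converted-awq",          # hypothetical output directory
    "--data-type", DATA_TYPE,
    "--quantize",                               # assumption: passed as a bare flag
    "--quant-config-file", str(config_path),
]
print(" ".join(cmd))
```

Keeping the configuration in a file next to the conversion script makes the quantization settings easy to review and reproduce.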
If you encounter out-of-memory (OOM) issues when running AWQ, try enabling the offload option.
If you set percentile to 100 in the quantization configuration file, the quantization range is determined by the maximum absolute values of the activation tensors.
Currently, AWQ is the only supported quantization scheme.
AWQ is supported only for models with one of the following architectures:
GPTNeoXForCausalLM
GPTJForCausalLM
LlamaForCausalLM
MPTForCausalLM
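One way to check a local checkpoint against this list before starting a long conversion is to read the `architectures` field that Hugging Face checkpoints store in `config.json`. The sketch below is a hypothetical helper, not part of friendli-client; the `--dry-run` option remains the tool's own conversion availability check.

```python
import json
import tempfile
from pathlib import Path

# Architectures listed on this page as supported for AWQ.
AWQ_SUPPORTED_ARCHITECTURES = frozenset({
    "GPTNeoXForCausalLM",
    "GPTJForCausalLM",
    "LlamaForCausalLM",
    "MPTForCausalLM",
})


def supports_awq(checkpoint_dir):
    """Return True if config.json names an architecture from the AWQ list."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open() as f:
        config = json.load(f)
    # The field may be missing or null in some configs; treat that as unsupported.
    architectures = config.get("architectures") or []
    return any(arch in AWQ_SUPPORTED_ARCHITECTURES for arch in architectures)


# Demo with a stand-in config.json instead of a real checkpoint.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "config.json").write_text(
    json.dumps({"architectures": ["LlamaForCausalLM"]})
)
print(supports_awq(demo_dir))
```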
Options
Option | Type | Summary | Default | Required |
---|---|---|---|---|
--model-name-or-path, -m | TEXT | Hugging Face pretrained model name or path to the saved model checkpoint. | - | ✅ |
--output-dir, -o | TEXT | Directory path to save the converted checkpoint and related configuration files. Three files will be created in the directory: model.h5 (or model.safetensors), tokenizer.json, and attr.yaml. The model.h5 or model.safetensors file is the converted checkpoint and can be renamed using the --output-model-filename option. The tokenizer.json file is the Friendli-compatible tokenizer, which should be uploaded along with the checkpoint file to tokenize the model input and output. The attr.yaml file is the checkpoint attribute file, used when uploading the converted model to Friendli. You can set its file name using the --output-attr-filename option. | - | ✅ |
--data-type, -dt | CHOICE: [bf16, fp16, fp32, int8, int4] | The data type of the converted checkpoint. | - | ✅ |
--cache-dir | TEXT | Directory for downloading checkpoint. | None | ❌ |
--dry-run | BOOLEAN | Only check conversion availability. | False | ❌ |
--output-model-filename | TEXT | Name of the converted checkpoint file. The default file name is model.h5 when --output-ckpt-file-type is hdf5, or model.safetensors when --output-ckpt-file-type is safetensors. | None | ❌ |
--output-ckpt-file-type | CHOICE: [hdf5, safetensors] | File format of the converted checkpoint file. | hdf5 | ❌ |
--output-attr-filename | TEXT | Name of the checkpoint attribute file. | attr.yaml | ❌ |
--quantize | BOOLEAN | Quantize the model before conversion. | False | ❌ |
--quant-config-file | FILENAME | Path to the quantization configuration file. | None | ❌ |
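
The file naming rules described in the table can be summarized in a short Python sketch. The function `expected_output_files` is a hypothetical helper written for illustration; it only mirrors the defaults stated above and does not call friendli-client.

```python
from pathlib import Path

# Default checkpoint file name for each --output-ckpt-file-type value.
DEFAULT_MODEL_FILENAMES = {
    "hdf5": "model.h5",
    "safetensors": "model.safetensors",
}


def expected_output_files(output_dir, ckpt_file_type="hdf5",
                          model_filename=None, attr_filename="attr.yaml"):
    """List the three files the conversion is documented to create."""
    if ckpt_file_type not in DEFAULT_MODEL_FILENAMES:
        raise ValueError(f"unsupported checkpoint file type: {ckpt_file_type}")
    # --output-model-filename overrides the type-dependent default name.
    model_name = model_filename or DEFAULT_MODEL_FILENAMES[ckpt_file_type]
    out = Path(output_dir)
    return [out / model_name, out / "tokenizer.json", out / attr_filename]


# "./converted" is a hypothetical output directory.
for path in expected_output_files("./converted", ckpt_file_type="safetensors"):
    print(path)
```

A helper like this can be handy in scripts that upload the converted files afterwards, since it keeps the expected names in one place.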