friendli model convert

Usage

friendli model convert [OPTIONS]

Summary

Convert a Hugging Face model checkpoint to the Friendli format.

A checkpoint in the Hugging Face format cannot be served directly; it must first be converted to the Friendli format. The conversion copies the original checkpoint and transforms the copy into a Friendli-format checkpoint (*.h5 by default).

caution

The friendli model convert command is available only when the package is installed with pip install "friendli-client[mllib]".

Apply quantization

To quantize the model during conversion, pass the --quantize option. To customize the quantization configuration, describe it in a YAML file and pass the path of that file to the --quant-config-file option. When --quantize is used without --quant-config-file, the following default configuration is applied.

# Default quantization configuration
mode: awq
device: cuda:0
seed: 42
offload: true
calibration_dataset:
    path_or_name: lambada
    format: json
    split: validation
    lookup_column_name: text
    num_samples: 128
    max_length: 512
awq_args:
    quant_bit: 4
    quant_group_size: 64

mode: Quantization scheme to apply. Defaults to "awq".
device: Device to run the quantization process. Defaults to "cuda:0".
seed: Random seed. Defaults to 42.
offload: When enabled, this option significantly reduces GPU memory usage by offloading model layers onto CPU RAM. Defaults to true.
calibration_dataset
- path_or_name: Path or name of the dataset. Datasets from either the Hugging Face Datasets Hub or local file system can be used. Defaults to "lambada".
- format: Format of datasets. Defaults to "json".
- split: Which split of the data to load. Defaults to "validation".
- lookup_column_name: The name of a column in the dataset to be used as calibration inputs. Defaults to "text".
- num_samples: The number of dataset samples to use for calibration. Note that the dataset will be shuffled before sampling. Defaults to 128.
- max_length: The maximum length of a calibration input sequence. Defaults to 512.
awq_args (Fill in this field only for "awq" mode)
- quant_bit: Bit width of the integers used to represent weights. Possible values are 4 and 8. Defaults to 4.
- quant_group_size: Group size of quantized matrices. 64 is the only supported value at this time. Defaults to 64.
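
The fields described above can be assembled programmatically before writing a file for --quant-config-file. The following is a minimal sketch, not part of the Friendli client: `make_quant_config` and `to_yaml` are hypothetical helper names, the defaults are copied from the default configuration shown above, and the checks only cover the constraints stated in the field descriptions.

```python
import copy

# Defaults copied from the default quantization configuration above.
DEFAULT_QUANT_CONFIG = {
    "mode": "awq",
    "device": "cuda:0",
    "seed": 42,
    "offload": True,
    "calibration_dataset": {
        "path_or_name": "lambada",
        "format": "json",
        "split": "validation",
        "lookup_column_name": "text",
        "num_samples": 128,
        "max_length": 512,
    },
    "awq_args": {"quant_bit": 4, "quant_group_size": 64},
}


def make_quant_config(overrides):
    """Return a copy of the defaults with overrides applied (hypothetical helper)."""
    config = copy.deepcopy(DEFAULT_QUANT_CONFIG)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)  # merge nested sections such as awq_args
        else:
            config[key] = value
    # Constraints stated in the field descriptions.
    if config["mode"] != "awq":
        raise ValueError("awq is the only supported quantization scheme")
    if config["awq_args"]["quant_bit"] not in (4, 8):
        raise ValueError("quant_bit must be 4 or 8")
    if config["awq_args"]["quant_group_size"] != 64:
        raise ValueError("quant_group_size must be 64")
    return config


def to_yaml(mapping, indent=0):
    """Render nested dicts of simple scalars as YAML text (hypothetical helper)."""
    pad = " " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 4))
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: 8-bit weights and a different seed; everything else stays default.
    print(to_yaml(make_quant_config({"seed": 7, "awq_args": {"quant_bit": 8}})))
```

The printed text can be saved to a file and its path passed to --quant-config-file. The serializer handles only the flat scalars used in this configuration; a full YAML library is the safer choice for anything more complex.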

tip

If you encounter OOM issues when running with AWQ, try enabling the offload option.

tip

If you set percentile to 100 in the quantization configuration file, the quantization range is determined by the maximum absolute values of the activation tensors.
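
To illustrate why a percentile of 100 corresponds to the maximum absolute value, here is a small sketch. It assumes a nearest-rank percentile over absolute values; the exact method used by the tool is not documented here, and `quantization_range` is a hypothetical name.

```python
import math


def quantization_range(values, percentile):
    """Nearest-rank percentile of the absolute values (illustrative assumption)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    magnitudes = sorted(abs(v) for v in values)
    # Nearest-rank: the smallest magnitude that covers `percentile` percent of samples.
    rank = max(1, math.ceil(percentile / 100 * len(magnitudes)))
    return magnitudes[rank - 1]


if __name__ == "__main__":
    activations = [-8.0, 0.5, 1.0, 2.0]
    print(quantization_range(activations, 100))  # the maximum absolute value, 8.0
    print(quantization_range(activations, 75))   # the outlier -8.0 is excluded
```

A lower percentile clips outliers from the range, while 100 keeps every sample inside it.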

info

Currently, AWQ is the only supported quantization scheme.

info

AWQ is supported only for the following model architectures:

GPTNeoXForCausalLM
GPTJForCausalLM
LlamaForCausalLM
MPTForCausalLM
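
A quick pre-flight check against this list might look like the sketch below. It assumes the names above correspond to the `architectures` field of the model's Hugging Face `config.json`; the helper names are hypothetical and the check is not part of the Friendli client.

```python
import json

# Architectures listed above as supported for AWQ.
SUPPORTED_AWQ_ARCHITECTURES = {
    "GPTNeoXForCausalLM",
    "GPTJForCausalLM",
    "LlamaForCausalLM",
    "MPTForCausalLM",
}


def supports_awq(hf_config):
    """Return True if any architecture named in the config dict is in the list."""
    architectures = hf_config.get("architectures") or []
    return any(name in SUPPORTED_AWQ_ARCHITECTURES for name in architectures)


def supports_awq_from_file(config_path):
    """Load a config.json from disk and run the same check."""
    with open(config_path, encoding="utf-8") as f:
        return supports_awq(json.load(f))


if __name__ == "__main__":
    print(supports_awq({"architectures": ["LlamaForCausalLM"]}))
```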

Options

Option	Type	Summary	Default	Required
`--model-name-or-path`, `-m`	TEXT	Hugging Face pretrained model name or path to the saved model checkpoint.	-	✅
`--output-dir`, `-o`	TEXT	Directory in which to save the converted checkpoint and related configuration files. Three files are created in this directory: the converted checkpoint (`model.h5` or `model.safetensors`), `tokenizer.json`, and `attr.yaml`. The checkpoint file can be renamed with the `--output-model-filename` option. `tokenizer.json` is the Friendli-compatible tokenizer file, which should be uploaded along with the checkpoint file to tokenize the model input and output. `attr.yaml` is the checkpoint attribute file, used when uploading the converted model to Friendli; it can be renamed with the `--output-attr-filename` option.	-	✅
`--data-type`, `-dt`	CHOICE: [bf16, fp16, fp32, int8, int4]	Data type of the converted checkpoint.	-	✅
`--cache-dir`	TEXT	Directory for downloading the checkpoint.	None	❌
`--dry-run`	BOOLEAN	Only check conversion availability.	False	❌
`--output-model-filename`	TEXT	Name of the converted checkpoint file. The default file name is `model.h5` when `--output-ckpt-file-type` is `hdf5`, or `model.safetensors` when `--output-ckpt-file-type` is `safetensors`.	None	❌
`--output-ckpt-file-type`	CHOICE: [hdf5, safetensors]	File format of the converted checkpoint file.	hdf5	❌
`--output-attr-filename`	TEXT	Name of the checkpoint attribute file.	attr.yaml	❌
`--quantize`	BOOLEAN	Quantize the model before conversion.	False	❌
`--quant-config-file`	FILENAME	Path to the quantization configuration file.	None	❌
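
The options in the table can be combined into a command line from a script. The sketch below only builds the argument list and does not run anything. It assumes the BOOLEAN options are passed as bare flags; `build_convert_command` and `default_model_filename` are hypothetical helpers, and the model name and paths are placeholders.

```python
import shlex

DATA_TYPES = ("bf16", "fp16", "fp32", "int8", "int4")
CKPT_FILE_TYPES = ("hdf5", "safetensors")


def default_model_filename(ckpt_file_type="hdf5"):
    """Default checkpoint file name for each output file type, per the table."""
    if ckpt_file_type not in CKPT_FILE_TYPES:
        raise ValueError(f"unsupported file type: {ckpt_file_type}")
    return "model.h5" if ckpt_file_type == "hdf5" else "model.safetensors"


def build_convert_command(model, output_dir, data_type, *, ckpt_file_type=None,
                          quantize=False, quant_config_file=None, dry_run=False):
    """Build the argument list for `friendli model convert` (hypothetical helper)."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unsupported data type: {data_type}")
    cmd = [
        "friendli", "model", "convert",
        "--model-name-or-path", model,
        "--output-dir", output_dir,
        "--data-type", data_type,
    ]
    if ckpt_file_type is not None:
        default_model_filename(ckpt_file_type)  # validates the choice
        cmd += ["--output-ckpt-file-type", ckpt_file_type]
    # Assumption: boolean options are bare flags without a value.
    if dry_run:
        cmd.append("--dry-run")
    if quantize:
        cmd.append("--quantize")
    if quant_config_file is not None:
        cmd += ["--quant-config-file", quant_config_file]
    return cmd


if __name__ == "__main__":
    # Placeholder model name and paths, for illustration only.
    cmd = build_convert_command("my-org/my-model", "./converted", "fp16",
                                quantize=True, quant_config_file="quant.yaml")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed. Adding `dry_run=True` first is a cheap way to check that conversion is available before starting the full run.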
