Observability for Friendli Container
Observability is an integral part of DevOps. To support it, Friendli Container exports its internal metrics in the Prometheus text format.
By default, metrics are served at `http://localhost:8281/metrics`. You can configure the port number using the command line option `--metrics-port`.
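As a rough illustration of what a client of this endpoint might do, the Python sketch below parses Prometheus text-format samples into a dictionary. It assumes samples carry no trailing timestamp; `parse_metrics` and `fetch_metrics` are hypothetical helper names, not part of Friendli Container, and the sample values are made up.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Assumption: each sample line is "<series> <value>" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:8281/metrics"):
    """Fetch and parse metrics from a running container (default port)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Made-up sample payload, used here instead of a live endpoint.
sample_text = """\
# TYPE friendli_requests_total counter
friendli_requests_total 12
# TYPE friendli_current_requests gauge
friendli_current_requests 3
"""
parsed = parse_metrics(sample_text)
print(parsed)
```

In practice a Prometheus server does this scraping for you; a script like this is mainly useful for quick checks that the endpoint is reachable.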
Supported Metrics
Counters
Counters are cumulative metrics whose values increase monotonically. They are often combined with the Prometheus `rate()` function to calculate throughput.
Metric Name | Description |
---|---|
friendli_requests_total | Cumulative number of requests received |
friendli_responses_total | Cumulative number of responses sent |
friendli_items_total | Cumulative number of items requested |
friendli_failure_by_cancel | Cumulative number of failed requests due to cancellation |
friendli_failure_by_timeout | Cumulative number of failed requests due to timeout |
friendli_failure_by_nan_error | Cumulative number of failed requests due to NaN error |
friendli_failure_by_reject | Cumulative number of failed requests due to rejection |
One inference request may generate multiple results with the `n` field in the request body.
Upon receiving such a request, `friendli_requests_total` is increased by 1 and `friendli_items_total` is increased by `n`.
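To make the relationship between the request and item counters concrete, here is a minimal Python sketch that derives throughput from two counter snapshots. The snapshot values and the 30-second interval are invented for illustration, and `counter_rate` is a hypothetical helper that only approximates what the Prometheus `rate()` function does.

```python
def counter_rate(previous, current, interval_seconds):
    """Per-second increase of a counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # A counter that went down indicates a restart; count the current
    # value as the whole increase (a simplification of reset handling).
    delta = current - previous if current >= previous else current
    return delta / interval_seconds


# Invented snapshots taken 30 seconds apart.
earlier = {"friendli_requests_total": 100.0, "friendli_items_total": 250.0}
later = {"friendli_requests_total": 160.0, "friendli_items_total": 430.0}
interval = 30.0

requests_per_sec = counter_rate(
    earlier["friendli_requests_total"], later["friendli_requests_total"], interval
)
items_per_sec = counter_rate(
    earlier["friendli_items_total"], later["friendli_items_total"], interval
)
# Average n per request over the interval: item increase / request increase.
items_per_request = (
    later["friendli_items_total"] - earlier["friendli_items_total"]
) / (later["friendli_requests_total"] - earlier["friendli_requests_total"])

print(requests_per_sec, items_per_sec, items_per_request)
```

With these numbers, 60 requests asked for 180 items in total, so the average `n` per request over the interval is 3.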
Gauges
Gauges are numerical values that can go up and down to represent the current value.
Metric Name | Description |
---|---|
friendli_current_requests | Current number of requests in the engine (either assigned or waiting) |
friendli_current_items | Current number of items in the engine (either assigned or waiting) |
friendli_current_assigned_items | Current number of items actively processed by the engine |
friendli_current_waiting_items | Current number of items waiting in the internal queue |
Histograms
Histograms are used to track the distribution of variables over time.
Histogram | Metric Name | Description |
---|---|---|
Friendli TCache hit ratio (0≤value≤1) | friendli_tcache_hit_ratio_bucket | Bucketized number of histogram samples for TCache hit ratio, with le label |
Friendli TCache hit ratio (0≤value≤1) | friendli_tcache_hit_ratio_count | Total number of histogram samples for TCache hit ratio |
Friendli TCache hit ratio (0≤value≤1) | friendli_tcache_hit_ratio_sum | Sum of histogram sample values for TCache hit ratio |
The length of input tokens (Experimental metric) | friendli_input_lengths_bucket | Bucketized number of histogram samples for length of input tokens, with le label |
The length of input tokens (Experimental metric) | friendli_input_lengths_count | Total number of histogram samples for length of input tokens |
The length of input tokens (Experimental metric) | friendli_input_lengths_sum | Sum of histogram sample values for length of input tokens |
The length of output tokens (Experimental metric) | friendli_output_lengths_bucket | Bucketized number of histogram samples for length of output tokens, with le label |
The length of output tokens (Experimental metric) | friendli_output_lengths_count | Total number of histogram samples for length of output tokens |
The length of output tokens (Experimental metric) | friendli_output_lengths_sum | Sum of histogram sample values for length of output tokens |
For tips on visualizing histograms in Grafana, see "How to visualize Prometheus histograms in Grafana".
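The `_bucket`, `_count`, and `_sum` series can be combined to estimate a quantile and a mean. The Python sketch below interpolates linearly inside the matching bucket, in the same spirit as the Prometheus `histogram_quantile()` function, though it is not a reimplementation of it. The bucket boundaries and counts are invented; the actual `le` boundaries exported by Friendli Container are not stated here.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (le_upper_bound, count) pairs.

    `buckets` must include the +Inf bucket, whose count is the total.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Cannot interpolate; fall back to the last finite bound.
                return lower_bound
            # Linear interpolation inside the bucket holding the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented cumulative buckets for friendli_tcache_hit_ratio_bucket.
sample_buckets = [(0.25, 10), (0.5, 30), (0.75, 60), (1.0, 100), (math.inf, 100)]
median = estimate_quantile(0.5, sample_buckets)

# Mean of all samples: friendli_tcache_hit_ratio_sum / ..._count (invented).
mean = 62.0 / 100
print(round(median, 4), mean)
```

Because the estimate assumes samples are spread evenly within a bucket, its accuracy depends on how fine the bucket boundaries are.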
Quantiles
Quantiles show the current p50 (median), p90, and p99 values of variables.
Quantiles | Metric Name | Description |
---|---|---|
Request completion latency (in nanoseconds) | friendli_requests_latencies | Percentile value for request completion latency (quantile label is either 0.5, 0.9, or 0.99) |
Request completion latency (in nanoseconds) | friendli_requests_latencies_count | Total number of samples for request completion latency |
Request completion latency (in nanoseconds) | friendli_requests_latencies_sum | Sum of sample values for request completion latency |
Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft | Percentile value for time to first token (TTFT) (quantile label is either 0.5, 0.9, or 0.99) |
Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft_count | Total number of samples for time to first token (TTFT) |
Time to first token (TTFT) (in nanoseconds) | friendli_requests_ttft_sum | Sum of sample values for time to first token (TTFT) |
Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays | Percentile value for queueing delay (quantile label is either 0.5, 0.9, or 0.99) |
Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays_count | Total number of samples for queueing delay |
Request queueing delay (in nanoseconds) | friendli_requests_queueing_delays_sum | Sum of sample values for queueing delay |
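Since these values are reported in nanoseconds, they usually need converting before display. The following Python sketch converts a quantile value to milliseconds and derives a mean latency from the `_sum` and `_count` series. All numbers are invented, and `ns_to_ms` and `mean_ms` are hypothetical helper names.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def ns_to_ms(value_ns):
    """Convert a nanosecond value to milliseconds."""
    return value_ns / NANOSECONDS_PER_MILLISECOND


def mean_ms(sum_ns, count):
    """Mean sample value in milliseconds, or None when there are no samples."""
    if count == 0:
        return None
    return ns_to_ms(sum_ns / count)


# Invented samples for friendli_requests_latencies.
p99_latency_ns = 1_250_000_000   # value with quantile="0.99"
latency_sum_ns = 42_000_000_000  # friendli_requests_latencies_sum
latency_count = 120              # friendli_requests_latencies_count

p99_ms = ns_to_ms(p99_latency_ns)
average_ms = mean_ms(latency_sum_ns, latency_count)
print(p99_ms, average_ms)
```

Note that precomputed quantiles from several processes generally cannot be averaged into a meaningful overall quantile, whereas the `_sum` and `_count` series can be added across processes before dividing.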
Info
The following information metric always has a value of 1. The metric labels contain useful information in text.
Metric Name | Label | Description |
---|---|---|
friendli_engine_version | version | Engine version |
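Because the useful information of an info metric lives in its labels rather than its value, a client has to read the label set. Below is a small Python sketch that extracts labels from one exposition line. The version string `1.0.0` is a placeholder rather than a real engine version, and `info_labels` is a hypothetical helper.

```python
import re

# Matches name="value" pairs, allowing backslash-escaped characters in values.
LABEL_PATTERN = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def info_labels(line):
    """Return the labels of one text-format sample line as a dict."""
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end == -1:
        return {}
    return dict(LABEL_PATTERN.findall(line[start + 1:end]))


# Placeholder sample line; the version value is made up.
sample_line = 'friendli_engine_version{version="1.0.0"} 1'
labels = info_labels(sample_line)
print(labels["version"])
```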
Grafana Dashboard Template
You can import the dashboard templates into your Grafana instance. The Grafana instance must be connected to a Prometheus instance (or a Prometheus-compatible data source) that is configured to scrape metrics from Friendli Container processes.
The dashboard template works with Grafana v8.0.0 or later. We recommend using Grafana v10.0.0 or later for the best experience.