QuickStart: Friendli Dedicated Endpoints
1. Log In or Sign Up
- If you have an account, log in using your preferred SSO or email/password combination.
- If you're new to FriendliAI, create an account for free.
2. Access Friendli Dedicated Endpoints
- On your dashboard, find the "Friendli Dedicated Endpoints" section.
- If you don't have access, ask your team admin to grant you access to Friendli Dedicated Endpoints in the team settings.
3. Select Your Project
- Create a new project, or choose one of your existing projects for your workload.
4. Prepare Your Model
- Choose a model to serve from Hugging Face or Weights & Biases, or upload your own custom model to our cloud.
5. Deploy Your Endpoint
- Deploy your endpoint using the model you prepared in step 4 and an instance equipped with your desired GPU specification.
- You can also configure the number of replicas and the max-batch-size for your endpoint.
6. Generate Responses
- You can generate responses in two ways: through the playground or through the endpoint address.
- In the playground tab, try out and test response generation on your custom model using a ChatGPT-like interface.
- For general use, send queries to your model through our API at the endpoint address shown in the endpoint information tab.
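To get a feel for how the two deployment settings from step 5 interact, here is a back-of-the-envelope sketch in Python. The helper names are hypothetical and not part of any Friendli API, and the math assumes that each replica batches at most max-batch-size requests at a time and that load is spread evenly across replicas.

```python
# Hypothetical capacity helpers (not a Friendli API). ASSUMPTION: each replica
# processes at most `max_batch_size` requests per batch, and requests are
# spread evenly across replicas.


def max_concurrent_requests(replicas: int, max_batch_size: int) -> int:
    """Upper bound on requests processed together across all replicas."""
    if replicas < 1 or max_batch_size < 1:
        raise ValueError("replicas and max_batch_size must be positive")
    return replicas * max_batch_size


def replicas_needed(target_concurrency: int, max_batch_size: int) -> int:
    """Smallest replica count whose combined batch capacity covers the target."""
    if target_concurrency < 1 or max_batch_size < 1:
        raise ValueError("arguments must be positive")
    # Ceiling division without floating point.
    return -(-target_concurrency // max_batch_size)


print(max_concurrent_requests(2, 256))  # → 512
print(replicas_needed(600, 256))  # → 3
```

Real throughput also depends on sequence lengths and GPU memory, so treat these numbers only as a rough starting point when choosing replicas and max-batch-size.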
Generating Responses Through the Endpoint URL
Both examples read your personal access token from the FRIENDLI_TOKEN environment variable. Refer to this guide for general instructions on personal access tokens.
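# Send an inference request to a running Friendli Dedicated Endpoint using `curl`.
# Assumes FRIENDLI_TOKEN and ENDPOINT_ID are exported in your shell; the '"$ENDPOINT_ID"'
# sequence briefly closes the single quotes so the shell can expand the variable.
curl -X POST https://inference.friendli.ai/dedicated/v1/completions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "'"$ENDPOINT_ID"'", "prompt": "Python is a popular",
       "min_tokens": 20, "max_tokens": 30,
       "top_k": 32, "top_p": 0.8, "n": 3, "no_repeat_ngram": 3,
       "ngram_repetition_penalty": 1.75}'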
# pip install friendli-client
# Send an inference request to a Friendli Dedicated Endpoint using the Python SDK.
import os

from friendli import Friendli

client = Friendli(
    base_url="https://inference.friendli.ai/dedicated",
    token=os.getenv("FRIENDLI_TOKEN"),
    endpoint_id="ENDPOINT_ID",  # Replace with your endpoint ID.
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=False,
)
print(chat_completion.choices[0].message.content)
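If you would rather not install the SDK, the curl request above can be reproduced with Python's standard library. This is a minimal sketch: the helper names are hypothetical, it assumes FRIENDLI_TOKEN and ENDPOINT_ID are set as environment variables (as in the curl example), and it assumes the completions response follows an OpenAI-style `{"choices": [{"text": ...}]}` shape, which you should verify against the API reference.

```python
# Standard-library sketch of the curl completions request (no SDK needed).
# ASSUMPTIONS: FRIENDLI_TOKEN / ENDPOINT_ID env vars are set, and the response
# body has an OpenAI-style {"choices": [{"text": ...}, ...]} shape.
import json
import os
import urllib.request

COMPLETIONS_URL = "https://inference.friendli.ai/dedicated/v1/completions"


def build_completion_request(
    token: str, endpoint_id: str, prompt: str, **params
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request; the endpoint ID is the model."""
    payload = {"model": endpoint_id, "prompt": prompt, **params}
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_texts(response_body: dict) -> list[str]:
    """Collect the generated text of every choice (assumed response shape)."""
    return [choice["text"] for choice in response_body.get("choices", [])]


def complete(prompt: str, **params) -> list[str]:
    """Send the request and return one generated text per choice."""
    request = build_completion_request(
        os.environ["FRIENDLI_TOKEN"], os.environ["ENDPOINT_ID"], prompt, **params
    )
    with urllib.request.urlopen(request) as response:
        return extract_texts(json.load(response))


# Example usage (sends a real request to your endpoint):
# for text in complete("Python is a popular", max_tokens=30, n=3):
#     print(text)
```

Keeping request construction separate from sending makes it easy to inspect the exact headers and JSON body before anything goes over the network.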
For more detailed instructions, refer to our tutorials on using Hugging Face models and W&B models.