InferenceEndpointCreate

Properties

| Name | Type | Description | Notes |
| ---- | ---- | ----------- | ----- |
| name | str | Human-readable name for the inference endpoint. | |
| checkpoint_artifact_id | str | Artifact id of the checkpoint to serve. Must be a ``kind=checkpoint`` Artifact with ``status=ready`` belonging to the same org as the requester. | |
| instance_size | str | Saturn instance size to run the inference pod on. Must be a GPU-equipped size (gpu > 0). | |
| quantization | str | Optional vLLM quantization method. Restricted to calibration-free methods (``fp8``, ``int8``); these quantize on the fly with no calibration dataset. Calibration-requiring methods (``gptq``, ``awq``) are rejected. Omit (the default) to serve in the checkpoint's native precision (BF16). | [optional] |
| visibility | str | Route visibility enforced by ForwardAuth: ``org`` (any member of the endpoint's org may call it) or ``owner`` (only the owning identity and explicit ``viewers``). Defaults to ``org``. | [optional] [default to 'org'] |
| viewers | List[str] | Optional list of identity names (usernames or group names in the endpoint's org) granted access to the endpoint route in addition to the owner. Honored by ForwardAuth exactly like a normal deployment's viewers. | [optional] |
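
The constraints on ``quantization`` and ``visibility`` described above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the generated client: ``build_endpoint_payload`` and the two constant sets are hypothetical names, and the server remains the authority on what is accepted (for example, whether the instance size has a GPU or the checkpoint is ``ready``).

```python
from typing import Any, Dict, List, Optional

# Allowed values copied from the property descriptions; the server may
# accept or reject differently, so treat these as a local convenience check.
ALLOWED_QUANTIZATION = {"fp8", "int8"}
ALLOWED_VISIBILITY = {"org", "owner"}


def build_endpoint_payload(
    name: str,
    checkpoint_artifact_id: str,
    instance_size: str,
    quantization: Optional[str] = None,
    visibility: str = "org",
    viewers: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a request body and fail fast on obvious mistakes."""
    if quantization is not None and quantization not in ALLOWED_QUANTIZATION:
        raise ValueError(
            f"quantization must be one of {sorted(ALLOWED_QUANTIZATION)} or omitted, "
            f"got {quantization!r}"
        )
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(
            f"visibility must be one of {sorted(ALLOWED_VISIBILITY)}, got {visibility!r}"
        )

    payload: Dict[str, Any] = {
        "name": name,
        "checkpoint_artifact_id": checkpoint_artifact_id,
        "instance_size": instance_size,
        "visibility": visibility,
    }
    # Optional fields are left out entirely when unset, so the server
    # applies its own defaults (native precision, no extra viewers).
    if quantization is not None:
        payload["quantization"] = quantization
    if viewers:
        payload["viewers"] = list(viewers)
    return payload
```

Omitting ``quantization`` rather than sending an empty value mirrors the documented default of serving in the checkpoint's native precision.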

Example

from saturn_api.models.inference_endpoint_create import InferenceEndpointCreate

# TODO update the JSON string below
json = "{}"
# create an instance of InferenceEndpointCreate from a JSON string
inference_endpoint_create_instance = InferenceEndpointCreate.from_json(json)
# print the JSON string representation of the object
print(inference_endpoint_create_instance.to_json())

# convert the object into a dict
inference_endpoint_create_dict = inference_endpoint_create_instance.to_dict()
# create an instance of InferenceEndpointCreate from a dict
inference_endpoint_create_from_dict = InferenceEndpointCreate.from_dict(inference_endpoint_create_dict)
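
The placeholder ``"{}"`` above will most likely be rejected, since ``name``, ``checkpoint_artifact_id``, and ``instance_size`` are not marked optional. As a sketch of what a filled-in JSON string might look like, assuming the JSON keys match the property names in the table, the snippet below builds one with the standard library. All values are illustrative placeholders, not real artifact ids or instance sizes.

```python
import json

# Placeholder values for illustration only; substitute real ones from your org.
payload = {
    "name": "my-endpoint",
    "checkpoint_artifact_id": "<checkpoint-artifact-id>",
    "instance_size": "<gpu-instance-size>",
    "quantization": "fp8",      # optional; omit to serve in native precision
    "visibility": "owner",      # optional; defaults to "org"
    "viewers": ["alice"],       # optional; hypothetical username
}
payload_json = json.dumps(payload)

# With the generated client installed, this string can be passed to from_json:
# instance = InferenceEndpointCreate.from_json(payload_json)
```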
