
Chat

(chat)

Overview

An OpenAI-compatible chat completions v1 endpoint

Available Operations

create

This function processes chat completion requests by determining whether to use streaming or non-streaming response handling based on the request payload. For streaming requests, it configures additional options to track token usage.

Returns

Returns a Response containing either:

  • A streaming SSE connection for real-time completions
  • A single JSON response for non-streaming completions

Errors

Returns an error status code if:

  • The request processing fails
  • The streaming/non-streaming handlers encounter errors
  • The underlying inference service returns an error
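
Any of the failures above reaches the Python SDK caller as a models.APIError exception (see the Errors table further down). A minimal, hedged sketch of catching it; it assumes models is importable from atoma_sdk and that the exception stringifies usefully, as in typical generated SDKs:

import os

import atoma_sdk
from atoma_sdk import AtomaSDK, models


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    try:
        res = as_client.chat.create(messages=[
            {
                "content": "Hello!",
                "role": atoma_sdk.RoleUser.USER,
            },
        ], model="meta-llama/Llama-3.3-70B-Instruct")
        print(res)
    except models.APIError as err:
        # The exact attributes exposed (status code, body) depend on the
        # generated error model; the string form is a safe fallback.
        print(f"chat completion failed: {err}")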

Example Usage

import atoma_sdk
from atoma_sdk import AtomaSDK
import os


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.chat.create(messages=[
        {
            "content": "You are a helpful AI assistant",
            "name": "AI expert",
            "role": atoma_sdk.RoleSystem.SYSTEM,
        },
        {
            "content": "Hello!",
            "name": "John Doe",
            "role": atoma_sdk.RoleUser.USER,
        },
        {
            "content": "I'm here to help you with any questions you have. How can I assist you today?",
            "name": "AI",
            "role": atoma_sdk.RoleAssistant.ASSISTANT,
        },
    ], model="meta-llama/Llama-3.3-70B-Instruct", frequency_penalty=0, functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the weather for",
                    },
                },
                "required": [
                    "location",
                ],
            },
        },
    ], logit_bias={
        "1234567890": 0.5,
        "1234567891": -0.5,
    }, max_completion_tokens=4096, n=1, parallel_tool_calls=True, presence_penalty=0, seed=123, service_tier="auto", stop=[
        "stop",
        "halt",
    ], temperature=0.7, tools=[
        {
            "function": {
                "description": "Get the current weather in a location",
                "name": "get_current_weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for",
                        },
                    },
                    "required": [
                        "location",
                    ],
                },
            },
            "type": "function",
        },
    ], top_logprobs=1, top_p=1, user="user-1234")

    # Handle response
    print(res)

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| messages | List[models.ChatCompletionMessage] | ✔️ | A list of messages comprising the conversation so far | [{"role": "system", "content": "You are a helpful AI assistant"}, {"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "I'm here to help you with any questions you have. How can I assist you today?"}] |
| model | str | ✔️ | ID of the model to use | meta-llama/Llama-3.3-70B-Instruct |
| frequency_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| function_call | Optional[Any] | | Controls how the model responds to function calls | |
| functions | List[Any] | | A list of functions the model may generate JSON inputs for | [{"name": "get_current_weather", "description": "Get the current weather in a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to get the weather for"}}, "required": ["location"]}}] |
| logit_bias | Dict[str, float] | | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | {"1234567890": 0.5, "1234567891": -0.5} |
| max_completion_tokens | OptionalNullable[int] | | The maximum number of tokens to generate in the chat completion | 4096 |
| max_tokens | OptionalNullable[int] | | ⚠️ DEPRECATED: this will be removed in a future release; please migrate away from it as soon as possible. The maximum number of tokens to generate in the chat completion. | 4096 |
| n | OptionalNullable[int] | | How many chat completion choices to generate for each input message | 1 |
| parallel_tool_calls | OptionalNullable[bool] | | Whether to enable parallel tool calls | true |
| presence_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| response_format | OptionalNullable[models.ResponseFormat] | | N/A | |
| seed | OptionalNullable[int] | | If specified, our system will make a best effort to sample deterministically | 123 |
| service_tier | OptionalNullable[str] | | Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service: if set to 'auto' and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted; if set to 'auto' and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee; if set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee. When not set, the default behavior is 'auto'. | auto |
| stop | List[str] | | Up to 4 sequences where the API will stop generating further tokens | ["stop", "halt"] |
| stream | OptionalNullable[bool] | | Whether to stream back partial progress. Must be false for this request type. | |
| stream_options | OptionalNullable[models.StreamOptions] | | N/A | |
| temperature | OptionalNullable[float] | | What sampling temperature to use, between 0 and 2 | 0.7 |
| tool_choice | OptionalNullable[models.ToolChoice] | | N/A | |
| tools | List[models.ChatCompletionToolsParam] | | A list of tools the model may call | [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather in a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to get the weather for"}}, "required": ["location"]}}] |
| top_logprobs | OptionalNullable[int] | | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | 1 |
| top_p | OptionalNullable[float] | | An alternative to sampling with temperature | 1 |
| user | OptionalNullable[str] | | A unique identifier representing your end-user | user-1234 |
| retries | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client (see the sketch after this table). | |
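
The retries parameter accepts a utils.RetryConfig. A hedged per-request override sketch, assuming the SDK exposes the usual Speakeasy-style RetryConfig and BackoffStrategy helpers under atoma_sdk.utils; the import path, argument order, and millisecond units are assumptions, not confirmed by this page:

import atoma_sdk
from atoma_sdk.utils import BackoffStrategy, RetryConfig

# Assumed signature: RetryConfig(strategy, backoff, retry_connection_errors),
# with BackoffStrategy(initial_interval, max_interval, exponent, max_elapsed_time);
# intervals are commonly expressed in milliseconds in generated SDKs.
retry_config = RetryConfig("backoff", BackoffStrategy(1, 50, 1.1, 100), False)

# as_client is an AtomaSDK instance, constructed as in the Example Usage above.
res = as_client.chat.create(messages=[
    {
        "content": "Hello!",
        "role": atoma_sdk.RoleUser.USER,
    },
], model="meta-llama/Llama-3.3-70B-Instruct", retries=retry_config)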

Response

models.ChatCompletionResponse
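
A hedged sketch of reading fields off the returned models.ChatCompletionResponse; the attribute names below (choices, message, content, usage) follow the OpenAI-style schema this endpoint mirrors and are assumptions about the generated model, not taken from this page:

# res is the models.ChatCompletionResponse returned by chat.create(...).
first_choice = res.choices[0]
print(first_choice.message.content)  # assistant reply text (assumed field path)

usage = getattr(res, "usage", None)
if usage is not None:
    print(usage.total_tokens)  # token accounting, when the service reports it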

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| models.APIError | 4XX, 5XX | */* |

stream

Example Usage

import atoma_sdk
from atoma_sdk import AtomaSDK
import os


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.chat.stream(messages=[
        {
            "content": "You are a helpful AI assistant",
            "name": "AI expert",
            "role": atoma_sdk.RoleSystem.SYSTEM,
        },
        {
            "content": "Hello!",
            "name": "John Doe",
            "role": atoma_sdk.RoleUser.USER,
        },
        {
            "content": "I'm here to help you with any questions you have. How can I assist you today?",
            "name": "AI",
            "role": atoma_sdk.RoleAssistant.ASSISTANT,
        },
    ], model="meta-llama/Llama-3.3-70B-Instruct", frequency_penalty=0, functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the weather for",
                    },
                },
                "required": [
                    "location",
                ],
            },
        },
    ], logit_bias={
        "1234567890": 0.5,
        "1234567891": -0.5,
    }, max_completion_tokens=4096, n=1, parallel_tool_calls=True, presence_penalty=0, seed=123, service_tier="auto", stop=[
        "stop",
        "halt",
    ], temperature=0.7, tools=[
        {
            "function": {
                "description": "Get the current weather in a location",
                "name": "get_current_weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for",
                        },
                    },
                    "required": [
                        "location",
                    ],
                },
            },
            "type": "function",
        },
    ], top_logprobs=1, top_p=1, user="user-1234")

    with res as event_stream:
        for event in event_stream:
            # handle event
            print(event, flush=True)
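
Instead of printing raw events as above, the streamed chunks can be accumulated into the full assistant reply inside the same client block. A hedged sketch, assuming OpenAI-style chunk fields are reachable at event.data.choices[0].delta.content (the exact attribute path on the generated event model is an assumption):

    reply_parts = []
    with res as event_stream:
        for event in event_stream:
            # delta.content is None for chunks that carry no new text
            # (e.g. role announcements or a final usage chunk).
            delta = event.data.choices[0].delta
            if delta.content:
                reply_parts.append(delta.content)
    print("".join(reply_parts))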

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| messages | List[models.ChatCompletionMessage] | ✔️ | A list of messages comprising the conversation so far | [{"role": "system", "content": "You are a helpful AI assistant"}, {"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "I'm here to help you with any questions you have. How can I assist you today?"}] |
| model | str | ✔️ | ID of the model to use | meta-llama/Llama-3.3-70B-Instruct |
| frequency_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| function_call | Optional[Any] | | Controls how the model responds to function calls | |
| functions | List[Any] | | A list of functions the model may generate JSON inputs for | [{"name": "get_current_weather", "description": "Get the current weather in a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to get the weather for"}}, "required": ["location"]}}] |
| logit_bias | Dict[str, float] | | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | {"1234567890": 0.5, "1234567891": -0.5} |
| max_completion_tokens | OptionalNullable[int] | | The maximum number of tokens to generate in the chat completion | 4096 |
| max_tokens | OptionalNullable[int] | | ⚠️ DEPRECATED: this will be removed in a future release; please migrate away from it as soon as possible. The maximum number of tokens to generate in the chat completion. | 4096 |
| n | OptionalNullable[int] | | How many chat completion choices to generate for each input message | 1 |
| parallel_tool_calls | OptionalNullable[bool] | | Whether to enable parallel tool calls | true |
| presence_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| response_format | OptionalNullable[models.ResponseFormat] | | N/A | |
| seed | OptionalNullable[int] | | If specified, our system will make a best effort to sample deterministically | 123 |
| service_tier | OptionalNullable[str] | | Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service: if set to 'auto' and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted; if set to 'auto' and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee; if set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee. When not set, the default behavior is 'auto'. | auto |
| stop | List[str] | | Up to 4 sequences where the API will stop generating further tokens | ["stop", "halt"] |
| stream | Optional[bool] | | Whether to stream back partial progress. Must be true for this request type. | |
| stream_options | OptionalNullable[models.StreamOptions] | | N/A | |
| temperature | OptionalNullable[float] | | What sampling temperature to use, between 0 and 2 | 0.7 |
| tool_choice | OptionalNullable[models.ToolChoice] | | N/A | |
| tools | List[models.ChatCompletionToolsParam] | | A list of tools the model may call | [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather in a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to get the weather for"}}, "required": ["location"]}}] |
| top_logprobs | OptionalNullable[int] | | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | 1 |
| top_p | OptionalNullable[float] | | An alternative to sampling with temperature | 1 |
| user | OptionalNullable[str] | | A unique identifier representing your end-user | user-1234 |
| retries | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |

Response

Union[eventstreaming.EventStream[models.ChatCompletionsCreateStreamResponseBody], eventstreaming.EventStreamAsync[models.ChatCompletionsCreateStreamResponseBody]]

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| models.APIError | 4XX, 5XX | */* |