Advanced Parameters

Creativity Level (Temperature)

Controls the level of randomness in the model's responses. A lower value (e.g., 0.2) makes the responses more focused and deterministic, meaning the model will be more likely to choose the most probable next word. A higher value (e.g., 0.8) increases randomness and creativity, allowing the model to explore more diverse and less probable word choices.
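In sampling implementations, temperature divides the model's raw logits before they are converted to probabilities, so a low temperature sharpens the distribution and a high one flattens it. A minimal sketch in plain Python (the logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharper: top token dominates
high = softmax_with_temperature(logits, 0.8)  # flatter: more diverse choices
```

Running this, the probability of the top token under temperature 0.2 is much larger than under 0.8, which is exactly the focused-versus-creative trade-off described above.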

Max Tokens Limit (Maximum Tokens)

Sets a limit on the number of tokens (words or subwords) that the model can generate in its response. It helps control the length of the output.

Probability Cutoff (Top P Probability)

Adjusts the diversity of the response through nucleus sampling: instead of considering every token, the model samples from the smallest set of tokens whose cumulative probability reaches P. For instance, if set to 0.9, only the most probable tokens that together account for 90% of the probability mass remain candidates for the next token.
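The nucleus filter can be sketched in a few lines of plain Python, using a made-up four-token distribution:

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of token indices whose cumulative
    probability reaches top_p, most probable first."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append(index)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
nucleus_filter(probs, 0.9)  # keeps tokens 0, 1, 2 (0.5 + 0.3 + 0.15 >= 0.9)
```

With top_p = 0.9 the least likely token is excluded entirely; lowering top_p shrinks the candidate set further.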

Log Probability

When enabled, returns the log probabilities of the generated tokens. This can be useful for understanding how confident the model is in its choices.
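Log probabilities can be converted back into probabilities with the exponential, and summed across tokens to score an entire sequence. A small illustration with hypothetical per-token values:

```python
import math

# Hypothetical per-token log probabilities for a generated phrase.
# Values closer to 0 indicate higher model confidence.
token_logprobs = [-0.05, -0.20, -1.60]

# Recover each token's probability.
confidences = [math.exp(lp) for lp in token_logprobs]

# The sequence probability is the product of token probabilities,
# i.e. the exponential of the summed log probabilities.
sequence_prob = math.exp(sum(token_logprobs))
```

Here the first token was a confident choice while the third was not, and the summed log probability gives a single score for the whole completion.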

Repetition Penalty (Frequency Penalty)

Applies a penalty to tokens in proportion to how often they have already appeared in the output. It reduces the likelihood of the model repeating the same words, promoting more varied language.

Novelty Penalty (Presence Penalty)

Similar to the frequency penalty, but applied as a flat penalty to any token that has appeared at least once, regardless of how many times. This discourages the model from revisiting tokens it has already used, encouraging it to generate new content instead.
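One common formulation (used, for example, by OpenAI-style APIs) subtracts both penalties directly from the logits of tokens that have already been generated: the frequency penalty scales with the repeat count, while the presence penalty is flat. A sketch with made-up values, treating token ids as indices into the logit list:

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty, presence_penalty):
    """Penalize already-generated tokens: frequency_penalty * count
    per repetition, plus a flat presence_penalty per distinct token."""
    counts = Counter(generated_tokens)
    adjusted = list(logits)
    for token, count in counts.items():
        adjusted[token] -= frequency_penalty * count  # grows with repetition
        adjusted[token] -= presence_penalty           # flat, once per seen token
    return adjusted

logits = [1.0, 1.0, 1.0]
# Token 0 was generated twice, token 1 once, token 2 never.
adjusted = apply_penalties(logits, [0, 0, 1], 0.5, 0.3)  # roughly [-0.3, 0.2, 1.0]
```

Token 0, the most repeated, ends up with the lowest adjusted logit, making it the least likely to be chosen again.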

Response Count (Number of Completions)

Specifies how many separate completions or responses the model should generate for a given prompt. It allows users to receive multiple responses and choose the best one.

Stop Sequence

Defines specific sequences of characters or words at which the model should stop generating further tokens. It helps to control the endpoint of the generated response.
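On the client side, a stop sequence amounts to cutting the text at the earliest match. A minimal sketch (the strings are illustrative):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# A "\nQuestion:" stop sequence prevents the model from continuing
# into a new question after answering the current one.
truncate_at_stop("Answer: 42\nQuestion:", ["\nQuestion:", "END"])  # → "Answer: 42"
```

In practice the model stops generating as soon as it emits a stop sequence, and the sequence itself is excluded from the returned text, as above.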

Tool Choice

Specifies the particular tool the model should use for generating the response. It allows for more targeted and relevant output.
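As a sketch, assuming an OpenAI-style chat completions request (the model name and function are illustrative, not real endpoints), forcing the model to call a specific tool might look like:

```json
{
  "model": "example-model",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}}
        }
      }
    }
  ],
  "tool_choice": {"type": "function", "function": {"name": "get_weather"}}
}
```

Setting tool_choice to a value like "auto" (where supported) instead lets the model decide whether to call a tool at all.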

Response Type (Response Format)

Defines the format in which the response will be returned, such as plain text or JSON. It ensures the output is structured in a way that meets the user's needs.
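For example, in OpenAI-style APIs a JSON-only response can be requested with a fragment like the following (field names may differ between providers):

```json
{
  "response_format": {"type": "json_object"}
}
```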

Stop Sequences

An array of sequences at which the model should stop generating tokens. This can be useful for setting multiple end conditions for the response.

STOP (Stop Key)

An arbitrary key used to stop generation; it has no standard meaning and is included here only as an example.

Top K

Limits the sampling pool to the top K most probable tokens. It promotes more deterministic responses by focusing on the most likely next words.
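A sketch of the top-K filter, returning the indices of the K most probable tokens from a made-up distribution:

```python
def top_k_filter(probs, k):
    """Keep only the indices of the k most probable tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]

# With K = 2, only the two most likely tokens remain candidates.
top_k_filter([0.1, 0.4, 0.2, 0.3], 2)  # → [1, 3]
```

Compared with top-P, top-K keeps a fixed number of candidates regardless of how the probability mass is distributed among them.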