Rust library for tokenizing text with OpenAI models using tiktoken.
This library provides a set of ready-made tokenizers for working with GPT, tiktoken, and related OpenAI models. Use cases include tokenizing and counting tokens in text inputs.
This library is built on top of the tiktoken library and includes some additional features and enhancements for ease of use with Rust code.
For full working examples for all supported features, see the examples directory in the repository.
Install the library from crates.io:
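```sh
cargo add tiktoken-rs
```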
Then in your Rust code, call the API:
```rust
use tiktoken_rs::o200k_base;

let bpe = o200k_base().unwrap();
let tokens = bpe.encode_with_special_tokens("This is a sentence with spaces");
println!("Token count: {}", tokens.len());
```
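If you would rather select the tokenizer from a model name than hard-code an encoding, the crate also provides a lookup helper. A minimal sketch, assuming the `get_bpe_from_model` helper exported at the crate root:

```rust
use tiktoken_rs::get_bpe_from_model;

// Resolve the correct encoding for a model by name instead of
// calling a specific constructor such as o200k_base().
let bpe = get_bpe_from_model("gpt-4o").unwrap();
let tokens = bpe.encode_with_special_tokens("This is a sentence with spaces");
println!("Token count: {}", tokens.len());
```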
### Counting max_tokens parameter for a chat completion request

```rust
use tiktoken_rs::{get_chat_completion_max_tokens, ChatCompletionRequestMessage};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: "user".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```
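If you need the token count of the prompt itself rather than the remaining completion budget, a companion helper computes it directly. A short sketch, assuming the `num_tokens_from_messages` function exported alongside `get_chat_completion_max_tokens`:

```rust
use tiktoken_rs::{num_tokens_from_messages, ChatCompletionRequestMessage};

let messages = vec![ChatCompletionRequestMessage {
    content: Some("Hello, how are you?".to_string()),
    role: "user".to_string(),
    name: None,
    function_call: None,
}];
// Count the tokens consumed by the chat messages themselves.
let prompt_tokens = num_tokens_from_messages("o1-mini", &messages).unwrap();
println!("prompt tokens: {}", prompt_tokens);
```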
### Counting max_tokens parameter for a chat completion request with async-openai

You need to enable the `async-openai` feature in your `Cargo.toml` file.
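For example (the version number below is illustrative; use the latest release):

```toml
[dependencies]
tiktoken-rs = { version = "0.7", features = ["async-openai"] }
```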
```rust
use tiktoken_rs::async_openai::get_chat_completion_max_tokens;
use async_openai::types::{ChatCompletionRequestMessage, Role};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: Role::User,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```
`tiktoken` supports these encodings used by OpenAI models:
| Encoding name | OpenAI models |
|---------------|---------------|
| `o200k_base` | GPT-4o models, GPT-4.1, o1, o3, and o4 models |
| `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |
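Each encoding in the table has a matching constructor at the crate root (`o200k_base`, `cl100k_base`, `p50k_base`, `p50k_edit`, `r50k_base`), so picking a tokenizer directly is a one-liner:

```rust
use tiktoken_rs::cl100k_base;

// Constructors mirror the encoding names in the table above.
let bpe = cl100k_base().unwrap();
let tokens = bpe.encode_with_special_tokens("tiktoken encodings in Rust");
println!("cl100k_base token count: {}", tokens.len());
```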
See the examples in the repo for use cases. For more context on the different tokenizers, see the OpenAI Cookbook.
If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.
Thanks @spolu for the original code, and the `.tiktoken` files.
This project is licensed under the MIT License.