In this post, we will showcase how Upstash Ratelimit can be used with LangChain, in both JavaScript and Python.
## Motivation
Large Language Models (LLMs) are powerful tools, but they can be costly to operate. To keep them affordable, it's essential to limit the number of requests they process. This is where Upstash Ratelimit comes into play.
Using Upstash Ratelimit in LangChain allows you to:

- Limit the number of chain invocations over a time period.
- Limit the number of tokens used (either prompt tokens only, or prompt and completion tokens) over a time period.
## Usage
Start by installing `@upstash/ratelimit` and `@langchain/community` (you can find more information about installing LangChain community here):

```bash
npm install @upstash/ratelimit @langchain/community
```
If you are using Python, run:

```bash
pip install upstash-ratelimit langchain-community
```
Then, set the environment variables `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN`. You can get these environment variables by going to the Upstash console and creating a Redis instance.
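For local development, you can export them in your shell; the values below are placeholders, so copy the actual URL and token from your database's page in the console:

```bash
export UPSTASH_REDIS_REST_URL="<your-redis-rest-url>"
export UPSTASH_REDIS_REST_TOKEN="<your-redis-rest-token>"
```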
Now we can demonstrate how to add rate limiting to your chain in LangChain.
First, create a rate limit instance as shown below:
```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  // 10 requests per window, where window size is 60 seconds:
  limiter: Ratelimit.fixedWindow(10, "60 s"),
});
```
This `ratelimit` object will use the Redis database to store how many requests were made by different users. Learn more about Ratelimit from its documentation.
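To get a feel for what the limiter does on its own, you can call its `limit` method directly. A quick sketch, where `"user-123"` is just an example identifier:

```typescript
// each identifier gets its own counter in Redis; within one
// 60-second window, the 11th call for "user-123" will fail
const { success, remaining } = await ratelimit.limit("user-123");
console.log(success, remaining); // first call: true, 9
```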
Next, create a mock chain in LangChain to showcase the callback:
```typescript
import { RunnableLambda } from "@langchain/core/runnables";

const chain = new RunnableLambda({ func: (str: string): string => str });
```
Finally, create a callback and invoke the chain:
```typescript
import {
  UpstashRatelimitHandler,
  UpstashRatelimitError,
} from "@langchain/community/callbacks/handlers/upstash_ratelimit";

const user_id = getUserId(); // your own method to get user ids

try {
  const response = await chain.invoke("Hello World!", {
    callbacks: [
      new UpstashRatelimitHandler(user_id, {
        requestRatelimit: ratelimit,
      }),
    ],
  });
  console.log(response);
} catch (err) {
  if (err instanceof UpstashRatelimitError) {
    console.log("Handling ratelimit!");
  }
}
```
Here is the same code written in Python:
```python
from upstash_redis import Redis
from upstash_ratelimit import Ratelimit, FixedWindow
from langchain_core.runnables import RunnableLambda
from langchain_community.callbacks import UpstashRatelimitError, UpstashRatelimitHandler

# your own method to get user ids
user_id = get_user_id()

# mock chain
chain = RunnableLambda(str)

# init ratelimiter
ratelimit = Ratelimit(
    redis=Redis.from_env(),
    # 10 requests per window, where window size is 60 seconds:
    limiter=FixedWindow(max_requests=10, window=60),
)

try:
    response = chain.invoke(
        "Hello World!",
        {
            "callbacks": [
                UpstashRatelimitHandler(
                    identifier=user_id,
                    request_ratelimit=ratelimit,
                )
            ]
        },
    )
except UpstashRatelimitError:
    print("Handling ratelimit!")
```
Note that we initialize the callback when we invoke the chain. This is because the handler keeps state that needs to be reset with each invocation.
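In practice, that means constructing a fresh handler every time you call `invoke`, for example inside a small per-request wrapper. A minimal sketch, where `handleRequest` is a hypothetical function standing in for your own application code:

```typescript
// a minimal sketch: one fresh UpstashRatelimitHandler per invocation
async function handleRequest(userId: string, input: string): Promise<string> {
  const handler = new UpstashRatelimitHandler(userId, {
    requestRatelimit: ratelimit,
  });
  return chain.invoke(input, { callbacks: [handler] });
}
```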
In the configuration above, the handler will allow 10 requests per minute for each user. However, this is not the only way to configure the handler. To learn more about Ratelimit callbacks in LangChain, see the Upstash Ratelimit Callback documentation for TypeScript and Python.
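For instance, you could swap the fixed window algorithm for a sliding window, which avoids bursts at window boundaries while keeping the same 10-requests-per-minute budget. A sketch using the same `@upstash/ratelimit` setup as above:

```typescript
const slidingRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  // still 10 requests per 60 seconds, counted over a sliding window:
  limiter: Ratelimit.slidingWindow(10, "60 s"),
});
```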
You can also rate limit based on the number of tokens: the handler can be configured to limit requests, tokens, or both. To add token-based rate limiting, use the `tokenRatelimit` parameter when initializing the handler:
```typescript
const handler = new UpstashRatelimitHandler(user_id, {
  tokenRatelimit: ratelimit,
  includeOutputTokens: true, // whether completion is included when counting tokens
});
```
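The snippet above reuses the request limiter for tokens. Since tokens accumulate much faster than requests, you will usually want a separate `Ratelimit` instance with a larger budget. Here is one way to set that up; the limit and the `prefix` value are just example choices:

```typescript
// a separate limiter for tokens, stored under its own key prefix in Redis
const tokenRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  // example budget: 1000 tokens per 60 seconds
  limiter: Ratelimit.fixedWindow(1000, "60 s"),
  prefix: "ratelimit:tokens",
});

// the handler can enforce request and token limits at the same time
const handler = new UpstashRatelimitHandler(user_id, {
  requestRatelimit: ratelimit,
  tokenRatelimit: tokenRatelimit,
});
```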
For token-based rate limiting, we expect the LLM in your chain to return an `LLMResult` in this format:
```json
{
  "tokenUsage": {
    "totalTokens": 123,
    "promptTokens": 456,
    "otherFields": "..."
  },
  "otherFields": "..."
}
```
Not all LLMs in LangChain use this format, however. If the keys are different, you can use the `llmOutputTokenUsageField`, `llmOutputTotalTokenField`, and `llmOutputPromptTokenField` fields of `UpstashRatelimitHandler`.
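For example, if a model reported its usage as `{ usage: { total_tokens, prompt_tokens } }` (hypothetical key names, purely for illustration), you could point the handler at those keys:

```typescript
const handler = new UpstashRatelimitHandler(user_id, {
  tokenRatelimit: ratelimit,
  // hypothetical keys, matching an LLM that returns
  // { usage: { total_tokens: ..., prompt_tokens: ... } }
  llmOutputTokenUsageField: "usage",
  llmOutputTotalTokenField: "total_tokens",
  llmOutputPromptTokenField: "prompt_tokens",
});
```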
## Conclusion
The `UpstashRatelimitHandler` provides a simple way to add rate limiting to your LangChain applications. With just a few steps, you can control the number of requests and tokens used. For more detailed information and advanced configurations, explore the LangChain documentation.