
Using AWS Lambda for LLM tool scalability

By Josh Wall

Introduction

As the demand for large language models (LLMs) grows, so too does the need for robust and scalable infrastructure to support their deployment and utilization. To prepare for future growth and ensure optimal performance, we are exploring AWS Lambda as a game-changing solution. This article discusses our proactive approach to scaling, the benefits of migrating LLM-related tools to AWS Lambda, and how we can implement this transition effectively.

The problem

In our pursuit of building the Ultimate GenAI Chat/Copilot Experience, we realized that the demand for the tools and skills that power our Specialists was placing a significant burden on compute. As our user base grew and task complexity increased, having these tools built into the platform made it difficult to keep up with escalating resource requirements. Each Specialist, equipped with a unique set of LLM-powered tools, required significant computational power to process requests efficiently. Continuing down the route of integrated tooling would have led to noisy-neighbor problems, such as slower responses and system-wide slowdowns during peak usage. Compounding this, every new tool we added required an unnecessary redeployment of the entire platform. It became clear that to maintain our competitive edge and ensure a seamless user experience, we needed a more scalable and flexible solution for managing our growing arsenal of CWIC tools.

Current pain points

  • High Compute: The intensive workloads associated with LLM-powered tools are overloading our shared environment.
  • Frequent Redeployment: Every new or updated tool requires redeploying the platform, adding time to our workflow and increasing the potential for errors.

Business value of migration

Migrating our LLM tools to AWS Lambda offers substantial business value, characterized by several key advantages:

Scalability and cost savings

AWS Lambda’s serverless architecture allows for dynamic scaling of compute resources. This adaptability minimizes unnecessary expenditures on infrastructure and ensures efficient resource allocation, ultimately translating into cost savings.

Improved agility and responsiveness

Lambda’s capacity for automatic scaling enables us to respond swiftly to evolving business needs. This flexibility allows us to quickly deliver new LLM functionalities and tools, keeping us ahead of competitors and driving business growth.

Overview of AWS Lambda

AWS Lambda is designed for efficient management of compute resources:

  • Memory: Allocate memory ranging from 128 MB to 10,240 MB, tailored for LLM needs.
  • Timeout: Set a maximum execution time of up to 15 minutes to accommodate longer processing tasks.
  • Concurrency: Run up to 1,000 concurrent executions per region by default (a soft limit that can be raised), keeping tools responsive during peak times.
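
If you manage these settings programmatically, they can be adjusted per function with boto3. The sketch below is purely illustrative; the function name finance_lambda (reused later in this post) and the specific values are examples, not recommendations:

import boto3

lambda_client = boto3.client('lambda')

# Raise memory and timeout for an LLM-heavy tool (values are illustrative)
lambda_client.update_function_configuration(
    FunctionName='finance_lambda',   # example function name used later in this post
    MemorySize=1024,                 # MB, anywhere from 128 to 10,240
    Timeout=300                      # seconds, up to 900 (15 minutes)
)

# Optionally reserve concurrency so one tool cannot starve the others
lambda_client.put_function_concurrency(
    FunctionName='finance_lambda',
    ReservedConcurrentExecutions=50
)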

Creating Lambda Functions for LLM Tools

Prerequisites:

  • Working knowledge of
    Creating AWS Lambda Functions
    Creating AWS Lambda Layers
  • Basic understanding of Langchain
  • Access to an LLM API with a Langchain integration

Full example

In this example, we will create a tool that extracts information from the newsapi.org API.

You can get your free API key here: News API — Search News and Blog Articles on the Web

Sample code — AWS Lambda function

import json
import requests
import os
from datetime import datetime
from dateutil.relativedelta import relativedelta

def get_news(query, api_key):
    main_url = "https://newsapi.org/v2/everything"

    end_date = datetime.now().date()
    start_date = end_date - relativedelta(months=1)

    params = {
        "q": query,
        "from": start_date.isoformat(),
        "to": end_date.isoformat(),
        "sortBy": "relevancy",
        "excludeDomains": "wikipedia.org",
        "apiKey": api_key,
        "pageSize": 5  # Limit to top 5 results
    }

    res = requests.get(main_url, params=params)
    return res.json()

def lambda_handler(event, context):
    try:
        # Extract the query from the 'query' key
        query = event.get('query')
        
        if not query:
            return {
                'statusCode': 400,
                'response': json.dumps({'error': 'No query provided in the request body'})
            }
        
        # Get the API key from environment variables
        api_key = os.environ.get('NEWSAPI_API_KEY')

        if not api_key:
            return {
                'statusCode': 500,
                'response': json.dumps({'error': 'API key is not set in environment variables'}),
            }
        
        # Fetch the news
        news = get_news(query, api_key)
        
        # Extract the top 5 articles
        top_articles = news.get("articles", [])[:5]

        # Prepare the response and citations
        response_data = {
            "articles": top_articles
        }
        
        return {
            'statusCode': 200,
            'response': json.dumps(response_data)
        }

    except json.JSONDecodeError:
        return {
            'statusCode': 400,
            'response': json.dumps({'error': 'Invalid JSON in request body'})
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'response': json.dumps({'error': str(e)})
        }

*Note: for the function to work, you will need to add a Lambda layer that includes the requests and python-dateutil packages (datetime is part of the Python standard library).
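
Before wiring the function into Langchain, it can help to smoke-test the handler locally in the same file (a minimal sketch; it assumes requests and python-dateutil are installed locally, and the API key shown is a placeholder you would replace with your own):

import os

# Hypothetical local smoke test for the handler above
os.environ.setdefault('NEWSAPI_API_KEY', '<your-newsapi-key>')

test_event = {"query": "bitcoin"}
result = lambda_handler(test_event, None)   # context is unused by the handler, so None is fine
print(result['statusCode'])
print(result['response'][:500])             # first 500 characters of the JSON payload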

Sample code—Creating Langchain tool

Once the function is set up and working in AWS Lambda, you can invoke it using boto3. This invocation can be wrapped in a Langchain Tool so the agent can call it later.

First, define your tool config:

tools_config = [
    {
        'tool_name': 'finance_lambda', # Should match your aws lambda function name
        'description': "This tool grabs the latest news based on the user's query. Input should be a string like 'bitcoin', 'crypto', 'finance', 'markets', or 'global'.", # will be used as tool description for langchain
        'return_direct': False,
    }
]

Next, define a function that invokes the Lambda function and wraps it in a Tool:

import json
import boto3
from langchain_core.tools import Tool
from typing import List, Dict

def create_tools_from_config(tools_config: List[Dict]) -> List[Tool]:
    tools = []

    for config in tools_config:
        def tool_function(query: str, config=config):
            try:
                lambda_client = boto3.client('lambda')
                response = lambda_client.invoke(
                    InvocationType='RequestResponse',
                    FunctionName=config['tool_name'],
                    Payload=json.dumps({"query": query})
                )
                payload = response['Payload'].read().decode('utf-8')
                output = json.loads(payload)

                # Construct the output in the desired format
                return output['response']
            except Exception as e:
                return {'statusCode': 500, 'error': str(e)}

        tool = Tool(
            name=config['tool_name'],
            description=config['description'],
            func=tool_function,
            return_direct=config.get('return_direct', False)  
        )
        tools.append(tool)

    return tools

Create tool list

# Create tools using the configuration
tools = create_tools_from_config(tools_config)

Finally, test the setup by invoking the tools directly:

# Test tools
for tool in tools:
    print(tool.func('Clearwater Analytics'))  # Example invocation for each tool

Sample code—Using Langchain tool with a ReAct agent

Set up the LLM: my preferred method is using ChatLiteLLM.

from dotenv import load_dotenv
from langchain_community.chat_models import ChatLiteLLM

# Store necessary credentials in .env file
load_dotenv()

# Initialize the Azure OpenAI model
llm = ChatLiteLLM(
        model="azure/gpt-4o-mini",
    )
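
For reference, the .env file holds whatever credentials your model provider requires; with LiteLLM's Azure OpenAI setup that is typically along these lines (variable names follow LiteLLM's conventions; the values here are placeholders):

# Example .env contents (placeholder values)
AZURE_API_KEY=<your-azure-openai-key>
AZURE_API_BASE=https://<your-resource>.openai.azure.com/
AZURE_API_VERSION=<api-version>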

Create the ReAct agent

from langgraph.prebuilt import create_react_agent

react_agent = create_react_agent(llm, tools)

Put it all together

from langchain_core.messages import HumanMessage

prompt = "What is new with Clearwater Analytics"

# Invoke the agent
response = react_agent.invoke(
    {"messages": [HumanMessage(content=prompt)]},
)

print(response)

Extract AI Response

from langchain_core.messages import AIMessage

# Function to extract the final AIMessage content
def extract_ai_response(messages):
    for message in messages:
        if isinstance(message, AIMessage) and message.content:
            return message.content
    return "No AI response found."

ai_response = extract_ai_response(response['messages'])

print(ai_response)

Final Output

Here's a summary of the latest updates regarding Clearwater Analytics:

1. **Stock Performance**:
   - Clearwater Analytics (CWAN) recently hit a **52-week high of $26.09**. This reflects a significant uptick in their market performance, indicating positive investor sentiment. [Read more here](https://www.investing.com/news/company-news/clearwater-analytics-stock-hits-52week-high-at-2609-93CH-3662244).

2. **Market Context**:
   - The company's stock performance is noteworthy amidst a complex global market landscape, which is currently characterized by interest rate adjustments and sector-specific movements. The market is speculating on various stocks, with some possibly undervalued by as much as 39.9%. [More details here](https://finance.yahoo.com/news/3-stocks-may-undervalued-much-130827712.html).

3. **Trading Insights**:
   - An article discusses Clearwater Analytics achieving a new 12-month high and speculates on what might happen next in the context of their stock performance. [Discover more here](https://biztoc.com/x/2b0ee58070e8545b).

These updates highlight Clearwater Analytics' significant achievements in the stock market, indicating strong investor confidence and a promising outlook.

Lambda tool flow with Langchain Agent

For those new to Langchain Agents and LLM function calling, understanding how Lambda tools fit into the process might seem challenging. Let’s break down the workflow:

  1. Tool Registration: Define your Lambda tools and store their names and descriptions.
  2. User Query: The interaction begins when a user sends a query to the Agent.
  3. Initial Processing: The Agent forwards the user’s query to the Language Model (LLM) for analysis. (More details of what the Agent includes can be found in Langchain and MRKL Agents Demystified.)
  4. LLM Evaluation: After processing, the LLM informs the Agent that a specific Lambda tool is needed.
  5. Tool Activation: Langchain triggers the appropriate Lambda function based on the tool name provided by the LLM.
  6. Function Execution: The Lambda function performs its designated task and returns the results to the Agent.
  7. Data Synthesis: The Agent combines the Lambda function results with the existing conversation and sends this to the LLM.
  8. LLM Response Generation: Using the new information, the LLM creates a “Final Answer” for the Agent.
  9. Agent Responds: The Agent delivers the comprehensive answer to the user, completing the interaction loop.

This streamlined process enables seamless integration of Lambda tools within the Langchain Agent and LLM ecosystem, allowing for dynamic and powerful interactions based on user queries.
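
To watch this flow in the earlier example, you can walk the message history returned by the ReAct agent (a minimal sketch using the response object from the previous section; exact message attributes can vary between langchain_core versions):

from langchain_core.messages import AIMessage, ToolMessage

# Walk the agent's message history to observe each step of the flow above
for message in response['messages']:
    if isinstance(message, AIMessage) and getattr(message, 'tool_calls', None):
        # Steps 4-5: the LLM asked for a tool, which Langchain maps to our Lambda function
        for call in message.tool_calls:
            print(f"LLM requested tool: {call['name']} with args {call['args']}")
    elif isinstance(message, ToolMessage):
        # Steps 6-7: the Lambda function's output, fed back into the conversation
        print(f"Tool result (truncated): {str(message.content)[:200]}")
    elif isinstance(message, AIMessage):
        # Step 8: the LLM's final answer
        print(f"Final answer: {message.content}")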

Benefits and constraints of integrating Lambdas for CWIC tools

The integration of Lambdas in the Langchain system with CWIC has yielded several significant advantages. Primarily, it has reduced management overhead, allowing us to focus more on development. This streamlined approach has transformed the process of adding new tools into a simple configuration exercise, maintaining independence from the core system. Consequently, the core system no longer requires updates or redeployment for each new tool, thereby minimizing overall system risk.

However, it’s important to consider the constraints associated with serverless Lambdas in this ecosystem. The standalone nature of Lambda functions can make it challenging to efficiently share resources or coordinate complex interactions between multiple functions in more sophisticated tools. Additionally, setting up layers and dependencies, particularly around environment variables, requires careful attention, and debugging in the Lambda environment presents its own challenges.

Despite these considerations, the benefits of using Lambdas have proven to outweigh the constraints, offering a more efficient and flexible approach to tool integration in the CWIC system.

Monitoring tool usage

Monitoring your Lambda tool's usage is essential for ensuring optimal performance and identifying areas for improvement.

  • AWS CloudWatch: Use AWS CloudWatch to track critical metrics for your Lambda functions, including invocation count, duration, and error occurrences. This enables swift issue detection and performance optimization; a sketch of pulling these metrics with boto3 follows this list. For more details, consult the AWS Lambda documentation: View metrics for Lambda functions — AWS Lambda.
  • LLM Observability Platforms: Employ specialized platforms such as Langfuse, Langsmith, or Galileo to capture comprehensive interaction data, including input prompts and corresponding outputs. These tools offer valuable insights into user patterns and help fine-tune your tool’s effectiveness.
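
As a rough illustration, the core Lambda metrics can also be pulled programmatically with boto3 (a minimal sketch; the function name finance_lambda, the 24-hour window, and the hourly period are examples):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

def lambda_metric(metric_name, statistic):
    """Fetch hourly datapoints for one Lambda metric over the last 24 hours."""
    stats = cloudwatch.get_metric_statistics(
        Namespace='AWS/Lambda',
        MetricName=metric_name,
        Dimensions=[{'Name': 'FunctionName', 'Value': 'finance_lambda'}],
        StartTime=datetime.utcnow() - timedelta(hours=24),
        EndTime=datetime.utcnow(),
        Period=3600,  # one datapoint per hour
        Statistics=[statistic],
    )
    return stats['Datapoints']

print(lambda_metric('Invocations', 'Sum'))
print(lambda_metric('Errors', 'Sum'))
print(lambda_metric('Duration', 'Average'))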

By implementing these monitoring strategies, you can maintain the efficiency and reliability of your Lambda Tool, ensuring it remains responsive to user needs and contributes to the overall effectiveness of your Langchain Agents.

Conclusion

Transitioning CWIC tools to AWS Lambda emerges as a transformative step to overcome existing constraints while enhancing scalability, cost efficiency, and agility. Although challenges remain, adopting a serverless architecture will empower us to foster innovation and responsiveness in managing LLM functionalities, ensuring that we can meet the evolving demands of our business and the broader market. As we move forward, our focus will be on optimizing this transition to fully harness the power of AWS Lambda in our CWIC ecosystem.


About the author

Josh Wall is a GenAI Software Engineer at Clearwater Analytics, bringing a versatile background in data engineering and analytics. With experience spanning multiple domains, Josh adapts quickly to new challenges in the evolving landscape of AI technologies.