We have been fine-tuning LLMs for RAG on AWS SageMaker for some time now, and it has always been easy to deploy models there afterwards. This step-by-step guide will show you how to deploy an LLM for RAG on AWS SageMaker. If you are looking to learn how to fine-tune LLMs in general and deploy them on AWS, I recommend checking out Phil Schmid's blog.
We will use our fine-tuned model, which is based on Mistral-7B-v0.1 and was trained on around 50k high-quality examples. You can check it out on HuggingFace here.
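If you want to try the model before deploying, here is a minimal local sketch using the `transformers` library. This assumes a GPU with enough memory and `accelerate` installed; the prompt template is an assumption, so check the model card for the exact format.

# Optional: local sanity check of the model (not required for the SageMaker deployment below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Arc53/docsgpt-7b-mistral")
model = AutoModelForCausalLM.from_pretrained(
    "Arc53/docsgpt-7b-mistral",
    torch_dtype=torch.float16,  # half precision to fit a 7B model on a single GPU
    device_map="auto",
)

# assumed instruction-style prompt template -- verify against the model card
prompt = "### Instruction\nWhat is DocsGPT?\n### Answer\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))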
To begin, go to SageMaker and create a notebook, or you can do this on your own device. For the environment, I recommend using `PyTorch 2.0.1 Python 3.10 CPU` in SageMaker.
This guide uses Python code, run in a notebook, to deploy the resources. To make sure this code runs correctly, your environment should be authenticated with AWS and have the right permissions, such as `AmazonSageMakerFullAccess` and `AmazonS3FullAccess`.
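If you are running outside SageMaker, you can quickly verify that your credentials are picked up with a short boto3 check:

# Quick sanity check that the environment is authenticated with AWS.
import boto3

identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])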
!pip install -U sagemaker --quiet
import sagemaker
import boto3

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it doesn't exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    # fall back to looking up the role by name when running outside SageMaker
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="1.1.0"
)
import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
# choose the desired or available instance type here
instance_type = "ml.g5.xlarge"
number_of_gpu = 1
health_check_timeout = 600  # give the container up to 10 minutes to load the model

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': "Arc53/docsgpt-7b-mistral",   # model_id from hf.co/models
    'SM_NUM_GPUS': json.dumps(number_of_gpu),    # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(3072),        # Max length of input text, in tokens
    'MAX_TOTAL_TOKENS': json.dumps(4096),        # Max length of input + generated text, in tokens
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(8192),  # Max tokens processed in a single batch
}
# create HuggingFaceModel with the image uri
# (the model is public, so no Hugging Face Hub token needs to be set)
llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env=config
)
llm = llm_model.deploy(
    initial_instance_count=1,
    endpoint_name="docsgpt-7b-mistral",
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout
)
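Deployment takes several minutes while SageMaker provisions the instance and the container downloads the model. Once the endpoint is `InService`, you can run a quick smoke test through the predictor returned by `deploy()`. The prompt template here is an assumption; adjust it to match your fine-tuning format.

# send a TGI-style request to the freshly deployed endpoint
response = llm.predict({
    "inputs": "### Instruction\nWhat is DocsGPT?\n### Answer\n",
    "parameters": {
        "max_new_tokens": 256,  # cap the length of the generated answer
        "temperature": 0.7,
        "stop": ["</s>"],       # stop generation at the end-of-sequence token
    },
})
print(response[0]["generated_text"])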
Once the endpoint is live, point DocsGPT at it by filling in the following settings:

SAGEMAKER_ENDPOINT: str = None    # SageMaker endpoint name (docsgpt-7b-mistral)
SAGEMAKER_REGION: str = None      # SageMaker region name
SAGEMAKER_ACCESS_KEY: str = None  # AWS access key ID
SAGEMAKER_SECRET_KEY: str = None  # AWS secret access key
EMBEDDINGS_NAME = "huggingface_sentence-transformers/all-mpnet-base-v2"
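DocsGPT calls the endpoint through the SageMaker runtime using these credentials. If you want to call it from your own code the same way, here is a minimal sketch with boto3; the endpoint name, region, and keys are placeholders for your own values, and the prompt template is again an assumption.

import json
import boto3

# placeholders -- substitute your own region and credentials
runtime = boto3.client(
    "sagemaker-runtime",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

payload = {
    "inputs": "### Instruction\nWhat is DocsGPT?\n### Answer\n",
    "parameters": {"max_new_tokens": 256},
}

result = runtime.invoke_endpoint(
    EndpointName="docsgpt-7b-mistral",
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(result["Body"].read())[0]["generated_text"])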
In conclusion, deploying your custom LLM and DocsGPT on AWS SageMaker is a streamlined and user-friendly process. If you encounter any challenges or require a tailored solution for your specific needs, don't hesitate to reach out for assistance. Our team is ready to help you optimize your deployment!
Get in touch