
Workflow updated with the use of LLMs (using Amazon Bedrock) #3

Merged
19 commits merged into aws-samples:main on Aug 2, 2024

Conversation

@dlaredo dlaredo commented Jun 25, 2024

Issue #, if available:

Description of changes:

  • Incorporates LLMs to process the data in a more efficient manner
  • Uses Amazon Bedrock
  • Creates a new data-streamer to test the workflow

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


from langchain.prompts import PromptTemplate
from langchain.llms.bedrock import Bedrock
from langchain_community.chat_models import BedrockChat

The official package for accessing Bedrock models is now langchain-aws.

Contributor Author

Fixed in commit e611804

'claude': 'anthropic.claude-3-haiku-20240307-v1:0',
}

logging.getLogger().setLevel(os.environ.get('LOG_LEVEL', 'WARNING').upper())

Have you considered using Powertools for Lambda? They offer a cool structured logging convenience.
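As a sketch of what structured logging buys you (Powertools' Logger provides this out of the box, plus Lambda context fields), a stdlib-only equivalent of the idea might look like the following; the "workflow" logger name and message are illustrative:

```python
import json
import logging
import os
from io import StringIO

# Stdlib-only sketch of structured (JSON) logging. AWS Lambda Powertools'
# Logger gives you this out of the box, plus Lambda context metadata.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

stream = StringIO()  # stand-in for stdout so the result is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("workflow")
logger.addHandler(handler)
logger.setLevel(os.environ.get("LOG_LEVEL", "WARNING").upper())

logger.warning("item processed")
entry = json.loads(stream.getvalue())
```

Each log line is then machine-parseable JSON rather than free text, which makes CloudWatch queries much easier.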

Contributor Author

Fixed in commit e611804

augmented_json_format_str = json.dumps(json_format)

logging.info(f'Extract data prompt')
logging.info(extract_data_prompt.format(json_format=augmented_json_format_str,

Have you considered using a few shot prompting template?
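For reference, the few-shot idea can be sketched without pinning a specific LangChain API (LangChain's FewShotChatMessagePromptTemplate wraps the same pattern); the example texts and output fields below are illustrative, not from this PR:

```python
import json

# Minimal few-shot prompt construction: prepend a handful of worked
# examples so the model sees the exact input/output shape expected.
examples = [
    {"text": "Love the new checkout flow!",
     "output": {"sentiment": "positive", "topic": "checkout"}},
    {"text": "App crashes on login.",
     "output": {"sentiment": "negative", "topic": "login"}},
]

def build_prompt(text: str) -> str:
    shots = "\n\n".join(
        f"Text: {ex['text']}\nJSON: {json.dumps(ex['output'])}" for ex in examples
    )
    return (
        "Extract sentiment and topic from the text as JSON.\n\n"
        f"{shots}\n\nText: {text}\nJSON:"
    )

prompt = build_prompt("Shipping was slow but support helped.")
```

The few worked examples anchor the output format far more reliably than instructions alone, especially for smaller models like Haiku.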

Contributor Author

Added in commit e611804


extract_data_prompt = ChatPromptTemplate.from_messages(messages_data)

chain_extract_data = extract_data_prompt | llm_data | StrOutputParser()

Have you considered using structured output or tools to do info extraction?
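A rough sketch of the difference: instead of parsing the model's raw string yourself, the chain returns a validated, typed object (in LangChain this is typically done with .with_structured_output() on the chat model, backed by a Pydantic model). The fields below are illustrative stand-ins:

```python
import json
from dataclasses import dataclass

# Illustrative schema; the PR ended up using Pydantic models with
# LangChain's structured-output support, which validates for you.
@dataclass
class ExtractedInformation:
    sentiment: str
    topics: list

def parse_llm_output(raw: str) -> ExtractedInformation:
    # Stand-in for what structured output does: JSON text -> typed object.
    data = json.loads(raw)
    return ExtractedInformation(sentiment=data["sentiment"], topics=data["topics"])

info = parse_llm_output('{"sentiment": "positive", "topics": ["pricing"]}')
```

The win is that downstream code works with attributes instead of re-parsing strings, and malformed model output fails loudly at one well-defined point.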

Contributor Author

Added in commit e611804

@dlaredo dlaredo requested a review from donatoaz July 30, 2024 20:43
@donatoaz donatoaz left a comment

Hi David, I think overall my original comments were neatly addressed, and this looks a lot more "current" in terms of how to use LangChain. Very good work.

I left some comments; they are all minor, so I will go ahead and approve. Take the comments in and apply them if you want; otherwise, feel free to merge.


Maybe this could be a PDF; that seems more inclusive, since not everyone has PowerPoint.

Contributor Author

The PPT is required by the solutions team but I'm including a PDF version anyway.

Contributor Author

Added PDF diagram in 0aa1ba7

item = event

logger.info('Item:')
logger.info(item)


We are already logging the entire event on line 124. Maybe you could remove one of the two.

Contributor Author

Fixed in e91f7a0


# Attempt to categorize item
text = item['text']
logger.info(f'Text: {text}')


Same here; we are logging this twice.

Contributor Author
@dlaredo dlaredo Aug 2, 2024

Fixed in e91f7a0

logger.info(f'Text: {text}')

text = demoji.replace(text, "")
item['text_clean'] = text


Same here; we are logging this twice.

Contributor Author
@dlaredo dlaredo Aug 2, 2024

Fixed in e91f7a0


try:

#meta_topics_str = ','.join(META_TOPICS)


Commented-out line?

Contributor Author
@dlaredo dlaredo Aug 2, 2024

Fixed in e91f7a0

text: str
) -> ExtractedInformation:

bedrock_llm = ChatBedrock(


Have you considered extracting this instantiation to root level so the client is created only once per lifetime of the lambda function?

Contributor Author

Can't extract it from the function, since the parameters are different for each instance of the client.
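One pattern that still works when the parameters vary is to memoize the client factory, so each distinct configuration is built only once per Lambda container. A hedged sketch, where make_client is a hypothetical stand-in for the ChatBedrock constructor:

```python
from functools import lru_cache

def make_client(model_id: str, temperature: float):
    # Hypothetical stand-in for ChatBedrock(model_id=..., temperature=...).
    return {"model_id": model_id, "temperature": temperature}

@lru_cache(maxsize=None)
def get_llm(model_id: str, temperature: float):
    # Same arguments -> same cached client for the lifetime of the container.
    return make_client(model_id, temperature)

a = get_llm("anthropic.claude-3-haiku-20240307-v1:0", 0.0)
b = get_llm("anthropic.claude-3-haiku-20240307-v1:0", 0.0)
```

Warm invocations with the same configuration then skip client construction entirely, while differing parameters still get their own instances.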


claude_information_extraction_prompt_template = INFORMATION_EXTRACTION_PROMPT_SELECTOR.get_prompt(MODEL_ID)

print("The prompt template")


Is there a reason for using prints when you have a logger instance? It might also be an opportunity to have some logs as INFO and others as DEBUG.
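A small sketch of the INFO/DEBUG split with the stdlib logger (the message contents are illustrative):

```python
import logging
from io import StringIO

# Leveled logging instead of print(): verbose details go to DEBUG and are
# silenced in production, while milestones stay visible at INFO.
stream = StringIO()
logging.basicConfig(stream=stream, level=logging.INFO,
                    format="%(levelname)s %(message)s", force=True)
log = logging.getLogger("extraction")

log.info("prompt template loaded")
log.debug("full template body: ...")  # dropped at INFO level
output = stream.getvalue()
```

Flipping the configured level to DEBUG in a dev environment then surfaces the verbose lines without any code changes.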

Contributor Author

Fixed in e91f7a0


text_insight = text_information_extraction(META_SENTIMENTS_STR, item['text_clean'])
logger.info(f'Text insights:')
logger.info(text_insight)


You seem to be double-logging some things; for example, line 119 logs the same as line 150.

Contributor Author

Fixed in e91f7a0

})

print("Information extraction object")
print(type(information_extraction_obj))


Is this necessary? We know the type is ExtractedInformation, since we are using structured outputs.

Contributor Author

Fixed in e91f7a0

region = args.region

sqs = boto3.client('sqs')
translate = boto3.client(service_name='translate', region_name=region, use_ssl=True)


Are you using this?

Contributor Author

This is optional, used only for sending a test load to the solution; the translation part is used only if the user decides to test posts in Spanish.

Contributor Author

dlaredo commented Aug 2, 2024

Solution upgraded to support the use of LLMs to make information extraction more efficient:

Description of changes:

  • Incorporates LLMs to process the data in a more efficient manner
  • Makes use of structured outputs using Pydantic models
  • Uses Amazon Bedrock
  • Creates a new data-streamer to test the workflow

@donatoaz donatoaz merged commit ace836a into aws-samples:main Aug 2, 2024