GitHub - databricks-industry-solutions/review-summarisation: Automated Analysis of Product Reviews Using Large Language Models

Automated Analysis of Product Reviews Using Large Language Models

Today, most organizations employ a modestly-sized team of workers to read and digest user feedback for insights that may help improve a product's performance or otherwise identify issues related to customer satisfaction.

The work is important but anything but sexy. A worker reads a review, takes notes, and moves on to the next. Individual reviews that require a response are flagged and a summary of the feedback from across multiple reviews are compiled for review by product or category managers.

This is a type of work that's ripe for automation. The volume of reviews that pour into a site mean the more detailed portions of this work are often performed on a limited subset of products across variable windows depending on a products importance. In more sophisticated organizations, rules detecting course or inappropriate language and models estimating user sentiment or otherwise classifying reviews for positive, negative or neutral experiences may be applied to help identify problematic content and draw a reviewer's attention to it. But either way, a lot is missed simply because we can't throw enough bodies at the problem to keep up and those bodies tend to become bored or fatigued with the monotony of the work.

But using an LLM, issues of scale and consistency can be easily addressed. All we need to do is bring the product reviews to the model and ask:

What are the top three points of negative feedback found across these reviews?
What features do our customers like best about this product?
Do customers feel they are receiving sufficient value from the product relative to what they are being asked to pay?
Are there any reviews that are especially negative or are using inappropriate language?

[email protected], [email protected]

© 2023 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.

library	description	license	source
flash-attn	Fast and memory-efficient exact attention	BSD	https://github.com/Dao-AILab/flash-attention
xformers	Hackable and optimized Transformers building blocks	BSD	https://github.com/facebookresearch/xformers
ctranslate2	Fast inference engine for Transformer models	MIT	https://github.com/OpenNMT/CTranslate2
triton	Triton language and compiler	MIT	https://github.com/openai/triton/

Getting started

Although specific solutions can be downloaded as .dbc archives from our websites, we recommend cloning these repositories onto your databricks environment. Not only will you get access to latest code, but you will be part of a community of experts driving industry best practices and re-usable solutions, influencing our respective industries.

To start using a solution accelerator in Databricks simply follow these steps:

Clone solution accelerator repository in Databricks using Databricks Repos
Attach the RUNME notebook to any cluster and execute the notebook via Run-All. A multi-step-job describing the accelerator pipeline will be created, and the link will be provided. The job configuration is written in the RUNME notebook in json format.
Execute the multi-step-job to see how the pipeline runs.
You might want to modify the samples in the solution accelerator to your need, collaborate with other users and run the code samples against your own data. To do so start by changing the Git remote of your repository to your organization’s repository vs using our samples repository (learn more). You can now commit and push code, collaborate with other user’s via Git and follow your organization’s processes for code development.

The cost associated with running the accelerator is the user's responsibility.

Project support

Please note the code in this project is provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects. The source in this project is provided subject to the Databricks License. All included or referenced third party libraries are subject to the licenses set forth below.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.github/workflows		.github/workflows
experiment		experiment
.gitignore		.gitignore
01-data-download.ipynb		01-data-download.ipynb
02-explore-prep.ipynb		02-explore-prep.ipynb
03-prompt-engineering.ipynb		03-prompt-engineering.ipynb
04-summarisation.ipynb		04-summarisation.ipynb
05-condensation.ipynb		05-condensation.ipynb
06-presentation.ipynb		06-presentation.ipynb
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
RUNME.py		RUNME.py
SECURITY.md		SECURITY.md
config.py		config.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated Analysis of Product Reviews Using Large Language Models

Getting started

Project support

About

Releases

Packages

Contributors 2

Languages

License

databricks-industry-solutions/review-summarisation

Folders and files

Latest commit

History

Repository files navigation

Automated Analysis of Product Reviews Using Large Language Models

Getting started

Project support

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages