Skip to content

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

License

Notifications You must be signed in to change notification settings

microsoft/ParrotServe

Repository files navigation

Parrot: Efficient Serving of LLM-based Application with Semantic Variable

[Paper]

This project is a research prototype for now. Being eargerly iterated.

Parrot is a distributed serving system for LLM-based Applications. The Parrot API w/ Semantic Variable is served by a centralized cluster manager called ServeCore, which manages many Engine instances. Each Parrot Engine runs a single LLM model and communicates with ServeCore by contextual Fill/Gen APIs. Note that each Engine is capable of providing language model services independently, therefore the system is horizontally scalable and many types of Engines can be integrated into Parrot (e.g., vLLM, FasterTransformer, etc.).

LLM Applications

The powerful language understanding capability of large language models (LLMs) has enabled a new application paradigm, where one or multiple application entities, known as AI agents or co-pilots, communicate with LLMs via natural language, known as “prompts”, to accomplish a task collaboratively. Parrot is designed to serve these LLM-based applications efficiently by adding Semantic Variable in current OpenAI-style API, exposing richer application-level knowledge to backend systems and engines for better optimization.

Install

See INSTALL.md for installation instructions.

Run Parrot

Run the Compose Script in a Single Machine

We provide some one-click scripts to run Parrot in a single machine with sample configs. You can check them in the sample_configs/launch folder.

bash sample_configs/launch/launch_single_vicuna_13b.sh

Start a ServeCore Server

You can separately start a ServeCore server.

python3 -m parrot.serve.http_server --config_path <config_path>

Start an Engine Server

You can separately start an engine server. If you choose to connect to the ServeCore server, you need to start the ServeCore server first and specify the ServeCore server address in the config file.

python3 -m parrot.engine.http_server --config_path <config_path>

Acknowledgement

We learned a lot from the following projects when developing Parrot.

Reference

If you find Parrot useful or relevant to your research, please cite our paper as below:

@inproceedings{lin2024parrot,
    author = {Chaofan Lin and Zhenhua Han and Chengruidong Zhang and Yuqing Yang and Fan Yang and Chen Chen and Lili Qiu},
    title = {Parrot: Efficient Serving of LLM-based Applications with Semantic Variable},
    booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
    year = {2024},
    address = {Santa Clara, CA},
    publisher = {USENIX Association},
    url = {https://www.usenix.org/conference/osdi24/presentation/lin-chaofan},
    month = jul
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

About

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors 4

  •  
  •  
  •  
  •  

Languages