Our project has two parts. The first is the on-edge part, located in the edge_choose-your-llm
folder, which contains the resources and code for selecting the best LLM for a given task.
The second is the on-cloud part, located in the cloud_inference
folder. It provides instructions for installing, setting up, and running our dynamic batching system, as well as our batch size vs. throughput vs. latency experiments. See the README.md
in that folder for more details.
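To give a flavor of what dynamic batching means here, the sketch below shows the core idea: queue incoming requests and flush them as a batch once either a size cap or a wait deadline is hit, trading a little latency for higher throughput. This is a hypothetical illustration, not the actual implementation in cloud_inference; the class and parameter names (`DynamicBatcher`, `max_batch_size`, `max_wait_s`) are made up for this example.

```python
import time
import queue

class DynamicBatcher:
    """Collect requests and process them together when the batch is full
    or the wait budget (max_wait_s) has elapsed, whichever comes first."""

    def __init__(self, process_batch, max_batch_size=8, max_wait_s=0.05):
        self.process_batch = process_batch  # callable: list[request] -> list[result]
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._queue = queue.Queue()

    def submit(self, request):
        # Each request carries a one-slot queue on which its result is delivered.
        result_q = queue.Queue(maxsize=1)
        self._queue.put((request, result_q))
        return result_q

    def run_once(self):
        # Block for the first request, then keep gathering until the batch
        # is full or the wait deadline passes.
        batch = [self._queue.get()]
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._queue.get(timeout=remaining))
            except queue.Empty:
                break
        requests = [req for req, _ in batch]
        results = self.process_batch(requests)  # one model call for the whole batch
        for (_, result_q), res in zip(batch, results):
            result_q.put(res)

# Example: a "model" that doubles each input, batched across three requests.
batcher = DynamicBatcher(lambda reqs: [r * 2 for r in reqs],
                         max_batch_size=4, max_wait_s=0.01)
result_queues = [batcher.submit(i) for i in range(3)]
batcher.run_once()
print([q.get() for q in result_queues])  # → [0, 2, 4]
```

In a real deployment the batch-size cap and wait deadline are exactly the knobs swept in a batch size vs. throughput vs. latency experiment: larger batches raise throughput, while a longer wait deadline adds tail latency.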