Add an API example using server.cpp similar to OAI. #2009

jwj7140 · 2023-06-26T17:24:40Z

adding an API example that provides responses similar to OpenAI's chat completion and completion.
This example is about 30% faster than existing similar examples because they are based on llama-cpp-python, which is slightly slower than llama.cpp.
This example must be used with server.cpp.

Run this code, and write like this in python:
openai.api_base = "http:https://***.***.***.***:8081"
Then almost all OpenAI api code is compatible with llama.cpp.

examples/server/api_like_OAI.py

SlyEcho · 2023-06-28T17:51:06Z

There is some formatting problem that the CI picked up, can you take care of it?

jwj7140 · 2023-06-29T05:53:37Z

um... Should I make new branch and PR again?
I don't know about github system well. sorry

SlyEcho · 2023-06-29T10:55:29Z

No, if you open the failing check it will tell you where it needs fixing.

examples/server/api_like_OAI.py:
	Wrong line endings or no final newline
	Not all lines have the correct end of line character
	73: Trailing whitespace

3 errors found

Some whitespace issues, should be easy.

jwj7140 · 2023-06-29T15:06:17Z

yep thank you

SlyEcho · 2023-07-02T14:11:28Z

Is it possible to get a binding for also /v1/ prefixed endpoints?

@app.route('/completions', methods=['POST'])
@app.route('/v1/completions', methods=['POST'])
def completion():

It seems like this would give better compatibility?

howard0su · 2023-07-02T15:02:24Z

Shall we change server example to match response of OAI?

jwj7140 · 2023-07-02T15:42:29Z

@SlyEcho Actually I think that it doesn't seem to matter much, but I'll add it. thank you.
server.cpp seems to return a "truncated" contents in many times(with "truncated:true" logs). Can you tell me what is "truncated" and why it is showed?

jwj7140 · 2023-07-02T16:15:13Z

@howard0su Well, I think the current server example has the most comfortable forms. OAI api is just an experimental attempt.

SlyEcho · 2023-07-02T21:25:03Z

Can you tell me what is "truncated" and why it is showed?

Truncated indicates the automatic context management when the context runs out of space. Some explanation here: #1838.

It is just an indicator to the API's user, this way they could create some kind of method to handle the lost information in the context. For example they could use another LLM to summarize the previous chat and then replace the earlier chat messages leaving more space for new questions and answers.

SlyEcho · 2023-07-03T07:43:57Z

Probabilities are now returned from the server in #1962

Would it be hard to add them in to this?

jwj7140 · 2023-07-03T10:38:01Z

great! Should I make PR again?

SlyEcho · 2023-07-04T12:05:36Z

We can add it later as well, it's up to you, otherwise we can merge it as-is.

jwj7140 · 2023-07-04T16:13:04Z

Okay I'll add it later. It is okay to merge. thanks

SlyEcho · 2023-07-04T16:34:41Z

It would be nice to add all this to the C++ API. Maybe in the future. But I will try to use this wrapper so I get an idea of what what possible changes to the server are needed.

commit 8432e9d Author: YellowRoseCx <[email protected]> Date: Sun Jul 9 16:55:30 2023 -0500 Update Makefile commit b58c189 Author: YellowRoseCx <[email protected]> Date: Sun Jul 9 16:20:00 2023 -0500 Add multi-gpu CuBLAS support to new GUI commit 0c1c71b Author: YellowRoseCx <[email protected]> Date: Sat Jul 8 07:56:57 2023 -0500 Update Makefile commit f864f60 Author: Johannes Gäßler <[email protected]> Date: Sat Jul 8 00:25:15 2023 +0200 CUDA: add __restrict__ to mul mat vec kernels (ggerganov#2140) commit 4539bc2 Author: YellowRoseCx <[email protected]> Date: Sat Jul 8 01:36:14 2023 -0500 update makefile for changes commit 912e31e Merge: 74e2703 ddaa4f2 Author: YellowRoseCx <[email protected]> Date: Fri Jul 7 23:15:37 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit ddaa4f2 Author: Concedo <[email protected]> Date: Fri Jul 7 22:14:14 2023 +0800 fix cuda garbage results and gpu selection issues commit 95eca51 Author: Concedo <[email protected]> Date: Fri Jul 7 18:39:47 2023 +0800 add gpu choice for GUI for cuda commit a689a66 Author: Concedo <[email protected]> Date: Fri Jul 7 17:52:34 2023 +0800 make it work with pyinstaller commit 9ee9a77 Author: Concedo <[email protected]> Date: Fri Jul 7 16:25:37 2023 +0800 warn outdated GUI (+1 squashed commits) Squashed commits: [15aec3d] spelling error commit 32102c2 Merge: 8424a35 481f793 Author: Concedo <[email protected]> Date: Fri Jul 7 14:15:39 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # README.md commit 481f793 Author: Howard Su <[email protected]> Date: Fri Jul 7 11:34:18 2023 +0800 Fix opencl by wrap #if-else-endif with \n (ggerganov#2086) commit dfd9fce Author: Georgi Gerganov <[email protected]> Date: Thu Jul 6 19:41:31 2023 +0300 ggml : fix restrict usage commit 36680f6 Author: Judd <[email protected]> Date: Fri Jul 7 00:23:49 2023 +0800 convert : update for baichuan (ggerganov#2081) 1. guess n_layers; 2. relax warnings on context size; 3. add a note that its derivations are also supported. Co-authored-by: Judd <[email protected]> commit a17a268 Author: tslmy <[email protected]> Date: Thu Jul 6 09:17:50 2023 -0700 alpaca.sh : update model file name (ggerganov#2074) The original file name, `ggml-alpaca-7b-q4.bin`, implied the first-generation GGML. After the breaking changes (mentioned in ggerganov#382), `llama.cpp` requires GGML V3 now. Those model files are named `*ggmlv3*.bin`. We should change the example to an actually working model file, so that this thing is more likely to run out-of-the-box for more people, and less people would waste time downloading the old Alpaca model. commit 8424a35 Author: Concedo <[email protected]> Date: Thu Jul 6 23:24:21 2023 +0800 added the ability to ban any substring tokens commit 27a0907 Author: Concedo <[email protected]> Date: Thu Jul 6 22:33:46 2023 +0800 backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas commit 220aa70 Merge: 4d1700b 31cfbb1 Author: Concedo <[email protected]> Date: Thu Jul 6 15:40:40 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # pocs/vdot/q8dot.cpp # pocs/vdot/vdot.cpp # scripts/sync-ggml.sh # tests/test-grad0.c # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp commit 4d1700b Author: Concedo <[email protected]> Date: Thu Jul 6 15:17:47 2023 +0800 adjust some ui sizing commit 1c80002 Author: Vali-98 <[email protected]> Date: Thu Jul 6 15:00:57 2023 +0800 New UI using customtkinter (LostRuins#284) * Initial conversion to customtkinter. * Initial conversion to customtkinter. * Additions to UI, still non-functional * UI now functional, untested * UI now functional, untested * Added saving configs * Saving and loading now functional * Fixed sliders not loading * Cleaned up duplicate arrays * Cleaned up duplicate arrays * Fixed loading bugs * wip fixing all the broken parameters. PLEASE test before you commit * further cleaning * bugfix completed for gui. now evaluating save and load * cleanup prepare to merge --------- Co-authored-by: Concedo <[email protected]> commit 31cfbb1 Author: Tobias Lütke <[email protected]> Date: Wed Jul 5 16:51:13 2023 -0400 Expose generation timings from server & update completions.js (ggerganov#2116) * use javascript generators as much cleaner API Also add ways to access completion as promise and EventSource * export llama_timings as struct and expose them in server * update readme, update baked includes * llama : uniform variable names + struct init --------- Co-authored-by: Georgi Gerganov <[email protected]> commit 74e2703 Merge: cf65429 f9108ba Author: YellowRoseCx <[email protected]> Date: Wed Jul 5 15:16:49 2023 -0500 Merge branch 'LostRuins:concedo' into main commit 983b555 Author: Jesse Jojo Johnson <[email protected]> Date: Wed Jul 5 18:03:19 2023 +0000 Update Server Instructions (ggerganov#2113) * Update server instructions for web front end * Update server README * Remove duplicate OAI instructions * Fix duplicate text --------- Co-authored-by: Jesse Johnson <[email protected]> commit ec326d3 Author: Georgi Gerganov <[email protected]> Date: Wed Jul 5 20:44:11 2023 +0300 ggml : fix bug introduced in ggerganov#1237 commit 1b6efea Author: Georgi Gerganov <[email protected]> Date: Wed Jul 5 20:20:05 2023 +0300 tests : fix test-grad0 commit 1b107b8 Author: Stephan Walter <[email protected]> Date: Wed Jul 5 16:13:06 2023 +0000 ggml : generalize `quantize_fns` for simpler FP16 handling (ggerganov#1237) * Generalize quantize_fns for simpler FP16 handling * Remove call to ggml_cuda_mul_mat_get_wsize * ci : disable FMA for mac os actions --------- Co-authored-by: Georgi Gerganov <[email protected]> commit 8567c76 Author: Jesse Jojo Johnson <[email protected]> Date: Wed Jul 5 15:13:35 2023 +0000 Update server instructions for web front end (ggerganov#2103) Co-authored-by: Jesse Johnson <[email protected]> commit 924dd22 Author: Johannes Gäßler <[email protected]> Date: Wed Jul 5 14:19:42 2023 +0200 Quantized dot products for CUDA mul mat vec (ggerganov#2067) commit 051c70d Author: Howard Su <[email protected]> Date: Wed Jul 5 18:31:23 2023 +0800 llama: Don't double count the sampling time (ggerganov#2107) commit ea79e54 Author: Concedo <[email protected]> Date: Wed Jul 5 17:29:35 2023 +0800 fixed refusing to quantize some models commit 9e4475f Author: Johannes Gäßler <[email protected]> Date: Wed Jul 5 08:58:05 2023 +0200 Fixed OpenCL offloading prints (ggerganov#2082) commit 7f0e9a7 Author: Nigel Bosch <[email protected]> Date: Tue Jul 4 18:33:33 2023 -0500 embd-input: Fix input embedding example unsigned int seed (ggerganov#2105) commit b472f3f Author: Georgi Gerganov <[email protected]> Date: Tue Jul 4 22:25:22 2023 +0300 readme : add link web chat PR commit ed9a54e Author: Georgi Gerganov <[email protected]> Date: Tue Jul 4 21:54:11 2023 +0300 ggml : sync latest (new ops, macros, refactoring) (ggerganov#2106) - add ggml_argmax() - add ggml_tanh() - add ggml_elu() - refactor ggml_conv_1d() and variants - refactor ggml_conv_2d() and variants - add helper macros to reduce code duplication in ggml.c commit f257fd2 Author: jwj7140 <[email protected]> Date: Wed Jul 5 03:06:12 2023 +0900 Add an API example using server.cpp similar to OAI. (ggerganov#2009) * add api_like_OAI.py * add evaluated token count to server * add /v1/ endpoints binding commit 7ee76e4 Author: Tobias Lütke <[email protected]> Date: Tue Jul 4 10:05:27 2023 -0400 Simple webchat for server (ggerganov#1998) * expose simple web interface on root domain * embed index and add --path for choosing static dir * allow server to multithread because web browsers send a lot of garbage requests we want the server to multithread when serving 404s for favicon's etc. To avoid blowing up llama we just take a mutex when it's invoked. * let's try this with the xxd tool instead and see if msvc is happier with that * enable server in Makefiles * add /completion.js file to make it easy to use the server from js * slightly nicer css * rework state management into session, expose historyTemplate to settings --------- Co-authored-by: Georgi Gerganov <[email protected]> commit acc111c Author: Henri Vasserman <[email protected]> Date: Tue Jul 4 15:38:04 2023 +0300 Allow old Make to build server. (ggerganov#2098) Also make server build by default. Tested with Make 3.82 commit 23c7c6f Author: ZhouYuChen <[email protected]> Date: Tue Jul 4 20:15:16 2023 +0800 Update Makefile: clean simple (ggerganov#2097) commit 69add28 Merge: 00e35d0 698efad Author: Concedo <[email protected]> Date: Tue Jul 4 18:51:42 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml commit 00e35d0 Merge: fff705d f9108ba Author: Concedo <[email protected]> Date: Tue Jul 4 18:46:40 2023 +0800 Merge branch 'concedo' into concedo_experimental commit f9108ba Author: Michael Moon <[email protected]> Date: Tue Jul 4 18:46:08 2023 +0800 Make koboldcpp.py executable on Linux (LostRuins#293) commit fff705d Merge: 784628a c6c0afd Author: Concedo <[email protected]> Date: Tue Jul 4 18:42:02 2023 +0800 Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental commit c6c0afd Author: Concedo <[email protected]> Date: Tue Jul 4 18:35:03 2023 +0800 refactor to avoid code duplication commit 784628a Merge: ca9a116 309534d Author: Concedo <[email protected]> Date: Tue Jul 4 16:38:32 2023 +0800 Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental commit 698efad Author: Erik Scholz <[email protected]> Date: Tue Jul 4 01:50:12 2023 +0200 CI: make the brew update temporarily optional. (ggerganov#2092) until they decide to fix the brew installation in the macos runners. see the open issues. eg actions/runner-images#7710 commit 14a2cc7 Author: Govlzkoy <[email protected]> Date: Tue Jul 4 07:50:00 2023 +0800 [ggml] fix index for ne03 value in ggml_cl_mul_f32 (ggerganov#2088) commit cf65429 Author: YellowRoseCx <[email protected]> Date: Mon Jul 3 16:56:40 2023 -0500 print cuda or opencl based on what's used commit 72c16d2 Author: YellowRoseCx <[email protected]> Date: Mon Jul 3 16:45:39 2023 -0500 Revert "fix my mistake that broke other arches" This reverts commit 777aed5. commit 1cf14cc Author: Henri Vasserman <[email protected]> Date: Tue Jul 4 00:05:23 2023 +0300 fix server crashes (ggerganov#2076) commit 777aed5 Author: YellowRoseCx <[email protected]> Date: Mon Jul 3 15:53:32 2023 -0500 fix my mistake that broke other arches commit cc45a7f Author: Howard Su <[email protected]> Date: Tue Jul 4 02:43:55 2023 +0800 Fix crash of test-tokenizer-0 under Debug build (ggerganov#2064) * Fix crash of test-tokenizer-0 under Debug build * Change per comment commit ca9a116 Author: Concedo <[email protected]> Date: Tue Jul 4 00:35:02 2023 +0800 possibly slower, but cannot use larger batches without modifying ggml library. commit bfeb347 Author: Concedo <[email protected]> Date: Mon Jul 3 21:36:42 2023 +0800 fix typos commit 55dbb91 Author: Howard Su <[email protected]> Date: Mon Jul 3 19:58:58 2023 +0800 [llama] No need to check file version when loading vocab score (ggerganov#2079) commit d7d2e6a Author: WangHaoranRobin <[email protected]> Date: Mon Jul 3 05:38:44 2023 +0800 server: add option to output probabilities for completion (ggerganov#1962) * server: add option to output probabilities for completion * server: fix issue when handling probability output for incomplete tokens for multibyte character generation * server: fix llama_sample_top_k order * examples/common.h: put all bool variables in gpt_params together commit 27780a9 Author: YellowRoseCx <[email protected]> Date: Sun Jul 2 16:03:27 2023 -0500 rocm fixes commit f52c7d4 Author: YellowRoseCx <[email protected]> Date: Sun Jul 2 16:02:58 2023 -0500 Revert "rocm fixes" This reverts commit 2fe9927. commit 2fe9927 Author: YellowRoseCx <[email protected]> Date: Sun Jul 2 15:58:21 2023 -0500 rocm fixes commit efe7560 Author: YellowRoseCx <[email protected]> Date: Sun Jul 2 15:55:43 2023 -0500 Revert "move HIPBLAS definitions into ggml-cuda.h" This reverts commit bf49a93. commit 4fc0181 Author: YellowRoseCx <[email protected]> Date: Sun Jul 2 15:55:36 2023 -0500 Revert "move hipblas definitions to header files" This reverts commit 2741ffb. commit 89eb576 Merge: 2741ffb 3d2907d Author: YellowRoseCx <[email protected]> Date: Sun Jul 2 14:44:13 2023 -0500 Merge branch 'LostRuins:concedo' into main commit 309534d Author: Ycros <[email protected]> Date: Sun Jul 2 18:15:34 2023 +0000 implement sampler order, expose sampler order and mirostat in api commit 3d2907d Author: Concedo <[email protected]> Date: Sun Jul 2 18:28:09 2023 +0800 make gptneox and gptj work with extended context too commit d6b47e6 Merge: e17c849 46088f7 Author: Concedo <[email protected]> Date: Sun Jul 2 17:26:39 2023 +0800 Merge branch 'master' into concedo_experimental commit e17c849 Author: Concedo <[email protected]> Date: Sun Jul 2 17:25:08 2023 +0800 switched to NTK aware scaling commit e19483c Author: Concedo <[email protected]> Date: Sun Jul 2 14:55:08 2023 +0800 increase scratch for above 4096 commit 46088f7 Author: Georgi Gerganov <[email protected]> Date: Sun Jul 2 09:46:46 2023 +0300 ggml : fix build with OpenBLAS (close ggerganov#2066) commit b85ea58 Merge: ef3b8dc 0bc2cdf Author: Concedo <[email protected]> Date: Sun Jul 2 14:45:25 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # README.md commit 2741ffb Author: YellowRoseCx <[email protected]> Date: Sat Jul 1 17:07:42 2023 -0500 move hipblas definitions to header files commit bf49a93 Author: YellowRoseCx <[email protected]> Date: Sat Jul 1 16:38:50 2023 -0500 move HIPBLAS definitions into ggml-cuda.h commit 540f4e0 Merge: 2c3b46f eda663f Author: YellowRoseCx <[email protected]> Date: Sat Jul 1 14:58:32 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 0bc2cdf Author: Johannes Gäßler <[email protected]> Date: Sat Jul 1 21:49:44 2023 +0200 Better CUDA synchronization logic (ggerganov#2057) commit befb3a3 Author: Johannes Gäßler <[email protected]> Date: Sat Jul 1 21:47:26 2023 +0200 Test-based VRAM scratch size + context adjustment (ggerganov#2056) commit b213227 Author: Daniel Drake <[email protected]> Date: Sat Jul 1 20:31:44 2023 +0200 cmake : don't force -mcpu=native on aarch64 (ggerganov#2063) It's currently not possible to cross-compile llama.cpp for aarch64 because CMakeLists.txt forces -mcpu=native for that target. -mcpu=native doesn't make sense if your build host is not the target architecture, and clang rejects it for that reason, aborting the build. This can be easily reproduced using the current Android NDK to build for aarch64 on an x86_64 host. If there is not a specific CPU-tuning target for aarch64 then -mcpu should be omitted completely. I think that makes sense, there is not enough variance in the aarch64 instruction set to warrant a fixed -mcpu optimization at this point. And if someone is building natively and wishes to enable any possible optimizations for the host device, then there is already the LLAMA_NATIVE option available. Fixes LostRuins#495. commit 2f8cd97 Author: Aaron Miller <[email protected]> Date: Sat Jul 1 11:14:59 2023 -0700 metal : release buffers when freeing metal context (ggerganov#2062) commit 471aab6 Author: Judd <[email protected]> Date: Sun Jul 2 01:00:25 2023 +0800 convert : add support of baichuan-7b (ggerganov#2055) Co-authored-by: Judd <[email protected]> commit 463f2f4 Author: Georgi Gerganov <[email protected]> Date: Sat Jul 1 19:05:09 2023 +0300 llama : fix return value of llama_load_session_file_internal (ggerganov#2022) commit cb44dbc Author: Rand Xie <[email protected]> Date: Sun Jul 2 00:02:58 2023 +0800 llama : catch llama_load_session_file_internal exceptions (ggerganov#2022) * convert checks in llama_load_session_file to throw and handle them * make llama_load_session_file_internal static * address feedbacks to avoid using exceptions commit 79f634a Author: Georgi Gerganov <[email protected]> Date: Sat Jul 1 18:46:00 2023 +0300 embd-input : fix returning ptr to temporary commit 04606a1 Author: Georgi Gerganov <[email protected]> Date: Sat Jul 1 18:45:44 2023 +0300 train : fix compile warning commit b1ca8f3 Author: Qingyou Meng <[email protected]> Date: Sat Jul 1 23:42:43 2023 +0800 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (ggerganov#1995) Will not be scheduled unless explicitly enabled. commit 2c3b46f Author: YellowRoseCx <[email protected]> Date: Thu Jun 29 18:43:43 2023 -0500 changes to fix build commit c9e1103 Author: YellowRoseCx <[email protected]> Date: Thu Jun 29 18:20:07 2023 -0500 Update ggml_v2-cuda-legacy.cu for ROCM commit b858fc5 Author: YellowRoseCx <[email protected]> Date: Thu Jun 29 17:49:39 2023 -0500 changes to work with upstream commit 69a0c25 Merge: 096f0b0 1347d3a Author: YellowRoseCx <[email protected]> Date: Thu Jun 29 16:59:06 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 096f0b0 Author: YellowRoseCx <[email protected]> Date: Wed Jun 28 15:27:02 2023 -0500 revert unnecessary hipblas conditionals commit d81e81a Author: YellowRoseCx <[email protected]> Date: Wed Jun 28 14:48:23 2023 -0500 Update Makefile hipblas nvcc correction commit 2579ecf Merge: abed427 d2034ce Author: YellowRoseCx <[email protected]> Date: Sun Jun 25 17:50:04 2023 -0500 Merge branch 'LostRuins:concedo' into main commit abed427 Author: YellowRoseCx <[email protected]> Date: Sat Jun 24 19:16:30 2023 -0500 reorganize If statements to include proper headers commit 06c3bf0 Merge: ea6d320 8342fe8 Author: YellowRoseCx <[email protected]> Date: Sat Jun 24 16:57:20 2023 -0500 Merge branch 'LostRuins:concedo' into main commit ea6d320 Author: YellowRoseCx <[email protected]> Date: Fri Jun 23 01:53:28 2023 -0500 Update README.md commit 4d56ad8 Author: YellowRoseCx <[email protected]> Date: Thu Jun 22 16:19:43 2023 -0500 Update README.md commit 21f9308 Author: YellowRoseCx <[email protected]> Date: Thu Jun 22 15:42:05 2023 -0500 kquants_iter for hipblas and add gfx803 commit b6ff890 Merge: eb094f0 e6ddb15 Author: YellowRoseCx <[email protected]> Date: Thu Jun 22 12:42:09 2023 -0500 Merge branch 'LostRuins:concedo' into main commit eb094f0 Author: YellowRoseCx <[email protected]> Date: Wed Jun 21 23:59:18 2023 -0500 lowvram parameter description commit 3a5dfeb Merge: 665cc11 b1f00fa Author: YellowRoseCx <[email protected]> Date: Wed Jun 21 16:53:03 2023 -0500 Merge branch 'LostRuins:concedo' into koboldcpp-rocm commit 665cc11 Author: YellowRoseCx <[email protected]> Date: Wed Jun 21 01:13:19 2023 -0500 add lowvram parameter commit 222cbbb Author: YellowRoseCx <[email protected]> Date: Tue Jun 20 19:03:28 2023 -0500 add additional hipblas conditions for cublas commit e1f9581 Author: YellowRoseCx <[email protected]> Date: Tue Jun 20 16:51:59 2023 -0500 Add hip def for cuda v2 commit 3bff5c0 Merge: a7e74b3 266d47a Author: YellowRoseCx <[email protected]> Date: Tue Jun 20 13:38:06 2023 -0500 Merge branch 'LostRuins:concedo' into koboldcpp-rocm commit a7e74b3 Author: YellowRoseCx <[email protected]> Date: Mon Jun 19 22:04:18 2023 -0500 Update README.md commit 5e99b3c Author: YellowRoseCx <[email protected]> Date: Mon Jun 19 22:03:42 2023 -0500 Update Makefile commit 9190b17 Author: YellowRoseCx <[email protected]> Date: Mon Jun 19 21:47:10 2023 -0500 Update README.md commit 2780ea2 Author: YellowRoseCx <[email protected]> Date: Sun Jun 18 15:48:00 2023 -0500 Update Makefile commit 04a3e64 Author: YellowRoseCx <[email protected]> Date: Sun Jun 18 14:33:39 2023 -0500 remove extra line commit cccbca9 Author: YellowRoseCx <[email protected]> Date: Sun Jun 18 14:31:17 2023 -0500 attempt adding ROCM hipblas commit a44a1d4 Author: YellowRoseCx <[email protected]> Date: Sun Jun 18 14:31:01 2023 -0500 attempt adding ROCM hipblas commit b088184 Author: YellowRoseCx <[email protected]> Date: Sun Jun 18 14:30:54 2023 -0500 attempt adding ROCM hipblas

ANMahmood · 2023-09-30T14:09:41Z

Hi,
I am just trying to get python api_like_OAI.py running but having the following issue:
In one terminal, the server loads fine:

./server --model ~/Downloads/LLM_Applications/LocalAI/models/wizardcoder-python-34b-v1.0.Q8_0.gguf --ctx-size 16000 --threads 20 --alias gpt-4 --embedding --host 127.0.0.1 --port 8000
llama server listening at http:https://127.0.0.1:8000

{"timestamp":1696079264,"level":"INFO","function":"main","line":1623,"message":"HTTP server listening","hostname":"127.0.0.1","port":8000}

second terminal starts api_likeOAI.py but when I send requests to this server, it can't find the model (gpt-4):

python3 api_like_OAI.py 
 * Serving Flask app 'api_like_OAI'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http:https://127.0.0.1:8001
Press CTRL+C to quit
127.0.0.1 - - [30/Sep/2023 21:39:44] "GET /v1/models HTTP/1.1" 404 -
127.0.0.1 - - [30/Sep/2023 21:39:45] "GET /v1/models HTTP/1.1" 404 -
127.0.0.1 - - [30/Sep/2023 21:39:46] "GET /v1/models HTTP/1.1" 404 -

I am sending requests to gpt-4 as that's the alias set with server. What am I doing wrong?

cebtenzzre · 2023-09-30T16:48:42Z

I am just trying to get python api_like_OAI.py running but having the following issue:

If you open a new issue with that information, you are more likely to get a response.

jwj7140 · 2023-10-06T15:38:44Z

@ANMahmood

I am sending requests to gpt-4 as that's the alias set with server. What am I doing wrong?

This example does not have "/models" endpoint that exist in actual OpenAI's API because server.cpp supports only one model to run. So...if there are any problems with your project because of this issue, removing code may be helpful.

add api_like_OAI.py

f41f09a

SlyEcho reviewed Jun 27, 2023

View reviewed changes

examples/server/api_like_OAI.py Outdated Show resolved Hide resolved

SlyEcho reviewed Jun 27, 2023

View reviewed changes

examples/server/api_like_OAI.py Outdated Show resolved Hide resolved

jwj7140 added 3 commits June 28, 2023 02:16

fix bugs, remove chat format using \n

cc5de81

fix mistakes

e1abf63

change token count method

a4149aa

fix whitespace, edit README.md

d7435fe

jwj7140 added 3 commits June 30, 2023 01:08

add newline

b95016c

fix bugs

377ecf9

set n_keep to -1

7dcffd7

add /v1/ endpoints binding

f713dd5

SlyEcho approved these changes Jul 4, 2023

View reviewed changes

jwj7140 added 2 commits July 5, 2023 01:02

fix bug & add truncation return

41f7a50

print json

93e69ab

SlyEcho merged commit f257fd2 into ggerganov:master Jul 4, 2023
22 checks passed

jwj7140 deleted the OAI_API branch July 4, 2023 22:58

ghost mentioned this pull request Jul 6, 2023

[User] ./server failed to eval #2122

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an API example using server.cpp similar to OAI. #2009

Add an API example using server.cpp similar to OAI. #2009

jwj7140 commented Jun 26, 2023

SlyEcho commented Jun 28, 2023

jwj7140 commented Jun 29, 2023

SlyEcho commented Jun 29, 2023 •

edited

Loading

jwj7140 commented Jun 29, 2023

SlyEcho commented Jul 2, 2023

howard0su commented Jul 2, 2023

jwj7140 commented Jul 2, 2023 •

edited

Loading

jwj7140 commented Jul 2, 2023 •

edited

Loading

SlyEcho commented Jul 2, 2023

SlyEcho commented Jul 3, 2023

jwj7140 commented Jul 3, 2023 •

edited

Loading

SlyEcho commented Jul 4, 2023

jwj7140 commented Jul 4, 2023

SlyEcho commented Jul 4, 2023

ANMahmood commented Sep 30, 2023

cebtenzzre commented Sep 30, 2023

jwj7140 commented Oct 6, 2023 •

edited

Loading

Add an API example using server.cpp similar to OAI. #2009

Add an API example using server.cpp similar to OAI. #2009

Conversation

jwj7140 commented Jun 26, 2023

SlyEcho commented Jun 28, 2023

jwj7140 commented Jun 29, 2023

SlyEcho commented Jun 29, 2023 • edited Loading

jwj7140 commented Jun 29, 2023

SlyEcho commented Jul 2, 2023

howard0su commented Jul 2, 2023

jwj7140 commented Jul 2, 2023 • edited Loading

jwj7140 commented Jul 2, 2023 • edited Loading

SlyEcho commented Jul 2, 2023

SlyEcho commented Jul 3, 2023

jwj7140 commented Jul 3, 2023 • edited Loading

SlyEcho commented Jul 4, 2023

jwj7140 commented Jul 4, 2023

SlyEcho commented Jul 4, 2023

ANMahmood commented Sep 30, 2023

cebtenzzre commented Sep 30, 2023

jwj7140 commented Oct 6, 2023 • edited Loading

SlyEcho commented Jun 29, 2023 •

edited

Loading

jwj7140 commented Jul 2, 2023 •

edited

Loading

jwj7140 commented Jul 2, 2023 •

edited

Loading

jwj7140 commented Jul 3, 2023 •

edited

Loading

jwj7140 commented Oct 6, 2023 •

edited

Loading