Skip to content

Create and use text embeddings of Org-mode buffers and nodes

License

Notifications You must be signed in to change notification settings

renatgalimov/org-embeddings

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tasks

Measure the number of tokens before processing

I want to know if I need to reduce the text size before I do API request.

Fix unresolved errors

2023-07-14 11:11:44 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2022-01-28.org': (org-embeddings-data-error TEXT)
2023-07-14 11:11:50 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2022-04-20.org': (org-embeddings-data-error TEXT)
2023-07-14 11:11:54 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2022-07-11.org': (error (#s(request-response 400 nil ((error (message . "This model's maximum context length is 4097 tokens, however you requested 4588 tokens (4556 in your prompt; 32 for the completion). Please reduce your prompt; or completion length.") (type . "invalid_request_error") (param) (code))) (error http 400) error "https://api.openai.com/v1/completions" t (:error (closure ((user . "") (logit-bias) (best-of) (frequency-penalty . 0.0) (presence-penalty . 1.0) (stop) (echo) (logprobs) (stream) (n . 1) (top-p) (temperature . 1.0) (max-tokens . 32) (suffix) (model . "text-davinci-003") (org-id) (key lambda nil (op-get-password "zc3rypii34pkpbxx3hgp34ex2y")) (content-type . "application/json") (parameters) (base-url . "https://api.openai.com/v1") (--cl-rest-- :max-tokens 32 :temperature 1.0 :frequency-penalty 0.0 :presence-penalty 1.0 :n 1) (callback closure ((result . #s(deferred deferred:default-callback deferred:default-errorback deferred:default-cancel #s(deferred (closure ((it . #s(deferred (closure #11 (_) (deferred:wait-idle 1)) deferred:default-errorback deferred:default-cancel #s(deferred #[257 "\302\300\301\"\207" [(closure ((it . #s(deferred (closure #18 (_) (org-embeddings-log "info" "===== Finished indexing daily files =====")) deferred:default-errorback deferred:default-cancel nil nil nil)) t) (target-file) (org-embeddings-index-single-daily-file target-file)) "/Users/renat/emacs/roam/org/daily/2022-07-12.org" deferred:call-lambda] 4 "
" #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("file" "/Users/renat/emacs/roam/org/daily/2022-07-11.org")) "4bf76a3406d817250044ccb12f7e5e72a5e711c827da5d82692199a3a614f99c")) (target-file . "/Users/renat/emacs/roam/org/daily/2022-07-11.org") cl-struct-org-embeddings-record-tags cl-struct-org-embeddings-source-tags t) (err) (org-embeddings-log "error" "Failed to index `%s': %s" target-file err) (deferred:succeed nil)) deferred:default-cancel #s(deferred (closure ((it . #11) (current-embedding) (source . #s(org-embeddings-source "c1a21ffc-229f-473c-a055-f1550314967d" "1 Priorities
" #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("file" "/Users/renat/emacs/roam/org/daily/2022-07-11.org")) "4bf76a3406d817250044ccb12f7e5e72a5e711c827da5d82692199a3a614f99c")) (target-file . "/Users/renat/emacs/roam/org/daily/2022-07-11.org") cl-struct-org-embeddings-record-tags cl-struct-org-embeddings-source-tags t) (err) (org-embeddings-log "error" "Failed to index `%s': %s" target-file err) (deferred:succeed nil)) deferred:default-cancel #s(deferred (closure ((it . #11) (current-embedding) (source . #s(org-embeddings-source "c1a21ffc-229f-473c-a055-f1550314967d" "1 Priorities
2023-07-14 11:11:59 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2022-08-16.org': (org-embeddings-data-error TEXT)
2023-07-14 11:12:03 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2023-06-16.org': (org-embeddings-data-error TEXT)
2023-07-14 11:12:17 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2023-07-12.org': (error No ID found)
2023-07-14 11:12:17 ERROR Failed to index `/Users/renat/emacs/roam/org/daily/2023-07-13.org': (error Need absolute ‘org-attach-id-dir’ to attach in buffers without filename)

Model’s maximum context length when generating a TL;DR;

(error (message . "This model's maximum context length is 4097 tokens, however you requested 4588 tokens (4556 in your prompt; 32 for the completion). Please reduce your prompt; or completion length.")

error Need absolute ‘org-attach-id-dir’ to attach in buffers without filename

org/daily/2023-07-13.org

[#A] Catch errors returned by OpenAI

[#C] Extract title from the #+title field of the file

[#B] Implement traversing over org-roam node and generating embeddings

[#A] Deal with an error in a deferred processing

  • State “DONE” from “CANCELED” [2023-07-05 Wed 05:59]
  • State “DONE” from “TODO” [2023-07-05 Wed 05:59]
Debugger entered--Lisp error: (outline-before-first-heading)
  outline-back-to-heading(t)
  org-back-to-heading-or-point-min(t)
  org-get-property-block()
  org--property-local-values("ID" nil)
  org-entry-get(nil "ID")
  org-id-get(1 nil)
  (let ((id (org-id-get (point-min) nil))) (if id nil (org-embeddings-log "debug" "No ID found in %s" path) (error "No ID found")) (let ((org-export-use-babel nil)) (org-embeddings-log "debug" "Exporting embeddings source with id %s" id) (let ((text (org-export-as 'ascii nil nil t org-embeddings-export-plist))) (make-org-embeddings-source :id id :text text :metadata (org-embeddings-metadata :file (buffer-file-name))))))
  (progn (insert-file-contents path) (goto-char (point-min)) (let ((id (org-id-get (point-min) nil))) (if id nil (org-embeddings-log "debug" "No ID found in %s" path) (error "No ID found")) (let ((org-export-use-babel nil)) (org-embeddings-log "debug" "Exporting embeddings source with id %s" id) (let ((text (org-export-as 'ascii nil nil t org-embeddings-export-plist))) (make-org-embeddings-source :id id :text text :metadata (org-embeddings-metadata :file (buffer-file-name)))))))
  (unwind-protect (progn (insert-file-contents path) (goto-char (point-min)) (let ((id (org-id-get (point-min) nil))) (if id nil (org-embeddings-log "debug" "No ID found in %s" path) (error "No ID found")) (let ((org-export-use-babel nil)) (org-embeddings-log "debug" "Exporting embeddings source with id %s" id) (let ((text (org-export-as ... nil nil t org-embeddings-export-plist))) (make-org-embeddings-source :id id :text text :metadata (org-embeddings-metadata :file (buffer-file-name))))))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))
  (save-current-buffer (set-buffer temp-buffer) (unwind-protect (progn (insert-file-contents path) (goto-char (point-min)) (let ((id (org-id-get (point-min) nil))) (if id nil (org-embeddings-log "debug" "No ID found in %s" path) (error "No ID found")) (let ((org-export-use-babel nil)) (org-embeddings-log "debug" "Exporting embeddings source with id %s" id) (let ((text ...)) (make-org-embeddings-source :id id :text text :metadata (org-embeddings-metadata :file ...)))))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer))))
  (let ((temp-buffer (generate-new-buffer " *temp*" t))) (save-current-buffer (set-buffer temp-buffer) (unwind-protect (progn (insert-file-contents path) (goto-char (point-min)) (let ((id (org-id-get ... nil))) (if id nil (org-embeddings-log "debug" "No ID found in %s" path) (error "No ID found")) (let ((org-export-use-babel nil)) (org-embeddings-log "debug" "Exporting embeddings source with id %s" id) (let (...) (make-org-embeddings-source :id id :text text :metadata ...))))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))))
  org-embeddings-file-get("/Users/renat/emacs/roam/org/daily/2022-01-17.org")
  (org-embeddings-create (org-embeddings-file-get target-file))
  (closure (t) (target-file) (org-embeddings-log "debug" "Indexing daily at `%s'" target-file) (org-embeddings-create (org-embeddings-file-get target-file)))("/Users/renat/emacs/roam/org/daily/2022-01-17.org")
  deferred:call-lambda((closure (t) (target-file) (org-embeddings-log "debug" "Indexing daily at `%s'" target-file) (org-embeddings-create (org-embeddings-file-get target-file))) "/Users/renat/emacs/roam/org/daily/2022-01-17.org")
  #f(compiled-function (x) #<bytecode -0x9051b82e92b7d48>)(nil)
  deferred:call-lambda(#f(compiled-function (x) #<bytecode -0x9051b82e92b7d48>) nil)
  deferred:exec-task(#s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b82e92b7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b82f1eb7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b8293ab7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b83ebeb7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b9bd5ab7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b9bc0eb7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b9b80ab7d48>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next ... :status nil :value nil) :status nil :value nil) :status nil :value nil) :status nil :value nil) :status nil :value nil) :status nil :value nil) :status nil :value nil) ok nil)
  deferred:worker()
  apply(deferred:worker nil)
  timer-event-handler([t 25761 37785 86718 nil deferred:worker nil nil 0 nil])
  sit-for(0.05)
  deferred:sync!(#s(deferred :callback #f(compiled-function (x) #<bytecode -0x9051b9b80ab7d48>) :errorback deferred:default-errorback :cancel #f(compiled-function (x) #<bytecode -0x16a88df575a77c3>) :next #s(deferred :callback #f(compiled-function (x) #<bytecode -0x1433a959d4b774b2>) :errorback deferred:default-errorback :cancel deferred:default-cancel :next #s(deferred :callback deferred:default-callback :errorback #f(compiled-function (err) #<bytecode -0x19e5ed468cf6f67a>) :cancel deferred:default-cancel :next nil :status nil :value nil) :status nil :value nil) :status nil :value nil))
  (progn (deferred:sync! (org-embeddings-index-daily-files)))
  elisp--eval-last-sexp(nil)
  eval-last-sexp(nil)
  funcall-interactively(eval-last-sexp nil)
  command-execute(eval-last-sexp)

[#B] Skip embeddings generation if hash didn’t change

  • State “DONE” from “TODO” [2023-07-03 Mon 05:41]

Search should display in an org buffer

  • State “DONE” from “TODO” [2023-06-17 Sat 07:38]
Current “I’m feeling lucky” approach is’t very helpful.

[#A] JSON storage have model data cached

  • State “DONE” from “CANCELED” [2023-06-11 Sun 15:35]
  • State “DONE” from “TODO” [2023-06-11 Sun 15:35]

This way we will be able to patch the model data at any time wihout being afraid of race conditions.

Use the deferred library for async processing

Logic

The library can receive embeddings from various AI engines, with OpenAI being one of the options.

It takes an org-element as input and performs the necessary operations within the library.

The output vector can be saved in a vector database or utilized in any other preferred manner.

Namespaces

  • org-embeddings-store-* - Used to save embeddings received from AI services.
  • org-embeddings-create-* - Create an embedding for a given text.
  • org-embeddings-element-* - Works with org file structure
  • org-embeddings-json-* - Store embeddings in JSON - suitable for development;
  • org-embeddings-openai-* - Create embeddings with OpenAI.
  • org-embeddings-pipe-* - Pre-process the data before sending it to an API.

About

Create and use text embeddings of Org-mode buffers and nodes

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published