Ben edited this page Dec 17, 2023 · 9 revisions

Roadmap

  1. parse LaTeX documents

    1. deal with improperly formatted content
  2. in each document,

    1. identify math equations
      1. convert math equations into Abstract Syntax Tree (AST)
    2. identify variables (and associated units, range, type: real, imaginary, complex, scalar, vector, matrix)
      1. associate variables with their definitions
    3. identify numeric constants (and associated units)
    4. associate a variable in a math equation AST with a variable definition (and associated units, range, type)
    5. find the missing steps between math equations
  3. find common variables or equations across different papers
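The first three sub-steps of step 2 (identify equations, variables and numeric constants) can be prototyped with regular expressions before committing to a full LaTeX parser. This is a minimal sketch under several stated assumptions:

- It only recognizes `$...$`, `$$...$$`, `\[...\]` and `equation`/`align` environments.
- It treats single Latin letters (optionally subscripted) and a hypothetical subset of Greek macros as variables.
- It treats bare decimals as numeric constants.
- It strips `%` comments first, which covers one kind of improperly formatted content.

The helper names `strip_comments`, `extract_math` and `classify_tokens` are hypothetical. They are not part of any existing tool.

```python
import re

# Assumption: a LaTeX comment starts at an unescaped "%" and runs to end of line.
def strip_comments(latex: str) -> str:
    return re.sub(r"(?<!\\)%.*", "", latex)

# Assumption: only these math delimiters are recognized (alternatives tried in order).
MATH_RE = re.compile(
    r"\\begin\{(?P<env>equation\*?|align\*?)\}(?P<env_body>.*?)\\end\{(?P=env)\}"
    r"|\\\[(?P<display>.*?)\\\]"
    r"|\$\$(?P<display_dollars>.*?)\$\$"
    r"|\$(?P<inline>[^$]+?)\$",
    re.DOTALL,
)

def extract_math(latex: str) -> list[str]:
    """Return the body of every recognized math span, in document order."""
    spans = []
    for m in MATH_RE.finditer(latex):
        groups = ("env_body", "display", "display_dollars", "inline")
        body = next(m.group(g) for g in groups if m.group(g) is not None)
        spans.append(body.strip())
    return spans

# Hypothetical subset of Greek-letter macros treated as variable names.
GREEK = {"alpha", "beta", "gamma", "delta", "epsilon", "theta", "lambda",
         "mu", "nu", "pi", "rho", "sigma", "tau", "phi", "psi", "omega",
         "Gamma", "Delta", "Theta", "Lambda", "Sigma", "Phi", "Psi", "Omega"}

SUB = r"(?:_(?:\{[^{}]*\}|\w))?"  # optional subscript: _x or _{...}
TOKEN_RE = re.compile(
    rf"(?P<cmd>\\[A-Za-z]+{SUB})"
    rf"|(?P<letter>[A-Za-z]{SUB})"
    r"|(?P<number>\d+(?:\.\d+)?)"
)

def classify_tokens(equation: str) -> tuple[set[str], set[str]]:
    """Split an equation body into candidate variables and numeric constants."""
    variables, constants = set(), set()
    for m in TOKEN_RE.finditer(equation):
        if m.group("cmd"):
            # Keep Greek macros (with any subscript); skip \frac, \sin, etc.
            if m.group("cmd")[1:].split("_")[0] in GREEK:
                variables.add(m.group("cmd"))
        elif m.group("letter"):
            variables.add(m.group("letter"))
        else:
            constants.add(m.group("number"))
    return variables, constants

# Hypothetical input; the commented-out line must not be reported.
doc = r"""
% a commented-out draft: $x = 1$
The energy is $E = m c^2$, where $c$ is the speed of light.
\begin{equation}
F = m a_{x} + 9.81 \alpha
\end{equation}
"""

for eq in extract_math(strip_comments(doc)):
    print(eq, classify_tokens(eq))
```

These heuristics misfire on real papers, in three ways:

- `\mathrm{kg}` yields the "variables" `k` and `g`.
- Exponents such as the `2` in `c^2` are reported as constants.
- `\pi` is reported as a variable.

This is why sub-step 4 calls for associating each symbol with its definition in the text.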

A draft property graph schema for latex-in-arxiv

node:file
  properties:
  - file path
  - hash of the file

node:token
  properties:
  - token length
  - hash of the token
  - token offset in file
  - LaTeX type: environment / math equation / math variable or constant / display text / macros / comment

directed edge:file_has_token (from a file node to each token node found in that file)
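The schema can be prototyped in memory before choosing a graph database. This minimal sketch rests on three assumptions:

- SHA-256 over UTF-8 text is used for both the file hash and the token hash.
- "Token offset in file" is a character offset.
- The allowed `latex type` values are the six listed in the schema.

The class names and the example path `paper1/main.tex` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

# Values of the "latex type" property, taken from the draft schema.
LATEX_TYPES = {"environment", "math equation", "math variable or constant",
               "display text", "macros", "comment"}

def sha256_hex(text: str) -> str:
    # Assumption: SHA-256 over UTF-8 bytes for both file and token hashes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class FileNode:
    path: str
    file_hash: str

@dataclass(frozen=True)
class TokenNode:
    length: int       # token length in characters
    token_hash: str
    offset: int       # character offset of the token in its file
    latex_type: str   # one of LATEX_TYPES

@dataclass
class PropertyGraph:
    files: list[FileNode] = field(default_factory=list)
    tokens: list[TokenNode] = field(default_factory=list)
    # Directed file_has_token edges stored as (file index, token index).
    file_has_token: list[tuple[int, int]] = field(default_factory=list)

    def add_file(self, path: str, content: str) -> int:
        self.files.append(FileNode(path, sha256_hex(content)))
        return len(self.files) - 1

    def add_token(self, file_idx: int, text: str, offset: int, latex_type: str) -> int:
        if latex_type not in LATEX_TYPES:
            raise ValueError(f"unknown LaTeX type: {latex_type!r}")
        self.tokens.append(TokenNode(len(text), sha256_hex(text), offset, latex_type))
        token_idx = len(self.tokens) - 1
        self.file_has_token.append((file_idx, token_idx))
        return token_idx

# Usage with a hypothetical file path and content.
content = r"$E = m c^2$ % energy"
graph = PropertyGraph()
f = graph.add_file("paper1/main.tex", content)
graph.add_token(f, "E = m c^2", content.index("E"), "math equation")
graph.add_token(f, "% energy", content.index("%"), "comment")
```

Because the offset is a token property, the same text at two positions becomes two token nodes that share a hash. The shared hash is what lets tokens be matched across files.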

Queries for the property graph:

  • return the number of tokens in a file
  • return the number of documents containing a given token
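Both queries reduce to counting over `file_has_token` edges. This minimal sketch assumes each edge is a hypothetical `(file_path, token_hash)` pair, and that identical token text in different files shares a hash.

```python
# Assumption: each file_has_token edge is a (file_path, token_hash) pair;
# the same hash appearing twice in one file means two token nodes with equal text.
def count_tokens_in_file(edges: list[tuple[str, str]], path: str) -> int:
    """Number of token nodes linked from the given file."""
    return sum(1 for file_path, _ in edges if file_path == path)

def count_documents_with_token(edges: list[tuple[str, str]], token_hash: str) -> int:
    """Number of distinct files containing a token with the given hash."""
    return len({file_path for file_path, h in edges if h == token_hash})

# Hypothetical edges: "h1" occurs twice in a.tex and once in b.tex.
edges = [("a.tex", "h1"), ("a.tex", "h2"), ("a.tex", "h1"),
         ("b.tex", "h1"), ("b.tex", "h3")]

print(count_tokens_in_file(edges, "a.tex"), count_documents_with_token(edges, "h1"))
```

The two queries count different things:

- The first counts token occurrences, so a file containing the same token twice counts it twice.
- The second counts distinct files, so a token repeated within one document is counted once.

These are linear scans. On a large corpus both queries would instead be answered from an index (file to tokens, hash to files) or by the graph database's own traversal.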

LaTeX math equation to SymPy (for building the AST and for applying transformations)

https://physicsderivationgraph.blogspot.com/2020/05/replacing-symbols-in-sympy-expression.html
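This is a sketch of the SymPy side, assuming SymPy is installed. SymPy does provide `sympy.parsing.latex.parse_latex`, but its default backend requires the optional ANTLR runtime, so this example builds `E = m c^2` directly. The example does three things:

- It lists the expression tree (the AST) through each node's type and `.args`.
- It replaces the symbol `c` using `subs`.
- It replaces the same symbol using `xreplace`.

The helper `walk` is hypothetical.

```python
import sympy

# E here is a plain symbol named "E", not sympy.E (Euler's number).
m, c, v = sympy.symbols("m c v")
E = sympy.Symbol("E")
eq = sympy.Eq(E, m * c**2)

def walk(expr, depth=0):
    """Hypothetical helper: list the expression tree, one node per line."""
    name = type(expr).__name__
    label = name if expr.args else f"{name} {expr}"
    lines = ["  " * depth + label]
    for arg in expr.args:
        lines.extend(walk(arg, depth + 1))
    return lines

print("\n".join(walk(eq)))

# Replace the symbol c with v in two ways.
eq_subs = eq.subs(c, v)            # mathematical substitution
eq_xreplace = eq.xreplace({c: v})  # exact node-for-node replacement in the tree
```

The two methods differ in how they treat the tree. `subs` performs mathematical substitution and may re-evaluate the result. `xreplace` swaps only exactly matching nodes, which is closer to a purely syntactic rename of a variable.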

Complexities where SymPy fails:

  • the Laplace operator (nabla): https://physicsderivationgraph.blogspot.com/2020/09/representing-laplace-operator-nabla-in.html
  • function notation in LaTeX: https://physicsderivationgraph.blogspot.com/2020/07/function-latex-for-sympy.html
  • quantum bra-ket (Dirac) notation: https://physicsderivationgraph.blogspot.com/2020/07/quantum-bra-ket-dirac-notation-in-sympy.html