Hi, my name is Saul Shanabrook.
Welcome to my website!
Here lies a collection of "internet links."
I have helpfully arranged them into categories.
I hope you enjoy!
Oh also, if you are looking for my resume, here it is
Feel free to reach out to me via email, Google Meet, Twitter, Mastodon or GitHub.
a database of edible plants
nonprofit helping create public access food forests in western MA
an inspiring model of using community control to prevent gentrification and create affordable resident controlled housing
a v. fun video on some great alternative options on community stewardship
a cool tool to help build replacement systems without having to worry about rule order.
a library to use e-graphs in Python for building expressive DSLs and optimizing code
a project with some friends to find a place to live, do fun things, and try something out
an iOS app I started to help people become better friends with plants near them
a library to use pattern matching and type analysis to build safe DSLs in Python, in order to allow scientific computing libraries to better collaborate and share key abstractions.
provides a friendly isomorphic representation of Python's bytecode objects
an open source data science IDE in your browser. I was a core maintainer for a while and helped with a variety of extensions as well
a Python code analysis tool that helps productionize data science code by building a DAG of Python code
my new blog posts on GitHub Discussions
my old blog posts on my previous statically generated website
March 21st, 2024: Optimizing Scikit-Learn with Egglog and Numba
Now that I have this great e-graph library in Python, what extra mechanisms do I need to make it useful in existing Python code?
This talk will go through a few techniques that have been developed and also point to how bringing in use cases from scientific Python can help drive further theoretical research.
EGRAPHS Community - Lightning Talks
November 3rd, 2023: egglog: e-graphs in Python
August 1st, 2023: egglog: E-Graphs in Python
The PyData ecosystem is home to one of the largest and most successful open source communities. It's both where most newcomers to data science start and also where cutting edge research takes place. It has been able to support the diverse needs of its users through its decentralized nature, promoting creativity and collaboration.
As the size of data has increased and our compute has moved off of our single CPUs, the nature of libraries has evolved. Whereas in the past client code would generally call out to fast pre-compiled libraries (SciPy, NumPy, etc.), now it often works via calls to a variety of distributed, out-of-core, and specialized compilation and computation backends (PyTorch, Dask, Numba, Ibis, etc.). This means a growing number of libraries do not eagerly execute a computation in the CPython interpreter, but instead optimize and translate it to some other target.
At a high level, we can see this ecosystem as a large decentralized, embedded, domain-specific compiler, translating from high-level user expressions to different low-level primitives. This calls for an exploration of tooling to help enable this translation of programs between different representations, to facilitate the efficient use of code across this distributed ecosystem.
One approach to automating this translation among different representations is the rewriting technique called “equality saturation.” This allows us to construct a data structure of equivalent programs (an “e-graph”), and then search that space for a functionally-equivalent program that has desirable characteristics such as improved performance or memory efficiency. Building this translation tooling once can enhance sharing and collaboration between the libraries which use it.
In this talk, Saul Shanabrook goes over how e-graphs work, how they were developed, and different ways they can be used in the PyData ecosystem. Saul also surveys the egglog library, which is one specific tool for using e-graphs in Python.
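For anyone curious what an e-graph looks like under the hood, here is a toy version written from scratch in plain Python. It is a minimal sketch, not the egglog API: the `EGraph` class, the four hard-coded rewrite rules, and the `OP_COST` table are hypothetical choices for illustration. It saturates `(a * 2) / 2` under `x * 2 -> x << 1`, `(x * y) / z -> x * (y / z)`, `x / x -> 1`, and `x * 1 -> x`, then extracts the cheapest equivalent term.

```python
# Toy e-graph with equality saturation (hypothetical sketch; not the egglog API).
# E-nodes are (op, child_class_ids) tuples; leaves have no children.
OP_COST = {"*": 4, "/": 8}  # hypothetical costs: multiply/divide are "expensive"

class EGraph:
    def __init__(self):
        self.parent = []    # union-find parent pointers, one per e-class id
        self.hashcons = {}  # e-node -> e-class id
        self.version = 0    # bumped whenever a node is added or classes merge

    def find(self, a):
        while self.parent[a] != a:
            a = self.parent[a]
        return a

    def canon(self, node):
        op, kids = node
        return (op, tuple(self.find(k) for k in kids))

    def add(self, op, *kids):
        node = self.canon((op, kids))
        if node not in self.hashcons:
            self.parent.append(len(self.parent))
            self.hashcons[node] = len(self.parent) - 1
            self.version += 1
        return self.find(self.hashcons[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.version += 1
        return a != b

    def rebuild(self):
        # Re-canonicalize e-nodes; identical ones are congruent, so merge their classes.
        changed = True
        while changed:
            changed = False
            fresh = {}
            for node, cid in self.hashcons.items():
                node = self.canon(node)
                if node in fresh:
                    changed |= self.union(fresh[node], cid)
                else:
                    fresh[node] = cid
            self.hashcons = fresh

    def has_leaf(self, cid, value):
        leaf = self.hashcons.get((value, ()))
        return leaf is not None and self.find(leaf) == self.find(cid)

def apply_rules(eg):
    for (op, kids), cid in list(eg.hashcons.items()):
        if op == "*" and eg.has_leaf(kids[1], 2):  # x * 2 -> x << 1
            eg.union(cid, eg.add("<<", kids[0], eg.add(1)))
        if op == "*" and eg.has_leaf(kids[1], 1):  # x * 1 -> x
            eg.union(cid, kids[0])
        if op == "/" and eg.find(kids[0]) == eg.find(kids[1]):  # x / x -> 1 (x != 0)
            eg.union(cid, eg.add(1))
        if op == "/":  # (x * y) / z -> x * (y / z)
            for (op2, kids2), cid2 in list(eg.hashcons.items()):
                if op2 == "*" and eg.find(cid2) == eg.find(kids[0]):
                    eg.union(cid, eg.add("*", kids2[0], eg.add("/", kids2[1], kids[1])))

def saturate(eg, max_iters=10):
    # Apply every rule everywhere until a full pass changes nothing.
    for _ in range(max_iters):
        before = eg.version
        apply_rules(eg)
        eg.rebuild()
        if eg.version == before:
            break

def extract(eg, root):
    # Fixpoint over costs: cheapest (cost, e-node) for each e-class.
    inf, best, changed = float("inf"), {}, True
    while changed:
        changed = False
        for (op, kids), cid in eg.hashcons.items():
            cost = OP_COST.get(op, 1) + sum(best.get(eg.find(k), (inf,))[0] for k in kids)
            if cost < best.get(eg.find(cid), (inf,))[0]:
                best[eg.find(cid)] = (cost, (op, kids))
                changed = True

    def build(cid):
        op, kids = best[eg.find(cid)][1]
        return str(op) if not kids else f"({build(kids[0])} {op} {build(kids[1])})"
    return build(root)

eg = EGraph()
a, two = eg.add("a"), eg.add(2)
mul = eg.add("*", a, two)
root = eg.add("/", mul, two)  # (a * 2) / 2
saturate(eg)
print(extract(eg, root))  # a
print(extract(eg, mul))   # (a << 1)
```

Because the e-graph keeps every equivalent form instead of destructively rewriting, the order the rules fire in doesn't matter; extraction picks a winner at the end. Note that `x / x -> 1` is only sound when `x` is nonzero, which this toy ignores.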
Altair is a lovely tool that lets you build up complex interactive charts in Python. Ibis is also a lovely tool that lets you use a Pandas-like API to compose SQL expressions in OmniSci and other backends. By tying them together, you can use the familiar syntax of Pandas, combined with the expressive power of Vega and Vega-Lite, to visualize large amounts of data stored in OmniSci. This talk will walk through a number of examples of using this pipeline and then explain how it works.
December 8, 2019: metadsl: separating API from execution
metadsl is a Python framework for writing APIs that are detached from how they are executed. With it, we can write framework-agnostic definitions of concepts like "arrays" and compile them to backends like TensorFlow or LLVM. In this talk, we will use metadsl to build high-performance scientific computing libraries.
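To give a flavor of what "separating API from execution" means, here is a tiny sketch in plain Python. It is not metadsl's API: `Expr`, `var`, `evaluate`, and `to_source` are hypothetical names. User code is written once against `Expr`, which only records operations; two interchangeable backends then decide what to do with the recorded expression.

```python
# Hypothetical sketch of "same API, different execution" (not metadsl's API).
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A user-facing value that records operations instead of running them."""
    op: str
    args: tuple

    def __add__(self, other):
        return Expr("add", (self, other))

    def __mul__(self, other):
        return Expr("mul", (self, other))


def var(name):
    return Expr("var", (name,))


def evaluate(e, env):
    """Backend 1: interpret the expression eagerly in Python."""
    if e.op == "var":
        return env[e.args[0]]
    left, right = (evaluate(a, env) for a in e.args)
    return left + right if e.op == "add" else left * right


def to_source(e):
    """Backend 2: compile the same expression to source text for another target."""
    if e.op == "var":
        return e.args[0]
    symbol = {"add": "+", "mul": "*"}[e.op]
    return f"({to_source(e.args[0])} {symbol} {to_source(e.args[1])})"


x, y = var("x"), var("y")
expr = (x + y) * x                       # user code, written once against the API
print(evaluate(expr, {"x": 2, "y": 3}))  # 10
print(to_source(expr))                   # ((x + y) * x)
```

Since the user-facing code never calls a backend directly, swapping in a new target only means adding another function that walks `Expr`.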
November 4, 2019: Same API, Different Execution
Can the Python data science and scientific computing ecosystem remain in the hands of community open source projects? Or will increasingly complex performance and hardware requirements leave room only for vertically integrated, corporate-sponsored projects?
November 17, 2018: uarray - Efficient and Generic Array Computation
Efficient array computing is required to continue advances in fields like IoT and AI. We demonstrate a system, uarray, that does array computation generically and targets different backends. We rely on a Mathematics of Arrays, a theory of shapes and indexing, to reduce array expressions. As a result, temporary arrays and unneeded calculations are eliminated leading to minimal memory and CPU usage.
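As a heavily simplified illustration of reducing an array expression through indexing, here is a plain-Python sketch. It is not uarray's API, and real Mathematics of Arrays reduction handles full shapes and multi-dimensional indexing rather than this single 1-D case; `Vec`, `Add`, `Reverse`, `index`, and `eager` are hypothetical names. Instead of computing `a + b` and then reversing it, we ask for one element and push the index down through each operation, so only the additions actually needed are performed.

```python
# Simplified sketch of index-driven reduction (not uarray's actual API).
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Vec:
    data: tuple


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Reverse:
    arg: "Expr"


Expr = Union[Vec, Add, Reverse]


def length(e: Expr) -> int:
    if isinstance(e, Vec):
        return len(e.data)
    if isinstance(e, Add):
        return length(e.left)  # assumes both operands have the same length
    return length(e.arg)


def index(e: Expr, i: int):
    """Compute element i by pushing the index down through each operation."""
    if isinstance(e, Vec):
        return e.data[i]
    if isinstance(e, Add):
        return index(e.left, i) + index(e.right, i)
    return index(e.arg, length(e.arg) - 1 - i)  # Reverse: flip the index


def eager(e: Expr) -> list:
    """Baseline: materialize every intermediate array in full."""
    if isinstance(e, Vec):
        return list(e.data)
    if isinstance(e, Add):
        return [x + y for x, y in zip(eager(e.left), eager(e.right))]
    return eager(e.arg)[::-1]


a = Vec(tuple(range(1000)))
b = Vec(tuple(range(1000)))
expr = Reverse(Add(a, b))
print(index(expr, 0))  # 1998, from a[999] + b[999] alone
print(eager(expr)[0])  # 1998, after materializing every intermediate list
```

Both paths agree on every element; the difference is that `index` never allocates the intermediate `a + b` or its reversed copy.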