refactor code and readme (#3)

yxlao · Jul 18, 2023 · 721c0da · 721c0da
1 parent 6e1a55d
commit 721c0da
Show file tree

Hide file tree

Showing 28 changed files with 914 additions and 736 deletions.
diff --git a/.github/workflows/formatter.yml b/.github/workflows/formatter.yml
@@ -0,0 +1,15 @@
+name: Formatter CI
+
+on:
+ push:
+ branches:
+ - main
+ pull_request:
+ types: [opened, reopened, synchronize]
+
+jobs:
+ formatter:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v3
+ - uses: psf/black@stable
diff --git a/.style.yapf b/.style.yapf
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -1,5 +1,9 @@
 {
- "editor.formatOnSave": true,
- "editor.formatOnSaveMode": "file",
- "files.trimTrailingWhitespace": true
+ "editor.formatOnSave": true,
+ "editor.formatOnSaveMode": "file",
+ "files.trimTrailingWhitespace": true,
+ "python.formatting.provider": "none",
+ "[python]": {
+ "editor.defaultFormatter": "ms-python.black-formatter"
+ }
 }
diff --git a/CMakeLists.txt b/CMakeLists.txt
diff --git a/README.md b/README.md
@@ -1,124 +1,197 @@
 # CamTools
 
-Tools for handling pinhole camera parameters and plotting cameras.
+Camtools: camera tools for computer vision. Useful for plotting, converting,
+projecting, and ray casting with camera parameters.
 
-## Install
+<a href="https://github.com/yxlao/camtools/actions/workflows/formatter.yml">
+<img src="https://github.com/yxlao/camtools/actions/workflows/formatter.yml/badge.svg" alt="Formatter">
+</a>
 
-```bash
-# For development.
-pip install -e .
-
-# For install.
-pip install .
-```
-
-<!-- ```bash
-mkdir build && cd build
-cmake ..
-make pip-install -j$(nproc)
-python -c "import camtools as ct; print(ct.__version__)"
-``` -->
-
-## Matrix conventions
-
-```
-K : (3, 3) # Camera intrinsic matrix.
- # [[fx, s, cx],
- # [ 0, fy, cy],
- # [ 0, 0, 1]]
- # x: goes from top-left to top-right.
- # y: goes from top-left to bottom-left.
-R : (3, 3) # Rotation matrix.
-Rc : (3, 3) # Rc = R.T = R.inv().
-t : (3,) # Translation.
-T : (4, 4) # Extrinsic matrix with (0, 0, 0, 1) row below.
- # T = [R | t
- # 0 | 1]
- # T projects world space coordinate to the camera space
- # (a.k.a view space or eye space). The camera center in world
- # space projected by T becomes [0, 0, 0, 1]^T, i.e. the camera
- # space has its origin at the camera center.
- # T @ [[|], = [[0],
- # [C], [0],
- # [|], [0],
- # [1]] [1]]
-P : (3, 4) # World-to-pixel projection matrix. P = K @ [R | t] = K @ T[:3, :].
-W2P : (4, 4) # World-to-pixel projection matrix. It is P with (0, 0, 0, 1)
- # row below. When using W2P @ point_homo, the last
- # element is always 1, thus it is ignored.
-pose : (4, 4) # Camera pose. pose = T.inv(). pose[:3, :3] = R.T = Rc. pose[:3, 3] = C.
-C : (3,) # Camera center.
-```
-
-## Coordinate conventions
-
-### 3D to 2D projection
-
-Project 3D point `[X, Y, Z, 1]` to 2D `[x, y, 1]` pixel, e.g. with
-`pixels = ct.project.points_to_pixel(points, K, T)`.
-
-```python
-# 0 -------> 1 (x)
-# |
-# |
-# v (y)
-
-cols = pixels[:, 0] # cols, width, x, top-left to top-right
-rows = pixels[:, 1] # rows, height, y, top-left to bottom-left
-cols = np.round(cols).astype(np.int32)
-rows = np.round(rows).astype(np.int32)
-cols[cols >= width] = width - 1
-cols[cols < 0] = 0
-rows[rows >= height] = height - 1
-rows[rows < 0] = 0
-```
-
-It can be confusing to use `x, y, u, v`. Prefer `row` and `col`.
-
-
-### UV coordinates
-
-```python
-# OpenGL convention:
-# 1 (v)
-# ^
-# |
-# |
-# 0 -------> 1 (u)
-
-# The following conversion accounts for pixel size
-us = 1 / width * (0.5 + cols)
-vs = 1 / height * (0.5 + (height - rows - 1))
-```
+## Installation
 
-## Notes on vector vs. matrix
-
-We choose to use 1D array for vector values like `t` and `C`. For example, `t`
-is of shape `(3, )` instead of `(3, 1)`.
-
-```python
-# The `@` operator can be directly used to dot a matrix and a vector
-# - If both arguments are 2-D they are multiplied like conventional matrices.
-# - If either argument is N-D, N > 2, it is treated as a stack of matrices
-# residing in the last two indexes and broadcast accordingly.
-# - If the first argument is 1-D, it is promoted to a matrix by prepending a 1
-# to its dimensions. After matrix multiplication the prepended 1 is removed.
-# - If the second argument is 1-D, it is promoted to a matrix by appending a 1
-# to its dimensions. After matrix multiplication the appended 1 is removed.
-
-# t is (3, ) and it is promoted to be (3, 1).
-C = - R.T @ t
-```
+```bash
+# Option 1: install from pip.
+pip install camtools
 
-## Unit tests
+# Option 2: install from git.
+pip install git+https://github.com/yxlao/camtools.git
 
-```bash
-pytest . -s
-pytest camtools -s
+# Option 3: install from source.
+git clone https://github.com/yxlao/camtools.git
+cd camtools
+pip install -e . # Dev mode, if you want to modify camtools.
+pip install . # Install mode, if you want to use camtools only.
 ```
 
+## What can you do with CamTools?
 
-## TODO
+1. Plot cameras. Useful for debugging 3D reconstruction and NeRFs!
 
-- Full unit tests
-- PyTorch/Numpy wrapper (e.g. with `eagerpy`)
+ ```python
+ import camtools as ct
+ import open3d as o3d
+ cameras = ct.camera.create_camera_ray_frames(Ks, Ts)
+ o3d.visualization.draw_geometries([cameras])
+ ```
+
+ <p align="center">
+ <img src="./camtools/assets/camera_frames.png" width="360" />
+ </p>
+
+2. Convert camera parameters.
+
+ ```python
+ pose = ct.convert.T_to_pose(T) # Convert T to pose
+ R, t = ct.convert.T_to_R_t(T) # Convert T to R and t
+ C = ct.convert.pose_to_C(pose) # Convert pose to camera center
+ K, T = ct.convert.P_to_K_T(P) # Decompose projection matrix to K and T
+ # And more...
+ ```
+
+3. Projection and ray casting.
+
+ ```python
+ # Project 3D points to pixels.
+ pixels = ct.project.points_to_pixel(points, K, T)
+
+ # Back-project depth image ot 3D points.
+ points = ct.project.im_depth_to_points(depth, K, T)
+
+ # Ray cast a triangle mesh to depth image.
+ im_depth = ct.raycast.mesh_to_depths(mesh, Ks, Ts, height, width)
+
+ # And more...
+ ```
+
+4. Image I/O and depth I/O with no surprises.
+
+ ```python
+ ct.io.imread()
+ ct.io.imwrite()
+
+ ct.io.imread_detph()
+ ct.io.imwrite_depth()
+ ```
+
+ Strict type checks and range checks are enforced. These APIs are specifically
+ designed to solve the following pain points:
+
+ - Is my image `float32` or `uint8`?
+ - Does it has range `[0, 1]` or `[0, 255]`?
+ - Is it RGB or BGR?
+ - Do my image have alpha channel?
+ - When saving depth image as integer-based `.png`, is it correctly scaled?
+
+5. Useful command-line tools (run in terminal).
+
+ ```bash
+ # Crop image boarders.
+ ct crop-boarders *.png --pad_pixel 10 --skip_cropped --same_crop
+
+ # Draw synchronized bounding boxes interactively.
+ ct draw-bboxes path/to/a.png path/to/b.png
+
+ # For more help.
+ ct --help
+ ```
+
+ <p align="center">
+ <img src="https://user-images.githubusercontent.com/1501945/241416210-e11ff3bf-22e6-46c0-8ba0-d177a0015323.png" width="400" />
+ </p>
+
+6. And more.
+ - Solve line intersections.
+ - COLMAP tools.
+ - Points normalization.
+ - ...
+
+## Camera conventions
+
+<p align="center">
+ <img src="./camtools/assets/camera_coordinates.svg" width="360" />
+</p>
+
+We follow the standard pinhole camera model:
+
+- **Camera coordinate:** right-handed, with $Z$ pointing away from the camera
+ towards the view direction and $Y$ axis pointing down. Note that this is
+ different from the Blender convention, where $Z$ points towards the opposite
+ view direction and the $Y$ axis points up.
+- **Image coordinate:** starts from the top-left corner of the image, with $x$
+ pointing right (corresponding to the image width) and $y$ pointing down
+ (corresponding to the image height). This is also consistent with OpenCV, but
+ pay attention that the 0-th dimension in the image array is the height (i.e.,
+ $y$) and the 1-th dimension is the width (i.e., $x$). That is:
+ - $x$ <=> width <=> column <=> the 1-th dimension
+ - $y$ <=> height <=> row <=> the 0-th dimension
+- `K`: `(3, 3)` camera intrinsic matrix.
+ ```python
+ K = [[fx, s, cx],
+ [ 0, fy, cy],
+ [ 0, 0, 1]]
+ ```
+- `T` or `W2C`: `(4, 4)` camera extrinsic matrix.
+ ```python
+ T = [[R | t = [[R_01, R_02, R_03, t_0],
+ 0 | 1]] [R_11, R_12, R_13, t_1],
+ [R_21, R_22, R_23, t_2],
+ [ 0, 0, 0, 1]]
+ ```
+ - `T` is also known as the world-to-camera `W2C` matrix, which transforms a
+ point in the world coordinate to the camera coordinate.
+ - `T`'s shape is `(4, 4)`, not `(3, 4)`.
+ - `T` must be invertible, where `np.linalg.inv(T) = pose`.
+ - The camera center `C` in world coordinate is projected to `[0, 0, 0, 1]` in
+ camera coordinate, i.e.,
+ ```python
+ T @ C = np.array([0, 0, 0, 1]).T
+ ```
+- `R`: `(3, 3)` rotation matrix.
+ ```python
+ R = T[:3, :3]
+ ```
+ - `R` is a rotation matrix. It is an orthogonal matrix with determinant 1, as
+ rotations preserve volume and orientation.
+ - `R.T == np.linalg.inv(R)`
+ - `np.linalg.norm(R @ x) == np.linalg.norm(x)`, where `x` is a `(3, )` vector.
+- `t`: `(3,)` translation vector.
+ ```python
+ t = T[:3, 3]
+ ```
+ - `t`'s shape is `(3,)`, not `(3, 1)`.
+- `pose` or `C2W`: `(4, 4)` camera pose matrix. It is the inverse of `T`.
+ ```python
+ pose = T.inv()
+ ```
+ - `pose` is also known as the camera-to-world `C2W` matrix, which transforms a
+ point in the camera coordinate to the world coordinate.
+ - `pose` is the inverse of `T`, i.e., `pose == np.linalg.inv(T)`.
+- `C`: camera center.
+ ```python
+ C = pose[:3, 3]
+ ```
+ - `C`'s shape is `(3,)`, not `(3, 1)`.
+ - `C` is the camera center in world coordinate. It is also the translation
+ vector of `pose`.
+- `P`: `(3, 4)` the camera projection matrix.
+ - `P` is the world-to-pixel projection matrix, which projects a point in the
+ homogeneous world coordinate to the homogeneous pixel coordinate.
+ - `P` is the product of the intrinsic and extrinsic parameters.
+ ```python
+ # P = K @ [R | t]
+ P = K @ np.hstack([R, t[:, None]])
+ ```
+ - `P`'s shape is `(3, 4)`, not `(4, 4)`.
+ - It is possible to decompose `P` into intrinsic and extrinsic matrices by QR
+ decomposition.
+ - Don't confuse `P` with `pose`.
+- For more details, please refer to the following blog posts:
+ [part 1](https://ksimek.github.io/2012/08/14/decompose/),
+ [part 2](https://ksimek.github.io/2012/08/22/extrinsic/),
+ and [part 3](https://ksimek.github.io/2013/08/13/intrinsic/).
+
+## Future works
+
+- Refined APIs.
+- Full PyTorch/Numpy compatibility.
+- Unit tests.
diff --git a/apply_format.sh b/apply_format.sh
diff --git a/camtools/__init__.py b/camtools/__init__.py
@@ -13,4 +13,7 @@
 from . import sanity
 from . import solver
 from . import stat
-from .version import __version__
+
+import pkg_resources
+
+__version__ = pkg_resources.get_distribution("camtools").version