refactor code and readme (#3)
yxlao committed Jul 18, 2023
1 parent 6e1a55d commit 721c0da
Showing 28 changed files with 914 additions and 736 deletions.
15 changes: 15 additions & 0 deletions .github/workflows/formatter.yml
@@ -0,0 +1,15 @@
name: Formatter CI

on:
  push:
    branches:
      - main
  pull_request:
    types: [opened, reopened, synchronize]

jobs:
  formatter:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: psf/black@stable
2 changes: 0 additions & 2 deletions .style.yapf

This file was deleted.

10 changes: 7 additions & 3 deletions .vscode/settings.json
@@ -1,5 +1,9 @@
{
"editor.formatOnSave": true,
"editor.formatOnSaveMode": "file",
"files.trimTrailingWhitespace": true
"editor.formatOnSave": true,
"editor.formatOnSaveMode": "file",
"files.trimTrailingWhitespace": true,
"python.formatting.provider": "none",
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter"
}
}
33 changes: 0 additions & 33 deletions CMakeLists.txt

This file was deleted.

299 changes: 186 additions & 113 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,124 +1,197 @@
# CamTools

Tools for handling pinhole camera parameters and plotting cameras.
Camtools: camera tools for computer vision. Useful for plotting, converting,
projecting, and ray casting with camera parameters.

## Install
<a href="https://github.com/yxlao/camtools/actions/workflows/formatter.yml">
<img src="https://github.com/yxlao/camtools/actions/workflows/formatter.yml/badge.svg" alt="Formatter">
</a>

```bash
# For development.
pip install -e .

# For install.
pip install .
```

<!-- ```bash
mkdir build && cd build
cmake ..
make pip-install -j$(nproc)
python -c "import camtools as ct; print(ct.__version__)"
``` -->

## Matrix conventions

```
K : (3, 3) # Camera intrinsic matrix.
# [[fx, s, cx],
# [ 0, fy, cy],
# [ 0, 0, 1]]
# x: goes from top-left to top-right.
# y: goes from top-left to bottom-left.
R : (3, 3) # Rotation matrix.
Rc : (3, 3) # Rc = R.T = R.inv().
t : (3,) # Translation.
T : (4, 4) # Extrinsic matrix with (0, 0, 0, 1) row below.
# T = [R | t
# 0 | 1]
# T projects world space coordinate to the camera space
# (a.k.a view space or eye space). The camera center in world
# space projected by T becomes [0, 0, 0, 1]^T, i.e. the camera
# space has its origin at the camera center.
# T @ [[|], = [[0],
# [C], [0],
# [|], [0],
# [1]] [1]]
P : (3, 4) # World-to-pixel projection matrix. P = K @ [R | t] = K @ T[:3, :].
W2P : (4, 4) # World-to-pixel projection matrix. It is P with (0, 0, 0, 1)
# row below. When using W2P @ point_homo, the last
# element is always 1, thus it is ignored.
pose : (4, 4) # Camera pose. pose = T.inv(). pose[:3, :3] = R.T = Rc. pose[:3, 3] = C.
C : (3,) # Camera center.
```

## Coordinate conventions

### 3D to 2D projection

Project 3D point `[X, Y, Z, 1]` to 2D `[x, y, 1]` pixel, e.g. with
`pixels = ct.project.points_to_pixel(points, K, T)`.

```python
# 0 -------> 1 (x)
# |
# |
# v (y)

cols = pixels[:, 0] # cols, width, x, top-left to top-right
rows = pixels[:, 1] # rows, height, y, top-left to bottom-left
cols = np.round(cols).astype(np.int32)
rows = np.round(rows).astype(np.int32)
cols[cols >= width] = width - 1
cols[cols < 0] = 0
rows[rows >= height] = height - 1
rows[rows < 0] = 0
```

It can be confusing to use `x, y, u, v`. Prefer `row` and `col`.


### UV coordinates

```python
# OpenGL convention:
# 1 (v)
# ^
# |
# |
# 0 -------> 1 (u)

# The following conversion accounts for pixel size
us = 1 / width * (0.5 + cols)
vs = 1 / height * (0.5 + (height - rows - 1))
```
## Installation

## Notes on vector vs. matrix

We choose to use 1D array for vector values like `t` and `C`. For example, `t`
is of shape `(3, )` instead of `(3, 1)`.

```python
# The `@` operator can be directly used to dot a matrix and a vector
# - If both arguments are 2-D they are multiplied like conventional matrices.
# - If either argument is N-D, N > 2, it is treated as a stack of matrices
# residing in the last two indexes and broadcast accordingly.
# - If the first argument is 1-D, it is promoted to a matrix by prepending a 1
# to its dimensions. After matrix multiplication the prepended 1 is removed.
# - If the second argument is 1-D, it is promoted to a matrix by appending a 1
# to its dimensions. After matrix multiplication the appended 1 is removed.

# t is (3, ) and it is promoted to be (3, 1).
C = - R.T @ t
```
```bash
# Option 1: install from pip.
pip install camtools

## Unit tests
# Option 2: install from git.
pip install git+https://github.com/yxlao/camtools.git

```bash
pytest . -s
pytest camtools -s
# Option 3: install from source.
git clone https://github.com/yxlao/camtools.git
cd camtools
pip install -e . # Dev mode, if you want to modify camtools.
pip install . # Install mode, if you want to use camtools only.
```

## What can you do with CamTools?

## TODO
1. Plot cameras. Useful for debugging 3D reconstruction and NeRFs!

- Full unit tests
- PyTorch/Numpy wrapper (e.g. with `eagerpy`)
```python
import camtools as ct
import open3d as o3d
cameras = ct.camera.create_camera_ray_frames(Ks, Ts)
o3d.visualization.draw_geometries([cameras])
```

<p align="center">
<img src="./camtools/assets/camera_frames.png" width="360" />
</p>

2. Convert camera parameters.

```python
pose = ct.convert.T_to_pose(T) # Convert T to pose
R, t = ct.convert.T_to_R_t(T) # Convert T to R and t
C = ct.convert.pose_to_C(pose) # Convert pose to camera center
K, T = ct.convert.P_to_K_T(P) # Decompose projection matrix to K and T
# And more...
```

3. Projection and ray casting.

```python
# Project 3D points to pixels.
pixels = ct.project.points_to_pixel(points, K, T)

# Back-project depth image to 3D points.
points = ct.project.im_depth_to_points(depth, K, T)

# Ray cast a triangle mesh to depth image.
im_depth = ct.raycast.mesh_to_depths(mesh, Ks, Ts, height, width)

# And more...
```

4. Image I/O and depth I/O with no surprises.

```python
ct.io.imread()
ct.io.imwrite()

ct.io.imread_depth()
ct.io.imwrite_depth()
```

Strict type checks and range checks are enforced. These APIs are specifically
designed to solve the following pain points:

- Is my image `float32` or `uint8`?
- Does it have range `[0, 1]` or `[0, 255]`?
- Is it RGB or BGR?
- Does my image have an alpha channel?
- When saving a depth image as an integer-based `.png`, is it correctly scaled?

5. Useful command-line tools (run in terminal).

```bash
# Crop image borders.
ct crop-boarders *.png --pad_pixel 10 --skip_cropped --same_crop

# Draw synchronized bounding boxes interactively.
ct draw-bboxes path/to/a.png path/to/b.png

# For more help.
ct --help
```

<p align="center">
<img src="https://user-images.githubusercontent.com/1501945/241416210-e11ff3bf-22e6-46c0-8ba0-d177a0015323.png" width="400" />
</p>

6. And more.
- Solve line intersections.
- COLMAP tools.
- Points normalization.
- ...
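
The image I/O pain points listed under item 4 can be made concrete with a small sketch. The helpers below are hypothetical and are not camtools' API. They only illustrate the kind of strict dtype, shape, and range checks described there, assuming a `float32` RGB image in `[0, 1]` and a depth map stored as `uint16` with an assumed `depth_scale` of 1000.

```python
import numpy as np


def check_float_rgb_image(im):
    """Hypothetical helper (not camtools' API): require float32 RGB in [0, 1]."""
    if not isinstance(im, np.ndarray):
        raise TypeError(f"Expected np.ndarray, got {type(im)}.")
    if im.dtype != np.float32:
        raise TypeError(f"Expected dtype float32, got {im.dtype}.")
    if im.ndim != 3 or im.shape[2] != 3:
        raise ValueError(f"Expected shape (H, W, 3), got {im.shape}.")
    if im.size > 0 and (im.min() < 0.0 or im.max() > 1.0):
        raise ValueError(f"Expected range [0, 1], got [{im.min()}, {im.max()}].")
    return im


def depth_to_uint16(depth, depth_scale=1000.0):
    """Hypothetical helper: scale a float depth map to uint16 for .png storage."""
    scaled = np.round(depth * depth_scale)
    # Refuse to silently wrap around or clip values that do not fit.
    if scaled.min() < 0 or scaled.max() > np.iinfo(np.uint16).max:
        raise ValueError("Scaled depth does not fit in uint16.")
    return scaled.astype(np.uint16)
```

Raising an error instead of silently converting is the design choice being illustrated: a `uint8` image passed where `float32` is expected fails immediately rather than producing a subtly wrong result later.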

## Camera conventions

<p align="center">
<img src="./camtools/assets/camera_coordinates.svg" width="360" />
</p>

We follow the standard pinhole camera model:

- **Camera coordinate:** right-handed, with $Z$ pointing away from the camera
towards the view direction and $Y$ axis pointing down. Note that this is
different from the Blender convention, where $Z$ points towards the opposite
view direction and the $Y$ axis points up.
- **Image coordinate:** starts from the top-left corner of the image, with $x$
pointing right (corresponding to the image width) and $y$ pointing down
(corresponding to the image height). This is consistent with OpenCV. Note,
however, that dimension 0 of the image array is the height (i.e., $y$) and
dimension 1 is the width (i.e., $x$). That is:
- $x$ <=> width <=> column <=> dimension 1
- $y$ <=> height <=> row <=> dimension 0
- `K`: `(3, 3)` camera intrinsic matrix.
```python
K = [[fx, s, cx],
[ 0, fy, cy],
[ 0, 0, 1]]
```
- `T` or `W2C`: `(4, 4)` camera extrinsic matrix.
```python
T = [[R | t   = [[R_00, R_01, R_02, t_0],
      0 | 1]]    [R_10, R_11, R_12, t_1],
                 [R_20, R_21, R_22, t_2],
                 [   0,    0,    0,   1]]
```
- `T` is also known as the world-to-camera `W2C` matrix, which transforms a
point in the world coordinate to the camera coordinate.
- `T`'s shape is `(4, 4)`, not `(3, 4)`.
- `T` must be invertible, with `np.linalg.inv(T) == pose`.
- The camera center `C` in world coordinate is projected to `[0, 0, 0, 1]` in
camera coordinate, i.e.,
```python
# C has shape (3,), so append 1 to make it homogeneous first.
np.allclose(T @ np.append(C, 1.0), [0, 0, 0, 1])
```
- `R`: `(3, 3)` rotation matrix.
```python
R = T[:3, :3]
```
- `R` is a rotation matrix. It is an orthogonal matrix with determinant 1, as
rotations preserve volume and orientation.
- `R.T == np.linalg.inv(R)`
- `np.linalg.norm(R @ x) == np.linalg.norm(x)`, where `x` is a `(3, )` vector.
- `t`: `(3,)` translation vector.
```python
t = T[:3, 3]
```
- `t`'s shape is `(3,)`, not `(3, 1)`.
- `pose` or `C2W`: `(4, 4)` camera pose matrix. It is the inverse of `T`.
```python
pose = np.linalg.inv(T)
```
- `pose` is also known as the camera-to-world `C2W` matrix, which transforms a
point in the camera coordinate to the world coordinate.
- `pose` is the inverse of `T`, i.e., `pose == np.linalg.inv(T)`.
- `C`: camera center.
```python
C = pose[:3, 3]
```
- `C`'s shape is `(3,)`, not `(3, 1)`.
- `C` is the camera center in world coordinate. It is also the translation
vector of `pose`.
- `P`: `(3, 4)` the camera projection matrix.
- `P` is the world-to-pixel projection matrix, which projects a point in the
homogeneous world coordinate to the homogeneous pixel coordinate.
- `P` is the product of the intrinsic and extrinsic parameters.
```python
# P = K @ [R | t]
P = K @ np.hstack([R, t[:, None]])
```
- `P`'s shape is `(3, 4)`, not `(4, 4)`.
- It is possible to decompose `P` into intrinsic and extrinsic matrices by RQ
decomposition.
- Don't confuse `P` with `pose`.
- For more details, please refer to the following blog posts:
[part 1](https://ksimek.github.io/2012/08/14/decompose/),
[part 2](https://ksimek.github.io/2012/08/22/extrinsic/),
and [part 3](https://ksimek.github.io/2013/08/13/intrinsic/).
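
The conventions above can be checked numerically. The following is a minimal sketch using only NumPy, not camtools functions. The values of `K`, `R`, `t`, and `point` are made up for illustration. The identity `C = -R.T @ t` follows from `pose = np.linalg.inv(T)`.

```python
import numpy as np

# Example intrinsics (made-up values): fx = fy = 500, cx = 320, cy = 240.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Example extrinsics (made-up values): 90-degree rotation about Z, plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.2, 2.0])  # Shape (3,), not (3, 1).

# T (W2C) is (4, 4): [R | t] with a (0, 0, 0, 1) row below.
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t

# pose (C2W) is the inverse of T, and C is its translation part.
pose = np.linalg.inv(T)
C = pose[:3, 3]

# R is orthogonal with determinant 1.
assert np.allclose(R.T, np.linalg.inv(R))
assert np.isclose(np.linalg.det(R), 1.0)

# The homogeneous camera center maps to the camera-space origin.
assert np.allclose(T @ np.append(C, 1.0), [0, 0, 0, 1])
assert np.allclose(C, -R.T @ t)

# P is (3, 4): the product of intrinsics and extrinsics.
P = K @ np.hstack([R, t[:, None]])
assert P.shape == (3, 4)
assert np.allclose(P, K @ T[:3, :])

# Project a world point to a pixel: divide by the last homogeneous element.
point = np.array([1.0, 2.0, 3.0])
pixel_homo = P @ np.append(point, 1.0)
col, row = pixel_homo[:2] / pixel_homo[2]  # x <=> col, y <=> row.
```

Keeping `t` and `C` as `(3,)` arrays means the `@` operator handles them directly. Only the `np.hstack` call needs the explicit `t[:, None]` reshape.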

## Future work

- Refined APIs.
- Full PyTorch/Numpy compatibility.
- Unit tests.
13 changes: 0 additions & 13 deletions apply_format.sh

This file was deleted.

5 changes: 4 additions & 1 deletion camtools/__init__.py
@@ -13,4 +13,7 @@
from . import sanity
from . import solver
from . import stat
from .version import __version__

import pkg_resources

__version__ = pkg_resources.get_distribution("camtools").version