Useful when two sources are available and you would like to combine them in curtain ways, which would only become possible once they are perfectly aligned. For example doing a color transfer, replacing a logo/hardsubs, creating a paired dataset, combining high resolution Bluray chroma with better DVD luma, or similar.
- pip install numpy
- pytorch
- julek-plugin (optional, only for temporal alignment precision=2)
- pip install pyiqa (optional, only for temporal alignment precision=3)
- tivtc (optional, only for temporal alignment with different frame rates)
Put the entire "vs_align" folder into your scripts folder, or where you typically load scripts from.
Aligns the content of a frame to a reference frame using a modified Rife AI model. Frames should have no black borders before using. Output clip will have the same dimensions as reference clip. Resize reference clip to get desired output scale. Examples: https://slow.pics/c/rqeq3D97
import vs_align
clip = vs_align.spatial(clip, ref, precision=3, iterations=1, blur_strength=0, device="cuda")
clip
Misaligned clip. Must be in RGBS format.
ref
Reference clip that misaligned clip will be aligned to. Must be in RGBS format.
precision
1, 2, 3, 4, or 5. Higher values will internally align at higher resolutions to increase precision. Each step up doubles the internal resolution, which will in turn increase processing time and VRAM usage. Lower values are less precise, but can correct larger misalignments. For problematic cases it can be helpful to chain multiple alignment calls with increasing precision.
3 works great in most cases.
iterations
(optional)
Runs the alignment multiple times to dial it in even further. With more than around 5 passes, artifacts can appear.
blur_strength
(optional)
Blur is only used internally and will not be visible on the output. It can help to ignore small details in the alignment process (like compression, noise or halos) and focus more on the general shapes. If lines on the output get thinner or thicker, try to increase blur a little. It will reduce accuracy, so try to keep it as low as possible. Good values are 0-10. The best alignment will be at Blur 0.
device
(optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu". This will be very slow on CPU.
Syncs two clips timewise by searching through one clip and selecting the frame that most closely matches the reference clip frame. It is recommended trying to minimize the difference between the two clips by preprocessing. For example removing black borders, cropping to the overlapping region, rough color matching, dehaloing. The closer the clips look to each other, the better the temporal alignment will be. Adapted from decimatch by po5.
import vs_align
clip = vs_align.temporal(clip, ref, clip2, tr=20, precision=1, fallback, thresh=40, device="cuda", debug=False)
clip
Misaligned clip. Must be same format and dimensions as ref.
ref
Reference clip that misaligned clip will be aligned to. Must be same format and dimensions as clip.
clip2
(optional)
Clip and ref will be used for the calculations, but the actual output frame is then copied from clip2 if set. This is useful if you would like to do preprocessing on clip and ref (like downsizing to increase speed), but would like the ouput frame to be unaltered.
tr
Temporal radius. How many frames it will search forward and back to find a match.
precision
Value | Precision | Speed | Usecase | Method |
---|---|---|---|---|
1 | worst | very fast | when clips are basically identical besides the temporal misalignment | PlaneStats |
2 | better | slow | more robust to differences between clips | Butteraugli |
3 | best | very slow | extremely accurate with large differences and spatial misalignments between clips | TOPIQ |
fallback
(optional)
Optional fallback clip in case no frame below thresh can be found. Must have the same format and dimensions as clip (or clip2 if it is set).
thresh
(optional)
Threshold for fallback clip. If frame difference is higher than this value, fallback clip is used. Use "debug=True" to get an idea for the values.
Does nothing if no fallback clip is set.
device
(optional)
Possible values are "cuda" to use with an Nvidia GPU, or "cpu".
Only has an effect with "precision=3", which will be very slow on CPU.
debug
(optional)
Overlays computed difference values for all surrounding frames and the best match directly onto the frame.
clip_num
, clip_den
, ref_num
, ref_den
(optional)
Resamples clip to match ref's frame rate. Numerator and Denominator for clip and ref (clip2 uses the same as clip). Set this only if clip and ref have different frame rates (e.g., 29.97fps and 23.976fps), as it will double processing time. Requires all input clips to be in YUV8..16 format.
To avoid removal of the wrong frames during resampling, frames are doubled, resampled, aligned, then halved again.
Example: clip_num=30000, clip_den=1001, ref_num=24000, ref_den=1001
- Enums are available in vs_align/enums.py if needed.
- For problematic cases of spatial misalignment, it can be helpful to chain multiple alignment calls with increasing precision.
- Temporal Alignment precision=3 may need a little time on the first run, as the model needs to download first.
- Temporal Alignment precision=2 and 3 are at half or quarter resolution still better than precision 1.