Vintage post: Project Cascadia Tech Stack Investigation

Important

What follows are notes from a tech stack investigation I did circa 2018, when we were first starting to look at building a new terminal for Windows. These are my unfiltered notes on what kind of tech stack we should use to build it.

What we didn't consider at the time (and what later made this debate simpler) was that we could just re-use the entire console text buffer, renderer, and parser, all together. At the time this was written, those components were more tightly coupled with the rest of the console.

These notes are being shared because they might be interesting to someone. A curious artifact from a moment in time long since past. We did end up using a C++WinRT native application, using XAML Islands.

Raw Perf metrics:

Stack	Idle, Empty Buffer	Idle, full buffer	Scrolling	Worst-case
VT#(WPF+no renderer)	34MB	93.7MB	Peaks at ~100, mean 95MB
VT#(WPF+GlyphRun)	35.9-38MB	84.6-95MB	see above
VT# buffer+UWP	unavailable
xtermjs+UWP	46.3MB	69-73.1MB	102-117, mean 110 MB
xtermjs+WPF	28(+29)=57MB[1]	37(+48.4)=86MB	33.4-34(+101)=135MB
Prototype benchmarks
WPF+DX+native Buffer [4]				22.4MB
VT#+optimal buffer [5]	17.1MB	19.6MB		29.2MB
Other benchmarks
conhost+gdi	6.8 MB	6.8 MB	6.8 MB	22.5 MB
C++winrt CoreWindow [3]	2.0MB
C++winrt Xaml app	11.5MB->9.8
CX UWP with nothing	10.0MB
C# UWP with nothing	10.8MB
UWP with webview	21.6MB
WPF with nothing	24.3MB->14.9 [2]
WPF with webview	25.2+16.8=42MB

[1]: A WPF WebView Control uses an out-of-proc web server for its content. So there's a bunch of memory it's consuming external to its process tree.

[2]: The empty WPF started at 24MB, then when coming back from lunch it was down to 14. Presumably garbage collector ran? Note that VT# with the glyphrun renderer sitting empty did not seem to repro this memory decrease.

[3]: A CoreWindow by itself doesn't give us any XAML, only a DX surface. So it's not really relevant, as we'd need to implement all of the Fluent features ourselves.

[4]: These prototypes used a fake char[] to emulate a hypothetical implementation. The assumption was made that the buffer was (9000 lines * 80 cells per row * 11B per cell). This is a very rough estimate of what a real buffer would use - a more optimal solution would not need 11B per cell, nor 80 cells in every row (except worst case), though there will probably be additional overhead introducing helper containers and other std types.

[5]: This removes the existing VT# buffer implementation in favor of an "optimal" 9000x80x11 char buffer. This would be an effective lower bound on the footprint of the VT# implementation, if the buffer were implemented as just a block of bytes, without any data structures to abstract the implementation.

[6]: This is about the same as 5. It represents the worst-case scenario for the buffer, where every single row is totally filled with 80 cells with different attributes. This is the value that should be compared to the [4] entries, which represent the worst-case buffer scenario in C++.

VTSharp + GlyphRun

This is using our hackathon implementation with some refactoring. Not optimized, but works well enough.

Pros:

Can be used in WPF and UWPs
Work on renderer, parser might contribute back to conhost.
Buffer could be greatly optimized from current state.
- optimal empty buffer is only 17MB (x2.5)
- optimal full worst-case buffer is only 29MB (on par with conhost)
Entire project is in a single language (C#)
C# will be faster for long-term dev work, more external developer excitement

Cons:

Work might not contribute back to conhost.
VT adapter is incomplete (good enough for conpty, not enough for ssh)
Buffer, Core in general is incomplete
Initial buffer implementation leaves much to be desired
- in WPF: 5x,14x increased memory over conhost (empty, full buffer)
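
The multiples above can be reproduced with a small calculation. This is only a sketch: it assumes the 5x and 14x figures come from dividing the VT#(WPF+no renderer) row of the table (34MB empty, 93.7MB full) by the conhost+gdi idle figure (6.8MB) and rounding to the nearest whole number. The function name is hypothetical.

```cpp
#include <cmath>

// Baseline taken from the "conhost+gdi" row of the table (idle, in MB).
constexpr double kConhostIdleMB = 6.8;

// Returns how many times larger a measurement is than the baseline,
// rounded to the nearest whole multiple. Both values must use the same unit.
inline long MultipleOverBaseline(double measuredMB, double baselineMB = kConhostIdleMB) {
    return std::lround(measuredMB / baselineMB);
}
```

With the table's values, `MultipleOverBaseline(34.0)` and `MultipleOverBaseline(93.7)` give the empty-buffer and full-buffer multiples respectively.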

Questions

Could another render head potentially save us some memory?
- presumably not, without any GlyphRuns the implementation as-is was already bigger than xtermjs
Would it be possible to use the DxEngine as a renderer for this option?
- Would enable reusability of some inbox components
- Might be a perf penalty to pinvoking for each Engine call (for each StartPaint, InvalidateRect, PaintBufferLine, etc)

xterm.js + WebView

WebView Impact

UWP

adding a webview adds 11MB
adding a webview with xterm.js adds 35MB (at an empty buffer)

WPF

adding a webview adds 17MB (+>=16.8MB in Desktop App Web Viewer)
adding a webview with xterm.js adds 32.7MB (at an empty buffer)
- is the WPF doing scary out-of-proc magic?
  - YES. Each WPF Web View adds 16.3MB of out-of-proc commit in Desktop App Web Viewer.
  - UWP does not seem to have this OOP server.

UWP and WPF web views seem comparable in size.

Pros:

Existing developer community
Existing implementation, test coverage for core
Complete VT implementation
Contributions to xterm.js benefit many 3rd parties
Can be used in WPF and UWPs

Cons:

Work on xterm.js won't contribute back to conhost
Need to work in the js ecosystem
- console devs have minimal experience in this area
- Daniel Imms would help transition, own JS bits
in UWP: 7x,10x increased over conhost (empty, full buffer)
in WPF: 10x,12x increased memory over conhost (empty, full buffer)
Debugging JS to C# issues will be painful
No amount of optimization on our part could reduce the webview footprint (roughly 20MB)
Unknown C# to JS throughput

Questions

How do I translate the window size into a buffer size? Resizing is always tricky, but now only JS knows how many characters fit in the window. We'll have to do some magic to figure that out I think.
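
One possible shape for that calculation, assuming the rendering side can report the pixel size of a single character cell back to the host. The `PixelsToCells` name, the `BufferSize` type, and the parameters are all hypothetical and not taken from xterm.js or conhost; how the cell size actually gets measured and handed across the JS boundary is exactly the open question.

```cpp
#include <algorithm>
#include <cmath>

// Buffer dimensions in character cells.
struct BufferSize {
    int columns;
    int rows;
};

// Converts a window's client area (in pixels) into a whole number of cells.
// cellWidthPx / cellHeightPx are assumed to be supplied by whatever component
// lays out the text, since only it knows the font metrics.
inline BufferSize PixelsToCells(double widthPx, double heightPx,
                                double cellWidthPx, double cellHeightPx) {
    // Without a valid cell size there is nothing sensible to compute,
    // so fall back to the smallest possible buffer.
    if (cellWidthPx <= 0.0 || cellHeightPx <= 0.0) {
        return {1, 1};
    }
    // Partial cells at the right and bottom edges are dropped.
    const int columns = static_cast<int>(std::floor(widthPx / cellWidthPx));
    const int rows = static_cast<int>(std::floor(heightPx / cellHeightPx));
    // Never report a buffer smaller than 1x1.
    return {std::max(columns, 1), std::max(rows, 1)};
}
```

Flooring means a window that is a few pixels wider than a whole number of cells leaves a small unused margin rather than a clipped column.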

What kind of overhead is there sending data from C# to JS? I haven't been able to measure this.

Native Buffer + DxRenderer

Pros:

Work will contribute back to inbox console
Console team is already familiar with c++
minimal memory footprint
- (WPF,UWP) = (3x, 2x) over conhost
Can be used in WPF and UWPs with some hassle

Cons:

no existing implementation, need to start from scratch
- Renderer and Parser are done, but buffer, adapter, connection, ux, uia, tests will need to be written
complicated renderer interop/ux layer
- The "Core" component is now a DX render engine + the Terminal Core, and each UX layer has different ways of embedding that DX component. If you wanted to implement a C# renderer, you'd have to create a winrt wrapper around the Terminal
- Each user input is a pinvoke into the Core's UX layer (not terrible)
WpfDxInterop was abandoned circa 2015. Support would have to come from us.

Questions

How do we effectively abstract DX across UWP and WPF?

WpfDxInterop uses an IDXGIResource in the Image's OnRender event
UWP uses the Composition APIs, or a SwapChainPanel or a SurfaceImageHost, which each have different ways of interacting with DX
- The SwapChainPanel has a SwapChain. For perf reasons, we are limited to 4 swap chains per app. Could we theoretically have one SwapChain for multiple panes?

Conclusion

Is 20MB of overhead for the webview worth the kickstart we get on development?

Hypothetically we could improve the VT# buffer. At empty, with xterm.js we're already at (57,46)MB (wpf, uwp respectively).

How valuable is reusability of the existing components (DxRenderer, Parser) vs the speed of development and relative simplicity of a pure-managed solution?

Footnotes

11B "Optimal" buffer cell assumption

While working on these benchmarks, I used the following as the math on what an "optimal" buffer layout might be like. It consists of TerminalColors that are 4B, and TextRuns that are 11B total. In the worst case, each cell in a row has different attributes, requiring one text run per cell.

4B for TerminalColor
00drIIII - isDefault, isRgb, ColorTableIndex
RRRRRRRR - Red
GGGGGGGG - Green
BBBBBBBB - Blue

11B for a TextRun
00000uib - isUnderlined, isItalic, isBold
TerminalColor - foreground
TerminalColor - background
LLLLLLLL - Length
LLLLLLLL - Length

worst case bound buffer [9000 * (80 * 11)] = 7,920,000B, or about 7734KB, for its buffer
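
The layout and the bound above can be written down as a rough sketch. The struct and field names here are illustrative only (the benchmarks used a plain char[] rather than real types), and the 4-bit color table index is read off the `00drIIII` diagram. `#pragma pack` is used so the compiler does not pad the 11B run up to 12B.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// 4B color: one flags byte followed by three channel bytes.
struct TerminalColor {
    std::uint8_t flags;  // 00drIIII: isDefault, isRgb, 4-bit color table index
    std::uint8_t red;
    std::uint8_t green;
    std::uint8_t blue;
};

// 11B run: 1B of text flags, two 4B colors, and a 2B run length.
struct TextRun {
    std::uint8_t flags;        // 00000uib: isUnderlined, isItalic, isBold
    TerminalColor foreground;
    TerminalColor background;
    std::uint16_t length;      // the two "Length" bytes
};
#pragma pack(pop)

static_assert(sizeof(TerminalColor) == 4, "TerminalColor should be 4 bytes");
static_assert(sizeof(TextRun) == 11, "TextRun should be 11 bytes");

// Worst case: every cell in every row needs its own run.
constexpr std::size_t WorstCaseBufferBytes(std::size_t rows, std::size_t cols) {
    return rows * cols * sizeof(TextRun);
}

static_assert(WorstCaseBufferBytes(9000, 80) == 7920000, "9000 * 80 * 11 bytes");
static_assert(WorstCaseBufferBytes(9000, 80) / 1024 == 7734, "about 7734KB");
```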

It's possible that this could be further optimized. Talking with Daniel Imms, he suggested that we'd have a separate map of index->attributes, where attributes is a (fg,bg,meta) struct, which means each run would only need an index into that map. The worst case would then no longer be every cell in a row having different colors, but every cell in the ENTIRE BUFFER having a different pairing of colors, which is a wildly less likely scenario.
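
A minimal sketch of that index->attributes idea, to make the trade-off concrete. None of this was implemented for the benchmarks: the `Attributes`, `AttributeTable`, and `IndexedRun` names, the field widths, and the 16-bit index are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// One (fg, bg, meta) combination. Field widths are illustrative.
struct Attributes {
    std::uint32_t fg;
    std::uint32_t bg;
    std::uint8_t meta;

    bool operator<(const Attributes& other) const {
        return std::tie(fg, bg, meta) < std::tie(other.fg, other.bg, other.meta);
    }
};

// Stores each distinct attribute combination once and hands out indices.
class AttributeTable {
public:
    // Returns the index for attrs, adding it to the table on first use.
    // Note: a 16-bit index caps the table at 65536 distinct combinations;
    // this sketch does not handle overflow.
    std::uint16_t Intern(const Attributes& attrs) {
        const auto it = indexByAttrs_.find(attrs);
        if (it != indexByAttrs_.end()) {
            return it->second;
        }
        const auto index = static_cast<std::uint16_t>(attrsByIndex_.size());
        attrsByIndex_.push_back(attrs);
        indexByAttrs_.emplace(attrs, index);
        return index;
    }

    const Attributes& Lookup(std::uint16_t index) const {
        return attrsByIndex_.at(index);
    }

    std::size_t Size() const { return attrsByIndex_.size(); }

private:
    std::map<Attributes, std::uint16_t> indexByAttrs_;
    std::vector<Attributes> attrsByIndex_;
};

// A run now carries only an index and a length: 4B instead of 11B.
struct IndexedRun {
    std::uint16_t attrIndex;
    std::uint16_t length;
};

static_assert(sizeof(IndexedRun) == 4, "IndexedRun should be 4 bytes");
```

Under these assumptions the per-run cost drops from 11B to 4B, and the table itself only grows with the number of distinct attribute combinations actually seen in the buffer.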

However, I didn't really want to implement that a bunch of times, so I used one that's more similar to conhost's for benchmarking's sake.
