Vintage post: Project Cascadia Tech Stack Investigation

Mike Griese edited this page Nov 7, 2023 · 1 revision

Important

What follows here are notes from a tech stack investigation I did circa 2018, when we were first starting to look at building a new terminal for Windows. These are my unfiltered notes on what kind of tech stack we should use to build it.

What we didn't consider at the time (and what later made this debate much simpler) was that we could just reuse the entire console text buffer, renderer, and parser, all together. At the time this was written, those components were more tightly coupled with the rest of the console.

These notes are being shared because they might be interesting to someone. A curious artifact from a moment in time long since past. We did end up using a C++WinRT native application, using XAML Islands.

 

Raw Perf metrics:

Stack Idle, Empty Buffer Idle, full buffer Scrolling Worst-case
VT#(WPF+no renderer) 34MB 93.7MB Peaks at ~100, mean 95MB
VT#(WPF+GlyphRun) 35.9-38MB 84.6-95MB see above
VT# buffer+UWP unavailable
xtermjs+UWP 46.3MB 69-73.1MB 102-117, mean 110 MB
xtermjs+WPF 28(+29)=57MB[1] 37(+48.4)=86MB 33.4-34(+101)=135MB
Prototype benchmarks
WPF+DX+native Buffer [4] 22.4MB
VT#+optimal buffer [5] 17.1MB 19.6MB 29.2MB
Other benchmarks
conhost+gdi 6.8 MB 6.8 MB 6.8 MB 22.5 MB
C++winrt CoreWindow [3] 2.0MB
C++winrt Xaml app 11.5MB->9.8
CX UWP with nothing 10.0MB
C# UWP with nothing 10.8MB
UWP with webview 21.6MB
WPF with nothing 24.3MB->14.9 [2]
WPF with webview 25.2+16.8=42MB

[1]: A WPF WebView Control uses an out-of-proc web server for its content. So there's a bunch of memory it's consuming external to its process tree.

[2]: The empty WPF app started at 24MB, then when coming back from lunch it was down to 14. Presumably the garbage collector ran? Note that VT# with the GlyphRun renderer sitting empty did not seem to repro this memory decrease.

[3]: A CoreWindow by itself doesn't give us any XAML, only a DX surface. So it's not really relevant, as we'd need to implement all of the Fluent features ourselves.

[4]: These prototypes used a fake char[] to emulate a hypothetical implementation. The assumption was made that the buffer was (9000 lines * 80 cells per row * 11B per cell). This is a very rough estimate of what a real buffer would use - a more optimal solution would not need 11B per cell, nor 80 cells in every row (except worst case), though there will probably be additional overhead from introducing helper containers and other std types.

[5]: This removes the existing VT# buffer implementation in favor of an "optimal" 9000x80x11 char buffer. This would be an effective lower bound on the footprint of the VT# implementation, if the buffer were implemented as just a block of bytes, without any data structures to abstract the implementation.

[6]: This is about the same as [5]. It represents the worst-case scenario for the buffer, where every single row is totally filled with 80 cells with different attributes. This is the value that should be compared to the [4] entries, which represent the worst-case buffer scenario in C++.


VTSharp + GlyphRun

This is using our hackathon implementation with some refactoring. Not optimized, but works well enough.

Pros:

  • Can be used in WPF and UWPs
  • Work on renderer, parser might contribute back to conhost.
  • Buffer could be greatly optimized from current state.
    • optimal empty buffer is only 17MB (x2.5)
    • optimal full worst-case buffer is only 29MB (on par with conhost)
  • Entire project is in a single language (C#)
  • C# will be faster for long-term dev work, more external developer excitement

Cons:

  • Work might not contribute back to conhost.
  • VT adapter is incomplete (good enough for conpty, not enough for ssh)
  • Buffer, Core in general is incomplete
  • Initial buffer implementation leaves much to be desired
    • in WPF: 5x,14x increased memory over conhost (empty, full buffer)

Questions

  • Could another render head potentially save us some memory?
    • Presumably not: even without any GlyphRuns, the implementation as-is was already bigger than xtermjs
  • Would it be possible to use the DxEngine as a renderer for this option?
    • Would enable reusability of some inbox components
    • Might be a perf penalty to pinvoking for each Engine call (for each StartPaint, InvalidateRect, PaintBufferLine, etc)

xterm.js + WebView

WebView Impact

UWP

  • adding a webview adds 11MB
  • adding a webview with xterm.js adds 35MB (at an empty buffer)

WPF

  • adding a webview adds 17MB (+>=16.8MB in Desktop App Web Viewer)
  • adding a webview with xterm.js adds 32.7MB (at an empty buffer)
    • is the WPF doing scary out-of-proc magic?
      • YES. Each WPF Web View adds 16.3MB of out-of-proc commit in Desktop App Web Viewer.
      • UWP does not seem to have this OOP server.

UWP and WPF web views seem comparable in size.

Pros:

  • Existing developer community
  • Existing implementation, test coverage for core
  • Complete VT implementation
  • Contributions to xterm.js benefit many 3rd parties
  • Can be used in WPF and UWPs

Cons:

  • Work on xterm.js won't contribute back to conhost
  • Need to work in the js ecosystem
    • console devs have minimal experience in this area
    • Daniel Imms would help transition, own JS bits
  • in UWP: 7x,10x increased over conhost (empty, full buffer)
  • in WPF: 10x,12x increased memory over conhost (empty, full buffer)
  • Debugging JS to C# issues will be painful
  • No amount of optimization on our part could improve the webview footprint (roughly 20MB)
  • Unknown C# to JS throughput

Questions

  • How do I translate the window size into a buffer size? Resizing is always tricky, but now only JS knows how many characters fit in the window. We'll have to do some magic to figure that out I think.
  • What kind of overhead is there sending data from C# to JS? I haven't been able to measure this.
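
As a rough sketch of the resize question (not something any of the prototypes here did): if the host side can learn the pixel size of a single character cell, for example by having the JS side report it, the buffer dimensions fall out of a division rounded down. That the cell size is available to the host is an assumption, and `BufferSize`, `wholeCells`, and `bufferSizeForWindow` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical result type: how many whole cells fit in the window.
struct BufferSize {
    std::size_t columns;
    std::size_t rows;
};

// Number of whole cells that fit in an extent; 0 for degenerate input.
constexpr std::size_t wholeCells(double extentPx, double cellPx) {
    return (extentPx > 0 && cellPx > 0)
               ? static_cast<std::size_t>(extentPx / cellPx)  // truncates toward zero
               : 0;
}

// Assumes a monospace font, so every cell has the same pixel size.
// The cell size must come from whoever does the text layout (here, the JS side).
constexpr BufferSize bufferSizeForWindow(double widthPx, double heightPx,
                                         double cellWidthPx, double cellHeightPx) {
    return BufferSize{wholeCells(widthPx, cellWidthPx),
                      wholeCells(heightPx, cellHeightPx)};
}
```

The hard part the question points at is unchanged: the cell size still has to be obtained from the web view, and any padding or scrollbar width would need to be subtracted before dividing.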

Native Buffer + DxRenderer

Pros:

  • Work will contribute back to inbox console
  • Console team is already familiar with c++
  • minimal memory footprint
    • (WPF,UWP) = (3x, 2x) over conhost
  • Can be used in WPF and UWPs with some hassle

Cons:

  • no existing implementation, need to start from scratch
    • Renderer and Parser are done, but buffer, adapter, connection, ux, uia, tests will need to be written
  • complicated renderer interop/ux layer
    • The "Core" component is now a DX render engine + the Terminal Core, and each UX layer has different ways of embedding that DX component. If you wanted to implement a C# renderer, you'd have to create a winrt wrapper around the Terminal
    • Each user input is a pinvoke into the Core's UX layer (not terrible)
  • WpfDxInterop is abandoned circa 2015. Support would have to come from us.

Questions

How do we effectively abstract DX across UWP and WPF?

  • WpfDxInterop uses an IDXGIResource in the Image's OnRender event
  • UWP uses the Composition APIs, or a SwapChainPanel or a SurfaceImageSource, which each have different ways of interacting with DX
    • The SwapChainPanel has a SwapChain. For perf reasons, we are limited to 4 swap chains per app. Could we theoretically have one SwapChain for multiple panes?

Conclusion

Is 20MB of overhead for the webview worth the kickstart we get on development?

Hypothetically we could improve the VT# buffer. At empty, with xterm.js we're already at (57,46)MB (wpf, uwp respectively).

How valuable is reusability of the existing components (DxRenderer, Parser) vs the speed of development and relative simplicity of a pure-managed solution?

Footnotes

11B "Optimal" buffer cell assumption

While working on these benchmarks, I used the following as the math on what an "optimal" buffer layout might be like. It consists of TerminalColors that are 4B, and TextRuns that are 11B total. In the worst case, each cell in a row has different attributes, requiring one text run per cell.

4B for TerminalColor
00drIIII - isDefault, isRgb, ColorTableIndex
RRRRRRRR - Red
GGGGGGGG - Green
BBBBBBBB - Blue

11B for a TextRun
00000uib - isUnderlined, isItalic, isBold
TerminalColor - foreground
TerminalColor - background
LLLLLLLL - Length
LLLLLLLL - Length

Worst-case bound on the buffer: [9000 * (80 * 11)] B = 7,920,000 B, or about 7734KB
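
The layout above can be written down as packed structs to sanity-check the 4B and 11B figures and the worst-case bound. This is a sketch only: the struct, field, and constant names are hypothetical, and `#pragma pack` is assumed to be available (it is on MSVC, GCC, and Clang).

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)  // no padding, so sizes match the byte counts in the notes

struct TerminalColor {
    // 00drIIII: bit 5 = isDefault, bit 4 = isRgb, low 4 bits = color table index
    std::uint8_t flags;
    std::uint8_t red;
    std::uint8_t green;
    std::uint8_t blue;
};

struct TextRun {
    // 00000uib: bit 2 = isUnderlined, bit 1 = isItalic, bit 0 = isBold
    std::uint8_t meta;
    TerminalColor foreground;
    TerminalColor background;
    std::uint16_t length;  // the two length bytes
};

#pragma pack(pop)

static_assert(sizeof(TerminalColor) == 4, "TerminalColor should be 4B");
static_assert(sizeof(TextRun) == 11, "TextRun should be 11B");

// Buffer dimensions assumed throughout these notes.
constexpr std::size_t kRows = 9000;
constexpr std::size_t kCellsPerRow = 80;

// Worst case: every cell in every row needs its own TextRun.
constexpr std::size_t worstCaseBufferBytes() {
    return kRows * kCellsPerRow * sizeof(TextRun);
}
```

With these numbers `worstCaseBufferBytes()` is 7,920,000 bytes, which is where the roughly 7734KB figure comes from.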

It's possible that this could be further optimized. Talking with Daniel Imms, he suggested that we'd have a separate map of index->attributes, where attributes is a (fg,bg,meta) struct, which means each run would only need an index into that map. The worst case would then no longer be every cell in a row having different colors, but every cell in the ENTIRE BUFFER having a different pairing of colors, which is a wildly less likely scenario.

However, I didn't really want to implement that a bunch of times, so I used one that's more similar to conhost's for benchmarking's sake.
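
For reference, the index->attributes idea could look something like the following. This is a minimal sketch and not what was benchmarked: every name is hypothetical, colors are simplified to packed 32-bit values, the table is a fixed-capacity array with linear search to keep the example short, and it assumes C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical (fg, bg, meta) tuple; each color is a packed 4-byte value here.
struct Attributes {
    std::uint32_t fg;
    std::uint32_t bg;
    std::uint8_t meta;
};

constexpr bool sameAttributes(const Attributes& a, const Attributes& b) {
    return a.fg == b.fg && a.bg == b.bg && a.meta == b.meta;
}

// Maps index -> attributes. Each distinct combination is stored once.
struct AttributeTable {
    static constexpr std::size_t kCapacity = 256;  // fits a 1-byte index
    Attributes entries[kCapacity] = {};
    std::size_t count = 0;

    // Returns the index for attrs, adding it if it is not already present.
    // Returns kCapacity if the table is full.
    constexpr std::size_t intern(const Attributes& attrs) {
        for (std::size_t i = 0; i < count; ++i) {
            if (sameAttributes(entries[i], attrs)) return i;
        }
        if (count == kCapacity) return kCapacity;
        entries[count] = attrs;
        return count++;
    }
};

#pragma pack(push, 1)
// A run now only needs an index into the table plus its length.
struct IndexedRun {
    std::uint8_t attrIndex;
    std::uint16_t length;
};
#pragma pack(pop)
static_assert(sizeof(IndexedRun) == 3, "3B per run in this sketch");

// Usage example: three runs, two distinct attribute combinations.
constexpr std::size_t demoDistinctEntries() {
    AttributeTable table{};
    const Attributes redOnBlack{0x00FF0000u, 0x00000000u, 0x01};
    const Attributes greenOnBlack{0x0000FF00u, 0x00000000u, 0x00};
    table.intern(redOnBlack);
    table.intern(greenOnBlack);
    table.intern(redOnBlack);  // already present, no new entry
    return table.count;
}
```

The trade-off is that the per-run cost drops from 11B to 3B in this sketch, while the table itself only grows with the number of distinct (fg,bg,meta) combinations actually in use.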