Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance problem with rich text /xl/sharedStrings.xml deduplication #978

Open
Arnie97 opened this issue Aug 2, 2021 · 1 comment
Open
Labels
enhancement New feature or request

Comments

@Arnie97
Copy link
Contributor

Arnie97 commented Aug 2, 2021

Description

The dedupl operation in SetCellRichText() is quite CPU intensive when exporting a bunch of cells together (at least O(n²s), where n is the number of cells and s is the size of rich text in each cell):

func (f *File) SetCellRichText(sheet, cell string, runs []RichTextRun) error {
	...
	for idx, strItem := range sst.SI {  // O(n)
		if reflect.DeepEqual(strItem, si) {  // > O(s)
			cellData.T, cellData.V = "s", strconv.Itoa(idx)
			return err
		}
	}
	...
}

func main() {
	cells := ...
	f := excelize.NewFile()
	sheetName := f.GetSheetName(0)
	for index, cellValue := range cells {  // O(n)
		cellName := excelize.CoordinatesToCellName(1, index)
		if err := f.SetCellRichText(sheetName, cellName, cellValue); err != nil {
			...
		}
	}
}

Steps to reproduce the issue:

Set a number of cells with SetCellRichText().

Describe the results you received:

The reflect.DeepEqual() loop caused high CPU usage. Please see the pprof result for details.

Describe the results you expected:

The CPU usage should be comparable to SetCellStr(), SetCellDefault() etc.

Possible solutions:

  • Provide an option to switch off the reflect.DeepEqual() comparisons, or
  • Marshal the rich text structure to utilize a hash map or bloom filter instead of brute-force loops to reduce the time complexity to O(n·s). This is my suggested approach, but the process of serialization and hashing the (possibly large) serialized result would both introduce some noticeable overheads.
  • Get rid of SetCellRichText() and stick to the good old SetCellStr() instead where performance matters. In my own case I only need correct line breaks (Preserve whitespace in cells #976) and do not need fancy rich text.

Output of go version:

go version go1.17rc1 linux/amd64

Excelize version or commit ID:

v2.4.0 (d42834f)

@xuri
Copy link
Member

xuri commented Aug 19, 2021

Thanks for your feedback, I'll consider optimizing for that, but maybe it's taken a while to respond.

@xuri xuri added the enhancement New feature or request label Aug 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants