
Exporting large amounts of data uses far more memory than the actual Excel file size, and the memory cannot be reclaimed #406

Closed
robinsoon opened this issue May 20, 2019 · 2 comments


@robinsoon

robinsoon commented May 20, 2019

Exporting a large amount of data occupies a huge amount of memory, far more than the actual size of the Excel file, and the memory cannot be recovered.

The more columns are exported, the more memory is consumed, and that memory is never reclaimed.

The invocation pattern is:

```go
scol, _ = excelize.ColumnNumberToName(colid)            // column index -> name, e.g. 1 -> "A"
scolindex = fmt.Sprintf("%s%d", scol, rowcount+2)       // cell reference, e.g. "A2"
xlsx.SetCellValue(sheetname, scolindex, rowdata[colid]) // write one cell
```
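For context, here is a minimal, self-contained sketch of that cell-by-cell export loop (the sheet name, dimensions, and placeholder value are mine, not from the original program; the import path is the one excelize used at the time):

```go
package main

import (
	"fmt"

	"github.com/360EntSecGroup-Skylar/excelize"
)

func main() {
	xlsx := excelize.NewFile()
	sheetname := "Sheet1"

	// 10,000 rows x 647 columns, matching test case 5 below.
	for rowcount := 0; rowcount < 10000; rowcount++ {
		for colid := 1; colid <= 647; colid++ {
			scol, _ := excelize.ColumnNumberToName(colid)
			scolindex := fmt.Sprintf("%s%d", scol, rowcount+2)
			// Every call keeps an in-memory cell object alive until save.
			xlsx.SetCellValue(sheetname, scolindex, "x")
		}
	}

	if err := xlsx.SaveAs("export.xlsx"); err != nil {
		fmt.Println(err)
	}
}
```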

The following are the results of my tests:

| # | Export | Rows | Columns | RAM | Time | File size |
|---|--------|------|---------|-----|------|-----------|
| 1 | CSV only (`encoding/csv`) | — | — | 5.5 MB | — | — |
| 2 | Excel | 1 | 647 | 23.7 MB | — | 11 KB |
| 3 | Excel | 10,000 | 26 | 127 MB | 0.5 s | 916 KB |
| 4 | Excel | 8,000 | 647 | 1.9 GB | 9 s | 14.3 MB |
| 5 | Excel | 10,000 | 647 | 2.5 GB | 12 s | 17.9 MB |

After the file is saved the memory is not released; it is only freed when the program exits.
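One way to verify this (my own measurement sketch using only the Go standard library, not part of the original report) is to snapshot the runtime's memory statistics around the save and then explicitly ask the runtime to return freed pages to the OS:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// printMem is a small helper (not part of excelize) that reports
// how much memory the Go runtime currently holds.
func printMem(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d MB, Sys=%d MB\n",
		label, m.HeapAlloc/1024/1024, m.Sys/1024/1024)
}

func main() {
	printMem("before export")
	// ... build and save the workbook here, as in the loop above ...
	printMem("after save")

	runtime.GC()
	debug.FreeOSMemory() // forces a GC and returns as much freed memory to the OS as possible
	printMem("after FreeOSMemory")
}
```

Even after `debug.FreeOSMemory`, memory only shrinks if the objects are actually unreachable, so a workbook still referenced by the `File` handle keeps its allocations alive.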


The allocation path is `prepareSheetXML(xlsx, col, row)`, where each row is a full `xlsxRow` object:

```go
type xlsxRow struct {
	Collapsed    bool    `xml:"collapsed,attr,omitempty"`
	CustomFormat bool    `xml:"customFormat,attr,omitempty"`
	CustomHeight bool    `xml:"customHeight,attr,omitempty"`
	Hidden       bool    `xml:"hidden,attr,omitempty"`
	Ht           float64 `xml:"ht,attr,omitempty"`
	OutlineLevel uint8   `xml:"outlineLevel,attr,omitempty"`
	Ph           bool    `xml:"ph,attr,omitempty"`
	R            int     `xml:"r,attr,omitempty"`
	S            int     `xml:"s,attr,omitempty"`
	Spans        string  `xml:"spans,attr,omitempty"`
	ThickBot     bool    `xml:"thickBot,attr,omitempty"`
	ThickTop     bool    `xml:"thickTop,attr,omitempty"`
	C            []xlsxC `xml:"c"`
}
```

These object arrays allocate too much memory per cell; with large amounts of data the styles are often not needed at all, and not every XML tag has to be populated.
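To see why this adds up, here is a back-of-the-envelope estimate of my own, using a simplified stand-in for the library's internal cell type (the real `xlsxC` struct has more fields, so this is a lower bound):

```go
package main

import (
	"fmt"
	"unsafe"
)

// cell is a simplified stand-in for excelize's internal xlsxC type.
type cell struct {
	R string // cell reference, e.g. "A1"
	S int    // style index
	T string // cell type
	V string // value
}

func main() {
	rows, cols := 10000, 647
	perCell := unsafe.Sizeof(cell{}) // struct header only; excludes the string contents
	total := uintptr(rows) * uintptr(cols) * perCell
	fmt.Printf("%d cells x %d bytes = %d MB, before string data, slices, and XML encoding buffers\n",
		rows*cols, perCell, total/(1024*1024))
}
```

Even this lower bound is hundreds of megabytes for the 10,000 × 647 case; the observed 2.5 GB also includes every cell reference and value stored as a string, plus the serialization buffers.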

Processing an Excel file takes roughly 140 times its size in memory. On clients with little memory this forces disk caching, which degrades performance dramatically.

For example, writing a 50 MB Excel file requires more than 7 GB of memory, which is hard to imagine. Please give me back the memory.

@xuri
Member

xuri commented May 20, 2019

Hi @robinsoon, thanks for your issue. See also #20 and #29. We can continue to optimize and improve performance; I will consider adding streaming read/write or a flush window to decrease memory usage. Any suggestions are welcome.

@xuri
Member

xuri commented Feb 6, 2020

I have added a stream writer for generating a new worksheet with huge amounts of data. It will be in the next release, v2.1.0.
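For anyone finding this issue later, a minimal sketch of how the stream writer is used (the sheet name, dimensions, and values are placeholders; the import path is the v2 module path used at the time):

```go
package main

import (
	"fmt"

	"github.com/360EntSecGroup-Skylar/excelize/v2"
)

func main() {
	f := excelize.NewFile()
	sw, err := f.NewStreamWriter("Sheet1")
	if err != nil {
		fmt.Println(err)
		return
	}

	// Rows are encoded and streamed out as they are written,
	// instead of accumulating xlsxRow/xlsxC objects in memory.
	for r := 1; r <= 10000; r++ {
		row := make([]interface{}, 647)
		for c := range row {
			row[c] = "x"
		}
		cellRef, _ := excelize.CoordinatesToCellName(1, r) // e.g. "A1"
		if err := sw.SetRow(cellRef, row); err != nil {
			fmt.Println(err)
			return
		}
	}

	if err := sw.Flush(); err != nil { // finalize the streamed worksheet
		fmt.Println(err)
		return
	}
	if err := f.SaveAs("export.xlsx"); err != nil {
		fmt.Println(err)
	}
}
```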
