Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High memory when writing 1million number of rows #876

Open
viv2793 opened this issue Jul 12, 2021 · 8 comments
Open

High memory when writing 1million number of rows #876

viv2793 opened this issue Jul 12, 2021 · 8 comments

Comments

@viv2793
Copy link

viv2793 commented Jul 12, 2021

Hello,

I am facing an issue. I am trying to write large amount of data(approx 1million rows) into the excel and saving the excel at the end after writing all the lines. But the memory usage is very high in the case. It is taking more than 1.5GB of memory. I assume as each row stays in memory and all the data is dumped into the file at the end is causing the issue.
Do we have any method so that we append rows in excel files without taking so much of memory ?

@xuri
Copy link
Member

xuri commented Jul 13, 2021

Please using the StreamWriter for generating a worksheet with huge amounts of data. Note that the streaming API doesn't support append to the existing worksheets currently.

@viv2793
Copy link
Author

viv2793 commented Jul 13, 2021

Thanks @xuri for suggesting it but still the memory usage is on higher side.
I believe this issue can be resolved only if have some append method for writing it in worksheets. I'll probably investigate more about it.

@rakesh-wrx
Copy link

@viv2793 @xuri I am facing the same issue. When I am trying to write a large amount of data i.e more than a million records, it is taking more than 5 hrs to process. I am not sure about memory usage. Is there any workaround for it? @viv2793 how much time is taking for you to write million of records?

@xuri
Copy link
Member

xuri commented Sep 9, 2021

Hi @rakesh-wrx, thanks for your feedback, could you provide any code and attachments to reproduce this issue?

@mzimmerman
Copy link

Writing 17,000 rows takes about 7 minutes. In my case it's not a high memory thing (it can use more if it'd be faster), but the performance is really slow. I'm trying to give my users an option of having an XLSX file instead of CSV, so I'm used to writing [][]string.

I can write the same CSV file in 7 seconds what takes 7 minutes with XLSX. Are there things I can do on the data-generation side before passing into the library that will make it more performant?

func writeLineForOutput(line []string, excelLineNum int, lineInterface []interface{}, excelWriter *excelize.StreamWriter) error {
lineInterface = lineInterface[:0]
for x := range line {
lineInterface = append(lineInterface, line[x])
}
cell, _ := excelize.CoordinatesToCellName(1, excelLineNum)
return excelWriter.SetRow(cell, lineInterface)
}

@xuri
Copy link
Member

xuri commented Jan 30, 2022

Hi @mzimmerman, thanks for your feedback, could you provide more details about your environment information, how many columns each row, which version of the Go language and excelize library you are using, what hardware and OS info? Please also reference the performance benchmark report.

@mzimmerman
Copy link

mzimmerman commented Jan 31, 2022

Looks like there's a lot of garbage created/used in referencing the cell addresses... e.g., A10 vs ZZ56 -- it looks like you're converting to/from that a few times depending on where it's needed. Also a lot done in XML serialization.

git clone https://github.com/mzimmerman/excelizetest

go test -bench=. -cpuprofile cpu.out -memprofile mem.out
goos: linux
goarch: amd64
pkg: github.com/mzimmerman/excelizetest
cpu: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
BenchmarkExcelize10x10-192 610 1882966 ns/op
BenchmarkExcelize100x100-192 82 13992091 ns/op
BenchmarkExcelize1000x1000-192 1 1179630242 ns/op
BenchmarkExcelize1000x10-192 78 14885501 ns/op
BenchmarkExcelize10000x10-192 8 138420402 ns/op
BenchmarkExcelize100000x10-192 1 1386176914 ns/op
BenchmarkExcelize100000x100-192 1 13480967919 ns/op
BenchmarkExcelize10000x1000-192 1 13490364172 ns/op
PASS
ok github.com/mzimmerman/excelizetest 36.209s

pprof excelizetest.test cpu.out

Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.(*StreamWriter).SetRow in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/stream.go
     280ms     16.45s (flat, cum) 43.94% of Total
         .          .    308:https:// 'Flush' method to end the streaming writing process.
         .          .    309:https://
         .          .    310:https:// As a special case, if Cell is used as a value, then the Cell.StyleID will be
         .          .    311:https:// applied to that cell.
         .          .    312:func (sw *StreamWriter) SetRow(axis string, values []interface{}, opts ...RowOpts) error {
         .       30ms    313:   col, row, err := CellNameToCoordinates(axis)
         .          .    314:   if err != nil {
         .          .    315:           return err
         .          .    316:   }
         .          .    317:   if !sw.sheetWritten {
         .          .    318:           if len(sw.cols) > 0 {
         .          .    319:                   _, _ = sw.rawData.WriteString("<cols>" + sw.cols + "</cols>")
         .          .    320:           }
         .          .    321:           _, _ = sw.rawData.WriteString(`<sheetData>`)
         .          .    322:           sw.sheetWritten = true
         .          .    323:   }
         .          .    324:   attrs, err := marshalRowAttrs(opts...)
         .          .    325:   if err != nil {
         .          .    326:           return err
         .          .    327:   }
         .       60ms    328:   fmt.Fprintf(&sw.rawData, `<row r="%d"%s>`, row, attrs)
      90ms       90ms    329:   for i, val := range values {
      40ms      4.41s    330:           axis, err := CoordinatesToCellName(col+i, row)
         .          .    331:           if err != nil {
         .          .    332:                   return err
         .          .    333:           }
      20ms       80ms    334:           c := xlsxC{R: axis}
      20ms       20ms    335:           if v, ok := val.(Cell); ok {
         .          .    336:                   c.S = v.StyleID
         .          .    337:                   val = v.Value
         .          .    338:                   setCellFormula(&c, v.Formula)
      50ms       50ms    339:           } else if v, ok := val.(*Cell); ok && v != nil {
         .          .    340:                   c.S = v.StyleID
         .          .    341:                   val = v.Value
         .          .    342:                   setCellFormula(&c, v.Formula)
         .          .    343:           }
      20ms      2.03s    344:           if err = sw.setCellValFunc(&c, val); err != nil {
         .          .    345:                   _, _ = sw.rawData.WriteString(`</row>`)
         .          .    346:                   return err
         .          .    347:           }
      40ms      9.17s    348:           writeCell(&sw.rawData, c)
         .          .    349:   }
         .       10ms    350:   _, _ = sw.rawData.WriteString(`</row>`)
         .      500ms    351:   return sw.rawData.Sync()
         .          .    352:}
         .          .    353:
         .          .    354:https:// marshalRowAttrs prepare attributes of the row by given options.
         .          .    355:func marshalRowAttrs(opts ...RowOpts) (attrs string, err error) {
         .          .    356:   var opt *RowOpts
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.writeCell in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/stream.go
     140ms      9.01s (flat, cum) 24.07% of Total
         .          .    481:   default:
         .          .    482:   }
         .          .    483:   return
         .          .    484:}
         .          .    485:
      20ms       20ms    486:func writeCell(buf *bufferedWriter, c xlsxC) {
         .      140ms    487:   _, _ = buf.WriteString(`<c`)
         .          .    488:   if c.XMLSpace.Value != "" {
         .          .    489:           fmt.Fprintf(buf, ` xml:%s="%s"`, c.XMLSpace.Name.Local, c.XMLSpace.Value)
         .          .    490:   }
      10ms      3.23s    491:   fmt.Fprintf(buf, ` r="%s"`, c.R)
      10ms       10ms    492:   if c.S != 0 {
         .          .    493:           fmt.Fprintf(buf, ` s="%d"`, c.S)
         .          .    494:   }
         .          .    495:   if c.T != "" {
      50ms      2.85s    496:           fmt.Fprintf(buf, ` t="%s"`, c.T)
         .          .    497:   }
      10ms      300ms    498:   _, _ = buf.WriteString(`>`)
      10ms       10ms    499:   if c.F != nil {
         .          .    500:           _, _ = buf.WriteString(`<f>`)
         .          .    501:           _ = xml.EscapeText(buf, []byte(c.F.Content))
         .          .    502:           _, _ = buf.WriteString(`</f>`)
         .          .    503:   }
      10ms       10ms    504:   if c.V != "" {
         .      110ms    505:           _, _ = buf.WriteString(`<v>`)
         .      1.94s    506:           _ = xml.EscapeText(buf, []byte(c.V))
         .      150ms    507:           _, _ = buf.WriteString(`</v>`)
         .          .    508:   }
         .      220ms    509:   _, _ = buf.WriteString(`</c>`)
      20ms       20ms    510:}
         .          .    511:
         .          .    512:https:// Flush ending the streaming writing process.
         .          .    513:func (sw *StreamWriter) Flush() error {
         .          .    514:   if !sw.sheetWritten {
         .          .    515:           _, _ = sw.rawData.WriteString(`<sheetData>`)```

```(pprof) list CoordinatesToCellName
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.CoordinatesToCellName in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/lib.go
     120ms      4.40s (flat, cum) 11.75% of Total
         .          .    262:https:// Example:
         .          .    263:https://
         .          .    264:https://    excelize.CoordinatesToCellName(1, 1) // returns "A1", nil
         .          .    265:https://    excelize.CoordinatesToCellName(1, 1, true) // returns "$A$1", nil
         .          .    266:https://
      10ms       10ms    267:func CoordinatesToCellName(col, row int, abs ...bool) (string, error) {
         .          .    268:   if col < 1 || row < 1 {
      10ms       10ms    269:           return "", fmt.Errorf("invalid cell coordinates [%d, %d]", col, row)
         .          .    270:   }
         .          .    271:   sign := ""
         .          .    272:   for _, a := range abs {
         .          .    273:           if a {
         .          .    274:                   sign = "$"
         .          .    275:           }
         .          .    276:   }
      20ms      2.33s    277:   colname, err := ColumnNumberToName(col)
      80ms      2.05s    278:   return sign + colname + sign + strconv.Itoa(row), err
         .          .    279:}
         .          .    280:
         .          .    281:https:// areaRefToCoordinates provides a function to convert area reference to a
         .          .    282:https:// pair of coordinates.
         .          .    283:func areaRefToCoordinates(ref string) ([]int, error) {
(pprof) list ColumnNumberToName
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.ColumnNumberToName in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/lib.go
     170ms      2.31s (flat, cum)  6.17% of Total
         .          .    219:https://
         .          .    220:https:// Example:
         .          .    221:https://
         .          .    222:https://     excelize.ColumnNumberToName(37) // returns "AK", nil
         .          .    223:https://
      40ms       40ms    224:func ColumnNumberToName(num int) (string, error) {
      10ms       10ms    225:   if num < 1 {
         .          .    226:           return "", fmt.Errorf("incorrect column number %d", num)
         .          .    227:   }
      10ms       10ms    228:   if num > TotalColumns {
         .          .    229:           return "", ErrColumnNumber
         .          .    230:   }
         .          .    231:   var col string
      30ms       30ms    232:   for num > 0 {
      30ms      2.17s    233:           col = string(rune((num-1)%26+65)) + col
      20ms       20ms    234:           num = (num - 1) / 26
         .          .    235:   }
      30ms       30ms    236:   return col, nil
         .          .    237:}
         .          .    238:
         .          .    239:https:// CellNameToCoordinates converts alphanumeric cell name to [X, Y] coordinates
         .          .    240:https:// or returns an error.
         .          .    241:https://```

pprof excelizetest.test mem.out
```(pprof) list writeCell
Total: 3.33GB
ROUTINE ======================== github.com/xuri/excelize/v2.writeCell in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/stream.go
  950.51MB     1.34GB (flat, cum) 40.16% of Total
         .          .    482:   }
         .          .    483:   return
         .          .    484:}
         .          .    485:
         .          .    486:func writeCell(buf *bufferedWriter, c xlsxC) {
         .    24.15MB    487:   _, _ = buf.WriteString(`<c`)
         .          .    488:   if c.XMLSpace.Value != "" {
         .          .    489:           fmt.Fprintf(buf, ` xml:%s="%s"`, c.XMLSpace.Name.Local, c.XMLSpace.Value)
         .          .    490:   }
  356.01MB   459.51MB    491:   fmt.Fprintf(buf, ` r="%s"`, c.R)
         .          .    492:   if c.S != 0 {
         .          .    493:           fmt.Fprintf(buf, ` s="%d"`, c.S)
         .          .    494:   }
         .          .    495:   if c.T != "" {
  356.51MB   426.10MB    496:           fmt.Fprintf(buf, ` t="%s"`, c.T)
         .          .    497:   }
         .          .    498:   _, _ = buf.WriteString(`>`)
         .          .    499:   if c.F != nil {
         .          .    500:           _, _ = buf.WriteString(`<f>`)
         .          .    501:           _ = xml.EscapeText(buf, []byte(c.F.Content))
         .          .    502:           _, _ = buf.WriteString(`</f>`)
         .          .    503:   }
         .          .    504:   if c.V != "" {
         .    59.90MB    505:           _, _ = buf.WriteString(`<v>`)
     238MB   339.98MB    506:           _ = xml.EscapeText(buf, []byte(c.V))
         .    20.93MB    507:           _, _ = buf.WriteString(`</v>`)
         .          .    508:   }
         .    37.23MB    509:   _, _ = buf.WriteString(`</c>`)
         .          .    510:}
         .          .    511:
         .          .    512:https:// Flush ending the streaming writing process.
         .          .    513:func (sw *StreamWriter) Flush() error {
         .          .    514:   if !sw.sheetWritten {```

@mzimmerman
Copy link

I'm not sure how to fix/adjust/help the excelize library -- it supports a lot of additional features/functions that I don't want, but that would still be needed to be supported.

Because of that, I implemented an XLSX writer of [][]string -- doesn't support any other Excel type but is more performant in large file cases. Uses all cores on a machine but is generally still limited to the flate compression process.

https://github.com/mzimmerman/xlsxwriter

BenchmarkExcelize10x10-192           1819354          5708179         +213.75%
BenchmarkExcelize100x100-192         13970914         9574846         -31.47%
BenchmarkExcelize1000x1000-192       1177549846       267765228       -77.26%
BenchmarkExcelize10000x10000-192     125017992200     27324051886     -78.14%
BenchmarkExcelize1000x10-192         14653077         10480947        -28.47%
BenchmarkExcelize10000x10-192        133965033        55374300        -58.67%
BenchmarkExcelize100000x10-192       1363888368       582779774       -57.27%
BenchmarkExcelize100000x100-192      13342011189      2711677473      -79.68%
BenchmarkExcelize10000x1000-192      13136815755      2647669308      -79.85%```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants