
Scaling up the model prediction #1012

Closed
azomorod opened this issue Sep 27, 2023 · 14 comments

@azomorod

First of all, I wanted to thank you for your great work and your invaluable contributions to the open-source community.
Using the SITS package, I was able to generate a very good model for land-cover classification.
At the moment I am facing the challenge of scaling up model predictions to cover larger regions, on the scale of millions of square kilometers.
Given your experience and expertise, I was wondering if you could share any insights or recommendations on how to scale up model predictions effectively and efficiently.
Once again, thank you for your outstanding work.

@gilbertocamara
Contributor

Dear @azomorod, to classify a big region, SITS uses the same API. To speed up processing, you will benefit from a large VM with enough disk to store the whole region. We suggest the following steps:

(1) Select a data collection (e.g., Sentinel-2) from a cloud provider (e.g., Planetary Computer).
(2) Define the region (either with a list of tiles or with a shapefile) and the temporal interval.
(3) Regularize the collection and build a cube. For large regions, you will need sufficient space to store all the bands at all time steps.
(4) Select training samples and build your model.
(5) Run the model with sits_classify(). Remember to set the "multicores" and "memsize" parameters to fit your VM (see the sketch after this list).
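
A minimal sketch of these steps in R, assuming the Planetary Computer ("MPC") source; the tile list, directories, and the `samples` object are hypothetical, and exact arguments may vary with the sits version:

```r
library(sits)

# (1)-(2) Build an irregular cube from Sentinel-2 on Planetary Computer
s2_cube <- sits_cube(
  source     = "MPC",
  collection = "SENTINEL-2-L2A",
  tiles      = c("20LKP", "20LLP"),                      # hypothetical tiles
  bands      = c("B02", "B03", "B04", "B08", "CLOUD"),
  start_date = "2022-01-01",
  end_date   = "2022-12-31"
)

# (3) Regularize the collection; this needs enough local disk
#     to hold all bands at all time steps
reg_cube <- sits_regularize(
  cube       = s2_cube,
  period     = "P16D",
  res        = 10,
  output_dir = "/data/cube",
  multicores = 32
)

# (4) Train a model on your labelled samples (random forest baseline)
model <- sits_train(samples, ml_method = sits_rfor())

# (5) Classify; set multicores and memsize to fit the VM
probs <- sits_classify(
  data       = reg_cube,
  ml_model   = model,
  multicores = 70,
  memsize    = 200,
  output_dir = "/data/probs"
)
```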

Good luck! Keep us posted!

@azomorod
Author

Dear @gilbertocamara,
Thanks for your quick reply and your guidance. Following your suggestions, I have been trying a powerful AWS EC2 instance (i3.metal) with 72 cores and 512 GB of RAM.
Steps 1 to 4 run as expected; however, sits_classify() takes exceptionally longer than usual.
I am classifying a tile of 110 km x 110 km at 10 m resolution (using 70 cores and 200 GB of RAM in the function parameters), and after 3 hours of run time the progress bar is still at 0%.

I have tried the same region on the university workstation (using 80 cores and 100 GB of RAM in the function parameters), and each tile is classified in around 1.5 hours.
You can find the CPU information for the university workstation and the EC2 instance below for reference.
The only main difference I can think of is using RStudio Server on the EC2 instead of RStudio Desktop on the workstation.
Could this have an impact on the performance of the sits_classify() function?
Thanks in advance for your time.
Best,

Workstation CPU

```
Model name:          Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping:            1
CPU MHz:             1197.486
CPU max MHz:         3600.0000
CPU min MHz:         1200.0000
BogoMIPS:            4390.22
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            56320K
```

EC2 CPU

```
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          72
On-line CPU(s) list:             0-71
Thread(s) per core:              2
Core(s) per socket:              18
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:                        1
CPU MHz:                         1200.000
CPU max MHz:                     3000.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4600.14
Virtualization:                  VT-x
L1d cache:                       1.1 MiB
L1i cache:                       1.1 MiB
L2 cache:                        9 MiB
```

@gilbertocamara
Contributor

Dear @azomorod, I will ask my fellow developers @rolfsimoes and @OldLipe to follow up on this problem.

@OldLipe
Contributor

OldLipe commented Sep 29, 2023

Dear @azomorod,

Here are a few suggestions about models:

  1. Overall, we have seen that the Random Forest model provides a robust initial benchmark across the various classifications we have tested. For deep learning models, we recommend tuning the hyperparameters to identify good values in more detail. For the TempCNN model, we tested the hyperparameters used in Rußwurm's work (DOI: 10.5194/isprs-archives-XLIII-B2-2020-1545-2020).
  2. To scale up deep learning models, such as 1D convolutional neural networks (TempCNN) or the temporal attention encoder (TAE), we recommend using GPUs, since SITS supports GPU processing for all torch-based models (see the sketch after this list).
  3. The "multicores" and "memsize" values must be rethought for GPU processing, since the parallel operations then depend on the GPU's memory and cores.
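
A hedged sketch of points 2 and 3, assuming a CUDA-enabled build of the torch R package; the hyperparameters, directories, and the `samples`/`reg_cube` objects are illustrative:

```r
library(sits)
library(torch)

# torch-based sits models (TempCNN, TAE) run on the GPU
# when torch detects a CUDA device
torch::cuda_is_available()

# Train a TempCNN model; epochs and batch size are illustrative values
tcnn_model <- sits_train(
  samples,
  ml_method = sits_tempcnn(epochs = 150, batch_size = 64)
)

# With a GPU, multicores and memsize mainly control how raster blocks
# are read and written, since the heavy computation moves to the GPU
probs <- sits_classify(
  data       = reg_cube,
  ml_model   = tcnn_model,
  multicores = 16,
  memsize    = 100,
  output_dir = "/data/probs"
)
```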

> The only main difference I can think of is using RStudio Server on the EC2 instead of RStudio Desktop on the workstation.
> Could this have an impact on the performance of the sits_classify() function?

The difference between RStudio Server and RStudio Desktop has no impact on processing. What can affect processing time is an environment with multiple users, since computing resources are then shared.
Another hypothesis for the reported classification times is the region where the data is stored. Working with images in the same region where they are stored gives faster access. For example, the Sentinel-2 images on AWS are stored in the "us-west-2" region, so a machine in that region will access them more quickly. An alternative is to store the data locally, where the limitation will be the disk's read rate (see the sketch below).
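
To illustrate the local alternative, a cube can be rebuilt directly from images already on disk (for example, the output of sits_regularize()); the directory below is hypothetical:

```r
library(sits)

# Build a cube from regularized images stored on local disk, so that
# access speed is bounded by the disk's read rate, not network latency
local_cube <- sits_cube(
  source     = "MPC",
  collection = "SENTINEL-2-L2A",
  data_dir   = "/data/cube",   # hypothetical local directory
  parse_info = c("X1", "X2", "tile", "band", "date")
)
```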

Best regards!

@azomorod
Author

azomorod commented Oct 1, 2023

Thanks a lot for your suggestions, @OldLipe. I am currently using a TempCNN model running on locally stored data. But I will try it with a GPU instance and will let you know the outcome.
Best,

@OldLipe
Contributor

OldLipe commented Oct 2, 2023

Dear @azomorod,

A further hypothesis for the different processing times is a mismatch in the versions of the torch and luz packages. Are both environments running the same versions of these packages?

Best Regards!

@azomorod
Author

azomorod commented Oct 4, 2023

Dear @OldLipe,
I checked, and both environments are using the same versions of the torch and luz packages:

```r
packageVersion("torch")
'0.11.0'
packageVersion("luz")
'0.4.0'
```

I will update you shortly on the status of the GPU instance.
Best,

@azomorod
Author

azomorod commented Oct 5, 2023

Dear @OldLipe,
I have tried the sits_classify() function on a GPU EC2 instance (p3.16xlarge). Unfortunately, the classification still does not move forward.
I am using 60 cores and 150 GB of RAM. Do you have any other recommendations for the "multicores" and "memsize" parameters?
Thanks in advance for your time.
Best,

@OldLipe
Contributor

OldLipe commented Oct 9, 2023

Dear @azomorod,

Would you have time for a teleconference this week? We would like to understand your problem better so that we can help you.

Best.

@azomorod
Author

Dear @OldLipe,
Thanks a lot for your support. I am available every day after 18:00 German time (i.e., 13:00 Brazil time) via Skype at: alizomorodian
Best,

@gilbertocamara
Contributor

Dear @azomorod, we propose to meet on Wednesday, October 10th, at 19:00 German time (which will be 14:00 BRT). Please use the following link to connect:
https://us02web.zoom.us/j/3347971699

@azomorod
Author

Dear @gilbertocamara, thanks for your time. See you soon!

@gilbertocamara
Contributor

Dear @azomorod, would it be possible to start our meeting at 19:30 German time today? Sorry for the short notice.

@azomorod
Author

Dear @gilbertocamara, sure, no problem from my side.
Best

@OldLipe OldLipe closed this as completed Jun 18, 2024