
Scaling up the model prediction #1012

Closed
azomorod opened this issue Sep 27, 2023 · 14 comments

@azomorod

First of all, I wanted to thank you for your great work and your invaluable contributions to the open-source community.
Using the SITS package, I was able to generate a very good model for land-cover classification.
At the moment I am facing the challenge of scaling up model predictions to cover larger regions, on the scale of millions of square kilometers.
Given your experience and expertise, I was wondering if you could share any insights or recommendations on how to scale up model predictions effectively and efficiently.
Once again, thank you for your outstanding work.

@gilbertocamara
Contributor

Dear @azomorod, to classify a big region, SITS uses the same API. To speed up processing, you will benefit from a large VM with enough disk to store the whole region. We suggest the following steps:

(1) Select a data collection (e.g., Sentinel-2) from a cloud provider (e.g., Planetary Computer).
(2) Define the region (either with a list of tiles or with a shapefile) and the temporal interval.
(3) Regularize the collection and build a cube. For large regions, you will need sufficient space to store all the bands at all time steps.
(4) Select training samples and build your model.
(5) Run the model with sits_classify(). Remember to set the "multicores" and "memsize" parameters to fit your VM (see the sketch after this list).
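
A minimal sketch of these steps in R, assuming the Planetary Computer ("MPC") source; the tile list, directories, and the `samples` object are hypothetical, and exact arguments may vary with the sits version:

```r
library(sits)

# (1)-(2) Build an irregular cube from Sentinel-2 on Planetary Computer
s2_cube <- sits_cube(
  source     = "MPC",
  collection = "SENTINEL-2-L2A",
  tiles      = c("20LKP", "20LLP"),                      # hypothetical tiles
  bands      = c("B02", "B03", "B04", "B08", "CLOUD"),
  start_date = "2022-01-01",
  end_date   = "2022-12-31"
)

# (3) Regularize the collection; this needs enough local disk
#     to hold all bands at all time steps
reg_cube <- sits_regularize(
  cube       = s2_cube,
  period     = "P16D",
  res        = 10,
  output_dir = "/data/cube",
  multicores = 32
)

# (4) Train a model on your labelled samples (random forest baseline)
model <- sits_train(samples, ml_method = sits_rfor())

# (5) Classify; set multicores and memsize to fit the VM
probs <- sits_classify(
  data       = reg_cube,
  ml_model   = model,
  multicores = 70,
  memsize    = 200,
  output_dir = "/data/probs"
)
```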

Good luck! Keep us posted!

@azomorod
Author

Dear @gilbertocamara,
Thanks for your quick reply and your guidance. Following your suggestions, I have been trying a powerful AWS EC2 instance (i3.metal) with 72 cores and 512 GB of RAM.
Steps 1 to 4 run as expected; however, sits_classify() takes exceptionally longer than usual.
I am classifying a tile of 110 km x 110 km at 10 m resolution (using 70 cores and 200 GB of RAM in the function parameters), and after 3 hours of run time the progress bar is still at 0%.

I have tried the same region on the university workstation (using 80 cores and 100 GB of RAM in the function parameters), and each tile is classified in around 1.5 hours.
You can find the CPU information for the university workstation and the EC2 instance below for reference.
The only main difference I can think of is using RStudio Server on the EC2 instead of RStudio Desktop on the workstation.
Could this have an impact on the performance of the sits_classify() function?
Thanks in advance for your time.
Best,

Workstation CPU

```
Model name:          Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping:            1
CPU MHz:             1197.486
CPU max MHz:         3600.0000
CPU min MHz:         1200.0000
BogoMIPS:            4390.22
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            56320K
```

EC2 CPU

```
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          72
On-line CPU(s) list:             0-71
Thread(s) per core:              2
Core(s) per socket:              18
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:                        1
CPU MHz:                         1200.000
CPU max MHz:                     3000.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4600.14
Virtualization:                  VT-x
L1d cache:                       1.1 MiB
L1i cache:                       1.1 MiB
L2 cache:                        9 MiB
```

@gilbertocamara
Contributor

Dear @azomorod, I will ask my fellow developers @rolfsimoes and @OldLipe to follow up on this problem.

@OldLipe
Contributor

OldLipe commented Sep 29, 2023

Dear @azomorod,

Here are a few suggestions about models:

  1. Overall, we have seen that the Random Forest model provides a robust initial benchmark across the various classifications we have tested. For deep learning models, we recommend tuning the hyperparameters to identify good values in more detail. For the TempCNN model, we tested the hyperparameters used in Rußwurm's work (DOI: 10.5194/isprs-archives-XLIII-B2-2020-1545-2020).
  2. To scale up deep learning models, such as 1D convolutional neural networks (TempCNN) or the temporal attention encoder (TAE), we recommend using GPUs, since SITS supports GPU processing for all torch-based models (see the sketch after this list).
  3. The "multicores" and "memsize" values must be rethought for GPU processing, since the parallel operations then depend on the GPU's memory and cores.
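
A hedged sketch of points 2 and 3, assuming a CUDA-enabled build of the torch R package; the hyperparameters, directories, and the `samples`/`reg_cube` objects are illustrative:

```r
library(sits)
library(torch)

# torch-based sits models (TempCNN, TAE) run on the GPU
# when torch detects a CUDA device
torch::cuda_is_available()

# Train a TempCNN model; epochs and batch size are illustrative values
tcnn_model <- sits_train(
  samples,
  ml_method = sits_tempcnn(epochs = 150, batch_size = 64)
)

# With a GPU, multicores and memsize mainly control how raster blocks
# are read and written, since the heavy computation moves to the GPU
probs <- sits_classify(
  data       = reg_cube,
  ml_model   = tcnn_model,
  multicores = 16,
  memsize    = 100,
  output_dir = "/data/probs"
)
```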

> The only main difference I can think of is using RStudio Server on the EC2 instead of RStudio Desktop on the workstation.
> Could this have an impact on the performance of the sits_classify() function?

The difference between RStudio Server and RStudio Desktop has no impact on processing. What can affect processing time is an environment with multiple users, since computing resources are then shared.
Another hypothesis for the reported classification times is the region where the data is stored. Working with images in the same region where they are stored gives faster access. For example, the Sentinel-2 images on AWS are stored in the "us-west-2" region, so a machine in that region will access them more quickly. An alternative is to store the data locally, where the limitation will be the disk's read rate (see the sketch below).
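
To illustrate the local alternative, a cube can be rebuilt directly from images already on disk (for example, the output of sits_regularize()); the directory below is hypothetical:

```r
library(sits)

# Build a cube from regularized images stored on local disk, so that
# access speed is bounded by the disk's read rate, not network latency
local_cube <- sits_cube(
  source     = "MPC",
  collection = "SENTINEL-2-L2A",
  data_dir   = "/data/cube",   # hypothetical local directory
  parse_info = c("X1", "X2", "tile", "band", "date")
)
```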

Best regards!

@azomorod
Author

azomorod commented Oct 1, 2023

Thanks a lot for your suggestions, @OldLipe. I am currently using a TempCNN model running on locally stored data. But I will try it with a GPU instance and will let you know the outcome.
Best,

@OldLipe
Contributor

OldLipe commented Oct 2, 2023

Dear @azomorod,

A further hypothesis for the different processing times is a mismatch in the versions of the torch and luz packages. Are both environments running the same versions of these packages?

Best Regards!

@azomorod
Author

azomorod commented Oct 4, 2023

Dear @OldLipe,
I checked, and both environments are using the same versions of the torch and luz packages:

```r
packageVersion("torch")
'0.11.0'
packageVersion("luz")
'0.4.0'
```

I will update you shortly on the status of the GPU instance.
Best,

@azomorod
Author

azomorod commented Oct 5, 2023

Dear @OldLipe,
I have tried the sits_classify() function on a GPU EC2 instance (p3.16xlarge). Unfortunately, the classification still does not move forward.
I am using 60 cores and 150 GB of RAM. Do you have any other recommendations for the "multicores" and "memsize" parameters?
Thanks in advance for your time.
Best,

@OldLipe
Contributor

OldLipe commented Oct 9, 2023

Dear @azomorod,

Would you have time for a teleconference this week? We would like to understand your problem better so that we can help you.

Best.

@azomorod
Author

Dear @OldLipe,
Thanks a lot for your support. I am available every day after 18:00 German time (i.e., 13:00 Brazil time) via Skype at: alizomorodian
Best,

@gilbertocamara
Contributor

Dear @azomorod, we propose to meet on Wednesday, October 10th, at 19:00 German time (which will be 14:00 BRT). Please use the following link to connect:
https://us02web.zoom.us/j/3347971699

@azomorod
Author

Dear @gilbertocamara, thanks for your time. See you soon!

@gilbertocamara
Contributor

Dear @azomorod, would it be possible to start our meeting at 19:30 German time today? Sorry for the short notice.

@azomorod
Author

Dear @gilbertocamara, sure, no problem from my side.
Best

@OldLipe OldLipe closed this as completed Jun 18, 2024