
Adding storage as a service #143

Open
da-ekchajzer opened this issue Dec 25, 2022 · 10 comments

@da-ekchajzer (Collaborator)

Problem

Most of the cloud instances we implement don't include storage, since cloud providers offer storage as a service that can be attached to an instance. We should add a way to compute the impacts of storage as a service from the impacts of storage components.

Solution

General case

Create a storage_as_a_service router.

We should compute the impacts of a classic disk from user-provided (or default) characteristics, then derive the impact per unit of storage, and finally scale it to the total amount of storage used by the user:

{
  "configuration": {
    "capacity": 1000,
    "type": "SSD",
    "manufacturer": "Samsung",
    "replication": 1
  },
  "usage": {
    "hours_electrical_consumption": 12,
    "usage_location": "FRA",
    "storage": 2000
  }
}

impact = ((impact(disk) / disk.capacity) * usage.storage) * replication
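
A minimal Python sketch of this computation, assuming a hypothetical disk_impact_gwp value obtained beforehand from the existing disk component route (all names are illustrative, not the API's actual code):

    def storage_service_impact(disk_impact_gwp: float,
                               disk_capacity_gb: float,
                               storage_gb: float,
                               replication: int) -> float:
        # Impact of one reference disk, scaled to the amount of storage
        # actually used, then multiplied by the replication factor.
        impact_per_gb = disk_impact_gwp / disk_capacity_gb
        return impact_per_gb * storage_gb * replication

    # With the payload above (1000 GB SSD, 2000 GB used, replication 1),
    # and assuming, purely for illustration, 50 kgCO2eq for the reference disk:
    print(storage_service_impact(50.0, 1000, 2000, 1))  # -> 100.0 kgCO2eq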

Archetypes

We could pre-record some values to fill in these fields with the characteristics of specific types of services from specific providers.

{
  "provider": "aws",
  "storage_type": "s3",
  "usage": {
    "usage_location": "FRA",
    "storage": 2000
  }
}

Most of these values will be hard to get for specific providers. Replication factors have been gathered for GCP, Azure and AWS here: https://docs.google.com/spreadsheets/d/1D7mIGKkdO1djPoMVmlXRmzA7_4tTiGZLYdVbfe85xQM/edit#gid=735227650
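
For illustration, archetype resolution could be as simple as a lookup table merged with the user's request. The entries below are placeholders (except the AWS S3 replication factor of 6 mentioned in the next comment), to be filled from the spreadsheet above:

    # Hypothetical archetype table: (provider, storage_type) -> pre-recorded defaults.
    ARCHETYPES = {
        ("aws", "s3"): {"type": "HDD", "replication": 6},
        ("gcp", "cloud_storage"): {"type": "HDD", "replication": 3},
    }

    def resolve_archetype(provider: str, storage_type: str, request: dict) -> dict:
        # User-provided fields take precedence over archetype defaults.
        defaults = ARCHETYPES.get((provider, storage_type), {})
        return {**defaults, **request}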

@github-benjamin-davy (Collaborator)

Adding a note: I previously asked CCF to update their replication factor for AWS S3 Standard (changing it from 3 to 6).

@bpetit (Collaborator) commented Apr 9, 2024

I can think of two potential difficulties here.

1. Each provider offers different classes of service for a given block storage service.

Taking the example of Azure, block storage can be Standard, Premium or Ultra, meaning more IOPS as you choose a higher performance class, so probably more performance-driven infrastructure behind it.

This probably doesn't mean more infrastructure, but it most probably means higher-end disks.

2. Infrastructure is probably very different depending on the storage service type

s3-like storage means at least (based on what you can find in an open-source s3-like implementation like Ceph object storage / Rados Gateway):

  • web servers to receive HTTP requests
  • application servers to manage data and requests in the context of an object store
  • storage dedicated servers

Block storage, on the other hand, might be a bit simpler, containing at least:

  • storage dedicated servers (implementing software-defined storage) or SANs
  • probably dedicated network hardware to get very fast access to the data

Filesystem storage might sit in the middle, needing fewer components than object storage, but still being more complex than block storage.

I imagine part of the modeling should account for this difference in complexity somehow.

I'm not sure how those two items should be implemented, but I think this is a modeling discussion worth having.
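
One possible way to account for that difference, sketched purely as a discussion aid: a per-service-type overhead multiplier applied on top of the raw disk impact. The factor values below are invented placeholders, not measurements:

    # Invented overhead multipliers standing in for the extra infrastructure
    # (web/application servers, dedicated network) each service type needs
    # beyond the raw disks.
    SERVICE_OVERHEAD = {
        "object": 1.5,      # web + application + storage servers
        "filesystem": 1.3,  # somewhere in the middle
        "block": 1.2,       # storage servers / SANs + dedicated network
    }

    def service_impact(disk_impact: float, service_type: str) -> float:
        return disk_impact * SERVICE_OVERHEAD[service_type]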

@bpetit (Collaborator) commented May 23, 2024

Extra note for the hackathon on May 24th:

Could we, and if so how, account for the impact of not only storing data on a given storage service, but also accessing and writing that data?

I can hardly imagine using anything finer than a per-GB impact factor for network traffic, probably different for each type of service, and maybe a very rough estimate of the compute needed to deliver the data (which is probably more significant for object storage than for other services).
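
As a strawman, such a per-GB access factor could slot in as follows; both factor values are invented placeholders pending real data:

    # Invented per-GB factors for accessing data, in kgCO2eq per GB transferred,
    # covering network traffic plus an estimate of the compute needed to serve it.
    ACCESS_FACTORS = {"object": 5e-5, "block": 1e-5}

    def access_impact(gb_transferred: float, service_type: str) -> float:
        return gb_transferred * ACCESS_FACTORS[service_type]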

@demeringo (Collaborator)

Some notes from the hackathon:

[Diagram: Impacts-of-object-storage (excalidraw)]


@benjaminlebigot (Contributor)

Content of the framaform from hackathonOrange2024:

Object storage impacts

Formula

Impact_use = FE_elec * PUE * conso_storage(storage_class) * Storage_user * resiliency_cost(storage_class) * fill_rate + Compute
Where:
* FE_elec: electricity emission factor, in kgCO2e/Wh
* PUE: data-centre PUE
* Storage_user: amount of storage as seen by the user, which can be either provisioned (for block storage) or used (for object storage)
* fill_rate ("remplissage"): fill rate of the underlying disks

conso_storage(storage_class)

  • in Wh/GB
  • storage_class reflects:
    • read/write performance and latency
    • hardware type
    • SLA

resiliency_cost(storage_class) = (physical bytes required) / (resilient user bytes)
= resilience_region * local_redundancy
Where (see the sketch after these definitions):
* resilience_region: number of times the data is replicated, typically across different AZs; ranges from 1 to 6
* local_redundancy (resilience_az, intra-AZ): storage overhead needed to survive the loss of a machine within an AZ; 2 by default (a priori an upper bound)
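
A minimal Python sketch combining the use-phase formula with the resiliency cost; the names mirror the notes above and every default value is a placeholder, not a measured factor:

    def resiliency_cost(resilience_region: int, local_redundancy: float = 2.0) -> float:
        # Physical bytes required per resilient user byte.
        return resilience_region * local_redundancy

    def impact_use(fe_elec: float,        # electricity emission factor, kgCO2e/Wh
                   pue: float,            # data-centre PUE
                   conso_storage: float,  # Wh/GB for the chosen storage class
                   storage_user_gb: float,
                   resilience_region: int,
                   fill_rate: float,
                   compute: float = 0.0) -> float:
        return (fe_elec * pue * conso_storage * storage_user_gb
                * resiliency_cost(resilience_region) * fill_rate) + compute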

Impact_fab = Fab(time) + Compute

Fab(time) = FE_fab_fixed + FE_fab_variable * density * Reservation * resiliency_cost * fill_rate

Where:
* FE_fab_fixed: fixed share of the manufacturing emission factor, e.g. the casing
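
The embodied side in the same illustrative style, self-contained so the resiliency cost is recomputed inline; every parameter is a placeholder awaiting calibration:

    def impact_fab(fe_fab_fixed: float,     # fixed share, e.g. the casing
                   fe_fab_variable: float,  # variable share of the emission factor
                   density: float,          # GB per physical disk
                   reservation_gb: float,   # storage reserved by the user
                   resilience_region: int,
                   local_redundancy: float,
                   fill_rate: float,
                   compute: float = 0.0) -> float:
        resiliency = resilience_region * local_redundancy
        return (fe_fab_fixed
                + fe_fab_variable * density * reservation_gb
                * resiliency * fill_rate) + compute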

S3:
The replication level changes with the storage class, cf. the performance chart section of https://aws.amazon.com/s3/storage-classes/ (multi-AZ vs single-AZ).

Azure object storage:

    The replication level seems independent of the storage class.

Example: One Zone (a somewhat special S3 case)

  • replication: 1
  • redundancy: ? RAID 5? x2 or x3 at the AZ level
  • fill rate?: 50%

Other sources of info:
YouTube - FAST '23 - Building and Operating a Pretty Big Storage System (My Adventures in Amazon S3) https://www.youtube.com/watch?v=sc3J4McebHE

Azure names for the redundancy classes:
LRS (locally redundant storage): minimum redundancy (equivalent to 2 servers within a single zone/DC)
ZRS (zone-redundant storage): multi-zone (independent DCs) within a single region
GZRS (geo-zone-redundant storage): same as above, but multi-region
https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy
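
Those redundancy classes map naturally onto the resilience_region parameter of the formula above. A hypothetical mapping (copy counts to be verified against the Microsoft page linked above):

    # Hypothetical resilience_region values per Azure redundancy class.
    AZURE_RESILIENCE_REGION = {
        "LRS": 1,   # single zone/DC
        "ZRS": 3,   # three zones within one region
        "GZRS": 6,  # ZRS plus replication to a secondary region
    }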

Visualising the costs of S3 versioning

https://cloudiamo.com/2020/06/20/figuring-out-the-cost-of-versioning-on-amazon-s3/

Cost of object versions in S3

=> Treated as regular storage (i.e. the total billed size is the size of v0 + the size of v1)

- See "How am I charged for using Versioning?" in https://aws.amazon.com/s3/faqs/

Use cases

S3
https://aws.amazon.com/s3/storage-classes/?nc1=h_ls

  • user data: storage usage (from billing) = sum of bucket sizes including versioning (replication is not visible)
  • user parameters: through the choice of storage class
    • replication across several zones
  • managed by the provider:
    • local_redundancy
    • capacity management

Block storage

  • user data: the reservation (provisioned size)

@demeringo (Collaborator) commented May 24, 2024

Updated diagram showing what we intend to exclude in v1:
[Diagram: Impacts-of-object-storage (excalidraw)]

  • control plane and cloud provider internal implementation (overprovisioning)
  • impacts related to accessing the files (network and compute/API to send or retrieve the files)

@bpetit (Collaborator) commented May 27, 2024

Note: add a usage time dimension to the formula.
