C+M+S Summit 2023 Presentation Abstracts

Below are the abstracts for the lineup of topics and speakers on our 2023 Compute+Memory+Storage Summit agenda. Stay tuned as we add more content!

2023 Compute+Memory+Storage Summit Presentation Abstracts


Compute, Memory, and Storage:  Optimized Configurations for a New Era of Workloads

David McIntyre, SNIA C+M+S Summit Planning Team Co-Chair, SNIA CMSI Marketing Co-Chair; Director, Product Planning and Business Enablement, Samsung Corporation

Abstract

Cloud to Edge applications require real-time, deterministic decisions now. The latest advances within and across compute, memory and storage technologies should be optimized and configured to meet the requirements of end customer applications and the developers that create them. From UCIe-enabled compute resources to CXL memory pooling, semantics and optimization to the latest advancements in storage, this presentation provides a holistic view of application requirements and the infrastructure resources that are required to support them.


Watch Out - Memory's Changing!

Jim Handy, General Director, Objective Analysis
Tom Coughlin, President, Coughlin Associates

Abstract

Memory today consists of two giants: DRAM and NAND flash, and a broad range of lesser alternatives all vying for the designer’s attention, including established technologies like SRAM, NOR flash, and EEPROM as well as emerging technologies like MRAM, ReRAM, PCM, and FRAM.  The newer technologies are all hoping that their superior performance and scalability will allow them to steal market share from the leading technologies.  Meanwhile CXL and UCIe are poised to completely change the rules by which the memory game is played.  This presentation will review all of these technologies to show how they interact and to draw some surprising conclusions about the likely outcomes of these changes, leading into what SNIA members and others must do to keep their edge during this important era of change.


NVMe Computational Storage Standardization

Kim Malone, NVM Express/Storage Software Architect, Intel Corporation
William Martin, NVM Express/SSD I/O Standards, Samsung Semiconductor

Abstract

Learn what is happening in NVMe to support Computational Storage devices. Computational Storage requires two new command sets: The Computational Programs Command Set and the Subsystem Local Memory Command Set.  We will introduce you to how these two command sets work together, the details of each command set, and how they fit within the NVMe I/O Command Set architecture.
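
For readers new to these command sets, the sketch below models the conceptual flow in Python: data is staged in Subsystem Local Memory (SLM), a program is executed against that memory range, and only the result is read back. The class and method names are illustrative placeholders standing in for the corresponding NVMe commands, not NVMe-defined interfaces.

# Hypothetical sketch of the NVMe Computational Storage flow described above.
# The class and method names are illustrative placeholders, not NVMe-defined
# symbols; only the two command sets named in the abstract are real.

class MockComputationalNamespace:
    """Simulates a device exposing Subsystem Local Memory (SLM) plus programs."""

    def __init__(self, slm_bytes: int):
        self.slm = bytearray(slm_bytes)          # Subsystem Local Memory
        self.programs = {"filter_even": lambda buf: bytes(b for b in buf if b % 2 == 0)}

    # Subsystem Local Memory command set: move data between host and SLM.
    def memory_copy_to_slm(self, offset: int, data: bytes) -> None:
        self.slm[offset:offset + len(data)] = data

    def memory_read_from_slm(self, offset: int, length: int) -> bytes:
        return bytes(self.slm[offset:offset + length])

    # Computational Programs command set: run a program against an SLM range.
    def execute_program(self, name: str, offset: int, length: int) -> int:
        result = self.programs[name](self.memory_read_from_slm(offset, length))
        self.memory_copy_to_slm(offset, result)   # leave the output in SLM
        return len(result)

ns = MockComputationalNamespace(slm_bytes=4096)
ns.memory_copy_to_slm(0, bytes(range(16)))            # stage input in SLM
out_len = ns.execute_program("filter_even", 0, 16)    # offload the compute
print(ns.memory_read_from_slm(0, out_len))            # fetch only the result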


Explore the Compute Express Link™ (CXL™) Device Ecosystem and Usage Models - A Panel

Moderated by Kurtis Bowman, CXL Consortium

Abstract

Compute Express Link™ (CXL™) maintains memory coherency between the CPU memory space and memory on CXL attached devices. CXL enables a high-speed, efficient interconnect between the CPU, platform enhancements, and workload accelerators such as GPUs, FPGAs, and other purpose-built accelerator solutions.
 
Recently, CXL Consortium members showcased public demonstrations of CXL at industry events, proving to the industry that CXL’s vision to enable a new ecosystem of high-performance, heterogeneous computing is now a reality. There were also multi-vendor demos to illustrate interoperability between vendor solutions.
 
As the industry sees more CXL products released into the market, and with the completion of the CXL Consortium FYI compliance event, this presentation will feature a panel of experts who will discuss the types of CXL devices available today and the CXL devices we expect to see in the next year. The panelists will also examine the new features in the CXL 3.0 specification and the usage models it will enable.

Computational Storage APIs

Oscar Pinto, Principal Engineer, Samsung Semiconductor

Abstract

Computational Storage is a new field that addresses performance and scaling issues for compute in traditional server architectures. This is an active area of innovation in the industry, where multiple device and solution providers are collaborating to define this architecture while actively working to create new and exciting solutions. The SNIA Computational Storage TWG is leading the way with new interface definitions in its Computational Storage APIs, which work across different hardware architectures. Learn how these APIs may be applied and what types of problems they can help solve.


Form Factors Update - A Panel

Moderated by Cameron Brett, Co-Chair, SNIA SSD Special Interest Group/Sr. Director of Enterprise and Cloud Storage Marketing, KIOXIA 

Abstract

Learn how having a flexible and scalable family of form factors allows for optimization for different use cases, different media types on SSDs, scalable performance, and improved data center TCO. Our experts will highlight the latest SNIA specifications that support these form factors, provide an overview of platforms that are EDSFF-enabled, and discuss the future for new product and application introductions.

Introduction to CXL Fabrics

Vince Hache, Director, Systems Architecture, Rambus

Abstract

This session will introduce CXL Fabrics, a new multi-level switch architecture defined in CXL 3.x. CXL Fabrics are not restricted to tree-based topologies and can scale to 4,096 nodes, offering substantial scale and flexibility. The presentation will cover the transport-level details, routing model, and management architecture.


Standardizing Computational Storage

William Martin, SSS IO Standards/Co-Chair, SNIA Computational Storage Technical Work Group
Jason Molgaard, Principal Storage Solutions Architect, Solidigm/Co-Chair, SNIA Computational Storage Technical Work Group

Abstract

Computational Storage standards are under active development at both SNIA and NVMe.  The CS TWG in SNIA continues to work on enhancements to the Architecture and Programming Model after the successful release of the 1.0 revision of the standard in August 2022.  The CS TWG also continues to refine the CS API, which was released for public review in July 2022, to ensure alignment and compatibility with NVMe.  Many of the same companies are engaged with the SNIA CS work and the NVMe CS work and strive to ensure compatibility and cohesion between the SNIA and NVMe CS standards.  This presentation will discuss the current state of the SNIA CS Architecture and Programming Model, the current state of the CS API, and explain how these standards align with and support the NVMe CS efforts.  Part of the discussion will include a lexicon of terminology from both standards and a decoder ring to translate between the slightly different terms used by the two standards organizations.
 

Providing Capacity and TCO to Applications

Sudhir Balasubramanian, Sr. Staff Solution Architect, VMware
Arvind Jagannath, Product Line Manager for vSphere Platform, VMware

Abstract

This session will walk the audience through the various CXL use cases and then present a software tiering use case using a real-world database workload.



Universal Chiplet Interconnect Express™ (UCIe™): An Open Standard for Innovations at the Package Level

Dr. Debendra Das Sharma, Intel Senior Fellow and Chair of UCIe Consortium

Abstract

High-performance workloads demand on-package integration of heterogeneous processing units, on-package memory, and communication infrastructure to meet the demands of the emerging compute landscape. Applications such as artificial intelligence, machine learning, data analytics, 5G, automotive, and high-performance computing are driving these demands to meet the needs of cloud computing, intelligent edge, and client computing infrastructure. On-package interconnects are a critical component to deliver the power-efficient performance with the right feature set in this evolving landscape.

UCIe is an open industry standard with a fully specified stack that comprehends plug-and-play interoperability of chiplets on a package, similar to the seamless interoperability on board with well-established and successful off-package interconnect standards such as PCI Express®, Universal Serial Bus (USB)®, and Compute Express Link (CXL)®. In this talk, we will discuss the usages and key metrics associated with different technology choices. We will also delve into the different layers as well as the software model along with the compliance and interoperability mechanisms.


A Big-Disk Computational Storage Array for High Performance Data Analytics

Stephen Bates, VP, Huawei

Abstract



Detecting Ransomware with Computational Storage 

Andy Walls, IBM Fellow, Chief Architect and CTO, IBM Flash Storage Division

Abstract

We are in the midst of another worldwide pandemic, this time caused by cyber criminals in the form of ransomware. The incidents have risen spectacularly and point to the urgency for all parts of the stack to do everything possible to identify and recover from these attacks. Effective security measures must be implemented in the application, OS, hypervisor, file system, host system management, network, and storage arrays. Computational storage provides a useful mechanism to offload the collection of key statistical indicators to the SSD, closer to where the data resides. This presentation will highlight the latest ransomware detection assistance mechanisms implemented by IBM within its computational storage FlashCore Modules (FCMs).


Compute Express Link™ (CXL™) 3.0: Expanded Capabilities for Increasing Scale and Optimizing Resource Utilization

Andy Rudoff, Principal Engineer, Intel

Abstract

Compute Express Link™ (CXL™) is an open industry standard interconnect offering coherency and memory semantics using high-bandwidth, low-latency connectivity between the host processor and devices such as accelerators, memory buffers, and smart I/O devices. CXL technology is designed to support the growing high-performance computational workloads by supporting heterogeneous processing and memory systems for applications in Artificial Intelligence (AI), Machine Learning (ML), Analytics, Cloud Infrastructure, and Cloudification of Network and Edge.
 
CXL 3.0 doubles the transfer rate to 64GT/s with no additional latency over previous generations and introduces fabric capabilities and management, improved memory sharing and memory pooling, enhanced coherency, and peer-to-peer communication. 
 
This presentation will explore the new features in the CXL 3.0 specification, highlight the enhancements to memory pooling to optimize server performance, and introduce the concept of memory sharing to enable clusters of machines to solve large problems through shared memory constructs.


Optimizing Complex Memory and Storage Systems Using Simulations

Andy Banta, Storage Janitor, Magnition

Abstract

Modern storage and content delivery systems are built out of a myriad of components, providing a dizzying set of variables used to optimize their performance. By modeling the components as modular building blocks, simulations let you quickly try many different sets of variables.  This modularity also allows proprietary components to be plugged in to measure their difference in the system.  These simulations can be built in days or weeks, instead of the months to years needed to build the actual system.
 
This session runs you through a demonstration of simulating a storage system and running a series of sample data sets on it.
 
We'll wrap up with a success story of using these simulations in practice; you won't want to miss this session.


Server Fabrics and the Evolution to CXL 3.0

Shreyas Shah, Founder and CTO, Elastics.cloud

Abstract

The increasing resource requirements for applications and algorithms in a heterogeneous compute environment are driving the need for more efficient approaches to managing memory and resources. With the advent of CXL, systems architecture is undergoing a transformation to meet these challenges. We explore the functionalities of CXL 1.1, CXL 2.0, and CXL 3.0 and the improved efficiency that can be realized with a PCIe and CXL-enabled switch. A switch SoC (SSoC) leveraging CXL provides the required features to efficiently manage resources, allowing for memory expansion in the box, memory pooling, and resource pooling in a disaggregated rack. These new usage models are about more than just connectivity; they demonstrate the revolutionary system-level performance improvements that are possible with CXL switching.


Thinking Memory

Mats Oberg, Associate Vice President, Marvell

Abstract

Compute Express Link (CXL) based memory devices are expected to alleviate the ever-increasing demand for memory in datacenter applications. CXL devices can be used for memory expansion as well as memory bandwidth expansion when the socket cannot hold enough memory for the applications. Memory pooling devices can reduce the overall need for DRAM, and thereby cost. In addition to providing additional memory to hosts, it is also possible to offload compute tasks to CXL memory devices and free up the CPU for other operations.
 
In this presentation we will take a look at how we can add compute capability to CXL memory devices. We will look at offloading operations like data analytics, and also telemetry functions like hotness tracking.


A Host-Assisted Computational Storage Architecture

Tao Lu, Research Manager, DapuStor Corporation

Abstract

Integrating data compression capability into SSDs has demonstrated great potential to improve the utilization and lifetime of the storage device as well as the performance of the entire system. A common approach is to add a hardware engine to the SSD for low-latency compression and decompression. However, this requires a new and long hardware product development cycle, which would prevent current storage systems from reaping the benefits of in-SSD compression. We explore a software-based in-SSD compression solution, which can be delivered to users quickly through a simple SSD firmware update. The most critical challenge is the severe performance bottleneck caused by compression and decompression, as the in-SSD embedded CPU has quite limited computing power. To tackle this challenge, we propose a host-assisted computational storage device, called HA-CSD. It employs an offline compression strategy, aware of data hotness and compressibility, to remove compression from the critical write I/O path. A novel decompression architecture is devised to utilize the powerful host CPU for fast data decompression. We implement HA-CSD in a commercial enterprise SSD. Experimental results show that HA-CSD achieves 2.1 GB/s and 5.2 GB/s read and write bandwidth without using a hardware accelerator. Compared with RocksDB built-in compression, HA-CSD can increase the YCSB benchmark throughput by up to 5.7× and improve host CPU efficiency significantly.
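
As a concrete illustration of the policy described above, here is a minimal Python sketch of a hotness- and compressibility-aware decision, run offline rather than on the write critical path. The thresholds, sampling size, and use of zlib as the compressibility probe are assumptions for illustration, not details of the HA-CSD implementation.

# Hypothetical sketch of a hotness- and compressibility-aware offline
# compression policy in the spirit of HA-CSD; thresholds, sampling, and
# function names are assumptions for illustration, not the paper's code.
import zlib

def estimate_compressibility(block: bytes, sample: int = 512) -> float:
    """Estimate the compression ratio from a small sample to keep CPU cost low."""
    probe = block[:sample]
    return len(zlib.compress(probe)) / max(len(probe), 1)

def should_compress_offline(block: bytes, access_count: int,
                            hot_threshold: int = 8,
                            ratio_threshold: float = 0.8) -> bool:
    """Compress only cold, compressible blocks, off the write critical path."""
    if access_count >= hot_threshold:        # hot data: skip to avoid read amplification
        return False
    return estimate_compressibility(block) < ratio_threshold

cold_block = bytes(1024)                      # highly compressible zeros
print(should_compress_offline(cold_block, access_count=1))   # True
print(should_compress_offline(cold_block, access_count=50))  # False (too hot)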


Accelerating Encryption for Solid State Storage Drives

Kelvin Goveas, Architect, Azimuth Technology

Abstract

A hardware accelerator to improve the throughput and power efficiency of encryption and decryption in solid state drive (SSD) based storage systems is presented. The accelerator interface takes advantage of the open source RISC-V ISA, which allows it to be attached to a RISC-V based microcontroller system-on-chip (SOC) in two different configurations, selected at design time. The accelerator presents a management interface which can be used to configure, start, and stop operation; a completion interface which supports firmware polling or interrupts; and a memory interface through which it initiates I/O and memory accesses.
 
The accelerator can be attached inside a CPU in the tightly attached configuration, allowing firmware to use special register reads and writes for the management interface, polling or interrupts for the completion interface, and the CPU’s coherent memory subsystem for I/O or memory accesses. No additional RISC-V instructions are needed to operate the accelerator, only custom special register addresses. Alternatively, the accelerator can be attached to the SOC interconnect in the closely attached configuration, which allows it to participate in a coherency protocol, such as AXI or CHI, as a non-caching node. Memory mapped register reads and writes are used to configure the management interface, and I/O or memory read and write requests are initiated, and data fills and responses received via the memory interface. 
 
The accelerator adds support for separate processing threads, each of which can be allocated to separate storage devices, by providing a special register bank per thread which can be accessed via the management interface. Both design and run time configurability of the number of compute pipes associated with a thread are provided to allow reallocation of resources in a dynamic runtime environment. The number of channels in the memory interface is design time configurable to allow tuning of memory versus compute bandwidth for a particular system.

The design is currently in development using Verilog, and is fully synthesizable to support soft IP integration into an SOC, or mapping to an FPGA. The goal is to support the most commonly used cryptographic protocols from TLS 1.3 including AES-128/256, SHA-256/512 as well as AES-GCM 128/256. The hardware accelerator can be used to improve throughput and power efficiency relative to firmware based solutions, which may be using the RISC-V crypto instruction set extensions (ISEs), running on the microcontroller.
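
To make the interface split concrete, the following Python sketch models one per-thread register bank with its management interface (register writes), completion interface (status polling), and a run-time compute-pipe count. The register names, bit definitions, and behavior are hypothetical simplifications of the design described above.

# Hypothetical model of the accelerator's management and completion
# interfaces with per-thread register banks; register names, offsets, and
# the polling protocol are illustrative assumptions, not the actual design.

class AcceleratorThreadBank:
    """One special-register bank per processing thread, as described above."""

    def __init__(self, compute_pipes: int):
        self.regs = {"SRC_ADDR": 0, "DST_ADDR": 0, "LENGTH": 0,
                     "CTRL": 0, "STATUS": 0}
        self.compute_pipes = compute_pipes     # run-time configurable

    def write(self, reg: str, value: int) -> None:   # management interface
        self.regs[reg] = value
        if reg == "CTRL" and value & 0x1:             # start bit
            self.regs["STATUS"] = 0x1                 # pretend the job completed

    def poll_complete(self) -> bool:                  # completion interface
        return bool(self.regs["STATUS"] & 0x1)

bank = AcceleratorThreadBank(compute_pipes=2)
bank.write("SRC_ADDR", 0x1000)
bank.write("DST_ADDR", 0x2000)
bank.write("LENGTH", 4096)
bank.write("CTRL", 0x1)          # kick off, e.g., an AES-GCM job on this thread
assert bank.poll_complete()      # firmware polls instead of taking an interrupt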


Fabric Attached Memory – Hardware and Software Architecture

Clarete Riana Crasta, Master Technologist, Hewlett Packard Enterprise

Abstract

HPC architectures increasingly handle workloads where the working data set cannot be easily partitioned or is too large to fit into node local memory. We have defined a system architecture and a software stack to enable large data sets to be held in fabric-attached memory (FAM) that is accessible to all compute nodes across a Slingshot-connected HPC cluster, thus providing a new approach to handling large data sets. 
Emerging AI and data analytics workloads are increasingly becoming important for HPC architectures because HPC clusters provide the computation capabilities needed at scale; however, a divide still exists between traditional HPC, AI, and data analytics applications, because the three communities use very different programming models. The architecture leverages emerging hardware capabilities such as CXL along with ideas from both HPC and high performance data analytics software to support AI and data analytics on HPC clusters. This presentation will cover the architecture, the software stack, and its value using a use case: an Arkouda-based proxy application for real-time data analytics.


Python with Computational Storage

Jayassankar OP, Associate Technical Director, Samsung Electronics
Arun V Pillai, Staff Engineer, Samsung Electronics

Abstract

Computational storage devices bring compute to the storage subsystem and help offload data processing to within the device, thus promising faster data processing. Samsung is helping standardize industry efforts toward Computational Storage in the NVMe and SNIA standards workgroups and is also developing SmartSSD, an NVMe based Computational Storage Drive (CSD). As the majority of industry applications in the AI/ML space are implemented in fourth-generation programming languages (such as Python), there is a need for a readily usable ecosystem to adopt this technology. We have developed a CS Python Library Framework that exposes the SNIA CS APIs to Python users. This library enables users to test the functionality of a CSD/CSx and allows Python applications to easily integrate with Computational Storage. This also helps enable Computational Storage in industry applications with minimal changes. In this talk, we will detail the design aspects of the CS Python Library and demonstrate the use case of a geo-locator application that offloads computation to the Samsung SmartSSD.
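
The sketch below illustrates, in Python, the kind of minimal-change integration such a library aims for: the application keeps a single code path and offloads the kernel only when a CS device and library are available, falling back to host computation otherwise. The cs_offload hook and the toy geo-locator kernel are hypothetical stand-ins, not the actual CS Python Library interface.

# Hypothetical "minimal-change" integration: one code path, optional offload.
# The cs_offload callable and its argument convention are assumptions.
from typing import Callable, Optional

def count_points_in_region(blob: bytes, region: tuple,
                           cs_offload: Optional[Callable] = None) -> int:
    """Geo-locator style kernel: count 2-byte (x, y) points inside a region."""
    if cs_offload is not None:
        # Offload path: ship the filter to the device, read back only a count.
        return cs_offload("count_in_region", blob, region)
    # Host fallback: identical semantics, computed on the CPU.
    x0, y0, x1, y1 = region
    points = [(blob[i], blob[i + 1]) for i in range(0, len(blob) - 1, 2)]
    return sum(1 for x, y in points if x0 <= x <= x1 and y0 <= y <= y1)

data = bytes([10, 10, 200, 200, 30, 40, 90, 15])
print(count_points_in_region(data, region=(0, 0, 100, 100)))   # host path -> 3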


Standardizing Memory to Memory Data Movement with SDXI v1.0

Shyam Iyer, Distinguished Engineer, Dell Technologies

Abstract

For a long time, using software to perform memory copies has been the gold standard for applications performing memory-to-memory data movement or system memory operations. The newly released SNIA Smart Data Accelerator Interface (SDXI) Specification v1.0 attempts to change that.

SDXI v1.0 is a standard for a memory-to-memory data mover and acceleration interface that is extensible, forward-compatible and independent of I/O interconnect technology. Among other features, SDXI standardizes an interface and architecture that can be abstracted or virtualized with a well-defined capability to quiesce, suspend, and resume the architectural state of a per-address-space data mover.

This specification was developed by SNIA’s SDXI Technical Working Group, comprising 89 individuals representing 23 SNIA member companies.

As new memory technologies get adopted and memory fabrics expand the use of tiered memory, the SDXI Specification v1.0 will be enhanced with new accelerated data movement operations and use cases. This talk gives an overview of SDXI’s use cases, the important features of SDXI Specification v1.0, and the features that can be expected in new versions of the specification.
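
For orientation, the following Python sketch models the general shape of a descriptor-based, asynchronous memory-to-memory mover: software fills descriptors into a ring and rings a doorbell, and the mover performs the copies independently of the submitting CPU. The descriptor fields and method names are simplified illustrations, not the SDXI v1.0 descriptor format.

# Simplified, hypothetical model of an SDXI-style descriptor-ring data mover;
# the field names and flow are illustrative, not the actual SDXI v1.0 layout.
from dataclasses import dataclass

@dataclass
class CopyDescriptor:
    src: int      # source address (here: offset into a flat mock memory)
    dst: int      # destination address
    length: int

class MockDataMover:
    """Software stand-in for a per-address-space SDXI function."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.ring = []          # descriptor ring written by the producer

    def submit(self, desc: CopyDescriptor) -> None:
        self.ring.append(desc)  # producer posts work; no data is touched yet

    def doorbell(self) -> None:
        # The mover consumes descriptors asynchronously to the submitting CPU.
        while self.ring:
            d = self.ring.pop(0)
            self.memory[d.dst:d.dst + d.length] = self.memory[d.src:d.src + d.length]

mem = bytearray(64)
mem[0:5] = b"hello"
mover = MockDataMover(mem)
mover.submit(CopyDescriptor(src=0, dst=32, length=5))
mover.doorbell()
print(bytes(mem[32:37]))    # b'hello' moved without an application memcpy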


Decomposing Compute to Grow Computational Storage

Aldrin Montana, PhD Student, University of California, Santa Cruz

Abstract

Recent hardware trends have introduced a variety of heterogeneous compute units while also bringing network and storage bandwidths within an order of magnitude of memory subsystems. In response, developers have used increasingly exotic solutions to extract more performance from hardware; typically relying on static, design-time partitioning of their programs which cannot keep pace with storage systems that are layering compute units throughout deepening hierarchies of storage devices. We argue that dynamic, just-in-time partitioning of computation offers a solution for emerging data systems to overcome ever-growing data sizes in the face of stalled CPU performance and memory bandwidth.

We are prototyping a computational storage system (CSS) that adopts a database perspective to utilize computational storage devices (CSx). We discuss our approach, "decomposable queries," for passing a portion of a query plan (sub-plan) down a hierarchy of CSx and allowing each device to dynamically determine how much of the sub-plan to propagate and execute. We briefly describe microbenchmarks used to evaluate aspects of our initial progress for a bioinformatics use case, from which we observed unexpected slowdowns on our CSDs (Seagate's research-only KV drives). We then explain how decomposable queries will allow our prototype CSS to transparently benefit from future improvements to CSx and accommodate new types of CSx, and how these two capabilities are necessary for the deployment of CSx in production.
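
The toy Python sketch below captures the decomposable-query idea: a device executes as much of the pushed-down sub-plan as its capabilities allow and forwards the remainder up the hierarchy. The operators and capability sets are invented for illustration and are not taken from the prototype.

# Toy sketch of "decomposable queries": each device in the hierarchy executes
# as many operators of the pushed-down sub-plan as it chooses and forwards the
# remainder upward. Operator names and the capability model are illustrative.

OPERATORS = {
    "filter_gt_10": lambda rows: [r for r in rows if r > 10],
    "square":       lambda rows: [r * r for r in rows],
    "sum":          lambda rows: [sum(rows)],
}

def execute_decomposed(sub_plan, rows, device_capabilities):
    """Run the longest executable prefix of sub_plan; return the rest plus partial results."""
    remaining = list(sub_plan)
    while remaining and remaining[0] in device_capabilities:
        rows = OPERATORS[remaining.pop(0)](rows)
    return remaining, rows

plan = ["filter_gt_10", "square", "sum"]
# The CSD at the bottom of the hierarchy only supports filtering...
plan, rows = execute_decomposed(plan, [4, 12, 25, 7], {"filter_gt_10"})
# ...so the remaining operators propagate up to a more capable CSx or the host.
plan, rows = execute_decomposed(plan, rows, {"square", "sum"})
print(plan, rows)   # [] [769]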


Efficiency of Data Centric Computing

Steven Yuan, Founder and CEO, StorageX.ai

Abstract

In today's data-driven world, the rapid growth of large datasets presents a significant challenge for many applications. At the same time, AI workloads are becoming increasingly dominant in various aspects, exhibiting growth not only in data size but also in throughput, latency requirements, and more. Moving computation closer to data storage is more efficient than transferring large amounts of data, as federated data processing allows for better system efficiency, reduced network traffic, and lower latency, ultimately resulting in improved total cost of ownership (TCO).

Focusing on data-centric computing, we will discuss the need to integrate compute, data, and I/O acceleration closely with storage systems and data nodes. This approach can provide a highly efficient engine for data-heavy applications, such as training large language models, supporting the metaverse in next-generation data centers, and enabling edge computing. By addressing the requirements for low latency and high bandwidth services, this strategy helps tackle the coexistence of "big" and "fast" data challenges.

 


NVMe as a Cloud Interface

Jake Oshins, Partner Software Engineer, Azure Core, Microsoft

Abstract

NVMe as an interface has been used for lots of things, from flash to computational storage.  It’s a natural choice for a hardware interface for cloud storage, too, but using it that way comes with scaling challenges.  Jake Oshins from Microsoft will discuss the issues that come from moving from providing storage to a single physical machine, to multiple virtual machines on a host to thousands of containers running on that host.  After that, he’ll cover work going on in the Scalable IOV working group of the Open Compute Project and how that might map onto the NVMe specification.

 

Deploying SSDs with Computational Storage - The Promise & Reality

JB Baker, Vice President, Marketing, ScaleFlux

Abstract

We’ve been hearing about Computational Storage for several years now.  But, how is it playing out in the field? What applications benefit from CS? How are users deploying CS? What challenges, changes, and integrations do they deal with to reap the rewards from CS?  Join us to hear about a few examples of Computational Storage Drive deployments and use cases to improve efficiency, TCO, and uptime.

 

Improving Storage Systems for Simulation Science with Computational Storage

Dominic Manno, Computer Scientist, Los Alamos National Labs

Abstract

The advantages of computational storage are well suited to improving storage system efficiency for HPC storage systems supporting simulation science. These benefits can be realized transparently or with slight modifications to application behavior. Simulation science workloads have ever-growing bandwidth requirements, while single data set sizes are increasing toward petabyte scale. The organization of these datasets is currently file-based, but the structure of the data looks like common data analytics formats. It has become clear that integrating computational storage into HPC system designs will provide significant performance improvements over more traditional designs while also enabling a large and open ecosystem.

Providing use-cases, requirements, and demonstrations for real world applications of computational storage can benefit the broader community and assist with standardization efforts. We will present these use-cases and demonstrations for integrating computation near storage in HPC storage systems.

 


Storage Security Past, Present, and Future

Eric Hibbard, Chair, SNIA Storage Security Technical Work Group

Abstract

Storage security emerged in the early 2000s with the SNIA publication on Storage Networking Security. Since then, multiple international standards related to storage security have been published, and storage security has been included in key security frameworks (e.g., the ISO/IEC 27000-series and NIST SP 800-series). As attacks against data increase (e.g., data breaches and ransomware), storage systems and ecosystems are becoming a last line of defense. New forms of data-at-rest encryption (Key Per I/O and homomorphic encryption), trustworthiness mechanisms, and communications security are being incorporated into storage. Looking to the future, more security mechanisms are anticipated at the component level to protect data while it is in use.

2023 Cybersecurity & Privacy Landscape

Eric Hibbard, Chair, INCITS Technical Committee Cybersecurity and Privacy

Abstract

This session provides a quick overview of the cybersecurity and privacy landscape with an eye to expectations for 2023. Anticipated maleficent behaviors, industry trends, and legal/regulatory changes that are likely to impact organizations and individuals are discussed. Their relevance to storage, and IT in general, will be explored. If you’ve ever wondered why security/privacy professionals do the things they do, this session will give you a glimpse into their not-so-rose-colored glasses.

Storage Sanitization - The Next Chapter

Paul Suhler, IEEE Security in Storage Work Group

Abstract

The need to eradicate recorded data on storage devices and media is well understood, but the technologies and methodologies to do it correctly can be elusive. With the publication of the new ISO/IEC 27040 (Storage security) and IEEE 2883 (Standard for Storage Sanitization) international standards, there is some clarity for organizations as well as enhanced expectations under the heading of reasonable security. However, many issues remain to be addressed. This session highlights these new standards and explores some of the remaining issues, including initiatives that are underway to deal with some of the gaps.

 

Zero Trust Security

Chris Williams, IEEE Zero Trust Security WG

Abstract

Zero Trust is a security framework requiring users and entities to be authenticated, authorized, and continuously validated before being granted or allowed to continue access to applications, systems, and data. The U.S. Government is spearheading adoption of zero trust security, which is having an impact on the offerings from the security vendor community. Private organizations are beginning to transition proof-of-concept implementations into their mainstream operations.

This session highlights the primary focus areas for Zero Trust activities and provides an overview of key lessons learned from early adopters of zero trust security.


Fine Grain Encryption Using Key Per I/O

Festus Hategekimana, Trusted Computing Group, Storage Work Group

Abstract

The Key Per IO (KPIO) project is a joint initiative between NVM Express® and the Trusted Computing Group (TCG) Storage Work Group to define a new Security Subsystem Class (SSC), the Key Per IO SSC, for the NVMe® class of storage devices.

Key Per IO allows hosts to own, control, and specify the Media Encryption Keys (MEKs) that a Storage Device uses for its device-level user data encryption. Key Per IO achieves this by allowing hosts to securely download a large number of media encryption keys into the NVM subsystem and to specify on a per command basis which of those media encryption keys the Storage Device uses for encryption.

Since the MEKs can come from a variety of sources external to the Storage Devices, data can be encrypted by keys that are known to only a particular host and/or tenant. This control over the keys and their encryption granularity can be a powerful security control for multitenant scenarios (e.g., cloud, containers, VMs).
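
The following Python sketch illustrates the per-command key-selection concept: the host loads a table of MEKs into the device and tags each I/O with a key index, so different tenants' data on the same drive is encrypted under different keys. It uses the pyca/cryptography package, and the key-table and tagging details are simplifying assumptions rather than the actual KPIO SSC mechanics.

# Conceptual sketch of Key Per IO: the host downloads media encryption keys
# (MEKs) and selects one per command by index. The key-index field and the
# AES-CTR usage here are simplifying assumptions for illustration only.
# Requires: pip install cryptography
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class MockKpioDrive:
    def __init__(self):
        self.mek_table = {}          # key index -> MEK, securely downloaded by the host
        self.media = {}              # LBA -> ciphertext

    def load_key(self, index: int, mek: bytes) -> None:
        self.mek_table[index] = mek

    def write(self, lba: int, data: bytes, key_index: int, iv: bytes) -> None:
        enc = Cipher(algorithms.AES(self.mek_table[key_index]), modes.CTR(iv)).encryptor()
        self.media[lba] = enc.update(data) + enc.finalize()

    def read(self, lba: int, key_index: int, iv: bytes) -> bytes:
        dec = Cipher(algorithms.AES(self.mek_table[key_index]), modes.CTR(iv)).decryptor()
        return dec.update(self.media[lba]) + dec.finalize()

drive = MockKpioDrive()
drive.load_key(0, os.urandom(32))        # tenant A's key
drive.load_key(1, os.urandom(32))        # tenant B's key
iv = os.urandom(16)
drive.write(lba=100, data=b"tenant A secret!", key_index=0, iv=iv)
print(drive.read(lba=100, key_index=0, iv=iv))    # decrypts only with tenant A's key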

 

This session provides a brief overview of the KPIO functionality, summarizes the current state of the specifications, and explores a few of the compelling use cases.


Cyber Recovery and Resilience

Jim Shook, Global Director, Cybersecurity and Compliance Practice, Dell Data Protection

Abstract

Cyber and ransomware attacks continue to plague organizations around the world, and the size of the victim organization is no longer a predictor of attacks (individuals, small businesses, non-profits, Fortune 500 enterprises, and governments are all being hit). Surviving a ransomware attack often depends on the entity’s cyber resiliency and ability to quickly recover critical data.

 

The National Institute of Standards and Technology (NIST) introduced the concept of cyber recovery in NIST SP 800-209 as something distinct from more traditional data protection mechanisms like backups. This session explores key aspects of cyber recovery solutions and outlines scenarios where they can be most effective.

Persistent Memory Trends - a Panel Discussion

Moderated by Dave Eggleston, Business Development, Microchip

Abstract

Where do companies see the industry going with regard to persistent memory? With the improvement of SSD and DRAM I/O over CXL, the overlap of CXL and NVMe, high density persistent memory, and memory-semantic SSDs, there is a lot to talk about! Our moderator and panel of experts from Intel, Samsung, and SMART Modular will widen the lens on persistent memory, take a system level approach, and see how the persistent memory landscape is being redefined.

 

The Evolution of Compute from a Storage Viewpoint

Scott Shadley, SNIA Board of Directors

Abstract

Since 1945 we have been following the basic Princeton architecture created by John von Neumann, even as we have evolved the components of our systems: the CPU, with overclocking, multi-threading, multiple cores, and more; the memory bus, where we have come from EDO to DDR to DDR5 to HBM and now CXL; and lastly the storage architecture, from punch cards, tape, floppies, and HDDs the size of small office buildings to the advent of flash. We have been locked into this dichotomy of what to do with data: where to store it, where to act on it, where to protect it.

Now we need to step back, look at what has evolved, and focus on the next evolution: compute IN, not next to, storage and memory. This step toward following Amdahl’s law is the next progression, and it is where Computational X comes into play. Learn more as we walk through the evolution of compute from the other side of the CPU.

The Key Foundations and Future of Memory: Industry Standards

Jonathan Hinkle, Distinguished Systems Architect for Storage, Micron

Abstract

The JEDEC Solid State Technology Association develops open standards for the microelectronics industry. This presentation will provide an overview of the latest memory standards and work of JEDEC.

Computational SSDs for Semantic Image Retrieval

Vishwas Saxena, Sr. Technologist, Firmware Engineering, Western Digital

Abstract

Multiple deep learning approaches collect analytics for semantic image retrieval in different formats. Some approaches may collect the analytics in the form of tags, some might collect them in the form of captions, and recently a few sophisticated scene graph generation algorithms collect the analytics in the form of triplets (Subject-Verb-Object). The lack of uniformity in the analytics format leads to the use of multiple types of schemas and databases to retrieve the images using unstructured user queries. For example, a tag-based search uses a key-value DB, while scene-graph-based retrieval uses a graph DB.

Having a dependency on a database increases the computational requirements for searching images on low-compute storage devices like SSDs. Further, since every database uses its own query language, the performance of the image retrieval framework is highly dependent on how efficiently these queries are written.

Learn how to design an embedded machine learning framework inside the SSD that uses an advanced transformer model to convert analytics from various deep learning approaches into a uniform embedding format suitable for fast and accurate image retrieval. Any form of analytics (tag, sentence, paragraph, triplets) is converted into an N-dimensional vector. For faster lookup on the SSD, the framework stores the extracted N-dimensional vectors as an index tree and also processes user queries as N-dimensional vectors by feeding the user input to the SBERT transformer model.
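
A brief Python sketch of this uniform-embedding approach, using the sentence-transformers package: tags, captions, and triplets are all encoded into one vector space, and a query is answered by cosine-similarity search. The model name is only an example, and the brute-force scan stands in for the on-device index tree; none of this is the framework's actual code.

# Sketch of the uniform-embedding idea: heterogeneous analytics are encoded
# into one vector space with an SBERT model and queried by nearest neighbor.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # compact SBERT-style encoder

analytics = {
    "img_001.jpg": "dog",                               # tag
    "img_002.jpg": "a woman riding a bike on the road", # caption
    "img_003.jpg": "girl - driving - scooter",          # triplet (S-V-O)
}
corpus_vecs = model.encode(list(analytics.values()), convert_to_tensor=True)

query_vec = model.encode("woman riding a bike", convert_to_tensor=True)
scores = util.cos_sim(query_vec, corpus_vecs)[0]        # cosine similarity per image
best = int(scores.argmax())
print(list(analytics.keys())[best], float(scores[best]))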

Learn also how the framework’s interface captures the intent behind a user query using clustering techniques. For example, “woman riding a bike” and “girl driving a scooter” have the same intent; the intent-based interface increases image search accuracy with a low number of false positives.

 

Exploring Performance Paradigm of HMB NVMe SSDs

Pradeep SR, Associate Technical Director, Samsung Semiconductor India Research 
Ranjith T, Associate Staff Engineer, Samsung Semiconductor India Research

Abstract

The value NVMe client SSD segment was introduced with the advent of Host Memory Buffer (HMB) technology in NVMe. Here, host memory is used as a replacement for on-chip DRAM, which has helped customers take the plunge into the world of SSDs at a lower cost. The current generation of HMB SSDs predominantly uses this host buffer to park the Logical-to-Physical (L2P) mapping during device runtime. The general perception is that performance will be on par with DRAM devices, as access to the HMB over PCIe is fast enough to meet customer needs.
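
To see why the L2P map is what gets parked in the HMB, a quick back-of-the-envelope calculation in Python helps: with the common assumption of a 4-byte mapping entry per 4 KiB logical page, the full map is roughly 1/1024 of drive capacity, so a host allocation of a few tens of MiB can cache only part of it. The entry and page sizes below are illustrative assumptions; real controllers vary.

# Back-of-the-envelope L2P sizing, assuming a 4-byte entry per 4 KiB logical
# page; real controllers vary, so treat these numbers as illustrative only.
def l2p_map_bytes(capacity_bytes: int, page_bytes: int = 4096, entry_bytes: int = 4) -> int:
    return (capacity_bytes // page_bytes) * entry_bytes

for capacity_gb in (256, 512, 1024):
    full_map_mb = l2p_map_bytes(capacity_gb * 10**9) / 2**20
    # A typical HMB allocation of a few tens of MiB therefore caches only part of the map.
    print(f"{capacity_gb} GB drive -> ~{full_map_mb:.0f} MiB full L2P map")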

In this paper, we intend to break this paradigm about HMB SSD performance by means of an in-depth analysis of the factors and parameters that can make a difference. The study probes three major areas: device state (fresh or sustained), device configuration (variations in the number of queues, queue depth, and HMB size allocated by the host), and workload impact. The workloads include industry-standard sequential and random read/write operations, along with granular variations that probe what works best and worst for an HMB-based NVMe SSD compared to its DRAM counterpart. The paper additionally looks at the HMB SSD as a standalone entity to derive the kind of fine-tuning the host can apply, in terms of device configuration, to make optimal use of an HMB-based SSD for its use cases.