lib/zlib: add s390 hardware support for kernel zlib_deflate
Patch series "S390 hardware support for kernel zlib", v3.

With the IBM z15 mainframe, the new DFLTCC instruction is available.  It
implements the deflate algorithm in hardware (Nest Acceleration Unit - NXU)
with estimated compression and decompression performance orders of
magnitude faster than the current zlib.

This patchset adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:

	madler/zlib#410

The coding style is also preserved for future maintainability.  Only a
limited set of userspace zlib functions is represented in the kernel.
Apart from that, all memory allocation must be performed in
advance.  Thus, the workarea structures are extended with the parameter
lists required for the DEFLATE CONVERSION CALL instruction.

Since kernel zlib itself does not support gzip headers, only the Adler-32
checksum is processed (it can also be produced by the DFLTCC facility).
As in the userspace implementation, kernel zlib will compress in hardware
on level 1, and in software on all other levels.  Decompression will
always happen in hardware (when enabled).
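
As a point of reference for the checksum handling described above, the
following is a minimal, unoptimized sketch of Adler-32 as defined in
RFC 1950.  It is not the kernel's zlib_adler32() and says nothing about
how the DFLTCC facility computes the value internally; the name
adler32_sketch is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 65536 (RFC 1950) */

/*
 * Hypothetical helper: update a running Adler-32 value with len bytes.
 * Pass 1 as the initial value when starting a new stream.
 */
static uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffffu;         /* low half: running byte sum */
    uint32_t b = (adler >> 16) & 0xffffu; /* high half: sum of the sums */

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

Feeding the input in one call or in several consecutive calls gives the
same value, because the running state is carried entirely in the 32-bit
result.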

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page.  Therefore, care should be taken when using hardware compression
if reproducible results are desired.  However, it always produces
standard-conforming output, which can be inflated regardless.

The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:

    Format: { on | off | def_only | inf_only | always }
     on:       s390 zlib hardware support for compression on
               level 1 and decompression (default)
     off:      No s390 zlib hardware support
     def_only: s390 zlib hardware support for deflate
               only (compression on level 1)
     inf_only: s390 zlib hardware support for inflate
               only (decompression)
     always:   Same as 'on' but ignores the selected compression
               level always using hardware support (used for debugging)
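
The option semantics listed above can be sketched as a small decision
table.  This is only an illustration of the documented behaviour, not
the code in zlib_dfltcc: the enum, the function names and the fallback
to 'on' for unknown strings are assumptions, and the check for whether
the machine actually has the DFLTCC facility is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; the values mirror the documented 'dfltcc=' options. */
enum dfltcc_mode_sketch {
    MODE_ON, MODE_OFF, MODE_DEF_ONLY, MODE_INF_ONLY, MODE_ALWAYS
};

/* Map the option string to a mode; unknown strings fall back to 'on'
 * (an assumption made for this sketch). */
static enum dfltcc_mode_sketch parse_mode_sketch(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "off") == 0)
        return MODE_OFF;
    if (strcmp(arg, "def_only") == 0)
        return MODE_DEF_ONLY;
    if (strcmp(arg, "inf_only") == 0)
        return MODE_INF_ONLY;
    if (strcmp(arg, "always") == 0)
        return MODE_ALWAYS;
    return MODE_ON;
}

/* Hardware compression: level 1 only, unless 'always' is selected. */
static bool use_hw_deflate_sketch(enum dfltcc_mode_sketch mode, int level)
{
    switch (mode) {
    case MODE_ALWAYS:
        return true;
    case MODE_ON:
    case MODE_DEF_ONLY:
        return level == 1;
    default:
        return false;
    }
}

/* Hardware decompression does not depend on the compression level. */
static bool use_hw_inflate_sketch(enum dfltcc_mode_sketch mode)
{
    return mode == MODE_ON || mode == MODE_INF_ONLY || mode == MODE_ALWAYS;
}
```

For example, with dfltcc=on a level 6 stream would be compressed in
software but still decompressed in hardware.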

The main purpose of integrating NXU support into the kernel zlib is
the use of hardware deflate in the btrfs filesystem with on-the-fly
compression enabled.  Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image.

With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch
6) the following performance results have been achieved using the
ramdisk with btrfs.  These are relative numbers based on throughput rate
and compression ratio for zlib level 1:

  Input data              Deflate rate   Inflate rate   Compression ratio
                          NXU/Software   NXU/Software   NXU/Software
  stream of zeroes        1.46           1.02           1.00
  random ASCII data       10.44          3.00           0.96
  ASCII text (dickens)    6.21           3.33           0.94
  binary data (vmlinux)   8.37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (for
data compressed at any level) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using DD
command-line utility on btrfs on a Fedora 30 based internal driver in
native LPAR on a z15 system.  Results may vary based on individual
workload, configuration and software levels.

This patch (of 9):

Create zlib_dfltcc library with the s390 DEFLATE CONVERSION CALL
implementation and related compression functions.  Update zlib_deflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware deflate.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Mikhail Zaslonko <[email protected]>
Co-developed-by: Ilya Leoshkevich <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Eduard Shishkin <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Richard Purdie <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
mzaslonk authored and torvalds committed Jan 31, 2020
1 parent f88b426 commit aa5b395
Showing 11 changed files with 751 additions and 102 deletions.
7 changes: 7 additions & 0 deletions lib/Kconfig
@@ -278,6 +278,13 @@ config ZLIB_DEFLATE
 	tristate
 	select BITREVERSE

+config ZLIB_DFLTCC
+	def_bool y
+	depends on S390
+	prompt "Enable s390x DEFLATE CONVERSION CALL support for kernel zlib"
+	help
+	  Enable s390x hardware support for zlib in the kernel.
+
 config LZO_COMPRESS
 	tristate

1 change: 1 addition & 0 deletions lib/Makefile
@@ -140,6 +140,7 @@ obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_BCH) += bch.o
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
79 changes: 41 additions & 38 deletions lib/zlib_deflate/deflate.c
@@ -52,16 +52,18 @@
 #include <linux/zutil.h>
 #include "defutil.h"

+/* architecture-specific bits */
+#ifdef CONFIG_ZLIB_DFLTCC
+# include "../zlib_dfltcc/dfltcc.h"
+#else
+#define DEFLATE_RESET_HOOK(strm) do {} while (0)
+#define DEFLATE_HOOK(strm, flush, bstate) 0
+#define DEFLATE_NEED_CHECKSUM(strm) 1
+#endif

 /* ===========================================================================
  * Function prototypes.
  */
-typedef enum {
-    need_more, /* block not completed, need more input or more output */
-    block_done, /* block flush performed */
-    finish_started, /* finish started, need only more output at next deflate */
-    finish_done /* finish done, accept no more input or output */
-} block_state;

 typedef block_state (*compress_func) (deflate_state *s, int flush);
 /* Compression function. Returns the block state after the call. */
@@ -72,7 +74,6 @@ static block_state deflate_fast (deflate_state *s, int flush);
 static block_state deflate_slow (deflate_state *s, int flush);
 static void lm_init (deflate_state *s);
 static void putShortMSB (deflate_state *s, uInt b);
-static void flush_pending (z_streamp strm);
 static int read_buf (z_streamp strm, Byte *buf, unsigned size);
 static uInt longest_match (deflate_state *s, IPos cur_match);

@@ -98,6 +99,25 @@ static void check_match (deflate_state *s, IPos start, IPos match,
  * See deflate.c for comments about the MIN_MATCH+1.
  */

+/* Workspace to be allocated for deflate processing */
+typedef struct deflate_workspace {
+    /* State memory for the deflator */
+    deflate_state deflate_memory;
+#ifdef CONFIG_ZLIB_DFLTCC
+    /* State memory for s390 hardware deflate */
+    struct dfltcc_state dfltcc_memory;
+#endif
+    Byte *window_memory;
+    Pos *prev_memory;
+    Pos *head_memory;
+    char *overlay_memory;
+} deflate_workspace;
+
+#ifdef CONFIG_ZLIB_DFLTCC
+/* dfltcc_state must be doubleword aligned for DFLTCC call */
+static_assert(offsetof(struct deflate_workspace, dfltcc_memory) % 8 == 0);
+#endif
+
 /* Values for max_lazy_match, good_match and max_chain_length, depending on
  * the desired pack level (0..9). The values given below have been tuned to
  * exclude worst case performance for pathological files. Better values may be
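
The compile-time alignment check in the hunk above can be tried out in
userspace with the sketch below.  The two state structures are
stand-ins with arbitrary sizes (not the real deflate_state or
dfltcc_state), and the explicit alignas(8) is an assumption used here to
model a member that needs doubleword alignment.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>   /* offsetof */

/* Stand-in for the software state; 44 bytes, byte alignment only. */
struct sw_state_sketch {
    char bytes[44];
};

/* Stand-in for the hardware state; requires 8-byte alignment. */
struct hw_state_sketch {
    alignas(8) unsigned char param_block[32];
};

struct workspace_sketch {
    struct sw_state_sketch sw;
    struct hw_state_sketch hw; /* compiler inserts padding before this */
    unsigned char *window;
};

/* Fails the build if a later layout change breaks the alignment. */
static_assert(offsetof(struct workspace_sketch, hw) % 8 == 0,
              "hw state must be doubleword aligned");
```

The offset check only helps if the workspace itself starts at a
suitably aligned address, which is the case for memory obtained from a
general-purpose allocator.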
@@ -207,7 +227,15 @@ int zlib_deflateInit2(
      */
     next = (char *) mem;
     next += sizeof(*mem);
+#ifdef CONFIG_ZLIB_DFLTCC
+    /*
+     * DFLTCC requires the window to be page aligned.
+     * Thus, we overallocate and take the aligned portion of the buffer.
+     */
+    mem->window_memory = (Byte *) PTR_ALIGN(next, PAGE_SIZE);
+#else
     mem->window_memory = (Byte *) next;
+#endif
     next += zlib_deflate_window_memsize(windowBits);
     mem->prev_memory = (Pos *) next;
     next += zlib_deflate_prev_memsize(windowBits);
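
The over-allocate-then-align approach used for the window can be
illustrated in userspace as follows.  This is a sketch under stated
assumptions: a 4096-byte page, malloc() in place of the kernel
allocator, and hypothetical helper names; the kernel code uses
PTR_ALIGN() and PAGE_SIZE instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH 4096u /* assumed page size for this example */

/* Round p up to the next multiple of align (align is a power of two). */
static unsigned char *ptr_align_up_sketch(unsigned char *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + align - 1) & ~(align - 1);
    return p + (aligned - addr);
}

/*
 * Return window_size usable bytes that start on a page boundary.
 * *raw receives the pointer that must later be passed to free().
 */
static unsigned char *alloc_window_sketch(size_t window_size, void **raw)
{
    /* One extra page guarantees an aligned start fits in the buffer. */
    *raw = malloc(window_size + PAGE_SIZE_SKETCH);
    if (*raw == NULL)
        return NULL;
    return ptr_align_up_sketch(*raw, PAGE_SIZE_SKETCH);
}
```

At most PAGE_SIZE - 1 bytes are skipped at the front, which is why
adding a single page to the requested size is sufficient.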
@@ -277,6 +305,8 @@ int zlib_deflateReset(
     zlib_tr_init(s);
     lm_init(s);

+    DEFLATE_RESET_HOOK(strm);
+
     return Z_OK;
 }

@@ -294,35 +324,6 @@ static void putShortMSB(
     put_byte(s, (Byte)(b & 0xff));
 }

-/* =========================================================================
- * Flush as much pending output as possible. All deflate() output goes
- * through this function so some applications may wish to modify it
- * to avoid allocating a large strm->next_out buffer and copying into it.
- * (See also read_buf()).
- */
-static void flush_pending(
-    z_streamp strm
-)
-{
-    deflate_state *s = (deflate_state *) strm->state;
-    unsigned len = s->pending;
-
-    if (len > strm->avail_out) len = strm->avail_out;
-    if (len == 0) return;
-
-    if (strm->next_out != NULL) {
-        memcpy(strm->next_out, s->pending_out, len);
-        strm->next_out += len;
-    }
-    s->pending_out += len;
-    strm->total_out += len;
-    strm->avail_out -= len;
-    s->pending -= len;
-    if (s->pending == 0) {
-        s->pending_out = s->pending_buf;
-    }
-}
-
 /* ========================================================================= */
 int zlib_deflate(
     z_streamp strm,
@@ -404,7 +405,8 @@ int zlib_deflate(
         (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
         block_state bstate;

-        bstate = (*(configuration_table[s->level].func))(s, flush);
+        bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate :
+                 (*(configuration_table[s->level].func))(s, flush);

         if (bstate == finish_started || bstate == finish_done) {
             s->status = FINISH_STATE;
@@ -503,7 +505,8 @@ static int read_buf(

     strm->avail_in -= len;

-    if (!((deflate_state *)(strm->state))->noheader) {
+    if (!DEFLATE_NEED_CHECKSUM(strm)) {}
+    else if (!((deflate_state *)(strm->state))->noheader) {
         strm->adler = zlib_adler32(strm->adler, strm->next_in, len);
     }
     memcpy(buf, strm->next_in, len);
54 changes: 0 additions & 54 deletions lib/zlib_deflate/deftree.c
@@ -76,11 +76,6 @@ static const uch bl_order[BL_CODES]
  * probability, to avoid transmitting the lengths for unused bit length codes.
  */

-#define Buf_size (8 * 2*sizeof(char))
-/* Number of bits used within bi_buf. (bi_buf might be implemented on
- * more than 16 bits on some systems.)
- */
-
 /* ===========================================================================
  * Local data. These are initialized only once.
  */
@@ -147,7 +142,6 @@ static void send_all_trees (deflate_state *s, int lcodes, int dcodes,
 static void compress_block (deflate_state *s, ct_data *ltree,
                             ct_data *dtree);
 static void set_data_type (deflate_state *s);
-static void bi_windup (deflate_state *s);
 static void bi_flush (deflate_state *s);
 static void copy_block (deflate_state *s, char *buf, unsigned len,
                         int header);
@@ -169,54 +163,6 @@ static void copy_block (deflate_state *s, char *buf, unsigned len,
  * used.
  */

-/* ===========================================================================
- * Send a value on a given number of bits.
- * IN assertion: length <= 16 and value fits in length bits.
- */
-#ifdef DEBUG_ZLIB
-static void send_bits (deflate_state *s, int value, int length);
-
-static void send_bits(
-    deflate_state *s,
-    int value, /* value to send */
-    int length /* number of bits */
-)
-{
-    Tracevv((stderr," l %2d v %4x ", length, value));
-    Assert(length > 0 && length <= 15, "invalid length");
-    s->bits_sent += (ulg)length;
-
-    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
-     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
-     * unused bits in value.
-     */
-    if (s->bi_valid > (int)Buf_size - length) {
-        s->bi_buf |= (value << s->bi_valid);
-        put_short(s, s->bi_buf);
-        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
-        s->bi_valid += length - Buf_size;
-    } else {
-        s->bi_buf |= value << s->bi_valid;
-        s->bi_valid += length;
-    }
-}
-#else /* !DEBUG_ZLIB */
-
-#define send_bits(s, value, length) \
-{ int len = length;\
-  if (s->bi_valid > (int)Buf_size - len) {\
-    int val = value;\
-    s->bi_buf |= (val << s->bi_valid);\
-    put_short(s, s->bi_buf);\
-    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
-    s->bi_valid += len - Buf_size;\
-  } else {\
-    s->bi_buf |= (value) << s->bi_valid;\
-    s->bi_valid += len;\
-  }\
-}
-#endif /* DEBUG_ZLIB */
-
 /* ===========================================================================
  * Initialize the various 'constant' tables. In a multi-threaded environment,
  * this function may be called by two threads concurrently, but this is
134 changes: 124 additions & 10 deletions lib/zlib_deflate/defutil.h
@@ -1,5 +1,7 @@
+#ifndef DEFUTIL_H
+#define DEFUTIL_H

-
+#include <linux/zutil.h>

 #define Assert(err, str)
 #define Trace(dummy)
@@ -238,17 +240,13 @@ typedef struct deflate_state {

 } deflate_state;

-typedef struct deflate_workspace {
-    /* State memory for the deflator */
-    deflate_state deflate_memory;
-    Byte *window_memory;
-    Pos *prev_memory;
-    Pos *head_memory;
-    char *overlay_memory;
-} deflate_workspace;
-
+#ifdef CONFIG_ZLIB_DFLTCC
+#define zlib_deflate_window_memsize(windowBits) \
+    (2 * (1 << (windowBits)) * sizeof(Byte) + PAGE_SIZE)
+#else
 #define zlib_deflate_window_memsize(windowBits) \
     (2 * (1 << (windowBits)) * sizeof(Byte))
+#endif
 #define zlib_deflate_prev_memsize(windowBits) \
     ((1 << (windowBits)) * sizeof(Pos))
 #define zlib_deflate_head_memsize(memLevel) \
@@ -292,6 +290,24 @@ void zlib_tr_stored_type_only (deflate_state *);
     put_byte(s, (uch)((ush)(w) >> 8)); \
 }

+/* ===========================================================================
+ * Reverse the first len bits of a code, using straightforward code (a faster
+ * method would use a table)
+ * IN assertion: 1 <= len <= 15
+ */
+static inline unsigned bi_reverse(
+    unsigned code, /* the value to invert */
+    int len /* its bit length */
+)
+{
+    register unsigned res = 0;
+    do {
+        res |= code & 1;
+        code >>= 1, res <<= 1;
+    } while (--len > 0);
+    return res >> 1;
+}
+
 /* ===========================================================================
  * Flush the bit buffer, keeping at most 7 bits in it.
  */
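
To make the effect of bi_reverse() concrete, here is a standalone copy
of the same loop with a precondition check added.  The _sketch suffix
marks it as an illustration; the logic is taken from the hunk above.

```c
#include <assert.h>

/* Reverse the lowest len bits of code; 1 <= len <= 15. */
static unsigned bi_reverse_sketch(unsigned code, int len)
{
    unsigned res = 0;

    assert(len >= 1 && len <= 15);
    do {
        res |= code & 1; /* take the current lowest bit of code */
        code >>= 1;
        res <<= 1;       /* make room for the next bit */
    } while (--len > 0);
    return res >> 1;     /* undo the final extra shift */
}
```

For example, the 3-bit code 011 becomes 110, and a single set low bit
in a 15-bit code ends up in bit 14.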
@@ -325,3 +341,101 @@ static inline void bi_windup(deflate_state *s)
 #endif
 }

+typedef enum {
+    need_more, /* block not completed, need more input or more output */
+    block_done, /* block flush performed */
+    finish_started, /* finish started, need only more output at next deflate */
+    finish_done /* finish done, accept no more input or output */
+} block_state;
+
+#define Buf_size (8 * 2*sizeof(char))
+/* Number of bits used within bi_buf. (bi_buf might be implemented on
+ * more than 16 bits on some systems.)
+ */
+
+/* ===========================================================================
+ * Send a value on a given number of bits.
+ * IN assertion: length <= 16 and value fits in length bits.
+ */
+#ifdef DEBUG_ZLIB
+static void send_bits (deflate_state *s, int value, int length);
+
+static void send_bits(
+    deflate_state *s,
+    int value, /* value to send */
+    int length /* number of bits */
+)
+{
+    Tracevv((stderr," l %2d v %4x ", length, value));
+    Assert(length > 0 && length <= 15, "invalid length");
+    s->bits_sent += (ulg)length;
+
+    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
+     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
+     * unused bits in value.
+     */
+    if (s->bi_valid > (int)Buf_size - length) {
+        s->bi_buf |= (value << s->bi_valid);
+        put_short(s, s->bi_buf);
+        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
+        s->bi_valid += length - Buf_size;
+    } else {
+        s->bi_buf |= value << s->bi_valid;
+        s->bi_valid += length;
+    }
+}
+#else /* !DEBUG_ZLIB */
+
+#define send_bits(s, value, length) \
+{ int len = length;\
+  if (s->bi_valid > (int)Buf_size - len) {\
+    int val = value;\
+    s->bi_buf |= (val << s->bi_valid);\
+    put_short(s, s->bi_buf);\
+    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
+    s->bi_valid += len - Buf_size;\
+  } else {\
+    s->bi_buf |= (value) << s->bi_valid;\
+    s->bi_valid += len;\
+  }\
+}
+#endif /* DEBUG_ZLIB */
+
+static inline void zlib_tr_send_bits(
+    deflate_state *s,
+    int value,
+    int length
+)
+{
+    send_bits(s, value, length);
+}
+
+/* =========================================================================
+ * Flush as much pending output as possible. All deflate() output goes
+ * through this function so some applications may wish to modify it
+ * to avoid allocating a large strm->next_out buffer and copying into it.
+ * (See also read_buf()).
+ */
+static inline void flush_pending(
+    z_streamp strm
+)
+{
+    deflate_state *s = (deflate_state *) strm->state;
+    unsigned len = s->pending;
+
+    if (len > strm->avail_out) len = strm->avail_out;
+    if (len == 0) return;
+
+    if (strm->next_out != NULL) {
+        memcpy(strm->next_out, s->pending_out, len);
+        strm->next_out += len;
+    }
+    s->pending_out += len;
+    strm->total_out += len;
+    strm->avail_out -= len;
+    s->pending -= len;
+    if (s->pending == 0) {
+        s->pending_out = s->pending_buf;
+    }
+}
 #endif /* DEFUTIL_H */
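
The bit-packing done by send_bits() in this header can be exercised
outside the kernel with the reduced model below.  The struct, its field
sizes and the function names are assumptions made for this sketch; only
the branch structure (flush 16 bits when the new value does not fit,
otherwise OR it in) follows the macro above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_BITS 16 /* width of the bit buffer, as with Buf_size */

/* Reduced stand-in for the parts of deflate_state that send_bits uses. */
struct bitwriter_sketch {
    uint16_t bi_buf;       /* bits waiting to be written, LSB first */
    int bi_valid;          /* number of valid bits in bi_buf */
    unsigned char out[64]; /* pending output bytes */
    size_t pending;        /* number of bytes stored in out */
};

/* Store a 16-bit value, least significant byte first. */
static void put_short_sketch(struct bitwriter_sketch *w, uint16_t v)
{
    assert(w->pending + 2 <= sizeof(w->out));
    w->out[w->pending++] = (unsigned char)(v & 0xff);
    w->out[w->pending++] = (unsigned char)(v >> 8);
}

/* Append the low 'length' bits of value to the bit stream. */
static void send_bits_sketch(struct bitwriter_sketch *w, unsigned value,
                             int length)
{
    assert(length > 0 && length <= 15 && value < (1u << length));
    if (w->bi_valid > BUF_BITS - length) {
        /* Not enough room: fill the buffer, flush it, keep the rest. */
        w->bi_buf |= (uint16_t)(value << w->bi_valid);
        put_short_sketch(w, w->bi_buf);
        w->bi_buf = (uint16_t)(value >> (BUF_BITS - w->bi_valid));
        w->bi_valid += length - BUF_BITS;
    } else {
        w->bi_buf |= (uint16_t)(value << w->bi_valid);
        w->bi_valid += length;
    }
}
```

Sending a 12-bit value followed by an 8-bit value, for instance, fills
the buffer, emits two bytes and leaves the top four bits of the second
value waiting in bi_buf.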