Message ID | cover.1710969449.git.andre.glover@linux.intel.com |
---|---|
Headers | show |
Series | crypto: Add new compression modes for zlib and IAA | expand |
Hi Eric, Thank you for reviewing the patch. Please see responses to your questions inline below. On Thu, 2024-03-28 at 19:46 -0700, Eric Biggers wrote: > On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote: > > The 'canned' compression mode implements a compression scheme that > > uses a statically defined set of Huffman tables, but where the > > Deflate > > Block Header is implied rather than stored with the compressed > > data. > > This already exists in standard DEFLATE; it's called fixed mode. See > section > 3.2.6 of RFC1951 > (https://datatracker.ietf.org/doc/html/rfc1951#page-12). > > I think that what's going on is that you've implemented a custom > variant of > DEFLATE where you set the fixed Huffman codes to something different > from the > ones defined in the standard. > > Is that correct, or are there other differences? > We view it as a variant of dynamic block Deflate as opposed to a variant of fixed. In particular, it is compressing the input with static Huffman tables (i.e. ones that do not vary with the input), and where the Deflate block header (which is a constant) is not stored with the compressed data. If the missing block header were to be prepended to the compressed data, the result would be a valid dynamic compressed block. One might think of this as vaguely similar to dictionary compression. In that case, the dictionary is not stored with the compressed data but is agreed to by the compressor and decompression. In the case of canned compression, the header is not stored with the compressed data but is agreed to by both entities. > Actually, looking at your zlib_tr_flush_block(), it looks instead of > using the > reserved block type value (3) or redefining the meaning of the fixed > block type > value (1), you actually deleted the BTYPE and BFINAL fields from the > data stream > entirely. So the stream no longer stores the type of block or the > flag that > indicates whether the block is the final one or not. > > That has the property that there cannot be any standard blocks, even > uncompressed blocks, included in the data stream anymore. Is that > intentional? > Conceptually, there is a valid dynamic block header associated with the compressed data, it is just not stored with the data in order to save space (since it is effectively an unchanging constant). In this usage, it is envisioned that the output would consist of a single Deflate block (i.e. the implied header is marked as BFINAL). In theory, one could have the implied block header not marked as BFINAL, and so the compressed data would need to contain at least two blocks (i.e. the body of the initial block, an EOB, and a normal block with header), but this would be counterproductive to the intended usage. > Maybe this is why you're using the name "canned", instead of going > with > something more consistent with the existing "fixed" name, like > "custom-fixed"? > Again, we view this as a variant of dynamic compression rather than as a variant of fixed compression. We use the term "static" to describe compression with a dynamic block where the tables are unchanging (i.e. not dependent on the input data) . This can allow the compression to be done in one pass rather than two. The "canned" compression uses a static table, but is different in that the header is implied rather than stored. > I wonder what the plan is for when the next hardware vendor tries to > do this and > chooses their own Huffman codes, different from yours. Or what if > Intel decides > the Huffman codes they chose aren't the best ones anymore and > releases new > hardware that uses different codes. Will we perhaps be getting a > tinned mode > too? > The Huffman tables are not built into the hardware. The Huffman tables/block header is in this case built into the software. The same tables are embedded in the zlib portion of the patch for software compression/decompression and in the hardware driver portion of the patch for IAA compression/decompression. The hardware itself can work with any desired set of tables. Thanks, Andre
On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote: > > Below is a table showing the latency improvements with zlib, between > zlib dynamic and zlib canned modes, and the compression ratio for > each mode while using a set of 4300 4KB pages sampled from SPEC > CPU17 workloads: > _________________________________________________________ > | Zlib Level | Canned Latency Gain | Comp Ratio | > |------------|-----------------------|------------------| > | | compress | decompress | dynamic | canned | > |____________|__________|____________|_________|________| > | 1 | 49% | 29% | 3.16 | 2.92 | > |------------|----------|------------|---------|--------| > | 6 | 27% | 28% | 3.35 | 3.09 | > |------------|----------|------------|---------|--------| > | 9 | 12% | 29% | 3.36 | 3.11 | > |____________|__________|____________|_________|________| So which kernel user (zswap I presume) is clamouring for this feature? We don't add new algorithms that have no in-kernel users. So we need to be sure that the kernel user actually want this. Thanks,
Hi Herbert, On Fri, 2024-04-05 at 15:07 +0800, Herbert Xu wrote: > On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote: > > > > Below is a table showing the latency improvements with zlib, > > between > > zlib dynamic and zlib canned modes, and the compression ratio for > > each mode while using a set of 4300 4KB pages sampled from SPEC > > CPU17 workloads: > > _________________________________________________________ > > > Zlib Level | Canned Latency Gain | Comp Ratio | > > > ------------|-----------------------|------------------| > > > | compress | decompress | dynamic | canned | > > > ____________|__________|____________|_________|________| > > > 1 | 49% | 29% | 3.16 | 2.92 | > > > ------------|----------|------------|---------|--------| > > > 6 | 27% | 28% | 3.35 | 3.09 | > > > ------------|----------|------------|---------|--------| > > > 9 | 12% | 29% | 3.36 | 3.11 | > > > ____________|__________|____________|_________|________| > > So which kernel user (zswap I presume) is clamouring for this > feature? We don't add new algorithms that have no in-kernel > users. So we need to be sure that the kernel user actually > want this. > > Thanks, Hi Herbert, We have recently submitted an RFC to zswap and zram maintainers and users for by_n compression with Intel IAA [1] feedback. This work is in support of efforts to swap in/out large and multi-sized folios. With by_n compression, we have created a scheme that allows parallel IAA compression and decompression operations on a single folio resulting in performance gains. Currently the by_n scheme uses the canned mode compression algorithm to perform the compression and decompression operations. Using canned mode compression results in reduced compression latency because the deflate header doesnt need to be created dynamically, while also producing better ratio than Deflate Fixed mode. We would appreciate your feedback on this scheme. Here is data from the RFC showing a performance comparison for 64KB folio swap in/out with zram on Sapphire Rapids, whose core frequency is fixed at 2500MHz: +------------+-------------+---------+-------------+----------+-------+ | | Compression | Decomp | Compression | zram | zram | | Algorithm | latency | latency | ratio | write | read | +------------+-------------+---------+-------------+----------+-------+ | | Median (ns) | | Median (ns) | +------------+-------------+---------+-------------+----------+-------+ | | | | | | | | IAA by_1 | 34,493 | 20,038 | 2.93 | 40,130 | 24,478| | IAA by_2 | 18,830 | 11,888 | 2.93 | 24,149 | 15,536| | IAA by_4 | 11,364 | 8,146 | 2.90 | 16,735 | 11,469| | IAA by_8 | 8,344 | 6,342 | 2.77 | 13,527 | 9,177| | IAA by_16 | 8,837 | 6,549 | 2.33 | 15,309 | 9,547| | IAA by_32 | 11,153 | 9,641 | 2.19 | 16,457 | 14,086| | IAA by_64 | 18,272 | 16,696 | 1.96 | 24,294 | 20,048| | | | | | | | | lz4 | 139,190 | 33,687 | 2.40 | 144,940 | 37,312| | | | | | | | | lzo-rle | 138,235 | 61,055 | 2.52 | 143,666 | 64,321| | | | | | | | | zstd | 251,820 | 90,878 | 3.40 | 256,384 | 94,328| +------------+-------------+---------+-------------+----------+-------+ [1]https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux. intel.com/