Restriping the output of a compressor for storage
This document specifies the transformation of the output of a data
compressor into a smaller set of larger output slices. The primary use
case is as a backend of the memcached keyv::Map, which has a maximum
value size of one megabyte.
The new compression plugin API: @subpage data
* data::CompressorInfo
* max output slice size
* n output slices of size <= max output slice size
* Uncompressed output if the data is uncompressible, with zero-copy
  during compression and decompression
For an input of n output slices (see above), the slicer produces the
struct Result { uint8_t* data; uint32_t size; };
typedef std::vector< Result > Results; //!< Set of result slices
typedef std::vector< uint32_t > ResultSizes; //!< Remaining slice sizes
Slicer( const CompressorInfo& compressor );
// returned pointers are valid until next compress(), delete of
// input data, or dtor of Slicer called
Results&& compress( const uint8_t* data, size_t size,
// input: first slice, output: remaining slice sizes
ResultSizes&& getRemainingSizes( const uint8_t* data, uint32_t size );
// input: first slice, output: total decompressed data size
size_t getDecompressedSize( const uint8_t* data, uint32_t size );
/** @overload convenience wrapper */
void decompress( const Results& input, uint8_t* data );
compress() allocates a compressor and compresses the input data. Output
is uncompressible if pression::getDataSize() exceeds input size minus
Uncompressible output is returned as:
* one zero-copy slice if size <= sliceSize
* one header slice: 16 byte magic 'uncompressed', 8 byte input size,
* n zero-copy output slices of sliceSize, pointing to input data memory
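
The zero-copy slices of the uncompressible case can be sketched as follows. This is a minimal illustration, not the Slicer implementation: `sliceZeroCopy` is a hypothetical helper, the header slice is left out, and the sketch assumes that the last slice may be shorter than sliceSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the Result struct of the Slicer API. It holds a non-const pointer,
// so the sketch casts the input pointer instead of copying the data.
struct Result { uint8_t* data; uint32_t size; };
typedef std::vector< Result > Results;

// Hypothetical helper: split the input into slices of at most sliceSize bytes
Results sliceZeroCopy( const uint8_t* data, const size_t size,
                       const uint32_t sliceSize )
{
    assert( sliceSize > 0 );
    Results slices;
    for( size_t offset = 0; offset < size; offset += sliceSize )
    {
        const size_t left = size - offset;
        const uint32_t n = uint32_t( std::min< size_t >( left, sliceSize ));
        // points into the caller's memory, no copy is made
        slices.push_back( Result{ const_cast< uint8_t* >( data ) + offset, n });
    }
    return slices;
}
```

Since the slices alias the input buffer, they are only usable while that buffer is alive, which matches the pointer validity note in the API comments.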
Compressible output is returned as:
* one header slice: 16 byte magic 'compressed', 16 byte compressor name hash,
  8 byte input size, 4 byte nChunks, nChunks * 4 byte chunkSizes
* nSlices: complete, compressed chunks up to sliceSize
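
As an illustration of the header slice for compressed output, the following sketch serializes the listed fields in order. `makeHeader` is a hypothetical name, and two details are assumptions not stated in this document: the magic is the ASCII string zero-padded to 16 bytes, and integer fields are written in host byte order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: build the header slice for compressed output
std::vector< uint8_t > makeHeader( const std::array< uint8_t, 16 >& nameHash,
                                   const uint64_t inputSize,
                                   const std::vector< uint32_t >& chunkSizes )
{
    static const char magic[16] = "compressed"; // assumption: zero-padded
    const uint32_t nChunks = uint32_t( chunkSizes.size( ));
    const size_t tableSize = size_t( nChunks ) * 4;

    // 16 byte magic + 16 byte name hash + 8 byte input size + 4 byte nChunks
    std::vector< uint8_t > header( 16 + 16 + 8 + 4 + tableSize );
    uint8_t* out = header.data();
    std::memcpy( out, magic, 16 );           out += 16;
    std::memcpy( out, nameHash.data(), 16 ); out += 16;
    std::memcpy( out, &inputSize, 8 );       out += 8;
    std::memcpy( out, &nChunks, 4 );         out += 4;
    if( nChunks > 0 )
        std::memcpy( out, chunkSizes.data(), tableSize );
    assert( out + tableSize == header.data() + header.size( ));
    return header;
}
```

With this layout the header occupies 44 + 4 * nChunks bytes, so the number of chunks bounds whether the header fits into a single slice.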
The first implementation throws if the header size exceeds sliceSize for
compressed output, or if a chunk is bigger than a slice.
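
The two error conditions and the packing of complete chunks into slices might look like the sketch below. `packChunks` is a hypothetical helper, and the greedy first-fit packing is an assumption: this document only requires that each slice holds complete chunks up to sliceSize.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the number of chunks placed in each slice
std::vector< uint32_t > packChunks( const std::vector< uint32_t >& chunkSizes,
                                    const uint32_t headerSize,
                                    const uint32_t sliceSize )
{
    assert( sliceSize > 0 );
    if( headerSize > sliceSize )
        throw std::runtime_error( "header does not fit into one slice" );

    std::vector< uint32_t > chunksPerSlice;
    uint64_t used = 0; // bytes used in the current slice
    for( const uint32_t chunkSize : chunkSizes )
    {
        if( chunkSize > sliceSize )
            throw std::runtime_error( "chunk is bigger than a slice" );
        if( chunksPerSlice.empty() || used + chunkSize > sliceSize )
        {
            chunksPerSlice.push_back( 0 ); // start a new slice
            used = 0;
        }
        ++chunksPerSlice.back();
        used += chunkSize;
    }
    return chunksPerSlice;
}
```

Chunks are never split across slices in this sketch, which is why a single oversized chunk has to be rejected rather than stored.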
void Keyv::memcached::Plugin::insert( const std::string& key,
                                      const void* ptr, const size_t size )
const auto data = _slicer.compress( ptr, size, LB_1MB );
const std::string& hash = servus::make_uint128( key ).getString();
for( const auto& slice : data )
    memcached_set( _instance, hash.c_str(), hash.length(),
                   slice.data, slice.size, (time_t)0, (uint32_t)0 );
std::string Keyv::memcached::Plugin::operator [] ( const std::string& key )
const std::string& hash = servus::make_uint128( key ).getString();
pression::data::Slicer::Results slices( 1 );
slices[0].data = memcached_get( _instance, hash.c_str(), hash.length(),
const auto remaining = _slicer.getRemainingSizes( slices[0].data,
slices.append( takeValues( hash, remaining ));
std::string value( _slicer.getDecompressedSize( slices[0].data,
_slicer.decompress( slices, value.data(), value.length( ));
### Issue 1: What is the maximum allowed slice size?
It is unlikely that a storage system uses larger slices. Memcached has
a recommended limit of one megabyte.