* Thu Sep 12 2024 Andrea Manzini <andrea.manzini@suse.com>
- update to 9.6.1:
* Fix correctness of MultiGet across column families with user timestamp.
- update to 9.6.0:
- New Features
* Best efforts recovery supports recovering to incomplete Version with a
clean seqno cut that presents a valid point in time view from the user's
perspective, if versioning history doesn't include atomic flush.
* New option BlockBasedTableOptions::decouple_partitioned_filters should
improve efficiency in serving read queries because filter and index
partitions can consistently target the configured metadata_block_size.
This option is currently opt-in.
* Introduce a new mutable CF option paranoid_memory_checks. It enables
additional validation on data integrity during reads/scanning. Currently,
skip list based memtable will validate key ordering during look up and scans.
- Public API Changes
* Add ticker stats to count file read retries due to checksum mismatch
* Adds optional installation callback function for remote compaction
- Behavior Changes
* There may be less intra-L0 compaction triggered by total L0 size being too
small. We now use compensated file size (tombstones are assigned some value
size) when calculating L0 size and reduce the threshold for L0 size limit.
This is to avoid accumulating too much data/tombstones in L0.
- Bug Fixes
* Make DestroyDB support slow deletion when it's configured in SstFileManager.
The slow deletion is subject to the configured rate_bytes_per_sec, but not
subject to the max_trash_db_ratio.
* Fixed a bug where we set unprep_seqs_ even when WriteImpl() fails. This was
caught by stress test write fault injection in WriteImpl(). This may have
incorrectly caused iteration creation failure for unvalidated writes or
returned wrong result for WriteUnpreparedTxn::GetUnpreparedSequenceNumbers().
* Fixed a bug where a successful write right after error recovery for the
last failed write finishes causes duplicate WAL entries.
* Fixed a data race involving the background error status in unordered_write
mode.
* Fix a bug where file snapshot functions like backup and checkpoint may
attempt to copy a non-existing manifest file (#12882).
* Fix a bug where per-KV checksum corruption may be ignored in MultiGet().
* Fix a race condition in pessimistic transactions that could allow multiple
transactions with the same name to be registered simultaneously, resulting
in a crash or other unpredictable behavior.
* Wed Aug 28 2024 Andrea Manzini <andrea.manzini@suse.com>
- update to 9.5.2:
* Fix a race condition in pessimistic transactions that could allow
multiple transactions with the same name to be registered simultaneously,
resulting in a crash or other unpredictable behavior.
* Add ticker stats to count file read retries due to checksum mismatch
- update to 9.5.1:
* Make DestroyDB support slow deletion when it's configured in
SstFileManager. The slow deletion is subject to the configured
rate_bytes_per_sec, but not subject to the max_trash_db_ratio.
- update to 9.5.0:
* Introduced new C API function rocksdb_writebatch_iterate_cf for column
family-aware iteration over the contents of a WriteBatch
* Add support to ingest SST files generated by a DB instead of SstFileWriter.
This can be enabled with experimental option
IngestExternalFileOptions::allow_db_generated_files.
* When calculating total log size for the log_size_for_flush argument
in CreateCheckpoint API, the size of the archived log will not be
included to avoid unnecessary flush
* Fix a major bug in which an iterator using prefix filtering and SeekForPrev
might miss data when the DB is using whole_key_filtering=false and
partition_filters=true.
* Fixed a bug where OnErrorRecoveryBegin() is not called before auto
recovery starts.
* Fixed a bug where event listener reads ErrorHandler's bg_error_ member
without holding the DB mutex (#12803).
* Fixed a bug in handling MANIFEST write error that caused the latest valid
MANIFEST file to get deleted, resulting in the DB being unopenable.
* Fixed a race between error recovery due to manifest sync or write failure
and external SST file ingestion. Both attempt to write a new manifest file,
which causes an assertion failure.
* Fix an issue where compactions were opening table files and reading table
properties while holding db mutex_.
* Reduce unnecessary filesystem queries and DB mutex acquires in creating
backups and checkpoints.
* Sat Jul 13 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 9.4.0:
* Added a CompactForTieringCollectorFactory to auto trigger
compaction for tiering use case.
* Optimistic transactions and pessimistic transactions with the
WriteCommitted policy now support the GetEntityForUpdate API.
* Added a new "count" command to the ldb repl shell. By default,
it prints a count of keys in the database from start to end.
The options --from= and/or --to= can be specified to limit the
range.
* Deprecated names LogFile and VectorLogPtr in favor of new names
WalFile and VectorWalPtr.
* Introduce a new universal compaction option
CompactionOptionsUniversal::max_read_amp which allows users to
define the limit on the number of sorted runs separately from
the trigger for compaction (level0_file_num_compaction_trigger)
* Inactive WALs are immediately closed upon being fully sync-ed
rather than in a background thread.
* Bug Fixes
* Sat Jun 29 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 9.3.1:
* Optimistic transactions and pessimistic transactions with the
WriteCommitted policy now support the GetEntity API.
* Added new Iterator property, "rocksdb.iterator.is-value-pinned",
for checking whether the Slice returned by Iterator::value()
can be used until the Iterator is destroyed.
* Optimistic transactions and WriteCommitted pessimistic
transactions now support the MultiGetEntity API.
* Optimistic transactions and pessimistic transactions with the
WriteCommitted policy now support the PutEntity API. Support
for read APIs and other write policies (WritePrepared,
WriteUnprepared) will be added later.
* Exposed block based metadata cache options via C API
* Exposed compaction pri via C API.
* Add a kAdmPolicyAllowAll option to TieredAdmissionPolicy that
admits all blocks evicted from the primary block cache into
the compressed secondary cache.
* CompactRange() with change_level=true on a CF with FIFO
compaction will return Status::NotSupported().
* External file ingestion with FIFO compaction will always ingest
to L0.
* Bug fixes
* Thu May 23 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 9.2.1:
* Added two options deadline and max_size_bytes for CacheDumper
to exit early
* API for wide-column point lookups with read-your-own-writes
consistency, and a batched version of the same
* API to support programmatically reading an SST file as a raw
table file
* API to wait for background purge to complete
* DeleteRange() will return NotSupported() if row_cache is
configured since they don't work together in some cases
* Deprecated CompactionOptions::compression
* Using OptionChangeMigration() to migrate from non-FIFO to FIFO
compaction with
Options::compaction_options_fifo.max_table_files_size > 0 can
cause the whole DB to be dropped right after migration if the
migrated data is larger than max_table_files_size
* Various behavior changes, and changes of defaults
* On distributed file systems that support file system level
checksum verification and reconstruction reads, RocksDB will
now retry a file read if the initial read fails RocksDB block
level or record level checksum verification. This applies to
MANIFEST file reads when the DB is opened, and to SST file
reads at all times.
* Bug fixes
* Mon Apr 22 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 9.1.1:
* Added an option GetMergeOperandsOptions::continue_cb to give
users the ability to end GetMergeOperands()'s lookup process
before all merge operands were found.
* Add sanity checks for ingesting external files that currently
check whether the user key comparator used to create the file is
compatible with the column family's user key comparator.
* Support ingesting external files for a column family that has
the user-defined timestamps in memtable only feature enabled
* On file systems that support storage level data checksum and
reconstruction, retry SST block reads for point lookups, scans,
and flush and compaction if there's a checksum mismatch on the
initial read.
* Some enhancements and fixes to experimental Temperature handling
features, including new default_write_temperature CF option and
opening an SstFileWriter with a temperature.
* WriteBatchWithIndex now supports wide-column point lookups via
the GetEntityFromBatch API.
* Implement experimental features:
API Iterator::GetProperty("rocksdb.iterator.write-time") to
allow users to get data's approximate write unix time and write
data with a specific write time via WriteBatch::TimedPut API.
- drop rocksdb-9.0.0-Fix-zstd-typo-in-cmake.patch, upstream
* Thu Apr 18 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 9.0.1:
* Fix CMake Javadoc and source jar builds
* Fix Java SstFileMetaData to prevent throwing
java.lang.NoSuchMethodError
* Tue Mar 19 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 9.0.0:
* Provide support for FSBuffer for point lookups. Also added
support for scans and compactions that don't go through prefetching.
* Make SstFileWriter create SST files without persisting user
defined timestamps when the
Option.persist_user_defined_timestamps flag is set to false.
* Add support for user-defined timestamps in APIs
DeleteFilesInRanges and GetPropertiesOfTablesInRange.
* Mark wal_compression feature as production-ready. Currently
only compatible with ZSTD compression.
* Public API Changes, including incompatible changes
* format_version=6 is the new default setting in
BlockBasedTableOptions, for more robust data integrity
checking. DBs and SST files written with this setting cannot be
read by RocksDB versions before 8.6.0.
* Compactions can be scheduled in parallel in an additional
scenario: multiple files are marked for compaction within a
single column family
* For leveled compaction, RocksDB will try to do intra-L0
compaction if the total L0 size is small compared to Lbase.
Users with atomic_flush=true are more likely to see the impact
of this change.
* Bug Fixes
- add rocksdb-9.0.0-Fix-zstd-typo-in-cmake.patch
* Wed Feb 28 2024 Andrea Manzini <andrea.manzini@suse.com>
- update to 8.11.3:
* Bug Fixes
+ Fix a bug where older data of an ingested key can be returned for read when universal compaction is used
+ Apply appropriate rate limiting and priorities in more places.
- update to 8.11.0:
* New Features
+ Add new statistics: rocksdb.sst.write.micros measures time of each write to SST file
* Public API Changes
+ Added another enumerator kVerify to enum class FileOperationType in listener.h.
Update your switch statements as needed.
+ Add CompressionOptions to the CompressedSecondaryCacheOptions structure to allow users to specify
library specific options when creating the compressed secondary cache.
+ Deprecated several options: level_compaction_dynamic_file_size, ignore_max_compaction_bytes_for_input,
check_flush_compaction_key_order, flush_verify_memtable_count, compaction_verify_record_count,
fail_if_options_file_error, and enforce_single_del_contracts
+ Exposed options ttl via C API.
* Behavior Changes
+ rocksdb.blobdb.blob.file.write.micros expands to also measure time writing the header and footer.
Therefore the COUNT may be higher and values may be smaller than before. For stacked BlobDB,
it no longer measures the time of explicitly flushing the blob file.
+ Files will be compacted to the next level if the data age exceeds periodic_compaction_seconds
except for the last level.
+ Reduced the compaction debt ratio trigger for scheduling parallel compactions
+ For leveled compaction with default compaction pri (kMinOverlappingRatio),
files marked for compaction will be prioritized over files not marked when picking a file
from a level for compaction.
* Bug Fixes
+ Fix a bug in auto_readahead_size that, combined with IndexType::kBinarySearchWithFirstKey, fails
or lands the iterator at a wrong key
+ Fixed some cases in which DB file corruption was detected but ignored on creating a backup with BackupEngine.
+ Fix bugs where rocksdb.blobdb.blob.file.synced includes blob files that failed to get synced
and rocksdb.blobdb.blob.file.bytes.written includes blob bytes that failed to get written.
+ Fixed a possible memory leak or crash on a failure (such as I/O error)
in automatic atomic flush of multiple column families.
+ Fixed some cases of in-memory data corruption using mmap reads with BackupEngine, sst_dump, or ldb.
+ Fixed issues with experimental preclude_last_level_data_seconds option that could interfere
with expected data tiering.
+ Fixed the handling of the edge case when all existing blob files become unreferenced.
Such files are now correctly deleted.
* Wed Feb 21 2024 Andreas Stieger <andreas.stieger@gmx.de>
- update to 8.10.2:
* Fix a bug in auto_readahead_size that, combined with
IndexType::kBinarySearchWithFirstKey, fails or lands the
iterator at a wrong key