v10.2.11 Jewel
This point releases brings a number of important bugfixes and has a fewimportant security fixes. This is expected to be the last Jewel release. Werecommend all Jewel 10.2.x users to upgrade.
Notable Changes
CVE 2018-1128: auth: cephx authorizer subject to replay attack (issue#24836, Sage Weil)
CVE 2018-1129: auth: cephx signature check is weak (issue#24837, Sage Weil)
CVE 2018-10861: mon: auth checks not correct for pool ops (issue#24838, Jason Dillaman)
The RBD C API’s rbd_discard method and the C++ API’s Image::discard methodnow enforce a maximum length of 2GB. This restriction prevents overflow ofthe result code.
New OSDs will now use rocksdb for omap data by default, rather thanleveldb. omap is used by RGW bucket indexes and CephFS directories,and when a single leveldb grows to 10s of GB with a high write ordelete workload, it can lead to high latency when leveldb’ssingle-threaded compaction cannot keep up. rocksdb supports multiplethreads for compaction, which avoids this problem.
The CephFS client now catches failures to clear dentries during startupand refuses to start as consistency and untrimmable cache issues maydevelop. The new option client_die_on_failed_dentry_invalidate (default:true) may be turned off to allow the client to proceed (dangerous!).
In 10.2.10 and earlier releases, keyring caps were not checked for validity,so the caps string could be anything. As of 10.2.11, caps strings arevalidated and providing a keyring with an invalid caps string to, e.g.,“ceph auth add” will result in an error.
Changelog
admin: bump sphinx to 1.6 (issue#21717, pr#18166, Kefu Chai, Alfredo Deza)
auth: ceph auth add does not sanity-check caps (issue#22525, pr#21367, Jing Li, Nathan Cutler, Kefu Chai, Sage Weil)
build/ops: rpm: bump epoch ahead of ceph-common in RHEL base (issue#20508, pr#21190, Ken Dreyer)
build/ops: upstart: radosgw-all does not start on boot if ceph-base is not installed (issue#18313, pr#16294, Ken Dreyer)
ceph_authtool: add mode option (issue#23513, pr#21197, Sébastien Han)
ceph-disk: factor out the retry logic into a decorator (issue#21728, pr#18169, Kefu Chai)
ceph-disk: fix –runtime omission when enabling ceph-osd@$ID.service units for device-backed OSDs (issue#21498, pr#17942, Carl Xiong)
ceph-disk flake8 test fails on very old, and very new, versions of flake8 (issue#22207, pr#19153, Nathan Cutler)
cephfs: ceph.in: pass RADOS inst to LibCephFS (issue#21406, issue#21967, pr#19907, Patrick Donnelly)
cephfs: client::mkdirs not handle well when two clients send mkdir request for a same dir (issue#20592, pr#20271, dongdong tao)
cephfs: client: prevent fallback to remount when dentry_invalidate_cb is true but root->dir is NULL (issue#23211, pr#21189, Zhi Zhang)
cephfs: fix tmap_upgrade crash (issue#23529, pr#21208, “Yan, Zheng”)
cephfs: fuse client: ::rmdir() uses a deleted memory structure of dentry leads … (issue#22536, pr#19993, YunfeiGuan)
cephfs-journal-tool: add “set pool_id” option (issue#22631, pr#20111, dongdong tao)
cephfs-journal-tool: move shutdown to the deconstructor of MDSUtility (issue#22734, pr#20333, dongdong tao)
cephfs: osdc: “FAILED assert(bh->last_write_tid > tid)” in powercycle-wip-yuri-master-1.19.18-distro-basic-smithi (issue#22741, pr#20312, “Yan, Zheng”)
cephfs: osdc/Journaler: make sure flush() writes enough data (issue#22824, pr#20435, “Yan, Zheng”)
cephfs: Processes stuck waiting for write with ceph-fuse (issue#22008, issue#22207, pr#19141, “Yan, Zheng”)
ceph-fuse: failure to remount in startup test does not handle client_die_on_failed_remount properly (issue#22269, pr#21162, Patrick Donnelly)
ceph.in: bypass codec when writing raw binary data (issue#23185, pr#20763, Oleh Prypin)
ceph-objectstore-tool command to trim the pg log (issue#23242, pr#20882, Josh Durgin, David Zafman)
ceph-objectstore-tool: “$OBJ get-omaphdr” and “$OBJ list-omap” scan all pgs instead of using specific pg (issue#21327, pr#20284, David Zafman)
ceph.restart + ceph_manager.wait_for_clean is racy (issue#15778, pr#20508, Warren Usui, Sage Weil)
ceph_volume_client: fix setting caps for IDs (issue#21501, pr#18084, Ramana Raja)
class rbd.Image discard—-OSError: [errno 2147483648] error discarding region (issue#16465, issue#21966, pr#20287, Nathan Cutler, Huan Zhang, Jason Dillaman)
cli/crushtools/build.t sometimes fails in jenkins’ make check run (issue#21758, pr#21158, Kefu Chai)
client reconnect gather race (issue#22263, pr#21163, “Yan, Zheng”)
client: release revoking Fc after invalidate cache (issue#22652, pr#19975, “Yan, Zheng”)
client: set client_try_dentry_invalidate to false by default (issue#21423, pr#17925, “Yan, Zheng”)
[cli] rename of non-existent image results in seg fault (issue#21248, pr#20280, Jason Dillaman)
CLI unit formatting tests are broken (issue#24733, pr#22913, Jason Dillaman)
common: compute SimpleLRU’s size with contents.size() instead of lru.… (issue#22613, pr#19978, Xuehan Xu)
common/config: set rocksdb_cache_size to OPT_U64 (issue#22104, pr#18850, Vikhyat Umrao, liuhongtong)
common: fix typo in rados bench write JSON output (issue#24199, pr#22407, Sandor Zeestraten)
config: lower default omap entries recovered at once (issue#21897, pr#19927, Josh Durgin)
core: Addition of online osd ‘omap’compaction command (issue#19592, pr#17101, liuchang0812, Sage Weil)
core: global/signal_handler.cc: fix typo (issue#21432, pr#17883, Kefu Chai)
core: librados: Double free in rados_getxattrs_next (issue#22042, pr#20381, Gu Zhongyan)
core: Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT (issue#21844, pr#18743, Jason Dillaman)
Deleting a pool with active notify linger ops can result in seg fault (issue#23966, pr#22188, Kefu Chai, Jason Dillaman)
doc: clarify Path Restriction instructions (issue#16906, pr#19795, huanwen ren)
doc: clarify Path Restriction instructions (issue#16906, pr#19840, Drunkard Zhang)
doc: remove region from INSTALL CEPH OBJECT GATEWAY (issue#21610, pr#18303, Orit Wasserman)
Filestore rocksdb compaction readahead option not set by default (issue#21505, pr#20446, Mark Nelson)
follow-on: osd: be_select_auth_object() sanity check oi soid (issue#20471, pr#20622, David Zafman)
HashIndex: randomize split threshold by a configurable amount (issue#15835, pr#19906, Josh Durgin)
include/fs_types: fix unsigned integer overflow (issue#22494, pr#19611, runsisi)
install-deps.sh: point gcc to the one shipped by distro (issue#22220, pr#19461, Kefu Chai)
install-deps.sh: readlink /usr/bin/gcc not /usr/bin/x86_64-linux-gnu-gcc (issue#22220, pr#19521, Kefu Chai)
install-deps.sh: update g++ symlink also (issue#22220, pr#19656, Kefu Chai)
journal: Message too long error when appending journal (issue#23526, pr#21215, Mykola Golub)
[journal] tags are not being expired if no other clients are registered (issue#21960, pr#20282, Jason Dillaman)
legal: remove doc license ambiguity (issue#23336, pr#20999, Nathan Cutler)
librados: copy out data to users’ buffer for xio (issue#20616, pr#17594, Vu Pham)
librbd: cannot clone all image-metas if we have more than 64 key/value pairs (issue#21814, pr#21228, PCzhangPC)
librbd: cannot copy all image-metas if we have more than 64 key/value pairs (issue#21815, pr#21203, PCzhangPC)
librbd: create+truncate for whole-object layered discards (issue#23285, pr#21219, Jason Dillaman)
librbd: list_children should not attempt to refresh image (issue#21670, pr#21224, Jason Dillaman)
librbd: object map batch update might cause OSD suicide timeout (issue#22716, issue#21797, pr#21220, Song Shun, Jason Dillaman)
librbd: set deleted parent pointer to null (issue#22158, pr#19098, Jason Dillaman)
log: Fix AddressSanitizer: new-delete-type-mismatch (issue#23324, pr#21084, Brad Hubbard)
mds: FAILED assert(get_version() < pv) in CDir::mark_dirty (issue#21584, pr#21156, Yan, Zheng, “Yan, Zheng”)
mds: fix dump last_sent (issue#22562, pr#19961, dongdong tao)
mds: fix integer overflow (issue#21067, pr#17188, Henry Chang)
mds: fix scrub crash (issue#22730, pr#20335, dongdong tao)
mds: session reference leak (issue#22821, pr#21175, Nathan Cutler, “Yan, Zheng”)
mds: unbalanced auth_pin/auth_unpin in RecoveryQueue code (issue#22647, pr#20067, “Yan, Zheng”)
mds: underwater dentry check in CDir::_omap_fetched is racy (issue#23032, pr#21185, Yan, Zheng)
mon/LogMonitor: call no_reply() on ignored log message (issue#24180, pr#22431, Sage Weil)
mon/MDSMonitor: no_reply on MMDSLoadTargets (issue#23769, pr#22189, Sage Weil)
mon/OSDMonitor.cc: fix expected_num_objects interpret error (issue#22530, pr#22050, Yang Honggang)
mon/OSDMonitor: fix dividing by zero in OSDUtilizationDumper (issue#22662, pr#20344, Mingxin Liu)
ObjectStore/StoreTest.FiemapHoles/3 fails with kstore (issue#21716, pr#20143, Kefu Chai, Ning Yao)
osd: also check the exsistence of clone obc for “CEPH_SNAPDIR” requests (issue#17445, pr#17707, Xuehan Xu)
osdc/Objecter: prevent double-invocation of linger op callback (issue#23872, pr#21754, Jason Dillaman)
osd: objecter sends out of sync with pg epochs for proxied ops (issue#22123, pr#20518, Sage Weil)
osd ops (sent and?) arrive at osd out of order (issue#19133, issue#19139, pr#17893, Jianpeng Ma, Sage Weil)
osd: OSDMap cache assert on shutdown (issue#21737, pr#21184, Greg Farnum)
osd: osd_scrub_during_recovery only considers primary, not replicas (issue#18206, pr#17815, David Zafman)
osd/PrimaryLogPG: dump snap_trimq size (issue#22448, pr#21200, Piotr Dałek)
osd: recover_replicas: object added to missing set for backfill, but is not in recovering, error! (issue#18162, issue#14513, pr#18690, huangjun, Adam C. Emerson, David Zafman)
osd: replica read can trigger cache promotion (issue#20919, pr#21199, Sage Weil)
osd: update heartbeat peers when a new OSD is added (issue#18004, pr#20108, Pan Liu)
performance: Only scan for omap corruption once (issue#21328, pr#18951, David Zafman)
qa: failures from pjd fstest (issue#21383, pr#21152, “Yan, Zheng”)
qa: src/test/libcephfs/test.cc:376: Expected: (len) > (0), actual: -34 vs 0 (issue#22221, pr#21172, Patrick Donnelly)
qa: use xfs instead of btrfs w/ filestore (issue#20169, issue#20911, pr#18165, Sage Weil)
qa: use xfs instead of btrfs w/ filestore (issue#21481, pr#17847, Patrick Donnelly)
radosgw: fix awsv4 header line sort order (issue#21607, pr#18080, Marcus Watts)
rbd: clean up warnings when mirror commands used on non-setup pool (issue#21319, pr#21227, Jason Dillaman)
rbd: disk usage on empty pool no longer returns an error message (issue#22200, pr#19186, Jason Dillaman)
[rbd] image-meta list does not return all entries (issue#21179, pr#20281, Jason Dillaman)
rbd: is_qemu_running in qemu_rebuild_object_map.sh and qemu_dynamic_features.sh may return false positive (issue#23502, pr#21207, Mykola Golub)
rbd: [journal] allocating a new tag after acquiring the lock should use on-disk committed position (issue#22945, pr#21206, Jason Dillaman)
rbd: librbd: filter out potential race with image rename (issue#18435, pr#19855, Jason Dillaman)
rbd ls -l crashes with SIGABRT (issue#21558, pr#19801, Jason Dillaman)
rbd-mirror: cluster watcher should ensure it has latest OSD map (issue#22461, pr#19644, Jason Dillaman)
rbd-mirror: fix potential infinite loop when formatting status message (issue#22932, pr#20418, Mykola Golub)
rbd-mirror: ignore permission errors on rbd_mirroring object (issue#20571, pr#21225, Jason Dillaman)
rbd-mirror: strip environment/CLI overrides for remote cluster (issue#21894, pr#21223, Jason Dillaman)
[rbd-nbd] Fedora does not register resize events (issue#22131, pr#19115, Jason Dillaman)
rbd-nbd: fix ebusy when do map (issue#23528, pr#21232, Li Wang)
rbd: possible deadlock in various maintenance operations (issue#22120, pr#20285, Jason Dillaman)
rbd: rbd crashes during map (issue#21808, pr#18843, Peter Keresztes Schmidt)
rbd: rbd-mirror split brain test case can have a false-positive failure until teuthology (issue#22485, pr#21205, Jason Dillaman)
rbd: TestLibRBD.RenameViaLockOwner may still fail with -ENOENT (issue#23068, pr#20627, Mykola Golub)
repair_test fails due to race with osd start (issue#20705, pr#20146, Sage Weil)
rgw: 15912 15673 (Fix duplicate tag removal during GC, cls/refcount: store and use list of retired tags) (issue#20107, pr#16708, Jens Rosenboom)
rgw: abort in listing mapped nbd devices when running in a container (issue#22012, issue#22011, pr#20286, Li Wang, Pan Liu)
rgw: add ability to sync user stats from admin api (issue#21301, pr#20179, Nathan Johnson)
rgw: add cors header rule check in cors option request (issue#22002, pr#19057, yuliyang)
rgw: add radosgw-admin sync error trim to trim sync error log (issue#23287, pr#21210, fang yuxiang)
rgw: add xml output header in RGWCopyObj_ObjStore_S3 response msg (issue#22416, pr#19887, Enming Zhang)
rgw: automated trimming of datalog and mdlog (issue#18227, pr#20061, Casey Bodley)
rgw: bi list entry count incremented on error, distorting error code (issue#21205, pr#18207, Nathan Cutler)
rgw: boto3 v4 SignatureDoesNotMatch failure due to sorting of sse-kms headers (issue#21832, pr#18772, Nathan Cutler)
rgw: bucket resharding should not update bucket ACL or user stats (issue#22124, pr#20421, Orit Wasserman)
rgw: copying part without http header x-amz-copy-source-range will be mistaken for copying object (issue#22729, pr#21294, Malcolm Lee)
rgw: core dump, recursive lock of RGWKeystoneTokenCache (issue#23171, pr#20639, Mark Kogan, Adam Kupczyk)
rgw: data sync of versioned objects, note updating bi marker (issue#18885, pr#21213, Yehuda Sadeh)
rgw: dont log EBUSY errors in ‘sync error list’ (issue#22473, pr#19908, Casey Bodley)
rgw: ECANCELED in rgw_get_system_obj() leads to infinite loop (issue#17996, pr#20561, Yehuda Sadeh)
rgw: file deadlock on lru evicting (issue#22736, pr#20076, Matt Benjamin)
rgw: file write error (issue#21455, pr#18304, Yao Zongyou)
rgw: fix chained cache invalidation to prevent cache size growth (issue#22410, pr#19469, Mark Kogan)
rgw: fix doubled underscore with s3/swift server-side copy (issue#22529, pr#19747, Matt Benjamin)
rgw: fix GET website response error code (issue#22272, pr#19488, Dmitry Plyakin)
rgw: fix index update in dir_suggest_changes (issue#24280, pr#22677, Tianshan Qu)
rgw: fix marker encoding problem (issue#20463, pr#17731, Orit Wasserman, Marcus Watts)
rgw: fix swift anonymous access (issue#22259, pr#19194, Marcus Watts)
rgw: Fix swift object expiry not deleting objects (issue#22084, pr#18925, Pavan Rallabhandi)
rgw: fix the bug that part’s index can’t be removed after completing (issue#19604, pr#16763, Zhang Shaowen, Matt Benjamin)
rgw: fix the max-uploads parameter not work (issue#22825, pr#20479, Xin Liao)
rgw: inefficient buffer usage for PUTs (issue#23207, pr#21098, Marcus Watts)
rgw: libcurl & ssl fixes (issue#22951, issue#23203, issue#23162, pr#20749, Marcus Watts, Abhishek Lekshmanan, Jesse Williamson)
rgw: list bucket which enable versioning get wrong result when user marker (issue#21500, pr#20291, yuliyang)
rgw: log includes zero byte sometimes (issue#20037, pr#17151, Abhishek Lekshmanan)
rgw: make init env methods return an error (issue#23039, pr#20800, Abhishek Lekshmanan)
RGW: Multipart upload may double the quota (issue#21586, pr#18121, Sibei Gao, Matt Benjamin)
rgw: multisite: data sync status advances despite failure in RGWListBucketIndexesCR (issue#21735, pr#20269, Casey Bodley)
rgw: multisite: Get bucket location which is located in another zonegroup, will return 301 Moved Permanently (issue#21125, pr#18305, Shasha Lu, lvshuhua, Jiaying Ren)
rgw: null instance mtime incorrect when enable versioning (issue#21743, pr#20262, Shasha Lu)
rgw: radosgw-admin: add an option to reset user stats (issue#23335, issue#23322, pr#20877, Abhishek Lekshmanan)
rgw: release cls lock if taken in RGWCompleteMultipart (issue#21596, issue#22368, pr#18116, Casey Bodley, Matt Benjamin)
rgw: resharding needs to set back the bucket ACL after link (issue#22742, pr#20039, Orit Wasserman)
rgw: resolve Random 500 errors in Swift PutObject (22517) (issue#22517, issue#21560, pr#19769, Adam C. Emerson, Matt Benjamin)
rgw: rgw_file: recursive lane lock can occur in LRU drain (issue#20374, pr#17149, Matt Benjamin)
rgw: S3 POST policy should not require Content-Type (issue#20201, pr#19635, Matt Benjamin)
rgw: s3website error handler uses original object name (issue#23201, issue#20307, pr#21100, liuhong, Casey Bodley)
rgw: segfaults after running radosgw-admin data sync init (issue#22083, pr#19783, Casey Bodley, Abhishek Lekshmanan)
rgw: segmentation fault when starting radosgw after reverting .rgw.root (issue#21996, pr#20292, Orit Wasserman, Casey Bodley)
rgw: stale bucket index entry remains after object deletion (issue#22555, pr#20293, J. Eric Ivancich)
rgw: system user can’t delete bucket completely (issue#22248, pr#21212, Casey Bodley)
rgw: tcmalloc (issue#23469, pr#21073, Matt Benjamin)
rgw: upldate the max-buckets when the quota is uploaded (issue#22745, pr#20496, zhaokun)
rgw: user creation can overwrite existing user even if different uid is given (issue#21685, pr#20074, Casey Bodley)
RHEL 7.3 Selinux denials at OSD start (issue#19200, pr#18780, Boris Ranto)
scrub errors not cleared on replicas can cause inconsistent pg state when replica takes over primary (issue#23267, pr#21194, David Zafman)
snapset xattr corruption propagated from primary to other shards (issue#20186, issue#18409, issue#21907, pr#20331, David Zafman)
systemd: Add explicit Before=ceph.target (issue#21477, pr#17841, Tim Serong)
table of contents doesn’t render for luminous/jewel docs (issue#23780, pr#21503, Alfredo Deza)
test: Adjust for Jewel quirk caused of differences with master (issue#23006, pr#20463, David Zafman)
test/CMakeLists: disable test_pidfile.sh (issue#20975, pr#20557, Sage Weil)
test_health_warnings.sh can fail (issue#21121, pr#20289, Sage Weil)
test/librbd: fixed metadata tests under upgrade scenarios (issue#21911, pr#18548, Jason Dillaman)
test/librbd: utilize unique pool for cache tier testing (issue#11502, pr#20524, Jason Dillaman)
tests: rbd_mirror_helpers.sh request_resync_image function saves image id to wrong variable (issue#21663, pr#19804, Jason Dillaman)
tests: test_admin_socket.sh may fail on wait_for_clean (issue#23499, pr#21125, Mykola Golub)
tests: tests/librbd: updated test_notify to handle new release lock semantics (issue#21912, pr#18560, Jason Dillaman)
tests: unittest_pglog timeout (issue#23504, issue#18030, pr#21135, Nathan Cutler, Loic Dachary)
tools: ceph-objectstore-tool set-size should clear data-digest (issue#22112, pr#20070, David Zafman)
Ubuntu amd64 client can not discover the ubuntu arm64 ceph cluster (issue#19705, pr#18294, Kefu Chai)