Recovering Ceph from “Reduced data availability: 3 pgs inactive, 3 pgs incomplete”
When your pool is stuck and you don’t know what to do.
I have a gory story to tell. It’s less gory than the situation some readers may be in, as it happened in my private laboratory with no important data whatsoever.
Nevertheless, I was able to recover every single bit of that unimportant data. Therefore, if you, the reader, have THE PRODUCTION in the same state, stay calm: MAYBE I can help you.
Close inspection
It was my benchmark pool with size=1. That means all my data was stored in a single copy. After some jerking around with reboots (and maybe this bug, but I’m not sure), I got this picture:
cluster:
id: bbc3c151-47bc-4fbb-a0-172793bd59e0
health: HEALTH_WARN
Reduced data availability: 3 pgs inactive, 3 pgs incomplete
At the same time, my IO to this pool stalled. Even rados ls
got stuck in the middle of its output, never finishing.
I evaluated PGs:
ceph pg ls incomplete

PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
2.19 0 0 0 0 0 0 1500 1500 incomplete 2018-10-29 15:40:22.301233 1033'4056498 1644:142 [2] 2 [2] 2 1033'4056498 2018-10-26 01:02:58.706233 1033'4056498 2018-10-20 21:04:09.887854
2.50 0 0 0 0 0 0 0 0 incomplete 2018-10-29 15:40:22.301294 0'0 1644:118 [2] 2 [2] 2 1033'4267614 2018-10-25 15:06:35.887580 1033'3946410 2018-10-19 13:02:17.281720
2.57 0 0 0 0 0 0 0 0 incomplete 2018-10-29 15:40:22.301368 0'0 1644:113 [2] 2 [2] 2 1033'4050786 2018-10-25 21:22:40.186352 1033'4050786 2018-10-22 05:25:13.534321
Bueh… what a mess of an output. Sorry for the width. Here is a terse version:
Each affected PG is in the incomplete state and resides on OSD.2 (see the ACTING column); each has 0 objects, and only PG 2.19 has a non-empty DISK_LOG (1500 entries).
Here should be a long and thrilling story of my unsuccessful attempts, but I’ve spent all my enthusiasm on a corporate blog entry; therefore, I’ll keep my story short and successful.
THE OBJECT COUNT MATTERS (it must be 0)
Before continuing, an important notice:
I was lucky (my random pile of useless bytes was lucky) that each of those PGs had 0 objects. 0 means ZERO. None.
I could do anything I wanted with those PGs and wouldn’t lose data.
If you have a non-zero number of objects, I have to ask you to stop reading. You have a different issue, and my way won’t do you any good.
So, OBJECTS=0
for EACH incomplete PG. This is a MUST.
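The safety check above can be scripted instead of eyeballed. Here is a minimal sketch that parses the machine-readable output of ceph pg ls incomplete -f json; the exact JSON layout (a pg_stats array with pgid, stat_sum.num_objects and acting fields) is my assumption based on recent Ceph releases, so verify it against your version before trusting the result:

```python
import json

def unsafe_incomplete_pgs(pg_ls_json: str):
    """Return (pgid, num_objects, acting) for every incomplete PG that is
    NOT empty, i.e. the PGs this recipe must NOT be applied to."""
    data = json.loads(pg_ls_json)
    # Some releases emit a bare list instead of {"pg_stats": [...]}.
    stats = data["pg_stats"] if isinstance(data, dict) else data
    return [
        (pg["pgid"], pg["stat_sum"]["num_objects"], pg["acting"])
        for pg in stats
        if pg["stat_sum"]["num_objects"] != 0
    ]

if __name__ == "__main__":
    # Sample shaped like the output in this post: all three PGs are empty.
    sample = json.dumps({"pg_stats": [
        {"pgid": "2.19", "stat_sum": {"num_objects": 0}, "acting": [2]},
        {"pgid": "2.50", "stat_sum": {"num_objects": 0}, "acting": [2]},
        {"pgid": "2.57", "stat_sum": {"num_objects": 0}, "acting": [2]},
    ]})
    print(unsafe_incomplete_pgs(sample))  # an empty list means it is safe to proceed
```

As a bonus, the acting field it returns tells you which OSDs host each PG, which you’ll need in the next step.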
Low-level OSD trickery
Next, I located the OSD where those PGs were. Let me remind you that my pool had size=1, so each data chunk was stored with no redundancy, and I had to deal with a single OSD per PG. If you have size>1, you may need to do this with more than one OSD for each PG.
I found the OSD responsible for those PGs: it’s in the ACTING field. In my case it was just a ‘[2]
’, which means OSD.2
I stopped OSD.2 (systemctl stop ceph-osd@2
), then checked the info for each PG with the low-level utility ceph-objectstore-tool
:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op info --pgid 2.19
I had two kinds of injured PGs: one with a log and two without.
I removed the PG that had log entries but no data:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op remove --pgid 2.19 --force
And I marked the two completely empty PGs as ‘complete’:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op mark-complete --pgid 2.50
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op mark-complete --pgid 2.57
After that, I started the ceph-osd service (systemctl start ceph-osd@2
) and forced re-creation of the removed PG (2.19):
ceph osd force-create-pg 2.19
After that, ceph pg ls
showed all of them as ‘active+clean
’, all my useless data was available, and ceph -s
was happy:
health: HEALTH_OK
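The decision rule above (remove the PG that still has a log so it can be re-created, mark the empty-logged ones complete) can be mechanized when you have many PGs. A hypothetical helper that only builds the command lines and runs nothing; the function name and the “has log entries” criterion are my assumptions, so review every command before executing it:

```python
def recovery_commands(osd_path: str, pgs: dict) -> list:
    """Map each empty incomplete PG (OBJECTS=0!) to a ceph-objectstore-tool
    invocation: PGs with log entries are removed (to be re-created afterwards
    with `ceph osd force-create-pg`), PGs with an empty log are marked complete.
    `pgs` maps pgid -> DISK_LOG entry count from `ceph pg ls incomplete`."""
    base = f"ceph-objectstore-tool --data-path {osd_path}"
    cmds = []
    for pgid, disk_log in pgs.items():
        if disk_log > 0:
            cmds.append(f"{base} --op remove --pgid {pgid} --force")
        else:
            cmds.append(f"{base} --op mark-complete --pgid {pgid}")
    return cmds

if __name__ == "__main__":
    # The three PGs from this post: 2.19 had 1500 log entries, the rest none.
    for cmd in recovery_commands("/var/lib/ceph/osd/ceph-2",
                                 {"2.19": 1500, "2.50": 0, "2.57": 0}):
        print(cmd)
```

Remember that the OSD must be stopped while you run the generated commands, and that every removed PG still needs a ceph osd force-create-pg afterwards.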
Conclusion
If you have stuck incomplete PGs with a zero object count (and you are sure the count is really zero), you can remove them or mark them complete. It’s an offline operation, but it lets your cluster continue to thrive. The operation itself is fast, so you may get away with a downtime of ‘discovery time’ plus 1–2 minutes.