Recovering Ceph from “Reduced data availability: 3 pgs inactive, 3 pgs incomplete”

When your pool is stuck and you don’t know what to do.

George Shuklin
OpsOps
3 min read · Oct 29, 2018


I have a gory story to tell. It’s less gory than it may be for some readers, since it happened in my private lab with no important data whatsoever.

Nevertheless, I was able to recover every single bit of that unimportant data, so if you, the reader, have THE PRODUCTION in the same state, stay calm: MAYBE I can help you.

Close inspection

It was my benchmark pool with size=1. That means all my data existed in a single copy. After some fiddling with reboots (and maybe this bug, but I’m not sure), I got this picture:
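The picture came from the usual health checks; something like:

    ceph -s
    ceph health detail    # lists the affected PGs one by one

The warning was exactly the one in the title: “Reduced data availability: 3 pgs inactive, 3 pgs incomplete”.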

At the same time, my IO to this pool stalled. Even rados ls got stuck in the middle of its output, never able to finish.

I evaluated PGs:
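Something along these lines (the exact flags may differ from what I actually ran):

    ceph pg ls incomplete
    # or the heavier-handed variant:
    ceph pg dump_stuck inactive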

Bueh… What a mess of an output. Sorry for the gibberish. Here is a terse version:

Each of the affected PGs is in the incomplete state and resides on OSD.2 (see the ACTING column); each has 0 objects, and only PG 2.19 has a DISK_LOG of 1500.

Here there should be a long and thrilling story about my unsuccessful attempts, but I’ve spent all my enthusiasm on a corporate blog entry, so I’ll keep my story short and successful.

OBJECT COUNT MATTERS (it should be 0)

Before continuing, an important notice:

I was lucky (my random pile of useless bytes was lucky) that each of those PGs had 0 objects. 0 means ZERO. None.

I could do anything I wanted with those PGs and wouldn’t lose any data.

If you have a non-zero number of objects, I have to ask you to stop reading. You have a different issue, and my way won’t do you any good.

So, OBJECTS=0 for EACH incomplete PG. MUST.

Low-level OSD trickery

Next, I located the OSD where those PGs lived. I want to remind you that I had a pool with size=1, so each piece of data was stored with no redundancy, and I only had to deal with a single OSD per PG. If you have size>1, you may need to do this on more than one OSD for each PG.

I found the OSD responsible for those PGs. It is listed in the ACTING field. In my case it was just ‘[2]’, which means OSD.2.
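If you don’t want to dig through the full ceph pg ls output, ceph pg map shows the same thing per PG (2.19 taken as the example here):

    ceph pg map 2.19
    # the acting set is printed at the end, e.g. 'acting [2]'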

I stopped OSD.2 (systemctl stop ceph-osd@2), then checked the info for each PG with the low-level utility ceph-objectstore-tool:

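In my case this boiled down to something like the following (I’m assuming the standard data path /var/lib/ceph/osd/ceph-2; adjust the OSD id and PG id, and repeat for every affected PG):

    # run only while the OSD is stopped
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 2.19 --op info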

I had two types of injured PGs: with logs and without.

I removed the PG that had a log but no data (2.19):
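Roughly like this (same data path assumption as above; newer releases may insist on an extra --force):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 2.19 --op remove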

And I marked the two completely empty PGs as ‘complete’:
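Again roughly, with <pgid> standing in for each of the two empty PGs (I’m not reproducing their exact ids here):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid <pgid> --op mark-complete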

After that I started the ceph-osd service again (systemctl start ceph-osd@2) and forced creation of the removed PG (2.19):
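Something like this (newer releases may also ask for --yes-i-really-mean-it):

    systemctl start ceph-osd@2
    ceph osd force-create-pg 2.19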

After that I got them all ‘active+clean’ in ceph pg ls, all my useless data was available again, and ceph -s was happy:
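A quick way to double-check, more or less:

    ceph pg ls incomplete    # should come back empty now
    ceph -s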

Conclusion

If you have stuck PGs with a zero object count (and you are sure they really have 0 objects), you can remove them or ‘mark complete’ them. It’s an offline operation, but it allows your cluster to continue to thrive. It’s a fast operation, so you may get away with a downtime of ‘discovery time’ plus 1–2 minutes.
