Practical ways to avoid bitrot

Posted on January 9, 2017

steve

The king:

http://www.freenas.org

Others:

https://pthree.org/2014/04/01/protect-against-bit-rot-with-parchive/

https://github.com/mk-fg/fs-bitrot-scrubber/

How to recover that encrypted, ext4-formatted logical volume you allowed Fedora to create!

Posted on January 24, 2012

steve

Here’s the deal:

1) You have no time and you want to try the latest Fedora release because it looks pretty darn good.
2) You accept the default disk partitioning scheme which the kind people at Red Hat / Fedora project set up for you, because they only have your best interests at heart, right?
3) While using this cutting-edge release, something nasty happens like, ooh, perhaps a sound driver locks up the entire system and you have to hard-reset the machine (that is, switch it off by the power switch because nothing responds to input).

What next? If, like me (on one occasion), you try to boot up the machine and get no further than the recovery console, you’d feel a bit aggravated. But there is an alternative – do a disk check. You may have read my verbose coverage of How to do a disk check in Linux before. This takes it one step further – how to check your logical volume when it’s encrypted and formatted using the latest ext4 filesystem.

Instead of the method used before, this time I booted from a Live CD. You can find one to download at the Fedora project. Ensure that this CD matches the release of the version you are trying to recover. In this case, that’s Fedora 10.
Once you have booted the offending machine up with the Live CD, open up a terminal by pointing to Applications > System Tools > Terminal. Once in the terminal window, just type:

# su

…to become the root user. This is essential to using all the disk tools.
You may be tempted to check for volume groups first:

# vgscan

...but this would return nothing.
What’s happening here is that the partition holding the Volume Group is itself encrypted, so LVM can’t see the physical volume inside it. Once the partition is unlocked, you can gain access to both of the Logical Volumes – the swap volume and the root (/) volume.
To unlock the encrypted Volume Group, first you need to establish which partition it resides on:

# fdisk /dev/sda
The number of cylinders for this disk is set to 12161.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Hit p to print the partitions on your primary disk:

Command (m for help): p

Disk /dev/sda: 100.0 GB, 100030242816 bytes
255 heads, 63 sectors/track, 12161 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb07eb07e

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        5377    43190721    7  HPFS/NTFS
/dev/sda2            5378        5402      200812+  83  Linux
/dev/sda3            5403       12161    54291667+  8e  Linux LVM

So the partition of type “Linux LVM” (Logical Volume Manager) is the baby we’re after.
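If you’d rather not eyeball the fdisk table, the partition can be picked out with a filter. This is a minimal sketch of my own, not a standard tool: the function name find_lvm_partitions is hypothetical, and it assumes the classic fdisk layout shown above, where the partition type name ends each /dev/... line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every partition whose type column reads
# "Linux LVM" in `fdisk -l` output read from stdin.
# Assumes the type name is the last thing on each /dev/... line.
find_lvm_partitions() {
    awk '/^\/dev\// && /Linux LVM[ \t]*$/ { print $1 }'
}
```

For example:

# fdisk -l /dev/sda | find_lvm_partitions

should print /dev/sda3 for the table above. Since the type column only says “Linux LVM”, you can confirm the partition really starts with a LUKS header with # cryptsetup isLuks /dev/sda3 && echo LUKS, which succeeds only for LUKS devices.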
To unlock the encrypted Volume Group, use the following:

# cryptsetup luksOpen /dev/sda3 mydisk

This sets up a dm-crypt mapping in the kernel, so that the contents of /dev/sda3 are decrypted on the fly and presented as a new device node called “mydisk” in /dev/mapper/. We won’t actually need to use this device node directly, but it’s handy to know about if you need to perform further diagnostics.
You will be prompted for your passphrase. A LUKS (version 1) header has eight key “slots”, each able to hold a copy of the volume’s master key protected by a different passphrase; if the passphrase you enter opens any one of them, your disk will be unlocked. Once it is, you can scan for Volume Groups again:

# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
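To see the eight key slots mentioned above for yourself, # cryptsetup luksDump /dev/sda3 prints the LUKS header. The sketch below (the function name is my own, hypothetical) counts how many slots currently hold a passphrase. It assumes the LUKS1 dump format, with one “Key Slot N: ENABLED” or “Key Slot N: DISABLED” line per slot; newer LUKS2 volumes lay the dump out differently, so it won’t match those.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count enabled LUKS1 key slots in
# `cryptsetup luksDump` output read from stdin.
# Assumes LUKS1 lines of the form "Key Slot N: ENABLED" (N = 0..7).
count_enabled_keyslots() {
    awk '/^Key Slot [0-7]: ENABLED/ { n++ } END { print n + 0 }'
}
```

Usage:

# cryptsetup luksDump /dev/sda3 | count_enabled_keyslots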

Now we’re getting somewhere. Let’s activate the VG and display the LVs (Logical Volumes) it contains:

# vgchange -a y
  2 logical volume(s) in volume group "VolGroup00" now active
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                RE7t2h-nIy9-dWZ9-xt26-Fgq4-gFd8-34E3f2
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                47.81 GB
  Current LE             1530
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                B7XJzD-9sS0-3iHx-AWBE-W9qN-TvRb-vCdYZF
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                3.91 GB
  Current LE             125
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4

We can deduce from the sizes of these two volumes that the first of the two is the root (/) volume, and the second is the swap volume.
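The size-based deduction can be scripted too. A rough sketch with a hypothetical function name: it reads “name size” pairs, as printed by lvs --noheadings --units b --nosuffix -o lv_name,lv_size VolGroup00, and prints the biggest volume.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given "LV-name size-in-bytes" lines on stdin,
# print the name of the largest logical volume.
largest_lv() {
    awk 'NF >= 2 && ($2 + 0) > max { max = $2 + 0; name = $1 }
         END { if (name != "") print name }'
}
```

Size is only a heuristic, though. A firmer check is # blkid -o value -s TYPE /dev/VolGroup00/LogVol01, which prints swap for a swap volume.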
As the purpose is to FIX the filesystem on it, which may have become corrupt through the hard-reset performed earlier, we do not want to mount this volume. Instead, as we now have a device node for this activated volume, at /dev/VolGroup00/LogVol00, we can simply perform a disk check straight on it.
To see which file system checking tools are installed, you can use tab completion at the command line:

# fsck. (hit tab)
fsck.cramfs fsck.ext3 fsck.ext4dev fsck.vfat fsck.ext2 fsck.ext4 fsck.msdos
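Rather than relying on memory for the filesystem type, you can ask blkid and map its answer onto one of the fsck helpers listed above. This is a sketch under my own assumptions (the function name is hypothetical): blkid -o value -s TYPE prints just the type, and a LUKS container, an LVM physical volume or swap has no filesystem to check, so those are rejected with a hint.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a blkid TYPE value to the fsck helper to run.
# Containers and swap are rejected, since there is no filesystem to check.
fsck_tool_for_type() {
    case $1 in
        ext2|ext3|ext4|ext4dev|vfat|cramfs)
            echo "fsck.$1" ;;
        crypto_LUKS)
            echo "unlock it with cryptsetup luksOpen first" >&2; return 1 ;;
        LVM2_member)
            echo "activate the volume group with vgchange -a y first" >&2; return 1 ;;
        swap)
            echo "swap space has no filesystem to check" >&2; return 1 ;;
        *)
            echo "no rule for type '$1'" >&2; return 1 ;;
    esac
}
```

Here, # fsck_tool_for_type "$(blkid -o value -s TYPE /dev/VolGroup00/LogVol00)" should print fsck.ext4. Run against /dev/sda3 before unlocking, it would point you back to cryptsetup instead.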

As this volume was formatted as ext4, that’s what we use:

# fsck.ext4 /dev/VolGroup00/LogVol00
e2fsck 1.41.3 (12-Oct-2008)
/dev/VolGroup00/LogVol00: recovering journal
Clearing orphaned inode 730 (uid=0, gid=500, mode=0100600, size 2263160)
Clearing orphaned inode 187182 (uid=500, gid=500, mode=0100600, size 4096)
... and so on until ...
/dev/VolGroup00/LogVol00: clean, 190926/3137536 files, 2016683/12533760 blocks
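fsck.ext4 (e2fsck) also reports what it did through its exit status, which the e2fsck man page documents as a sum of flags: 1 = errors corrected, 2 = errors corrected and a reboot is advised, 4 = errors left uncorrected, 8 = operational error, and so on. A small decoder, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode an e2fsck exit status, which is the
# bitwise OR of the conditions listed in the e2fsck man page.
describe_fsck_status() {
    local status=$1 bit
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for bit in 1 2 4 8 16 32 128; do
        [ $((status & bit)) -ne 0 ] || continue
        case $bit in
            1)   echo "file system errors corrected" ;;
            2)   echo "errors corrected, system should be rebooted" ;;
            4)   echo "file system errors left uncorrected" ;;
            8)   echo "operational error" ;;
            16)  echo "usage or syntax error" ;;
            32)  echo "check cancelled by user request" ;;
            128) echo "shared library error" ;;
        esac
    done
}
```

Usage:

# fsck.ext4 /dev/VolGroup00/LogVol00; describe_fsck_status $?

If the status includes 4 or higher, the check did not finish cleanly and the volume still needs attention.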

Now there are two more steps to perform: de-activate the volume group, and lock the encryption of the drive.

# vgchange -a n
  0 logical volume(s) in volume group "VolGroup00" now active
# cryptsetup luksClose mydisk

The second command returns nothing, which means it did not error: the decrypted mapping has been removed, so the disk can’t be read or written again without unlocking it.
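Putting the whole procedure together, here is a sketch of a wrapper that unlocks, checks and re-locks the volume in the right order. The function name check_encrypted_lv and the DRY_RUN variable are hypothetical conveniences of my own; with DRY_RUN=1 it just prints the commands it would run. When run for real, cryptsetup luksOpen still prompts for your passphrase.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: unlock an encrypted LVM partition, fsck one logical
# volume, then deactivate the volume group and re-lock the partition.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_encrypted_lv() {
    local part=$1 mapping=$2 vg=$3 lv=$4 status=0

    # Prompts for the passphrase; creates /dev/mapper/$mapping.
    run cryptsetup luksOpen "$part" "$mapping" || return 1

    # Find and activate the volume group so /dev/$vg/$lv exists.
    if ! { run vgscan && run vgchange -a y "$vg"; }; then
        run cryptsetup luksClose "$mapping"
        return 1
    fi

    # Check the (unmounted) logical volume, remembering fsck's exit status.
    run fsck.ext4 "/dev/$vg/$lv" || status=$?

    # Tear down in reverse order: the VG must be inactive before the
    # LUKS mapping can be closed, otherwise it is still in use.
    run vgchange -a n "$vg"
    run cryptsetup luksClose "$mapping"
    return "$status"
}
```

For the machine above, that would be:

# check_encrypted_lv /dev/sda3 mydisk VolGroup00 LogVol00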
I hope that helps someone with a sense for adventure but not enough time on their hands for when things go somewhat awry!

How to do a disk check in Linux

Posted on January 23, 2012

steve

We’re all used to doing a disk check in Windows XP. It’s easy. Just double-click on “My Computer”, then select the drive you want to run the check on. Right-click, Properties, Tools tab, then select “Check Now…” in the Error-checking section. In almost every instance you’ll be told that the check will be done upon the next reboot. Easy.

So how does one go about it on Linux? Well… as you may have guessed, it’s not quite so straightforward. Linux does already have a sensible disk-checking system in place: at boot time, file systems listed in /etc/fstab with a non-zero pass number (the last field) are handed to fsck, which does a full check if a file system is flagged as having errors or is due a periodic check. So, generally, you needn’t worry. But if you have reason to believe your disk may be slowly dying, and nothing is showing up in your drive’s SMART status, it may be worth checking the file system instead.

That’s where File System Check comes in (duh!). Like all Linux tools, it’s painfully abbreviated to simply “fsck”. Terse, to say the least. Now the warning:

DO NOT. I REPEAT, DO NOT EVER EVER EVER RUN THIS COMMAND WHILE YOUR DRIVE IS MOUNTED (I.E. IN USE). I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA THAT YOU MAY CAUSE BY FOLLOWING THESE INSTRUCTIONS.

To make sure your root (/) volume is not mounted, follow these easy steps:

Boot from a Live CD. Your root volume will not be mounted by default.
Open a terminal and type:

# dmesg | grep sda

If you see output relating to a “SCSI” device, this identifies the hard disk that, in all likelihood, contains your root partition. For example, amongst other output, I see this:

sd 2:0:0:0: [sda] Assuming drive cache: write through
 sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk

In the example above, the disk at SCSI address 2:0:0:0 (host 2, channel 0, target 0, LUN 0) has been registered by the Linux kernel as the first disk in the system, “sda”. We can also see it has only 2 partitions, sda1 and sda2. If this is the only physical drive in the machine, we should strongly suspect that one partition is /boot (formatted with ext4) and the other is an LVM physical volume holding both root (/) and swap. Furthermore, it’s a fair bet that the smaller partition is /boot and the larger one contains our swap and / volumes, so let’s proceed with accessing them.
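The “smaller partition is /boot” guess is easy to check, because the kernel lists every partition with its size in /proc/partitions. A minimal sketch with a hypothetical function name, assuming sdX-style device names where partitions are the disk name followed by a number:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the partitions of one disk, smallest first,
# from /proc/partitions (or another file in the same format).
# Columns there are: major minor #blocks name (#blocks = 1 KiB units).
partitions_by_size() {
    local disk=$1 table=${2:-/proc/partitions}
    awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print $4, $3 }' "$table" |
        sort -k2,2n
}
```

Running # partitions_by_size sda prints each partition of sda with its size in 1 KiB blocks; the first line is your /boot candidate.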
So, how do we access a “Logical Volume” within an equally mystical “Volume Group”? Luckily, Linux LVM comes with a plethora of useful tools to make the job easy.
# /sbin/vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2

Great. We have identified the volume group. But before we can see the logical volumes it contains, we need to activate it.

# /sbin/vgchange -a y
  2 logical volume(s) in volume group "VolGroup00" now active
Here, the -a flag indicates that we want to change the “active” status of the volume group, and the y means “yes”.
# /sbin/lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                DG2WxJ-sKa5-20mg-NtjW-CsPW-t99V-Egqlja
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                7.25 GB
  Current LE             232
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                HqKozT-16PQ-HUaT-Yyc7-lMCO-007m-Xcc2c8
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                512.00 MB
  Current LE             16
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
We can now see two logical volumes contained within the volume group. The first, although small by today’s standards, is a lot larger than the second. We can also see that each logical volume has a device node (/dev/VolGroup00/LogVol01, for example).
As we want to perform the disk check without the partition being mounted, we do not issue any mount command here. However, if you want to double-check that this is the right volume, mount it and have a quick look around. The following steps are only offered to help in that case; skip them if you already know which volume to check.

# mkdir /tmp/lv0

For me, the first logical volume (the 7.25GB one) would be the one to test.

# mount -t ext4 /dev/VolGroup00/LogVol00 /tmp/lv0
# cd /tmp/lv0
# ls
bin boot dev etc home lib lib64 lost+found media mnt opt proc root sbin selinux srv sys tmp usr var
Ok, that looks like the root partition, so let’s get out of it and unmount it before running the file system check on it.
# cd /
# umount /tmp/lv0
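The “have a quick look around” step can be made a little more systematic. This hypothetical helper (my own sketch, not a standard command) checks that a few entries every root file system has are present; run it while the volume is still mounted, before the umount.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough test of whether a mounted directory
# looks like the top of a Linux root file system.
looks_like_root() {
    local dir=$1 entry
    for entry in etc/fstab bin sbin usr var; do
        [ -e "$dir/$entry" ] || return 1
    done
    return 0
}
```

Usage:

# looks_like_root /tmp/lv0 && echo "looks like /"

It is only a rough test: a backup copy of a root file system would pass too.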
An alternative to the above steps, if you have already booted into your main system, is to look in /etc/fstab to see which is your / volume. Just open a terminal and issue:

# cat /etc/fstab

On my CentOS 5 system, I see this:

/dev/VolGroup00/LogVol00 /          ext4    defaults        1 1
LABEL=/boot1             /boot      ext4    defaults        1 2
tmpfs                    /dev/shm   tmpfs   defaults        0 0
devpts                   /dev/pts   devpts  gid=5,mode=620  0 0
sysfs                    /sys       sysfs   defaults        0 0
proc                     /proc      proc    defaults        0 0
LABEL=SWAP-sdb1          swap       swap    defaults        0 0

So, /dev/VolGroup00/LogVol00 is my root volume.
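Picking the / line out of fstab can be done with awk. A sketch with a hypothetical function name: it skips comment lines and prints the first field of the entry mounted on /. From a Live CD, you can point it at the copy on the mounted volume, e.g. /tmp/lv0/etc/fstab.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the device field of the fstab entry
# mounted on "/", ignoring comment lines. Defaults to /etc/fstab.
root_device_from_fstab() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1; exit }' "${1:-/etc/fstab}"
}
```

The result might be a LABEL= or UUID= specification rather than a device path; # findfs LABEL=/boot1 turns one of those into the actual device.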

So, now that that’s out of the way, what next? Well, assuming you now know which is your root partition, the most sensible thing to do would be to boot from a Live CD of some distribution (Ubuntu, Fedora, etc) if you haven’t done so, and then perform the disk check from that.

Once in the LiveCD desktop, we’ll need to fire up a Terminal window.
If you know your filesystem type, e.g. Ext4, which is the default on the most common distributions, you can run the variant of the fsck command made specifically for that file system. Here’s what I run for a thorough disk check:

# fsck.ext4 -c -D -f -p -v /dev/VolGroup00/LogVol00

Alternatively, if your partition structure is slightly older and only contains physical partitions (not Logical Volumes), it may just be a case of finding the partition directly, by checking /etc/fstab on the running system. In that case, your command may look more like this (when / is unmounted!!):

# fsck.ext4 -c -D -f -p -v /dev/sda2

Here’s what the flags do:
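-c – runs the badblocks program to do a read-only scan of the device, and adds any bad blocks it finds to the file system’s bad block list so they won’t be allocated to files. Note that on modern disks it is the drive’s own firmware, not the file system, that remaps failing sectors, and a scan like this can take a long time on a large volume.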
-D – performs a directory check and optimisation. Doesn’t hurt, and can speed up directory listings of a large number of files.
-f – forces the check itself to actually run. As mentioned previously, the file system maintains itself quite well, and if you don’t force the check, fsck may look at the last check interval and decide a check is not required.
-p – automatically repairs (“preens”) any problems that can be fixed safely without human intervention. If fsck finds a problem that needs a decision from you, it reports it and exits instead of guessing. In that situation, contact an expert – or restore your back-up… ;-)
-v – verbose output. See what’s going on.
/dev/VolGroup00/LogVol00 or /dev/sda2 – this is the partition I want to perform the disk check on.
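Given the warning above, here’s a sketch of a wrapper that runs the thorough check (with the lowercase -p preen option) only if the device doesn’t appear in the mount table. The function names and the DRY_RUN variable are hypothetical. It compares canonical paths because /proc/mounts often lists a /dev/mapper/... or /dev/dm-N name rather than the /dev/VolGroup00/... symlink you typed; this assumes those names are symlinks to the same node, as udev normally sets them up.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: refuse to fsck a device that is currently mounted.
# Set DRY_RUN=1 to print the fsck command instead of running it.

# Resolve symlinks so different names for the same device compare equal.
canonical() {
    readlink -f "$1" 2>/dev/null || printf '%s\n' "$1"
}

# Succeeds if the device appears as a source in the mount table.
is_mounted() {
    local target src rest
    target=$(canonical "$1")
    while read -r src rest; do
        [ "$(canonical "$src")" = "$target" ] && return 0
    done < "${2:-/proc/mounts}"
    return 1
}

safe_fsck_ext4() {
    local dev=$1
    if is_mounted "$dev" "${2:-/proc/mounts}"; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "fsck.ext4 -c -D -f -p -v $dev"
    else
        fsck.ext4 -c -D -f -p -v "$dev"
    fi
}
```

Usage:

# safe_fsck_ext4 /dev/VolGroup00/LogVol00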

This little guide doesn’t explain how to perform a check on an encrypted logical volume… That one’s coming. :-)

Updated from post originally put here: http://onecool1.wordpress.com/2008/09/19/how-to-do-a-disk-check-in-linux/

dowe.uk

going boldly where some have gone before

Tag: data integrity
