NFS mount hanging? Not working from client to server, or server to server?

Posted on December 9, 2009

I recently came across this slightly bizarre issue. I was trying to mount an NFS share from one server to another server, using very loose permissions (I was basically sharing a DVD to a machine which had no DVD-ROM drive).

So, what was happening? Well, basically nothing. On the NFS server (the machine exporting the DVD) I ran tcpdump to see what traffic was arriving from the client (the server’s IP was 192.168.10.200; the client’s was 192.168.10.1):

# tcpdump -nn | grep 192.168.10.1

No output was displayed when I was trying to mount the share on the client. None at all. Well, almost none. The one bit of output that got me wondering was a broadcast packet which was received from the client.

10:14:45.651572 IP 192.168.10.1 > 192.168.10.255 arp who has 192.168.10.201 tell 192.168.10.1

The IP address 192.168.10.201 was a typo made by me the day before. I’d meant to type in .200 in my mount string. My incorrect mount command thus read:

# mount -t nfs 192.168.10.201:/mnt/share /mnt/dvd

It seemed strange that an incorrect mount command that I’d typed in yesterday (and then hit CTRL-C to cancel) might still be working in the background.

Back to the client, I realised that perhaps the mount command worked in a queue/serial-like way. Therefore, each mount command would have to complete – either successfully or not, so long as it finally returned – before the next one was attempted. Checking out this theory, I investigated local processes:

# ps ax | grep mount

Sure enough, there were lots of mount entries pointing to the wrong IP address. These were all my attempts to mount a non-existent server’s share to a local directory. Dumb mistake, eh. Still, CTRL-C hadn’t cancelled the mount requests, which continued to run in the background.

The easiest solution was to reboot the server, but in situations where that’s not practical, killing the rogue processes should suffice.
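As a minimal sketch of the "kill the rogue processes" route: the helper below (stuck_mount_pids is a hypothetical name, not a standard tool) filters a process listing for mount commands that mention the mistyped address. It assumes the listing comes from ps -eo pid=,args= and that the stuck processes are called mount, mount.nfs or mount.nfs4. Note that a process stuck in uninterruptible sleep may ignore signals until the kernel’s own request times out.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of mount processes whose command line
# mentions the given (mistyped) server address.
# Expects "PID COMMAND ARGS..." lines on stdin, e.g. from: ps -eo pid=,args=
stuck_mount_pids() {
    awk -v target="$1" '
        # $2 is the command; accept mount and its NFS helpers, with or without a path.
        # Requiring "address:" avoids matching 192.168.10.20 against 192.168.10.201.
        $2 ~ /(^|\/)(mount|mount\.nfs4?)$/ && index($0, target ":") { print $1 }
    '
}

# Usage (as root), left commented out so nothing is killed by accident:
#   ps -eo pid=,args= | stuck_mount_pids 192.168.10.201              # review first
#   ps -eo pid=,args= | stuck_mount_pids 192.168.10.201 | xargs -r kill
#   ps -eo pid=,args= | stuck_mount_pids 192.168.10.201 | xargs -r kill -9   # last resort
```

Listing the PIDs before piping them to kill is deliberate: it gives you a chance to confirm that only the mistaken mount attempts are matched.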

random *nix problems

Posted on October 14, 2009

Steve

Currently having a couple of issues with laptop and server. Hmm.. scratching head, thinking cap on, etc..

Problem 1 – laptop swap
On my laptop, I recently resized my root partition (lv) and removed/recreated my swap partition (also a logical volume). On my new swap, I used mkswap, added an entry in fstab and turned it on. All rudimentary stuff.

But when the system came to using it, it hung. No response in X whatsoever, although there seemed to be disk polling going on, suggesting the kernel was still operational. I couldn’t flip to another console or SSH in to find out, though.

I created a new partition directly on the disk, not using LVM, and made that a swap too. Same setup procedure as before, then activated it. This time, when the system needed to swap, it did, as you would expect it to. Bizarre. I can’t think why this might be happening, apart from something going wrong through the device mapper, maybe.

Problem 2 – server dump
The second problem I’m having is backing up the server using dump. In short, when I dumped out a level 0 backup, not all my files were copied. Strangely, also, directory sizes on the tape, and when restored, seemed padded/boundary-aligned (e.g. 4 KB, 8 KB or 16 KB). I’m trying to solve this one too, and am using tar in the meantime (which, if testing proves positive, I may stick with).

Diagnose and fix ‘SELinux is preventing mysqld (mysqld_t)’

Posted on October 13, 2009

Steve

The full title of this post should really be ‘SELinux is preventing mysqld (mysqld_t) “search” to ./tmp (public_content_rw_t)’ as that is the problem I’ve been having with CentOS recently (and hence my searches on the web for a solution).

The cause of the problem

I use SugarCRM for customer and project management data – and very good it is too! (Gratuitous plug – I can help your company install and use this fine software :-) ). Except that recently, when listing my Accounts within Sugar, I would not see all of the account context. Only the account data itself would be displayed and none of the subpanels/links.

The query to retrieve more data was failing, with this error message displayed in the browser window:
mysqld: Can't create/write to file '/tmp/#08y2jw' (Errcode: 13)
In my system log (/var/log/messages), I also got multiple SELinux errors like this:
Oct 13 09:07:50 server setroubleshoot: SELinux is preventing mysqld (mysqld_t) "read" to ./tmp (public_content_rw_t). For complete SELinux messages. run sealert -l 1762c478-f3a2-4eeb-be09-bd3dc037d945
Clearly, “Errcode: 13” was caused by SELinux.

Incidentally, if you have seen a similar error on your web site, but with (Errcode: 28) instead, this is likely due to a shortage of disk space. A great way of decoding operating system error codes like this is to use ‘perror’, thus:
# perror 28
OS error code 28: No space left on device
# perror 13
OS error code 13: Permission denied

So there we are – two distinct and different issues.

With SELinux, resolving the permission issue can be difficult. By issuing # sealert -l 1762c478-f3a2-4eeb-be09-bd3dc037d945, as suggested above, I got the following output (trimmed for clarity):

Summary:
SELinux is preventing mysqld (mysqld_t) “search” to ./tmp (public_content_rw_t).
Allowing Access:
Sometimes labeling problems can cause SELinux denials. You could try to restore
the default system file context for ./tmp,
restorecon -v './tmp'
Additional Information:
Source Context root:system_r:mysqld_t
Target Context system_u:object_r:public_content_rw_t

First things first: issuing # restorecon -v './tmp' didn’t fix it for me. I was also surprised to see that the path to /tmp was relative to the current working directory, so I tried a slightly modified # restorecon -v '/tmp', but to no avail. After restarting mysqld, the problem persisted: MySQL was simply being refused access to /tmp. Somewhere, a policy is disallowing this.

It’s a mistake to assume that the source context and target context should be the same; they don’t have to be, as it’s entirely policy-driven. Note the differing types in the two contexts above (mysqld_t for the source, public_content_rw_t for the target): I previously held the incorrect assumption that they ought to match.

Find and fix a policy?

Although finding the troublesome policy and analysing it is a Good Thing, it’s also time-consuming and requires significant knowledge of SELinux, chiefly to avoid creating security holes. A better way, I found, was simply to relocate where mysqld tries to store temporary data.

Thanks to Surachart Opun’s blog, I learned that you can specify a new location for temporary files. In /etc/my.cnf, add or edit the following:
[mysqld]
# The default is tmpdir=/tmp; point it at a directory mysqld is allowed to use
tmpdir=/var/lib/mysql/tmp

Now do the legwork to set up the directory properly:

First, create the directory with appropriate permissions:
# cd /var/lib/mysql
# mkdir tmp
# chown mysql:mysql tmp
# chmod 1750 tmp

Now set the SELinux context up:
# chcon --reference /var/lib/mysql tmp

and make the SELinux context permanent:
# semanage fcontext -a -t mysql_db_t "/var/lib/mysql/tmp(/.*)?"

Finally, restart mysql:
# service mysqld restart
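Before trusting the restart, it may be worth confirming that the configuration really points at the new directory. The sketch below is one way to do that; mycnf_tmpdir is a hypothetical helper name, and it assumes a plain my.cnf with the setting written as tmpdir=... under a [mysqld] header (included files are not followed).

```shell
#!/bin/sh
# Hypothetical helper: print the tmpdir value from the [mysqld] section
# of a my.cnf-style file given as the first argument.
mycnf_tmpdir() {
    awk '
        # On every section header, note whether we are now inside [mysqld]
        /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/); next }
        # Inside [mysqld], print whatever follows "tmpdir ="; commented lines are skipped
        in_mysqld && /^[ \t]*tmpdir[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print
        }
    ' "$1"
}

# Usage on the server, commented out because it needs the real files:
#   mycnf_tmpdir /etc/my.cnf                   # should print /var/lib/mysql/tmp
#   ls -Zd /var/lib/mysql/tmp                  # the type field should read mysql_db_t
#   mysql -e "SHOW VARIABLES LIKE 'tmpdir'"    # what the running server is using
```

The last two commands check the other halves of the fix: the label on the directory and the value mysqld actually picked up.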

Closing thoughts: optimisation

The methods above fixed the particular problem I was having. They didn’t, however, actually pinpoint the cause. This is one of the good things about Linux and SELinux in particular: you are forced to rethink what the system is doing and work out a solution that sits within the predefined security context – or learn how to write SELinux policies. Personally, I prefer the former ;-)

There is an additional benefit to the solution above: optimisation. Because we have specified the security context with semanage, we are free to mount an external file system and use that instead for MySQL’s temporary files. In other words, we can maintain the security but increase the performance. One such filesystem could be tmpfs. tmpfs is a memory-backed filesystem: it keeps files in RAM (spilling to swap under memory pressure) up to a configurable size limit, rather than reserving a fixed block of RAM. It is much quicker than an on-disk filesystem and thus well suited to temporary, cache-like data. There are many resources about tmpfs on the web. A good introduction to tmpfs can be found at Planet Admon.
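For anyone wanting to try the tmpfs idea, here is a rough sketch rather than a tested recipe. tmpfs_fstab_line is a hypothetical helper that prints an /etc/fstab entry; the 256m size in the usage notes is purely an assumption, and the numeric uid and gid are looked up with id because the tmpfs uid= and gid= options are safest given as numbers. A freshly mounted tmpfs may not carry the mysql_db_t label, so the sketch relabels it with restorecon after mounting.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line that mounts a tmpfs of the
# given size on the given directory, owned by the given numeric uid and gid.
# The 1750 mode mirrors the chmod used when the directory was created.
tmpfs_fstab_line() {
    dir=$1 size=$2 uid=$3 gid=$4
    printf 'tmpfs %s tmpfs size=%s,mode=1750,uid=%s,gid=%s 0 0\n' \
        "$dir" "$size" "$uid" "$gid"
}

# Usage (as root), commented out because it changes the system:
#   tmpfs_fstab_line /var/lib/mysql/tmp 256m "$(id -u mysql)" "$(id -g mysql)" >> /etc/fstab
#   service mysqld stop
#   mount /var/lib/mysql/tmp
#   restorecon -v /var/lib/mysql/tmp    # apply the semanage fcontext rule to the new mount
#   service mysqld start
```

Anything stored on tmpfs vanishes at reboot, which is fine for temporary files but worth remembering when choosing the size limit.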

Open Source Scores A Major Breakthrough in the U.K.!

Posted on February 27, 2009

Steve

No doubt open-source proponents will rejoice over this news: The British government has decided to increase its use of open-source software in the public services field. It will be adopted over Windows whenever it delivers the best value for the money. Schools, government offices and public agencies will all give open source a new look.

SMART ain’t so smart, it seems

Posted on February 25, 2009

Steve

It’s worry-time on the server:

# tail -20 /var/log/messages
Feb 25 10:09:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 10:39:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 10:39:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 11:09:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 11:09:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 11:39:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 11:39:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 12:09:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 12:09:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 12:39:31 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 12:39:31 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 13:09:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 13:09:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 13:39:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 13:39:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 14:09:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 14:09:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors
Feb 25 14:39:32 myserver smartd[2785]: Device: /dev/sdc, 9 Currently unreadable (pending) sectors
Feb 25 14:39:32 myserver smartd[2785]: Device: /dev/sdc, 3 Offline uncorrectable sectors

.. and so it goes on. So, I’ll check it out by querying the drive’s SMART data:

# smartctl -a -d ata /dev/sdc
smartctl version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDP725040GLA360
Serial Number: GEB430RE15UEVF
Firmware Version: GMDOA52A
User Capacity: 400,088,457,216 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Not recognized. Minor revision code: 0x29
Local Time is: Wed Feb 25 14:55:30 2009 GMT
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (7840) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 130) minutes.

[snip]

I’m not sure what to make of a disk whose bad sectors smartd keeps logging, yet which reports “PASSED” when queried with smartctl.

One thing’s for certain – it’s being replaced!
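A possible explanation, offered tentatively: the overall-health verdict generally only fails once an attribute crosses the vendor’s threshold, whereas smartd logs the raw pending and uncorrectable sector counts, so the two can disagree. The sketch below pulls those raw counts out of smartctl -A output. bad_sector_counts is a hypothetical helper name, and it assumes the usual attribute table layout with the raw value in the tenth column and the counts held in attributes 197 and 198.

```shell
#!/bin/sh
# Hypothetical helper: print the raw values of SMART attributes
# 197 (pending sectors) and 198 (offline uncorrectable sectors)
# from "smartctl -A" output supplied on stdin.
bad_sector_counts() {
    # Column 1 is the attribute ID, column 2 its name, column 10 the raw value
    awk '$1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Usage (as root), commented out because it needs a real drive:
#   smartctl -d ata -A /dev/sdc | bad_sector_counts
#   smartctl -d ata -t long /dev/sdc        # start an extended self-test
#   smartctl -d ata -l selftest /dev/sdc    # read the self-test log once it has finished
```

The extended self-test is the one that actually scans the surface; on this drive the earlier output suggests allowing a little over two hours for it to complete.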

Journeys with Fedora 9

Posted on July 16, 2008

Steve

I’ve decided to catalogue my experience with Fedora 9. The reasons for this are:

It is a Linux distro aimed at being totally “free” (as in speech, not as in beer).
Fedora is the distribution I am most familiar with.
I believe Red Hat is actually a pretty cool company and they are serious about the user community.
Fedora always aims to be cutting-edge. I like that.

So, what first?

I plan to record my experience of installing Fedora 9 on my blog so that people who are considering switching to Linux, or switching from another distribution to Fedora, can decide what the benefits might be. It’s also going to serve as a reference for myself, so I can see why it’s such a good/bad idea to do it again!

Finally, I have gained so much by simply being interested in Open Source software, that I felt it was about time to give something back. As an English graduate, documentation is probably the best thing I can start with. I hope it’s of help to someone!

dowe.uk

baldly going where >=0 blogs have gone before

Tag: GNU+Linux
