Posts tagged ‘linux’

A deep dive into /var/log/lastlog

A few days ago we had a very peculiar situation at work regarding the file size of /var/log/lastlog and I decided to find out why.

This was the initial output that made me very confused:


[root@dev ~]# du -sh /var/log/lastlog
52K /var/log/lastlog
[root@dev ~]# ls -alh /var/log/lastlog
-rw-r--r-- 1 root root 85G Jan 11 14:52 /var/log/lastlog

As you can see, the reported file size differs wildly depending on which command I use. I also ran ‘df’, and since it reported that my partition was no bigger than 30G, with less than half of it used, I understood that it wasn’t really a problem, just something I hadn’t come across before.

After the usual few minutes on Google and IRC, I understood that it was a sparse file. For those of you who aren’t familiar with sparse files, this is Wikipedia’s explanation, which I found very fitting.

“In computer science, a sparse file is a type of computer file that attempts to use file system space more efficiently when blocks allocated to the file are mostly empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual “empty” space which makes up the block, using less disk space. The full block size is written to disk as the actual size only when the block contains “real” (non-empty) data.”
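If you want to see the effect for yourself, here is a minimal Python sketch that creates a small sparse file and compares the size ‘ls -l’ would report with the space ‘du’ would report. The helper name make_sparse_file, the file name and the 64 MiB hole are arbitrary choices for illustration, and whether the hole really stays unallocated depends on your filesystem.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes, payload=b"x"):
    """Write `payload` after a hole of `hole_bytes` and return both sizes.

    Returns (apparent_size, allocated_size) in bytes.
    """
    with open(path, "wb") as fp:
        fp.seek(hole_bytes)   # skip ahead without writing anything
        fp.write(payload)     # only this data needs real disk blocks
    st = os.stat(path)
    # st_size is what `ls -l` shows; st_blocks counts the 512-byte units
    # actually allocated, which is what `du` adds up.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        apparent, allocated = make_sparse_file(
            os.path.join(tmp, "sparse.bin"), 64 * 1024 * 1024
        )
        print("apparent size (ls -l):", apparent)
        print("allocated size (du):  ", allocated)
```

On a filesystem that supports sparse files, the second number should be far smaller than the first.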

After understanding this, my heart rate dropped back to normal, but I still wanted to find out why it was showing a size of 85GB, which is more than a million times larger than 52K.

After consulting with Peter van Dijk, who tends to have the answer to everything, I managed to understand why, and here it is.

This is a snippet from the lastlog source code (lastlog.c), which I hope is pretty self-explanatory.


/*
* Read the right structure.
*/
fseek(fp, pwd->pw_uid * sizeof(struct lastlog), 0);
fread(&ll, sizeof(struct lastlog), 1, fp);

This means that the program takes your user’s uid (type ‘id’ to find out), which in my case (connected to Active Directory through LikeWise Open) was 311428236, and multiplies it by 292 bytes, the size of the lastlog structure, to find where your record starts. The record itself takes up another 292 bytes, and there’s your final file size.
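The same lookup can be sketched in Python. This assumes the 292-byte record layout used by glibc on Linux (a 32-bit timestamp, a 32-byte terminal name and a 256-byte host name), which may differ on other systems. The name read_lastlog is made up for this example.

```python
import struct

# Assumed layout: int32 ll_time, char ll_line[32], char ll_host[256].
LASTLOG_RECORD = struct.Struct("=i32s256s")  # 4 + 32 + 256 = 292 bytes


def _text(field):
    """Decode a fixed-width C string up to its first NUL byte."""
    return field.split(b"\0", 1)[0].decode(errors="replace")


def read_lastlog(fp, uid):
    """Return (time, line, host) for `uid`, or None if no record exists."""
    fp.seek(uid * LASTLOG_RECORD.size)   # same offset as the fseek() above
    raw = fp.read(LASTLOG_RECORD.size)
    if len(raw) < LASTLOG_RECORD.size:   # uid lies beyond the end of the file
        return None
    ll_time, ll_line, ll_host = LASTLOG_RECORD.unpack(raw)
    if ll_time == 0:                     # a hole reads back as zeros
        return None
    return ll_time, _text(ll_line), _text(ll_host)
```

To try it, open /var/log/lastlog in binary mode ("rb") and pass the file object together with a uid. Reading a hole costs nothing on disk; the kernel simply hands back zeros.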

In short:

311428236*292+292 = 90937045204
And the output from ls without -h (human-readable output) is … you guessed it.
-rw-r--r-- 1 root root 90937045204 Jan 11 17:00 lastlog
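The arithmetic generalizes: the file has to reach the end of the record for the highest uid that has a record written. A small sketch, where expected_lastlog_size is a hypothetical helper and 292 is the record size quoted above:

```python
RECORD_SIZE = 292  # sizeof(struct lastlog) as quoted in this post


def expected_lastlog_size(highest_uid, record_size=RECORD_SIZE):
    """Apparent size of lastlog once `highest_uid` has a record written."""
    # The record starts at uid * record_size and is record_size bytes long.
    return (highest_uid + 1) * record_size


if __name__ == "__main__":
    print(expected_lastlog_size(311428236))  # the uid from this post: 90937045204
```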

I hope this sheds some light on why you might suddenly find a huge file on your system.
After reading up on this, I realized that lastlog in particular is always a sparse file; they even mention it in the man page.

“NOTE
The lastlog file is a database which contains info on the last login of each user. You should not rotate it. It is a sparse file, so its size on the disk
is usually much smaller than the one shown by “ls -l” (which can indicate a really big file if you have in passwd users with a high UID). You can display
its real size with “ls -s”.”

Good luck.


How to manage and monitor your RAID using mdadm

After an upgrade to Ubuntu 11.04 on my computer at work, I noticed that one of the SSD drives in my RAID1 (mirror) was dead. This is how I fixed the problem and added some monitoring so that I get notified the next time something breaks.

I managed to get the drive working merely by physically unplugging it and plugging it back in, but my RAID was obviously still broken, so I had to sync the drives.

First, I checked the current status, and this can be done in two ways.

mdadm --detail /dev/md0
cat /proc/mdstat

And, as the output stated, my second drive (/dev/sdb2) was not in sync.

First I removed it from the RAID altogether.

mdadm /dev/md0 --remove /dev/sdb2

Then I re-added it:

mdadm /dev/md0 --re-add /dev/sdb2

You’ll see in syslog that the RAID is being rebuilt, but I prefer to run this for some more information:

watch cat /proc/mdstat

You’ll see this: [U_] if one of the disks is faulty, and [UU] if both are up.
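If you’d rather check that status from a script than by eye, here is a rough Python sketch that pulls the [UU] / [U_] markers out of mdstat-style text. The function names are my own, and SAMPLE is an illustrative two-disk RAID1 excerpt written for this example, not real output from my machine.

```python
import re

# Matches an array's name line followed by its status line, e.g.
#   md0 : active raid1 sda2[0] sdb2[1]
#         117154240 blocks [2/2] [UU]
_ARRAY_RE = re.compile(r"^(md\S+) : .*\n.*?\[([U_]+)\]", re.MULTILINE)

# Illustrative sample only (assumed layout of /proc/mdstat for RAID1).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda2[0] sdb2[1](F)
      117154240 blocks [2/1] [U_]

unused devices: <none>
"""


def array_states(mdstat_text):
    """Map each md array name to its status string, e.g. {"md0": "UU"}."""
    return dict(_ARRAY_RE.findall(mdstat_text))


def degraded_arrays(mdstat_text):
    """Names of arrays with at least one missing member ("_")."""
    return [name for name, state in array_states(mdstat_text).items()
            if "_" in state]


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real system you would pass in the contents of /proc/mdstat instead of SAMPLE.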

Now to the next step.
I only noticed the failure because I rebooted the computer, and with an uptime of more than 100 days the array could have been broken for ages, which is of course quite dangerous if the other disk were to fail.
So I used mdadm’s built-in monitoring to send me an e-mail as soon as something goes wrong.
I run postfix locally, and this is how I start the process.

/sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog --mail=your@email.here

You can actually try this out by breaking your RAID on purpose using these commands, but don’t forget to rebuild it as explained above!

mdadm /dev/md0 --fail /dev/sdb2
mdadm /dev/md0 --remove /dev/sdb2

This will send you an e-mail with the output from /proc/mdstat as well as a short explanation of what mdadm thinks is wrong.
