How to interpret smartctl messages like ‘Error: UNC at LBA’?

When you run smartctl on a hard drive, you get a plethora of information that can be hard to interpret for inexperienced users. This post aims to help you understand the technical reasons behind the error messages. If you’re looking for advice on whether to replace your hard drive, the only guidance I can give is that it might fail at any time, so back up your data, but it might also run for many years to come. Furthermore, this article does not describe basic SMART WHEN_FAILED checking but rather the interpretation of more subtle signs of possibly impending HDD failures.

One example that is particularly hard to interpret is the device error log, which stores the last few errors, for example:

Error 8910 occurred at disk power-on lifetime: 7257 hours (302 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 1a 00 33 96 61  Error: UNC at LBA = 0x01963300 = 26620672
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 18 00 33 96 40 00      03:09:52.125  READ FPDMA QUEUED
  60 88 10 50 06 11 40 00      03:09:52.125  READ FPDMA QUEUED
  60 08 08 60 ac 5e 40 00      03:09:52.113  READ FPDMA QUEUED
  60 08 00 48 cf 6d 40 00      03:09:52.099  READ FPDMA QUEUED
  60 90 f0 b0 ef e5 40 00      03:09:52.065  READ FPDMA QUEUED

Obviously, the first line shows when this error occurred. The other lines, however, are not as obvious. Let’s examine the next section:

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 1a 00 33 96 61  Error: UNC at LBA = 0x01963300 = 26620672

While this section also shows the contents of some registers at the time the error occurred, the interesting part is the error description Error: UNC at LBA = 0x01963300 = 26620672.
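As a rough sketch of how the register dump relates to the reported LBA: in the classic 28-bit ATA register layout, the LBA is assembled from the low four bits of DH (bits 24 to 27), followed by CH, CL and SN. The function name below is made up for illustration, and the sketch assumes this 28-bit layout, which matches the example above but would not apply to 48-bit extended logs.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register bytes.

    Assumption: classic LBA28 layout, where only the low nibble of DH
    carries address bits.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values from the example log: ER ST SC SN CL CH DH
registers = [int(byte, 16) for byte in "40 41 1a 00 33 96 61".split()]
er, st, sc, sn, cl, ch, dh = registers

lba = lba28_from_registers(sn, cl, ch, dh)
print(hex(lba), lba)  # 0x1963300 26620672
```

The result agrees with both the hexadecimal and the decimal value printed in the error description.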

An LBA is a logical block address, i.e. the number of a logical sector on the hard drive. It is shown in both hexadecimal form (0x01963300) and decimal form (26620672). To convert it to a byte address, multiply it by the logical sector size listed at the head of the smartctl output:

Sector Size: 512 bytes logical/physical

In most cases, this value is 512 bytes, so in this example the byte offset would be 26620672 * 512 = 13629784064 bytes, or about 12.69 GiB. In some cases it might be helpful to look up this address in a tool like GParted to see which partition the error occurred in. Also see the smartmontools HOWTO that describes this process in detail.
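The conversion can be sketched in a few lines of Python. This is only an illustration: the regular expression is an assumption based on the single example line shown in this post, and SECTOR_SIZE has to be taken from the Sector Size line of your own smartctl output.

```python
import re

SECTOR_SIZE = 512  # bytes; taken from the "Sector Size" line of the smartctl output

line = "40 41 1a 00 33 96 61 Error: UNC at LBA = 0x01963300 = 26620672"

# Pattern derived from the example line above; other error lines may look different.
match = re.search(r"Error: (\w+) at LBA = (0x[0-9a-fA-F]+) = (\d+)", line)
error_name = match.group(1)
lba_from_hex = int(match.group(2), 16)
lba = int(match.group(3))

# Both notations describe the same address.
assert lba_from_hex == lba

byte_offset = lba * SECTOR_SIZE
gib = byte_offset / 2**30
print(f"{error_name} at byte offset {byte_offset} ({gib:.2f} GiB)")
# UNC at byte offset 13629784064 (12.69 GiB)
```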

UNC errors

The error message tells us that an error called UNC occurred at this LBA. UNC is shorthand for UNCorrectable, which means that the data read from the hard drive at this LBA was damaged and could not be corrected.

Hard drives do not store just your data; they also automatically compute a so-called error-correction code (ECC) for it. While there are many types of these mathematical codes, they have one aspect in common: given a set of bytes (e.g. the ones stored on the hard drive) which might be slightly damaged (i.e. some 0 bits are now 1 bits or vice versa) and the matching ECC (consisting of a few extra bytes), a suitable decoder can correct a limited number of bit errors. In most cases, an ECC can also detect more errors than it can correct: for example, one specific code might be able to correct one bit flip in two bytes but detect up to three bit flips in two bytes.

If there are more bit flips than the ECC can correct (but not more than it can detect), the result is an uncorrectable error: the UNC. If there are more bit flips than the ECC can detect, anything might happen: usually the data reconstructed by the decoder will be damaged, and possibly no error is detected at all.
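The three outcomes described above (corrected, uncorrectable, silently wrong) can be reproduced with a toy code. The sketch below uses an extended Hamming code that protects 4 data bits with 4 parity bits; it corrects one bit flip and detects two. It is not the code any particular drive uses (real drives use far stronger codes over whole sectors), and all function names are made up for illustration.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword."""
    code = [0] * 8  # index 0: overall parity, indices 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None if the error is uncorrectable."""
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    parity_ok = sum(code) % 2 == 0
    fixed = list(code)
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: the decoder assumes exactly one and repairs it.
        # A syndrome of 0 points at the overall parity bit itself.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two flips, not repairable.
        return "uncorrectable", None
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, data


def flip(code: List[int], *positions: int) -> List[int]:
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for position in positions:
        damaged[position] ^= 1
    return damaged


codeword = encode(0b1011)  # data value 11
print(decode(flip(codeword, 5)))        # ('corrected', 11)
print(decode(flip(codeword, 5, 6)))     # ('uncorrectable', None)
print(decode(flip(codeword, 1, 2, 3)))  # ('corrected', 10): wrong data, no error reported
```

The second case corresponds to a UNC; the third shows what can happen when there are more flips than the code can detect.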

Note that this explanation is highly simplified. For example, ECCs are not stored as bytes separate from the data; instead, a mathematical function is computed on the data, resulting in a set of bytes that is larger than the original data and contains both the data itself and the extra error-recovery information. In other words, the ECC data and the data itself are mixed together.

This has multiple consequences for the interpretation. Firstly, it means that the data could physically be read, yet it does not appear to be correct. This means

Other error messages

While UNC errors occur reasonably often, there are other, rarer errors for which you can’t find much documentation.

There is one definitive source for all smartctl error messages: The smartmontools source code.

We can find the error descriptions in ataprint.cpp (also see the GPL license information in the source tarball):

const char *abrt  = "ABRT";  // ABORTED
const char *amnf  = "AMNF";  // ADDRESS MARK NOT FOUND
const char *ccto  = "CCTO";  // COMMAND COMPLETION TIMED OUT
const char *eom   = "EOM";   // END OF MEDIA
const char *icrc  = "ICRC";  // INTERFACE CRC ERROR
const char *idnf  = "IDNF";  // ID NOT FOUND
const char *ili   = "ILI";   // MEANING OF THIS BIT IS COMMAND-SET SPECIFIC
const char *mc    = "MC";    // MEDIA CHANGED
const char *mcr   = "MCR";   // MEDIA CHANGE REQUEST
const char *nm    = "NM";    // NO MEDIA
const char *obs   = "obs";   // OBSOLETE
const char *tk0nf = "TK0NF"; // TRACK 0 NOT FOUND
const char *unc   = "UNC";   // UNCORRECTABLE
const char *wp    = "WP";    // WRITE PROTECTED
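For quick lookups, the abbreviations above can be put into a small table. The following Python sketch is not part of smartmontools: the dictionary copies the comments from the listing above, the explain function is a made-up helper, and the second input line is a hypothetical example of a line containing more than one abbreviation.

```python
import re

# Abbreviations and meanings as listed in the ataprint.cpp excerpt above.
ATA_ERROR_NAMES = {
    "ABRT": "ABORTED",
    "AMNF": "ADDRESS MARK NOT FOUND",
    "CCTO": "COMMAND COMPLETION TIMED OUT",
    "EOM": "END OF MEDIA",
    "ICRC": "INTERFACE CRC ERROR",
    "IDNF": "ID NOT FOUND",
    "ILI": "MEANING OF THIS BIT IS COMMAND-SET SPECIFIC",
    "MC": "MEDIA CHANGED",
    "MCR": "MEDIA CHANGE REQUEST",
    "NM": "NO MEDIA",
    "obs": "OBSOLETE",
    "TK0NF": "TRACK 0 NOT FOUND",
    "UNC": "UNCORRECTABLE",
    "WP": "WRITE PROTECTED",
}


def explain(error_line: str) -> list:
    """Return (abbreviation, meaning) pairs for known abbreviations in the line."""
    _, _, tail = error_line.partition("Error:")
    tokens = re.findall(r"[A-Za-z0-9]+", tail.split(" at LBA")[0])
    return [(token, ATA_ERROR_NAMES[token]) for token in tokens if token in ATA_ERROR_NAMES]


print(explain("Error: UNC at LBA = 0x01963300 = 26620672"))
# [('UNC', 'UNCORRECTABLE')]

# Hypothetical line, made up to show several abbreviations at once.
print(explain("Error: ICRC, ABRT at LBA = 0x00000000 = 0"))
# [('ICRC', 'INTERFACE CRC ERROR'), ('ABRT', 'ABORTED')]
```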

Realistically, you’ll only encounter a few of these errors even if you work with hard disks professionally. Some of these errors, like MC, MCR or NM, are also related to hot-swapping of hard drives and do not necessarily represent errors related to the health of the hard drive itself.

One important error is ICRC, the interface CRC error. It means that errors are being detected on the IDE/SATA or PCIe bus the hard drive is connected to. Although this is rare and might be caused by the HDD itself, it might mean that your chipset (the hardware controlling e.g. SATA) is damaged; in that case, replacing the hard drive would not fix the issue. Another possible cause is an intermittent cable connection.

How severe are those errors?

Over the life of most hard drives, especially consumer models, errors will occur, more often so in portable devices where high acceleration forces are more likely to be encountered.

What separates a good hard drive from one at the end of its life (excluding those that fail without warning) is often the frequency of new errors. If you look at the total lifetime of the HDD, i.e. Power_On_Hours or similar:

9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 8586

and compare the value (in this case 8586) with the lifetime at the last error,

Error 8911 occurred at disk power-on lifetime: 7257 hours

(in this case 7257), you can see that over a thousand HDD operational hours have passed since the last error. This suggests that there is no mechanical defect which could result in destruction of the hard drive, but rather a couple of defective or damaged sectors. UNC errors do not even necessarily mean that the sectors are physically damaged.
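The comparison is simple arithmetic, sketched here in Python using the two lines quoted above. The parsing is an assumption based on these examples only: it expects the raw value of Power_On_Hours to be a plain integer in the last column, which may not hold for every drive.

```python
import re

attribute_line = "9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 8586"
error_line = "Error 8911 occurred at disk power-on lifetime: 7257 hours"

# The raw value is the last column of the attribute line (assumed to be an integer).
power_on_hours = int(attribute_line.split()[-1])

match = re.search(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours", error_line)
error_number = int(match.group(1))
error_hours = int(match.group(2))

hours_since_error = power_on_hours - error_hours
print(f"{hours_since_error} hours (about {hours_since_error / 24:.0f} days) since error {error_number}")
# 1329 hours (about 55 days) since error 8911
```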

Hard drive errors are often triggered when files that are accessed very rarely (such as archived video files that are only opened every few years) are read. When enough bit flips have accumulated in such files for any reason, this can result in a larger number of HDD errors appearing at once.

Another indicator is the total number of errors the hard drive has encountered, i.e. 8911 in

Error 8911 occurred at disk power-on lifetime: 7257 hours

or in

ATA Error Count: 8911 (device log contains only the most recent five errors)

While this number is not shown for all hard drives, a very high number or a number which is growing rapidly indicates that there is some physical issue with the drive. Issues relating to only a few bad sectors induce a sudden jump in the error counter, but after that the counter should stay roughly constant. Note, however, that there can be other reasons for a high error counter, for example a bad or intermittent physical connection to the hard drive.

Also see the previous post on how to fix bad HDD sectors.
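One way to judge whether the counter is growing is to compare two readings taken at different power-on times. The sketch below is only an illustration with made-up helper names. The first comparison uses the numbers from this post (error 8911 occurred at 7257 hours, and the count is still 8911 at 8586 hours); the second uses invented values for a hypothetical drive whose counter is climbing quickly.

```python
import re


def parse_error_count(line: str) -> int:
    """Extract the number from an 'ATA Error Count: N' line."""
    match = re.search(r"ATA Error Count: (\d+)", line)
    if match is None:
        raise ValueError("no ATA Error Count found")
    return int(match.group(1))


def new_errors_per_hour(earlier, later) -> float:
    """Each reading is a (power_on_hours, error_count) pair."""
    (hours_a, count_a), (hours_b, count_b) = earlier, later
    if hours_b <= hours_a:
        raise ValueError("the later reading must have more power-on hours")
    return (count_b - count_a) / (hours_b - hours_a)


count = parse_error_count(
    "ATA Error Count: 8911 (device log contains only the most recent five errors)"
)

# Numbers from this post: no new errors between 7257 and 8586 power-on hours.
stable = new_errors_per_hour((7257, 8911), (8586, count))

# Hypothetical drive, values invented for illustration only.
growing = new_errors_per_hour((8000, 100), (8010, 2100))

print(stable, growing)  # 0.0 200.0
```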

How to interpret smartctl messages like ‘Error: UNC at LBA’? - TechOverflow (2024)
