NTFS Circular Reference
Microsoft has identified an NTFS corruption issue that can cause a Windows 2008 server to freeze or hang. This article explains what the corruption is, how we have been able to recreate it in our test lab, and potential workarounds that keep it from freezing the server. If you don’t need the details, you can download a hotfix from Microsoft that solves the problem: http://support.microsoft.com/kb/2866695
The specific corruption discussed here is a multi-level circular NTFS reference that Windows self-healing cannot fix. A circular reference exists when following an item’s NTFS parent IDs upward leads back to the item itself. If it is a single-level circular reference (NTFS ID xxxx has a parent ID that is also xxxx), Windows 2008 self-healing is able to fix it automatically. If, on the other hand, it is more deeply nested, such as a->b->c->a, then without the Microsoft hotfix the kernel could go into an infinite loop and the server would hang.
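The difference between the single-level and multi-level cases can be illustrated with a minimal Python sketch. It assumes the folder hierarchy is available as a plain dictionary that maps each ID to its parent ID; the names find_cycle and parent_of are hypothetical, and this is not how NTFS or ExtremeZ-IP actually store or walk these references.

```python
from typing import Dict, List, Optional


def find_cycle(parent_of: Dict[str, Optional[str]], start: str) -> Optional[List[str]]:
    """Follow parent links upward from `start`.

    Returns the list of IDs that form a loop, or None if the walk
    reaches the top of the hierarchy (a parent of None).
    """
    position = {}          # ID -> index in `path`, for IDs already visited
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if current in position:
            # We have come back to an ID we already visited: a cycle.
            return path[position[current]:]
        position[current] = len(path)
        path.append(current)
        current = parent_of.get(current)
    return None


# Hypothetical data: a healthy a/b/c hierarchy under a root folder.
healthy = {"root": None, "a": "root", "b": "a", "c": "b"}

# Single-level reference: ID "x" names itself as its parent.
single_level = {"x": "x"}

# Multi-level reference: "a" was moved under "c" while "c" is still below "a".
multi_level = {"a": "c", "c": "b", "b": "a"}

print(find_cycle(healthy, "c"))        # None
print(find_cycle(single_level, "x"))   # ['x']
print(find_cycle(multi_level, "a"))    # ['a', 'c', 'b']
```

A walk that does not remember which IDs it has already visited would never terminate on the third example, which is the kind of infinite loop described above.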
The primary way that GroupLogic has been able to reproduce the problem is with a Mac client that has two different windows open to the same folder hierarchy on the server. If a user drags a folder from one window into one of its own subfolders in the other window, the circular reference is created. For example, the user moves folder “a” from one window into folder “c” of a/b/c in the other window. Although less likely, the circular reference can also be caused by two different users moving folders within a very short period of time. An example of that would be Fred moving folder “a” to x/y/z at approximately the same time that Sally moves folder “x” to a/b/c.
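The two-user case is harder to picture, so here is a small Python simulation of it. It is only a sketch under stated assumptions: the hierarchy is modeled as a dictionary of parent links, both moves are validated against the same starting state, and the names is_descendant and move_allowed are hypothetical rather than anything taken from Windows or ExtremeZ-IP.

```python
def is_descendant(parent_of, node, ancestor):
    """True if `ancestor` is `node` itself or lies on the parent chain above it."""
    seen = set()
    while node is not None and node not in seen:
        if node == ancestor:
            return True
        seen.add(node)
        node = parent_of.get(node)
    return False


def move_allowed(parent_of, folder, new_parent):
    """A move is safe only if the destination is not inside the folder being moved."""
    return not is_descendant(parent_of, new_parent, folder)


# Starting state: a/b/c and x/y/z side by side under a root folder.
tree = {"root": None, "a": "root", "b": "a", "c": "b",
        "x": "root", "y": "x", "z": "y"}

# One user dragging "a" into its own subfolder "c" is rejected by the check.
single_user_ok = move_allowed(tree, "a", "c")   # False

# Fred and Sally are each checked against the same starting state.
fred_ok = move_allowed(tree, "a", "z")    # True: z is not inside a
sally_ok = move_allowed(tree, "x", "c")   # True: c is not inside x

# Both moves are then applied.
tree["a"] = "z"
tree["x"] = "c"

# Walking upward from a's new parent now leads back to a:
# a -> z -> y -> x -> c -> b -> a
cycle_created = is_descendant(tree, tree["a"], "a")   # True
```

Each move looks harmless on its own; the loop only appears once both have been applied, which is why the timing window between the check and the move matters.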
Once there is a multi-level NTFS circular reference, ExtremeZ-IP hangs because the kernel goes into an infinite loop when ExtremeZ-IP asks the OS about that folder. ExtremeZ-IP does have built-in safeguards: if an operation takes a thread more than 5 minutes to complete, ExtremeZ-IP attempts to cancel it. With a circular reference, ExtremeZ-IP does attempt to kill the stuck thread after 5 minutes, but by then it is too late. The other ExtremeZ-IP threads that need to do kernel tasks are all backed up behind that single thread, and the service cannot get enough work done even to shut down the stuck thread. More information about ExtremeZ-IP stalled thread handling can be found in the following article: http://support.grouplogic.com/?p=3787.
The hotfix from Microsoft should solve all known circular reference issues in the NTFS driver and is the recommended solution to the circular reference problem. For more information from Microsoft about the hotfix, see their KB article: http://support.microsoft.com/kb/2866695
According to Microsoft, the kernel hang caused by the corruption was supposedly fixed in Windows Server 2012. Although we have been able to reproduce the issue using Windows Server 2012 in our own labs, we have not had any reports from the field that involved Windows Server 2012. Because the issue may not be completely resolved in Server 2012, we recommend that you upgrade those servers to the latest version of ExtremeZ-IP.
ExtremeZ-IP 8.0.4 and later attempt to detect and avoid the Microsoft disk corruption (also known as a circular reference) regardless of whether the Microsoft hotfix is installed. Unfortunately, sometimes just checking the file system for the circular reference is enough to hang the kernel. In cases where ExtremeZ-IP can identify a circular reference, it writes an event log message pointing to the location of the circular reference.
In ExtremeZ-IP 8.0.5 and later, we not only attempt to detect existing corruption but also prevent users from moving parent folders into their own child folders on regular Windows drives.
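The general idea of refusing to move a parent folder into one of its own children can be sketched in Python as a path comparison. This is an illustration only, not the ExtremeZ-IP implementation: the function name and the example paths are hypothetical, and the check compares path text without touching the file system, so it does not resolve mount points or junctions.

```python
from pathlib import PureWindowsPath


def is_move_into_own_subtree(source: str, destination_parent: str) -> bool:
    """True if `destination_parent` is `source` itself or lies beneath it.

    Comparison is by path components (case-insensitive for Windows paths),
    so "D:\\Share\\ab" is not mistaken for a child of "D:\\Share\\a".
    """
    src = PureWindowsPath(source)
    dst = PureWindowsPath(destination_parent)
    return dst == src or src in dst.parents


# Hypothetical share layout used only for illustration.
blocked = is_move_into_own_subtree(r"D:\Share\a", r"D:\Share\a\b\c")   # True
allowed = is_move_into_own_subtree(r"D:\Share\a", r"D:\Share\x\y")     # False
sibling = is_move_into_own_subtree(r"D:\Share\a", r"D:\Share\ab")      # False
```

A server would reject the move whenever the function returns True. Comparing whole path components rather than raw string prefixes avoids false matches between folders whose names merely start the same way.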
With mount points, it can be much more difficult to determine the actual path to an item from its NTFS ID, so we were not able to quickly come up with a way to solve the issue there. The ExtremeZ-IP development team has continued to work on this, and an upcoming ExtremeZ-IP 8.0.6 release, currently in Quality Assurance testing, will extend the blocking behavior to mount points. When it is released, all known ways of creating the circular reference from a Macintosh client will be blocked by ExtremeZ-IP.