Nov 19 2014
 

If you are responsible for keeping systems up and running, it’s important to keep an eye on your hardware. This applies especially to hard disks, fans and power supplies, as they fail most often. Today’s post is about how to easily and automatically check the state of power supplies.

To do so I wrote a small script that uses ipmitool to check the state of all detected power supplies. I used it primarily on Supermicro X9 class motherboards; however, all systems supported by ipmitool should work.

The selling points of my script are that it supports more than two power supplies, that it is fully documented and that it reports a unique exit code per system state. All these features help you to integrate it into your workflow. Here’s how to use it:

$ ./checkPowerSupplies.sh -h
This tool checks the state of all installed power supplies and reports their current state. It can be used in automated monitoring tools like nagios.
It depends on ipmitool and supports all systems that report the state of the installed power supplies through the sensors subcommand. I used it primarily on Supermicro X9 class motherboards.

Usage: ./checkPowerSupplies.sh
	-h 	Shows this help
	-p=2	The number of expected power supplies
	-r=0x1	The value that indicates a working power supply (see ipmitool sensors)

Example:
./checkPowerSupplies.sh -p=3 	# Check 3 installed power supplies
./checkPowerSupplies.sh		# Check 2 installed power supplies
./checkPowerSupplies.sh -r=0x4	# A working power supply reports a state of 0x4

Exit codes:
	0	All power supplies are working
	1	ipmitool is not installed
	2	Found more power supplies than expected
	3	At least one power supply is missing
	4	At least one power supply failed

Version 1 released in 2014 by Florian Bogner - http://bogner.sh

If you are interested you can download the checkPowerSupplies.sh script over at Google Code.
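Because every state has its own exit code, a small wrapper can translate the result into whatever a monitoring system expects. The following is only a minimal sketch and not part of the original script: the function name map_exit_code and the severities chosen for each state are assumptions for illustration. The return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch: translate the exit code of checkPowerSupplies.sh into a
# Nagios-style status line and return code.
# map_exit_code and the chosen severities are assumptions, adjust as needed.

map_exit_code() {
    case "$1" in
        0) echo "OK - all power supplies are working"; return 0 ;;
        1) echo "UNKNOWN - ipmitool is not installed"; return 3 ;;
        2) echo "WARNING - found more power supplies than expected"; return 1 ;;
        3) echo "CRITICAL - at least one power supply is missing"; return 2 ;;
        4) echo "CRITICAL - at least one power supply failed"; return 2 ;;
        *) echo "UNKNOWN - unexpected exit code $1"; return 3 ;;
    esac
}

# Possible usage inside a check script:
#   ./checkPowerSupplies.sh -p=2 > /dev/null 2>&1
#   map_exit_code $?
#   exit $?
```

Keeping the mapping in one function makes it easy to change a severity, for example to treat an unexpected additional power supply as OK.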

Nov 10 2014
 

Over the last few years I encountered the following QuickTime 7 error message several times while trying to play a file:

The movie could not be opened. The resource map is incorrect

The cause of this message is a broken Resource Fork, which is stored within the file’s extended attributes. This error often occurs if the file is or was stored on a network drive or an external disk. It also affects all applications that build upon QuickTime 7’s API. To verify whether a file is unplayable because of its Resource Fork, just try to play it with VLC. If it plays fine there, the file can be fixed.

To make it easy to repair such files I wrote QT7 EA FIX. Just launch the script and drop the broken file. The rest will be handled automatically and you should be able to play the file afterwards.

Click here to download QT7 EA FIX.command.

Nov 02 2014
 

Extended attributes on OS X allow applications to store additional metadata alongside data files. Filesystems like JHFS+ that natively support this feature store this metadata completely hidden from the user. On filesystems that don’t support extended attributes, OS X writes this data into Dot Underscore (._) sidecar files. Many people, including me, have been bugged by this fallback and have searched for ways to remove those files. I described one possible way in my post Win & Mac: Clean Dot Underscore Files. However, this post is about a very interesting problem that occurs only if you try to use a Linux server as a fileserver that exposes the same folder to OS X clients using SMB and AFP with extended attribute sharing.

The goal is that, whatever network protocol you use, you always see the same data (including its metadata). To achieve that I disabled netatalk’s native EA support (ea = none in afpd.conf). As a result both SMB and AFP use the Dot Underscore fallback. In theory all OS X clients should now have a consistent view, independent of the network protocol they use. However, that’s only the theory.

If you write extended attributes using SMB and read the data over AFP it gets scrambled. Here’s an example executed on a mounted SMB sharepoint:

$ xattr -w sh.bogner.test.entry 1234567890abcdef testfile #write EA
$ xattr -l testfile #read EA
	sh.bogner.test.entry: 1234567890abcdef

Up to this point, everything is as expected: We could access the metadata and it was exactly what we put in. However, if you try to read the same metadata over an AFP-mounted sharepoint it’s a completely different picture: The metadata is completely unusable.

The problem I described here is an issue by itself; however, some applications also depend on working extended attributes. These applications either don’t work at all or have issues on such storage. Possible solutions are to go with separate EA stores for SMB and AFP, to delete all Dot Underscore files in close to real time or to use only one protocol.

Oct 23 2014
 

Currently I’m busy building several automated workflows that run on our flow:rage Video Storage System and that move files and directories from A to B while processing them in some way.

While qualifying these workflows for production we discovered that moved items were not immediately visible to Windows clients connected using SMB. Sometimes it took several minutes for the files and directories to show up. Even manually refreshing the parent folder did not help. Linux and OS X clients were not affected by this issue, so it was clear that this had to be a client-side caching problem.

My google-fu helped me to find the TechNet article SMB2 Client Redirector Caches Explained, which explains the Windows SMB2 cache and its configuration options. Based on that I created the following .reg file [download]:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters]
"DirectoryCacheLifetime"=dword:00000000

This configuration change disables the client-side directory content cache. That means that file and directory changes are immediately visible to the client. This was exactly what I needed to solve my issue. However, be aware that this change causes the client to contact the server more frequently.

Oct 17 2014
 

We are using Confluence as our internal documentation platform. It’s a gorgeous tool that really helped to bring our documentation to the next level. There is only one thing that I dislike about it: the “X more child pages” link in the sidebar. I highlighted the troublemaker in the picture below, which I captured from one of Atlassian’s demo videos.

[Screenshot: Confluence sidebar with the highlighted link]

Tired of all the complex solutions I found on Google, I created a simple Tampermonkey userscript that automatically clicks the link if present. It is inserted into all pages that contain the keyword confluence in the URL and checks if it’s really a Confluence page. If so, it clicks the “X more children” link as soon as the page is ready. That way you immediately see all available child pages. It’s a major timesaver in my daily workflow.

You can download it for free on GreasyFork.

Oct 07 2014
 

We have several smaller customers that use a VMware ESXi host with just a single Windows Server VM. To back up these VMs we often use either Windows Backup or a third-party application running within the guest that saves all important data to an external USB drive. To be prepared for things like water damage we always suggest using two rotating disks, with one stored securely at another location.

The problem here is that many of our customers are not very tech savvy and they generally don’t want to change anything on the server. That means I had to find a way so that they just have to replace the physical USB backup drive and don’t have to think about how the USB drive is forwarded to the VM.

To solve this I use the PCI pass-through option on the ESXi server. The screenshot on the right shows how that looks on ESXi 5.0. After shutting down the VM, add a new PCI device and select the USB controller of the host system. After switching the VM back on you can connect any device to any USB port on the host and everything is forwarded automatically. However, be aware that only a single VM can access the USB ports because you can forward a PCI device only once.

Another solution regarding USB devices is discussed in the VMware KB entry “USB support in ESXi/ESX 4.1 and ESXi 5.x”.

Sep 28 2014
 

Buried deep within the network stacks of all major operating systems there are two TCP extensions called Nagle’s Algorithm and Delayed ACKs. Both aim to relieve the pressure on networks (read: the Internet) by reducing the number of small packets: Nagle’s Algorithm holds back small outgoing segments while earlier data is still unacknowledged, and Delayed ACKs postpone acknowledgements so they can be combined. This article focuses on the quirks that occur on OS X while doing real-time video editing.

Before we start it is important to note that these extensions are very important and should NOT be disabled:

Please note that while, in certain cases, the current Nagle algorithm can
have a negative performance impact for certain applications, turning OFF the
Nagle algorithm can have a very serious negative impact on the internet. ~Greg Minshall on the ietf-discuss w3.org mailing list

Furthermore, the source of most problems related to Nagle’s Algorithm was already fixed several years ago. Please check out Rolande’s blog [1,2] and the article “TCP Performance problems caused by interaction between Nagle’s Algorithm and Delayed ACK” by Stuart Cheshire for useful background information.

This post is based on problems reported by some of our flow:rage customers using OS X and 10Gbit Ethernet. They reported things like dropped frames in Final Cut Pro 7 or increased render times within Adobe Media Encoder. These issues were sometimes easily reproducible (like encoding a file twice) and sometimes they appeared and disappeared at will. They were caused by the read performance over the network dropping to only a few MBps – writes were not affected and still performed as expected. The graph below illustrates the observed performance drop:

[Graph: read performance over the network]

To fix the performance issue it was necessary to disable Nagle’s Algorithm and to switch Delayed ACK to its compatibility mode. To do so I used the following Terminal command based on the documentation found in this post. As this is only a temporary change, you still have to edit /etc/sysctl.conf for a permanent solution as explained in SmallTree’s KB.

sudo sysctl -w net.inet.tcp.delayed_ack=2
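For the permanent variant, the same key and value have to end up in /etc/sysctl.conf. The following is a minimal sketch of how that edit could be scripted so that running it twice does not create duplicate entries. The helper name set_sysctl_conf is hypothetical, and the sketch assumes the file uses plain key=value lines.

```shell
#!/bin/sh
# Sketch: add or update a "key=value" line in a sysctl.conf-style file.
# set_sysctl_conf is a hypothetical helper name; the file format
# (one key=value pair per line) is an assumption.

set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    touch "$file" || return 1
    tmp=$(mktemp "${TMPDIR:-/tmp}/sysctl.XXXXXX") || return 1
    # Copy every line except an existing entry for this key ...
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$file" > "$tmp"
    # ... then append the new setting.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file to keep its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Possible usage (as root):
#   set_sysctl_conf /etc/sysctl.conf net.inet.tcp.delayed_ack 2
```

The setting in the file only takes effect at the next boot, so the sysctl -w command above is still needed for the running system.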

I invested quite a lot of my time in researching and writing down all of this information. I hope this post helps people to understand what Nagle’s Algorithm and Delayed ACKs are used for and that they are generally very important and useful extensions. However there are always exceptions and in this case it looks like 10Gbit Ethernet on OS X is one of those …

Sep 08 2014
 

Last week a local customer reported strange problems with his EMC Isilon storage. For example, sometimes when they copied a file from Mac A to their central storage they couldn’t see it on Mac B. Only Macs were affected by this strange behaviour; all their PCs worked great. I was happy when they booked an on-site appointment to investigate the problems further.

I started the investigation by talking to all the people there and writing down all the issues. After summarising them I found out that most of the issues were caused by the fact that they mixed SMB and NFS. I discussed this with the customer and he happily agreed to switch all machines to SMB.

However, we still had one issue to solve: some files (those with umlauts in their filename) were only visible over NFS. The following blog post is a summary of my on-site procedure and my findings:

1.) Creating Files with Umlauts over NFS3

If you create a file with umlauts in its name over NFS3 (tested with MS Word) it can’t be opened over SMB (“Das Programm kann nicht gefunden werden”, i.e. “The program cannot be found”). It still works over NFS. You are unable to delete the EAs (._ files) over SMB (“No such file or directory”).

After removing the EA files over NFS, MS Word launched but complained about an illegal filename. I was furthermore unable to read the file (“no such file or directory”) using cat – however it was still working with NFS.

2.) Creating Files with Umlauts over SMB

Same problem as above! The file could not be accessed using NFS, while everything worked as expected over SMB. Furthermore SMB supports alternate data streams -> EAs get lost between protocols. In this case it is somewhat good that they are enabled as it would break QT7 otherwise.

3.) Creating Files without Umlauts

Everything works fine if there is no umlaut in the filename.

4.) Files are not shown within Finder

In Terminal you can see them using ls. This is related to the EA ._ metadata files! If a file has EAs, it disappears from Finder if Finder is unable to access them. This is the reason why some movies with umlauts in their name are hidden. If you delete the ._ files they reappear, but they are still inaccessible.

5.) Word sometimes unable to save files with Umlauts?

“Word kann dieses Dokument aufgrund eines Benennungs- oder Berechtigungsfehlers nicht auf dem Zielvolume schreiben” (“Word cannot write this document to the target volume because of a naming or permission error”)

6.) Verification

Based on that knowledge I used the following procedure to locate the problem: I created a file with QT X (test.mov) on the storage. Then I duplicated it and renamed the copies to “NFS aaaÜ.mov” and “SMB aaaÜ.mov”, each over the corresponding protocol. This gave me two test files.

Then I ran the following hexdump commands:

NFS Test File

MBP:Test3 user$ ls /Volumes/Broadcast/Test/Test3/NFS*|hexdump -C #over SMB
00000000  2f 56 6f 6c 75 6d 65 73  2f 42 72 6f 61 64 63 61  |/Volumes/Broadca|
00000010  73 74 2f 54 65 73 74 2f  54 65 73 74 33 2f 4e 46  |st/Test/Test3/NF|
00000020  53 20 61 61 61 55 cc 88  2e 6d 6f 76 0a           |S aaaU...mov.|
0000002d
MBP:Test3 user$ ls /Volumes/broadcast-1/Test/Test3/NFS*|hexdump -C #over NFS
00000000  2f 56 6f 6c 75 6d 65 73  2f 62 72 6f 61 64 63 61  |/Volumes/broadca|
00000010  73 74 2d 31 2f 54 65 73  74 2f 54 65 73 74 33 2f  |st-1/Test/Test3/|
00000020  4e 46 53 20 61 61 61 55  cc 88 2e 6d 6f 76 0a     |NFS aaaU...mov.|
0000002f

SMB Test File

MBP:Test3 user$ ls /Volumes/Broadcast/Test/Test3/SMB*|hexdump -C #over SMB
00000000  2f 56 6f 6c 75 6d 65 73  2f 42 72 6f 61 64 63 61  |/Volumes/Broadca|
00000010  73 74 2f 54 65 73 74 2f  54 65 73 74 33 2f 53 4d  |st/Test/Test3/SM|
00000020  42 20 61 61 61*55 cc 88* 2e 6d 6f 76 0a           |B aaaU...mov.|
0000002d
MBP:Test3 user$ ls /Volumes/broadcast-1/Test/Test3/SMB*|hexdump -C #over NFS
00000000  2f 56 6f 6c 75 6d 65 73  2f 62 72 6f 61 64 63 61  |/Volumes/broadca|
00000010  73 74 2d 31 2f 54 65 73  74 2f 54 65 73 74 33 2f  |st-1/Test/Test3/|
00000020  53 4d 42 20 61 61 61*c3  9c*2e 6d 6f 76 0a        |SMB aaa...mov.|
0000002e

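Thereby I found out that a different filename is reported for the file created over SMB, depending on the protocol used to list it. I marked the corresponding bytes with an *. The sequence 55 cc 88 is a plain U followed by a combining diaeresis (decomposed form), whereas c3 9c is the precomposed character Ü. That means there are character encoding issues.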
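To check more than a handful of files it can help to script the comparison. The following is a minimal sketch under a few assumptions: the helper names hex_of and umlaut_form are hypothetical, and the check only knows the combining diaeresis and the six precomposed German umlauts, so it is not a general Unicode normalisation test.

```shell
#!/bin/sh
# Sketch: show which form an umlaut in a filename uses.
# hex_of and umlaut_form are hypothetical helper names. Only the
# combining diaeresis and the German umlauts are recognised.

# Print the bytes of a string as space-separated hex, like the hexdump columns.
hex_of() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Classify a name by looking for the byte patterns seen in the dumps.
umlaut_form() {
    case " $(hex_of "$1") " in
        # base letter followed by the combining diaeresis (cc 88)
        *" cc 88 "*) echo "decomposed" ;;
        # Ä Ö Ü ä ö ü stored as single precomposed characters
        *" c3 84 "*|*" c3 96 "*|*" c3 9c "*|*" c3 a4 "*|*" c3 b6 "*|*" c3 bc "*)
            echo "precomposed" ;;
        *) echo "no umlaut found" ;;
    esac
}

# Possible usage on a mounted share (the path is a placeholder):
#   for f in /Volumes/share/*; do
#       printf '%s: %s\n' "$f" "$(umlaut_form "$f")"
#   done
```

Running the loop once over the SMB mount and once over the NFS mount should show which files are reported differently by the two protocols.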

7.) The Issue: NFS

To test if the issue was related to their Isilon I repeated the test on a Debian VM. It showed the same strange issues. I therefore conclude that the issue is caused by OS X’s NFS client and Finder. A possible way to reproduce this is to rename a file using Terminal:

mv "/Volumes/NFSServer/testfile.mov" "/Volumes/NFSServer/testfileäöü.mov"

The expected behaviour is that the file testfile.mov gets renamed to testfileäöü.mov. While exactly that happened, the file became inaccessible. You cannot open it anymore.

8.) Next Steps

To fix these issues we recommend the following next steps:

  • Switch all machines to SMB – Thereby pretty much all problems should be fixed automatically.
  • To finalise the migration we have to fix the remaining issues:
    • We need a script that deletes all ._ EA files
    • Then we have to check if we can access all files containing umlauts. If not, we have to rename them to make them work again (rename to some temporary name over NFS and rename back using SMB). This is the hard part, as we have to preserve the umlauts. That way we may be able to avoid the need to relink all assets.
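The cleanup script from the first sub-step could look roughly like the following. This is only a sketch: the helper name clean_dot_underscore and its dry-run default are assumptions, and it treats every regular file whose name starts with ._ as an EA sidecar file.

```shell
#!/bin/sh
# Sketch: list or delete Dot Underscore (._) sidecar files below a directory.
# clean_dot_underscore is a hypothetical helper name.
#   clean_dot_underscore DIR          dry run, only prints the matches
#   clean_dot_underscore DIR delete   actually removes the files

clean_dot_underscore() {
    dir=$1
    mode=${2:-list}
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    if [ "$mode" = "delete" ]; then
        # Remove every regular file whose name starts with "._".
        find "$dir" -type f -name '._*' -exec rm -f {} +
    else
        # Only show what would be removed.
        find "$dir" -type f -name '._*' -print
    fi
}
```

Defaulting to a dry run is a deliberate choice here, because deleting the ._ files also throws away whatever metadata they contain.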

If you have the same problem and need help see the About Me page for contact details.

Aug 31 2014
 

This blog post is first of all a reminder for myself, as I often have to rerun the preview generation within Archiware P5 to test my custom preview generator PresSTORE Media Converter 3. The procedure is also described in this knowledgebase entry and the official CLI documentation. The command returns the ID of the verify job.

/usr/local/aw/bin/nsdchat -c "ArchivePlan <ArchivePlan Name> verify <Client> <Job ID>"

All the needed information can be found in the extended log of the original archive job. The naming of the ArchivePlan Name placeholder is a bit odd, as you have to provide the ArchivePlan ID instead.

[Screenshot: extended archive log]

For this example the following command is the correct one:

/usr/local/aw/bin/nsdchat -c "ArchivePlan 10002 verify localhost 10738"

Be aware that all the original files still have to be located at the original archive path. If they were already deleted you have to restore them first.

Aug 17 2014
 

Over the last weeks we migrated one of our post production customers from Mac OS X Snow Leopard to OS X Mavericks and from Final Cut Pro 7 to Adobe Premiere Pro CC 2014. Furthermore we added a flow:rage as their central video storage to simplify their workflows, as they used to share their projects from Mac to Mac. However, as they still had to access their old projects, we also installed Final Cut Pro 7 on Mavericks. In theory Final Cut Pro 7 is still somewhat supported; however, there are some glitches here and there. This is the story about one such glitch that makes Final Cut Pro 7 almost unusable for our customer…

After we finished the migration the cutters reported dropped frames in Final Cut Pro 7. Over time we were able to nail the problems down to projects that were opened over the network from a different Mac. If the projects were located on the flow:rage everything was working great. Based on that we tested the throughput of the hard disks and the network, checked the CPU and memory usage, used different network protocols and examined all logs on both the client and the server. However we couldn’t find the source of the problem!

We then tried to reproduce the problem at several other customers that still use Final Cut Pro 7 and to our surprise we sometimes could. What that means is that there can be problems with Final Cut Pro 7 on Mavericks if you try to edit over the network from Mac to Mac. This is especially true if there is a lot of traffic on the corresponding network interface. We never had any problems with projects stored on the flow:rage storage system. In the end we suggested that the customer copy all projects to either the flow:rage or the local disk. No further dropped frames were reported.

I think the problem is a combination of a high kernel task utilisation caused by the network traffic, the fact that Final Cut Pro 7 was not extensively tested by Apple on Mavericks and some change in the VFS layer. For me it’s not worth investing more time to diagnose the problem further. If you have any further hints please leave a comment.