June 19, 2013

SharePoint Downloads Interrupted for Large Files

While we primarily use SharePoint Online (2013) for fairly small documents (the largest are still under 5MB), I recently decided to start uploading recorded team meetings from Lync 2013. The videos are about 30 minutes long and end up around 40-60MB in size. Although the upload runs fine, users were reporting issues when trying to stream the files or download them. The stream would simply stop, and the downloads would fail with a message that the download was "Interrupted". We replicated this behavior across 4 different locations with the same result: at some random point in the download, it would fail.

May 13, 2013

Repairing Mailbox Corruption in Exchange 2010

I recently got through recovering an SBS 2011 server after Active Directory face-planted in the middle of a workday. When I say recover, I mean I repeated the entire migration using a cleaned-up secondary DC. It was a fun weekend (expect another post about that experience). Although I thought we were in the clear, I got a call from the client about 24 hours after we had verified everything was working. He indicated that his iPhone had suddenly stopped receiving mail in the inbox (calendar, contacts, and sent items were still fine) and, after spinning in circles for a few minutes, was throwing an error that it "cannot connect to mail server".

February 25, 2013

Failure of vShield Edge NAT/VPN Traffic Post-5.1 Upgrade

UPDATE: Turns out this is a known issue during the 1.5 > 5.1 VSM upgrade and a fix should be released in an upcoming patch.

That's about the shortest title I could think of to be descriptive of this issue. TLDR is that NAT rules on vShield Edge appliances appear to be causing unexpected behavior on VPN traffic after a vCloud upgrade from 1.5 to 5.1.
Background: We recently upgraded from 1.5 to 5.1. For most of our vDCs, we simply have a single vSE/Routed network that connects a private subnet to a "WAN" network and pulls a public IP from a pool. We forward (NAT) and allow (firewall) selected ports (e.g. 3389 for RDP) to virtual machines. Most of these networks also have a site-to-site VPN tunnel with a physical firewall across the internet. After the upgrade, we converted our rules to match on original IP and then enabled "multiple interfaces", effectively taking the appliances out of compatibility mode. Everything looked good (even for the vSE devices still in compatibility mode).
Issue: We first noticed this when a client reported that they could not access a virtual machine via RDP using its internal (vSE-protected) IP across a VPN tunnel, but could access the VM via RDP using its public hostname/IP address. We allow all traffic across the VPN (the firewall has an any:any rule for VPN traffic). When we logged in to troubleshoot (simply thinking the VPN was down), we found that we could connect to any port on the remote VM across the VPN tunnel except 3389. I could ping from the local subnet to the troubled VM on the vApp network with no problem. I could connect to other ports that were open on the remote VM with no problem. I could not connect to 3389 across the VPN.
We thought it might be isolated, but found the issue on every vSE we have: if a DNAT rule existed to translate inbound traffic for a particular port, that port would be unresponsive when traffic traversed the VPN tunnel destined for the target of the DNAT rule.
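
For anyone wanting to reproduce the symptom, the port-by-port check described above can be scripted. This is only a rough sketch using Python's standard socket module, not the tool we actually used; the function name probe_ports and the address in the usage comment are made up for illustration.

```python
import socket


def probe_ports(host, ports, timeout=3.0):
    """Try a TCP connect to each port; return {port: True if it connected}."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError (refused, timed out,
            # unreachable) when the connect does not succeed.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example usage from a machine on the far side of the VPN tunnel.
# 192.168.10.20 is a hypothetical internal (vSE-protected) VM address:
#
#   for port, ok in probe_ports("192.168.10.20", [80, 443, 3389]).items():
#       print(port, "open" if ok else "unreachable")
```

Run against the VM's internal address, a port covered by a DNAT rule (3389 in our case) would be expected to report unreachable while the other open ports connect normally.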

While vCloud Director doesn't show anything strange in the firewall section of the vSE configuration, if you log in to vShield Manager and look at the firewall rules there, a "Deny" rule with the private/internal/translated IP is added for any NAT rule that exists.

This, I'm assuming, is for security reasons during the upgrade, but it does not show up in vCloud Director (thus our confusion). After taking our appliances out of compatibility mode post-upgrade, the rules were still there.
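
If you have a lot of appliances to check, the matching logic is simple enough to sketch. The snippet below is an illustration only: it works on rules you have copied out of vShield Manager by hand, it does not talk to any vShield or vCloud API, and the DnatRule/FirewallRule structures (including the assumption that a deny rule may carry a port) are hypothetical shapes invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DnatRule:
    # Hypothetical shape: public side -> private (translated) side.
    original_ip: str
    original_port: int
    translated_ip: str
    translated_port: int


@dataclass(frozen=True)
class FirewallRule:
    # Hypothetical shape; destination_port=None means "any port".
    action: str
    destination_ip: str
    destination_port: Optional[int] = None


def shadowing_deny_rules(dnat_rules, firewall_rules):
    """Return (dnat, firewall) pairs where a deny rule targets a DNAT's translated IP/port."""
    hits = []
    for dnat in dnat_rules:
        for fw in firewall_rules:
            if fw.action.lower() != "deny":
                continue
            same_ip = fw.destination_ip == dnat.translated_ip
            same_port = fw.destination_port in (None, dnat.translated_port)
            if same_ip and same_port:
                hits.append((dnat, fw))
    return hits


# Made-up sample data: one RDP forward and the rules seen on the appliance.
dnat = [DnatRule("203.0.113.10", 3389, "192.168.10.20", 3389)]
firewall = [
    FirewallRule("allow", "203.0.113.10", 3389),
    FirewallRule("deny", "192.168.10.20", 3389),
]
for nat_rule, fw_rule in shadowing_deny_rules(dnat, firewall):
    print("deny rule covers DNAT target", nat_rule.translated_ip, nat_rule.translated_port)
```

Any pair it reports is a candidate for the behavior described above: traffic arriving over the VPN for the translated IP and port hits the deny rule instead of the VM.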

Solution: After the vSE is out of compatibility mode (see pg. 49 of the vCD 5.1 Install Guide), re-apply the service configuration (right-click the vShield Edge appliance in vCloud Director and select "Re-Apply Service Configuration"). You can also re-deploy the appliance or add an arbitrary rule to the firewall list; both appear to have the same effect.

Red Flags and the Value of Experience

One of the things I hear often said, and something I subscribe to as well, is the idea that a lot of technical knowledge in the world of IT ...