Repurposing my blog

My original idea was to make this a purely technical blog. When I was building core infrastructure in Azure, I thought I could write about how it was built. However, I couldn't write much because most of the time I was working on confidential projects; cloud computing is a very competitive area, as you may know. The second reason is that there is already so much useful information on technologies out there, and I wouldn't provide the best value by focusing only on technical topics.

Anyway, I have decided to write about whatever I think can provide more value. The audience is people in the United States. I will write in Chinese most of the time, since there is less information available in Chinese. The main focus will be IT and real estate.

Solving CORS issues became easy

// If you don't have time, just jump to the last section, Takeaway. Then solving your CORS issue will be easy.

CORS stands for Cross-Origin Resource Sharing. Let me use a simple example to explain the basics.
Say you have a website http://foo.com, and it has client-side JavaScript that uses AJAX to call http://bar.com to get some info. That is a CORS request. For the request to succeed, the server needs changes to allow CORS requests.

What is considered cross-origin? Using http://foo.com as an example, the URLs below are all considered different origins.
https://foo.com (different scheme)
http://foo.com:8080 (different port)
http://api.foo.com (different subdomain)
http://bar.com (different domain)

In some cases a compliant browser sends a preflight request before it allows the actual request to be sent. Let me use a real case to explain.
Scenario: Client side uses token authentication to get info from server side

1. The client-side JavaScript sends a POST request to https://localhost:44300/Account with a token (basically the HTTP header "Authorization: Bearer aAbdkkixlkid…")

2. The browser determines that a preflight request is needed and sends it.
3. The server has to respond with status code 2XX AND with the required headers. In this case Access-Control-Allow-Origin cannot be *, and Access-Control-Allow-Headers has to contain Authorization.
4. The browser then sends the actual request.
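
To make this concrete, here is a simplified sketch of the exchange, reusing http://foo.com from the earlier example as the calling origin (the exact headers depend on your setup):

OPTIONS /Account HTTP/1.1
Origin: http://foo.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: authorization

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://foo.com
Access-Control-Allow-Headers: authorization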

[Screenshot: preflight request]

[Screenshot: actual request]

Takeaway

I spent a lot of time solving my particular case and read dozens of links. In the end I found that it could have been much simpler and quicker if I had known the two things below.
1. Read just one link: https://msdn.microsoft.com/en-us/magazine/dn532203.aspx
2. Do check the console messages in the browser developer tools. I didn't check the console messages; doing so could have saved me a lot of time. Below is a screenshot from Chrome. It tells you exactly what went wrong and makes troubleshooting much easier.

[Screenshot: Chrome console showing the CORS error]

Big Bonus if you are using ASP.NET Web API

Web API presents a unique challenge: the "/Token" service is different from the normal Web API controllers, and the CORS NuGet package only works for Web API controllers. Some people suggest adding the line below to web.config. It will work for the "/Token" service but NOT for Web API, especially when you are using HTTPS; the Chrome browser does not allow "*" in Access-Control-Allow-Origin in this scenario.

<add name="Access-Control-Allow-Origin" value="*" />

The solution is to add the code below at the top in IdentityConfig.cs.

if (string.Equals(context.Request.Uri.PathAndQuery, "/Token", StringComparison.OrdinalIgnoreCase))
{
    context.Response.Headers.Add("Access-Control-Allow-Origin", new[] { "http://foobar.com" });
}
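
For the normal Web API controllers, the CORS NuGet package (Microsoft.AspNet.WebApi.Cors) handles the rest. A minimal sketch, assuming the package is installed and http://foobar.com is the allowed origin as above:

using System.Web.Http;
using System.Web.Http.Cors;

public static class WebApiConfig
{
    public static void Register(HttpConfiguration config)
    {
        // Allow the client origin to call the Web API controllers.
        // The "/Token" endpoint is handled separately by the snippet above.
        var cors = new EnableCorsAttribute("http://foobar.com", "*", "*");
        config.EnableCors(cors);

        // ... existing route configuration stays here
    }
}

With that in place, plus the "/Token" snippet above, both kinds of endpoints return the CORS headers the browser expects.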


TF401189: The source branch has been modified since the last merge attempt

If you are using Visual Studio Online and set up your project with TFS version control, doing a code review in Visual Studio is straightforward.

However, if you are using Visual Studio Online and set up your project with Git, you might wonder how you can conduct a code review. You may do a quick search and then try 'New pull request'. You might get a weird error message you don't understand.

TF401189: The source branch has been modified since the last merge attempt

Then you do another search (Bing or Google) and you won't find anything useful.

After some trial and error I found the root cause: I didn't use a topic branch. It is that simple. Create a topic branch, push your change to it, and open the pull request from that branch, as sketched below.
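
A minimal sketch of that workflow (the branch name topic/my-change is just an example):

git checkout -b topic/my-change      # create a topic branch for your work
git commit -am "My change"           # commit your change on the topic branch
git push origin topic/my-change      # push it to Visual Studio Online
# then open a pull request from topic/my-change into master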

Two useful links:

https://www.visualstudio.com/get-started/code/git/pullrequest
https://blogs.msdn.microsoft.com/visualstudioalm/2014/06/10/conduct-a-git-pull-request-on-visual-studio-online/ (unfortunately all the images are missing)

Debugging a weird process crash issue

I recently debugged a very interesting process crash. The code was running in the cloud, so I could not just attach a debugger as easily as on my dev box. Besides, using a debugger should be the last resort, in my opinion. So I logged on to the machine (well, before that I had to go through a process to get permission and get the environment ready, for security/compliance reasons). Using Task Manager I observed that the process was recycling. So I did the things below:
1. Checked the logs. No exceptions. No errors. No log entries related to the crash.
2. Checked for crash dump files. No files were there.
3. Checked the OS event log. No crash events. Normally, for any process crash there is at least one event.

I had never seen a process crash like this before. But I knew the related code that caused it: a background task kicked off by calling Task.Run(). I thought there must be an unhandled exception, but when I checked the code there was already a catch block catching all exceptions. However, it was still possible that the catch block itself might throw, so I put another catch-all inside the catch block to make sure no unhandled exception could escape.
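
Roughly like the sketch below, where DoBackgroundWork and Log are placeholders for the real code:

Task.Run(() =>
{
    try
    {
        DoBackgroundWork();   // the background task
    }
    catch (Exception ex)
    {
        try
        {
            Log(ex);          // the handler itself might throw
        }
        catch
        {
            // swallow everything so no exception escapes the task
        }
    }
});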

However, it did not work; the process was still crashing. Since I was relatively new to the code base, I asked around about how a process could crash like this. I got some pointers, but none of them matched what I observed.

Then I had to turn to the last resort. I copied windbg to all the nodes (remember, this is a distributed system and there are multiple replicas) and attached it to the process. Boom! I got an exception in the debugger, and then the process crashed.

I looked at the exception, which was a normal .NET exception. I could not think of any way it could cause a crash. Then I looked into the call stack and saw the line below.
000007f9`8538bb27 : 0000001b`9c29cf80 0000001c`2c50c880 0000001c`2c50c888 0000001c`2c50c890 : mscorlib_ni!System.Environment.Exit(Int32)+0x7b

That was really weird. My code was calling a library from another team, so I got ILSpy (I don't use .NET Reflector anymore, by the way) and looked into its code. Man, the library was calling Environment.Exit() when a certain exception was thrown! That explained everything: Environment.Exit() terminates the process as a normal exit, which is why there were no crash events, no dump files, and no error logs. As for why the code was doing this: it was a new library, and I guess that piece of code was copied from somewhere else.
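
In effect, the library was doing something like the sketch below (a hypothetical reconstruction, not the actual library code):

try
{
    DoWork();                 // placeholder for the library's work
}
catch (SomeException)         // some specific exception type
{
    // A normal process exit: no unhandled-exception event,
    // no crash dump, nothing in the event log.
    Environment.Exit(1);
}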

From the text above it may look like this didn't take much time. In reality, two days passed.

StackOverflowException, long time no see

I had not seen a StackOverflowException for a long time. Today I happened to bump into one in some random test code. I will explain how it happened. What the code is used for and why it was written this way are not interesting to me and are out of the scope of this post.

    public class ChildClass : BaseClass
    {
        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                base.Dispose(); // BUG: calls the public parameterless Dispose(), which calls back into this override
                //DO something
            }
        }
    }

    public class BaseClass
    {
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
            {
                //DO something
            }
        }
    }

As you can see, the StackOverflowException happens like this:
1. ChildClass.Dispose(bool)
2. BaseClass.Dispose()
3. ChildClass.Dispose(bool)
4. BaseClass.Dispose()
… and so on, until the stack overflows.
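
The fix is for the override to call the protected base overload instead of the public Dispose():

    public class ChildClass : BaseClass
    {
        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                //DO something
            }
            base.Dispose(disposing); // calls the protected overload, so there is no recursion
        }
    }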

Persistent MAC Address in the Cloud (Microsoft Azure / Amazon AWS / Google GCP / OpenStack)

What is a NIC? It is the network adapter on your machine, and the MAC address is the physical address of that adapter. For this discussion you can consider it fixed on a physical machine. However, the NIC you get on a VM in the cloud is a virtualized one: if you delete a VM and create a new one, you will probably get a new MAC address. Why would you need a persistent MAC address? A lot of licensed software ties licenses to the MAC address. If you want to move licenses from one VM to another, a persistent MAC address becomes very useful.

It looks like a small feature. Let's look at how a few major cloud providers deal with it.

Microsoft Azure
NIC https://azure.microsoft.com/en-us/documentation/articles/resource-groups-networking/#nic

It is not available yet. According to the post below, it is already planned.
http://feedback.azure.com/forums/216843-virtual-machines/suggestions/5217933-static-mac-address

Amazon AWS
It is already supported. The statement below is from the official AWS documentation.
The interface maintains its private IP addresses, Elastic IP addresses, and MAC address.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html

When did AWS start supporting this? I have no idea. From this post it was not supported yet as of March 2011, but it was supported by February 2012. https://forums.aws.amazon.com/thread.jspa?messageID=107448

Google Cloud Platform (GCP)
Google does not seem to support persistent MAC addresses, judging from the public official docs.
https://cloud.google.com/compute/docs/networking?hl=en#networks

OpenStack
It does not seem to support persistent MAC addresses either, judging from the public official docs.
https://www.openstack.org/software/openstack-networking/

C# import .pfx file to cert store and CryptographicException: Keyset does not exist

I had to import a .pfx file programmatically today, but got an exception when accessing the certificate in code after importing it. The simple demo code and the exception are as follows.

var cert = new X509Certificate2(certificatePath, password);
X509Store certificateStore = new X509Store(StoreName.My, StoreLocation.CurrentUser);
certificateStore.Open(OpenFlags.ReadWrite);
certificateStore.Add(cert);

System.Security.Cryptography.CryptographicException: Keyset does not exist
   at System.Security.Cryptography.Utils.CreateProvHandle(CspParameters parameters, Boolean randomKeyContainer)
   at System.Security.Cryptography.Utils.GetKeyPairHelper(CspAlgorithmType keyType, CspParameters parameters, Boolean randomKeyContainer, Int32 dwKeySize, SafeProvHandle& safeProvHandle, SafeKeyHandle& safeKeyHandle)
   at System.Security.Cryptography.RSACryptoServiceProvider.GetKeyPair()
   at System.Security.Cryptography.X509Certificates.X509Certificate2.get_PrivateKey()

I did a search, and the top results pointed me in the direction of adding permissions on the private key. There are two versions of code for this task.

http://www.codeproject.com/script/Forums/View.aspx?fid=1649&msg=2062983

private static void AddAccessToCertificate(X509Certificate2 cert)
{
  RSACryptoServiceProvider rsa = cert.PrivateKey as RSACryptoServiceProvider;
  if (rsa != null) {
    // FindKeyLocation is a helper defined in the CodeProject post above
    string keyfilepath = FindKeyLocation(rsa.CspKeyContainerInfo.UniqueKeyContainerName);
    FileInfo file = new FileInfo(keyfilepath + "\\" + rsa.CspKeyContainerInfo.UniqueKeyContainerName);
    FileSecurity fs = file.GetAccessControl();
    var sid = new SecurityIdentifier(WellKnownSidType.WorldSid, null);
    var everyone = (NTAccount)sid.Translate(typeof(NTAccount));
    fs.AddAccessRule(new FileSystemAccessRule(everyone, FileSystemRights.FullControl, AccessControlType.Allow));
    file.SetAccessControl(fs);
  }
}

http://stackoverflow.com/questions/425688/how-to-set-read-permission-on-the-private-key-file-of-x-509-certificate-from-ne

private static void AddAccessToCertificate(X509Certificate2 cert)
{
  RSACryptoServiceProvider rsa = cert.PrivateKey as RSACryptoServiceProvider;
  if (rsa != null) {
    var cspParams = new CspParameters(rsa.CspKeyContainerInfo.ProviderType, rsa.CspKeyContainerInfo.ProviderName, rsa.CspKeyContainerInfo.KeyContainerName)
    {
      Flags = CspProviderFlags.UseExistingKey | CspProviderFlags.UseMachineKeyStore,
      CryptoKeySecurity = rsa.CspKeyContainerInfo.CryptoKeySecurity
    };
    cspParams.CryptoKeySecurity.AddAccessRule(new CryptoKeyAccessRule(WindowsIdentity.GetCurrent().User, CryptoKeyRights.FullControl, AccessControlType.Allow));
    using (var rsa2 = new RSACryptoServiceProvider(cspParams))
    {
      // Only created to persist the rule change in the CryptoKeySecurity
    }
  }
}

Unfortunately, neither solution worked for me. This code was just one part of my other code, and every time I changed it I had to rebuild and re-run the test case, so it took me a couple of hours to prove that both solutions didn't work. It was time to try a different direction, so I read more search links. It turned out that the fix is simple: just adding X509KeyStorageFlags.PersistKeySet to the constructor of X509Certificate2 made it work. Without that flag, the private key imported from the .pfx file is not persisted in the key store, which is why accessing PrivateKey later fails with "Keyset does not exist".

var cert = new X509Certificate2(certificatePath, password, X509KeyStorageFlags.PersistKeySet);
X509Store certificateStore = new X509Store(StoreName.My, StoreLocation.CurrentUser);
certificateStore.Open(OpenFlags.ReadWrite);
certificateStore.Add(cert);

The source of this solution is http://support.microsoft.com/kb/950090

TurboTax 2014 installation Error: “.NET Framework Verification Tool can’t be found”

This was a very annoying TurboTax 2013 issue that took me a couple of hours. It looks like the solution I found applies to TurboTax 2014 as well. More importantly, the solution is so simple that in most cases people won't need to look at any other solutions.

http://blogs.msdn.com/b/paullou/archive/2014/02/11/turbotax-2013-installation-error-quot-net-framework-verification-tool-can-t-be-found-quot.aspx

Forgot password for your Azure VM?

Unfortunately, the Azure Portal doesn't have this simple feature. The only solution I know of is using the Azure PowerShell cmdlets. There are various articles about this, but they are not concise at all. I just needed simple steps to reset the password for one single VM. So here you go: in just 8 simple steps you can get your VM back.

  1. Install Azure PowerShell if needed: http://azure.microsoft.com/en-us/downloads/
  2. Set-ExecutionPolicy RemoteSigned
  3. Import-Module Azure and then Add-AzureAccount
  4. Select-AzureSubscription -Default "Subscription-1"   # use Get-AzureSubscription to find the name if needed
  5. $vm = Get-AzureVM -Name paulloutest   # use Get-AzureVM to list all VMs if needed
  6. $adminCred = Get-Credential   # do NOT type the machine name, just the pure user name
  7. Set-AzureVMAccessExtension -VM $vm -UserName $adminCred.UserName -Password $adminCred.GetNetworkCredential().Password -ReferenceName "VMAccessAgent" | Update-AzureVM
  8. Restart-AzureVM -ServiceName $vm.ServiceName -Name $vm.Name

A design pattern for caching – two-tier caching with strong consistency between servers

This is a very interesting and useful pattern in practice. I haven't seen any of the classic pattern books talk about it, but I think it should be a classic one for caching. To be clear, I think it is an old pattern: I did not invent it, and I did not discover it. I designed the solution for our problem and then realized it should be a pattern. I am writing it down because I didn't know it was a pattern before designing the solution, because it is useful in the real world, and, more importantly, because the solution was not obvious to me before I did the work, so it is worth writing down.

Problem

  • You have a data source that does not scale but holds relatively stable data (e.g. config or metadata); the data does change, though
  • You have many servers (front ends and back ends) accessing that data. Since the data source does not scale, you need caching
  • You want strong consistency between the servers, i.e. when one server sees a new version of the data, all other servers need to see that version or a newer one. Simply put, it is shared caching

Old Design – one tier caching

First, you may ask why I didn't use ZooKeeper or an equivalent service. Yes, that would be ideal if it were available. Sometimes it is just not available in your project, but you still need to solve this problem.

Take a look at the old design below. Azure Table was used for the shared caching. It is very scalable and actually faster than many people think (tens of milliseconds per access).

[Diagram: one-tier caching with Azure Table as the shared cache]

Why we need a new design

  • The config/metadata grew and no longer fits in one column/property in Azure Table. The size limit for one row/entity is 1 MB and the size limit for one column/property is 64 KB; the data is over 64 KB now
  • As the data grows, always fetching the full data is not optimal
  • This is not a main reason for the design change, nor a major issue, but the servers have clock skew. Sometimes that is not ideal, and it is hard to reason about or explain to others. As an extreme example, clock skew can make it take much longer than intended for a server to refresh its data

New Design – two tier caching

I call it two-tier caching since it now has two tiers: in-memory and Azure Blob. The changes are as below.

  • An in-memory cache is introduced
  • Azure Blob is used (in this case it performs as well as Azure Table)
  • When a server reads the data, it first compares the version in the in-memory cache to the version in the blob. If they are the same, it skips reading the data from the blob; otherwise it refreshes the cache
  • A local refresh time is introduced so clock skew doesn't matter anymore: every few minutes one server refreshes the blob, and all the other servers get the data from the blob and reset their local refresh time. A sketch of the read path follows the diagram below

[Diagram: two-tier caching with an in-memory cache in front of Azure Blob]
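
Here is a minimal C# sketch of the read path. It is only an illustration of the pattern: the names (TwoTierCache, CachedData, GetBlobVersionAsync, ReadBlobAsync) and the refresh interval are assumptions, and versioning, concurrency, and error handling are deliberately left out (see the questions below).

using System;
using System.Threading.Tasks;

public class TwoTierCache
{
    // Assumption: re-check the blob version every few minutes, per the design above.
    private static readonly TimeSpan RefreshInterval = TimeSpan.FromMinutes(5);

    private CachedData cachedData;      // tier 1: in-memory cache
    private DateTime lastRefreshUtc;    // local refresh time; no cross-server clocks involved

    public async Task<CachedData> GetDataAsync()
    {
        // Within the refresh interval, serve straight from memory.
        if (cachedData != null && DateTime.UtcNow - lastRefreshUtc < RefreshInterval)
        {
            return cachedData;
        }

        // Compare versions first and only download the blob when it changed.
        string blobVersion = await GetBlobVersionAsync();
        if (cachedData == null || cachedData.Version != blobVersion)
        {
            cachedData = await ReadBlobAsync();   // tier 2: Azure Blob
        }

        lastRefreshUtc = DateTime.UtcNow;
        return cachedData;
    }

    // Hypothetical helpers standing in for the real blob access code,
    // e.g. reading the blob's ETag or a version property stored with it.
    private Task<string> GetBlobVersionAsync() { throw new NotImplementedException(); }
    private Task<CachedData> ReadBlobAsync() { throw new NotImplementedException(); }
}

public class CachedData
{
    public string Version { get; set; }
    // ... the actual config/metadata payload goes here
}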

The new design solved our problem very well, and the code for this pattern is not much. I left out some technical details on purpose so readers can think about them. Please try to answer the questions below.

  • What do you use for the version number? How do you update it? Should you keep the version in a NoSQL database or with the blob?
  • What about concurrent reads and writes on that single blob? We have hundreds of servers. What is the limit?
  • What is the right exception handling? The data source, the blob, or the servers might be down.
  • How do you test your code?
  • What tracing do you need for live-site troubleshooting?
  • Are you confident doing this in a hotfix, or would you rather do it in a major release?