InitializeBusinessRules/InitializeAuthorizationRules: potential for deadlocks in the Web Service portal?

rsbaker0 posted on Tuesday, May 25, 2010

For the last week or two, I have been troubleshooting CSLA web service data portals that seem to hang intermittently. (They hung regularly at first; after I corrected some obvious threading problems on my part, it's better.)

It turns out that the hanging appears to always occur at the exact moment the application pool recycles, and hence during the initialization of the BO assemblies or shortly thereafter. Note that at the time the pool recycles, the server could be servicing many requests per second, and there is no predicting what objects will be materializing inside the application pool or in what order.

After much logging of activity over several weeks, I believe I have traced it down to one of these two calls:

      InitializeBusinessRules();
      InitializeAuthorizationRules();

The problem seems to be that both of these calls do a lock() on the object type while the rules are being added. If these functions load information from the database to dynamically generate validation or authorization rules, rule registration can take a significant amount of time compared with a typical lock duration.
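To make the locking concrete, here is a simplified, hypothetical model of per-type rule registration. RuleRegistry and EnsureRules are illustrative names and this is not CSLA's actual source; it only mirrors the behavior described above, where the first instance of a type registers its rules while holding a lock on that type, so anything slow inside the registration delegate holds the lock just as long.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical model of "lock the type while its rules are added".
// Not CSLA's implementation; names are illustrative only.
public static class RuleRegistry
{
    // Registered rules per business object type. ConcurrentDictionary because
    // threads holding locks on *different* types touch it concurrently.
    private static readonly ConcurrentDictionary<Type, List<string>> _rules =
        new ConcurrentDictionary<Type, List<string>>();

    public static void EnsureRules(Type type, Action<List<string>> addRules)
    {
        // Every thread creating an object of this type waits here.
        lock (type)
        {
            if (_rules.ContainsKey(type))
                return; // an earlier instance already registered the rules

            var list = new List<string>();
            // The list is published before it is filled, so if addRules throws
            // part-way through, the type keeps a partial rule set.
            _rules[type] = list;

            // If addRules is slow (e.g. reads rule metadata from a database),
            // the lock on 'type' is held for that entire duration.
            addRules(list);
        }
    }

    public static int Count(Type type)
    {
        lock (type)
        {
            return _rules.TryGetValue(type, out var list) ? list.Count : 0;
        }
    }
}
```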

There seems to be potential for a deadlock if your BusinessBase-derived classes depend on each other in ways that come into play when registering their rules. Consider this admittedly contrived scenario:

1. Type A uses B in its rule initialization, and vice versa.

2. A comes through the portal, and locks type A in InitializeBusinessRules()

(thread switch)

3. Meanwhile, type B comes through the portal on another request, and locks type B in InitializeBusinessRules()

(thread switch)

4. While initializing its own rules, A attempts to create a B. It can't, because the B constructor will call InitializeBusinessRules(), which will block on the lock. So now A is blocked waiting for B.

5. B can still run and now tries to instantiate an A. It can't, since A locked its own type in #2.

Both threads are now deadlocked on each other. Since the types remain locked, the deadlock extends to every subsequent thread servicing requests in the application pool. (I've witnessed something very similar to this in the logging. I log both the entry and exit of each call to the data portal, and when the deadlock occurs, no exits are logged: I see a series of incoming requests without a matching exit, and eventually all the threads are consumed and the pool is dead until the next time it recycles.)
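The interleaving in steps 1-5 can be reproduced with two plain lock objects standing in for typeof(A) and typeof(B). This is a minimal sketch (DeadlockDemo is a hypothetical name); it uses Monitor.TryEnter with a timeout instead of lock() so the demo reports that each thread is stuck rather than hanging forever, and two Barrier objects force the "thread switch" points from the scenario.

```csharp
using System;
using System.Threading;

// Reproduces the A/B scenario with two lock objects instead of real types.
public static class DeadlockDemo
{
    private static readonly object LockA = new object(); // stands in for typeof(A)
    private static readonly object LockB = new object(); // stands in for typeof(B)

    // Reports, for each thread, whether it managed to take the other type's lock.
    public static (bool aGotB, bool bGotA) Run(TimeSpan timeout)
    {
        // 'held': both threads own their first lock before either tries the second.
        // 'tried': neither thread releases its lock until both attempts are over.
        using (var held = new Barrier(2))
        using (var tried = new Barrier(2))
        {
            bool aGotB = false, bGotA = false;
            var a = new Thread(() => aGotB = LockThenTry(LockA, LockB, timeout, held, tried));
            var b = new Thread(() => bGotA = LockThenTry(LockB, LockA, timeout, held, tried));
            a.Start();
            b.Start();
            a.Join();
            b.Join();
            return (aGotB, bGotA);
        }
    }

    private static bool LockThenTry(object mine, object other, TimeSpan timeout,
                                    Barrier held, Barrier tried)
    {
        lock (mine)                                       // steps 2/3: lock own type
        {
            held.SignalAndWait();                         // the other type is now locked too
            bool got = Monitor.TryEnter(other, timeout);  // steps 4/5: need the other type
            if (got) Monitor.Exit(other);
            tried.SignalAndWait();
            return got;
        }
    }
}
```

With plain lock() in place of TryEnter, both threads would wait forever, which matches the unmatched entry/exit pairs in the logging.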

Would it be reasonable to change these from an outright lock() to a Monitor.TryEnter() and throw an exception if the lock cannot be acquired in a reasonable amount of time? (The exception isn't a perfect solution, but at least the application pool would recover.) Certainly lock() calls that span only the short time it takes to create an object and assign it to a member aren't usually going to be a problem, but InitializeBusinessRules() (and InitializeAuthorizationRules()) run methods that are intended to be overridden, so the lock can be held for a long time.
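A sketch of the proposed change, assuming a hypothetical TypeLock helper (not part of CSLA) wrapped around the rule-registration body. It uses the Monitor.TryEnter(object, TimeSpan, ref bool) overload so the lock is released in the finally block only if it was actually taken.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lock with a timeout instead of waiting forever.
public static class TypeLock
{
    public static void Run(object lockObject, TimeSpan timeout, Action body)
    {
        bool taken = false;
        try
        {
            // Sets 'taken' to true only if the lock was acquired within the timeout.
            Monitor.TryEnter(lockObject, timeout, ref taken);
            if (!taken)
                throw new TimeoutException(
                    $"Could not lock {lockObject} within {timeout.TotalSeconds}s; possible deadlock.");
            body();
        }
        finally
        {
            if (taken) Monitor.Exit(lockObject);
        }
    }
}
```

In the A/B scenario the TimeoutException is raised while A is still inside its own rule registration (A's attempt to create a B times out), so it propagates out of A's rule-adding code after some of A's rules may already have been added.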

rsbaker0 replied on Tuesday, May 25, 2010

RockfordLhotka

You should try your suggestion and see if it addresses the problem. Since you have a scenario where the issue is visible, you are in a relatively unique position to try this out.

I've confirmed with additional logging that the deadlocks are indeed occurring in the AddBusinessRules() override.

However, looking at this more closely, my original idea has a serious flaw: InitializeBusinessRules() appears to leave the ValidationRules for the type in an indeterminate state if an exception is thrown during AddBusinessRules(). There is no provision for removing the partially filled ValidationRulesManager associated with that type, so it looks to me like you'd be left with a partial set consisting only of the rules added before the exception was thrown. Future use of that object type on the server would then run with an incomplete set of rules. Ugh.
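One way to avoid a half-filled rule set, sketched under the assumption that rule registration could be restructured (AtomicRuleRegistry is a hypothetical name, not CSLA's design), is to collect rules into a private list and publish them for the type only after the registration method returns without throwing. A failure then leaves nothing registered, and the next caller simply retries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical all-or-nothing rule registration.
public static class AtomicRuleRegistry
{
    private static readonly ConcurrentDictionary<Type, IReadOnlyList<string>> _rules =
        new ConcurrentDictionary<Type, IReadOnlyList<string>>();

    public static IReadOnlyList<string> GetRules(Type type, Action<List<string>> addRules)
    {
        // Fast path: rules already published for this type.
        if (_rules.TryGetValue(type, out var existing))
            return existing;

        lock (type)
        {
            // Re-check: another thread may have published while we waited.
            if (_rules.TryGetValue(type, out existing))
                return existing;

            var pending = new List<string>();
            addRules(pending);                  // may throw: nothing is published yet
            var complete = pending.AsReadOnly();
            _rules[type] = complete;            // publish only the complete set
            return complete;
        }
    }
}
```

This addresses only the partial-state problem; registration still runs under lock(type), so the A/B deadlock itself is unaffected.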

It might just be the case that BusinessBase-derived objects cannot be instantiated in either AddBusinessRules() or AddAuthorizationRules() in a multi-threaded environment without risking deadlocks.

RockfordLhotka replied on Tuesday, May 25, 2010

rsbaker0

I would say that is a scenario outside the boundaries of the design. You should be able to get ReadOnlyBase objects in those methods - because that's how you'd retrieve the metadata to drive your rules. But it is hard to imagine how those metadata objects would have any interaction with the actual business model objects - they are really two separate models - a model and a metamodel.

rsbaker0 replied on Wednesday, May 26, 2010

OK, I can probably refactor to do this (although I think I've seen some discussion about ReadOnlyBase objects also having rules, maybe in 4.0).

Here is the "real" problem -- if you can help me solve this one the above problem will go away.

What I really need to do is run some initialization either just before the first call (e.g. when my BO assembly is loaded) or on the very first call to the server side of the data portal (logical or actual). Other data portal requests should wait until this is finished. (Something equivalent to DllMain from the pre-C# days was what I first looked for.)

If I can complete my own initialization before any more incoming requests start to process, this issue will go away. Do you have any suggestions on how I might do this?

RockfordLhotka replied on Wednesday, May 26, 2010

I see. This isn't easy.

You can use an IAuthorizeDataPortal implementation to get early access to every logical server-side data portal call. From there you could call some code that does your server-side initialization.

Obviously this could occur many times simultaneously on different server threads. So your initialization code would need to do some locking to ensure that only one thread does the init work, and other threads are blocked.

Also, this would prevent the use of the data portal in the init code, because you'll have blocked all subsequent data portal invocations until the initialization is complete.
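The locking Rocky describes could look like the following minimal sketch (InitGate is a hypothetical name): the first caller runs the initialization while all other callers block on the same lock, and if the initialization throws, the gate stays closed so a later request retries.

```csharp
using System;

// Hypothetical one-time initialization gate with retry on failure.
public sealed class InitGate
{
    private readonly object _sync = new object();
    private readonly Action _init;
    private volatile bool _done;   // volatile: the unlocked check sees the write made under the lock

    public InitGate(Action init) => _init = init;

    public void EnsureInitialized()
    {
        if (_done) return;         // fast path once initialization has succeeded
        lock (_sync)               // all other callers block here meanwhile
        {
            if (_done) return;     // another thread finished while we waited
            _init();               // exceptions propagate; _done stays false
            _done = true;
        }
    }
}
```

Lazy<T> is a tempting alternative, but with LazyThreadSafetyMode.ExecutionAndPublication a factory exception is cached and rethrown on every later access, and PublicationOnly lets several threads run the factory at once; neither gives "one thread at a time, retry on failure". As noted above, _init must not itself go through the data portal.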

rsbaker0 replied on Wednesday, May 26, 2010

RockfordLhotka
I see. This isn't easy.

You can use an IAuthorizeDataPortal implementation to get early access to every logical server-side data portal call. From there you could call some code that does your server-side initialization.

I thought about this, but by then it's already too late. The problem is that at that point the incoming request has already been deserialized, and if the request is an Update for a BO, then the OnDeserializedHandler() method has already been called. The deadlock I describe above could occur before Authorize() is ever called.

I came up with this solution for the time being, which involved changing the WebServicePortal host itself. Pardon the analogy, but it seemed to me that WebServicePortal.Deserialize() is the "transporter pad" for all incoming objects.

So I took a page from your playbook and added a "CslaDataPortalInitHandler" key to the config file. It specifies a class and a static method to call a single time before any incoming objects are even deserialized. Right now I'm calling this code in Deserialize(), but perhaps the WebServicePortal constructor would also work. I chose Deserialize() to allow for the possibility that the initialization fails transiently (e.g. SQL Server is down), in which case subsequent requests retry until it succeeds. This puts the lock() in a single central location and forces all other requests to wait until the initialization is complete before they can even be deserialized.

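      // Field declarations (not shown in the original post; names taken from the code below).
      // volatile so the unlocked first check sees the write made inside the lock.
      static volatile int _initDone;
      static readonly object _syncLock = new object();

      static void CheckInit()
      {
          // Fast path: once initialization has succeeded, skip the lock entirely.
          if (_initDone == 0)
          {
              lock (_syncLock)
              {
                  // Re-check: another thread may have finished while we waited.
                  if (_initDone == 0)
                  {
                      // Config value format: "Namespace.Type,Assembly,StaticMethod"
                      var provider = DataPortalInitHandler;

                      if (!string.IsNullOrEmpty(provider))
                      {
                          string[] items = provider.Split(',');
                          // throwOnError = true: fail loudly on a misspelled type or assembly name
                          Type containingType = Type.GetType(items[0].Trim() + "," + items[1].Trim(), true);
                          var methodInfo = Csla.Reflection.MethodCaller.FindMethod(containingType, items[2].Trim(), System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public);
                          if (methodInfo != null)
                          {
                              try
                              {
                                  // Run the init code as if on the server side of the data portal
                                  ApplicationContext.SetLogicalExecutionLocation(ApplicationContext.LogicalExecutionLocations.Server);
                                  methodInfo.Invoke(null, null);
                              }
                              finally
                              {
                                  ApplicationContext.SetLogicalExecutionLocation(ApplicationContext.LogicalExecutionLocations.Client);
                              }
                          }
                      }
                      // Only reached if the handler did not throw; on failure _initDone
                      // stays 0 and the next request retries the initialization.
                      _initDone = 1;
                  }
              }
          }
      }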

(I may need to tweak the execution location code above, but I needed the logical location set in order to execute some of my server-side code that I explicitly block from running on the client, and the above seemed safe enough.)
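For reference, the type and method lookup in CheckInit can be done with System.Reflection alone. This is a sketch, assuming the same "TypeName,AssemblyName,MethodName" value format; InitHandler and SampleInit are hypothetical names, and Csla.Reflection.MethodCaller.FindMethod from the original is replaced by the standard Type.GetMethod.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: resolve "TypeName,AssemblyName,MethodName" and invoke
// the public, static, parameterless method it names.
public static class InitHandler
{
    public static void Invoke(string spec)
    {
        string[] parts = spec.Split(',');
        if (parts.Length != 3)
            throw new FormatException("Expected 'TypeName,AssemblyName,MethodName': " + spec);

        // throwOnError: a typo in the config becomes a clear TypeLoadException
        // instead of a NullReferenceException further down.
        Type type = Type.GetType(parts[0].Trim() + ", " + parts[1].Trim(), throwOnError: true);

        MethodInfo method = type.GetMethod(
            parts[2].Trim(),
            BindingFlags.Static | BindingFlags.Public,
            binder: null,
            types: Type.EmptyTypes,   // only a parameterless overload matches
            modifiers: null);
        if (method == null)
            throw new MissingMethodException(type.FullName, parts[2].Trim());

        method.Invoke(null, null);    // static method: no target instance, no arguments
    }
}

// Example target; stands in for your own initialization class.
public static class SampleInit
{
    public static int Calls;
    public static void Initialize() => Calls++;
}
```

Because the value is split on commas, the assembly part must be a simple name; a fully qualified name such as "MyAssembly, Version=1.0.0.0, Culture=neutral" would be split apart.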

Maybe this idea would be a worthwhile addition to CSLA, as surely I'm not the only user who would like to do some internal initialization before any data portal requests are processed on the server side.

rsbaker0 replied on Wednesday, May 26, 2010

RockfordLhotka
All the other data portal channels rely on .NET to do the deserialization, and so that always runs before your code....

Indeed, I noticed this when looking at the other portals -- the request is already deserialized by the time they are called.

Might it be possible to implement a constructor for the other portals and do the initialization there?

Alternatively, I can see moving the CheckInit() into BusinessBase.OnDeserializedHandler() as the first call as I think this would have the same effect (although would be slightly more overhead).
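A minimal sketch of the OnDeserialized variant, assuming a plain [OnDeserialized] callback as a stand-in for CSLA's BusinessBase deserialization handler (Order, PortalInit and RoundTrip are hypothetical names). DataContractSerializer is used only so the example can run; it honors [OnDeserialized] methods. The init check runs for every deserialized instance but does real work only once.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-in for the poster's CheckInit().
public static class PortalInit
{
    private static readonly object Sync = new object();
    private static volatile bool _done;
    public static int Runs;

    public static void CheckInit()
    {
        if (_done) return;
        lock (Sync)
        {
            if (_done) return;
            Runs++;        // the real one-time initialization would go here
            _done = true;
        }
    }
}

// Hypothetical stand-in for a BusinessBase-derived class.
[DataContract]
public class Order
{
    [DataMember] public int Id { get; set; }

    [OnDeserialized]
    private void OnDeserialized(StreamingContext context)
    {
        // First statement, before any rule initialization would happen.
        PortalInit.CheckInit();
    }
}

public static class RoundTrip
{
    // Serializes and deserializes an Order, triggering the callback.
    public static Order Clone(Order order)
    {
        var serializer = new DataContractSerializer(typeof(Order));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, order);
            stream.Position = 0;
            return (Order)serializer.ReadObject(stream);
        }
    }
}
```

After the first success, the extra cost per deserialized object is a single volatile read, which is the "slightly more overhead" mentioned above.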

Thanks for your input on this...