Is ReadOnlyBase.ReadProperty intended to be thread safe?

Is ReadOnlyBase.ReadProperty intended to be thread safe?

Old forum URL: forums.lhotka.net/forums/t/11836.aspx


Jaans posted on Thursday, February 14, 2013

Hi all

We just ran into a very nasty problem in production where a statically shared CSLA ReadOnlyBase object (used for caching static information to reduce DB hits) caused the Application Service hosting the WCF data portal to max out the CPU's at 100%.

After catching the issue happening again, doing a memory dump and some adventures in WinDbg + SOS,
tracked it down to all the threads sitting at the top of the following stack:

 

System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.Boolean, mscorlib]].FindEntry(System.__Canon)

Csla.ReadOnlyBase`1[[System.__Canon, mscorlib]].CanReadProperty(System.String)

Csla.ReadOnlyBase`1[[System.__Canon, mscorlib]].GetProperty[[System.Int32, mscorlib]](Csla.PropertyInfo`1<Int32>, Csla.Security.NoAccessBehavior)

 

The ReadProperty method actually seems to cope, but it's the lookup of the property in the Field Manager Dictionary where it goes pear shaped.

As a workaround, we're instead caching the individual valuetype values instead of the object and creating clones for reference types.

Just wanted to know what the expectation of multi-threaded access to a CSLA business object's properties are from a design point of view.

JonnyBee replied on Thursday, February 14, 2013

Hi Jaans,

A business objects properties is not designed to be thread safe.

Actually -not much the .NET framework is thread safe and the threadsafe collections is only available for .NET 4 and 4.5.

Jaans replied on Thursday, February 14, 2013

Thanks Jonny

I suspected as much.
Having something support true multi threaded concurrency is  very hard to do right, but block/locking at a chunky level is sometimes useful. All good, our workaround to use clones is sufficient as the cloning operation on such small objects is quite quick.

PS: Are you missing "paradise" yet? 

kreid replied on Monday, May 12, 2014

JonnyBee

Hi Jaans,

A business objects properties is not designed to be thread safe.

Actually -not much the .NET framework is thread safe and the threadsafe collections is only available for .NET 4 and 4.5.

Can we expect the new synchronizedcollections - eg: ConcurrentQueue to be utilized in CSLA in the future? It would really help me since I need some of my usages of ReadOnlyListBase to be thread safe, and at the moment, I have to use a workaround.

Best Wishes!

 

 

trives replied on Thursday, February 14, 2013

This is a timely post.  Recently, we applied a significant upgrade to production and ran into CPU spikes of 100%, too.

We use generic dictionaries to cache a lot of information server-side.  What interests me about this post is that the memory dump seems to indicate that Dictionary.FindEntry is the culprit.  It's not totally clear from the post how to get from CanReadProperty to FindEntry, but this is my guess:

Dictionary<Type, List<IPropertyInfo>>.FindEntry(Type)
Dictionary<Type, List<IPropertyInfo>>.TryGetValue(Type, out List<IPropertyInfo>)
FieldDataManager.GetConsolidatedList(Type)
FieldDataManager.SetPropertyList(Type)
FieldDataManager.ctor(Type)
FieldDataManager ­get_FieldManager
ReadOnlyBase.CanReadProperty(string)

The call to FieldDataManager.GetConsolidatedList caught my eye because it does seem to attempt to be thread-safe.  I also noticed that somewhere between CSLA version 4.1 and 4.5, a try-catch was added around _consolidatedLists.TryGetValue in GetConsolidatedList.  This try-catch recognizes that TryGetValue is not thread-safe and can cause an exception when several threads read while one thread writes.  We've noticed this behavior, too.  I believe that TryGetValue can also cause an infinite loop when mulitple threads read while one thread writes, and this is my guess as to the root of Jaans problem.

I attached a document which is from an e-mail I sent to a colleague explaining the infinite loop if anyone is interested.  I haven't proved it; it's just a theory.

I just want to throw out a suggestion for a modification that I think would help GetConsolidatedList be a bit more thread-safe.  The modification relies on ReaderWriterLockSlim.  I havn't tested the code specifically, but I've tested the ReaderWriterSlimLock in other scenarios.

private static Dictionary<Type, List<IPropertyInfo>> _consolidatedLists = new Dictionary<Type, List<IPropertyInfo>>();
private static readonly ReaderWriterLockSlim _consolidatedListsLock = new ReaderWriterLockSlim();

private
static List<IPropertyInfo> GetConsolidatedList(Type type)
{
   
List<IPropertyInfo> result = null;
    try
   
{
        _consolidatedListsLock.EnterReadLock();
       
if (_consolidatedLists.TryGetValue(type, out result))
        {
            return result;
        }
    }
    finally
    {
        _consolidatedListsLock.ExitReadLock();
    }

    try
   
{
        _consolidatedListsLock.EnterWriteLock();
        if (!_consolidatedLists.TryGetValue(type, out result))
        {
            result = CreateConsolidatedList(type);
            _consolidatedLists.Add(type, result);
        }
    }
    finally
    {
        _consolidatedListsLock.ExitWriteLock();
    }
 
    return
result;
}

ReaderWriterLockSlim is not available for Silverlight, so just a plain lock might do:

private static Dictionary<Type, List<IPropertyInfo>> _consolidatedLists = new Dictionary<Type, List<IPropertyInfo>>();

private static List<IPropertyInfo> GetConsolidatedList(Type type)
{
    List<IPropertyInfo> result = null;
    lock (_consolidatedLists)
    {
        if (_consolidatedLists.TryGetValue(type, out result))
        {
            return result;
        }
        result = CreateConsolidatedList(type);
        _consolidatedLists.Add(type, result);
    }
    return result;
}

ngm replied on Thursday, February 14, 2013

Trives,

This is very interesting analysis.

Even if it's not proven, the whole fact that Dictionary is thread-safe for reads only, would require much more strict locking than the one that is in place right now.

Jonny, it's true that CSLA BO properties are not thread-safe, but CSLA's underlying property system should be, due to its shared nature, right?

- ngm

 

Jaans replied on Friday, February 15, 2013

Hi Trives

Thanks for your post! Your analysis of the Dictionary.FindEntry(...) method made for some very interesting reading.

Here is some background on our scenario might be useful, as I believe it matches your diagnosis:
We have a CSLA based solution that is physically distributed and employs an "application server" (multiple load balanced on a cluster in fact). This application server tier is mostly a WCF host for the Data Portal, implemented as a Windows Service. I works and scales very well with many concurrent users.

A recent development change was to optimise and add caches for static reference data. It result had a very positive effect on improving performance for our current "scale" of load.

We deployed this new version (after passing the UAT lab without issue) and it even went without issue for an extended time. Then suddenly, we observed that the service was maxing out all 8 CPU's on each server and staying maxed out. It was so severe that even logging into the server remotely was difficult and slow. The service stopped responding to any new DataPortal requests. Soon it started happening on the other servers too.

The fact that it never crashed, threw an exception to the logs, leaked memory / handles / etc. meant that it was very very difficult to identify the culprit of the "hang".

Such behaviour strongly suggested an infinite loop, but we were hard pressed to understand how, since we passed the automated test and UAT. We were concerned that it might be a WCF or CSLA issue, but could not think why now.

We had nothing to go on - so we restarted the services and prepared a simple batch file to debug dump the service memory when the CPU's max-ed out. We also changed the CPU affinity of the service to at least leave 2 CPU's remained available for the OS and other services on the server.

The subsequent and time consuming analysis with WinDBG proved fruitful as it showed as many threads as there are CPUs, with 100% load and an excessive amount of time spent in user mode. All of them were at the exact same place in the stack (at Dictionary.FindEntry(...) atop of GetConsolidatedList(...) )

SoSex analysis confirmed that there were no deadlocks, blocking happening, or resource starvation. The finalizer thread was idle with no excessive GC work.. 

To us, it would be highly beneficial if at least the reading of property values for a CSLA business object was thread-safe, even if it used rudimentary/expensive locks to sequentialize access to critical code.

Would the the change suggested by by trives be something that could be reconsidered for the currently stable releases of CSLA (3.8.x; 4.3.x and 4.5.x)?

Thanks all for your input!
Jaans 

JonnyBee replied on Friday, February 15, 2013

Hi,

It would seem that you create "uninitialized" objects (no managed properties has been set) and add to a cache and then multiple thread try to read/set values?

So when multiple threads reach this code: 

    protected FieldDataManager FieldManager
    {
      get
      {
        if (_fieldManager == null)
        {
          _fieldManager = new FieldDataManager(this.GetType());
        }
        return _fieldManager;
      }
    }

from
    private bool CanReadProperty(string propertyName, bool throwOnFalse)     {       var prop = FieldManager.GetRegisteredProperties().Where(c => c.Name == propertyName).FirstOrDefault();       if (prop == null)         throw new ArgumentOutOfRangeException("propertyName");       return CanReadProperty(prop, throwOnFalse);     }

then the initialization of property FieldManager (container for Managed Properties) is not thread safe.

One possible workaround for now is to make sure that FieldManager is initialized  before it is added to cache.

f.ex:

    private ReadOnlyBusinessObject()
    {
        var fm = FieldManager;
    }

ngm replied on Friday, February 15, 2013

JonnyBee

One possible workaround for now is to make sure that FieldManager is initialized  before it is added to cache.

f.ex:

    private ReadOnlyBusinessObject()     {         var fm = FieldManager;     }


 

That might not work at all.

Creating two objects of different types or event the same ones - "fixed" as suggested above (ReadOnlyBusinessObject) on two threads (what should be totally supported) can lead to the same race condition. As I stated earlier, the shared resource i.e. consolidated list dictionary is probably the core of the issue. Even if it is not in this case, it's still not locked correctly since dictionary is thread safe for multiple reads i.e. before you write to it, you have to serialize any reads.

Jaans, that's the debugging worst nightmare, really sounds scary.

Hopefully, you're going to solve it soon.

- ngm

trives replied on Friday, February 15, 2013

I think JonnyBee's suggestion is reasonable, but I believe it would need one important change.  It would need some sort of lock using a synchronization object that is global in scope to the application.  Something like this:

private ReadOnlyBusinessObject()
{
        lock (FieldManagerSync.SyncRoot)
        {
                var fm = FieldManager;
        }
}

public

 

 

static class FieldManagerSync
{
    private static readonly object _syncRoot = new Object();
    public static Object SyncRoot { get { return _syncRoot; } }
}

I'm pretty sure this would suffice, but I should probably think about it some more and test it out.

It would be ideal for FieldManagerSync not to be public.  Our application has a framework-type assembly that contains classes that extend all the standard CSLA stereotypes like BusinessBase, ReasOnlyBase, etc..  So, in that framework assembly, FieldManagerSync could be internal.

 

 

ngm replied on Friday, February 15, 2013

trives

I think JonnyBee's suggestion is reasonable, but I believe it would need one important change.  It would need some sort of lock using a synchronization object that is global in scope to the application.  Something like this:

private ReadOnlyBusinessObject()
{
        lock (FieldManagerSync.SyncRoot)
        {
                var fm = FieldManager;
        }
}

public class FieldManagerSync
{
    private static readonly object _syncRoot = new Object();
    public static Object SyncRoot { get { return _syncRoot; } }
}

Locking in the constructor of the BOs is not  so good idea. Even if you put that aside, CSLA lazy initializes FieldManager so that in scenaros where it's not used at all, no additional memory or perf. overhead is incurred.

Also keep in mind what is the shared resource that has to be protected here. FieldManager instantiation or its instance members are not for sure. That's because CSLA's BO instances and their properties are still not thread safe i.e. setting and getting property value through FieldManager which in turn manages its internal IFieldData instance of array is not thread safe. But consolidated list dictionary is shared resource, because no matter that two threads interact with two different BO instances, it can potentially lead to the same resource - consolidated list.

There's reason why access to consolidated list is locked at the first place. However, it seems it's only protected against race condition where two threads are not going to get two different instances of List<IPropertyInfo> for the same Type. But its locking doesn't ensure single writer with no readers, as required by Dictionary.

- ngm

 

trives replied on Friday, February 15, 2013

ngm raises all good points.  I should have given that more thought before posting. 

It sounds like Jaans was able to refactor the application code to workaround this issue.  If there is anyone stuck with this type of problem and refactoring isn't practical for some reason, this is what we are trying at the moment.  It's strictly a short-term fix.  It's a not a great long-term fix because it depends on the internal implementation details of the .NET Dictionary, which Microsoft could change without notice in future versions of the .NET Framework.  It also assumes that there is only one way for Dictionary to cause an infinite loop, and that's a huge assumption.  After studying the Dictionary code at length through .NET Reflector, I believe that reading the Dictionary during a write can follow several potential code paths that could cause an exception, but for now I can find only one code path that would seem to cause an infinite loop.

Our server-side application is an ASP.NET application that hosts the WCF services called by a Silverlight client.  In the Application_Start method (in global.asax.cs), I added a call to get a managed property.  I chose UnauthenticatedIdentity because it's relatively small and doesn't involve any data access.  The call is just something to trigger any business object to lazy load its FieldManager.  This will cause the FieldDataManager to call GetConsolidatedList and initialize the static Dictionary.  I'm assuming a single thread runs Application_Start before any other threads can execute.

 

 

protected void Application_Start(object sender, EventArgs e)
{
   
var name = Csla.Security.UnauthenticatedIdentity.UnauthenticatedIdentity().Name;
}

ngm replied on Sunday, February 17, 2013

I tried little bit of stress testing today with three multi-core boxes.

My test was around parallel instantiating of FieldDataManager for a pool of 20 types representing BOs. I allocated from 2 to 8 threads, where each picked random number of types from the pool for up to 20 of them. The number of instantiations of FieldDataManager ranged from 10 to 100.

After one cycle finished i.e. all threads completed, shared dictionary of consolidated list was cleared and the cycle continued.

I did up to 50k cycles. My machines ranged from 2 to 8 cores.

No infinite loops were experienced during this testing.

Still it would be much safer to lock that dictionary read/write access appropriately.

- ngm

 

 

Jaans replied on Sunday, February 17, 2013

@ngm - Are you running these tests on the existing code to try and find the fault, or is this against the changes discussed thus far showing its pretty safe?

ngm replied on Sunday, February 17, 2013

Jaans,

 

All tests were run against current code base i.e. without any proposed modifications. I really expected to trigger this problem with my tests, but I'll see to add even more variability in the testing logic when I get some spare time.

Do you have any repro for this issue?

- ngm

 

Jaans replied on Sunday, February 17, 2013

After discovery of the issue in Production and diagnosing it via WinDBG, we were able to create a stress test that I was able to show up the same issue in our software.

Basically we created a console application using the TPL to fire-up about 50K tasks that all executed the same code, and for about 60% of the time, it triggered the issue with the same stack trace for the multiple executing threads as observed in production with WinDBG.

The following aspects could be key to the issue:

Unfortunately our tests are against portions of our software product and I haven't gone to the trouble of building a separate proof concept of this.

Hope that helps

 

skagen00 replied on Tuesday, February 19, 2013

I guess the interesting followup would be - did you try any of the suggested adjustments that might alleviate this and thus make it no longer replicatable? I think it's great that you were able to get something fairly replicatable.

Jaans replied on Wednesday, February 20, 2013

I've created a sample application to show the issue, hopefully to allow for testing of the suggestions and any subsequent tweaks.

The sample application is too big to be attached (if it includes the NuGet Packages - so I've enabled NuGet package restore) - refer attachement.
I have also placed it on my SkyDrive, along with a video showing the behaviour:

Feel free to try your suggestions and tests above against this project. Currently it's referencing CSLA 4.5.10 via NuGet.

Jaans

JonnyBee replied on Wednesday, February 20, 2013

Thanks Jaans,

We will look into it. 

skagen00 replied on Wednesday, February 20, 2013

BOOM shakalaka! ;) (couldn't resist)

Jaans thanks for doing that, and Jonny thanks for looking into it. 

It's an issue in our own production system and currently I have a utility literally monitoring CPUs and killing processes when they spike to >90% for a certain time threshhold.  It's a once a day sort of thing but due to the nature of the problem it happens in off-hours, which required such a utility to help smooth things in the meantime.

trives replied on Wednesday, February 20, 2013

Jaans,

Thanks for the test application.  I think you'll find that if you override CanReadProperty in Configuration.cs, then the test application will run to completion.

public override bool CanReadProperty(IPropertyInfo property)
{
   
return true;
}

The base CanReadProperty references a dictionary (_readResultCache) that caches read authorization for each property. The code does not prevent read-while-write scenarios, so the potential exists for the infinite loop scenario.  For reference, the code is around line 201 in ReadOnlyBase (at least in the version of CSLA I have).

 public virtual bool CanReadProperty(Csla.Core.IPropertyInfo property)
{
   
bool result = true;

    VerifyAuthorizationCache();

   
if (!_readResultCache.TryGetValue(property.Name, out result))
    {
        result =
BusinessRules.HasPermission(AuthorizationActions.ReadProperty, property);

       
// store value in cache
       
_readResultCache[property.Name] = result;
    }

    return
result;
}

Our application, like yours, uses several ReadOnlyBase objects as cache objects.  We don't have any authorization rules for the properties in our cache objects, so it's not a problem to override CanReadProperty like that.

Note that CanExecuteMethod has the same type of issue.

Again, thanks for supplying the test application!

Tim

trives replied on Wednesday, February 20, 2013

Just to clarify...

I still believe that FieldDataManager.GetConsolidatedList should provide locking that prevents read-while-write scenarios on its dictionary.  Jaans's test application, however, seems to expose an issue in ReadOnlyBase.CanReadProperty instead of the potential issue in FieldDataManager.GetConsolidatedList.

I ran the test application with the standard CSLA executable and one that was modified with the proposed locking changes mentioned earlier in this thread.  The CSLA executable didn't seem to make a difference to the test application, but that doesn't mean there isn't a problem.

The bottom line is that read-while-write scenarios on a dictionary in a multi-threaded environment is risky.

ngm replied on Wednesday, February 20, 2013

Jaans, nice job showing it.

It's obvious that the issue is behind dictionary that caches authorization results, as Tim pointed out.

That's the reason my tests didn't hit this since they were around FieldDataManager's consolidated list of properties.

Even though the problem is nasty, it basically confirms that CSLA objects are not thread safe. Authorization results cache is local instance for every BO as opposed to consolidated list. By syncing access to authorization results cache nothing is going to be achieved because there are a lot of other mechanisms per object instance that are still not thread safe. One of them (related to authorization cache in particular) is caching-per-principal which also needs to be thread-safe. Up to the point where even lazy instantiation of the cache itself needs to be locked and so on.

There's tons of works in order to make BO's thread-safe, with the core question whether that's the real goal of CSLA?

On the other hand, this particular scenario of caching read-only objects is very common. Personally I think that the root of the issue (keeping in mind that CSLA BOs are not thread-safe) is that the access to every single cached instance of that read-only object for duration of its members access had to be synced at the first place.

I agree with Tim that consolidated list in FieldDataManager is still something that has to be synced with single writer semantics or in other words, that's part of the property management infrastructure which is shared and thus has to be thread-safe.

- ngm

 

RockfordLhotka replied on Monday, February 25, 2013

I've filed an issue for this

https://github.com/MarimerLLC/csla/issues/42

But the reality is that even solving this particular issue might reveal another threading issue. As Jonny noted earlier in the thread, the CSLA base classes aren't threadsafe.

Or to put it another way, there is _absolutely_ no intent to allow multiple threads to interact simultaneously with a single CSLA business object instance. That is entirely out of scope for the framework.

We _might_ try to implement a fix around this particular scenario. If there's a repro/test that establishes that there's a problem with the code when multiple threads DON'T interact with the same object instance, then the priority of the issue will become very high.

Otherwise it is something to be considered only after doing some serious perf testing. Adding locks in code that runs every time a property is touched could cause major perf issues, because data binding touches properties constantly. It would be important to ensure that forms with a few hundred data bound properties wouldn't become extremely slow due to additional locking.

trives replied on Friday, February 15, 2013

Jaans,

The behavior you describe is extremely similar to what we experienced in our production environment (i.e. we also had all 8 CPUs at 100% with no exceptions being thrown by anything, no apparent memory leaks, etc.).  Our QA testing also did not expose any problems.

Thank you for sharing all the details of your debugging efforts.  It gives us more confidence that we are on the right track to finding a solution/workaround.

Tim

Copyright (c) Marimer LLC