DB Fault tolerance - thoughts / ideas?

rosscj posted on Wednesday, October 26, 2011

We're in a spot now where we are engineering software solutions to replace a lot of old Access97 + VB6 GUIs. (Finally).

MSSQL 2008 R2, VS2010 and the CSLA framework are here now, and I almost have these guys convinced it's the way forward. Some discussions are coming up about "What if the DB goes down?" There are some old apps here that our production floor relies on; if one of them can't write data to the database, the line is at a standstill.

For the critical apps, I'm thinking of a two-physical-tier approach: UI and business layer on the process PC, database in the IS datacenter.

Mind you, I'm still sorting out the logic. I'm coming from the old-school VB6 business object days, so a lot of the CSLA stuff now is.... wow. Cool; hard on the head sometimes, but I'm getting there. (CSLA ebooks bought & read, VB 2008 & C# 2008 books read. I'm absorbing all I can!)

Getting to my question: I realize this proposed 2-tier approach isn't ideal, but I don't see how else to do it. If the DB is down, that shouldn't stop the production process. Should we be looking at some sort of MSMQ solution with CSLA, so that if a database transaction fails, it goes into a queue to be processed later? What techniques have you guys used in the past to handle the 0.0000001% chance that the DB is offline while your app has to keep chugging along, and how did you structure the additional DAL code?

On a positive note, I'm sure I'll have lots of opportunity to hone my skills! We have 100s of apps to look at and re-write. (Hence my pushing to use CSLA as part of our coding best practices...)

Thanks in advance,

Chris

tmg4340 replied on Wednesday, October 26, 2011

As Jonny has already suggested, the Sync Framework may be a good idea for you. The only suggestion I would make there is that you're going to want to treat your application a little like Outlook: as far as your app knows, the *only* database for it is the local CE database. Let the Sync Framework do the work for you in getting that data to and from the "real" database.
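To make the "Outlook-like" idea concrete: the application reads and writes only the local store, and a separate sync step pushes unsynced rows to the central database. This is not the Sync Framework API. It is a minimal sketch that uses two in-memory sqlite3 databases as stand-ins for the local SQL CE database and the central server; the `readings` table, `record_reading` and `sync_up` are hypothetical.

```python
import sqlite3

local = sqlite3.connect(":memory:")    # stand-in for the local SQL CE database
central = sqlite3.connect(":memory:")  # stand-in for the central SQL Server
local.execute(
    "CREATE TABLE readings (id TEXT PRIMARY KEY, station TEXT, value REAL, "
    "synced INTEGER NOT NULL DEFAULT 0)"
)
central.execute("CREATE TABLE readings (id TEXT PRIMARY KEY, station TEXT, value REAL)")


def record_reading(row_id: str, station: str, value: float) -> None:
    # The app only ever talks to the local store, so this never waits on the network.
    with local:
        local.execute(
            "INSERT INTO readings (id, station, value) VALUES (?, ?, ?)",
            (row_id, station, value),
        )


def sync_up() -> int:
    """Push unsynced local rows to central, then mark them synced. Returns rows pushed."""
    rows = local.execute(
        "SELECT id, station, value FROM readings WHERE synced = 0"
    ).fetchall()
    with central:
        # INSERT OR REPLACE keeps the push idempotent if a sync is retried.
        central.executemany(
            "INSERT OR REPLACE INTO readings (id, station, value) VALUES (?, ?, ?)", rows
        )
    with local:
        local.executemany(
            "UPDATE readings SET synced = 1 WHERE id = ?", [(r[0],) for r in rows]
        )
    return len(rows)


record_reading("r1", "press-3", 41.5)
record_reading("r2", "press-3", 42.0)
print(sync_up())  # 2
print(sync_up())  # 0: nothing left to push
```

Using client-generated string keys (GUIDs in practice) rather than server-assigned identities avoids key collisions when several process PCs insert rows while disconnected.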

Outside of that, you could also build out to a 3-tier system, where you have an app server that sits between your desktop and the database server. That makes your desktop app simpler (it only worries about talking to the app server), and is also supported out-of-the-box by CSLA. The potential advantage there is that you can concentrate your "DB isn't there" code in your app server. You still have to cache data locally for the app, which in the end may look quite a bit like what the Sync Framework does. But you can customize it for your particular situation if necessary, and it's not all that bad; you likely are going to cache some data locally anyway.

Obviously that complicates your deployment a little, as well as your environment. But you can gain quite a bit in robustness and scalability.

HTH

- Scott

rosscj replied on Wednesday, October 26, 2011

Hi Scott;

I've thought about that, but then the issue becomes "what if the app server goes down?" or "the forklift just ran over the CAT5 cable for this process PC." For our "app must remain alive" situation, the fewer points of failure, the better.
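The "fewer points of failure" argument can be put in numbers with the standard series-availability formula: if a request must pass through n independent components that all have to be up, the combined availability is the product of the individual availabilities. The 99.99% per-component figure below is an assumption chosen only for illustration.

$$
\begin{aligned}
A_{\text{series}} &= \prod_{i=1}^{n} A_i \\
A_{\text{2-tier}} &= 0.9999^{3} \approx 0.99970 \quad \text{(process PC, network, database)} \\
A_{\text{3-tier}} &= 0.9999^{4} \approx 0.99960 \quad \text{(process PC, network, app server, database)}
\end{aligned}
$$

Each extra hop in series lowers availability. Redundant components in parallel work the other way, with A = 1 - ∏(1 - A_i), which is the case for clustering the server tier.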

Mind you, I do totally get the 3-tier approach, and look forward to sinking my teeth into it. We'll get there, and CSLA will be there every step of the way, (I can smell multi-interface apps coming my way in the new year); but in this specific 'run at all costs' situation... I'm not so sure.

I like what I've read about the Sync framework. Looks like it will fit the bill nicely... sorry I missed it earlier! Thanks for your input, I do appreciate it.

Chris

Paul Czywczynski replied on Wednesday, October 26, 2011

I would back up a few steps and look at your infrastructure before considering sync as your end-all solution.

Let's start with the SQL Server. SQL Server can be clustered with multiple nodes. When properly set up, you'll be able to fail over, "mostly" transparently, to the still-active SQL Server node when the primary goes down. As a side bonus, this will work with all your existing SQL Server based applications. If you're worried about transactions, take a look at using a transaction coordinator with SQL Server clusters, or at SQL Server's mirroring functions.
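For database mirroring, ADO.NET's SqlClient lets the connection string name the mirror through the `Failover Partner` keyword, so the client tries the partner when the principal is unreachable. The Python sketch below only illustrates that client-side idea generically (try each server in order, return the first connection that opens); `connect_first_available`, `open_connection` and the server names are hypothetical, and this is not how SqlClient is implemented.

```python
from typing import Callable, List, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_first_available(
    servers: Sequence[str], open_connection: Callable[[str], Conn]
) -> Conn:
    """Try each server in order (principal first, then mirror/partner)."""
    errors: List[str] = []
    for server in servers:
        try:
            return open_connection(server)
        except ConnectionError as exc:
            errors.append(f"{server}: {exc}")  # remember why this node failed
    raise ConnectionError("no server reachable; " + "; ".join(errors))


# Hypothetical demo: the principal is down, the mirror answers.
def open_connection(server: str) -> str:
    if server == "sql-principal":
        raise ConnectionError("timeout")
    return f"connected to {server}"


print(connect_first_available(["sql-principal", "sql-mirror"], open_connection))
# connected to sql-mirror
```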

Next is the network. Lost connectivity to the SQL Server can, again "mostly", be resolved by using multiple network paths from the client to the SQL Server: redundant NICs, firewalls, switches, cabling, etc. To set up a redundant network, figure on buying two of everything. Keep in mind that plugging all the redundant equipment together doesn't make for a healthy network; today's switches don't scale well if you haven't thought out your spanning tree structure.

I put emphasis on the word "mostly" because nothing is perfect. Generally, off-the-shelf networking gear and SQL Server can fail over to redundant hardware, but they usually drop the active TCP connection. If your client can retry a dropped connection, you'll be fine.
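Since a failover usually drops the active TCP connection, the client-side piece is typically a small retry loop with a growing delay. A minimal sketch; the attempt count and delays are arbitrary assumptions, and `operation` stands for any database call that raises `ConnectionError` when the connection is dropped (`flaky_insert` below is a hypothetical stand-in).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on ConnectionError with exponential backoff.

    `attempts` is the total number of tries; at least one is always made.
    """
    for attempt in range(attempts - 1):
        try:
            return operation()
        except ConnectionError:
            sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return operation()  # last attempt: let any ConnectionError propagate


# Hypothetical DB call that fails twice (connection dropped during failover).
calls = {"n": 0}


def flaky_insert() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return "row inserted"


delays = []
print(with_retry(flaky_insert, sleep=delays.append))  # row inserted
print(delays)  # [0.5, 1.0]
```

Only retry operations that are safe to repeat: if an insert committed just before the connection dropped, a blind retry can create a duplicate unless the write is idempotent.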

Personally, I would rather invest in your server hardware and network. If you make them resilient, that helps not only your application but everything else that depends on those resources. I know it's not cheap, but it makes for happy users, and for IT people who can work on business problems instead of chasing fires.

In the end, if you have all the redundancy in the world and you're still worried about downtime, then the SQL Server Sync Framework is your last resort. We've tried syncing in the past. It's easier today but still complex: it's super easy to fail over, tremendously difficult to fail back. Unless your application is designed for merging data from the start, you're going to have a tough time. Save syncing for applications that are designed to be offline; it shouldn't be your redundancy solution.
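Why fail-back is the hard part: rows may have changed on both sides while they were disconnected. A minimal three-way comparison sketch, assuming each row carries a hypothetical version counter and that the versions both sides had at fail-over time (`base`) are known; real sync engines keep their own change-tracking metadata for this, and `plan_failback` is an illustrative name.

```python
from typing import Dict, List, Tuple

Versions = Dict[str, int]  # row id -> version counter (hypothetical metadata)


def plan_failback(
    base: Versions, local: Versions, central: Versions
) -> Tuple[List[str], List[str]]:
    """Classify rows changed locally since fail-over.

    Returns (push, conflicts): rows only the local side changed can be pushed
    safely; rows both sides changed need a merge rule or a human decision.
    """
    push, conflicts = [], []
    for row_id, local_ver in local.items():
        base_ver = base.get(row_id)  # None means the row was created offline
        if local_ver == base_ver:
            continue  # unchanged locally; nothing to fail back
        if central.get(row_id) != base_ver:
            conflicts.append(row_id)  # central changed (or created) it too
        else:
            push.append(row_id)
    return sorted(push), sorted(conflicts)


base = {"lot-1": 3, "lot-2": 7}
local = {"lot-1": 4, "lot-2": 8, "lot-9": 1}  # lot-9 created while offline
central = {"lot-1": 3, "lot-2": 9}            # lot-2 also edited centrally
print(plan_failback(base, local, central))  # (['lot-1', 'lot-9'], ['lot-2'])
```

Deletes are not handled at all here, which is one more reason an application has to be designed for merging from the start.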

-Paul

rosscj replied on Thursday, October 27, 2011

Hi Paul;

Very good hardware suggestions. I've followed these best practices in the past (14 years of being the entire IT department, a one-man wrecking crew so to speak: network design & admin, DB analysis/design, email, app development, etc.; you get the idea). A corporate merger meant the less senior people in various departments were let go. Fast forward 4 years (3.5 of them in production line machine maintenance), and here I am in a new role.

I have a lot of experience with the process PCs, as they talk to the production PLC hardware (I was the first line of defense as a shift troubleshooter). It's strange to have the perspective of the previous designer, the maintenance troubleshooter, and now the re-engineer working with today's technology rather than band-aid 1990s software & mindset. It's not a bad thing; it's certainly helping me with design considerations.

Getting to my point, the network design here isn't bad at all; perhaps my previous posts made it sound a lot worse than it actually is. Based on past experience, I can vouch for the network performing at 99.99% uptime. Not so much the shop floor hardware, but that's yet another tangent...er, design consideration, sorry! ;-)
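For scale, 99.99% uptime over a 365-day year (525,600 minutes) works out to:

$$
(1 - 0.9999) \times 525{,}600\ \text{min} = 52.56\ \text{min} \approx 52.6\ \text{min per year}
$$

So the remaining 0.01% is on the order of an hour of outage per year, which may arrive as a single long outage rather than many short ones.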

My hands are pretty much tied with the server situation (a failover SQL box is being installed), and the network part is, again, out of my hands. Having said that, the infrastructure is pretty solid and sound.

It's that 0.01% "2:30am-crap-has-hit-the-fan" production floor situation I have to account for in this design; if nothing else, to show management that it's at least been a factor in the design stage. And this design consideration is only for *maybe* a half-dozen applications out of the many that are going to be built in the next 5 years. The Sync Framework will be used sparingly here, as a redundancy on the process PC side; definitely not plant-wide with every app.

Anyway, that's the short version... thanks for the suggestions. It's all good!

Chris

JonnyBee replied on Wednesday, October 26, 2011

ajj3085 replied on Sunday, October 30, 2011