Database and Object Design Question

Old forum URL: forums.lhotka.net/forums/t/3722.aspx

lefty posted on Tuesday, October 16, 2007

I'm working on rewriting a legacy loan application system.  As I work on the database structure I'm trying to normalize the data anywhere it seems worthwhile.  The end plan is to create either one large application screen where all the application info is entered, or a wizard-like UI where users switch from screen to screen to enter the data.

There is a lot of information taken from the applicant, and also a lot of parameters that are copied into the application from the loan product definition tables. (We don't want a change to a loan product to affect previous applications.)  Even with normalization of things like addresses I'm going to end up with at least 800 columns in the LoanApplications table.  I'm going to be using SQL Server 2005 for this project, which I believe allows up to 1,024 columns in a table.

The question is:  Should I leave those 800 columns in the one table, or split them out between multiple tables?

The current system has them split out between multiple tables; however, the naming of the tables makes almost no sense.  It also forces a lot of table joins when we do reporting on the data.

I'm thinking that if I go with one large application screen I'm going to want all the data in a single BO.  It would be much easier to grab all that information from a single table, although a stored procedure that reads from multiple tables isn't much harder.  It would also be easier to insert or update data to a single table than to have to do it to multiple tables.
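
For example, a fetch stored procedure that flattens a couple of normalized tables back into one result set stays fairly simple.  A minimal sketch, with hypothetical table and column names:

    CREATE PROCEDURE dbo.GetLoanApplication
        @ApplicationId int
    AS
    BEGIN
        -- Join the normalized tables back into a single row
        -- so one BO can load everything in one call.
        SELECT la.ApplicationId,
               la.ProductCode,
               la.RequestedAmount,
               a.FirstName,
               a.LastName,
               a.AddressLine1
        FROM dbo.LoanApplications la
        JOIN dbo.Applicants a
            ON a.ApplicationId = la.ApplicationId
        WHERE la.ApplicationId = @ApplicationId;
    END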

If I do the wizard interface I'm not even sure yet what my BOs would look like, but splitting the tables out to match the wizard screens wouldn't help much, since some data is duplicated across screens, so I would still have to pull data from multiple tables for each screen.  This is also the interface the end users are most familiar with, and it will more than likely be the chosen approach.

So at this point I'm wondering whether there is a performance difference between the two methods.  This database is also going to be used for reporting, so I want the reporting to perform well along with the application.

Does anyone have any suggestions?  I'm also looking for any experience with creating BOs for a wizard-like interface using CSLA.

ajj3085 replied on Tuesday, October 16, 2007

Wow, 800 columns? Well, I'd think about what data you're likely to use at one time.  Are you REALLY going to read or write all 800 columns at once?  Probably not.  I'd try to split things up that way.

Then, if you're worried about reporting, just create views to bring the data together, and write your reports off of those views. 
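
For instance, a reporting view can flatten the split tables back out so reports don't care how the storage is broken up.  A rough sketch, with made-up table names:

    CREATE VIEW dbo.vLoanApplicationReport
    AS
    -- Join the narrow tables back into one wide shape for reporting.
    SELECT la.ApplicationId,
           la.ApplicationDate,
           a.LastName,
           e.EmployerName,
           e.AnnualIncome,
           t.LoanType
    FROM dbo.LoanApplications la
    JOIN dbo.Applicants a ON a.ApplicationId = la.ApplicationId
    JOIN dbo.Employment e ON e.ApplicationId = la.ApplicationId
    JOIN dbo.LoanTerms  t ON t.ApplicationId = la.ApplicationId;

Reports then just SELECT from the view as if it were one wide table.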

Performance-wise, reading/writing one table with a lot of columns may create locks on the table, which could reduce performance.  I'm not a DB expert though, so verify that before you proceed, but I do remember coming across that point at one time.

Telarian replied on Tuesday, October 16, 2007

Now here's a subject I have some expertise in. Really, your question is a database design question, and there are far better places to pose it. Here, for instance.

That said, I happen to have some expertise.

I would never dare speak in absolutes, but I really doubt you need an 800-field table. That would be extremely rare, and loan application programs are not extremely rare. That just doesn't happen in a properly normalized relational database. (Unless it's Oracle time-tracked accounting or something like that, and that's just because the people at Oracle are crazy muppets or some such thing.) You said something about leaving it in an 800-column table, but then you also said that it is currently spread out into multiple tables. So really you would be changing things: combining data that is currently spread across multiple (probably normalized) tables into one monster table. Correct?

You said that the naming of the tables doesn't make sense. Doesn't make sense in terms of database design, or in terms of the specific business space? It's possible that it would make more sense to an experienced relational database designer.

Joins in your queries are not really a bad thing; they are just the way things work in a relational database. I'm not saying they shouldn't be avoided in some situations, but normally they are actually kind of a good thing. You just have to be proficient at using them. There are situations in which it might be best to do things the way you're suggesting; offhand I would say those are very rare, and I doubt that's your situation. There are even situations where it might be best to have both: two copies of the data. A very dangerous thing to do indeed, but nonetheless, those situations do exist.

The truth is that you would need someone who knows database design, with at least some details of your situation, to give you a solid answer. There are so many variables that go into good database design. What's the volume of data? What's the volume of transactions? Is the workload more transaction-centric or report-centric? What's the timing of the workload and its different parts? What's the hardware situation? My thoughts here are really all just guesses. Give an experienced relational database designer a list of your 800 fields and I'd bet he would spot some obvious candidates for normalization just from that.

I can tell you this much with certainty: basing your database design on what kind of data entry screen you are going to use is pretty silly. Not a good way to come up with a database design at all.

Yes, there are definite performance differences between the two strategies you've described. They also depend a lot on some of the other factors I mentioned, but yes, there are major differences.

I would be happy to offer any advice I can to you on the subject. It's nice to help someone with a subject I'm an expert in rather than always being the clueless one reading this forum.

lefty replied on Tuesday, October 16, 2007

Thanks for the response, it helps me a great deal.  I'm new to CSLA, and although I have some database experience, it's always been small projects with a few small tables.  The link to the other forum is great as well, since I have a bunch of other database questions about things like replication.

I've decided that it's smart to split the columns into a bunch of small tables, especially to avoid having to write 800+ columns after every change to a BO.  I also think that smaller tables will give smaller, more manageable BOs.  I can also separate columns that are changed or used rarely from the ones that are used often.
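
As a rough sketch of that kind of split (all names hypothetical), the rarely used columns can move to a one-to-one satellite table that shares the main table's key:

    -- Frequently used columns stay in the main table.
    CREATE TABLE dbo.LoanApplications (
        ApplicationId   int IDENTITY(1,1) PRIMARY KEY,
        ApplicationDate datetime      NOT NULL,
        ProductCode     varchar(20)   NOT NULL,
        RequestedAmount decimal(12,2) NOT NULL
    );

    -- Rarely used columns live in a 1:1 table on the same key.
    CREATE TABLE dbo.LoanApplicationExtras (
        ApplicationId   int PRIMARY KEY
            REFERENCES dbo.LoanApplications(ApplicationId),
        MarketingSource varchar(50)   NULL,
        InternalNotes   varchar(2000) NULL
    );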

The current system is written in Progress, and the way the main part of the database is designed there is a ~300-field master table that holds most of the data, another 150-field "misc" table, and then a few smaller tables with specific purposes.  There are also 5 tables of about 200 fields each that contain only spare columns (each of a different type, i.e. an int table, a date table, a decimal table, etc.).  Those are just the tables I've had time to investigate in detail.  I'll be able to remove a lot of columns just because they aren't used, and there is some room for normalization.  A lot of the columns are "state" columns: they basically hold the state of other data in the database at the time the loan application was created.

A good example of this would be a quote system where your item prices all change, but you can't change the price on a quote, so the quote has a field that stores the price of each item as of when the quote was created.  This could be solved by moving the price to, say, a date-keyed table of its own; however, the difference between this and the loan program is that the loans have at least a good 200 parameters that can be modified to change the loan criteria and the way the loans are calculated.  I'm thinking I could probably solve this with some sort of date-keyed table as well, but I'm not sure if that's the best solution.  Have you ever had to solve a problem similar to this?
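
Something along these lines is what I have in mind for the date-keyed approach (the parameter names are just placeholders): every change to a product inserts a new effective-dated row, and an application reads the row in force on its date.

    CREATE TABLE dbo.LoanProductParameters (
        ProductCode   varchar(20)  NOT NULL,
        EffectiveDate datetime     NOT NULL,
        BaseRate      decimal(6,4) NOT NULL,
        MaxTermMonths int          NOT NULL,
        PRIMARY KEY (ProductCode, EffectiveDate)
    );

    -- Parameters in force for a product as of a given application date.
    SELECT TOP 1 BaseRate, MaxTermMonths
    FROM dbo.LoanProductParameters
    WHERE ProductCode = 'FIXED30'
      AND EffectiveDate <= '20071016'
    ORDER BY EffectiveDate DESC;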

I understand that it's silly to build the database design around the UI; however, I feel like it's all linked.  Reading Rocky's book, I understand the BOs should be designed around use cases, and I feel that my use cases depend somewhat on the UI I'm looking to use.  The object-relational mapping is one of the parts I'm worried about, and I don't want to end up with a database design that makes the ORM part of the project super difficult, or makes my wizard UI almost impossible.

Originally my question was going to be more about the wizard UI and BO creation, which is what I'm really stuck on, but as I started writing it out I realized the database was going to need to be redesigned, and that part was still a big question mark.

Wal972 replied on Tuesday, October 16, 2007

Whilst the database design and the UI are linked and affect each other, they have their own requirements.

The BOs are based on the use cases, which are not the same as the database. Think of it like your desk and the filing cabinet: what works for one definitely doesn't work for the other.

The idea of separating out the database into multiple tables works much better because it keeps your data storage as tight as possible.

From previous experience with loan application programs, the questions are generally grouped around themes, e.g. wage situation, loan types, etc. So breaking the database up along those lines would make it easier.

Hope that helps

Ellie

JoeFallon1 replied on Wednesday, October 17, 2007

I like to store time-sensitive data in the same table as the transaction. I may have a table of Items with prices, but when a user buys something I store the price with the Order, so the price at the time of the order is never lost when you update pricing in the Items table. I like this design better than a date-keyed table. I guess one advantage of a date-keyed table is that you can easily get a trend of price changes over time, but if that is not a requirement I would go with storing the value with the transaction.
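
A minimal sketch of that pattern (table and column names are made up): the catalog price is copied onto the order line at insert time, so later price changes never rewrite history.

    DECLARE @OrderId int, @ItemId int, @Quantity int;
    SELECT @OrderId = 1, @ItemId = 42, @Quantity = 3;  -- sample values

    -- Snapshot the current Items price into the order line.
    INSERT INTO dbo.OrderLines (OrderId, ItemId, Quantity, UnitPrice)
    SELECT @OrderId, i.ItemId, @Quantity, i.Price
    FROM dbo.Items i
    WHERE i.ItemId = @ItemId;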

Joe

ajj3085 replied on Wednesday, October 17, 2007

I agree with Joe, and this is what I do as well. 

Dog Ears replied on Wednesday, October 17, 2007

If you're going to store the current value in a prices table and the actual value for each transaction, then, considering the amount of data you're storing, I can't see the issue with doing both:

Storing the actual price for each transaction along with the transaction data.

And...

Storing a date-keyed historic price table.  

Regards,

Graeme.

n.b.  MS SQL Server has a limit of 8,060 bytes per row (or a bit less in practice), so this could be a consideration if you're storing 200 fields in one table!
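
If you want a rough feel for how close a table sits to that limit, something like this against the catalog views sums the declared column sizes (a sketch only: it skips (max) types, which SQL Server 2005 can push off-row anyway, and the table name is hypothetical):

    -- Rough upper bound on in-row bytes per record; max_length is -1
    -- for (max) types, which are excluded here.
    SELECT SUM(CASE WHEN max_length > 0 THEN max_length ELSE 0 END)
           AS approx_max_row_bytes
    FROM sys.columns
    WHERE object_id = OBJECT_ID('dbo.LoanApplications');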

lefty replied on Wednesday, October 17, 2007

This is how the current system works for certain tables.  For instance, interest rates get updated every month, so there is a date-keyed table for interest rates, and the rate is also copied into the loan table.

Copyright (c) Marimer LLC