25 Aug 2008

Storing Conditions in the Database

Caching Business Logic in the Database




I had this relatively simple idea of storing conditions in the database. The idea is to store the condition of the if-then-else from your code in the database. While this will add (slightly) more space to the database, I hope that it will do away with processing the data over and over again as well as help drastically speed up searches and any analytical processing you would like to do in the future. I will try to explain with an example..


Story:
You run an electronics store and you have sales people working in that store. You would like to give a bonus incentive and you decide that the rules for the bonus would be the following:
1) Sales people that sell more then average amount of money each for at least 15 days out of a month will receive a bonus. After a bit of calculation you see that the average is 300$.
2) If anyone received a complaint from a customer, that's an automatic disqualification of the bonus.
3)If the sales person came late more then 5 days in that month, they will be disqualified from the bonus.


Now we have our requirements, lets try to do some pseudo code:

1) Go to the `Employee` table and check for all the sales people that are currently working.
2) Go to the `Sales` table and check for each sales person, if they got over 300$ that day.
3) Count how many days they sold over 300$ that month. Check if its over 15 times.
4) Check for each sales person in the `Complaints` table, if they got a complaint. If they did, they cant get the bonus.
5) Check for each sales person in the `Timesheet` table, if they were late that month. Count if its over 5 times. If it is, then they cant get the bonus.

Final check after all the information is gathered:
Check for each sales person if they sold more then 300$, more then 15 times that month and if they had no complaints and if they didn't come late more then 5 times, then give them the bonus


How long would it take?
Now lets take a moment to theorise the CPU and hard disk processing required for all that calculation. You would need to check the table where the sales people are located. Then, according to which programming background you come from, you would either create an array or list of objects, then run a query for each sales person x3, because its 3 different tables. Or you could do a complex join of all tables using SQL (and probably generate the results faster). Either way, you are going to work the hard disk a few times, use memory and use the CPU to do some light math.
(This why I like to test my SQL with MySQL Query Browser. I can see exactly how fast it took to generate the result)


Suggestion
This is where I suggest something not too controversial, that you might want to store the conditions for the final decision of the bonus in another place.
For this example, I would suggest creating an extra table that would have:
The date of each day - date
The sales person ID - integer
Did s/he more then average that day - boolean (Edited from - sold more then 300$)
Did s/he come late that day - boolean
Did s/he get a complaint that day - boolean

The booleans store the conditions in a compacted way. All you need is true or false for the answer to the conditions. Then whenever you want to see the results of who will get bonus, you can query that table. This could save you that processing and waiting time we talked about earlier.

Depending on how fast you would like your response to be when generating that list, you could use this method which does take up space but saves on processing time.


Notice also, that I do not specify how you should process and generate the new table. You can write the process for finding and storing the conditions in an application or with stored procedures or PL/SQL.
Since there is a trend in MySQL for "less-is-much-faster" with Drizzle and that many organisations see putting logic in the database layer as evil, I am leaving the decision in the hands of whoever uses it.
I believe that storing the conditions in the database should compliment the database layer and the application layer, but I need to test my theory some more.


Final Note
Please tell me what you think about this approach. I am sure its not new, but I would like to get feedback about it.

I plan on continuing with it and already plan to do a more complicated example soon.

4 comments:

  1. Hi!

    "...Then whenever you want to see the results of who will get bonus, you can query that table. This could save you that processing and waiting time we talked about earlier.

    Depending on how fast you would like your response to be when generating that list, you could use this method which does take up space but saves on processing time..."

    Well, wouldn't you have to take the cost of generating that list into account? So that at any rate, the consideration to put the conditions into the database would not save any costs, merely change the moment when the cost is suffered?

    Personally, I would only store conditions for business rules into the database if I would want the flexibility of changing the rules. And if I would do it, I would probably only store the raw constants that drive the logic, not the outcome. If you precalculate the outcome of the sub-rules ("earned 300$ or more", "less than 5 times late", "0 complaints") then you lose any hope of calculating the bonus in case the rules changed. For example, if you decide to change the rule "earned 300$ or more" to "earned 350$ or more" then you will have to either update the flags for the last month, or settle for unjustly shelling out the bonus. If on the other hand you store the amount that must be earned ("300" and then "350") than any calculation: "earned ${bonus-earning} or more" will still work regardless of the actual value.

    ReplyDelete
  2. Hey Roland

    Yes, there would be a cost anyway to calculate it and I should have mentioned it in the post. My point was that it would be a one-time pre-paid cost.
    * One-time meaning - if 3 different managers want to see the data (lets say sales, accounting and human resource), then you wouldn't need to process it again.
    * Pre-paid meaning - you would process it before you would ask to see it, probably at the same time you inserted the data. If the time to insert then becomes unbearable, then this solution would be counter-productive.

    It very much depends on how you use the data. If you insert rarely and view often, then this would benefit you more.

    Regarding storing conditions, in my example I meant to use "sold more then average" which is more flexible. I will make the change now.

    As always Roland, I appreciate your feedback.

    Best Regards
    Jonathan

    ReplyDelete
  3. Hi, your solution looks a bit like a star schema. You should read about Dimensional Modeling (DM) which is a data-mining paradigm led by Ralph Kimball. I haven't read enough about it but I think it may be applicable to this sort of problem.

    See also:
    "A Dimensional Modeling Manifesto"
    http://www.dbmsmag.com/9708d15.html

    "The Data Warehouse Toolkit" by Ralph Kimball and Margy Ross
    http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/

    ReplyDelete
  4. Since even one complaint, regardless of the day, negates the bonus, you could probably just have a single complaint flag on the employee entry, which is set to true as soon as the first complaint of the month comes in. Then you only need to check against that rather than against each individual day.

    Similarly, you might implement counters for "Days over $300" and "Days late". Instead of compiling them one by one and then summing them at the end, you just add one to the counter fields. Then your query for a bonus becomes a simple:

    SELECT * FROM employee
    WHERE greater_than_avg >= 15
    AND days_late < 5
    AND complaints = 0

    This, of course, assumes that you don't need the specific days that things happened for other record-keeping, but even if you do store them elsewhere, this touch of de-normalization can increase performance a bit (as long as you have a LOT of employees... otherwise the difference will probably be negligible).

    That said, I am inclined to agree with Roland that this is probably a bit more specific business logic than you want in the database.

    ReplyDelete