Sunday, September 5, 2010

NULLs and the Database

Database:
Oracle is not a relational database: http://www.webopedia.com/TERM/C/Codds_Rules.html

You'll see here that Codd is actually agreeing with nulls - which annoyed Date and they had a whole series of articles on what nulls mean and why it's a mistake - in the late 80s, early 90s. Basically
Date seemed to win the debate - at least in my mind.

http://www.amazon.com/Relational-Database-Writings-1989-1991-Date/dp/0201543036/
http://www.amazon.com/Relational-Database-Writings-1991-1994-Date/dp/0201824590/

Date was saying that nulls lead to 3 values logic (3VL) which leads to errors in logic things like NULL AND T in truth tables for examples - nulls make no logical sense - and as a system based on relations and logic it's a bad system.

Trivalent Logic:


Hibernate UserType
There's a reasonable example: http://i-proving.ca/space/Technologies/Hibernate/User+Types+in+Hibernate

This one lets nulls but combines readResolve(). http://community.jboss.org/wiki/UserTypeforpersistingaTypesafeEnumerationwithaVARCHARcolumn

Thoughts on the counter arguments??: http://mooneyblog.mmdbsolutions.com/index.php/2010/04/14/quick-rant-on-why-database-nulls-are-not-necessarily-evil/
Nulls aren't evil. They aren't logical - they're simply a mistake.
A good article about a different domain (Geo science) and how missing data, nulls and the problems that there are: http://portal.acm.org/citation.cfm?id=1296606

Calculations with NULL
Basically you can't compute on null - hence NPE.

Another example is that COUNT counts nulls but other aggregates don't (AVG and SUM). If nulls were values why can't you average them or sum them? Why can you count them?

Relational databases are based on relations and set theory. Sets are empty - they aren't null. And semantics with a good foundation in maths makes the whole thing just generally better (another book "Third Manifesto" goes into some detail about what a proper relational database would look like).

Another Date book really describes what an outerjoin (again nulls cause problems there) is and so on: http://www.xaprb.com/blog/2009/03/29/a-review-of-sql-and-relational-theory-by-c-j-date/


The guy that introduced NULLs into SQL and RDBMSs (Codd), said later that one null was inadequate and SQL should be enhanced to allow for two nulls: 1 for unset (Missing but Applicable), 1 for missing (Missing but Inapplicable)  http://en.wikipedia.org/wiki/Null_%28SQL%29