Sunday, September 12, 2010

Eclipse - Shortcut Keys

Block Select Mode: Alt + Shift + A
Mouse Hover: Ctrl + 1, Ctrl + 3
Incremental Search: Ctrl + J
Java Search: Ctrl + H
Refactor Menu: Alt + Shift + T
Change Method Signature: Alt + Shift + C
Extract Method: Alt + Shift + M
Format Code: Ctrl + Shift + F
Move: Alt + Shift + V
Re-Run: Ctrl + F11   (for Ctrl + F11 to work the way you want, set the "Run/Debug > Launching: Launch Operation" preference, under Window -> Preferences, to "Always launch the previously launched application")
Shift Code Down 1 Line: Alt + Down
Shift Code Up 1 Line: Alt + Up
Inline: Alt + Shift  + I
Show Javadoc: Shift + F2
Convert to Uppercase: Ctrl + Shift + X
Convert to Lowercase: Ctrl + Shift + Y
Focus Package Explorer: Alt + Shift + Q, P
Run JUnit Plugin Test:  Alt + Shift + X, P
Collapse all Branches: Ctrl + Shift + /
Quick Type Hierarchy: Ctrl + T
Copy current line to below : Ctrl+Alt+Down
Copy current line to above : Ctrl+Alt+Up
Surround With: Alt + Shift + Z
Display occurrences in file: Ctrl + Shift + U
Add import: Ctrl + Shift + M
Toggle Breakpoint: Ctrl + Shift + B
Next Perspective: Ctrl + F8
Next View: Ctrl + F7
Editor View: F12
Toggle Mark Occurrences: Alt + Shift + O
Static Imports...
There is a Favourites section under Window -> Preferences -> Java -> Editor -> Content Assist.
If you use org.junit.Assert a lot, you might find some value in adding it there.

Eclipse - Missing Functionality

Find in Files … Search -> File
Search and Replace in Files … Search -> File, then right click Replace…
Subversion: highlight diffs in the editor
Subversion: jump between changes
Subversion: automatically ‘svn add’ new classes
Subversion: nice diff dialog
Copy-Paste Buffer
Compare With Clipboard

mvn.bat -- change colour depending on build outcome

Got this idea from Erik Dornenburg at a Thoughtworks briefing...

Add to start of mvn.bat:
@REM yellow on black
color 0E


Append to end of mvn.bat:
if ERRORLEVEL 1 goto red
@REM white on green: build succeeded
color 2f
goto final
:red
@REM white on red: build failed
color 4f
:final

@Transactional autowiring problem

Given bean1 (ServiceImpl), which contains bean2 (DelegateService) as a field:
public class ServiceImpl implements Service {
    private DelegateService m_delegate;
It breaks when @Transactional is used on the delegate component:
public class DelegateService implements Service {
With this exception:
java.lang.IllegalStateException: Failed to load ApplicationContext
       at org.springframework.test.context.TestContext.getApplicationContext(
       at org.springframework.test.context.TestContextManager.prepareTestInstance(
       at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.createTest(
       at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.invokeTestMethod(
       at org.junit.internal.runners.JUnit4ClassRunner.runMethods(
       at org.junit.internal.runners.JUnit4ClassRunner$
       at org.junit.internal.runners.ClassRoadie.runUnprotected(
       at org.junit.internal.runners.ClassRoadie.runProtected(
       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(
       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(
       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ServiceImpl': Autowiring of fields failed; nested exception is Could not autowire field: private com.service.DelegateService com.service.ServiceImpl.m_Service; nested exception is java.lang.IllegalArgumentException: Can not set com.service.DelegateService field com.service.ServiceImpl.m_delegate to $Proxy32
       at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessAfterInstantiation(
       at Method)
       at org.springframework.test.context.TestContext.loadApplicationContext(
       at org.springframework.test.context.TestContext.getApplicationContext(
       ... 17 more
Same thing happens if @Transactional is applied at the method level instead of class level.

The cause: @Transactional makes Spring wrap DelegateService in a JDK dynamic proxy ($Proxy32) that implements the Service interface but is not an instance of the concrete DelegateService class, so it cannot be assigned to a field of that type. The solution is to @Autowire the delegate by resource name (or declare the field using the interface type).
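The root cause can be demonstrated without Spring. This sketch (the Service/DelegateService names mirror the example above; the proxy construction is only a rough stand-in for what Spring does) shows that a JDK dynamic proxy implements the interface but is not assignable to the concrete class:

```java
import java.lang.reflect.Proxy;

interface Service { String name(); }

class DelegateService implements Service {
    public String name() { return "delegate"; }
}

public class ProxyDemo {
    public static void main(String[] args) {
        // Spring's @Transactional wraps the bean in a JDK dynamic proxy
        // built against the *interface*, roughly like this:
        Service proxy = (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(),
                new Class<?>[] { Service.class },
                (p, method, a) -> "proxied");
        // The proxy is a Service, but it is NOT a DelegateService, so it
        // cannot be injected into a field of the concrete type.
        System.out.println(proxy instanceof Service);          // true
        System.out.println(proxy instanceof DelegateService);  // false
    }
}
```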

Spring Links

Spring 2.5

Spring 3.0

mvn eclipse:eclipse is not pulling in dependencies properly

“mvn eclipse:eclipse” generates a .classpath, but the transitive dependencies are sometimes different to those reported by “mvn dependency:tree” (and used to generate the war file).
See this page:
Forcing a change to the .classpath file does not help Eclipse Tomcat servers (they still pick up what “mvn eclipse:eclipse” generated).

SVN Update Batch Script

start TortoiseProc.exe /command:update /path:"d:\work\1" /closeonend:0
start TortoiseProc.exe /command:update /path:"d:\work\2" /closeonend:0
start TortoiseProc.exe /command:update /path:"d:\work\3" /closeonend:0
start TortoiseProc.exe /command:update /path:"d:\work\4" /closeonend:0
start TortoiseProc.exe /command:update /path:"d:\work\5" /closeonend:0

JMX and DEBUG options for Tomcat

export JPDA_ADDRESS=8888
export JPDA_TRANSPORT=dt_socket
catalina.sh jpda start
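For the JMX half of the title, something like the following can be added before starting Tomcat. This is a sketch: the port number is arbitrary, and the flags disabling authentication and SSL are assumptions suitable only for a development box.

```shell
# Expose the JVM's platform MBean server for remote JMX tools (e.g. jconsole).
# WARNING: authentication and SSL are disabled here - development use only.
export CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```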

Wednesday, September 8, 2010

Tony Hoare Billion Dollar Mistake Transcript

Transcript from Tony Hoare's presentation on the danger of nulls.

Don’t know how much it’s cost the IT industry. Somewhere in the order of $100m - $10bn
How I came to make this rather bad idea.
Reference to array with subscript
Check every subscript against the array bounds on every access to an array – historically a good idea.
Java has reproduced this decision made 30 years ago

Extension of ALGOL-60 committee designed successor to ALGOL-60
Hoare suggested “record handling” – concept of object, an object to which reference could be made via a pointer
Bitter experience with machine code using pointers, indirect addressing – they can wreak ultimate havoc on a program, because if you happen to use a floating point number or integer as a pointer and update the contents of whatever location it was pointing to, it was as likely as not that you would update a piece of your own code.
Took it for granted that for every var or attribute that could take a ref as its value, the programmer must declare type of location of that pointer. Standard for languages like C++.
Pointer types can be checked at compile time.
Discovered years ago that this was quite an original idea. No other language checked the types of variables at indirect addresses pointed to by pointers.
Great thing about record handling was that if you structured your data as a set of records you would never have a subscript error – no need to test whether a pointer is within range - you cannot construct a pointer that doesn’t point to something that exists and is of the expected type. That is logically impossible and so here is a whole class of errors that can never happen again. Good idea to do this checking at compile-time because there was a time penalty in doing subscript checking in the ALGOL compiler. Asked customers if they’d like the option of switching off type checking after they’d tested the programs. Once said that removing type checking from your running programs and using it only for testing is like wearing a life jacket on your practice emergency drills and taking it off as soon as your ship was really sinking. But customers were asked if they’d like the option of taking the life jacket off – they said no. Never put the option in. But still it was a very serious mistake for Elliots. Because a lot of potential customers were Fortran users rather than ALGOL. Translate from Fortran to ALGOL. Disaster. No Fortran user would touch it. They couldn’t run any of their programs – subscript error. Users didn’t care about subscript errors – they just wanted it to run like it did on their machines.

So, everything so far seems fine. We’ve got our object-orientation with careful type-checking of all the pointers, safe and there’s no overhead. Testing every subscript for its correctness.
And then I went and invented a null pointer.
And if you use a null pointer you either have to check every reference or you risk disaster – most people of course (like Fortran programmers) would prefer to risk disaster, and indeed suffer disaster, rather than check all their subscripts. I didn’t know that at the time but my friend Dijkstra, who was also working with the team developing ALGOL-60, thought the null reference was a bad idea and he gave a very subtle and abstract reason for it. He said if you have a null reference, then every bachelor who you’d represent in your object structure would seem to be married polygamously, or rather polyandrously, to the same person, called “null”. I thought that was rather a nice criticism but my real criticism is that it brings back again unnecessarily all the agony of having to choose whether to run your program fast without checking or run it slow with checking.

Now, I did know that there was a solution to this problem. It was based on the idea of discrimination between objects belonging to a disjoint union class. Idea I got from pure mathematics is that of a disjoint union that is a union between two sets which have no members in common. So a language can quite reasonably have a facility for declaring a new class not as the Cartesian product of attributes drawn from other classes but rather as a disjoint union of various other classes. E.g. if you had a class of vehicles you might classify them as either busses or private cars and separately declaring different structures of attributes for those two classes. Bus has max passengers and private car has capacity of trunk/boot. And every time you accessed either of those components of the object you would have to do it within a discrimination clause which tested. It looked at the vehicle and say when it is a bus you can look at max passengers, when it’s a car you can look at its capacity. But size of source program gets quite a bit larger by having to make all these discrimination clauses and you have to deal with both cases separately. If we insisted on a discrimination clause then we should make null not into a value of a pointer but rather into a class – a class which never had any objects in it and which only had one pointer which obviously didn’t point to any objects. So now whenever you wanted to have a null pointer you would declare your pointer to be either a pointer to the null class or a pointer to the Vehicle class or the Wife class and that would give you a way of specifying whether you wanted this pointer to be able to take a null value or not or whether you wanted it to remain definitely pointing to something and you would never have to test.

Cumbersome. Lots of corner cases. What happens if you assign a new value to the pointer which you are currently analysing and assuming to be a member of the Bus class?
Even worse problem with initialisation. One of the things you want a high-level language to do for you is to protect you against uninitialized variables. Standard way of doing that is to assign a fixed known value to every variable or attribute and null is a very – well it’s really the only – certain thing that you can assign as a value of an attribute to a recently created pointer or pointer attribute. And so if you want to avoid using null then you have to invent a whole sub-language to use for initialising pointers that aren’t allowed to be null. This is alright as long as all your objects are in a tree structure, because you can then start with the leaves of the tree, build up in a hierarchical and well-ordered fashion the value of a little tree network to be the initial value of your new variable. But if you wanted to create a cyclic structure this way there’s no way of doing it. You would assign a null value to the pointer and later on you would insert the cyclic pointer to somewhere else in the tree. All those problems in the end I didn’t want to deal with and that led me to suggest that a null pointer was a possible value of every reference variable and the possible mistake on every use of that reference variable. And perhaps it was a billion-dollar mistake.

Well the world is gradually recovering from the mistake, I’m glad to say. Modern languages like C#, Spec# and Java are introducing the idea of reference declarations and reference parameters which are declared to be non-null, and compile-time checking which checks that such variables do not have, could not possibly have, null values. And they have had to tackle all the problems of initialisation and discrimination in a context which is now very much more complicated than I’d ever thought of; all the contexts of overloading and inheritance make the concept of initialisation very much more elaborate than I would ever have dreamt of. But that is what is happening. I think the movement must have been motivated by the recognition that null references were an expensive mistake.

Billion dollars... put things in reasonable proportion... I did feel I ought to take responsibility for this mistake, but since it was unlikely I’d ever be able to pay back a billion dollars I would keep rather quiet about it – so don’t tell anyone about it please.

I put forward the view (which really derived from making ALGOL octal-dump-proof) that a programming language designer should be responsible for the mistakes that are made by the programmers using the language. And that means, since programming language design is rather a serious engineering activity – not one that you give to, shall we say, programmers with 9 months’ experience in machine code programming [referring to self] – but rather one which requires a good scientific basis, good understanding, ingenuity, invention, a lot of control of detail and a clear objective: that the programs written by people using that language would be proof against at least certain kinds of errors, and at least as easy as possible to get right, not full of little traps and syntactic awkwardnesses that people are constantly bumping their toes against.

This was the view that led me to the idea of using proof, the formal verification of programs as logical and mathematical models, as a method of conducting research into the design of good languages. I wasn’t too optimistic in 1969 about the likelihood that people out there would actually be using proofs to guarantee the correctness of their programs, at least not in the immediate future – in fact not for the next 30 years was my prediction – but that by investigating the logical properties of your programming language and finding out how difficult it would be to prove correctness if you wanted to, you would get an objective measurement of how easy the language was to use. So if the proof of program correctness requires a very large number of different proof rules, and if each proof rule has a lot of side conditions – in particular, if the validity of the local application of a rule to a small bit of program depends on properties which can only be established by a scan of the program as a whole – then you know you’ve done a bad job as a language designer and you do not need your customers to tell you that. Mind you, they don’t, because it’s actually very easy to persuade the customers of your language that everything that goes wrong is their fault, not yours. But I rejected that and thought that no, language design is a serious scientific engineering activity and we should begin to take responsibility for the mistakes our users make when they use our languages. It’s beginning to happen again – Java and its successors have all used avoidance of error as one of the criteria that they use in the detailed design of new features of the language. Of course it’s only one of the criteria, and it’s not, or at least it wasn’t at the time, the most important criterion. The most important criterion is of course compatibility with everything that has gone before – you can’t throw away millions of lines of code that have been written in other languages.
With every commercial product you have to make concessions to commercial and historical reality – you cannot pursue an ideal to its limit. But gradually ideas change; programmers get more interested in correctness, and demonstrable correctness, and in production techniques and checkers, analytical tools, test-case generators and so on that are going to help them get their programs correct.

The analogy that I draw is with agricultural pollution and vehicle safety. When Ralph Nader first started publishing his books on Unsafe at Any Speed it had no connection with the marketplace – the customers were just not asking for reliability or safety as one of the properties of their vehicles. But gradually, over 30 years, customer feeling about unreliable vehicles has changed; with the aid of law-making, there are legal constraints now which require basic standards of safety in every vehicle sold. And so there is a possibility that the marketplace and commercial necessity will move in the direction of greater reliability of programs and the languages in which they’re expressed.

You know what’s driving this move towards more ideal programming languages? It’s not idealism – although I think for many professional engineers they do have ideals and they do pursue them in preference to not pursuing them whenever the opportunity arises. No the real commercial imperative which requires greater attention paid to formal correctness of the programs is the virus. See the virus/malware/worm enters your computer and does dreadful things to it. It reaches parts of your program that normal execution never reaches. So it’s no longer adequate to test your program against all the cases that are likely to arise, because even if you do the virus will find a case that is not likely to arise. So it forces you to get the whole program correct, not just the parts that are going to be used by your customers. You’ve got to get the parts that will be used by viruses correct too. And that can’t be done by testing – it has to be done by analysis of the source code using type-checking techniques (simplest) but increasingly more sophisticated reasoning techniques are being applied to high-volume code to check that it doesn’t contain any naughty things like null reference de-referencing.

Well if I’m responsible for a billion-dollar mistake, I think my reason for bringing it up is only to put even more blame elsewhere: the designers of C [shakes his finger]. Well, one can definitely quantify it, because the buffer overflow is a direct result of the “gets” routine in C, which does not check the subscript bounds of the string that it is asked to input. That allowed the early very simple viruses to get in by overwriting the return pointers in the code, and these viruses taught the world how to write malware. Without this very simple entry it’s quite possible that nobody would ever have thought to look for the more subtle things which are now being exploited every day by people who are now motivated and skilled and whose profession and income it is to write botware/malware. So if it hadn’t been for the “gets” routine of C we might have had no malware. One virus, “Code Red”, was estimated to have cost the world economy $4bn because it really brought down all the networks in the world, with interruptions to business/banking.

Sunday, September 5, 2010

NULLs are not the same in SQL and Java

In SQL, NULL really means "unknown". Of course, when you compare one value that is "unknown" to another value that is also "unknown", the result can only be "unknown".
In C# (and Java, etc), null doesn't mean "unknown". It means "there is no object here". The correct way to translate that to relational terms would be to treat every object reference as a reference to a relation which may have either 0 or 1 tuples. Null means 0 tuples. Obviously, two relations with 0 tuples each are definitely equal - "unknown" doesn't enter into the equation here.
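The Java half of that contrast fits in two lines (the class name below is just for illustration): two null references compare equal, whereas the equivalent SQL comparison, NULL = NULL, yields Unknown:

```java
public class NullCompare {
    public static void main(String[] args) {
        String a = null;  // "there is no object here"
        String b = null;
        // In Java two null references compare equal: both mean "no object".
        System.out.println(a == b);  // prints "true"
        // In SQL, by contrast, NULL = NULL evaluates to Unknown, not True.
    }
}
```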

NULLs in SQL Maths and Strings

Mathematical and string concatenation

Because Null is not a data value, but a marker for an unknown value, using mathematical operators on Null results in an unknown value, which is represented by Null.[5] In the following example, multiplying 10 by Null results in Null:
10 * NULL          -- Result is NULL
This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception - division by zero".[5] Though this behavior is not defined by the ISO SQL standard, many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL, and Microsoft SQL Server platforms all return a Null result for the following:
NULL / 0
String concatenation operations, which are common in SQL, also result in Null when one of the operands is Null.[6] The following example demonstrates the Null result returned by using Null with the SQL || string concatenation operator.
'Fish ' || NULL || 'Chips'   -- Result is NULL
This is not true for all database implementations. In an Oracle RDBMS, for example, NULL and the empty string are considered the same thing, and therefore 'Fish ' || NULL || 'Chips' results in 'Fish Chips'.

Nulls in SELECTs

Data Manipulation Language

SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT, UPDATE, or DELETE DML statements, and are discarded by SELECT queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls.[2] The following simple example demonstrates this fallacy:
SELECT * FROM t WHERE i = NULL;
The example query above logically always returns zero rows because the comparison of the i column with Null always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to summarily discard each and every row. (However, in practice, some SQL tools will retrieve rows using a comparison with Null.)

The dangerous subtleties of LEFT JOIN and COUNT() in SQL

NULLs and the Database

Oracle is not a relational database:

You'll see here that Codd actually approved of nulls - which annoyed Date, and they had a whole series of articles on what nulls mean and why they're a mistake, in the late 80s and early 90s. Basically Date seemed to win the debate - at least in my mind.

Date was saying that nulls lead to three-valued logic (3VL), which leads to errors in logic - things like NULL AND T in truth tables, for example. Nulls make no logical sense, and for a system based on relations and logic that's a bad foundation.
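Date's NULL AND T point can be seen directly (a sketch; the bare boolean literals below follow PostgreSQL syntax, and other dialects vary):

```sql
SELECT NULL AND TRUE;   -- Unknown (shown as NULL): the result depends on the unknown operand
SELECT NULL AND FALSE;  -- FALSE: false AND anything is false, so the unknown doesn't matter
SELECT NULL OR TRUE;    -- TRUE: true OR anything is true
```

The asymmetry between the first two rows is exactly what makes 3VL truth tables error-prone.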

Trivalent Logic:

Hibernate UserType
There's a reasonable example:

This one allows nulls but combines that with readResolve().

Thoughts on the counter-arguments:
Nulls aren't evil. They aren't logical - they're simply a mistake.
A good article from a different domain (geoscience) about missing data, nulls, and the problems they cause:

Calculations with NULL
Basically you can't compute on null - hence NPE.

Another example is that COUNT(*) counts rows containing nulls but the other aggregates don't (AVG, SUM, and even COUNT(column) skip them). If nulls were values, why can't you average them or sum them? Why can you count them?
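A sketch of that asymmetry, assuming a table t with a single nullable integer column i containing the rows 1, NULL and 3:

```sql
SELECT COUNT(*),   -- 3: counts rows, including the one where i IS NULL
       COUNT(i),   -- 2: counts only the non-NULL values of i
       SUM(i),     -- 4: the NULL is skipped, not treated as 0
       AVG(i)      -- 2: average of the two non-NULL values (4 / 2)
FROM t;
```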

Relational databases are based on relations and set theory. Sets can be empty - they are never null. And semantics with a good foundation in maths makes the whole thing just generally better (another book, "The Third Manifesto", goes into some detail about what a proper relational database would look like).

Another Date book describes what an outer join really is (again, nulls cause problems there), and so on:

The guy that introduced NULLs into SQL and RDBMSs (Codd) said later that one null was inadequate, and that SQL should be enhanced to allow for two nulls: one for unset (Missing but Applicable), and one for missing (Missing but Inapplicable).

How to protect from Nulls in Java

(with thanks to Andrew Newman for a lot of these ideas)

How to protect from Nulls...
Each public method in each class rejects any null values with an IllegalArgumentException.
This is enforced by test cases - each test case extends a common TestCase that checks public methods, ensuring they throw an exception when null is passed in.
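A minimal sketch of that convention (the Name class and message text are illustrative, not from the original post):

```java
public final class Name {
    private final String value;

    public Name(String value) {
        // Reject null at the public boundary, per the convention above.
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        this.value = value;
    }

    public String value() { return value; }

    public static void main(String[] args) {
        try {
            new Name(null);
            System.out.println("accepted null");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected null");  // this branch runs
        }
    }
}
```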

  • No objects exist with nulls as values.
  • All classes have default values (objects) that can be persisted to the database - defaults like "". Optional values (like something not yet done) become separate relations.
Null Value Object and Serialization

class Xxx implements Serializable {
    public static final Xxx EMPTY_XXX = new Xxx();
    private Object readResolve() {
        return EMPTY_XXX;
    }
}

You should also write a suite of tests that checks == still works after serialization - e.g. an AssertSerialization helper.
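Such a check can be sketched as a serialization round trip (the Xxx class mirrors the snippet above; the RoundTrip name is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Xxx implements Serializable {
    public static final Xxx EMPTY_XXX = new Xxx();
    // Deserialization calls readResolve, collapsing every copy back to
    // the canonical instance, so == keeps working.
    private Object readResolve() { return EMPTY_XXX; }
}

public class RoundTrip {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Xxx.EMPTY_XXX);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Object copy = in.readObject();
            System.out.println(copy == Xxx.EMPTY_XXX);  // prints "true"
        }
    }
}
```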

Null-safe operators in Java 7: