Sep 02 2008

Thoughts on Software Engineering: Calendar Woes and Date Interchange Formats

For a laugh, I decided to create a calendar event in Gmail for a very old event. In fact, I was looking for a date in the region of 2 BC.

When adding an event to a Gmail message, you enter the year/month/day of the start of the event into one box, and of the end of the event into another box. The boxes are prepopulated with today’s date. Entering “2BC” as the year gets changed automatically to 2002 when you click outside the box, as does “0002”. Three-digit years down to “100” are left unchanged, but entering “10” gets mapped to “2010”. Entering “49” gets mapped to “2049” but “50” gets mapped to “1950”. In summary, one- or two-digit years 0..49 map to 2000..2049, and 50..99 map to 1950..1999.

On to iCal. iCal only offers a two-digit year field. :-( Years 0..50 map to 2000..2050, and 51..99 map to 1951..1999.
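The windowing rules observed above are easy to state as code. Here is a minimal sketch in Python, assuming each product uses a fixed pivot year. The function and constant names are mine, and this models only the observed behaviour, not either product's actual implementation.

```python
def expand_two_digit_year(yy: int, pivot: int) -> int:
    """Map a one- or two-digit year to a full year using a fixed window.

    Years below `pivot` land in the 2000s; years at or above it land in the 1900s.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a one- or two-digit year, got {yy}")
    return 2000 + yy if yy < pivot else 1900 + yy


GMAIL_PIVOT = 50  # observed: 0..49 -> 2000..2049, 50..99 -> 1950..1999
ICAL_PIVOT = 51   # observed: 0..50 -> 2000..2050, 51..99 -> 1951..1999

for yy in (10, 49, 50, 51):
    # Columns: typed year, Gmail's reading, iCal's reading.
    print(yy, expand_two_digit_year(yy, GMAIL_PIVOT), expand_two_digit_year(yy, ICAL_PIVOT))
```

The two rules agree on every input except 50, which Gmail reads as 1950 and iCal as 2050: exactly the kind of silent disagreement that matters when events move between calendars.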

On to Emacs. Surely Emacs gets it right. Well, nearly. Emacs calendar mode allows you to create an event for any year > 0.

So Emacs wins with 1 AD, followed closely by Gmail with 100 AD and iCal a distant third at 1951.

While this exercise was fairly frivolous, the consequences of the results could be nontrivial: what happens if you want to exchange dates and events between calendars? What happens in 1951? What happens if you want to add historical dates to your calendar?
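One way to exchange dates without guessing is never to abbreviate the year at all. ISO 8601 does this with astronomical year numbering, in which 1 BC is year 0000 and 2 BC is year -0001; years outside 0000..9999 need its expanded, signed form, which the standard allows only by agreement between the parties. Python's own `datetime` cannot help here (its `MINYEAR` is 1), so this minimal sketch formats such years by hand. The function names and the choice of six digits for expanded years are assumptions, and the sketch ignores the Julian/Gregorian calendar switch, which matters for real historical dates.

```python
def to_astronomical(year: int, era: str) -> int:
    """Convert a BC/AD year to astronomical numbering (1 BC -> 0, 2 BC -> -1)."""
    if year < 1:
        raise ValueError("BC/AD years start at 1; there is no year 0")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError(f"unknown era: {era!r}")


def iso_date(year: int, month: int, day: int) -> str:
    """Format an astronomical year, month and day as an ISO 8601-style date.

    Years 0..9999 use the basic four-digit form. Other years use a signed,
    expanded form; six digits is an assumption both parties must agree on.
    Month and day are not validated in this sketch.
    """
    if 0 <= year <= 9999:
        return f"{year:04d}-{month:02d}-{day:02d}"
    sign = "-" if year < 0 else "+"
    return f"{sign}{abs(year):06d}-{month:02d}-{day:02d}"


print(iso_date(to_astronomical(2, "BC"), 1, 1))     # -000001-01-01
print(iso_date(to_astronomical(1951, "AD"), 1, 1))  # 1951-01-01
```

Unlike a two-digit year, a full signed year means the same thing to every program that reads it.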

PS Yes, I remember Y2K!


Aug 13 2008

Thoughts on Software Engineering: Plug and Play Consumer Electronics Not There Yet

Tag: Thoughts on software engineering, softwareadmin @ 2:14 pm

How often does it happen that you get a shiny new device home, plug it into your computer to set it up, and then the problems start? You need to upgrade to Music Player 4.3, you need to enter a 28 digit registration code, the device is not recognized, Music Player 4.3 is no longer available, … fail, fail, FAIL. So… you need to mount the device as a disk, reformat, drag an updated firmware image onto the device disk, reboot the device, reboot again…

The software and manuals that come with your device lead you to expect that you will be able to do what you want, but you end up in a situation where it is either impossible, or only possible with very detailed knowledge of the device and its accompanying software. I’m trying to think of a name for this kind of problem, and the best I can come up with is “packaging failure”.

Last week a friend and I did an iPhone switcheroo. Her old phone had been stolen, but she didn’t want the new, more expensive iPhone plan that came with a new iPhone, and she preferred the iPhone v.1 look anyway. I was happy to get the new iPhone. So she bought the new phone and then we switched. Everything went smoothly at the AT&T store, and I synchronized my new phone with iTunes without a problem. But when we came to synchronize her new phone (my old one) with her iTunes, we ran into a problem: iTunes complained that there was a security lock on the phone and that it should be removed before synchronizing. That made sense (I always put a 4-digit combination lock on my phone in case it gets lost), but the phone now offered only an emergency call option and no way to enter the security code. So I plugged the phone into my Mac and reinstalled the firmware, which wiped the lock. After that we synchronized her phone without any problem.

How often does this kind of thing happen? ALL THE TIME. My next question is:

WHY?

These devices are very complex. This complexity is hidden underneath a software and/or documentation layer which lets you easily achieve what you want to do. As long as you remain in a situation which is covered by the software and/or documentation, you are OK. But if you need to perform some task which was not anticipated by that software, or if you somehow end up in a situation not handled by the software, then you have to start dealing with the device at a much lower level, a lower level which requires a lot of expertise to understand.

So one cause of these kinds of packaging failures is failure of the designers to anticipate and provide for all the ways in which the device might be used.

Another kind of failure is failure of the designers to protect against errors which can occur. For example, you are following the setup process and get error -53 and the setup wizard quits and won’t restart, and the device won’t wake up.

Typically, the GUI and the instructions guide the user through a sequence of intermediate states towards successful completion of the task at hand. When I was working at NASA Ames, a group there had designed a software system called CLARISSA to help astronauts by verbally guiding them through their procedures in space. An important part of writing such procedures is verifying that they achieve the task and handle all contingencies.

Astronauts’ procedures, and packaging software and documentation, are effectively based on finite state machines. The arcs of the machines are labeled with steps in the procedures, or actions in the software interface. Packaging failures are events which lead from a state in the finite state machine to an unanticipated state which is not part of the machine.
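To make the finite state machine view concrete, here is a minimal sketch of a device setup procedure as a transition table. The states, event names, and the `run` helper are all hypothetical, invented for illustration; the point is only that an event with no arc from the current state has nowhere to go, which is what a packaging failure looks like from inside the machine.

```python
# Hypothetical setup procedure: (state, event) -> next state.
SETUP_FSM = {
    ("unboxed", "plug_in"): "recognized",
    ("recognized", "install_player"): "player_installed",
    ("player_installed", "enter_code"): "registered",
    ("registered", "sync"): "ready",
}


def run(fsm, start, events):
    """Follow `events` through `fsm`.

    Returns (final_state, None) if every step has an arc. Otherwise returns
    (state, event) for the first step the designers did not anticipate.
    """
    state = start
    for event in events:
        if (state, event) not in fsm:
            return state, event
        state = fsm[(state, event)]
    return state, None


print(run(SETUP_FSM, "unboxed", ["plug_in", "install_player", "enter_code", "sync"]))
print(run(SETUP_FSM, "unboxed", ["plug_in", "error_-53"]))
```

The first trace ends in `ready`; the second gets stuck in `recognized` on `error_-53`, an event the machine simply does not know about.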

To handle these problems, the finite state machines need to be extended. Either the instructions’ finite state machine can be extended, e.g. “if the device displays an error message, unplug it, hold down the power button for 10 seconds, then plug it back in to restart the installation”, or the software’s finite state machine can be extended to handle the error condition itself.
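Extending the machine amounts to adding arcs. In this hypothetical sketch (all state and event names are invented), a recovery table adds an `error` arc from each setup state to an `error_shown` state, and a `power_cycle` arc from there back to `recognized`, standing in for the unplug, hold, and replug instruction. The `unhandled` helper lists the states that still lack an arc for a given event: a crude version of checking that a procedure handles its contingencies.

```python
# Hypothetical base machine: (state, event) -> next state.
BASE = {
    ("unboxed", "plug_in"): "recognized",
    ("recognized", "install_player"): "player_installed",
    ("player_installed", "sync"): "ready",
}

# Extension: errors now lead to a known state, and power-cycling leads back in.
RECOVERY = {
    ("recognized", "error"): "error_shown",
    ("player_installed", "error"): "error_shown",
    ("error_shown", "power_cycle"): "recognized",
}

EXTENDED = {**BASE, **RECOVERY}


def run(fsm, start, events):
    """Return (final_state, None), or (stuck_state, event) if an arc is missing."""
    state = start
    for event in events:
        if (state, event) not in fsm:
            return state, event
        state = fsm[(state, event)]
    return state, None


def unhandled(fsm, states, event):
    """States in which `event` has no outgoing arc, i.e. remaining gaps."""
    return [s for s in states if (s, event) not in fsm]


trace = ["plug_in", "error", "power_cycle", "install_player", "sync"]
print(run(BASE, "unboxed", trace))      # ('recognized', 'error')
print(run(EXTENDED, "unboxed", trace))  # ('ready', None)
print(unhandled(EXTENDED, ["recognized", "player_installed"], "error"))  # []
```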


Jul 28 2008

Thoughts on Software Engineering: Interfaces

Interfaces Contribute to Software Risk

When I was at NASA, I did some research on software risk. The most common identifiable cause of software failures, apart from “functional defect” (i.e. the code was perfectly valid but just did not implement the right thing), is “interface defect”, i.e. a failure in the interfaces between components.

Interface Problems are Worse in Large Multilanguage Systems

The reason why interface defects are particularly common is probably two-fold. First, compilers and other tools do not provide good support for preventing this kind of cross-component defect. Second, components tend to correspond to the boundaries between the work of different people, so interface defects arise from a lack of coordination between those people.

A large system (especially one providing a consumer web interface) is often composed of many components, written in several different languages, some running on a single machine and others split into client and server parts. The system and its components may be evolving quite rapidly, and different parts are usually written by different people. The checks provided by compilers apply only within one component at a time, and so are not very helpful in this situation.

Check Types + Properties At Runtime

The kinds of type checking available in programming languages are generally limited by the need to check types statically, which imposes a severe restriction on the notion of “type”. Many types in practice would be well represented by specifying an underlying data type plus properties which must hold of valid elements, e.g. lists (underlying type) whose elements are pairs (property), or arrays of strings (underlying type) which are sorted (property). Such types are a form of dependent type (often called predicate subtypes or refinement types). Dependent types are not supported by most programming languages, because using them sacrifices fully automatic static type inference and checking.

Although dependent types cannot in general be checked statically in a fully automatic way, and cannot be expressed explicitly in most programming languages, a lot of their benefit can be realized by writing functions which check whether given data have the right underlying data type and satisfy the required properties. Instead of statically checking types, which is mathematically hard or, in general, undecidable, we just check the data input to or output from functions to ensure they satisfy the properties, i.e. have the right (dependent) types.
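As a minimal sketch in Python (the function names are hypothetical), the two example types above become predicate functions that first check the underlying type and then the property. Python has no built-in fixed array of strings, so a `list` stands in for it.

```python
from typing import Any


def is_list_of_pairs(x: Any) -> bool:
    """Underlying type: list. Property: every element is a 2-tuple."""
    return isinstance(x, list) and all(isinstance(e, tuple) and len(e) == 2 for e in x)


def is_sorted_str_list(x: Any) -> bool:
    """Underlying type: list of str. Property: elements are in ascending order."""
    return (
        isinstance(x, list)
        and all(isinstance(s, str) for s in x)
        and all(a <= b for a, b in zip(x, x[1:]))
    )
```

Each check is ordinary code run on actual data, so it needs no type inference at all; the price is that it finds violations only in the data it actually sees.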

Represent Interface Contracts As Types + Properties

These runtime-checkable properties (dependent types) provide a good way to tackle part of the interface problem: they represent the interfaces in a machine-checkable way, and mechanisms can be built on them to check the consistency of the component interfaces both at build time and at runtime. Speed is often important, so components should distinguish between a “development” environment, in which the interface checks are performed, and a “production” environment, in which they are skipped.
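One minimal way to get this development/production split in Python is a decorator that wraps a function with property checks only when checking is switched on. This is a sketch under assumptions: the `APP_ENV` environment variable, the `contract` decorator and `ContractError` are hypothetical names, and only single-argument functions are handled, to keep it short.

```python
import functools
import os

# Hypothetical switch: checks are on unless APP_ENV is "production".
CHECK_CONTRACTS = os.environ.get("APP_ENV", "development") != "production"


class ContractError(Exception):
    """Raised when data crossing an interface lacks a required property."""


def contract(pre=None, post=None):
    """Check a single-argument function's input with `pre` and output with `post`."""
    def decorate(fn):
        if not CHECK_CONTRACTS:
            return fn  # production: return the function unwrapped, no overhead
        @functools.wraps(fn)
        def wrapper(arg):
            if pre is not None and not pre(arg):
                raise ContractError(f"{fn.__name__}: bad argument {arg!r}")
            result = fn(arg)
            if post is not None and not post(result):
                raise ContractError(f"{fn.__name__}: bad result {result!r}")
            return result
        return wrapper
    return decorate


def is_str_list(x):
    return isinstance(x, list) and all(isinstance(s, str) for s in x)


def is_sorted_str_list(x):
    return is_str_list(x) and all(a <= b for a, b in zip(x, x[1:]))


@contract(pre=is_str_list, post=is_sorted_str_list)
def dedupe_and_sort(names):
    return sorted(set(names))
```

With checks on, `dedupe_and_sort([1, 2])` raises `ContractError`; with `APP_ENV=production` the call goes straight to the undecorated function and returns `[1, 2]`. Python's built-in `assert` statements, which `python -O` strips out, offer a similar split with less ceremony.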

My proposal is, therefore, to define APIs which specify the classes/functions/methods provided by the components in a system. Those APIs should specify, at an appropriate level of detail, the signatures, types, and properties of those classes/functions/methods. “Appropriate level of detail” here means exercising judgement about when to use heavy-weight, detailed representations, and when simpler abstractions are good enough. The functions which check the required properties of data can be used both at runtime in the development environment, to check much or all of the data passing between components, and at build time, to check aspects of the correctness of test results. One way to view these functions is as test oracles.
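A minimal sketch of such an API description, again in Python with hypothetical names throughout: a dictionary maps each function name to the property checks for its argument and result, and a build-time helper runs the implementation on test inputs and uses those checks as the oracle.

```python
def is_str_list(x):
    return isinstance(x, list) and all(isinstance(s, str) for s in x)


def is_sorted_str_list(x):
    return is_str_list(x) and all(a <= b for a, b in zip(x, x[1:]))


# Hypothetical API for one component: name -> properties of argument and result.
API = {
    "dedupe_and_sort": {"arg": is_str_list, "result": is_sorted_str_list},
}


def dedupe_and_sort(names):
    return sorted(set(names))


IMPLEMENTATION = {"dedupe_and_sort": dedupe_and_sort}


def check_against_api(api, impl, test_inputs):
    """Build-time check: every API function exists and, on each test input,
    produces a result with the required properties.
    Returns a list of problems; an empty list means all checks passed."""
    problems = []
    for name, spec in api.items():
        fn = impl.get(name)
        if fn is None:
            problems.append(f"{name}: missing from implementation")
            continue
        for arg in test_inputs.get(name, []):
            if not spec["arg"](arg):
                problems.append(f"{name}: bad test input {arg!r}")
                continue
            result = fn(arg)
            if not spec["result"](result):
                problems.append(f"{name}({arg!r}) returned {result!r}")
    return problems


print(check_against_api(API, IMPLEMENTATION, {"dedupe_and_sort": [["b", "a", "b"], []]}))  # []
```

These checks are partial oracles: a `dedupe_and_sort` that always returned `[]` would pass, so they complement rather than replace tests of exact expected output.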