Thursday, January 24, 2008

Dynamic and Static Languages

Ted Neward blogged about the "Can Dynamic Languages scale" debate going on at TheServerSide. He discusses the two dimensions of scale being debated, namely size (LOC) and capacity (reqs/sec).

Aside from the flamefest the thread became, I always get the feeling that most arguments are somewhat technically grounded, but very biased.

I'm not a Dynamic Language Programmer. But as I try to stay current on the evolution of Software Development and Languages, I read as much as I can about technology that is being touted as the future, or as very promising. Not that I immediately try to use it in production, as I admit that some corner cases could be tricky (or badly handled) because of lack of knowledge or technology immaturity. Beyond that, many hot technologies, while having their merits, are pushed way too hard because of the hype (CORBA, XML, EJB, SOAP, anyone?) generated by the BIG companies (MS, IBM, Sun).

Refactoring

Anyway, one of the arguments against dynamic languages is the ability to refactor them. Namely, that the refactoring isn't foolproof and that there can be mistakes. But guess what: Java/.NET, because of Reflection and Dependency Injection, can also suffer these mistakes. I agree that tool support is better in Java/.NET, but that's it.

Conciseness

Another argument, this time from the other camp (dynamic languages), is that dynamic languages are more concise. And comparing the best-known instances (Java, C#) to Ruby or Python, one would have to agree. And no, even if the compiler could generate it, boilerplate is still code left to read and maintain, which distracts from the code's intent.

But even that is changing, as C# 3.0, with type inference, lambdas, automatic properties, extension methods and Linq, removes or simplifies much of this verbosity. So the conciseness argument doesn't depend on being dynamic or static.

Speed

The argument of speed has two meanings: which language/platform executes faster, and which lets the programmer develop/test faster. The Java/"static" side claims faster execution (even though I still remember when Java's comparisons to C++ were very bad, and this argument was dismissed with the faster-programming one), while the "dynamic" side claims faster development.

Even stranger is the bitterness or stubbornness of each camp. It's not like the languages/platforms are stagnant. As features are added and the platforms mature, static languages will get some of the features from the other side (conciseness, interactivity) and dynamic languages will get better tool support and faster platforms. Only by evolving does a language stay relevant/dominant. Those that don't evolve lose dominance to the next Big Language (Cobol, C, C++?).

All of this makes sense to me because, as someone said: those who don't learn from history are condemned to repeat it...


Wednesday, January 23, 2008

SQL Tricks/Patterns 0

It's been a while since my last post... These last couple of months were full of events for me. The biggest one was the birth of my second child. It changed our routines quite a lot... Right now managing to sleep more than a couple of hours with a baby and a small child is our biggest achievement. But it's full of rewards :)

Getting back to technology, and looking at my last post and some functional programming posts around C#, I came to realize the similarities between SQL and FP. They're both strongly mathematically based and declarative... And both require a change of mindset (I'm still at a very rudimentary level in FP) from what we're used to in imperative programming.

So the first and most important pattern in SQL, the base for many other patterns and for correct usage, is this: SQL is set based, and row-by-row processing should be avoided (on principle). Let's be a little more pragmatic: if it's a quick and dirty solution for a small problem set, a one-shot solution never to be used again, it's ok. But if it's a migration, an operations script, or a script for the client, then be very careful... It will be run on increasingly larger sets, multiple times, and with a shrinking time window.

Only after living through a data migration in which a single test run takes a couple of days and must be executed multiple times (because of bad data, bad mappings, or bugs), or putting up with a client because some job takes increasingly longer and risks missing its time window, or witnessing data-intensive operations being done (excruciatingly slowly) in the application client, does one understand the problem.
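To make the contrast concrete, here's a minimal sketch (in Python with sqlite3, and a made-up Employees table, so table and column names are mine, not from any real system) of row-by-row versus set-based processing. Both produce the same result; the set-based one is a single statement over the whole set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two copies of a hypothetical Employees table, to compare both approaches.
for t in ("EmpRowByRow", "EmpSetBased"):
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, categoryId INTEGER, salary REAL)")
    conn.executemany(f"INSERT INTO {t} (id, categoryId, salary) VALUES (?, ?, ?)",
                     [(1, 1, 1000.0), (2, 2, 2000.0), (3, 1, 1500.0)])

# Row-by-row (the style to avoid): fetch every row, then issue one UPDATE per row.
for emp_id, cat in conn.execute("SELECT id, categoryId FROM EmpRowByRow").fetchall():
    factor = 1.1 if cat == 1 else 1.2
    conn.execute("UPDATE EmpRowByRow SET salary = salary * ? WHERE id = ?",
                 (factor, emp_id))

# Set based: one statement processes the whole set at once.
conn.execute("""
    UPDATE EmpSetBased
    SET salary = salary * CASE categoryId WHEN 1 THEN 1.1 WHEN 2 THEN 1.2 ELSE 1.0 END
""")

row_by_row = conn.execute("SELECT salary FROM EmpRowByRow ORDER BY id").fetchall()
set_based = conn.execute("SELECT salary FROM EmpSetBased ORDER BY id").fetchall()
print(row_by_row == set_based)  # → True: same result, but only one round trip
```

On three rows the difference is invisible; on millions, the per-row round trips and per-statement overhead are exactly what blows the time window.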

Ok. But to a freshman/rookie/intern, unless told otherwise, the imperative solution would be the only solution. And imperative code has conditionals, and many other goodies...

  • In SQL, we can simulate conditionals with CASE. For example, multiplying the salary by 1.1 or 1.2 depending on the employee's category:

UPDATE Employees
SET salary = salary *
    CASE categoryId
        WHEN 1 THEN 1.1
        WHEN 2 THEN 1.2
        ELSE 1.0 -- without an ELSE, CASE yields NULL and wipes the other categories' salaries
    END

  • But what if it's impossible to do in one query, or better, what if we want to avoid making the query overly complex?
    Then we use temporary tables to hold the intermediate results. Each step of the processing is still done on the set, not row by row. If some step is impossible, or too hard to implement with set operations, then only that step is done row by row; the remaining steps stay set based.
    If some task is complex (reading, parsing, validating and loading a file into the DB), or the business allows for it, it can even be done in a permanent table using some sort of state identifier (step 1 is reading, or approving...).

  • But what if I have to insert into other tables? Well, INSERT is also set based, so instead of INSERT (...) VALUES (...), use INSERT (...) SELECT ...
    There's another problem, to which I don't have a satisfactory answer: when you have a master/detail relationship and the master uses identity values. In that case, apart from switching the identity property on and off and computing the max value yourself, it's difficult to insert a batch of these relations using set-based processing. Other possibilities are triggers or, in SQL Server 2005, the OUTPUT clause, but they all seem awkward and fragile.

  • Another problem is generating a number or some kind of order in the set. For that, the numbers table from my last post is the answer. There are some variations and simplifications: if all you need is a sequential number for the data (to order it, or to differentiate it - very good for removing complete duplicates), a temporary table with an identity column might suffice. This is also the principle used in most solutions that paginate result sets (joining the table with itself using some computed number and filtering on it).
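The INSERT ... SELECT and numbering tricks above can be sketched the same way. Again Python with sqlite3, and again the table names (Staging, Target, Numbered) are made up for illustration; an INTEGER PRIMARY KEY AUTOINCREMENT column plays the role of SQL Server's identity column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging data, with one complete duplicate row.
conn.execute("CREATE TABLE Staging (name TEXT, amount REAL)")
conn.executemany("INSERT INTO Staging VALUES (?, ?)",
                 [("ana", 10.0), ("rui", 20.0), ("ana", 10.0)])

# INSERT ... SELECT: load the whole set in one statement, not one VALUES per row.
conn.execute("CREATE TABLE Target (name TEXT, amount REAL)")
conn.execute("INSERT INTO Target (name, amount) SELECT name, amount FROM Staging")

# Numbering trick: an identity column on a temp table gives each row a sequential
# number, which lets us tell complete duplicates apart - and delete all but one.
conn.execute("""CREATE TEMP TABLE Numbered (
                    n INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, amount REAL)""")
conn.execute("INSERT INTO Numbered (name, amount) SELECT name, amount FROM Target")
conn.execute("""DELETE FROM Numbered
                WHERE n NOT IN (SELECT MIN(n) FROM Numbered GROUP BY name, amount)""")

survivors = conn.execute("SELECT name, amount FROM Numbered ORDER BY n").fetchall()
print(survivors)  # → [('ana', 10.0), ('rui', 20.0)]
```

The same n column is what pagination solutions filter on (WHERE n BETWEEN 21 AND 30, for page 3 of 10 rows).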

Well, I must be forgetting a lot more, but as this is already too long, I'll end it here.

Until next time...