Friday, December 12, 2008

Architecture again

Right after my last post (which was on my mind because the situation happened to me a couple of times this year and last), I discovered, thanks to a friend's post (patterns & practices - App Arch Guide Pocket Guides), that MS PnP has some interesting material on this topic (shame on me for not checking PnP recently).

Apart from the pocket guides, which I haven't looked into yet, what I found were some cheat sheets and diagrams.

The PnP Application Architecture Frame Cheat Sheet describes architecture frames (authentication, caching, ... and 14 others), quality attributes (14), and the mapping between application types, architecture styles and the architecture frames (along with the common issues). It's great stuff, and should be mandatory reading for every developer. It's a shame that Microsoft only has teaching offerings around products (maybe that's what the industry cares about).

[Diagram: PnP Application Architecture Meta-Frame]

Another one is the PnP Application Type Matrix Cheat Sheet, which summarizes the main application types with their benefits, considerations, scenarios and solutions.

The last one is the PnP Visio Index Diagrams, which MS makes available for modification and use in documenting our architectures. Even if used only for proposals, it's very good and enables reuse.


Friday, December 05, 2008

Architecture & Process

Lately, I've been surprised by how little attention some developers pay to application architecture. By architecture, I mean the high-level decisions that cut across all the application code.

Recently, during the transfer of an application from another team, I was shocked that there was no clear vision of the main decisions regarding some fundamental aspects of a web application (I don't mean documented, just a clear definition of the decisions or the approaches taken in the implementation). Every web application should make explicit the major decisions (and the reasoning behind them) regarding:

    • Logging (policy - how and when; tools/code)
    • Exception Handling
    • Data Access (ADO.NET, ORM, LINQ)
        • Transaction Management (sharing the same transaction, creating and committing transactions; see the sketch after this list)
    • Session Management (ASP.NET in-process memory, DB, custom)
    • Security (low-level - SQL injection, cross-site scripting; high-level - isolating data between different users/entities/geographies)
    • Profiles and Permissions (management of users/groups and their permissions on application functionality)
    • Operation Auditing (especially in financial systems)
    • Composition (tiers, layers, service orientation)
    • Dependencies (3rd party tools, components, services)
    • Patterns used (MVC, Singleton, Composite)
    • Naming conventions
    • Configuration management (reference tables, configuration values, connection strings, etc.)
    • Concurrency (synchronization, async callbacks, threading)
    • ...
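
To make the transaction-management bullet concrete, here's a minimal T-SQL sketch of one possible policy (the transaction is created, committed and rolled back in a single place, and errors are rethrown); the Accounts table and the error message are made up for illustration:

begin try
    begin transaction

    -- the unit of work: both updates succeed or neither does
    update Accounts set balance = balance - 100 where accountId = 1
    update Accounts set balance = balance + 100 where accountId = 2

    commit transaction
end try
begin catch
    -- the actual policy decision: who rolls back, who logs, who rethrows
    if @@trancount > 0
        rollback transaction

    raiserror('transfer failed', 16, 1)
end catch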

 

There's also other important stuff, more on the process/principles side, that gains from being defined:

    • Planning, prioritization & risk management
    • Organization (teams, projects)
    • Tools
    • Automation of tasks (building, testing, importing reference data)
    • Testing and Coverage (unit testing, integration testing, coverage of significant program states)
    • Refactoring
    • Documentation (design, architecture, glossary, major entities, workarounds to problems)
    • Bug/Incident tracking
    • Version Control (tool, policy)
    • Communication
    • DRY
    • Responsibilities (code & people) and separation of concerns
    • Coupling & Cohesion
    • Definition of "done"

 

As with everything, there's also the risk of overdoing things, or doing them as an end in themselves (and not as a means to an end).

If just a third of this list were in place, maintenance would be a much easier job...

Marcas Technorati: ,

Friday, August 01, 2008

SQL set approach

SQL is a powerful tool that can be a great help in solving some problems. You just have to think in a set-oriented way (Joe Celko's Thinking in Sets comes to mind). I just hope thinking this way doesn't turn me into some uptight person who looks down on everyone else just because they don't get it at first. And this approach must not be taken as dogma, or as an end in itself. If a solution doesn't appear within a reasonable time, I resort to iterative approaches. Better not even get into the discussion of primary keys versus business identifiers... (end of rant)

 

Today, during the implementation of a shift-assignment algorithm, the client asked for some exceptions to be handled when there is no one available in the specified region. After some discussion, we agreed the algorithm should cope with this by looking in the neighbouring regions, then in a larger area, and finally in the whole country. As the algorithm isn't just a select (it has more rules), I didn't want to replicate the whole algorithm for each universe, or apply it iteratively region by region (that would also distort the ordering and fairness of the algorithm).

 

After some brainstorming, the problem boiled down to changing the universe to search in by joining to the region table. And just as I was about to give up, the solution became evident: what was needed was to filter the universe. But how, if the table doesn't carry all the attributes needed to determine the regions?

One way would be to compute the regions with a function and use a dynamic query with an IN filter. Something like:

regionId   IN  
    dbo.fnComputeRegions(regionId, coverage)

But that would force us to use dynamic queries, and the IN list could become quite big (there are a couple hundred regions).

Then a better solution came to mind: the problem was in the join, so the query should join with a table-valued function that returns the regions to look in. This solution doesn't force dynamic queries and lets us reuse the whole algorithm by just changing the join, wrapping the algorithm in a while loop that widens the coverage:

while (@coverageId < @numberOfCoverages)
begin
    -- (algorithm)
    select ...
    from Region R
        inner join dbo.fnComputeRegions(@regionId, @coverageId) CR
            on R.regionId = CR.regionId

    -- widen the search area on the next pass
    set @coverageId = @coverageId + 1
end

 

With this approach we gain abstraction and independence in how the regions are computed (the function can return a table with a single record, the neighbouring regions, the neighbours' neighbours, all the regions within a hierarchy, or all the regions), without changing the algorithm. Best of all, this computation of regions is reusable in other areas and algorithms.
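
Just to illustrate the shape of such a function, here's a minimal sketch of how dbo.fnComputeRegions could look (the RegionNeighbour adjacency table and the coverage semantics are assumptions, not the real schema):

create function dbo.fnComputeRegions
    (@regionId int, @coverageId int)
returns @regions table (regionId int primary key)
as
begin
    if (@coverageId = 0)
        -- narrowest coverage: just the region itself
        insert into @regions (regionId)
        values (@regionId)
    else if (@coverageId = 1)
        -- wider coverage: the region plus its neighbours
        insert into @regions (regionId)
        select @regionId
        union
        select neighbourRegionId
        from RegionNeighbour
        where regionId = @regionId
    else
        -- widest coverage: the whole country
        insert into @regions (regionId)
        select regionId
        from Region

    return
end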

Now the toughest part is convincing the client that this is the best solution, and that to these rules there must be no exception :)



Thursday, February 14, 2008

Classic Books

I really like reading, especially blogs, because of their small format (one can read a couple of entries in a couple of minutes). Nowadays I don't buy that many books, as much of the information is available on the web. The exception is classic, timeless books (patterns, methodology, usability, ...), which I buy and try to read (it usually takes me a long time because of interruptions, priorities, certifications, ...).

But recently there's been a trend of releasing classic books and articles for free on the web:

 

There are some great gems available (Macintosh Human Interface Guidelines, Structured Programming), some books related to Lisp, Smalltalk and other esoteric languages, and CMG papers (mostly performance and ITIL).


Wednesday, February 06, 2008

Microsoft and Yahoo

It's not the most recent news, as it's been dissected and commented on by everyone else, but it's nonetheless an interesting topic.

From the MS fan base's Hello, Google, can you spell hypocrisy? to the MS (particularly Ballmer) bashing in A Defining Test for Ballmer, the most interesting takes were the internal ones, Microsoft + Yahoo! = Microsoft - $44,600,000,000 ? and Microsoft and Yahoo! -- Stay on Target?, plus Bruce Eckel's Should Microsoft Buy Yahoo?.

 

The internal perspectives focus more on the restructuring, layoffs and overlap of products, and foresee the potential problems (cultural; technological - spreading MS technology within Yahoo) and possible synergies, summing up the dominant reaction as: "talk to me in a year".

 

Bruce Eckel focuses on acquisitions, and on how mergers fail when two big software/hardware companies attempt them. I can't really assess whether it's truly that bad, but MS and Yahoo should be on their toes to avoid those problems. Those who don't learn from history are condemned to repeat it (or something similar).

 

I really think it's impossible to predict the outcome of this. MS and Yahoo should pay close attention to the details of the merger (cultural, technological, user base) and avoid imploding.

Google should pay attention too: as MS has shown in the past, they can and will use any tactic to get back into the game. If MS can leverage its dominant position to spread Silverlight and gain control over Web 2.0 content, Google will be in deep trouble. It's not as if Google's dominance generates user lock-in (unlike MS Windows and Office, and other MS tactics - IE), so they'd better watch carefully, since changing my search engine is just a few clicks away...

 

So, in conclusion, I guess only time will tell what the outcome will be. Whoever makes the fewest mistakes in this game will take the lead. Still, for the sake of evolution, I hope no true winner comes out of it and competition becomes fiercer...


Friday, February 01, 2008

SQL 2008 is late^H^H^H^H on schedule for Q3

As Joel Spolsky puts it, Microsoft can't speak straight any more. Instead of coming forward and saying SQL 2008 is late, marketing added its twist to the message (I don't think an engineer can talk this way), transforming a simple message into convoluted, positive-tone marketing speak.

It's on par with Dilbert material:

[Dilbert comic strip]

I think it just transformed an innocuous message about a schedule slip (pretty common in software development) into a comic situation (or a patronising one, depending on how you take it).

Phil Factor dug deeper and tried to explain it: Microsoft Boy announces his School Homework.

Oh, here's the original message:

The past few months have been an amazing time for the SQL Server team as we gear up for the start of the global launch wave on February 27.

...

Simply put, SQL Server 2008 is a significant release for us – one that builds on all of the great things that we were able to deliver in SQL Server 2005. We see it as a critical step forward for our data platform...

Not surprisingly, one of the top areas of focus for us is always to deliver a high quality product, and in a very predictable manner.

...

To continue in this spirit of open communication, we want to provide clarification on the roadmap for SQL Server 2008. Over the coming months, customers and partners can look forward to significant product milestones for SQL Server.  Microsoft is excited to deliver a feature complete CTP during the Heroes Happen Here launch wave and a release candidate (RC) in Q2 calendar year 2008, with final Release to manufacturing (RTM) of SQL Server 2008 expected in Q3.

...

This does not in any way change our plans for the February 27 launch...


Thursday, January 24, 2008

Dynamic and Static Languages

Ted Neward blogged about the "Can Dynamic Languages Scale?" debate going on at TheServerSide. He discusses the two dimensions of scale being debated, namely size (LOC) and capacity (reqs/sec).

Aside from the flamefest the thread turned into, I always get the feeling that most arguments are somewhat technically grounded, but very biased.

I'm not a dynamic language programmer. But as I try to stay current on the evolution of software development and languages, I read as much as I can about technology being touted as the future, or as very promising. Not that I immediately try to use it in production, as I admit some corner cases could be tricky (or badly done) because of lack of knowledge or technology immaturity. Beyond that, many hot technologies, while having their merits, are pushed way too hard because of the hype (CORBA, XML, EJB, SOAP, anyone?) generated by the BIG companies (MS, IBM, Sun).

Refactoring

Anyway, one of the arguments against dynamic languages is the ability to refactor them: namely, that refactoring isn't foolproof and mistakes can happen. But guess what: Java/.NET, because of reflection and dependency injection, can suffer the same mistakes. I agree that tool support is better in Java/.NET, but that's it.

Conciseness

Another argument, this time from the other camp, is that dynamic languages are more concise. Comparing the best-known instances (Java, C#) to Ruby or Python, one would have to agree. And no, even if the compiler could generate it, boilerplate is still code left to maintain and read, and it distracts from the code's intent.

But even that is changing: C# 3.0, with type inference, lambdas, automatic properties, extension methods and LINQ, removes or simplifies much of this verbosity. So the conciseness argument doesn't depend on being dynamic or static.

Speed

The speed argument has two meanings: which language/platform executes faster, and which language allows the programmer to develop/test faster. The Java/"static" side claims faster execution (even though I still remember when the comparisons to C++ looked very bad and this argument was dismissed with the faster-programming one), while the "dynamic" side claims faster development.

Even stranger is the bitterness and stubbornness of each camp. It's not as if the languages/platforms are stagnant. As features are added and the platforms mature, static languages will pick up features from the other side (conciseness, interactivity) and dynamic languages will get better tool support and faster platforms. Only by evolving does a language stay relevant/dominant. If they don't evolve, they'll lose dominance to the next Big Language (Cobol, C, C++?).

All of this makes sense to me because, as someone said, those who don't learn from history are condemned to repeat it...


Wednesday, January 23, 2008

SQL Tricks/Patterns 0

It's been a while since my last post... These last couple of months have been full of events for me. The biggest one was the birth of my second child. It changed our routines quite a lot... Right now, managing to sleep more than a couple of hours with a baby and a small child is our biggest achievement. But it's full of rewards :)

Getting back to technology, and looking at my last post and some functional programming posts around C#, I came to realize the similarities between SQL and FP. They're both strongly mathematically based and declarative... And both require a change of mindset from what we're used to in imperative programming (I'm still at a very rudimentary level in FP).

So the first and most important pattern in SQL, the one underlying many other patterns and its correct usage, is that SQL is set-based, and all row-by-row processing should be avoided (on principle); the sketch below contrasts the two styles. Let's be a little more pragmatic: if it's a quick-and-dirty solution, for a small problem set, a one-shot never to be used again, it's OK. But if it's a migration, an operations script, or a script for the client, then be very careful... It will be run on increasingly larger sets, multiple times, and within a shrinking time window.
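
To make the contrast concrete, here's the same salary raise done row by row with a cursor and then as one set operation (a minimal sketch; the Employees table is just for illustration):

-- row by row: a cursor touches one employee at a time
declare @id int
declare empCursor cursor for
    select employeeId from Employees
open empCursor
fetch next from empCursor into @id
while @@fetch_status = 0
begin
    update Employees set salary = salary * 1.1 where employeeId = @id
    fetch next from empCursor into @id
end
close empCursor
deallocate empCursor

-- set-based: one statement over the whole set
update Employees
set salary = salary * 1.1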

Only after going through a data migration in which a single test run takes a couple of days and must be executed multiple times (because of bad data, bad mappings, or bugs), or having to face a client because some job keeps taking longer and risks missing its time window, or witnessing data-intensive operations being done (excruciatingly slowly) in the application client, does one understand the problem.

OK. But to a freshman/rookie/intern, unless told otherwise, the imperative solution would be the only solution. And imperative code has conditionals, and many other goodies...

  • In SQL, we can simulate conditionals with CASE. For example, raising the salary by a factor of 1.1 or 1.2 based on the category of the employee:

UPDATE Employees
SET salary = salary *
    CASE categoryId
        WHEN 1 THEN 1.1
        WHEN 2 THEN 1.2
        ELSE 1.0 -- without an ELSE, CASE returns NULL and would wipe the salary
    END

  • But what if it's impossible to do in one query, or better yet, you want to avoid making the query overly complex?
    Then we use temporary tables to hold the intermediate results. But each step of the processing is done on the whole set, not row by row. If some step is impossible, or too hard to implement with set operations, then only that step is done row by row; the remaining steps stay set-based (see the sketch below).
    If some task is complex (reading, parsing, validating and loading a file into the DB), or the business allows for it, it can even be done in a permanent table using some sort of state identifier (step 1 is read, or approved...).
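
A rough sketch of that staged, set-based processing (the ImportedOrders, Orders and #staging names are invented for illustration):

-- step 1: load the raw rows into a working table, in one set operation
select customerCode, orderDate, amount,
       cast(0 as bit) as isValid
into #staging
from ImportedOrders

-- step 2: validate the whole set at once, flagging the good rows
update #staging
set isValid = 1
where amount > 0 and customerCode is not null

-- step 3: load only the valid rows, again as one set
insert into Orders (customerCode, orderDate, amount)
select customerCode, orderDate, amount
from #staging
where isValid = 1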

  • But what if I have to insert into other tables? Well, INSERT is also set-based, so instead of INSERT (...) VALUES (...), use INSERT (...) SELECT ...
    There's another problem, though, to which I don't have a satisfactory answer: a master/detail relationship where the master uses identity values. In that case, apart from switching the identity property on and off and computing the max value, it's difficult to insert a batch of these relations using set-based processing. Other possibilities are triggers or, in SQL 2005, the OUTPUT clause (sketched below), but they all seem awkward and fragile.
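
A minimal sketch of the OUTPUT-clause option (SQL 2005; it assumes the master rows carry a natural key, here customerCode, that also exists in the staging data - all table names are invented):

declare @map table (masterId int, customerCode varchar(20))

-- insert the masters in one set, capturing the generated identity values
insert into Customers (customerCode, name)
output inserted.customerId, inserted.customerCode
    into @map (masterId, customerCode)
select customerCode, name
from StagingCustomers

-- join the captured identities back to insert the details, also in one set
insert into Orders (customerId, orderDate, amount)
select m.masterId, s.orderDate, s.amount
from StagingOrders s
    inner join @map m on m.customerCode = s.customerCode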

  • Another problem is generating a number or some kind of ordering within the set. For that, the numbers table from my last post is the answer. There are variations and simplifications: if all you need is a sequential number for the data (to order it, or to differentiate it - very handy for removing complete duplicates), a temporary table with an identity column might suffice. This is also the principle behind most solutions that paginate result sets (joining the table with itself using some computed number and filtering on it); see the sketch below.
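
For example, a minimal pagination sketch using an identity column on a temporary table (the Products table and page size are just for illustration; in SQL 2005, ROW_NUMBER() is an alternative):

-- number the rows once, in the desired order
create table #numbered
(
    rowNum int identity(1, 1) primary key,
    productId int,
    productName varchar(100)
)

insert into #numbered (productId, productName)
select productId, productName
from Products
order by productName

-- fetch page 3, with 20 rows per page
select productId, productName
from #numbered
where rowNum between 41 and 60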

Well, I must be forgetting a lot more, but as this is already too long, I'll end it here.

Until next time...