Friday, November 09, 2007

SQL Tricks/Patterns 1 - Numbers Table

One pattern that I've found very useful (I think the first time I saw it was on SQL Server Central) is a Numbers table.

So what is a numbers table? It's a table pre-loaded with numbers :)
Literally 1, 2, 3, 4, 5, ..., 100,000 (the upper limit depends on the maximum estimated size of the results).
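
If you don't already have one, here is a minimal sketch of one way to create and fill such a table (the table and column names, Numbers and number, are my assumption, chosen to match the query further down; a set-based fill would be faster, but a simple loop keeps the idea obvious):

-- Minimal sketch: create and fill a Numbers table with 1 to 100,000.
-- Table/column names are assumptions that match the report query below.
create table Numbers (number int not null primary key)

declare @i int
set @i = 1
while @i <= 100000
begin
    insert into Numbers (number) values (@i)
    set @i = @i + 1
end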

Ok. So how can it help you? How can it be used? It has many uses, the most common being reporting sheets.

Suppose you have to make a report showing all the orders by day of the month (or year). But with a business twist (a requirement): you must show all dates, even if there is no order for that day. Assuming the Northwind Orders table:

OrderId         Date
10270           01/08/1996
10271           01/08/1996
10272           02/08/1996
10273           05/08/1996
10274           06/08/1996
...

You want a report that shows the number of orders for every day:

Date                        #Orders
01/08/1996        2
02/08/1996        1
03/08/1996        0
04/08/1996        0
05/08/1996        1
06/08/1996        1

So a simple GROUP BY on the order date doesn't cut it (there are no orders at all for some dates).

 

Enter the Numbers table. In this case, what we want is a table with the dates we want to report (the month, the year), left joined with the number of orders per day. But we can generalize it: instead of a table of dates, we can use a Numbers table. Something like:

 

declare @date datetime
set @date = '1996-07-31'

select
    dateadd(day, N.number, @date) as Date,
    isnull(O.NumOrders, 0) as NumOrders
from
    Numbers N
    left outer join (
        select OrderDate, COUNT(*) as NumOrders
        from Orders
        group by OrderDate
    ) O on dateadd(day, N.number, @date) = O.OrderDate
where N.number <= 31
order by Date

And voilà, the report we want comes out. Note also that with this approach it's easy to change the granularity of the report (by hour, by month, by year). The number column (of the Numbers table) is added as days to the initial date of the report, which gives us a table with one month of dates (31 days).
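
For example, a monthly version of the same report could look something like this (a sketch under the same assumptions as above - a Numbers table starting at 1 and the Northwind Orders table; the start date is arbitrary):

declare @start datetime
set @start = '1995-12-01'

select
    dateadd(month, N.number, @start) as MonthStart,
    isnull(O.NumOrders, 0) as NumOrders
from
    Numbers N
    left outer join (
        -- truncate each OrderDate to the first day of its month
        select dateadd(month, datediff(month, 0, OrderDate), 0) as MonthStart,
               COUNT(*) as NumOrders
        from Orders
        group by dateadd(month, datediff(month, 0, OrderDate), 0)
    ) O on dateadd(month, N.number, @start) = O.MonthStart
where N.number <= 12
order by MonthStart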

If you want to see some other examples (some esoteric), see also:

The Joy of Numbers

The Helper Table Workbench

 


Thursday, November 08, 2007

VB Linq Cookbook

The VB Team has a great series named Linq Cookbook, in which they show some nice examples of Linq usage in VB.

Entries so far:

Linq Cookbook, Recipe 1: Change the font for all labels on a Windows Form

Linq Cookbook, Recipe 2: Find all capitalized words in a phrase and sort by length (then alphabetically)

Linq Cookbook, Recipe 3: Find all the prime numbers in a given range

Linq Cookbook, Recipe 4: Find all complex types in a given assembly

Linq Cookbook, Recipe 5: Concatenating the selected strings from a CheckedListBox

Linq Cookbook, Recipe 6: Your first Linq application using Northwind

Linq Cookbook, Recipe 7: Selecting pages of data from Northwind

Linq Cookbook, Recipe 8: Querying XML using Linq

Linq Cookbook, Recipe 9: Dynamic sort order

Linq Cookbook, Recipe 10: Pre-compiling queries for performance

And apparently VB's support for Linq is superior to C#'s. I had already read about it (VisualBasic Envy), and the XML syntax is much terser and cleaner. Just compare:

Dim itemList1 = 
      From item In rss.<rss>.<channel>.<item> _
          Where item.<desc>.Value.Contains("LINQ") Or _
               item.<title>.Value.Contains("LINQ")

http://blogs.msdn.com/vbteam/archive/2007/10/01/linq-cookbook-recipe-8-querying-xml-using-linq.aspx

to

var query = from c in xElement.Descendants("book")
     where double.Parse(c.Element("price").Value) < 10
     select new {
            author = c.Element("author").Value,
            title =   c.Element("title").Value
    };

http://blog.adrianroman.ro/post/C-30-Querying-XML-in-C-with-LINQ-to-XML.aspx

 

Update:

Just found out an example of Linq usage inside Excel (with VSTO). The video is available at: How Do I: Use LINQ Against Excel Ranges?

Looking at the example, two things are missing (that could be implemented):

  • Turning an excel range into a list of objects (better yet, mapping the columns into fields of a supplied class)
  • "Dumping" the results starting on a specified cell 

Other than that, just another great example of Linq usage.

 


Friday, November 02, 2007

ASP.NET Configuration - "Unable to read the project. The device is not ready"

ASP.NET configuration is something of a hidden and ancient art. Recently, after setting up my new machine, and while switching projects (reconfiguring the IIS home directory), one of my web projects refused to open from within Visual Studio (mind you, this is Visual Studio 2003 and all of the projects are still ASP.NET 1.1).

The site was accessible from IE, and worked fine.

The error message isn't very helpful: "Unable to read the project webxxx. The device is not ready". As the machine was new, I thought I had forgotten some of the steps while configuring the solution. So I rechecked the entire process:

  • Validate that the directory configured in IIS is the correct one (very easy to get wrong when switching between multiple solutions, each with a web project)
  • Validate that the site is configured for ASP.NET 1.1
  • Validate the authentication for the virtual directory (Integrated Windows authentication)
  • Give the appropriate permissions to the ASP.NET user in the file system (in despair I even tried giving full permissions to Everyone, with no luck)
  • Remove the project from the solution and re-add it from the web
  • Validate the site URL configuration in the webinfo file (local configuration)

So I resorted to ignorance and re-installed ASP.NET (aspnet_regiis.exe -i).

All to no avail. The web project refused to load. Googling turned up few answers (I found some forums, but they either had no answers or were in a language unknown to me).

Until I found this: http://www.velocityreviews.com/forums/t112064-unable-to-read-the-project-file.html

which resolved my problem. It states the problem and the solution (removing the hidden .suo file), which I should have tried, as it's one of the things I do when the settings and the synchronization with VSS start to get all messed up.

After that, you have to reconfigure the settings for the web project, so it's best to:

  • Remove all the files from the local directory mapped in IIS (and the directory itself)
  • Accept the setting in VS (which should keep your original directory and not get you into the _N suffix hell; I have seen developers get to directory _23 in some projects)

                        Thursday, November 01, 2007

                        Programming Languages and People

Jeff Atwood posted a thought-provoking (and controversial) post: Embracing Languages Inside Languages. In it he argues against fluent interfaces (APIs that read like English), specifically two examples about SQL and regexes. I agree that those examples add complexity and verbosity, and that trying to hide one's ignorance behind an API is not a solution (one of the commenters suggests replacing "a = (b + 1) * a ^ 2" with "a.valueOf(b.plus(1).times(a.pow(2))").

But I also see the power in fluent interfaces, especially in Domain Driven Design. With a DSL defined on a specific domain, the use of a good fluent interface could allow business experts to understand code (and ease the translation from the business domain to code). Redefining two clearly established DSLs - SQL and regular expressions - with a more verbose syntax, with new quirks, seems foolish. The stated advantage of being fully OO is just people trying to cover up the fact that they don't fully understand the syntax or the semantics of the language. The fact that it's more verbose doesn't imply that it's more readable (to me it's quite the opposite, as I prefer terseness to verbosity). And if you don't understand a subject, instead of covering it up, admit that you are a terrible programmer like me, go out, buy a book (try the Head First books, as they are really good for learning new and difficult topics) and master the topic.

People embrace ASP.NET (or JSP) programming without question, seeing its advantages and conciseness over hand-generating HTML. The same happened with high-level languages versus assembly, and more recently with VM languages (C# and Java) versus native languages (C, C++).

So I really don't understand people's aversion to SQL. It's one of the oldest, most popular and most widely used DSLs. Yet it scares so many people that it is hidden or abstracted away. That way, its true power is never realized (especially the set-oriented paradigm, as opposed to row-based processing).

It must be a human thing, like the ones I read about in the Dinosaur Programmers Know More Than You blog post. As we stick around and watch technology waves (I've been watching professionally for almost 10 years) we get to know "what programmers forget, screw up, hope won't happen, and ignore" because that's what we did. And we become "Dinosaur Programmers", because we see each new technology as it is (to us), not as it is marketed. And we see it transform from the hype of the day ("XML can be used everywhere, for everything" is one of my favorites) into something usable, stable and mature. To end, I must quote the great ending of the post, with which I deeply agree (emphasis added by me):

                        Every new technology sounds great, and is capable of great things when it's optimally applied. Which it almost never is. And nothing brings out the imperfections of a new technology like real world usage, in a mission critical environment, with money on the line.


                        So you'll forgive me when I don't get excited about your new framework/language/architecture/process that's going to change everything/usher in a new paradigm/launch the next Google. As long as people are writing code, and assembling component frameworks, and modeling the business processes, they're going to continue to forget, screw up, hope for the best, and ignore the vague and uncertain parts.


                        While a new technology might eliminate a whole species of software development errors, you'll just discover, and now have the opportunity to explore, a whole new jungle of failure.
                        Dinosaur programmers already know this, it's our home turf.

                        Wednesday, October 31, 2007

                        Terrible Programmer

My name is André and I am a terrible programmer (Confessions of a Terrible Programmer)...
I think we all are (maybe not Dijkstra, Knuth, Wirth and a few others, who think through their programs mathematically and prove them correct: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1009.html).

                        So I try to mitigate my human factor by:

                        • testing;
                        • reviewing my code carefully;
• automating tasks that I'll have to do more than a couple of times (tests, gathering information for monthly reports, ...);
• investigating new tools and techniques that allow me to do less work, or work with better quality;

And it's the last point that I think holds the most promise. Testing and reviewing code are both double checks (most of the time by the same person) that the code is correct, but they are also subject to error.

Automating tasks also buys us more confidence (especially if the automation is tested and used for a while), but it's sometimes hard to justify. On a recent project that I manage, I invested a couple of days developing some Office automation that saves Excel reports received by e-mail, uploads them to a database and imports them into a Word table. But I knew the manual task would take me half a day every month for one year, so I got a saving of about four days (more if the contract is renewed), and confidence in the output (at least in what depends on me). Another automation that is a must (ok, I still don't implement it) is build automation. It just eliminates all those problems related to builds (multiple web.configs for various environments, flags - Release and authentication, conditional code, ...).

                        But automating tasks like code reviews or quality assurance is more problematic. That's why I'm longing for static analysis tools powerful enough to catch most problems (code coupling, unnecessary complexity, concurrency problems, finding and eliminating code duplication) and easily extensible so that I can easily implement new rules.

I already posted about some tools (NDepend, nStatic - not released yet, FxCop, Simian), but I still think most have limitations and don't address many problems. Maybe .NET 3.5 (and the ability to create expression trees) gives way to a new generation of tools that are more powerful and don't treat code as text files but as ASTs. That way it should become easier to find duplicate code (independent of variable names), inject faults into code (fault simulation), analyze and compute call graphs based on the arguments of functions (and the opposite: given an exception state, what conditions and call graph could lead to it), and many others that I can't even think of.

                        Blog Voices

                        Congratulations to my friend Mario Romano for reaching 100 posts in October. The grand total of my posts from Nov/2006 doesn't reach 50, so it's an impressive figure.

That got me thinking about what I value in blogs, like the update frequency and the voice, and made me evaluate which blogs I read most and which are my biggest influences.

So, here are the bloggers that I can't miss and whose posts I value the most:

• Joel on Software is one of the expert voices that has been around the longest. His experience shows through in his posts, and his opinions on Human Resources (hiring developers), Software Development (process, UI), Project Management and Economics (buy vs. build) are truly outstanding;
• Coding Horror is another great blog, with great opinions on Programming, Software Development, Project Management, and some off-topics that appeal to many programmers, and it keeps an independent voice (sometimes a bit controversial);
• DevCatharsis, from my friend Mario Romano, is a very good all-rounder with the same main interests that I have right now: Architecture, Functional Programming and the Software Development Process;

                        Some more focusing on a specific area:

• On SQL, there's Kimberly L. Tripp and her great posts on index tuning and general SQL tuning. She has many code samples and comparative performance tests;
• Paul Graham (Unofficial RSS Feed) has some great insights on Lisp and recently a focus on startups;
• On Functional Programming, with a twist and focus for imperative developers, I really like defmacro.org (great articles on FP for the rest of us) and Did it with .NET (with a focus on C# 3.0 and recently F#). See also The Mechanical Bride (C# 3.0, Linq and lots of => lambdas);
• On Performance, Rico Mariani's Performance Tidbits is a must, although it's been somewhat quiet since he became Chief Architect of Visual Studio (a huge job indeed). It has great analysis on performance, from a very high level right down to the bit;
• On Microsoft, don't miss Mini-Microsoft to get a feeling for Microsoft internals, Human Resources, shares and profits;
• Creating Passionate Users from Kathy Sierra, which is shut down due to some creeps who harassed her, has many good points on user focus and learning (she's one of the Head First series editors);
• On a fun perspective (or maybe not, as it reflects our reality), I really like Worse Than Failure and Dilbert (Ok, it's not a blog);
• There are also some other authors, like Martin Fowler and Bruce Eckel, whom I prefer in book format, and Ted Neward, whom I prefer in article format rather than blog format;

                        Ok, the list is getting long but I try to keep up to date on these. So what blogs am I missing and why?

                        What's on your blog short list? If someone reads this, post your list and the reasons why it's on your list.

                        Tuesday, October 23, 2007

                        Software Estimation

So why are software estimates so difficult? I don’t know, but between technical problems, little details that take forever, customers changing requirements and conflicting requirements, estimates are normally way below the real time it takes to do something (a rule of thumb is to double your estimate, just to be safe).

Steve McConnell just wrote a book (Software Estimation: Demystifying the Black Art), which I recently bought (but still haven’t read - I’m planning to read it soon), that tries to ease this problem and shed some light on the black art. But what can you say when even someone as knowledgeable as Steve McConnell misses an estimate by 100% when building a fort house (http://blogs.construx.com/blogs/stevemcc/archive/2007/09/23/building-a-fort-lessons-in-software-estimation.aspx). Well, it was just a small construction project, one that most people wouldn’t even try to estimate, just plug along.


                        But there are great lessons to be learned, that apply to software projects:

                        • Numerous unplanned problems collectively added up
                        • Underestimation of unfamiliar tasks
                        • Not decomposing big tasks into smaller subtasks
                        • Using overly round time units
                        • Substituting a target for an estimate
                        • Sweeping numerous little tasks under the estimation rug
                        • Never creating a real estimate
                        And some differences to SW projects:
                        • No way to compromise quality for the sake of schedule
                        • Schedule overrun was free
                        • The estimation error didn't really matter, because the project would be done regardless of what the estimate turned out to be

Jeff Atwood analyzes it from a different perspective (http://www.codinghorror.com/blog/archives/000960.html), and concludes that it’s very different (and requires a different skill set and discipline) to build a doghouse versus a skyscraper. And the same applies to software (building a toy app is very different from building an enterprise application). That is a statement most programmers would agree with, but something very hard to explain to a customer (who thinks software is easily changeable without causing structural problems or bugs).


                        More recently (http://www.codinghorror.com/blog/archives/000981.html) Jeff took another look at estimation and current practices (planning poker sounds great) and reviews the FogBugz tool from Joel Spolsky (from the Joel on Software fame). Apparently the tool has evolved from a bug tracker to a project management tool.


The tool uses a method called evidence based scheduling (a Monte Carlo simulation based on the historical estimates of the developers), which predicts when the software will ship (not an exact date, but the probability that the software will ship by a given date).


You can also see the estimation probabilities for the developers.


So how do you know if the tool is any good? Well, I haven’t tried it, but coming from Joel Spolsky's company it has a good chance of being high quality (attention to detail, good UI). But you can see a video of the new version of the tool (FogBugz 6.0 Movie Demo), try it out, or read a book about it by Mike Gunderloy: Painless Project Management with FogBugz.

                        Monday, October 22, 2007

                        SQL 2008

Well, it looks like SQL 2008 is just around the corner, but it seems rather quiet for a new release. Maybe it's a small release, or MS isn't using all its marketing power.

Anyway, as the end of support for SQL 2000 comes just 2 months after, it looks like a compelling reason to upgrade :)

Among the features for development, the ones I like most are:

• Auditing data changes (new T-SQL AUDIT support that allows centralizing auditing)
• Integration with Linq, Entity Framework, .NET 3.0
• MERGE statement (allows update, insert and delete in a single statement; see the sketch after this list)
• Some general nice additions (that I won't use much, but look good to have), like date and time data types, file streams (using the file system to store files pointed to by the database), table valued parameters and Spatial Data
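
As a rough idea of what MERGE looks like, here is a sketch (the Customers/CustomersStaging tables and their columns are made up for illustration, and I haven't run this against a SQL 2008 build yet):

-- Sketch: update, insert and delete in one statement
merge Customers as target
using CustomersStaging as source
    on target.CustomerId = source.CustomerId
when matched then
    update set target.Name = source.Name
when not matched by target then
    insert (CustomerId, Name) values (source.CustomerId, source.Name)
when not matched by source then
    delete;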

                        In the administration area, there are also some highlights:

• Data compression (those who have worked with medium-sized databases know what a hassle it is to manage 100 GB files when there's less than 0.5 TB of storage)
                        • Data encryption (columns, database)
                        • Resource Governor (limit resource usage by users or groups)
• Hot-plug CPUs!

To end the long post, some links about all the new features in more detail:

                        SQL Server 2008 Improvement Pillars

                        What's new in SQL 2008 Part 1
                        What’s new in SQL 2008 Part 2
                        What’s New in SQL Server 2008 Part 3

                        Saturday, October 20, 2007

                        Programming Language Humor

Great post about programming language (religious) wars, with exceptional humor: Language Trolling Post You'll Ever Need To Read (Hopefully). And don't miss the comments, as some of them are of the same quality :)

In the end, programming languages are just tools, and the companies behind them their supporters. I never quite got why some people take criticism of a programming language or a company personally...

                        Friday, October 19, 2007

                        Source Code

Scott Hanselman has a new column/category (Hanselman - Weekly Source Code), based on the "be a better developer" tag started by Justice Gray (Be a Better Developer in 6 Months).

Just as Justice Gray committed to reading one book per week, Scott Hanselman started reading good source code (one of the best ways to become a better programmer, with which I deeply agree - Reading to Be a Better Developer - The Coding4Fun DevKit).
In one of the recent posts in this category, I discovered this post: LukeH and a fully LINQified RayTracer. It's a LINQified C# ray tracer (in one Linq expression that extends over 60 lines!!):

var pixelsQuery =
    from y in Enumerable.Range(0, screenHeight)
    let recenterY = -(y - (screenHeight / 2.0)) / (2.0 * screenHeight)
    select from x in Enumerable.Range(0, screenWidth)

    ...

    select new { X = x, Y = y, Color = traceRay(new TraceRayArgs(ray, scene, 0)) };

                        He advises against this type of programming (big bang methods/expressions), but it's one heck of an example for Linq.

In the same vein, he links to a list of programming guiding principles by Patrick Cauldwell (This I Believe - The Developer Edition), which starts with general principles and has good advice on Unit Tests, Test coverage, Buy vs Build, Dependencies, Contracts (code), Visibility, Authentication, Tracing, Error handling and The definition of done (which means different things to every developer that I know).

Reading all this stuff about code (and the code itself), I started thinking about what the qualities of good source code are. So, to every beginner (and as a reminder to all senior developers, since practice and repetition make perfect): go [re-]read Code Complete by Steve McConnell...

To me, the first things that come to mind are:

• stop and think for five minutes before choosing the name of a method or variable;
• think early on about the design of the code (method or class), its complexity, performance and all the other important characteristics;
• DRY (Don't Repeat Yourself - don't copy-paste code, documentation, use cases, ...)
• KISS (Keep It Simple... - the design, the methods)
                        • ...

                        Thursday, October 04, 2007

                        Functional Programming Examples?

Recently, Mario Romano and I gave a presentation about language trends, focusing on Linq and the .NET framework. It's clearly a trend in .NET, and there are some great blogs about FP in .NET:

                        Wes Dyer

                        Wesner Moise nStatic

                        Mechanical Bride

                        FP Tutorial

This one focuses more on Lisp, Haskell and other FP languages, but gives a hand to us imperative programmers:

                        http://www.defmacro.org/rss/news.xml

                         

My main problem is applying this to practical, day-to-day examples. I've seen many examples of Fibonacci, the Towers of Hanoi and others that are great to get your hands on code, and I can relate to the advantages of FP (composability, restriction of side effects, conciseness/terseness?).

But I think that to convert more people to FP (or at least get them to look), there have to be some compelling examples that clearly show the advantages. Maybe related to web programming (continuations), or meta-programming (get rid of code generators/copy-paste and, with the aid of expression trees and higher-order functions, define the basic skeleton of methods: try, validate, call some function passed in, log the exception, commit, rollback).

                         

                        Any good examples/ideas?

                        .NET Framework Source Code

It's all over the place, but here it goes again: Microsoft is making the source of much of the .NET libraries available with Visual Studio 2008. It will allow you to step into framework code while debugging!

                        http://blogs.msdn.com/brada/archive/2007/10/03/net-framework-source-code-release.aspx

                        http://blogs.tedneward.com/2007/10/03/Reports+Of+Snowballs+In+Hell.aspx

                        Wednesday, September 19, 2007

                        The Past and the Future

                        Just finished reading two great articles about the past, history, and the future.

The first article is from Joel Spolsky (Strategy Letter VI), on how history tends to repeat itself. Starting there, and with some extrapolation from performance considerations in the past (with Excel) and the present (with Ajax and JavaScript), some patterns become apparent (although the actual names could change). But the real kicker is the comparison between mainframes, CICS and the evolution to microcomputers and Windows, and the current state of the Web and its problems (interoperability between Ajax applications). Very good and thought provoking. It just reminds me of how circular history is: we started with mainframes and dumb terminals, then moved to client PC applications (Office, Access, fat clients), then back to big servers and thin clients (with the Web), and with smart clients we are returning the processing power to the client.

                         

The other great post (more of a link to actual MSDN content - Finally some news on PLINQ) reveals the future after .NET 3.5 and MS's solutions to the problem of more cores at the same clock speed.

The MSDN article shows the primitives for concurrency and how easily concurrency can be "added" to a program. The PTasks article shows its application to Linq (especially Linq to SQL).

                        Just checked the MSDN October cover, and it's mainly about concurrency/threads and multiple cores. I'll put it on my short list of "must read"s.


                        Friday, September 14, 2007

                        Yet another PK debate

Recently the article at http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx got me thinking again about DB design, especially the PK dilemma (natural or surrogate).

So I googled around to see the current discussion and came up with some links:

                        Articles defending surrogate keys:
                        http://www.4guysfromrolla.com/webtech/110806-1.shtml
                        http://searchsqlserver.techtarget.com/general/0,295582,sid87_gci1156604,00.html
                        http://r937.com/20020620.html
                        http://www.bcarter.com/intsurr1.htm

                        http://www.aspfaq.com/show.asp?id=2504
                        http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html
                        Articles defending natural keys:
                        http://www.sqlteam.com/article/identity-and-primary-keys

                        http://www.solidqualitylearning.com/blogs/Dejan/archive/2005/02/24/434.aspx

                        http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/6d61dbf80d6f0fb6?hl=en&lr=&ie=UTF-8&oe=UTF-8&rnum=14

But returning to the initial article: it argues that in relation tables the PK should be natural (a composite of the PKs of the related tables). Naturally it seems right, and it aligns well with relational theory. Unfortunately, business clients don't align so well ;)

The initial problem is modeling a CustomerProducts relation table, for which the following two solutions are proposed:

                        Create table CustomerProducts
                        (
                            CustomerID int references Customers(CustomerID) not null,
                            ProductID int references Products(ProductID) not null,
                            OrderLimit int not null,
                            Primary key (CustomerID, ProductID)
                        )

                        Create table CustomerProducts
                        (
                            Customer_ProductID int identity primary key,
                            CustomerID int references Customers(CustomerID) not null,
                            ProductID int references Products(ProductID) not null,
                            OrderLimit int not null
                        )

                        The exchange of ideas in the comments was good, and it got me thinking how I do design.

                        So here are my thoughts (published also in the comments of the initial article).

One of the downsides of modeling based on natural keys is that it's averse to change.

Of course, even with an identity column the model and the application must be validated, and possibly changed (including migration), to accommodate a change. But the change doesn't ripple throughout the model as it does with natural keys.

                        I honestly can't see the big difference between entity tables and relation tables. I could argue (and many DB purists would) that the PK in an entity table should be a natural key (probably composite).

                        And it appears you can relate to the problems it introduces:

• very difficult to change (I can only imagine what the migration of a DB where the ISBN was used as a PK must have been like, or using the bank branch as part of the PK of bank cards - imagine changing the account);
• no significant advantage over the more flexible surrogate model.

To me the same argument holds for relation tables. It eases future changes. The disadvantage is extra complexity, but the end result is more flexible.

I haven't seen a solid example of the supposed advantages (in practice) of the composite key approach. The extra space isn't as much as stated, considering that the other indexes include the PK. So the wider the PK, the bigger every index becomes.

                        I haven't seen any examples or papers that demonstrate in practice that it's more performant to use composite keys (in general).

Regarding the purity or correctness of the model, I can only say that it's a model... It can be abused if one tries to be too flexible, but sometimes it's better to be flexible and pragmatic than rigid and theoretical.

                        This comment generated a response by Jeff Smith, in which he defends his article conclusions.

                         

                        Until any of this is refuted, we've clearly established that, for relation tables:
                        * using a natural PK uses less space, less resources, since if you use an identity, you MUST also introduce a unique constraint on your true key columns
                        * using a natural PK clarifies your data model and makes it easier to read, understand and work with
                        * using a natural PK simplifies your SQL statements
                        * using a natural PK enforces data integrity without the need for an additional unique constraint
                        * using a natural PK results in true normalization, since your attributes are fully dependent on your primary key and not just a meaningless, random number
                        * using an identity PK actually complicates things like editing a relation, if you are relying only on the PK, since knowing that you are editing or viewing "CustomerProductID 3922" means nothing, while knowing that you are editing or vewing the relation between "CustomerID 23 and ProductID 4" tells you everything, simplifying your code and your logic.
                        I'd write more, but again, I feel as though I am just repeating the same points over and over and no one is specifically addressing them ....
                        Let's talk actuals, here, not theory ... can you please give me a simple example where using an identity as a PK for a relation table, instead of a composite PK, makes future changes "easier"?

                         

But this got me thinking even more about the merits and shortcomings of each design technique, about primary keys and their relation to natural keys, business keys/rules, etc.

So my response headed toward evolutionary maintenance, change and flexibility:

                        I just can't run away from a good discussion. :)

Suppose the client comes in and wants to introduce some "minor" changes/improvements.
First, he wants to keep the history of changes to the CustomerProducts table. So you create a CustomerProductsHistory table with the timestamp of the operation.
But he also reports that queries to find associations within an OrderLimit range are running very slowly. So you add an index on OrderLimit.
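
Something along these lines (a sketch only, assuming the surrogate key version of CustomerProducts above; the history table layout and the index name are my own illustration):

-- History of changes to CustomerProducts (illustrative layout)
Create table CustomerProductsHistory
(
    CustomerProductsHistoryID int identity primary key,
    Customer_ProductID int not null,
    CustomerID int not null,
    ProductID int not null,
    OrderLimit int not null,
    OperationDate datetime not null,
    Operation char(1) not null -- 'I'nsert, 'U'pdate, 'D'elete
)

-- Index to speed up range queries on OrderLimit
Create index IX_CustomerProducts_OrderLimit on CustomerProducts (OrderLimit)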

These "changes" compound, and the table becomes larger and wider, with more indexes. Child tables are added and, because we shouldn't use a surrogate as the PK, we keep the parent PK and add some other attribute. Using the IDs from Customers and Products as the PK, the effect is lessened (the size of the PK is just double that of a surrogate key; with natural keys the effect is clearer), but eventually your indexes start to suffer because the PK is larger, and so do all the child tables' PKs and indexes.

                        Future changes are easier because you get more flexibility in maintenance and with the model.

                        • In maintenance: if a mistake was detected and you MUST correct an association (the CustomerId or ProductId), you don’t have to mess with the PK. Because you didn’t expose the PK of the table to the customer (the surrogate key), you can change the data easily.
                        • In evolutions: if there’s a need to change the CustomerProducts table (you want to turn it into a temporal table, adding a start date and end date), it’s much easier if there’s a surrogate key, especially if the table has child tables.

I think most pragmatic people, having been through similar situations, can see the benefits and flexibility of the surrogate key model. Besides that, there are also the other advantages mentioned, like:

• Consistency in the model (easing mapping tools and development);
• Comprehension and usage of the model (this table's PK is which 4 columns? And it joins that other table on which 3 columns?), because the PK is just TableNameId and joins to the FK named TableNameId.

The last point I don't understand. You say CustomerProductID 3922 means nothing. Yet CustomerId 23 has meaning? Why? Only if you look it up in the Customers table... I just hope you don't show CustomerId 23 in your UI, because then you have just exposed the PK, and you can't ever change it easily.

I'll concede the argument that it's not completely 3NF. But if you go that way, why not make 5NF mandatory for all tables?

                        Just my 0.02 €

                        And just to clear Jeff Smith's doubts:

                        When you create your tables, and you are adding your identity primary keys, do you also create additional unique constraints to all of your tables to ensure you have data integrity?

Got me thinking about the design of the models I currently adhere to, and why I like them:

Yes, of course.
But maybe I wasn't explicit in my last comment (or there's some ambiguity in my terms). To me, TableNameId is the PK of the table. It's "private" and not exposed to the business. If the natural key (defined by the business) is just a number, I normally use TableNameNumber (e.g. CustomerNumber, ProductNumber). This column is visible to the business and follows the business rules (no gaps, prefixes, suffixes, etc.).
This way the model is very consistent, predictable, flexible and usable. It may have some overhead (not much), and it may not be completely normalized, but to me it's a good compromise.
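
In table form, the convention I describe would look something like this (a sketch only; the constraint names and the CustomerNumber column are just illustrations of the naming scheme, and Products is assumed to exist as in the earlier examples):

Create table Customers
(
    CustomerId int identity primary key,     -- surrogate PK, never exposed to the business
    CustomerNumber varchar(20) not null,     -- business key, follows the business rules
    Name varchar(100) not null,
    constraint UQ_Customers_CustomerNumber unique (CustomerNumber)
)

Create table CustomerProducts
(
    Customer_ProductID int identity primary key,
    CustomerID int not null references Customers(CustomerId),
    ProductID int not null references Products(ProductID),
    OrderLimit int not null,
    constraint UQ_CustomerProducts_CustomerID_ProductID unique (CustomerID, ProductID)
)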


                        Wednesday, June 06, 2007

                        Trends in Programming Languages

It's been a while since my last post, due to work pressure and little spare time. But between some code analysis for the maintenance projects I'm involved in, the preparation of a Linq presentation, and some new blogs I discovered, I've been thinking about programming language evolution.

It's not new that imperative languages are integrating functional characteristics. The novelty is the big push by Microsoft, mainly Anders Hejlsberg and the .NET team, in that direction. Right now we have XAML, WPF and WCF, with Linq, Silverlight, the Entity Framework and VS Orcas just ahead. I still remember the backlash Java received because it incorporated garbage collection and a virtual machine that ran bytecode. Nowadays the main platforms (C# and Java) include these characteristics, and the new contenders (Ruby, Python) incorporate them too.

Recently I found a great blog that explains the characteristics and advantages of functional programming in a way that our imperative brains can relate to. In the aptly named article Functional Programming For The Rest of Us, the author presents the following advantages:

• Greater expressiveness, which equates to fewer lines of code and fewer bugs;
• Meta-programming in the same language, dismissing the need to learn a new language and displacing many code generators;
• Code as data (in Lisp, it's possible to treat code as data, by quoting), opening the door to code manipulation, analysis and generation;
• Data as code (in Lisp, a data structure can be "executed");
• Composability, or the possibility of changing/defining an application's behaviour by the simple composition of functions;
• No SIDE EFFECTS
  • Easier testing, as the result of a function depends only on its arguments and produces no side effects. This way a test with the same arguments always has the same behaviour;
  • Debugging is simpler, as there are no dependencies on global state;
  • Concurrency (with the advent of multi-cores, this is extremely important) scales much better, because locks aren't necessary when there is no shared state;
  • Hot code deployment: without the need to persist and import state (it is passed on the stack in the arguments), it becomes possible to substitute code without stopping the system;
  • Machine-assisted proofs and optimizations: without state and side effects, it's possible to reason about the functions and demonstrate some of their characteristics.

From my recollection of this blog and some other blogs on functional programming, these are the characteristics attributed to functional programming languages:

                        • Anonymous Functions (lambda calculus)
                        • Type Inference
• Higher Order Functions (functions as first-class language constructs; they can receive and apply functions passed as arguments)
• Currying (adaptation to an interface, by fixing one or more arguments)
  • int pow(int i, int j);                    // the original function
  • Func<int, int> square = i => pow(i, 2);   // partial application: fix j = 2
                        • Lazy Evaluation allows:
                          • New types of optimization (complex calculations that are not used aren't computed)
                          • Creation of new abstract control structures
                          • Infinite data structures
• Continuations (allow storing state and defining the return point or the function to execute)
• Pattern Matching (definition of functions based on types, values and expressions)
• Closures (functions that can reference the enclosing environment, typically defined in the body of another function, allowing the association of an intermediate set of data with the function: http://en.wikipedia.org/wiki/Closure_%28computer_science%29)

                        Why does it matter or In what way does it affect me?

Due to the natural evolution of micro-processors, the future doesn't hold increases in clock speed but the integration of multiple cores on the same die. Because of this, it becomes very important not to limit concurrency, which is simpler in functional programming (because of the lack of shared state, and the expressiveness).

Also, the evolution in the .NET space points in the same direction (a transition from imperative to declarative), with the imminent arrival of Linq. The trend toward a more declarative style, with the incorporation of many characteristics of functional programming, is clear in the .NET evolution.

                        .NET evolution

                        C# 1.0

                        • Function Pointers (delegate)

                        C# 2.0

                        • Limited Continuations (yield)
                        • Closures (anonymous methods)

                        C# 3.0

• Expressiveness
                          • Query expressions: from x in …
                          • Automatic Properties (get; set;)
                          • Partial Methods (“lightweight” event handling)
                          • Extension Methods: public static void Print(this string s)
                        • Type Inference for local variables (var)
                        • Lambda Expressions/Anonymous functions (x => x * x)
                        • Anonymous Types (new {c.name, c.Phone})
• Code as Data (Expression Trees)
  • Func<int, bool> nonExprLambda = x => (x & 1) == 0;
    • Initializes a delegate: public delegate TResult Func<T, TResult>(T arg);
  • Expression<Func<int, bool>> exprLambda = x => (x & 1) == 0;
    • Is equivalent to:
      ParameterExpression xParam = Expression.Parameter(typeof(int), "x");
      Expression<Func<int, bool>> exprLambda = Expression.Lambda<Func<int, bool>>(
          Expression.Equal(
              Expression.And(xParam, Expression.Constant(1)),
              Expression.Constant(0)),
          xParam);

                        References

                        Functional Programming explained

                        http://www.defmacro.org/ramblings/fp.html

                        http://www.defmacro.org/ramblings/lisp.html

MSDN article on C# 3.0

                        http://msdn.microsoft.com/msdnmag/issues/07/06/CSharp30/default.aspx?loc=en

                        Expression Trees

                        http://www.interact-sw.co.uk/iangblog/2005/09/30/expressiontrees

                        http://spellcoder.com/blogs/bashmohandes/archive/2006/06/12/143.aspx

                        Closures

                        http://pluralsight.com/blogs/dbox/archive/2005/04/27/7780.aspx

                        http://www.claassen.net/geek/blog/2006/08/c-closures.html

                        Blogs

                        http://blogs.msdn.com/wesdyer/rss.xml

                        http://themechanicalbride.blogspot.com/feeds/posts/default

                        http://wesnerm.blogs.com/net_undocumented/index.rdf

                        Wednesday, January 24, 2007

                        .NET Tools

Recently I've been looking at tools that could alleviate the burden on lead programmers reviewing code produced by junior programmers, measuring its quality and assisting in finding errors. But to me (and others), the current tools are not up to the task. It seems that the tools I could use with C a decade ago (while studying at university) were more powerful than the current set of tools available for .NET or Java.

Only recently did I find some tools that can help with this task (and the most promising one isn't available yet) and that are affordable or even free (I don't know if many of the commercial tools are better).

The first tool, known by many programmers but (in my opinion) used by few (me included), is FxCop by Microsoft. It is a code analysis tool that does some style checking and validates that software conforms to a set of rules, signaling non-conformities as defects. In my experience, if the tool is used from the start of a project, it may succeed. On a project already built, it's difficult to use (on a recent project that I work on, it found over 1000 errors; I didn't check whether they were false positives or errors that could be dismissed as minor inconsistencies).

The next one is NDepend. It allows the identification of problem areas in the overall architecture of a program (solution), identifying big/complex methods and non-cohesive classes, and it helps in understanding the program at a higher level. It allows the definition of rules and alerts with a SQL-like language. The site has some videos showing its usage.
I haven't tried it yet on a complex project, but if it easily allows flagging calls from one layer to the wrong layer (UI to data, or business to UI), it seems promising. Another feature that I'm eager to try is the generation of metrics (# lines, # methods, # classes, cyclomatic complexity, % comments).

A little on the side are "projects" like AspectC# and other languages that allow the separation of concerns and the "injection" of code, which could be used and parameterized to catch some of these errors. But it's a hard task just to get to a point that provides an interesting return on the investment made (in setup and parameterization time).

The last tool that I read about (this one isn't out yet) is nStatic. Now if this tool does half of what's shown and talked about on Wesner Moise's blog, it will be a life saver. Apparently it interprets the program not as lines of code in a text file (like most IDEs do today), but in a functional way in internal memory (maybe those ASTs - abstract syntax trees - that we learned about in compiler classes). With this representation, and using an AI workbench, it computes the paths in the program that generate errors (and the assumptions behind those errors).

This reminds me of the first time I heard of an internal representation that wasn't the text source file. I think it was Visual Age for Java, which stored the tree representing the program in a DB. This approach (VA failed maybe for being ahead of its time, and for some shortcomings in speed and error tolerance) has enormous potential, as shown by the current projects by JetBrains with the Meta Programming System and by Intentional Software (I haven't tried either, and I don't know if Intentional Software has a tool available for testing). Just imagine the refactorings possible this way. And if it's possible to plug in to this representation, it would be much easier to find coupling mistakes (between layers), copy-pasted code, etc.

But back to nStatic: one of the features mentioned was finding the possible exceptions for a method and its assumptions (for the exception). That will assist in finding many subtle errors. I'm just curious to try the tool (I'm thinking of applying for the beta), and have some questions/requests:

• will it allow detecting similar or duplicate code (I found Simian, but I haven't tried it on my complex project, which I know has much similar code, to figure out whether the match is by line or by abstract representation of the code)?
• will it be extensible, or allow access to the internal representation of the code, or provide an API?
• can the scan focus on one particular method (and be completely thorough with that method)?
• will it allow fine-grained control over loops (limit the # of times through a particular loop by annotation or comment) and other constructs?
• will it allow exporting graphs and assumptions to an XML/text format?
• could it be used to track dependencies between assemblies/classes?
• could it generate some metrics, and integrate some features from other tools, meaning it would be the only code analysis tool needed? :)