Tuesday, August 21, 2007

Naming Objects In Your Database

This is something I posted first in a SQL Server forum. Here it is again, touched up a bit.



So you want to name the objects in your database: your tables, your views, your functions, and anything else. And there are a lot of people out there who name them with what the object type is. Maybe even with more information. For example:

    tblOrders
    Table_Orders
    sp_OrderSend (NEVER prefix SPs with "sp_" because the server will look for a system SP first!)
    Proc_OrderSend
    OrderStatusFunc
    fn_OrderStatus
    fn_t_OrderStatus
and even worse:

    joe_sp_OrderSend
    mike_sp_OrderCancel
    david_fn_OrderStatus
Then there are column names:
    ID
    ProductID
    keyProductID
    pkeyProductID
    pkProductID
    pkProductIDCode
    frnkeyProductID
    fkProductID
So does this stuff help at all? Is it useful to put the type or relationship in the name of objects? Can you leave the name off the ID column in your tables? Should you put more information in there? What does it matter, anyway, as these are mere naming conventions, right? Just pick something, and stick with it!

I agree that consistency is important. But consistently helpful is a far improvement over consistently something-less-than-that. Consistently unnecessary is extra work and frustration. Consistently interfering impairs your ability and speed. Consistently mediocre is not the greatest.

Overall, I think it is best to take a minimalistic approach. I hope my reasons make sense to you, but whatever you do pick, you should have clear and compelling reasons for what you do. Notice the compelling part. Your default should be to avoid clutter and only add things if there is a materially demonstrable reason to violate that rule.

First of all, much of my advice is meaningless if you're using a GUI where one performs the less taxing task of recognition instead of the recall required to write queries in text. If you are writing queries in text, more power to you, and if you keep doing it I predict you'll eventually see the sense of my recommendations.

But if you're using a GUI to write queries you'll only reach a certain level of ability, never stepping into the next level of professional skill and seasoned experience in the SQL realm. I haven't used a GUI to write a serious query in a very long time (I might have started a couple in an Access database but even then switched to SQL view). I recently helped someone who posted his query that was obviously built in a GUI and it had about 10 left joins. I seriously doubt they all needed to be left joins, which means he was killing performance for no reason.

So if you are writing queries in a text editor, here are some thoughts for you:

• Lose the underscores. They clutter up the screen. They take the shift key to type and also your pinky finger which has to move two rows up from the home row. And using CamelCase is just as clear and saves space and that pinky traversing. And if you're really feeling lazy you don't even have to hit the shift key.

order_id
customer_name
device_code_reason

OrderID
CustomerName
DeviceCodeReason

Look at it this way: if you can train your eyes to scan either kind of naming convention just as efficiently, why not use the one that involves less typing and takes less space on the screen?

• The prefix/suffix naming convention is not always understood or followed. In one shop I worked in they named tables with prefixes: tbl for base tables, tlk for lookup tables, and tin for "intersect" tables. But no one really knew what they were for sure. "tin" tables were supposed to be transactional tables that had constantly changing data such as orders and transactions. "tbl" tables were supposed to be things like customers and products. But the problem was that many tables didn't fit a clear definition, and people were careless, or came along years after the creator of the database (who yes used a GUI database creation program), and they got named wrong. So now after all that silliness and extra typing, there was a "tin" table that was really a "tbl" table and a "tlk" table that was really a "tin" table after some changes, and ... so on and so forth. Just leave off that junk.

• You'll eventually become completely sick of typing the extra characters over and over again for NO GOOD REASON. That is, if you care about speed. If you don't care about speed, type your prefixes merrily. If you do get tired of them, though, you'll have these prefixes so wrapped up in your code and database that ripping them out becomes impossible. All those extra letters will come to be so much useless froofroo to you. At the same shop as tbl/tin/tlk they really named columns like keyOrderID and frnkeyOrderID. They added nine characters of typing to EVERY SINGLE JOIN I ever had to write. After a year of working there, the pain was no less. It took me about 2 weeks to learn the primary and foreign keys of that database. The other fifty weeks I wished the database designer had left well enough alone.

• Naming is something that is so much more important than most people seem to think. What does it matter what we call things, as long as we're "consistent," right? I disagree. My experience is that the naming in a database the single most important thing that structures how programmers relate to the data and that determines their final speed and facility working with it in the long run.

Mislabeling things is like putting up signs at the cliff edge that say "keep going" and "this is the right way" when it's not. Label things appropriately.

• Scanning and comprehension speed suffer when extra things are tacked on. When you look at a stored procedure or query, if you yourself weren't just working on it yesterday, then you have to digest it to understand it. The more prefix_blob_description_gorp you have in there, the harder it is to see what's going on. The assumption here is, you know your database like the back of your hand. Will that extra stuff speed you up now or slow you down? Who should you code for, the least common denominator or a reasonable level of facility? In fact, this brings up another important point:

• You do not want to give that new developer, who is unfamiliar with your database, a prematurely quick sense of competence and ability in your database. By doing this, you increase the chances that he or she will make some serious mistake. This is the opposite mistake of mislabeling. Now the path over the edge of the cliff is littered with so many friendly and distracting signals that the poor new person gets the same wrong signal to keep going when the correct response ought to be, "wait, I'm not sure I am doing the right thing." I'm all for helping new developers get going as fast as possible. But they ought to do this by studying the schema and practicing with it. Don't make it so apparently easy (but not necessarily materially easy) that even an ignorant person will believe he knows all about your database.

• The information is unnecessary and ultimately makes developers remember more stuff, not less. It sounds at first like a great way to expose information about the tables and objects. But the fact is that the period of time where a new developer isn't familiar with the database is actually very short. So for the benefit of two months of instant answers to the beginner, everyone else in the shop is suffering through unnecessary baggage.

• Don't label things with each developer's name. Why would you do that? Do all the developers write different versions of the same SPs? Are they not production SPs but simply a sort of learning environment where each can write whatever he wants so it's important to keep the names straight? In that case, why do the objects have the same owners? Just use a different owner name and don't put your name in the object. But wait, why are they even in the same database? Why does it possibly matter who wrote something in the function of the database as a whole? Things should be named according to what they are or do, not who worked on it first or last.

If I was offered a consulting job on a database that had every stored procedure and object using different names based on who had written them, I would refuse the work for fear of going absolutely nuts. If I was forced by my situation to take the work, I would every day curse the people who decided that putting their names on their stuff would somehow have any kind of long-term value and I would quit the moment I found something else. Long after Shannon and Victor and Felix are gone from the company, poor developers are toiling away trying to remember if that was a felix function or a victor view or perhaps it was a shannon function and a felix view! GAHHH!

• Spend careful and deliberate time naming things. I often use a thesaurus when designing a database. I have at times spent (cumulatively over the project) hours working out the correct naming of things. But when I was done, WOW how things flowed. I've worked with databases where the detail table wasn't, and the parent table wasn't, and the order table wasn't, and the Determination column wasn't (it should have been called DeterminationMethod, the actual determination was called something else entirely). Every single one of those is a roadblock to proper usage, even to the developer who's been working with that database for years. Make it intuitive instead of notated.

The way I do it is to never add anything to table names, and to use carefully chosen words for my functions and SPs that make it clear what they are. For example, all my SPs are either SubjectVerb or VerbSubject, consistently throughout the database (still experimenting with which I like better--I'm leaning toward SubjectVerb). Functions have words that make it clear what they do. NumberToHex is a function. OrderNumberToOrderID is a function. ProductDetailCount is a scalar function, ProductDetail is a table, and ProductCreate is a stored procedure. Develop your own pattern. I can go through my own code so fast when I've been careful like this. The meaning nearly leaps out at me.

• If you absolutely must use a prefix or a suffix (I lean toward suffixes myself because I like things to group by subject instead of function) why not use a single character instead of three? Tables can still do without any fluff at all. Then your other objects are still distinguishable:

OrderNumber_OrderIDF

(Before you criticize why one might have order numbers and order IDs, consider a system that has to handle orders from disparate sources which use overlapping order number schemes. So you either have to have a composite key--source,ordernumber--or use a global and internal order ID that can be unique per order.)

And think about that suffix part. In what possible situation would you see all your objects mixed together, and you need to ordered them by type but don't have a column that specifies type? This is the only way I can think of that a prefix with the type might be helpful. If it was actually needed at all because the objects aren't named intuitively.

• The object type can change. What if you decide to change the structure of one of your tables, and to support your older versions of software, you build the new table with a new name and convert your old table to a view? Now you have tbl_Whatever that's a view. Whoops. And what if you need to use that really important sp, and you name SPs with their creator (shudder), only you can't remember who wrote it? Now you have to look it up. What a time waster. What if you can't remember if it's an inline table function or a table-variable function? More wheels spinning. If these seem farfetched, they are not. I have worked with databases that had "tbl" objects that were views and believe me it was a nightmare remembering which was which. Who cares what it is! Views and tables are used in exactly the same way so why call them something different? If you're worried about the execution plan and query performance (and you ought to be) then you're already at a level that demands familiarity with the type of objects you're working with. If you absolutely need to notate that something is a view, put it in comments.

• Think of it this way: you're trying to build a race car--a sleek, elegant, efficient thing that can go as fast as possible and be interfaced with well. What good is a racecar that can theoretically go 200mph if the steering wheel has sharp spikes all over it and the driver has to spend extra time avoiding them? He won't be spending as much time at top speed as he ought to be.

• Name a column the same thing everywhere. Don't call it different things in different tables. Especially don't just call it "ID." Call it BlahID everywhere, including the Blah table that it originates from.

Calling all identity columns ID is certainly a kind of standardization. But it's a standardization of seriously doubtful value that makes different columns in different tables have the same name! This is now the opposite error: conflating things that one ought to be able to tell apart. If you've been trained to differentiate each thing by putting the type of it in the name, why now depart radically from that philosophy by calling all the IDs the same thing? From the beginner/new developer perspective, it seems to make things easier in some ways--no looking up column names when selecting, right? But it's not easier, it's harder. If you ever get the experience of doing it both ways in a large and busy production database (that is, ID vs. TableNameID), I suspect you'll eventually agree (unless you've become dogmatic about it by then and don't really care about the actual function or efficency any more).

I prefer to standardize things in this way: If there is an ID column, it shares the same name as the table, + ID. Now there's no confusion about what the name is. Now if you have a query with many joins and you need to use ID you're not getting confused.

• Another reason to avoid columns named "ID" is that you will eventually put the wrong alias on some join. You may not even notice because the query parses fine and you save your SP and off you go, but whoops, you got an effective cartesian product because your join condition referenced the wrong ID. Or just the wrong resultset. What a confusing mess. Much better, when you accidentally put D.ProductID instead of P.ProductID, for the compiler to complain that the table aliased by D has no such column. The alternate "D.ID" though would have compiled just fine and given the wrong results farther down the path, maybe even after release. It's much better to catch errors early.

• Selecting is not where the problem is. Yes, it's shorter to say SELECT ID FROM Table. Fewer characters. But the real problem is in JOINs. The problem is in old resultsets you saved in Excel that you don't know what table they came from because they so unhelpfully say ID. You don't know what some error message means that a user reported to you because all he remembered was the main column that the constraint violated and he says "there was a foreign key constraint in column ID!" Great. That's every table in your database.

2 comments:

Anonymous said...

Is there a Reader's Digest version of this diatribe? Geez!

Erik Eckhardt said...

Nope.

Scared to leave your name? Coward.

All you took away from my post (which no one forced you to read) is memory of a "diatribe?"

I can't force you to drink knowledge.