Many-to-many relationships, join tables, and Fluent

If you’re looking for a quick solution to why your join table’s rows aren’t being deleted, scroll to the end of this post.

For a long time I’ve been opposed to ORMs because of their “black box” aspect.  Now that Entity Framework is open source and distributed on NuGet, I’ve lightened my stance on ORMs because of their potential to save an enormous amount of time for the easy stuff.

However, once in a while, you pay for that convenience when the “weird ones” show up: manifestations of behavior that are almost impossible to explain.

Yesterday I encountered one such incident involving this post’s namesake.

I’ll show you the table involved first.

[Schema diagram: the Groups table and the GroupChildren join table]

This is a very common paradigm in software, as it represents an internal node in a graph structure.  A group can have zero or more children and a group can also be the child of zero or more parents.  This is accomplished by a join table, which I called GroupChildren.

The columns ParentGroupId and ChildGroupId have foreign key constraints back to the Groups table.  Thus, you can’t delete a row from the Groups table without first removing its relationships from GroupChildren.

Now, Entity Framework abstracts these facts away from you.  Whether you’re using model first or code first, you can hide this table from your object model entirely.  You don’t need to create a GroupChild entity to correspond to rows in the GroupChildren table.  You can create navigation properties on your Group entity that represent the group’s parents and children, which will automatically reference the join table to find the entities with which to populate those properties.  This is accomplished by code like this in Fluent:


// Inside an EntityTypeConfiguration<Group>: map the self-referencing
// many-to-many relationship through the GroupChildren join table.
this.HasMany(group => group.Groups)
    .WithMany(group => group.Parents)
    .Map(m =>
    {
        m.ToTable("GroupChildren");
        m.MapLeftKey("ParentGroupId");
        m.MapRightKey("ChildGroupId");
    });

This is of course in the context of an instance of EntityTypeConfiguration<Group> which I bind to my ModelBuilder during the context’s OnModelCreating override.
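For reference, here is a minimal sketch of how that wiring might look; the GroupConfiguration and MyContext names are placeholders of mine, not the actual code from this project.

using System.Data.Entity;
using System.Data.Entity.ModelConfiguration;

// Sketch: a configuration class for the Group entity and its registration.
public class GroupConfiguration : EntityTypeConfiguration<Group>
{
    public GroupConfiguration()
    {
        HasKey(group => group.Id);

        HasMany(group => group.Groups)
            .WithMany(group => group.Parents)
            .Map(m =>
            {
                m.ToTable("GroupChildren");
                m.MapLeftKey("ParentGroupId");
                m.MapRightKey("ChildGroupId");
            });
    }
}

public class MyContext : DbContext
{
    public DbSet<Group> Groups { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Bind the configuration during OnModelCreating, as described above.
        modelBuilder.Configurations.Add(new GroupConfiguration());
        base.OnModelCreating(modelBuilder);
    }
}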

The above Fluent declaration creates the many-to-many relationship described above.  The call to Map is critical for wiring up the join table: it tells Entity Framework to use my existing GroupChildren table and its ParentGroupId/ChildGroupId columns, rather than the conventionally named join table and key columns it would otherwise generate on its own.  Either way, a join table is unavoidable for a relationship like this.  Suppose I have a single group that has 1 million children.  Using the join table, I would expect 1 million narrow rows in GroupChildren.  Trying to store the relationship on the Group rows themselves would mean duplicating all of a group’s other columns 1 million times, which makes no sense.

So, I began testing some simple inserts into this object model.  Entity Framework, as expected, created the appropriate rows in my join table and populated my Groups and Parents navigation properties correctly when loading Group entities from the database.  Wow, this is great!  Classic example of a big time saving win.

However, things began to unravel in a pretty big way when I started trying to delete entities from these tables.

Remember what I said about the foreign key constraints?  You can’t delete a row from Groups until you’ve deleted all of the rows in GroupChildren which reference that group’s primary key (its Id column).  This is a database-level constraint which Entity Framework establishes for you if you let the context create the database automatically.

Because of this, you are relying entirely on Entity Framework to do the right thing here, since you do not have an entity for your join table.  Outside of writing SQL explicitly, you have no way to delete those join rows yourself – and dropping to raw SQL is a giant KISS violation when using EF, unless you’re writing specific optimization queries in known areas where EF falls short, such as bulk updates or deletes.  It is true that deleting a group is probably faster with a quick SQL command than by relying on EF to load navigation properties and figure out what to do, but that optimization should only be made if you find an actual bottleneck in your application.  TL;DR: don’t optimize EF with manual SQL until you identify a real performance problem; otherwise you might as well just write ADO.NET.

The problem that I ran into is that Entity Framework was not doing the right thing all the time.  In computer science programs at universities, a recurring theme in many courses is the concept of determinism.  Deterministic behavior is critical to correct programs, because a deterministic program is one that behaves the exact same way given the exact same input every time.  In this case, Entity Framework was non-deterministically failing while trying to remove Group entities.

Here’s where it gets weird.  As part of my unit testing, I create a graph structure which mimics the business data this graph is going to represent; in this case, it’s actually a tree, not a general graph.  I wrote some simple code in the TestInitialize method of my unit test class to essentially wipe the database by deleting the entities one by one.  I chose this route instead of the easier SQL-script approach (“DELETE FROM <table>”) as a way to further exercise my Entity Framework code.
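The cleanup code was roughly this shape (a sketch; MyContext is a placeholder for the actual context class):

[TestInitialize]
public void CleanDatabase()
{
    // Wipe the graph through EF itself, one entity at a time, instead of
    // running a raw DELETE script, so the mapping code gets exercised too.
    using (var context = new MyContext())
    {
        foreach (var group in context.Groups.ToList())
        {
            context.Groups.Remove(group);
        }

        context.SaveChanges();
    }
}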

Each time I would create this graph structure – and the structure was identical in each test case – I would find that the following test run would fail during the delete on the Groups set.  One row would simply not delete, and the error was a foreign key constraint violation.  Basically, Entity Framework seemed to arbitrarily forget that there’s a join table with a foreign key relationship.  And what’s weird is that it would be a different row in different runs.  I’d try it once and it would fail on some random internal node.  I’d manually blow away the database tables in SSMS, recreate the structure, and the next time it would fail on a random leaf node.  Sometimes it wouldn’t fail at all.

Non-deterministic bugs are the worst.

In this case, Entity Framework violated the principle of least surprise by surprising me.

Knowing nothing else about this system, it’s fair to make these assumptions:

1) By declaring the many-to-many relationship and declaring the join table, Entity Framework is aware of the possibility of foreign key violations when deleting from the Groups table;

2) Entity Framework will always run a query like “DELETE FROM [GroupChildren] WHERE [ParentGroupId] = @groupToDelete OR [ChildGroupId] = @groupToDelete” before ever running a query like “DELETE FROM [Groups] WHERE [Id] = @groupToDelete”.

The problem is that #2 is not true.

The reason for this is complex and long.  Mostly, it’s because Entity Framework is only a viable technology if it performs almost as well as, or better than, a human being writing the SQL by hand.  It’s a corollary to the old argument of managed versus native code (.NET vs. C/C++): the C++ programmers claimed that .NET would waste tons of memory, wouldn’t work for production applications, and would lower the quality of programming because it’s easier to understand and use.  The same arguments can be made against Entity Framework, and unless it can prove, as .NET has, that none of them is necessarily true, it will never be adopted.

The long story short is that Entity Framework gives you very fine control over what to load and when, but it also makes its own decisions based on, well, something.  In my case, I found that the entities that simply wouldn’t delete were also not loading their navigation properties (e.g., Groups and Parents were both null).

This was curious since other entities did not appear to have this issue.  I tried calling Include to explicitly load the navigation properties, but I had the same problem; even when it was quite clear that Entity Framework understood that a particular Group had either parents, children, or both, it would not bother to delete those rows from the join table.  It would only execute the delete command against the Groups table, which would of course fail with the foreign key constraint error.
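For the curious, the failing delete looked approximately like this (a reconstruction, not the exact code; groupId stands in for the row being removed):

// Even with both navigation properties explicitly loaded, EF would issue
// only the DELETE against Groups and trip the foreign key constraint.
using (var context = new MyContext())
{
    var group = context.Groups
        .Include(g => g.Groups)
        .Include(g => g.Parents)
        .Single(g => g.Id == groupId);

    context.Groups.Remove(group);
    context.SaveChanges();   // FK violation against GroupChildren
}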

I simply could not get Entity Framework to delete my join table rows.

I involved another senior developer at my organization in this problem, because I was ready to pull my hair out.  Among us, he has the most experience with Entity Framework, and I hoped he had seen this before.  As it turns out, he hadn’t, and he was just as baffled as I was.  We walked through the symptoms and could not explain them.

In the end, we narrowed it down to two possibilities.

The first is that, unlike the majority of developers, I implement Equals and GetHashCode for interface objects, a.k.a. DTOs.  I do this because it makes unit testing, particularly of complex hierarchical data structures, vastly easier.  My Group entity is only one part of a data structure that contains a large number of properties, collections, and so on.  When I want to confirm that the structure I get out of the database is exactly the same as the structure I put in, if every object in my domain has an Equals method that is correct, then I can simply use Assert.AreEqual.

The problem is that in order to implement true equality, particularly for objects where reference equality isn’t enough, you need to compare collections for value equality; this means that your objects are fundamentally mutable.  Mutable objects pose a major problem for GetHashCode, because one of the rules of GetHashCode is that it shouldn’t change.  Another requirement is that two objects which compare equal via Equals() must return the same hash code; in practice, that means the hash code should be derived from exactly the values you take into account when comparing two objects for equality.

The reason this is problematic for Entity Framework is that by default it uses HashSet<T> for navigation properties.  If your GetHashCode method returns a value based on the values of properties, then it will change as those properties are loaded.  Entity Framework will put your entity into its HashSet navigation property, then populate the entity.  Your GetHashCode() method will then return a different value later on, and the entity will appear not to exist in the collection.
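Here is a contrived, self-contained illustration of the failure mode (not my actual entity; the hash is deliberately derived from a mutable property):

using System;
using System.Collections.Generic;

public class Group
{
    public string Name { get; set; }

    // Deliberately bad: the hash code changes whenever Name changes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public static class HashCodeDemo
{
    public static void Main()
    {
        var set = new HashSet<Group>();
        var group = new Group();                 // Name is still null here
        set.Add(group);                          // stored under hash code 0

        group.Name = "Accounting";               // the entity gets populated later

        Console.WriteLine(set.Contains(group));  // False: the hash changed, so
                                                 // the set can't find the object
    }
}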

If Entity Framework doesn’t think your entity exists in a navigation property, it won’t understand the relationship.

To get around this initial problem, I replaced ICollection with IList for all of my navigation properties.  That prevented the weird HashSet behavior, but it still didn’t work, because Entity Framework uses complicated change-tracking mechanisms to determine what it needs to do when you call SaveChanges().  If you do anything out of the ordinary that might confuse its ability to determine whether an object exists in a collection, such as overriding Equals or GetHashCode, you are jeopardizing EF’s ability to figure out what it needs to remove from which tables.

In the end, after much toil and trouble, I did two things to make this problem go away.

1) Update to Entity Framework 6.  At the time of writing it’s still in pre-release, but it is available on NuGet if you stop filtering by stable builds only.  I’m not sure this is strictly required to solve the problem, as the same technique works in EF 5.0.

2) Initialize all of your navigation properties according to this article.  My colleague found this.
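I won’t reproduce the article here, but the pattern amounts to something like this (a sketch with illustrative members):

// Sketch: give every navigation property a collection up front, so change
// tracking and relationship fix-up always have something to work with.
public class Group
{
    public Group()
    {
        Groups = new List<Group>();
        Parents = new List<Group>();
    }

    public int Id { get; set; }
    public string Name { get; set; }

    public virtual IList<Group> Groups { get; set; }
    public virtual IList<Group> Parents { get; set; }
}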

According to another one of my colleagues, I’m the only person on earth who overrides Equals() for reference types, which explains why Entity Framework waited 6 releases to solve this problem.

Anyway, this problem took me way, way too long to solve, and I’m glad it’s gone away.  For now.  Good luck!

Public transactional services are a security risk

November 26, 2012

Sorry it’s been a while since I’ve posted.  I’ve been super busy writing code.

Recently, I developed and deployed a RESTful web service that supports an OEM who was manually copying data out of their homegrown application into our public web application.  They can now use our service within their application.  Instead of storing data locally and then copying to our systems by hand, they just store data in our system.  Oh, the magic of web services!

In the past, most of the service work I did was as components of a service-oriented architecture, and as such, these services were almost always exposed only over named pipes or TCP channels behind the DMZ.  They were mostly SOAP, mostly app-to-app, and not open for public consumption.

So, as such, they were transactional.  If you are not familiar with transactional programming, I strongly suggest reading everything you can about it.  Transactions prevent a failure in one operation of a sequence from leaving you in an unpredicted and unsupported state.  For example, suppose you must insert several related rows into several related tables, and these inserts happen not in a single SQL statement but as the result of a series of business-layer method calls.  Suppose one row insertion fails.  Shouldn’t the other rows be automatically deleted?

As it turns out, that’s exactly what a transaction does for you.  When you wrap your sequence of method calls in a transaction (finished by calling Complete() on the TransactionScope – you’ll understand when you understand transactions in .NET), if any of them fail by throwing an exception, a rollback occurs.  You can write your own custom rollback, but if you’re doing nothing more complicated than updating some rows in a database, you don’t have to – SQL Server will automatically do it for you!

How, you may ask?  Well, simple.  SQL Server is transaction-aware.  If you begin a transaction in .NET client code, any subsequent SQL operations will be included in that transaction on the SQL server itself.  SQL Server will remember what it changed, and if it is told to rollback because an exception is thrown in your .NET client code (or, because you forgot to call Commit on your transaction), it will revert the database to the state it was in before the transaction began.
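In client code this looks roughly like the following sketch (System.Transactions plus plain ADO.NET; the table names, connectionString, and customerId are made up for illustration):

// Both inserts succeed together or roll back together. Requires a reference
// to System.Transactions and using System.Data.SqlClient / System.Transactions.
using (var scope = new TransactionScope())
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();   // the connection enlists in the ambient transaction

    using (var insertOrder = new SqlCommand(
        "INSERT INTO Orders (CustomerId) VALUES (@id)", connection))
    {
        insertOrder.Parameters.AddWithValue("@id", customerId);
        insertOrder.ExecuteNonQuery();
    }

    using (var insertAudit = new SqlCommand(
        "INSERT INTO OrderAudit (CustomerId) VALUES (@id)", connection))
    {
        insertAudit.Parameters.AddWithValue("@id", customerId);
        insertAudit.ExecuteNonQuery();
    }

    // If an exception is thrown before this line, the scope is disposed
    // without completing and SQL Server rolls both inserts back.
    scope.Complete();
}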

You may wonder how this works when SQL Server is another process, probably on another machine.  You have a service running locally called the Distributed Transaction Coordinator (MSDTC); it might be stopped, which will become apparent when your transactional code fails with a message to that effect.  I won’t go into the technical details of how this actually works because I don’t know them myself.  Suffice it to say, this service is responsible for announcing to all the other transaction-aware processes what is going on with a transaction.  If one of the actors involved in the transaction demands a rollback (for any reason), all the other actors will receive this notification and will also agree to roll back their work.

As it turns out, WCF is (optionally) transaction-aware.  If one of your clients begins a transaction on their machine, that transaction will be propagated to your service.  If your service uses SQL Server, your service will propagate the client’s transaction to the SQL server, too.

If your service is transactional (by using OperationBehavior attributes to allow transactions), then a client can call several transactional service methods, all of which must succeed before any change is made permanent.  If the client throws an exception for any reason before calling Commit locally, or if any of the actors in the transaction demand a rollback, it will be as though nothing happened at all, and your state remains consistent.
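The WCF side of that looks roughly like this sketch (the contract and method names are invented; the binding also has to enable transaction flow):

// Sketch of a transaction-aware WCF operation. TransactionFlow lets the
// client's transaction flow in; TransactionScopeRequired enlists the
// service's own database work in that (possibly distributed) transaction.
// (using System.ServiceModel assumed)
[ServiceContract]
public interface IOrderService
{
    [OperationContract]
    [TransactionFlow(TransactionFlowOption.Allowed)]
    void CreateOrder(int customerId);
}

public class OrderService : IOrderService
{
    [OperationBehavior(TransactionScopeRequired = true,
                       TransactionAutoComplete = true)]
    public void CreateOrder(int customerId)
    {
        // Any SQL work here is committed only if every participant in the
        // transaction completes; if anyone aborts, it all rolls back.
    }
}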

There’s a problem with this magic, though.

Generally, in order for SQL Server to guarantee state consistency, it uses locks: row locks and, more commonly (whenever inserts or deletes are involved), table locks.  If you are a guru at this type of thing, you can use hints and other mechanisms to minimize locking, but this is the default behavior for a reason.

You’ll find that if you are in the midst of a transaction which has locked a table on one thread, and another thread comes in and attempts to do anything with that table, thread #2 will wait patiently (i.e., it is blocked) until transaction #1 is done.

What if the transaction in thread #1 is poorly written or malicious?  What if the transaction simply inserts something into a table and then uses Thread.Sleep to block for a few hundred years before the client transaction is allowed to complete (either commit or roll back)?

Well, the answer is that, for the duration, that table is locked.  Any other thread trying to use that table waits forever or until it inevitably times out.

Now, SQL Server and WCF both have mechanisms in place to prevent this type of thing from happening, in the form of transaction timeouts.  SQL Server will not lock a table forever, and it will not hold open the connection on which the transaction was started forever.  WCF will time out in accordance with your configuration options.

But these settings are the kind of thing that a typical DBA and developer will overlook, because they are somewhat off the beaten path.  Even if they are relatively modest – say, 60 seconds – it’s still incredibly easy to run a denial-of-service attack with a malicious client.  60 seconds is a long time, and what’s to stop you from wrapping your 60-second intentional lock in a while(true)?  You’re doing nothing but hammering the service, locking the table until you time out, and doing it again.  Meanwhile, all of the other legitimate users of your system are completely and utterly locked out, because all of their operations are timing out while they wait for the malicious user’s table lock to expire.
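To make the attack concrete, the malicious client needs nothing more elaborate than this sketch (ServiceClient and InsertSomething are placeholder names; using System.Threading and System.Transactions assumed):

// Open a transaction, touch the table so it gets locked, then just sit on
// the lock until a timeout aborts the transaction; repeat forever.
while (true)
{
    try
    {
        using (var scope = new TransactionScope())
        using (var client = new ServiceClient())
        {
            client.InsertSomething();                // takes the table lock
            Thread.Sleep(TimeSpan.FromSeconds(60));  // hold it
            // Complete() is never called, so everything rolls back anyway...
        }
    }
    catch
    {
        // ...and when the timeout kicks us out, we just go again.
    }
}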

How can you avoid this?  By being very smart and very careful.  It is very hard to design a public transactional service that is immune to lock-related denial-of-service attacks.

How do I avoid this?

Simple – I just don’t let clients propagate their transactions to my services if those services are public-facing.

But Evan, you may be thinking, I want my services to be transactional because I don’t want my state corrupted.  That’s the point of transactions to begin with!

Very true.  You can still use transactions inside the body of your service method if you need to ensure that all of the rows get inserted properly. 

Your service methods should be state-atomic, meaning it should not be possible for a client to call any single service method and end up in an inconsistent state.  If your intention is for the client to call two methods back to back with the assumption that neither will fail (or else your state is corrupted), you need to combine those two methods into one, because clients are totally unpredictable and you should assume they are unreliable.  Even if every client writes perfect state-maintaining code according to the documentation you’ve given them, what happens when they lose power mid-execution, so that only the first part of the two-part state transition succeeds?

In addition, your service should always be reversible.  If you have a method that creates something, you should also have a method that deletes it.  If you have a method that deletes something, you should have methods that allow you to recreate the object entirely after deleting it.  If your service is implemented this way, then a client can write an entirely client-side transaction with a custom rollback routine.  Creating three objects, and one of them fails?  Call delete on the other two.  That’s the custom rollback.  The client gains transactional control (mostly) without also gaining the ability to run an intentional or accidental DoS attack on your service.
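In code, that client-side custom rollback amounts to something like this sketch (CreateWidget and DeleteWidget are placeholder reversible operations):

// Create three objects; if any creation fails, delete the ones that
// succeeded. No transaction ever flows to the service.
var created = new List<int>();
try
{
    created.Add(client.CreateWidget("first"));
    created.Add(client.CreateWidget("second"));
    created.Add(client.CreateWidget("third"));   // suppose this one throws
}
catch
{
    foreach (var id in created)
    {
        client.DeleteWidget(id);                 // the custom rollback
    }
    throw;
}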

TL;DR – don’t make public services transactional unless you know exactly what you’re doing. 

Categories: Uncategorized

Tip of the Day: InternalsVisibleTo doesn’t work?

This one will be short and sweet.

Often, I use the InternalsVisibleTo attribute to allow a test assembly to get access to the internals of the assembly it’s testing.

Unfortunately the IDE doesn’t make this easy for you.  You need to make sure you specify the assembly name exactly and you also need to supply the full public key (not just the token!).  Because of the way this mechanism works, it’s difficult for the IDE to know whether or not you’ve made a mistake.  It could issue a warning if it detects that an InternalsVisibleTo attribute is specified which names an assembly that isn’t in your solution, but I’m sure you can imagine a scenario where that is exactly what you intended.
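For reference, the attribute in AssemblyInfo.cs ends up looking something like this (the assembly name and key are placeholders; the real value is the full public key from sn -Tp, not the 8-byte token):

using System.Runtime.CompilerServices;

// The assembly name must match the test assembly exactly, and PublicKey
// must be the full public key blob, truncated here for readability.
[assembly: InternalsVisibleTo(
    "MyProduct.Tests, PublicKey=0024000004800000940000000602000000240000...")]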

This is compounded by a nasty bug.  Namely, even when you specify the parameters to InternalsVisibleTo correctly – that is, the assembly name and public key are correct – the IDE will still issue errors in your test project and will not resolve internal symbols located in the source project…

… until you unload the project and reload it.  (Or, alternatively, exit Studio entirely).

TL;DR:

When you’re absolutely sure that your InternalsVisibleTo attribute is correct but your internals are still not being resolved correctly, try unloading the project and reloading it.

Categories: C#

Tip of the Day: Null vs. DbNull

Lately, object-relational mappers (ORMs) are all the rage: Hibernate, NHibernate, and most recently Entity Framework from Microsoft.

Soon, I will dedicate a new post about why I don’t use ORMs.  The long and short of it is that for all the time they save you doing the easy stuff, you spend 3 times longer trying to figure out how to make the ORM do the hard stuff that you need it to.

So, even today in 2012, I do my database work with ADO.NET.

I also don’t use the DataSet/DataTable classes.  Again, that’s another post.  If you’re like me and prefer to use the good ole’ fashioned DbDataReader, then you may have run into this conundrum before.

When you’re trying to get a column from a row (and that’s really what DbDataReader is), you use methods like GetInt32(columnIndex), GetString(columnIndex), and so on.

The problem with these methods is that they all fail if the data is null.  So, you generally must write code like this:

string stringValue = null;
if (!row.IsDBNull(index))
{
    stringValue = row.GetString(index);
}

Which, of course, gets very old, very fast.  I use an extension method to solve this problem:

// Declared inside a static class so it can be used as an extension method.
public static TValue GetValueOrDefault<TValue>(this DbDataReader row, int ordinal, TValue defaultValue)
{
    if (row.IsDBNull(ordinal))
    {
        return defaultValue;
    }
    else
    {
        var value = (TValue)row.GetValue(ordinal);
        return value;
    }
}
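Usage then collapses to one line per column (the ordinals here are hypothetical):

var name     = row.GetValueOrDefault<string>(0, null);
var quantity = row.GetValueOrDefault(1, 0);              // TValue inferred as int
var shipped  = row.GetValueOrDefault<DateTime?>(2, null);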

You will note that you must use the method IsDBNull (or a variation thereof, such as Convert.IsDBNull, or comparing the reference with DbNull.Value).

You may have written this kind of code yourself and never paused to wonder why it is that we have two concepts of null.  Why shouldn’t GetValue(index) simply return null?

Well, the answer is pretty simple, actually.  For the sake of argument, let’s consider a very simple table in your RDBMS of choice, with a single nullable int column.  (That’s a 32 bit signed integer, generally).

Suppose the table has only one row.  So we write a query to return it, and we use ADO.NET’s ExecuteScalar() method.

In the world without DbNull, ExecuteScalar would return one of two things: it would either return the value from the database, or it would return null.

The problem is that there are actually three possibilities for our query.

The first is that our single column of our single row has an integer value in it, like 10.  When we execute ExecuteScalar on such a table, we get 10.

The second is that our single column of our single row has a null value.  When we execute ExecuteScalar on such a table, we get DbNull.Value.

The third is that our table doesn’t have any rows at all.  If that is the case, then ExecuteScalar returns null, as in .NET’s null.
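Put together, defensive ExecuteScalar code has to account for both cases, as in this sketch (command is assumed to be a SqlCommand with an open connection for the query above):

object result = command.ExecuteScalar();

if (result == null)
{
    // The query returned no rows at all.
}
else if (result == DBNull.Value)
{
    // A row came back, but the column's value is NULL in the database.
}
else
{
    var value = (int)result;   // an actual integer, e.g. 10
}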

It is easy to incorrectly mix-and-match DbNull and actual null when you’re working with ADO.NET and you have tables with nullable columns.

The easiest way to think about it is simply that a null value in a row’s column is a value – that value happens to be DbNull.  When there are actually no values (e.g., your query returns 0 rows), then you would get null results.

Note that it is not possible to get actual null when you call GetValue() on a DbDataReader.  You will always get either a value or DbNull.  The only time you can really get actual null is from ExecuteScalar.  (There are probably others, but you’ll know them when you see them as long as you keep these facts in mind.)

How to Start Running

I mean actually, physically running. No, not software (this time).

I first got into running about 5 years ago as a means to shape up for my wedding. After all, if you’re going to spend several hundred dollars for a professional photographer you should probably look good. In fact, I would guess that the majority of both men and women look the best they ever will on their wedding day(s).

Since the baby was born a little over a year ago, my recreational running had ceased more or less entirely. I probably gained 5 pounds just in the few days I was in the hospital with my wife, eating nothing but fried hospital cafeteria food. You’d think a hospital would serve healthy (and only healthy) food, but you’d be wrong. After that, our eating habits weren’t as good as they used to be – my wife did not have the energy to prepare healthy home-cooked meals every night like we used to have, so we ate more freezer food and more takeout. Healthy lunches were replaced with fast snacks. Before I knew it I had gained somewhere between 30 and 40 pounds, and my clothes were starting not to fit. I went from a 34 to a 38 waist. For perspective, the last time I wore something smaller than a 34, I was in middle school.

My unofficial New Year’s Resolution has been to put a dent in that figure literally and figuratively.

It’s important to note that I am not an athlete by any stretch of the imagination. If you, like many software developers out there, are more or less a sedentary couch potato, well, that’s what I was, and by nature, that is what I am. If it weren’t for the inevitable weight gain (and health problems that result), I would be perfectly content never to exercise. I didn’t play any sports in high school. Even when I was under 18% body fat I still weighed 30 pounds more than the average person for my height. I am not a natural runner.

But I did study anthropology in college, specifically biological anthropology. I read the papers on this: the human body evolved to run. It is not a coincidence that our bodies have more sweat glands than any other mammal, and it’s not a coincidence that we lost our body hair (which would have greatly helped us when we migrated up north). We aren’t just evolved to run, we’re evolved to run long distances efficiently. The authors of this research hypothesize that we used this ability to hunt. A gazelle can run faster than we can, but we can run fast enough for much longer. Eventually the gazelle is exhausted and cannot run any more, but we can, and we catch up. That’s when we get ‘em.

So, I figured if there’s one activity that we are all meant to do, it’s run.

When I first started running, I approached it scientifically and algorithmically, and I developed a set of rules and procedures for running. Now that I am beginning to do it again I am dusting off those same rules and they work for me now just as they did then.

Here they are:

1. Start slow.

This one sounds obvious but I personally struggled with it. Every person has a natural jogging gait that feels right for them, and it’s based mostly on the length of their legs. For me, this natural jogging pace is around 6.2 miles per hour (which is about a 9:40 mile, give or take). This may sound slow, and indeed it is slow once you’ve been running for a while, but if you haven’t run in a long time – or ever – this pace can be way too fast, especially if you’re overweight. For me, it feels unnatural to run any slower than this, but when you first get into it after a long hiatus, you really need to, because of rule #2:

2. Focus on distance, not on speed.

Remember the gazelle hunt? We’re good at running long, not fast. Even the fastest sprinters in the world only reach about 27 miles per hour, and while that’s far faster than I’ve ever run a mile, they only hold it for 100 meters. Personally, I have a tendency to do the opposite: I want to run faster rather than longer, and I have to remind myself of this rule. The reason is pretty simple: running faster does make your body work harder, in that your muscles are contracting faster and your heart beats faster, but the human body, at least my human body, tires exponentially. At my peak I could pretty handily run 6 miles at between 6.5 and 7 miles per hour, but I could only run 1 mile at 9 miles per hour before I was utterly exhausted. And, although I am not an expert on the human body, I trust that the people who designed treadmills are. When you run slow (relatively) for an hour on a treadmill, you burn well over a thousand calories (according to them). But when you run fast for 7 minutes, you burn only a couple of hundred. When I was really into it, once a week I would do “speed work” – namely, I would run a mile as fast as I could, then rest a while (walk), then run as fast as I could again until I got winded, et cetera. Unfortunately, by the time I started doing this it was a couple of weeks before the wedding, and by the time I got back from my honeymoon I had already regressed and started to run less, so I didn’t follow through with that program long enough to say whether it had any effect.

If you only have 15 minutes to work out, it’s better to run that fast mile than to run at your normal pace if weight loss is your goal, but if you really care about it, you’ll need to find the time.

3. Wear good shoes.

Most people who start running just run in whatever they have. For me, this was an old pair of sneakers that I used for home improvement projects. I used them for about 6 weeks until I realized that I kept coming home with odd foot pains and leg pains and other nonsense. I went out and bought a $70 pair of running shoes and it made all the difference in the world. I could run much longer without any kind of weird pain (I’m not talking about muscle soreness, which you should grow to love). Running shoes last between 250 and 500 miles depending on the surface you use (closer to 250 on pavement, closer to 500 on treadmills). That may seem like an enormous amount, but as a total amateur who was just doing it to get into shape, I ran close to 400 miles in 2009.

4. Listen to your body.

There’s a fine line between not pushing hard enough and pushing too hard. When I first started running and I finally made a breakthrough in distance and could actually start to put some real numbers up – 4, 5, 6 miles – I decided one day I would see how far I could go. I really wanted to go for 8 miles. So I did.

At around 6.5 I started to get a new sensation in my right ankle. Mild pain. Not enough to cause me to stop running, but it was new and I thought to myself, “well, this is a new sensation!” By 7 it was getting a little worse, and by 7.5 it was definitely noticeable. I pushed to 8. My ankle hurt for hours afterwards.

When I say my ankle, what I really mean is the muscle on the inside of my leg running from my ankle to midway up my calf, which I now know is my soleus muscle. I know this because after this 8-mile endeavor, my ankle would hurt in the same place even after running average distances (2, 3 miles). Your soleus muscle is going to make itself known when you start running. I never knew it existed until it went from barely there to very noticeable. The buildup of this muscle changed the shape of my leg when I started running.

After a week or two, my soleus would start hurting instantly and wouldn’t stop until I stopped running. It took something like 4 months before I could run even short distances again.

If I had only stopped running when it started aching at 6.5, I would not have lost 4 months of running time, but I had a goal and I wanted to achieve it.

That’s going too far.

Not going far enough is when you stop running before your cardiovascular system physically prevents you from continuing. I tend to jog or run until my body physically forces me to stop – when I am simply breathing too hard and not replenishing oxygen fast enough for my legs to keep moving. For me, if I didn’t push myself to that point, I wouldn’t progress.

5. Keep your mind on progression

The only way I can stay motivated running is by constantly setting new goals and working to achieve them. At first, my goals were all distance-related. The first day after not running for a long time (more than 6 months), I can’t do more than half a mile before becoming totally winded. Usually, by the third run, I can do a mile before I am beat. From there, progression starts to slow down. I would increase my target distance by a quarter of a mile – which at a 6-mile-per-hour pace is another 2 minutes and 30 seconds – until I could do a 5k (that’s 3.1 miles).

Once I could do a 5k I focused for a while on just repeating it. If I’m running 4 days a week, can I do that 5k all 4 days? (You may find that you can’t – don’t get discouraged!) It took me a month or two before I could consistently run a 5k at any pace. Once I did that, I began focusing on improving my speed.

The first time I ran a 5k, it took me well over 30 minutes. By the time my wedding came around, my fastest time was around 23 minutes. Still not good enough for the U.S. Marines (which was my ultimate goal), but respectable – that’s faster than 8 minutes per mile. Considering the fastest I ever ran a mile was slightly under 6 minutes (in high school, and only for that single mile), that was not bad.

The easiest way to track this progress is with a treadmill, but treadmill running has its cons for two reasons. First, it’s boring as sin. Even with music and a little built in TV screen, it’s boring. You can only stare at the same gym floor day in and day out for so long. Second, it’s deceptive.

I had no problems running 6 miles on a treadmill, but running even 3 miles on an actual road was a huge challenge. They say you’re supposed to put the treadmill incline up to 1% or 1.5% to simulate actual road conditions but that doesn’t help, and here’s why. On a treadmill, all you really need to do is lift your feet off the belt so that it doesn’t drag you to the back and throw you off. You don’t need to provide any actual forward momentum – you only need to lift your own weight vertically. But when you run on a road, you need to provide both forward and vertical motion, which greatly increases the amount of energy it requires. The other two factors that are important are wind speed (barely an issue on a treadmill unless you’re running into a very large fan) and the fact that unless you’re running on an actual track (like at a high school), you will never find a truly flat surface to run on. I find that the downhills don’t make up for the uphills, contrary to what you might believe.

Still, treadmills are great to gauge your progress because they’re constant and objective. They also force you to keep pace, which is good for endurance – it’s very easy to slow down while running on a road. The other big advantage to treadmills is that they’re easier on your body, mainly your knees. If you have access to one, they’re not a bad choice once in a while.

6. Focus on how you feel, not how you look

I tend to embarrass easily, so it wasn’t easy for me to start running in a gym or on the road. I would see other overweight middle-aged people in their sweatsuits running and think to myself, “God, they look terrible.” So naturally I was hesitant to put myself in their company for fear that other people would think that of me. This was also a factor, especially early on, that led me to run faster instead of longer – I thought I would look less lame if I were at least running fast. So many people (especially older people) shuffle along at a pace barely faster than walking. I didn’t want to be that guy.

But those are the kinds of demons that keep people fat. I would constantly remind myself that if anyone thought I looked like a goofy fat guy running, at least I was running, which is a lot more than the vast majority of overweight people do (thus, why they are overweight).

So, it took me a long time before I really stopped caring about that and just ran however fast I felt I could run and didn’t care who saw me or what they thought.

7. Don’t give in to excuses

There are a million reasons why you shouldn’t run today, and if you still can’t find one, you’ll find that you naturally invent them, even subconsciously. With running, consistency is the key. For me, if I take even a 2-week break, I lose, in terms of endurance, speed, or both, far more than I gain in 2 weeks of running. This goes back to simple facts of evolution. We did not evolve in a world where food is abundant like it is today, which means our bodies are constantly taking the path of least resistance because food might not be forthcoming. If you don’t use your muscles, your body stops maintaining them instantly, because muscles cost calories and calories are biologically precious.

If you give in to excuses, you will never be a successful runner and you will probably not achieve any real weight loss (if that’s your goal). You need to come up with a schedule and stick with it. For me, I run 3-4 times per week. Rest is important, but running is more important. If you can force yourself to run 7 days a week, you’ve already succeeded; now force yourself to rest 3 days a week.

Remember, if you don’t rest, you will hurt yourself.

That’s all for now. Good luck!

Categories: Off-topic

Tip of the Day: MissingMethodExceptions

February 27, 2012

Periodically I encounter the dreaded MissingMethodException or something similar (e.g., MethodAccessException), and for a long time, every time this happened, I would scratch my head like a caveman for 20 minutes (or longer).

How can my code compile but then the CLR can’t find my methods?

Well, in all cases – that’s right, 100% of cases – the underlying cause is that the compiler and your running process are not loading the same assembly.

For me, this is nearly always because an old copy of my assembly is in the GAC.

How does that happen? Well, generally, it’s with the gacutil command. Sometimes, my projects have to install assemblies into the GAC. For example, when you’re doing a SharePoint integration, you pretty much have to install your assemblies there.

Suppose I have project A that I released last year and I am working on project B currently. Project B contains a couple of assemblies that I used in project A, but I’ve updated them and added some new functionality. When I compile project B, it uses the new assemblies that are being built in my bin\debug directories in order to resolve symbols (like methods). But when I run project B’s unit tests, the CLR will look for assemblies in the GAC first. Usually it won’t find them.

But then I get a support call. Some customer had the nerve to try to use that product I released last year, and they have the further gall to insinuate that I might have a bug in my code. A bug? ME? Surely you jest!

But ultimately this means I dust off project A, compile it, try to reproduce the customer issue and, in the infinitesimally small probability that there is in fact a defect that needs fixing, produce a patch.

But my postbuild events in project A shove everything into the GAC so that SharePoint can find them. When I open up project B and resume work, I can compile all day long, yet when I run it, the new methods that I added since project A was released are mysteriously not there and are throwing MissingMethodExceptions.

That’s because the process is loading project A’s old version from the GAC. The easiest way to determine that this is what’s going on is to look at Visual Studio’s Output window during a debugging session. Look for your assembly name, and then look at the path it was loaded from. If it points somewhere under C:\Windows\assembly (or C:\Windows\Microsoft.NET\assembly for .NET 4), then, lo and behold, that’s the GAC.

The simple solution is to scrub those assemblies out of your GAC and those heinous exceptions should go away.

There’s another way to immunize yourself from this problem and that is to bump the assembly version of your component when you start a new project, but that has a lot of practical drawbacks. Namely, suppose you have a component called Library that you put into the “shared” part of TFS, so that solutions for different products can reference this component. When you build product A version 1.0 and it references Library version 1.0, everybody is happy. When you start building product B and decide to bump Library to version 2.0, the first time you need to patch product A (which now references Library version 2.0 because you bumped its version when you started product B), you will suddenly find that your patches don’t work, because the new patch assemblies that you build are trying to find Library 2.0, so unless you also deploy Library 2.0 with your patch, nothing works. This can get very frustrating very quickly.
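For the record, the version bump in question is nothing more exotic than the usual attributes in AssemblyInfo.cs:

using System.Reflection;

// Bumping AssemblyVersion on a strong-named, GAC-deployed library is what
// creates the patching headache described above, because references bind
// to the exact version they were compiled against.
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]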

In fact it’s a subject of a future post, which will be several thousand words.

The bottom line is: check your GAC!

Categories: C#

Tip of the Day: Self-Document with Named Parameters

February 14, 2012

In .NET 4, the compiler now supports out-of-order parameter lists when invoking a method, like this:

void F(int a, int b, int c) { ... }
F(c: 5, a: 1, b: 3);

This type of invocation is very hard to read and generally frowned upon. It would be considered poor usage because arbitrarily specifying the parameters out of order serves no actual purpose.

The reason that this feature came about is thanks to another .NET 4 feature: default parameters. Consider this declaration:

Document Open(
	Object FileName,
	bool ConfirmConversions,
	bool ReadOnly,
	bool AddToRecentFiles,
	string PasswordDocument,
	string PasswordTemplate,
	bool Revert,
	string WritePasswordDocument,
	string WritePasswordTemplate,
	DocumentFormat Format,
	Encoding Encoding,
	bool Visible,
	bool OpenAndRepair,
	DocumentDirections DocumentDirection,
	bool NoEncodingDialog,
	XTransform XMLTransform)

This is a modified version of the method used to open Word documents from the Office interop API. The original method passes all of those parameters as ref object (making it even harder to use!). Since we’re talking about default parameters here, I changed them to make more sense; ref parameters cannot have default values for (hopefully) obvious reasons.

It would be extremely tedious to have to spell out every single option explicitly every time you open a Word document. In fact, it would look like this:

var doc = Open("MyDocument.docx", true, false, true, string.Empty, string.Empty,
     false, string.Empty, string.Empty, DocumentFormat.Docx, Encoding.UTF8,
     true, true, DocumentDirections.InOut, true, null);

Imagine dealing with an invocation like that and not having IntelliSense. Often, I wonder if IntelliSense didn’t come about as a necessity because of APIs like this one. But, I digress.

One of the major flaws of an API like this one is parameters with names like “Visible”. What does that mean? Should I pass true or false? I have no idea. And with no default value for guidance, I just have to guess, or hope that the parameter is adequately documented. Unfortunately, for some of these things, nobody knows what they do.

With default parameters we could instead change the declaration like this:

Document Open(
	Object FileName,
	bool ConfirmConversions = true,
	bool ReadOnly = false,
	bool AddToRecentFiles = true,
	string PasswordDocument = "",
	string PasswordTemplate = "",
	bool Revert = true,
	string WritePasswordDocument = "",
	string WritePasswordTemplate = "",
	DocumentFormat Format = DocumentFormat.Docx,
	Encoding Encoding = null,
	bool Visible = true,
	bool OpenAndRepair = true,
	DocumentDirections DocumentDirection = DocumentDirections.InOut,
	bool NoEncodingDialog = true,
	XTransform XMLTransform = null)

Thus making a possible invocation this simple:

var doc = Open("MyDocument.docx");

With all other parameters having default values which will work 90% of the time, this API just got 1000% better. But here’s the catch, and the reason that named, out-of-order parameter lists were introduced to marry this feature: what if I still want the convenience of using default values for most of the parameters, but not all of the parameters?

Named parameters in invocations are the answer. You can do this:

var doc = Open("MyDocument.docx", ReadOnly: true, Encoding: Encoding.Unicode);

All other parameters retain their default values. Pretty awesome, huh?

As a general rule, always specify named parameters in the order they appear in the declaration. In the above example, I specified values for ReadOnly and Encoding parameters, and in the method declaration, ReadOnly comes before Encoding, so in my invocation, ReadOnly also comes before Encoding. There is no technical reason for this. The following invocation is just as legal:

var doc = Open("MyDocument.docx", Encoding: Encoding.Unicode, ReadOnly: true);

But this is confusing. Since the choice of how to order the parameters is arbitrary, the superior choice is to match their order in the method declaration.

Now, on to the subject of this post: using named parameter invocation to self document. Consider these two methods for a dictionary class:

void Add(TKey key, TValue value);
void Add(IEnumerable<KeyValuePair<TKey,TValue>> values, bool overwrite);

The first method adds a single key/value pair to the dictionary, whereas the second method adds an enumeration of KeyValuePair structures and takes a parameter called overwrite, which will cause an existing key’s value to be overwritten by the value provided for it in the first parameter.

Now consider this invocation:

var kvps = new List<KeyValuePair<string,int>>();
myDictionary.Add(kvps, true);

From the context of this method call it’s pretty easy to deduce which overload we’re calling. But context this good is very rare in actual production code. It’s more likely that kvps as declared above would come from a method parameter, or the result of another method call, or a property value. Imagine that it came from a method call. The code might be written this way:

myDictionary.Add(PoorlyNamedMethod(), true);

Given only that line, which overload of myDictionary.Add is being called? Well, you can’t be 100% sure unless you IntelliSense over it. That takes time and mouse gestures. Your time on Earth as a programmer is measured in mouse and keyboard gestures, so the fewer you spend deciphering invocations devoid of any documenting context, the more you have left for productive things like writing new code.

If you’re smarter than the average bear you may have noticed that my declarations used generic type parameters. If you knew that myDictionary was declared with a TValue type other than bool, you’d know instantly that this invocation couldn’t possibly be the first one – Add(TKey, TValue) – because we’re passing true as the second argument. But again, that’s context that you may or may not have depending on how you’re looking at this invocation.

The generic type checking that you, the human, do when reading this method invocation will also fail you if TKey and TValue are both specified as System.Object, which in dictionaries happens way more often than it should. Given TKey and TValue of System.Object, these two methods are almost ambiguous.

I say almost because the compiler will choose one or the other. It will choose the second overload, which takes an IEnumerable and a boolean, if and only if it knows for certain the type of one of the parameters. If it can’t tell for sure, it will pick the first one. Here’s an example:

object whoKnows = PoorlyNamedMethod();
object badDeclaredType = true;
myDictionary.Add(whoKnows, badDeclaredType);

The compiler will invoke the first overload, which takes a TKey and a TValue, because you’re passing two values to the method whose declared types are both System.Object. There’s a good chance that’s not what you want.

I am in the practice of writing an invocation of an overload like this:

var firstParameter = ...
var secondParameter = ...
myDictionary.Add(values: firstParameter, overwrite: secondParameter);

By adding the parameter name even though it is not necessary, I have documented this invocation in-line. Anyone who sees this code will not be confused about which overload is being invoked here, regardless of any context associated with the parameters themselves. That’s why, in the above snippet, I used ellipses on the right-hand side of the parameter assignments and used the var keyword. Even without knowing how those parameters get their values (and therefore being able to infer, or at least guess, their declared types), you can still know which method overload is being used in the invocation.

Another handy feature of this practice is that it immunizes me against the compiler silently picking the wrong overload. Imagine that firstParameter and secondParameter both resolve to a declared type of System.Object. If I didn’t name the parameters in my invocation, the compiler would automatically pick the first overload. Because I named them, the compiler will either resolve to the overload I actually meant or refuse to compile, rather than quietly calling the wrong method.

It is possible for even this to fail; if both overloads use the same two names for their two parameters, then the compiler will still pick the first overload, because it more accurately matches the declared types of the references I am passing in my invocation. However, if you’ve created two overloads that do two drastically different things, have different types, but share the same parameter names, your problems run far deeper than worrying about self-documenting your method invocations, and you should head straight back to Design 101.

There is, of course, one danger associated with this practice that should not be overlooked. By explicitly naming the parameters in invocations, you run the risk of compiler errors if you rename a method parameter in the method declaration without using Visual Studio’s refactor->rename apparatus (which does correctly update all invocations). However, I consider this to be a benefit rather than a drawback, for an important reason.

A lot of the time, renaming a parameter is a benign change, possibly one as simple as correcting a spelling mistake, and in those situations this practice can become a hassle. However, some of the time, renaming a parameter actually affects its semantics, because you’re changing what the parameter is supposed to represent. In that case, the fact that the compiler automatically flags every single one of your invocations (because the parameter you explicitly named no longer exists) helps you: it generates a to-do list of all the invocations you need to check to make sure you’re still using the method correctly.

I don’t always spell out the parameter names explicitly. I only do it when I feel it is important, such as when there are overloads with the same number of parameters and there is a chance of ambiguity – the most common reason being, as illustrated above, parameters with a generic type that could make the overloads dangerously ambiguous if the right type arguments are chosen.

As a general rule, if there’s any chance that it won’t be clear which overload of a method you are invoking within 2 seconds of reading it, specify the parameters by name so that there is no confusion.

Categories: C#, Style