Using story points or ideal days to measure productivity is a bad idea because it will lead the team to gradually inflate the meaning of a point. When trying to decide between calling something “two points” or “three points,” a team will round up if it is being evaluated on productivity as measured by the number of story points (or ideal days) finished per iteration.
My view is that points can be used as the best way to estimate and assess progress that we've ever had, or they can be used as another weapon with which to hit the team. There are plenty of weapons with which you can hit your team. We don't need to ruin points by using them that way as well. Some teams have measured productivity with things like the number of backlog items delivered or the percentage of backlog items completed versus planned into a sprint. Teams will alter their behavior in response to those as well, though, so these metrics can be gamed and can mislead. They can be useful, but only as part of a suite of metrics collected at the end of each iteration. If we rethink the question of “how do we measure productivity,” we might get a better answer.
Suppose you own a sandwich shop and want to measure the productivity of the sandwich maker in the back. The obvious metric is the number of sandwiches he makes per day. He responds to that metric by making as many sandwiches as he can, regardless of whether anyone ordered them! At the end of the day there will be 200 extra sandwiches to throw away. A better measure might be how quickly he makes any one sandwich. So we’d measure the time from when the customer placed the order until the sandwich is put on a tray. Or, for a more complete metric, we may want to measure the time from when he receives an order until he is ready to receive the next order, as this captures any cleanup or restart time.

So, one measure we may want to include in our suite of metrics could be the responsiveness of the development organization. This would be measured in the same way as in the sandwich shop. Datestamp each product backlog item and track the time from when something enters the product backlog until it either (a) comes out of an iteration or (b) is delivered into the hands of customers. Choosing between (a) and (b) will largely be a matter of how often you ship software. Option (b) is a better measure of rapid delivery of customer value but is impractical in some cases. It would be a bit of a useless measure for the Windows Vista team at Microsoft, for example, given how long that release took to ship.
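To make the datestamping idea concrete, here is a minimal sketch in Python of how the responsiveness measure might be computed. Everything in it is an assumption for illustration: the `BacklogItem` fields, the `lead_times` helper, and the sample items and dates are all made up, and a real team would pull the dates from whatever tool holds its product backlog.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class BacklogItem:
    """Hypothetical record of the datestamps kept for one backlog item."""
    title: str
    entered_backlog: date
    iteration_done: Optional[date] = None  # option (a): came out of an iteration
    delivered: Optional[date] = None       # option (b): reached customers


def lead_times(items: List[BacklogItem], use_delivery: bool = False) -> List[int]:
    """Days from entering the backlog to the chosen end point.

    Items that have not reached the end point yet are skipped.
    """
    days = []
    for item in items:
        end = item.delivered if use_delivery else item.iteration_done
        if end is not None:
            days.append((end - item.entered_backlog).days)
    return days


# Made-up sample data, for illustration only.
items = [
    BacklogItem("Export to CSV", date(2024, 1, 2), date(2024, 1, 19), date(2024, 2, 1)),
    BacklogItem("Password reset", date(2024, 1, 8), date(2024, 2, 2), date(2024, 3, 1)),
    BacklogItem("Audit log", date(2024, 1, 15)),  # still in progress
]

print("Option (a), median days:", median(lead_times(items)))
print("Option (b), median days:", median(lead_times(items, use_delivery=True)))
```

The median is used here rather than the mean so that one unusually slow item does not dominate the number; that is a design choice of this sketch, not something the measure requires. Like the other metrics, it is meant to be read alongside the rest of the suite rather than on its own.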