Monday, April 27, 2009

What Java Should Learn From Ruby and Rails

I hate to say this, but Java has gotten a bad rap among some programmers. It's deserved at times, but it's worth noting that it's often more about the people behind Java than the technology itself.

Here's an example. Clearly, this is horribly wrong. Sun refuses to fix a bug because people have worked around it, thus it must be maintained--to the detriment of all programmers to follow. This gradual growth of cruft is one thing that has driven people away from Java.

Why is this important to me? Mostly because I think that it prevents good comparisons of technologies. It sends the wrong message when people move to Rails because Java is riddled with unnecessary complexity and buggy libraries. While it is something of a compliment to say that Ruby is easy to use compared to Java, that doesn't emphasize what makes Ruby such a different way of doing things.

Real improvements like reduction in code through expressiveness, flexible extension of types, the block syntax, et cetera get lost in the shuffle. What's worse, it doesn't send the message that Ruby can be used to solve industrial problems. The eternal mantra of the Ruby naysayers--that it doesn't scale--is at least partially reinforced when people flock to it for simple projects.

For those who have built scalable applications on top of Ruby, this is laughable. Yet the point remains that dynamic languages in general and Ruby in particular have a great deal of value to deliver to real, serious software development. Even though it's fun to play with, it's not just a toy.

Similarly, when people do legitimate work on projects like JRuby, that work gets ignored by people who just don't want to deal with the cruft. While that's a great reason to avoid a class library, it's a horrible way to allocate our resources as an industry. In a world where languages live or die by their "standard libraries", this just makes people's lives harder and guarantees that we spend more energy to move the same distance.

I can only hope that Oracle's recent acquisition of Sun might change this attitude and position Java as a more useful technology. It's a shame to see all of the hard work people have put into the JVM be squandered because some people are afraid to break an API. If anything, this could have been turned into an opportunity for Java to offer versioned APIs, but instead it was met with duplication of functionality and degradation of the clarity of their class library.

I feel that this puts work on Rails 3 in an important light. The Merb project was about a lot of things, but one of its most important aspects was that the people behind it continued the march towards a better product. The Rails core team did a great thing by welcoming the best of Merb into Rails. It's letting us build a better future, and it's showing that we can maintain compatibility and still move forward.

We just need to always be mindful that the day that we stop improving Rails is the day that someone starts writing its replacement. More than anything, we need to insist that there is no excuse for not fixing a bug.


Saturday, April 25, 2009

Random MacOSX TCP Behavior

TCP is an interesting protocol. It's interesting mostly because it is less a specification of bytes and more a specification of behavior. Most TCP implementations have developed from the sort of arcane knowledge that you can only amass after trying to implement such a basic protocol over a long period of time.

By the time TCP (and indeed the whole IP stack) made it to my desktop, it had been on a long journey. MacOSX's network stack has a storied pedigree that goes back deep into the iterations of the original BSD Unix. There is, as they say, heavy voodoo.

Today I hit a corner case that only made sense after some pretty serious debugging. I'm sharing it here with the hopes that it may save you the headache.

The backbone of the Internet is designed to tolerate routes just disappearing for a while. You get a few ICMP messages back if you're very lucky. Otherwise, your packets might just disappear.

One of the nice features of TCP is that it's incredibly resilient to network links just disappearing. This is no problem for it. In fact, if you're not sending any traffic, you may not even notice that you're down. Failures being invisible is a nice feature when you're not doing anything.

That said, there are protocols that really want to know when they're down. XMPP is one of them. For protocols like XMPP, there is a pretty standard procedure of having some sort of "keep-alive" data that you occasionally send. Since XMPP data streams are just XML documents, most XMPP implementations just send a few whitespace bytes in between stanzas when idle.

Today, I was debugging an XMPP connection over a 3G modem. This manifests itself under a ppp0 link in MacOSX. While I wasn't thinking about it, I walked around with my laptop. One spot caused the phone connection to lose signal and it failed. When I noticed, I reconnected the modem. This provoked some very interesting behavior (or rather lack of behavior) from the BSD IP stack.

Normally, when some fundamental aspect of the network changes, there is some device that will interrupt your connection. For example, if my XMPP server had lost power, when it recovered the keepalive packets would have triggered a TCP reset, which breaks the connection. Similarly, if I remove an IP address from a Linux machine, connections on that IP are interrupted. It just so happens that in this case, the IP stack did NOT break the connections.

In fact, it just silently ate any data that the connections attempted to send. So the keepalives completely failed to kill the dead connection. It took almost fifteen minutes before something caused the IP stack to notice that the connection should be killed.

It took a while to track down what was happening, but apparently the connections were maintained (so says netstat) and the sent packets just disappeared without any sort of sending error! Very weird behavior triggered by an odd corner case. I've also discovered that this appears to also happen when you close your laptop and go to another wireless access point. This is ugly for my use-case, as I want the agent on the laptop to reconnect when it has a new address. If anyone has a good way to detect this in a portable way (i.e. not plugging into Apple's NetKit watchers), please let me know.
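One portable approach, sketched here in Python under some assumptions, is to stop trusting a successful send and instead require that something comes back within a deadline. The function name peer_answers and its parameters are hypothetical. It also assumes the probe is something the server actually replies to; a bare whitespace keepalive is not answered, so for XMPP this would have to be a request such as an XMPP ping (XEP-0199).

```python
import select
import socket


def peer_answers(sock, probe=b" ", deadline=5.0):
    """Send a probe and report whether any bytes come back before the deadline.

    A successful send only means the kernel queued the bytes; it says
    nothing about delivery. Only received data proves that the path
    works in both directions.
    """
    # May raise OSError if the stack already knows the link is gone.
    sock.sendall(probe)
    readable, _, _ = select.select([sock], [], [], deadline)
    if not readable:
        return False  # silence: treat the link as dead and reconnect
    return sock.recv(4096) != b""  # b"" means the peer closed the stream
```

An agent could call this on a timer and tear down the connection on the first False, rather than waiting for the IP stack to give up on its own.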

While I find this mildly annoying, I have to admit that I can't fault TCP. If the packets are just disappearing, the best behavior is to just resend and keep waiting for the connection to come back up for a reasonably long timeout. This is exactly what happened. Instead, I hope that Apple will eventually do what Linux does and push an error into the socket when it tries to send from an address that isn't valid for that machine anymore.


Thursday, April 23, 2009

One Year Update


Who gave this guy permission to grow?  I sure didn't!  I guess it's okay, though...  :)

Here are some good pics grandma Vantuyl got of the birthday boy:





One down, seventeen to go.  ;P


On Tail Recursion Elimination


There was a bit of a controversial post on Guido van Rossum's blog that I thought deserved a little comment.

To sum up Guido's argument, he doesn't feel like implementing Tail Recursion Elimination (henceforth referred to as TRE) in Python because:

  1. Stack traces help debug, TRE makes them useless
  2. TRE Is Not An Optimization (it creates a class of code that explodes without it)
  3. Guido does not subscribe to the "Recursion is the basis of all programming" idea
  4. Due to Python's highly dynamic namespaces, it's very nontrivial to know if a call is a recursion.
The funny thing is that, even though I am a big supporter of TRE, I actually agree with all of these points.  Taking them in turn:

Stack traces are critical to debugging code.  Ruby and Erlang both started out with uglier stack traces than they have today.  They were adjusted because they are important.  Similarly, the most difficult Python frameworks have always been that way due to their effect on stack traces.  Zope and Twisted--I mean you.  Zope would create absolutely titanic stack traces.  Twisted, on the other hand, creates virtually no useful stack traces (as a consequence of the deferred model).

While I find this to be telling, I would caution against considering this in the context of TRE.  Why?  Simple.  Some styles of programming don't generate good stack traces.  As an example, consider the construct of a loop.  Loops don't generate stack traces.  When code explodes in a loop, you have no idea what state led up to that explosion.  Does that mean that loops are "bad"?  No.  It simply means that what we want is not a stack-trace, but rather information about what led up to the failure.

For some classes of code, stack-traces provide that fairly economically.  The catch is that other types of code don't benefit from them at all.  Guido's statement that you can trivially rewrite any TRE function as a loop is very practical, but it actually detracts from his point.  It's pretty clear that the type of code that benefits from tail recursion doesn't benefit from stack-traces.  This means that effectively he's deciding that Python just doesn't cater to a certain type of code.  I'm mildly opposed to this and I think there's a good case for a type of TRE in Python.
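To make the loop rewrite concrete, here is a minimal sketch using Euclid's algorithm, which is naturally tail recursive.  The function names are mine, chosen for illustration.  In the recursive version the call is the last thing the function does, so the loop version just rebinds the arguments and goes around again.

```python
def gcd_recursive(a, b):
    # The recursive call is in tail position: nothing remains to be
    # done in this frame once it returns.
    if b == 0:
        return a
    return gcd_recursive(b, a % b)


def gcd_loop(a, b):
    # Same algorithm; rebinding the arguments replaces the tail call.
    while b != 0:
        a, b = b, a % b
    return a
```

Note that neither version tells you much after a failure: the loop has no frames to show, and the frames of the recursive version all look alike.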

Taking the second point, Guido has this right.  The fact that there is code that would explode without TRE is a good reason not to just add it to the language.  That said, it doesn't mean that we shouldn't do it.  In fact, it is largely an argument to do so in the main Python runtime.  If anything, fragmenting the Python codebase is negative.  However, TRE caters to a certain type of code, and the fragmenting effect of that reality means that he is really choosing between fragmenting the Python codebase and driving people to other languages that handle the use-case better.  He might be okay with that, but I don't really believe it to be necessary.
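The "explodes without it" part is easy to demonstrate.  The sketch below assumes CPython 3, where exceeding the recursion limit raises RecursionError; count_down and survives are hypothetical names.  The function is a tail call in form, yet the interpreter still pushes one frame per call.

```python
import sys


def count_down(n):
    # A tail call in form, but CPython still pushes a frame per call.
    if n == 0:
        return "done"
    return count_down(n - 1)


def survives(depth):
    """Report whether count_down(depth) finishes without RecursionError."""
    try:
        return count_down(depth) == "done"
    except RecursionError:
        return False


# A depth just past the interpreter's configured limit.
too_deep = sys.getrecursionlimit() + 100
```

With TRE, survives(too_deep) would be True; without it, the same code fails, which is exactly the fragmentation Guido is worried about.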

Taking the third point, talking about the "basis of all programming" is a little tough.  From a very high level, design a language without recursion.  What's left that defines programming?  In all seriousness, I think that the statement that "recursion" is the basis for all programming has a strong case assuming that you mean "recursion" in terms of the function call (or subroutine, or whatever you call it).

Without functions, you don't really have anything but a long list of instructions.  All control statements are pretty much functions or branches.  Branches without the concept of subroutine are effectively goto statements, which have always been as controversial as they can be useful.  The point being that TRE allows you to take one of the most fundamental constructs of programming and use it in a way that increases its utility.  If recursion isn't the basis of all programming, it's still pretty fundamental.

Addressing the final point, I recall part of the Zen of Python, "Explicit is better than implicit."  The statement that Python has difficulty (at compile time) determining whether a call is recursion is a good one.  There's a really simple solution to this.  Add a keyword to make tail-recursion explicit.  Perhaps we could propose abusing the pass keyword.  Basically, where pass is used now, consider it to mean "pass None".  The idea is that it is either passed a function call or None to indicate tail recursion.  The nice bit of magic here is that it reuses an existing keyword compatibly, clearly indicates what's happening, easily translates to bytecode, and works really well for TRE's most powerful use-case (coroutines).

So, as an example, consider the following code:

class cli:
  def start(self):
    print "Welcome to the CLI"
    pass self.main()

  def main(self):
    print_prompt()
    cmd,args = parse_line()
    cmd = getattr(self,'cmd_' + cmd,None)
    if cmd is not None and callable(cmd):
      pass cmd(*args)
    else:
      print "Command not found"
    pass self.start()

  def cmd_help(self):
    ...
    pass self.main()

...

The above code may seem slightly convoluted, but it's actually interesting because it tackles one use-case that Python is bad at.  Admittedly, the above could have been factored as a loop, but the more steps you insert into the process, the uglier it gets.  What is the purpose of continually having to implement some sort of mini-scheduler in a loop?  Why have all this convoluted logic that has to handle a random series of states and detect an "end-state"?  This is just simpler, and more importantly, "Beautiful is better than ugly."

Critics might say that loops give you an easier understanding of the top-level flow of the program.  I would contend that any problem that needs the above technique would instead result in a loop so complex that it would generally serve to reduce the understanding of the problem--replete with various ifs, returns, whiles, and breaks.

In assembly, C, or any number of other systems, this would have been implemented with goto statements.  The problem with goto statements is that they very easily obscure the boundaries of single bits of code and can poorly document the intent of the goto.  This is a nice abstraction that allows the use of goto-like functionality without losing information about what's supposed to be going on.  There comes a point where the above would get factored into a general state machine implementation, but it's not really necessary.

Another thing to notice about the above is that it gives you a really good point for debugging.  Any Python implementation of the above type of problem would not have stack-traces to debug.  However, the pass statement gives you a place to track behavior.  In a debugger it's pretty clear what's going on.  At the beginning of each iteration, the function arguments clearly define the state of execution.

This is something that is much more useful than implementing this with a loop.  With a loop, the state is invisibly contained in the current binding namespace.  Temporary values easily leak.  There's not a clear point where the state of the iteration is distilled.  This is why people want TRE.
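For comparison, the closest thing available in Python today is a trampoline, which is exactly the sort of mini-scheduler I complained about.  The sketch below is a loose adaptation of the CLI example, not a proposal: each state returns the next function plus its arguments instead of using pass, and input is a scripted list rather than a real prompt.  All of the names are hypothetical.

```python
def start(lines, out):
    out.append("Welcome to the CLI")
    return main, lines, out  # stands in for "pass self.main()"


def main(lines, out):
    if not lines:
        return None  # end-state: no more input
    cmd, rest = lines[0], lines[1:]
    handler = COMMANDS.get(cmd)
    if handler is None:
        out.append("Command not found")
        return main, rest, out
    return handler, rest, out


def cmd_help(lines, out):
    out.append("Commands: help")
    return main, lines, out


COMMANDS = {"help": cmd_help}


def trampoline(step, *args):
    """Run a chain of 'tail calls' in constant stack space."""
    while True:
        nxt = step(*args)
        if nxt is None:
            return
        # The whole state of the next iteration is visible right here.
        step, args = nxt[0], nxt[1:]
```

Calling trampoline(start, ["help", "bogus"], []) walks start, main, cmd_help, main, main without growing the stack.  It works, but every state has to know about the protocol, which is the noise an explicit keyword would remove.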

There are just a lot of situations where people would benefit from greater control over the constructs of flow control.  Right now we've already abrogated the model of nested-call-return with generators.  Most control structures that are difficult in Python could benefit from further disentangling the assumptions of function calls.

Someday, I would love to see the work that was started with generators end with easy coroutines, promises / lazy evaluation, message passing, and smarter code replacement.  Every language has idioms for what a recursion or evaluation means.  We have the opportunity to make this handling explicit, and I suspect that we would benefit from it.
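As a reminder of how far generators already bend the nested-call-return model, here is a small sketch (running_total is a made-up name).  The function pauses at each yield, hands control back to its caller, and later resumes with its local state intact.

```python
def running_total():
    """A coroutine-style generator: pauses at each yield, keeps its locals."""
    total = 0
    while True:
        # Hand the current total to the caller, then resume here
        # with whatever value the caller sends in.
        value = yield total
        total += value


acc = running_total()
next(acc)      # run up to the first yield
acc.send(5)    # resumes the function body with value = 5
acc.send(7)    # resumes again; total is now 12
```

Nothing here returns in the usual sense, yet the control flow is explicit at every yield.  That is the property I would like tail calls to have as well.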


Saturday, April 18, 2009

Making Sense of Erlang's Event Tracer


Let's face it, people: while Erlang is wonderful in a number of ways, documentation is not one of them.  I mean, it's there, and it's better than some other projects.  However, there are incredibly useful things staring you in the face that are just impenetrably difficult.  The et module is one of those.

Fear not, gentle reader.  After more time than I'd care to admit, I've managed to figure out roughly how the pieces fit together.  As usual, they exhibit the combination of weirdness and inspiration that has driven us to embrace Erlang.  Without further ado, let me take a stab at laying out exactly how to actually use it.

Four Modules

The event tracer framework is made up of four modules:
  • et
  • et_collector
  • et_viewer
  • et_selector
In addition, you'll probably want to familiarize yourself with the dbg module and possibly seq_trace module as well.

The Event Tracer Interface

This is perhaps the most confusing module.  It contains a function that you call to report arbitrary events to the tracing interface.  The method is named report_event.  There is also a humorous alias named phone_home.  Get it?  et:phone_home?  Yeah, it makes me sad, too.

Ostensibly, they're supposed to be called with options something like:
report_event(85,from,to,message,extra_stuff)
The number (in this case 85) is an integer from 1 to 100 that specifies the "detail level" of the message.  The higher the number, the more important it is.  This provides a crude form of priority filtering.  Avoid using 100, since it seems to disagree with being displayed in the viewer.

The from, to, and message variables are exactly what they sound like.  From and To are visualized as "lifelines", with the message passing from one to the other.  If from and to are the same value, then it is displayed next to the lifeline as an "action".  The extra_stuff value is simply data that you can attach that will be displayed when someone actually clicks on the action or message.

In a perfect world, this would be enough to start to get your hands dirty.  We do not live in that perfect world.  Oddly enough, these functions do absolutely nothing.  Let me repeat that.  They do nothing, except return an ominous atom hopefully_traced.  This confused me at first, and led me down the long rabbit hole that is the event tracer.

Their purpose is to be traced.  Rather than having them do something, they're just there so that the system can notice when you call them (via tracing) and use the data accordingly.  While this is a great idea, it has three faults.  The first is that it makes the process rather magical.  The second is that Erlang tracing is a seething pile of pain that involves reasonably complex knowledge of clever ports, tracing return formats, and specialized tracing MatchSpecs (which are really their own special kind of hell).  The third problem is that this conspires to make an incredibly useful and reasonably flashy feature inaccessible to users.
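If the idea of a function that exists only to be traced seems odd, here is a rough analogy in Python.  This is not how et is implemented; it only mimics the shape of the trick using the interpreter's sys.setprofile hook.  The report_event below is a hypothetical stand-in, and collect plays the part of the collector.

```python
import sys


def report_event(detail, sender, receiver, label, contents):
    """Deliberately does nothing; it exists so that a tracer can see the call."""
    return "hopefully_traced"


def collect(func):
    """Run func() and return the arguments of every report_event call it made."""
    events = []

    def profiler(frame, event, arg):
        # A "call" event fires when a new frame is entered; at that moment
        # f_locals holds exactly the arguments that were passed in.
        if event == "call" and frame.f_code is report_event.__code__:
            loc = frame.f_locals
            events.append((loc["detail"], loc["sender"], loc["receiver"],
                           loc["label"], loc["contents"]))

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(previous)  # restore whatever hook was there before
    return events
```

Without collect wrapped around the caller, report_event just returns its string and the data goes nowhere, which is precisely the behavior that confused me in et.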

The Collector and Viewer

These two pieces work in concert.  Basically, the collector receives trace events and processes them.  The viewer interrogates the collector and displays a relatively nice, interactive representation of them.  As usual, it's in TK, which is a shame.

You might wonder why these aren't just one module.  It turns out that, in typical Erlang style, the collector is a generic full-fledged framework that allows processes to "subscribe" to the events that it collects.  This would probably be really useful if there were any other subscribers.  Someday I may write one to help debug some code, but for now just trust that it really is a pretty interesting architecture.

You also have the advantage that the viewer creates a collector for you.  With a few options and some debugging settings you can start collecting events.

The Selector

This is perhaps the most useful and frustrating module in the entirety of the et suite.  This is mostly because it isn't really mentioned in the User's Guide.  It turns out that the collector needs "filters" to convert the raw trace data into "events" that it can display.  The et_selector module provides the default filter and some API calls to manage the filter pattern.  Due to the architecture of the collector, this module is quite a bit of a bear.

Effectively, it's a mishmash of functions that achieve the following:
  • Convert Any Trace Message Into An Appropriate Event
  • Magically Notice Traces of the et Module and Make Appropriate Events
  • Carefully Prevent Translating A Message Twice
  • Manage A "Trace Pattern"
It turns out that there are a few strange interactions.  First of all, you still have to tell the system to trace the right things.  Secondly, without the correct incantation you will get infuriatingly strange "trace" events instead of the nice events you're reporting with et:report_event/5.  Finally, the documentation doesn't do a decent job of telling you what the trace pattern even is.

How To Put It Together

It turns out that the collector automatically registers itself to listen for debugging events, so all you have to do is enable them.

For those people who want to do general tracing, consult the dbg module on how to trace whatever you're interested in and let it work its magic.  In my case, I just wanted et:report_event/5 to work, so that's what I'll illustrate here.  I did the following:
  1. Create A Collector
  2. Create A Viewer (this can do step #1 for you)
  3. Turn On and Pare Down Debugging
The following module achieves this.
-module(trace_test).

-export([test/0]).

test() ->
  et_viewer:start([
    {title,"Coffee Order"},
    {trace_global,true},
    {trace_pattern,{et,max}},
    {max_actors,10}
  ]),
  dbg:p(all,call),
  dbg:tpl(et, report_event, 5, []),
  Drink = {drink,iced_chai_latte},
  Size = {size,grande},
  Milk = {milk,whole},
  Flavor = {flavor,vanilla},
  et:report_event(99,customer,barrista1,place_order,[Drink,Size,Milk,Flavor]),
  et:report_event(80,barrista1,register,enter_order,[Drink,Size,Flavor]),
  et:report_event(80,register,barrista1,give_total,"$5"),
  et:report_event(80,barrista1,barrista1,get_cup,[Drink,Size]),
  et:report_event(80,barrista1,barrista2,give_cup,[]),
  et:report_event(90,barrista1,customer,request_money,"$5"),
  et:report_event(90,customer,barrista1,pay_money,"$5"),
  et:report_event(80,barrista2,barrista2,get_chai_mix,[]),
  et:report_event(80,barrista2,barrista2,add_flavor,[Flavor]),
  et:report_event(80,barrista2,barrista2,add_milk,[Milk]),
  et:report_event(80,barrista2,barrista2,add_ice,[]),
  et:report_event(80,barrista2,barrista2,swirl,[]),
  et:report_event(80,barrista2,customer,give_tasty_beverage,[Drink,Size]),
  ok.
Running through the above, the most important points are:

  • Turn On Global Tracing (it doesn't work for me if I don't)
  • Set a Trace Pattern
  • Tell The Debugger to Trace Function Calls
  • Tell It Specifically To Trace The et:report_event/5 Function
The Aftermath

The most vexing part of figuring this out was the trace pattern of {et,max}.  The trace pattern is basically a tuple of a module and a detail level (either an integer or the atom max for full detail).

The specified module flows from your instantiation of the viewer, to the collector that it automatically creates, gets stashed in as the trace pattern, and eventually goes down into the bowels of the selector.  Tracking this down sucked.

This fact is documented basically nowhere.  It doesn't make a lot of sense, either.  The module that you specify gets passed down (eventually) into et_selector's default filter.  The format of the report_event/5 function call is hardcoded in that filter.  That makes it very hard for me to imagine why you would ever specify another module.  I suppose you could replace the default filter, but it turns out that doing so is pretty obscure even if you wanted to.  At any rate, just pass it through and it works.

That said, I think it was worth it.  If you compile the above example code and execute trace_test:test(), you'll see something like the following screenshot.  Source code here, screenshot here.

Beautiful.  I think I'm going to get a cup of coffee now.

