samskivert
samskivert » 2010 » January

January 30, 2010

Automatic Documentation Inference for Exceptions – Buse and Weimer

Summary
Presents a two phase technique for statically inferring the conditions under which exceptions are thrown by Java code and automatically generating Javadoc documentation for same. Performs interprocedural, context sensitive analysis. Evaluation on two-plus million lines of Java code.

Comments
This looks pretty useful. Indeed, the reason I read the paper is because it’s related to a project I’m working on now to use the results of numerous analyses to augment Javadoc documentation for poorly or undocumented libraries. We even talked about doing essentially this same analysis when brainstorming useful analyses at the start of the project. Peskily they don’t provide source code, which probably means their tool is mostly duct-tape and baling wire. We’re hoping to produce a generally usable tool but I’m already losing hope on that front as we try to integrate numerous existing analyses that are very duct-tape and baling wire themselves. I haven’t totally given up hope, and we’ll be releasing the source to our tool one way or the other. It may just come with a verbose instructions on how to get the underlying analyses to run on your source code and what to do if they keel over.

Source: PDF ACM

Posted to papers by mdb at 2:32 pm | Comments (0)       

Snugglebug: A Powerful Approach to Weakest Preconditions – Chandra, et al.

Summary
Presents an approach on interprocedural backwards propagation of weakest preconditions, specifically various techniques for improving the efficiency of such an analysis. Three main components: directed call graph construction, generalization of procedure summaries, and an ad-hoc logic simplifier. Detailed discussion of algorithm and the contributed techniques. Evaluation performed on five subject programs: ant, antlr, batik, tomcat and eclipse.ui. Individual contribution of each technique evaluated.

Comments
This is a pretty nuts and bolts paper about efficiently computing weakest preconditions (WP), which tell you the range of possible program states that will lead you to a particular target statement. They evaluate their tool by taking bug reports for existing Java programs and determining whether they can compute conditions that will trigger them. This isn’t a super compelling motivation in my opinion, but there are lots of other reasons why WP are useful, so I’m not going to complain about that.

The tricky thing about computing WP in an OO language like Java is that any time you see a method call on an interface or non-final class, you have to consider all possible implementations of that method. This blows up pretty fast and makes doing analysis on large programs infeasible. The clever idea introduced here is to delay that analysis and keep propagating backwards, looking for constraints on the actual class that will be used for that method call. If you can find such a constraint (like you see the variable being assigned to ArrayList) then you know exactly which method you need to analyze and you can save a ton of work. In their evaluation, they ended up looking at at most 93 call graphs whereas more traditional “consider everything” techniques would have had to look at upwards of 100,000 call graphs. If you happen to find yourself needing WP for Java, you’ll definitely want to try their tool.

Source: PDF ACM

Posted to papers by mdb at 11:45 am | Comments (0)       

Static Extraction of Sound Hierarchical Runtime Object Graphs – Abi-Antoun and Aldrich

Summary
Presents an approach for constructing a runtime object graph (ROG) from an ownership domain type annotated source program. Shows the soundness of the extraction process in that it reflects all possible runtime object relations. Extraction process: source → abstract graph → runtime graph → ownership object graph. Formalizes extraction process as a rewrite rule system. Provides details of abstraction by types (performed in addition to abstraction by ownership annotations), limitations and heuristics to improve architectural relevance of extraction.

Comments
This paper provides some details on the extraction process used in the larger work I summarized a while back. I have since dug into some of the authors’ earlier work, which I actually find more compelling.

I have a growing feeling that full ownership annotations are too heavy weight to provide utility outside special circumstances where one wants to strongly verify properties of their code. The ArchJava work tries to represent architectural constraints more directly through components, ports and connections, rather than trying to piggy back on manually specified ownership annotation. Inference of ownership information is also not a way out, because there are many possible ownership structures and without guidance from the programmer it is impossible to tell which one they intended. I think one needs to look elsewhere for implicit architectural information and have some ideas in that regard. Hopefully when I dig myself out from under my current project, I’ll have some time to explore them.

Source: PDF ACM

Posted to papers by mdb at 11:21 am | Comments (0)       

Flapjax: A Programming Language for Ajax Applications – Meyerovich, et al.

Summary
Flapjax is a programming language that extends JavaScript. It provides: event-driven reactivity, consistency, and uniformity. It can also be used as a library directly from JavaScript. Callbacks are necessarily side-effecting which makes reasoning and analysis difficult. Flapjax uses reactive semantics to eliminate callbacks. It supports event streams and behaviors as basic reactive constructs. Both UI and network I/O are modeled with these constructs (example: auto-save text box). Combinators allow complex composition of event streams (example: drag and drop). Event streams and behaviors naturally lead to composable abstractions (example: filtering). Existing web services can easily be recast as event streams (example: Flickr thumbnail viewer). Doing so results in reusable abstractions that make mashups easy (example: Twitter + Geocoder + Google Maps). Flapjax also comes with a server that provides an event-based interface to JSON object persistence. Implementation details: evaluation model, dataflow graph construction, lifting expressions to behaviors, JavaScript interoperability, inline Flapjax, event propagation, DOM event conversion, library design. Evaluation: third-party apps: Data Grid, Interactive Wiki, Network Monitor, first-party apps: TestFest, Resume, Continue. Design discussion: benefits of consistency, potential for security analysis, challenges of debugging, relational dependencies between DOM and application models. Related work: FrTime, Frappé, Yahoo! Pipes, StreamIt, Flex, JavaFX, ThingLab, Kaleidoscope, Arrowlets, jQuery, Formlets, Links, Hops, MapJAX.

Comments
I’ve been growing increasingly fond of FRP in the UI realm, and these guys show that it’s a pretty good match for JavaScript and AJAX. The idea of modeling web services as event producers and consumers is interesting as well, but I feel that it’s probably less broadly applicable. They briefly touch on modeling persistence in this way as well which really starts to feel like a stretch. It’s informative to try to push the technique as far as it will go, so that stretching is useful. FRP is clearly a useful tool to have in the box, but it probably should not be the dominant paradigm for a language. Their comment that using Flapjax as a library turned out to be less cumbersome than they imagined is good news in that regard. I found doing FRP as a library in Java (GWT actually, but close enough) was not totally cumbersome, but there was non-negligible syntactic overhead. JavaScript has first-class functions and no type system, which makes it possible to achieve more syntactic conciseness, perhaps at the expense of some under the covers magic. I’d be interested to see what one could accomplish in Scala, with tastefully selected implicit conversions and some type-system magic.

Source: PDF ACM

Posted to papers by mdb at 11:01 am | Comments (0)       

January 26, 2010

Arch Java: Connecting Software Architecture to Implementation – Aldrich, et al.

Summary
Describes the ArchJava language, an extension of Java to support explicit architectural constructs. Architectural components define ports to which inter-component communication is restricted (other than communication from a parent component to a sub-component). Mechanisms are provided for dynamic component connection. Overview of proof of communication integrity provided. Case study described wherein 12kloc system was converted into ArchJava given a target architecture provided by the original developer.

Comments
This appeals to me greatly. The conceptual extensions to Java are pretty minimal and yet capture a substantial portion of what one wants to express in a program architecture. The mechanisms by which architectural communication are expressed are very similar to the mechanisms you’d already use: interfaces between implementations with hidden details. Inter-component communication must take place through explicitly declared channels, but that’s a good thing. Effort expended to refactor a program to make these communications explicit bears the fruit of looser coupling and a more understandable system structure. In many ways this was doing in 2002 what Guice does today. And ArchJava gives the added benefit of explicit distinction between architecture and implementation, along with communication enforcement between components. And it makes pretty architecture graphs! What more could you want? With the very large caveat that I have not actually tried using it, I don’t see why it didn’t become way more popular.

Given that history has played out such that it has, I wonder if one couldn’t reimagine ArchJava as an annotation based system that worked with something like Guice to wire up components at run time as well as enforce communication channels between them, and generate useful architectural diagrams directly from the code. That might have the necessary buzzword compliance to actually gain some traction.

Source: PDF ACM

Posted to papers by mdb at 9:32 pm | Comments (0)       

January 24, 2010

“Duplicate field expected”

In my recent programming adventures, I separately encountered two perplexing error messages:

  • RuntimeException: Variable is used without definition!
  • IllegalArgumentException: Duplicate field expected

Inspection of the underlying code brought clarity:

  • throw new RuntimeException("Variable " + m + " used without definition!");
  • throw new IllegalArgumentException("Duplicate field " + fieldName);

The former was reporting a problem with a variable declared as InputStream is, the latter a problem with JUnit’s @Test annotation which defines a parameter java.lang.Class<? extends java.lang.Throwable> expected.

Tip from the pros: always wrap identifiers in quotes when including them in error messages.

Posted to code by mdb at 4:33 pm | Comments (1)       

The Multi-Principal OS Construction of the Gazelle Web Browser – Wang, et al.

Summary
Presents Gazelle, a secure web browser that makes use of an OS-like browser kernel that exclusively manages resources and cross-principal (cross-site) sharing. Security model, architecture described. Display sharing and event dispatch details provided. Prior known vulnerabilities considered for new architecture. Implementation details provided: C# kernel, IE7 Trident rendering engine. Performance and memory use evaluated. Selection of backwards-incompatible security improvements discussed.

Comments
I think the architecture they describe in this paper—a fully isolated web browser kernel that enforces all resource management and cross-site sharing decisions based on simplified abstractions—is inevitable. That’s the model we use for operating systems and web browsers are an even scarier place than the desktop with greater need for security.

Their discussion of how to sanely share the 2D resources of a rendered page among different principals (sites) was interesting. Clearly we’ve gone too far in allowing arbitrary transparent overlap between totally different sites, but they propose a reasonably low-impact reigning in (essentially disallowing transparency when overlapping a cross-principle region) which helps a bit. I still suspect that allowing any overlap will allow nefarious persons to construct websites that fool people into giving away sensitive data. Even the ability to arbitrarily embed any website within any other, regardless of whether the embedee intended it, seems dangerous. Perhaps the solution lies in some sort of cooperation on the part of the embdee with an explicit declaration that “I’m meant for embedding.” Then one could hope that people might be encouraged to be more careful when creating web content that is designed to be embedded in other sites (like not ever asking the user to enter sensitive information into the UI on the embed). That’s probably wishful thinking.

I also think they’re going overboard by isolating subdomains from one another. While it is nice that a bug in news.google.com can’t result in leakage of information from mail.google.com, I think that’s putting the cart before the horse. Google should move their products to different top-level domains, and web application architects should not lose the ability to make use of separate subdomains in the construction of a single application without paying a mammoth performance penalty.

Source: PDF MSR

Posted to papers by mdb at 11:39 am | Comments (0)       

January 22, 2010

Thorn—Robust, Concurrent, Extensible Scripting on the JVM – Vitek, et al.

Summary
Describes design and implementation of a new JVM-based language: Thorn. Features include: static and dynamic typing, aggregate data types (lists, records, tables), OOP (constrained multiple inheritance), first-class functions, pattern-matching, immutable data types, CSP-style concurrency support, module system, annotations, extensible compiler.

Comments
Thorn seems to be a language guided by practicality rather than some particular linguistic ideal. Many of the choices seem motivated by a “batteries included” philosophy like built-in support for lists, records and tables (a hybrid between a map and a full-fledged object). In this case, batteries for concurrency are also included in the form of isolated components that communicate by message passing. The enforcement of isolation by the language sounds great, though one of the examples in the paper seems to pass mutable data from one component to another, so either I’m misreading something or the isolation is not so strong as I thought.

It’s hard to tell where such a language might find a niche in the vast and sprawling landscape. Java interoperability is pretty limited owing to the design of the language (good for them for making bold language choices). However, without smooth interoperability and lacking some flag around which a hard-core group of language enthusiasts can rally, it may never progress beyond an interesting experiment. Maybe they’ll implement a translator to JavaScript and relaunch it as a web client language.

Source: PDF ACM

Posted to papers by mdb at 9:09 pm | Comments (0)       

January 20, 2010

The Semantics of x86-CC Multiprocessor Machine Code – Sarkar, et al.

Abstract
Develops a rigorous and accurate semantics for x86 multiprocessor programs, from instruction decoding to relaxed memory model, mechanized in HOL. Tests the semantics against actual processors and the vendor litmus-test examples, and gives an equivalent abstract-machine characterization of the axiomatic memory model. For programs that are (in some precise sense) data-race free, proves in HOL that their behavior is sequentially consistent. Also contrasts the x86 model with some aspects of Power and ARM behavior.

Comments
Processor architecture memory models are historically underspecified and this is a noble effort at remedying that. Not only did they pore over vendor documentation, they built tools for doing empirical testing to help resolve ambiguities and beat the bushes inside and outside Intel to get what information they could. Interestingly, the model presented in this paper turned out to have some problems, and based on spec refinements from Intel and more testing, they developed a new model based on total store order rather than causal consistency which more accurately reflects reality. They also provide source for many of their tools and their mechanized proofs on their website.

The whole business is dauntingly complex. The concrete examples demonstrating the axioms in the axiomatic semantics show just how difficult it can be to reason about relaxed memory models. The axiomatic semantics itself was a bear for me to comprehend, but that’s more the double whammy of unfamiliar notation and the complexity of the underlying memory models. They are also working on a machine model which is far nicer for developing an intuition, though less useful for proofs. Fortunately, most of us don’t have to think about all this because we can just use locks to avoid data races and rely on the (proven) fact that data race free programs see sequentially consistent memory, but VM and OS implementers will no doubt appreciate unambiguous answers to tricky edge cases.

Source: PDF ACM

Posted to papers by mdb at 3:30 pm | Comments (0)       

January 19, 2010

Static Validation of C Preprocessor Macros – Sæbjørnsen, et al.

Summary
A technique is described for validation of C preprocessor macros based on similarity of normalized abstract syntax trees generated from macro applications. Applications of the same macro that result in differing normalized trees are likely to represent an error. A tool implementing the technique is described and evaluated on a number of subject programs.

Comments
The idea sounds compelling. Their evaluation mainly shows detection of injected errors rather than actual errors, but the subject programs were certainly very mature and likely to have ironed out most bugs in their macro declarations. The technique does rely on correlating multiple uses of macros to establish an “intended” usage and flag deviations from that usage as potential errors. So it would mainly be useful for frequently used macros. Nonetheless, if you are in the unfortunate circumstance of having to use C or C++ and CPP macros, this seems like a low-cost additional protection against screwing up. Not being familiar with the wild and woolly world of C macro research, I am inclined to believe that AST-based macros rather than token-replacement macros are a better choice for future language designers. They’re not prone to the class of errors this tool aims to detect, though they may not be ideal for non-homoiconic languages.

Source: PDF

Posted to papers by mdb at 12:47 pm | Comments (0)       

January 12, 2010

Optimizing Programs with Intended Semantics – von Dincklage and Diwan

Summary
Frequently, unintended program semantics limit optimization opportunities. Programmer supplied additional assumptions that clarify a programmer’s intended semantics can enable optimizations. IOpt is an interactive tool for iterative solicitation of intended semantics clarifying assumptions and subsequent program optimization that accounts for those assumptions. Main ideas: interaction with programmer, tracing optimization failure to programmer understandable reasons, ranking and filtering reasons, constructing and generalizing assumptions that enable optimizations, ranking and filtering offered assumptions. Examples, implementation details and evaluation on two subject programs are provided.

Comments
This is nitpicky but there were numerous grammatical errors and typos, and various (minor) inconsistencies in the graphs and reported data. The editing process definitely failed on this paper. That said, the idea of refining very general program semantics through cost/benefit directed interactions with the programmer is a compelling one. However, the results they provided all seemed to point toward a dire need for tighter specification of Java code. The programmer should not need to be bothered to tell the optimizer that a call to List.size() is not going to modify arbitrary mutable data in their program. Nor that it will not throw an exception. If all of this specification tightening were to be done across the board, I wonder whether significant optimization opportunities would remain and merit the programmer’s time.

I also think that the approach seemed quiet heavy weight: in order to “smoke test” the assumptions provided by the programmer, the system automatically instrumented the code and ran its unit tests. If configuring and using this system is much more complex than configuring and using a profiler, it may not be worth the trouble. If a programmer is willing to spent time and attention on optimization, modern profilers are pretty good at directing them to where manual optimization will bear fruit. Indeed, manual optimization is also a clarification of intended semantics, in a way.

Ray will be pleased to note that our adopted practice of writing code like:

for (int ii = 0, ll = list.size(); ii < ll; ii++) {

subsumes one of the common performance improvements that IOpt enabled.

Source: ACM

Posted to papers by mdb at 1:16 pm | Comments (1)       

January 8, 2010

A Case Study in Architectural Analysis: The Evolution of the Modern Web Browser – Grosskurth and Godfrey

Summary
Postulates benefit from reference architecture for web browser. Briefly summarizes history of web and browsers. Presents methodology for architecture extraction (via QLDX toolkit plus iterative process). Presents reference architecture (extracted from Mozilla and Konqueror source): UI, Browser Engine, Rendering Engine, Networking, JS Interpreter, XML Parser, Display Backend, Data Persistence. Compares conceptual architecture of various browsers to reference: Mozilla, Konqueror, Epiphany, Safari, Lynx, Mosaic, Firefox. Salient technical details noted for each.

Comments
I would have liked to have seen a more detailed architecture, at least for the rendering engine, where most of the complexity lies in a modern browser. There’s no indication of where plugins fit into the architecture, though one can assume mostly in the rendering engine with some interaction with the browser engine, networking, JavaScript interpreter and data persistence. The paper also predates Chrome which would have perhaps motivated more focus on plugins, though I imagine the rest of Chrome fits pretty well within their reference architecture (especially since it’s based on WebKit). They also omit a dependency from KHTMLPart to KHTML in the Konqueror architecture, which I assume is an error, but maybe KHTMLPart somehow avoids talking to KHTML via QT magic.

I would also have been interested in more investigation and discussion of the implications of the modern practice of long-lived complex applications being delivered as web pages. This architecture hasn’t changed radically since the web was just a bunch of simple HTML pages with the occasional <img> tag. The fact that Firefox routinely consumes 60-75% of all the RAM on my fairly well-endowed workstation hints that the architecture may be showing its age. Again Chrome is exploring new avenues here.

Source: PDF CiteSeerX

Posted to papers by mdb at 8:27 pm | Comments (0)       

January 1, 2010

Euler 55

Problem 055:

object Euler55 extends EulerApp {
  def ispal (n :String) = n.take(n.length/2) == n.takeRight(n.length/2).reverse
  def islychrel (n :BigInt, iter :Int = 0) :Int =
    if (iter > 0 && ispal(n.toString)) 0
    else if (iter == 50) 1
    else islychrel(n + BigInt(n.toString.reverse), iter+1)
  println((1 to 9999).map(n => islychrel(n)).sum)
}

Nothing much to see here, other than that I took advantage of a couple of nice features of Scala 2.8 while cleaning this up for posting.

The first is the use of a default argument to islychrel so that when I call it non-recursively, I don’t have to manually specify 0 for the 0th iteration. I do still have to explicitly write n => islychrel(n) because I’m taking advantage of the implicit conversion to BigInt. In theory I could do map(BigInt.apply).map(islychrel) but, not surprisingly, that causes Scala’s implicit argument support to do le freak out.

The other nice 2.8ism I’m taking advantage of is that RichString.take and RichString.takeRight both return String, so where I used to be calling mkString to get the faster string comparison, I no longer need to do so. Yay!

Posted to euler by mdb at 7:29 pm | Comments (0)       

Euler 54

Problem 054:

object Euler54 extends EulerApp {
  class Card (s :String) {
    val suit :Char = s.charAt(1)
    val rank :Int = "23456789TJQKA".indexOf(s.charAt(0))
    def > (o :Card) = if (rank == o.rank) suit > o.suit else rank > o.rank
  }

  class Hand (c :Seq[Card]) {
    def groupCards (cards :List[Card]) :List[List[Card]] =
      if (cards.isEmpty) Nil else {
        val g = cards.takeWhile(_.rank == cards.head.rank)
        g :: groupCards(cards.drop(g.length))
      }

    val cards = c.toList.sortWith(_>_)
    val group = groupCards(cards).sortWith((l1, l2) =>
      (l1.length > l2.length) || (l1.length == l2.length && l1(0) > l2(0)))

    def isPair () = group.head.length == 2
    def isTwoPair () = group(0).length == 2 && group(1).length == 2
    def isThree () = group.head.length == 3
    def isStraight () = !(1 until 5).exists(ii => cards(ii).rank != cards.head.rank-ii)
    def isFlush () = !cards.tail.exists(_.suit != cards.head.suit)
    def isFull () = group.head.length == 3 && group.last.length == 2
    def isFour () = group.head.length == 4
    def isRoyal () = cards.head.rank == 13 && isStraight && isFlush

    val score = group.foldLeft(0)((b, a) => a.head.rank + b*0xF)

    def rank () =
      if (isRoyal) 0x900000
      else if (isFlush && isStraight) 0x800000 + cards.head.rank
      else if (isFour) 0x700000 + group.head.head.rank
      else if (isFull) 0x600000 + score
      else if (isFlush) 0x500000 + cards.head.rank
      else if (isStraight) 0x400000 + cards.head.rank
      else if (isThree) 0x300000 + score
      else if (isTwoPair) 0x200000 + score
      else if (isPair) 0x100000 + score
      else score
  }

  def toHands (line :String) = {
    val cards = line.split(' ').map(s => new Card(s))
    (new Hand(cards.take(5)), new Hand(cards.drop(5)))
  }

  println(readlines("poker.txt").map(toHands).filter(p => p._1.rank > p._2.rank).length)
}

This one is no pat one liner elegantly capturing some simple math. However, Scala sure does let us express the necessary fiddling to compare poker hands with a nice balance of readability and compactness.

Posted to euler by mdb at 6:44 pm | Comments (0)       

Euler 53

Problem 053:

object Euler53 extends Application {
  def fact (n :BigInt) :BigInt = if (n < 2) 1 else n * fact(n-1)
  def choose (n :BigInt, r :BigInt) = fact(n)/(fact(r)*fact(n-r))
  println((for (n <- 1 to 100; r <- 1 to n; if (choose(n, r) > 1000000)) yield 1).sum)
}

Another simple brute force solution. Enumerate and count. BigInt lets us avoid any serious thinking.

Posted to euler by mdb at 5:59 pm | Comments (0)       

Older »
MDB

Bits
Ludography
Reviewlets
Camera Wrestling
Archives:

samskivert.com ©2001 - 2009 Michael Bayne <mdb@samskivert.com>