Feb 06 2008

Firmware… as user code!

Published by matt under Uncategorized

NOTE: We will be unleashing the things you read about here on students at Olin College very soon. We’ll bundle up a release for general consumption (as well as the documentation that we alpha test on the Olin students) Real Soon Now(TM).


I’ve been experimenting with the new firmware for the Surveyor SRV-1b that Carl and Jon have done a lot of excellent work on. Here’s something wacky: what used to be the “firmware” can now be executed as “user code”.

I’ll get there in a few steps. Consider this piece of test code that Jon wrote:

PROC tests (CHAN BYTE in?, out!, CHAN P.LASER lasers!,
            CHAN P.LED leds!, CHAN P.MOTOR motors!)
  SEQ
    out.string("SRV-1 Test Program (of Doom)*n", 0, out!)
    out.string("Testing death lasers*n", 0, out!)
    test.lasers(lasers!)
    out.string("Testing less deadly LED*'s*n", 0, out!)
    test.leds(leds!)
    out.string("Testing harmless motors*n", 0, out!)
    test.motors(motors!)
:

This is the main process from Jon’s test program. The parameters coming into the process header are channels out to the environment. In this case, the environment is the Surveyor, so we have channels for the serial communications over the WiFi radio (those are the channels ‘in’ and ‘out’), and there are output channels for the lasers, the LEDs, and the motors. Each of these is named as you might expect. To run this on the Surveyor, we compile it, upload the bytecode, and things just go.

But what’s great is that this program, although it doesn’t do much, is not really different from the firmware that used to be written in C. The original firmware would handle commands from the SRV Console, and then spit images back, drive around, and do whatever else you commanded your mobile wireless camera platform to do. That is, all the default firmware did was respond to commands over a textual protocol. (There was a C interpreter, too. But my point is that you used a textual protocol to initiate all kinds of things.) Given that we have a channel of bytes representing the textual input coming over the radio, it seems like we could implement the old protocol completely in occam-pi.

And that’s what Carl and Jon have done. They’ve implemented what used to be firmware as a program that any user can now write and upload to the Surveyor. The program srv1.occ can be compiled, uploaded to the Surveyor, and executed as a user program, even though it is implementing the entire SRV-1 protocol. It is now a “user program” that implements what I’ve referred to as the “old firmware.” If we want to kill this “new firmware,” we issue a ‘!’, and it is shut down cleanly. Now, we can upload a newer version, or perhaps a completely different program.

This has its tradeoffs. For example, it means that my SRV-1 doesn’t wake up ready to send me images. On the other hand, it does mean that I can easily modify and extend the firmware, including extending the protocol or (because occam-pi is a parallel-safe language) running my own additional code along with the original firmware. I can filter the channel carrying the commands (perhaps ignoring every other request for an image, or drawing on the images to add data to them), and so on. Over time, we’ll probably end up refactoring the “firmware” into a bunch of reusable components that program authors can selectively include to get parts of the original firmware’s behavior in their own programs. (We’ll see… I haven’t given this much thought yet, but perhaps Carl and Jon have.)
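To make the filtering idea concrete, here is a minimal sketch, written in Python rather than occam-pi, with `queue.Queue` standing in for the occam-pi channels. The `I` byte for an image request and all of the function names are assumptions made for illustration, not taken from srv1.occ; only the `!` shutdown comes from the description above.

```python
import queue
import threading

IMAGE_REQUEST = b"I"  # assumption: stand-in byte for "send me an image"
SHUTDOWN = b"!"       # from the post: '!' shuts the program down cleanly


def filter_commands(inp, out):
    """Forward commands from inp to out, dropping every other image request."""
    image_requests_seen = 0
    while True:
        cmd = inp.get()                 # blocks, roughly like `in ? cmd`
        if cmd == IMAGE_REQUEST:
            image_requests_seen += 1
            if image_requests_seen % 2 == 0:
                continue                # swallow the 2nd, 4th, ... request
        out.put(cmd)                    # roughly like `out ! cmd`
        if cmd == SHUTDOWN:
            return                      # pass the shutdown on, then stop


def run_filter(commands):
    """Push commands through the filter, then a shutdown; return what came out."""
    inp, out = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=filter_commands, args=(inp, out))
    worker.start()
    for cmd in commands:
        inp.put(cmd)
    inp.put(SHUTDOWN)                   # make sure the filter terminates
    worker.join()
    results = []
    while not out.empty():
        results.append(out.get())
    return results
```

The filter knows nothing about what sits on either side of it, which is the point: in occam-pi the same process could be dropped between the radio and the unmodified protocol handler.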

This is still evolving rapidly, but it is an absolute joy to be able to easily modify my occam-pi code, send it over the WiFi to the Surveyor, and get completely new, low-level behavior without having to go through a lengthy reflashing of the bot over JTAG (or, worse, WiFi).

As can be seen above (sorta), I’ve begun to explore drawing on the images before shipping them from the SRV-1 back to the user. I believe this is important for students exploring robotic vision, as they need a way to indicate what they are looking for; drawing onto the image strikes me as a very simple way for their code to communicate that “this looks important!”. It might be that they’re doing edge detection, or blob finding, or any of a host of other things. My code doesn’t do anything exciting yet, but tomorrow is another day; more excitement will ensue.
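As a rough illustration of what “drawing on the image” can mean, the sketch below outlines a rectangle on a grayscale image held as a list of rows. It is written in Python, not occam-pi, and the image layout, the function name, and the parameters are all assumptions for the example; it is not the code running on the SRV-1.

```python
def draw_box(image, top, left, bottom, right, value=255):
    """Draw a one-pixel rectangle outline onto a grayscale image, in place.

    Assumes image is a list of equal-length rows and that
    0 <= top <= bottom < height and 0 <= left <= right < width.
    """
    # Top and bottom edges.
    for x in range(left, right + 1):
        image[top][x] = value
        image[bottom][x] = value
    # Left and right edges.
    for y in range(top, bottom + 1):
        image[y][left] = value
        image[y][right] = value
    return image
```

A blob finder, for example, could call something like this with the bounding box of whatever it found before the frame is sent back over the radio.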

Nicely done to the UK team! Wootness.


May 30 2007

The Joel Test

Published by matt under Uncategorized

Somewhere in the past, I took the Joel Test. I like this simple measure of a software project’s health, because it hits on the really important things with respect to quality and maintainability.

The Joel Test is:

  1. Do you use source control? Yes
  2. Can you make a build in one step? Yes
  3. Do you make daily builds? Yes
  4. Do you have a bug database? Yes
  5. Do you fix bugs before writing new code? No
  6. Do you have an up-to-date schedule? Yes
  7. Do you have a spec? No
  8. Do programmers have quiet working conditions? No
  9. Do you use the best tools money can buy? Yes(ish)
  10. Do you have testers? No
  11. Do new candidates write code during their interview? N/A
  12. Do you do hallway usability testing? Yes

This is better than a few years ago. On every commit, buildbot runs and makes sure we didn’t break anything (on multiple architectures). We don’t fix all the bugs before moving on, but some of those tickets in the database are things like “Write a new linker.” I’ve given us a “no”, but for actual, build-breaking, test-breaking bugs, things get fixed before we move on. We don’t work to a fixed schedule; instead, we tend to work towards features. So, I’ll claim we do have an up-to-date schedule. We don’t have a spec anymore, but we work to the best documentation available whenever possible; this may change in the future. We don’t have the best tools that money can buy, but we do the best with the freely available tools we can cobble together. And, we tend to chat about new features before implementing, which is as close to hallway usability testing as we can get when working on a virtual machine.

So, I’d give us a 7 out of 11; we don’t interview, so I’ll drop it from the list. I think that’s not bad… quite good, in fact, for a small, unfunded open-source project.


Apr 04 2007

Our Google TechTalk

Published by matt under Uncategorized

On March 29th, we gave our first Google TechTalk!

You can hit the link to go to the Google Video site. So far, we’ve had quite a few views, and apparently, people enjoy it (according to the ratings, anyway). We had a hard time targeting the talk; we rewrote it several times before giving it, as we couldn’t decide how best to approach the work we’ve been doing for the last four years. Do we dive straight into how we can use a language like occam-pi for abstracting over clusters, or how we can build safer, concurrent software on embedded platforms?

In the end, we gave an introduction to the language, and then focused a bit on how we can convert parallel models of software directly into code, and motivated this using Jon Simpson’s work with the subsumption architecture on small robotics platforms. We received many positive comments on the talk, and one person did say they thought it was light on “hard-core” detail. But what can you do in 45 minutes with an unknown audience?

All told, we gave a good talk, and were excited to be able to share our work with the Googlers who were there and with those now catching the video online. (And, for that matter, anyone else who wants to see what we had to say.)

Certainly, it was a lot of fun. And yes, the food in the cafeterias at Google really is that good.


Apr 02 2007

AAAI Robotics and Education 2007

Published by matt under Uncategorized

We spent the first three days of last week taking part in the AAAI 2007 Spring Symposium. In particular, we joined the Robotics and Education track with about 50 others interested in using robotics to teach everything from introductory programming to vision, pathfinding/planning, decision making, and all the other tasty bits of artificial intelligence.


[Images: the Scribbler and the LOGO turtle]

We saw some very cool projects and platforms. I really liked Roomba Pac-Man (where students program Roombas to wander hallways and vacuum up messes in a Pac-Man-like way), and the Scribbler (while not very aesthetically pleasing) is a cute re-creation of the LOGO turtle. (I’ve provided a side-by-side comparison of what these two robots look like; the original turtle was, I think, far more attractive.)

In terms of platforms, many people are doing “tethered” robotics, where they either physically or wirelessly tether their robot to a PC. This is, I think, unfortunate, as I feel it takes something away from the process of programming the robot: it becomes a remote-control process, as opposed to an autonomous one. Like many things, though, it depends on what your pedagogic goals are. I do look forward to a few things: I think the Qwerk (part of the TeRK project) is a neat platform, and we hope that we have a chance to work with it sometime in the future. It’s a powerful computational platform with a lot of nice outputs for motor and servo control, as well as managing a host of sensor inputs. Likewise, the Blackfin Handyboard is a very powerful platform, and will also be great to begin exploring, as it provides some very flexible programming options in the form of two Xilinx FPGAs. Fred and Andrew did a very nice job with the board design, and I expect many cool things will be done with this platform.

We also saw David Miller’s XBC/Gameboy combination, and I now have a Gameboy to try it out with (once I have a budget to get the rest of the bits, which is actually the expensive part). However, the really fun new platform was the Surveyor Robotics SRV-1.

Howard wrote about our experiments with the SRV-1 on his weblog, and I’ll add a bit here, and follow up in a later post with some more detail. We were given an SRV-1 to borrow on Monday evening, and did very little with it, as we were missing critical software. Tuesday, during coffee breaks and the like, Christian ported the Transterpreter to the SRV-1. In less than two days, we saw a new, ARM7-based robotics platform, and had the Transterpreter running subsumption code on it. In three days (that is, the day after the conference), Christian had vision working.

[Image: the winners of the AAAI Robotics and Education challenge]

I went ahead and stole a picture from Howard; here, you can see the end result of our hacking, which is that we won the AAAI Robotics and Education robotics programming challenge. We did this with the SRV-1 after having seen it for the first time on Monday, having ported our runtime to it on Tuesday, and having seen the challenge on Wednesday morning. We managed a (not-quite-working) 3-layer subsumption network that tackled the challenge in about 20 minutes of furious hacking; I’ll write in more detail about our code (which we’ve subsequently cleaned up and corrected) in a follow-up post.

For now, we’re still kicking around San Francisco and the Bay area, enjoying two days off before flying back to England on Wednesday.


Mar 23 2007

Is that a supercomputer in your pocket?

Published by matt under Uncategorized

Not too long ago, we made a few interesting changes to the Transterpreter.

The first was that we updated the Multiterpreter. Donkeys-months ago (like donkeys-years, but not quite as long) we modified the interpreter to map occam-pi processes to operating system processes. This meant that we could run code in true parallel on multi-CPU or multi-core machines. Unfortunately, the cost of process switching was so high that we couldn’t show this to anyone on the planet for fear of being shamed.

About two months ago, Adam and I sat down (I rode copilot) and rewrote that code to use POSIX threads instead of OS processes. In fact, we didn’t even do anything savvy, like use a thread pool; instead, we just create operating system threads when they’re needed, and reclaim them when they’re done. Not the most efficient way to parallelize the virtual machine, but we like to do things in the simplest way possible first. In this case, it had the smallest impact on the codebase, so we did it that way.

On my laptop (“Lyra”, a dual-core 2.0GHz Intel Core Duo MacBook), I ran some code that varies three things:

  1. The number of parallel processes being executed
  2. The amount of computation each process carries out
  3. The number of communications each process makes while computing (context switches)

In a multi-threaded or multi-core system, this will stress our run-time and tell us at what workload our parallel virtual machine is more efficient than running a single-threaded runtime on a single core. The core of the code comes from Kevin Vela’s doctoral thesis from the University of Kent (not available online).

PAR i = 0 FOR abyg
  CHAN INT chan:
  SEQ j = 0 FOR pleng
    PAR
      SEQ
        SEQ k = 0 FOR granu
          array[(granu * i) + k] := array[(granu * i) + k] + i
        chan ! i
      INT t:
      chan ? t
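For readers who don’t speak occam-pi, the same structure can be sketched in Python. This is a rough analogue and not the benchmark that produced the graphs: `queue.Queue` stands in for the channel (it is buffered, not a true rendezvous), the function names are made up, and my reading of `abyg` as the process count, `pleng` as the number of communications, and `granu` as the granularity is inferred from the code and the list above.

```python
import queue
import threading


def worker(i, array, pleng, granu):
    """One replicated-PAR branch: pleng rounds of 'compute, then communicate'."""
    chan = queue.Queue(maxsize=1)  # stand-in for `CHAN INT chan:`
    for _ in range(pleng):
        def compute_and_send():
            # granu units of work on this worker's private slice of the array.
            for k in range(granu):
                array[granu * i + k] += i
            chan.put(i)            # `chan ! i`

        sender = threading.Thread(target=compute_and_send)
        sender.start()             # inner PAR: sender runs alongside the receive
        _t = chan.get()            # `chan ? t`
        sender.join()


def run_benchmark(abyg, pleng, granu):
    """Run abyg workers in parallel and return the array they updated."""
    array = [0] * (abyg * granu)
    threads = [
        threading.Thread(target=worker, args=(i, array, pleng, granu))
        for i in range(abyg)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return array
```

Each worker touches only its own slice, so there is no shared-data race. In standard CPython builds the global interpreter lock keeps these threads from running bytecode in parallel, so this shows the shape of the workload and not the speedups discussed here.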

So, what did we find?

(Note that in each of these graphs I’ve varied both the y- and x-axis; this is bad reporting practice, and something I will correct when the tech report is put together. For now, these are the graphs I have, and I’m using them as-is.)

[Graph: Lyra (MacBook), time in seconds vs. work-packet granularity, by maximum number of POSIX threads]

To be clear, the different glyphs represent the maximum number of POSIX threads that were allowed to be running at the same time to support a single virtual machine. On the y-axis is the time in seconds needed to compute the replicated PAR shown above (averaged over two runs), and the x-axis is the granularity of the work packets—larger granularity means more work gets done before a context switch is allowed to take place.

On my MacBook (above), we had to have each thread do a significant amount of work (process a 65K element array in a non-trivial manner) with relatively small amounts of communication (long work periods) before we saw a speedup due to parallelism. My suspicion is that OS X is such a hungry operating system that it and the other application processes were constantly asking for attention, and therefore, the Transterpreter threads rarely had the opportunity to run for a significant amount of time.

So, to belabor this just a bit, we can see that with one thread of control (a single-threaded interpreter), we compute packets of granularity 1 approximately 20x faster than if we run a 2-threaded interpreter. On a two-core machine, this doesn’t necessarily make sense… unless we consider the possibility that the two POSIX threads are spending all of their time contending for an opportunity to run, in which case it makes sense that we’re not seeing any benefit of having a parallel runtime.

As the granularity approaches 50 (which we can convert into a specific number of VM instruction cycles, if we wanted to), we see that the single-threaded and dual-threaded interpreter run at almost the same speed. This is because we are no longer seeing the contention at the operating system level for multiple threads to be executing. Or, perhaps it is because we are no longer fighting the operating system for a chance to execute.

On Hadar, a SunFire 880 with four 750MHz UltraSparc III processors and 8GB of RAM, I found the results were… a bit odd:

[Graph: Hadar (SunFire 880), time in seconds vs. work-packet granularity, by maximum number of POSIX threads]

Ignoring a granularity of one for a moment (which is effectively measuring context switch time of the POSIX implementation on a given machine), we can see that a two- and four-threaded interpreter are always slower than a single-threaded interpreter. However, if we increase the number of threads the Transterpreter is allowed to spawn to 8 or 16, the Transterpreter is always faster for even small workloads than a single-threaded Transterpreter.

I have not fully investigated this yet; I believe this is because Solaris handles pthreads differently from other operating systems. In particular, it has a notion of virtual CPUs, and I think (but am not sure) that with too few threads, it assigns them all to a single virtual CPU, which is assigned to a single physical CPU. Therefore, even though we have four cores (most of which are idle), there is massive contention on a single CPU. When the OS sees more threads in play, it actually farms them out to multiple virtual (and physical) CPUs.

I have to read more on this (and have some excellent resources from Sun that I picked up while at SIGCSE), and do not doubt that this situation can be improved. Our initial testing was only over the course of a few days, and I need to investigate this further before turning the work into a tech report. (From exploration, to blog post, to tech report, to publication, I suppose.) In other words, I’m sure that Solaris isn’t completely borked—instead, I assume I have to do some more work to make sure it does what I want, instead of whatever the current default is.

The last system we did some tests on was Ninja, an Intel SR1200 server with two PIII 1.4GHz processors and 2GB of RAM running Debian GNU/Linux with a SMP kernel (rev. 2.6.17-2-686).

[Graph: Ninja (dual PIII Linux server), time in seconds vs. work-packet granularity, by maximum number of POSIX threads]

Ninja produced the curves I expected from the other two machines, actually. We see that a single-threaded Transterpreter is faster than a multi-threaded interpreter for very small work packets. However, with a granularity of 5, we see that a single-threaded, dual-threaded, and quad-threaded interpreter all run at roughly the same speed. From that point forward, it is better to have multiple threads in the interpreter than a single thread, as more work gets done in unit time. And, I believe that this easily represents “real world” work packet sizes.

What I don’t quite understand is why four threads is better than two for very small granularities. As with the other platforms, there is a good deal more investigation to be done before I understand the implementation of POSIX threads in a given operating system (OSX/BSD vs. Solaris vs. Debian GNU/Linux 2.6) and how that affects the execution of a parallel Transterpreter.

Why do I think this is cool?

The Transterpreter runs on the Texas Instruments MSP430 (an 8MHz, 10KB RAM embedded processor), the LEGO Mindstorms (a 16MHz embedded system with 32KB of RAM for both the interpreter and code), as well as my MacBook, Christian’s G4 Powerbook, our Linux dual-processor server, quad-processor Suns, and most likely any machine with a C compiler. The code I ran to test our “POSIXterpreter” does not exercise any features of the language that would not run on all of these platforms—put another way, the bytecode for my tests could be executed on the MSP430 without modification; we might blow the RAM if we make the replicated PAR too big, but that’s about it. That’s how portable the code is across Transterpreter instances right now.

Tyan recently announced their 40-processor “desktop supercomputer.” This is a set of five, dual quad-core (8-processor) blades in a box on wheels. So, you get five computers in a box, each with eight processors, and a max of 60GB of RAM (12GB per blade). And the price? $20,000.

Not bad.

I haven’t written it up yet, but the Transterpreter also does OpenMPI. That means that we can actually spread computation across machines in a cluster as well as across SMP cores. This is way alpha code, but I’ve demonstrated to the group that we can split things up on the Darwin H4 supercomputing cluster (a 2.4GHz Dell and a 3.2GHz Dell under adjacent desks on a 100 Mb network). Given a bit of time, this could become a first-class part of the Transterpreter release, and we’d have an excellent environment for controlling parallelism in a heterogeneous cluster environment.

Or, as the case may be, an excellent way of exploiting the resources of a 40-processor supercomputer-in-a-box. With five POSIXterpreters running large thread pools, adding some intelligent (batch) scheduling, and a more fully-featured set of MPI bindings, we’d have a seriously smart setup for doing parallel computing. All the pieces are there, but we don’t have the financial justification to dedicate the time to the development. So, we continue stealing a little bit of time here and there to see if we can demonstrate that this is a really powerful way to orchestrate the use of high-perf libraries (like BLAST, LAPACK, etc.) in a robust, safe, and semantically clear way in a truly parallel environment.


Jan 08 2007

An update for the year 2007

Published by matt under Uncategorized

The Transterpreter project was born three years ago around this time; it started life as a Scheme implementation of a Transputer bytecode interpreter, and has grown quite a bit since then. Given that we’re starting into 2007, it seemed like a good time to give an update about what is going on in the group. In no particular order:

  • Jon Simpson (third-year undergraduate at Kent) has been working on revitalizing the LEGO Mindstorms RCX port of the Transterpreter. The old version was built on top of BrickOS; we were, effectively, running two operating systems on the LEGO, one on top of the other! Jon is at the point where he has the Transterpreter running natively on the LEGO Mindstorms RCX, can handle button input (including a nifty, non-blocking debouncer), and is continuing to slowly build up to more useful end-user interface code. By March he expects to have a complete and usable (native) port of the Transterpreter on the RCX. (/branches/new-lego)

    Looking forward, Jon (and the rest of us) are excited about getting the NXT port of the Transterpreter rolling. We have already had some interest expressed from others about taking part in this effort; we’ll be generating some mail on that shortly.

  • Damian ended the year cleaning up his Cell port of the Transterpreter (/branches/cell), with particular attention being paid to developing a native, big-endian port of the Transterpreter (/branches/big-endian). Currently, we pay a penalty on all big-endian hardware, and Damian’s (and others’) efforts in this regard will give us full, native performance on big-endian platforms; this affects the PowerPC (Mac, Cell) and Mindstorms RCX, among others. Looking forward, he is coming back to 42 (our experimental compiler), and wants to work out the critical points in generating little/big endian code as well as “fat binaries”.

  • Adam Sampson picked up a few Arduino boards (http://www.arduino.cc/), which are driven by an Atmel ATmega8. This little chip has roughly 7KB of free flash and 1KB of RAM; it represents the smallest device the Transterpreter has been targeted at yet (/branches/arduino). While Adam’s intent was to develop an Arduino port of the Transterpreter, he says that “in the process I’ve done a load of cleanups which are probably more useful than the work I was intending to do.”

    The “cleanups” that Adam has made in the Arduino branch are (in his words):

    1. wrapper (currently untested) for the ATmega8

    2. instruction dispatching via switch, so all the instructions end up inlined, which cuts the code size considerably (and should make it a bit faster)

    3. some fixes to comments

    4. consistent ANSI C prototypes throughout

    5. add multiple-inclusion guards to all the headers

    6. make the word size code use inttypes.h and be configurable at compile time

    These are all nice cleanups that will ultimately be merged back into the trunk, and therefore benefit all of the Transterpreter ports.

    Adam’s explorations also involved cleaning up the “memory array” code (/branches/mem-array). When originally developing the TVM, we had a “virtual memory”, which made detecting a variety of faults easier; this was the original memory interface for the Transterpreter, but was dropped shortly after porting to C. Adam has cleaned this up as well, and (in his words, again):

    1. make the array memory backend actually work (mostly with the aim of being able to use a non-native word size, although I haven’t yet tested that)

    2. do bounds checking in mem_array so that the TVM stops (rather than segfaults) in the event of a bad memory access

    3. fix a number of bugs related to string handling and memory allocation which were revealed by Valgrind; we need to do a Valgrind run with some proper code (using dynamic memory) at some point to try and shake out more

    4. completely rework the code in stiw that builds the memory map, since it had a number of bugs and didn’t work with mem_array before

    5. fix a few bugs in interpreter revealed by the work above

    These are all excellent, and again will find their way back into the trunk. Indeed, as part of his explorations, Adam generated a number of tickets that I added to Trac (http://trac.transterpreter.org/) and that we can begin addressing in the interpreter, which is good.

  • Matt Jadud (that’s me) has been focusing on the MSP430 port, living in the Tmote Sky branch (/trunk/wrappers/tmotesky). As a side effect of this work (along with Jon Simpson), we developed ‘tinyswig’ (/trunk/scripts/tinyswig.scm), a mini, SWIG-like script that lets us quickly and easily write C extensions to the VM, callable from occam-pi. After this, we worked out the details of interacting with hardware directly from occam-pi, which (in some cases) completely eliminates the need to call out to C from occam. For example, in ‘/trunk/wrappers/tmotesky/Native’, you can find our initial explorations in this area, where we are bringing up the virtual machine with no connections to the hardware, and then configuring the MSP430 directly from occam-pi. We expect this to make implementing functionality for new embedded platforms much easier for the developer in some cases.

    Although it is unexciting infrastructure, I also began work on migration from Autotools to Scons (/branches/scons). Currently, it is possible to build the Transterpreter for Linux/Intel, MacOSX/Intel, and the MSP430 in this branch using Scons. The build scripts are, I believe, clearer and easier to extend, and should give us a better cross-platform build experience (e.g., Windows). Currently, we have to maintain a completely separate build system for Windows, which is untenable in the long run; with Scons, we have a chance of bringing things together nicely across all major development platforms. So, while unexciting, I consider this a rather important revision of tools and infrastructure. (Of course, I’m working on it, so you might expect I’d say that.)

    Along with the under-the-radar updates to the build system, I’ve nearly finished refactoring the linker; what had grown to 5000+ lines of Scheme has been reduced to roughly 1500. While the new slinker is not done yet, it is much simpler and much more modular. The new slinker was informed greatly by our explorations with 42—the underlying data structures are far more intelligent than they used to be, and allow us to do complex things very simply. It is, in a nutshell, a joy to work with (as source code), which I can no longer say about the original slinker.

    Looking forward, I’ll be working more with two new hardware development boards for the MSP430, exploring radio communications, analog-to-digital conversion, storage of data to external devices (SD, etc.), and most importantly, the integration of interrupts with the Transterpreter scheduler. The development boards are important, as they allow me to do in-system debugging via JTAG, which will be essential for the implementation of interrupt handling. This will, like many other things we do, impact all Transterpreter ports.

  • Christian Jacobsen has recently been sighted trying to set up a “build bot” for the Transterpreter project; this will automatically check out our code, build it, and run the test suite on a regular basis. However, he should not be doing this, but should be submitting the final version of his thesis. We expect that to go out the door ANY DAY NOW. I read it on my flight over to the USA, and believe-you-me, it was very, very exciting. Very.
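Two of Adam’s cleanups listed above, dispatching instructions through a single switch and bounds-checking the memory array so the VM stops cleanly on a bad access, can be illustrated with a toy interpreter. This is a sketch in Python, not the TVM’s C code: the opcodes, class names, and error type are all hypothetical, and the `if`/`elif` chain merely plays the role a C `switch` would.

```python
class TVMError(Exception):
    """Raised to stop the toy VM cleanly on a fault."""


class MemArray:
    """Word-addressed memory that checks every access against its bounds."""

    def __init__(self, words):
        self._cells = [0] * words

    def _check(self, addr):
        if not 0 <= addr < len(self._cells):
            raise TVMError(f"bad memory access at word {addr}")

    def read(self, addr):
        self._check(addr)
        return self._cells[addr]

    def write(self, addr, value):
        self._check(addr)
        self._cells[addr] = value


# Hypothetical opcodes for illustration; these are not Transputer bytecodes.
PUSH, STORE, LOAD, ADD, HALT = range(5)


def run(program, memory):
    """Execute (opcode, argument) pairs until HALT; return the operand stack."""
    stack = []
    pc = 0
    while True:
        if pc >= len(program):
            raise TVMError("ran off the end of the program")
        op, arg = program[pc]
        pc += 1
        # One central dispatch point, standing in for a C switch statement.
        if op == PUSH:
            stack.append(arg)
        elif op == STORE:
            memory.write(arg, stack.pop())
        elif op == LOAD:
            stack.append(memory.read(arg))
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == HALT:
            return stack
        else:
            raise TVMError(f"unknown opcode {op}")
```

A store to word 99 of a four-word memory raises `TVMError` here, which is the Python equivalent of the VM stopping instead of the host process segfaulting.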

If this sounds like we’re busy, it’s because we are. We’ve got a few other things up in the air related to our ongoing efforts, and will drop word of those as they see the light of day. If you have any questions about the project, or want to get involved, please feel free to drop me a note (matt at this domain).


Sep 28 2006

Cores, cores, and more damn cores

Published by matt under Uncategorized

Let’s look at two articles that recently hit the fan:

The first is a regurgitation of a presentation made by Intel to developers. In short, they’re considering a move to processors with 10s, if not 100s, of discrete computational units on a single die. The second article is a cry for Intel to stop, or slow down, using the argument that we “have to get the architecture right first.”

OK, so the architecture can slow us down. But what slows us down even more is the fact that most programming languages have no support for concurrency, and therefore most programmers think “concurrency” means “threads and spinlocks”. The two most interesting explorations we’ve made in the last few months are the Multiterpreter and the Cell Transterpreter.

The Multiterpreter

If you check out the full Transterpreter project from our Subversion repository, you’ll see in the branches a multiterpreter branch. This exploration was a quick proof-of-concept that Christian and I poked at for a few days. While the threading is terribly inefficient, it demonstrated that we can spawn multiple OS threads on-demand from the Transterpreter runtime, and distribute computation across those threads without any modification in the source program. Put simply, if you write:


PAR
  process1()
  process2()
  process3()
  process4()

on a quad-processor machine, then all four processors will be utilized by Transterpreter instances. We haven’t publicized this branch because the approach was the least efficient implementation possible, but simplest to implement. Our explorations into native code generation are, in part, an exploration of what parts of the code-base would need to be refactored to allow efficient multithreading on big machines.
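The on-demand threading described here can be sketched in a few lines of Python. The `par` helper below is a made-up name and only an analogue of what the multiterpreter branch does inside the runtime: every branch of the PAR gets a fresh OS thread, and the PAR as a whole finishes only when all of its branches have.

```python
import threading


def par(*processes):
    """Run each callable in its own OS thread; return once all have finished."""
    threads = [threading.Thread(target=p) for p in processes]
    for t in threads:
        t.start()   # one new thread per branch, created on demand
    for t in threads:
        t.join()    # PAR terminates only when every branch has terminated
```

Calling `par(process1, process2, process3, process4)` would mirror the occam-pi above. Creating and joining a thread per branch is simple, and that simplicity is exactly why it is also the least efficient option; a pool of long-lived threads would avoid paying the creation cost on every PAR.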

The Cell Transterpreter

The other branch that might be of interest is the Cell branch. The work being carried out here is described fully in the paper A Cell Transterpreter. Damian Dimmich has successfully run 9 Transterpreters on a single Cell Broadband Engine. Although we only have the Cell simulator, he has demonstrated that you can have multiple Transterpreters running, in parallel, all over the device, and preserve CSP channel semantics across the cores. Put another way, his proof-of-concept demonstrates that we can write occam-pi programs that work seamlessly across multiple, heterogeneous cores.

Bring on the cores

As we look to unify various compiler explorations within the group, we expect to target these kinds of platforms more directly. If your language makes it easy to express ideas in parallel, it should be easy for the compiler to automatically distribute work units to many different processors. In the case of occam-pi, this is absolutely the case… so if we have a processor with 8, 80, or 800 cores, we should be able to take advantage of that power without significant effort.

Certainly, we can take advantage of this kind of hardware far more easily than a programmer writing in a sequential language like C, C++, C#, or Java.


Aug 08 2006

Hardware bringup on the NXT

Published by matt under Uncategorized

Porting the Transterpreter is usually straightforward. You check the sources out, configure your cross-compiler, and go. Some twiddling with ‘make’ is usually the biggest problem we run into.

Porting to the NXT will be slightly less straightforward.

[Figure: NXT block diagram]

The core Transterpreter will cross-compile to the ARM7 without any problems. However, there’s a little problem of bringing up the hardware on the NXT that must be addressed. From my experience working on the Tmote Sky, the OS developer needs to check and set a significant number of pins to bring up each piece of hardware. These bit-twiddling operations are tricky; for this reason, it is likely that we’ll want to get a JTAG reader for the Mac and be able to inspect the hardware while developing.

The Atmel AVR is likely to be the first device we bring up; it gives us access to the sensors, motors, and buttons on the device. Once we have rudimentary access to the NXT’s sensors and motors, the simplest way to get feedback from our programs will be to implement sound. In increasing order of difficulty, I suspect the display, the USB port, and Bluetooth radio will follow. Once we get the low-level bindings in place, however, we should be able to write the majority of the NXT port in occam-pi or 42, which will make development a good deal more pleasant.
