Software in Space (SE Radio)

In episode 100 of Software Engineering Radio, Hans-Joachim Popp, CIO at DLR, talks about software in space. Here is the summary.

Introduction

Today, software plays a major role in the construction of spacecraft. It started in 1964, but it took several years and several catastrophes to change minds about how to develop mission-critical software for such a high-risk environment.

Space disasters are caused not only by programming faults but also by severe faults in requirements engineering and in the analysis of the whole system. Here are some examples:

1. Ariane 5

Ariane 5 crashed because of a missing check on a numeric conversion (a 64-bit floating-point value converted to a 16-bit integer); it was an arithmetic overflow that cost about 1 billion Euros. A few lines of code could have prevented it. That crash was the turning point at which German software developers in the spacecraft industry began to think differently about quality assurance in software development.
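A range check before a narrowing conversion is the sort of "few lines" that guards against this kind of overflow. The following is a minimal sketch in C, not the Ariane code: the function name `to_int16_checked` and the 16-bit target type are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a double to int16_t only if the value fits.
 * Returns true and writes *out on success. Returns false and leaves
 * *out untouched if the value is NaN or lies outside
 * [INT16_MIN, INT16_MAX]. */
bool to_int16_checked(double value, int16_t *out)
{
    /* Any comparison with NaN is false, so NaN is rejected here too. */
    if (!(value >= INT16_MIN && value <= INT16_MAX)) {
        return false;
    }
    *out = (int16_t)value; /* in range: the cast truncates toward zero */
    return true;
}
```

A caller would treat a `false` return as a fault to handle explicitly, for example by falling back to a safe default, instead of letting the conversion overflow silently.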

2. Mars Climate Orbiter

The Mars Climate Orbiter had calculation errors and, instead of entering a proper orbit, was lost at Mars. The cause was mixed units of measurement: one team calculated in imperial units and the other in metric units. This would be no surprise if the teams had been in different countries, but both were in the US. It was due to missing testing environments.
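One way to make a unit mix-up visible at compile time is to give each unit its own type, so that a value in one unit cannot be passed where another is expected without an explicit conversion. This is a hedged sketch in C, not the orbiter's software: the types `Meters` and `Yards` and both functions are hypothetical names, and length is used only as a simple example of a quantity.

```c
#include <assert.h>

/* Distinct struct types: passing Yards where Meters is expected is a
 * compile-time type error instead of a silent numeric mistake. */
typedef struct { double value; } Meters;
typedef struct { double value; } Yards;

/* The only way to get from Yards to Meters.
 * One yard is defined as exactly 0.9144 m. */
Meters yards_to_meters(Yards y)
{
    Meters m = { y.value * 0.9144 };
    return m;
}

/* Hypothetical consumer that accepts metric input only. */
double altitude_margin_m(Meters altitude, Meters minimum_safe)
{
    return altitude.value - minimum_safe.value;
}
```

Calling `altitude_margin_m` with a `Yards` argument does not compile, which forces the conversion to happen in one reviewed place. Type checks like this complement, but do not replace, an end-to-end test environment.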

To avoid such faults, we should check the hardware pieces and simulate them in software one by one, so that we have a better test environment covering the complete system in both software and hardware.
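Simulating a hardware piece in software usually means putting it behind an interface, so the same application code can run against the real device or a simulation. The sketch below shows one possible shape of that idea in C. All names (`Sensor`, `SimulatedSensor`, `samples_above_limit`) are hypothetical and not taken from DLR's code; the simulation assumes at least one sample is supplied.

```c
#include <assert.h>
#include <stddef.h>

/* Interface the application code uses to read a sensor. It can be
 * backed by a real driver or by a software simulation. */
typedef struct {
    double (*read)(void *ctx);
    void *ctx;
} Sensor;

/* Simulated sensor that replays a fixed series of readings.
 * Assumes count >= 1. */
typedef struct {
    const double *samples;
    size_t count;
    size_t next;
} SimulatedSensor;

static double simulated_read(void *ctx)
{
    SimulatedSensor *sim = ctx;
    double v = sim->samples[sim->next];
    if (sim->next + 1 < sim->count) {
        sim->next++;            /* advance; hold the last sample at the end */
    }
    return v;
}

Sensor make_simulated_sensor(SimulatedSensor *sim)
{
    Sensor s = { simulated_read, sim };
    return s;
}

/* Code under test: it knows only the Sensor interface.
 * Takes n readings and counts how many exceed the limit. */
int samples_above_limit(Sensor s, size_t n, double limit)
{
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (s.read(s.ctx) > limit) {
            hits++;
        }
    }
    return hits;
}
```

With this split, a recorded or synthetic series of readings can be replayed through the same logic that would later run against the real hardware.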

How to Avoid Problems

Popp talks about space software development at DLR:

1. Process

  • Some projects have different teams from different European countries with different languages and cultures. So we have to apply really strict methods of cooperation.
  • We pay great attention to requirements engineering, as most of the problems we had came from missing requirements.
  • Our software development process is the standard V-Model with some additional steps in between. After the Architecture Design phase, we have a Detailed Design phase where we do very low-level unit testing against the detailed design.
  • Missions launch before we reach the end of the development because it might take years to finalize software. So we start the mission as soon as we have the basic software in place. After launch, we continue to learn on the way and update even core units of the software.

    Today, 30 years after launch, one of our craft is still speaking to us. This would not have been possible without major changes to the code. The software was written entirely in assembly and the memory was only 4k. The craft takes images and stores them on tape using a tape recorder.

2. Tools

  • We use specialized operating systems that have been used and tested for many years.
  • Most software is written in pure C but the system and the testing environment restrict the use of certain instructions.
  • We don't use code generation; instead, we invest about half an hour in each line of code.
  • The ratio between application code and test code is 1 to 12.
  • For code verification, we use PolySpace.
  • For configuration management, we use ClearCase.
  • We use automatic code reviews and automatic testing.
  • Automated tools can find only a small set of errors; most errors are still at the requirements level (the semantic level). You still have to review the whole system and have a test environment where you play the whole mission through to see what happens.

3. Practices

  • The most important thing is to establish the culture of being free to make faults, to find them and to talk about them. Many catastrophes occurred because this was missing; most of the errors were known, but the management just closed their eyes.
  • Such mission-critical systems need redundancy in software, like having software parts written in different languages that control each other.
  • We regularly review the code in teams where developers explain the code to others who are not involved, junior developers ask questions, etc. This is very expensive and time consuming but it leads to a very high quality.
  • We measure quality by tracking how many faults we made, how many errors we found, and the reasons they occurred.
  • Pure development is only 40% of our time; 25% is invested in management and system engineering, and 35% in design and integration testing.
  • We measured our productivity and found that, at this high level of quality, we can produce only 0.6 lines of code per hour.
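The redundancy practice above can be illustrated with two independently written implementations whose results are cross-checked before use. The talk describes parts written in different languages; the sketch below keeps both versions in C for brevity, which is a simplification. The function names and the choice of computing a mean are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Version A: sum all values, then divide. */
static double mean_a(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Version B: incremental (running) mean, written independently. */
static double mean_b(const double *x, size_t n)
{
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean += (x[i] - mean) / (double)(i + 1);
    }
    return mean;
}

/* True if the two results differ by no more than the tolerance. */
bool results_agree(double a, double b, double tolerance)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tolerance;
}

/* Runs both versions and writes *out only if they agree.
 * Returns false for empty input or on disagreement. */
bool checked_mean(const double *x, size_t n, double tolerance, double *out)
{
    if (n == 0) {
        return false;
    }
    double a = mean_a(x, n);
    double b = mean_b(x, n);
    if (!results_agree(a, b, tolerance)) {
        return false;
    }
    *out = a;
    return true;
}
```

A disagreement does not say which version is wrong, only that the result must not be trusted; what happens next (retry, fallback, safe mode) is a system-level decision outside this sketch.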

Summary

  • In such high-risk missions, you have to proceed conservatively, in slow steps.
  • Every member in the team should feel they are free to make, report and discuss faults.
  • Project managers have to be open to any remarks from developers.
  • We spend a hundred million Euros every year on software development. It sounds like too much, but it saves money in the end.
