Advice for Integrating Satellites

[updated 5/13/2002]

The following is a list of our foibles and missteps while integrating Sapphire for the shake test. We are being up front about our mistakes in the hope that others will read this, believe us, and avoid the same sorts of errors.

After all, we thought we were smart students and didn't need all the overbearing documentation that makes industry satellites so slow and expensive (Dinosaurs! Dinosaurs!). We didn't need other people telling us how to run our project. Here's where we could have used their advice: 
Recommendation Our "Lesson"
Get Spares of Everything
  • This is a general rule; I've never known our project to have "too many" of anything. Extra tools always come in handy if you want several people to dismantle the side panels. Extra fasteners mean you're not scrounging around. Extra L-brackets mean that when one breaks (see below), you're not worried that you're going to have to make more.
Check every fastener and hole for fit and proper size/number of items
  • We had to build a second set of side panels for the shake test, because our solar panels were not finished. Unfortunately, they were a bit of a rush job and the holes did not line up exactly. I personally stripped and popped out a good half-dozen pem nuts from our flight structure in my attempts to tighten down the panels. When it's 9pm and the shake test is at 9am, and you start breaking flight hardware, let me tell you, the lab is not a fun place to be.
  • The second happy story concerns our CPU box, which had to be filed out a bit because its fasteners couldn't line up with the tray. Of course, we caught that problem at 7pm...
Drill all holes at the same time - especially if they're supposed to line up
  • This traumatic event happened to our solar cells, which were bonded to the panels and soldered together before we noticed that one of the holes was missing. On top of that, none of the holes were aligned with the top honeycomb panel - they were all fractions of an inch off. Two students, two files, one afternoon. Not a lot of fun.
Test the expected telemetry values - do this as soon as the electronics are wired up
  • Four months after establishing the telemetry list and reading the values, somebody finally realized that the THD sensors (our main payload) were misnumbered in their outputs. (Why? Well, the sensors don't do much that's interesting in the lab, so we didn't pay close attention to the signals. Moreover, we hadn't sat down with the PI and defined the expected telemetry output.)
  • Secondly, we realized six months into the functional flight hardware tests that nobody had checked some of the temperature sensors (they hadn't been integrated - though the signal conditioning electronics were on board). A miswiring had damaged one of the boards and we didn't catch it until after shake. (That has since been fixed, but it was a bit of a panic trying to figure out what went wrong.)
  • What this means is that even if the proper hardware is not in place, wire together some sort of kluge to verify that what you DO have is working properly. That way, you can keep an eye on it and save trouble when the full integration comes around.
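As a minimal sketch of what "testing the expected telemetry values" could look like in practice, the Python below compares one set of readings against a table of expected ranges. The channel names and limits are hypothetical placeholders, not Sapphire's actual telemetry list; real ranges would have to come from the PI and the hardware designers.

```python
# Hypothetical channel names and limits, for illustration only.
# Real values should be agreed on with the PI and the hardware designers.
EXPECTED = {
    "thd_1_volts": (0.5, 4.5),
    "cpu_temp_c": (10.0, 40.0),
    "bus_volts": (11.0, 13.0),
}


def check_telemetry(reading, expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means all OK."""
    problems = []
    for channel, (low, high) in expected.items():
        if channel not in reading:
            # A channel that never shows up may mean a missing wire.
            problems.append(f"{channel}: missing from telemetry")
            continue
        value = reading[channel]
        if not (low <= value <= high):
            problems.append(f"{channel}: {value} outside expected {low}..{high}")
    for channel in reading:
        if channel not in expected:
            # Nobody has defined what this channel should read yet.
            problems.append(f"{channel}: no expected range defined")
    return problems
```

Running a check like this each time the electronics are powered up would flag a missing channel or an out-of-range value immediately, rather than months later.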
Prominently post telemetry values and/or go over them with the pertinent people
  • This relates to how long it took for us to realize that the THDs were mistakenly labeled. In addition, the expected values are essential for troubleshooting problems, as the next example illustrates.
Start a binder with sample telemetry from every major test or reconfiguration
  • Not only did the THDs have the wrong outputs labeled, but somewhere between September 1995 and March 1996, one of the sensors broke. (Yes, this is our primary payload. Thanks for bringing that up.) We had gone through three thermal cycles and two vacuum tests in that time, but hadn't really checked the telemetry for long-term variations. (Why? Because, again, we hadn't sat down with the PI to get expected values.) And when we had taken data, we hadn't kept it, which means now we don't know when/how the sensor broke, so we'll have to do another vacuum test to make sure the sensors will work.
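An electronic version of the binder might look something like the sketch below: every test appends a labeled snapshot to a file that is never overwritten, and a small helper reports between which two tests a channel jumped. The file layout, function names, and tolerance are assumptions made for illustration, not a description of what the Sapphire team used.

```python
import json


def record_snapshot(path, test_name, when, values):
    """Append one telemetry snapshot as a JSON line; old data is never overwritten.

    `when` is any object with an isoformat() method, such as datetime.date.
    """
    entry = {"test": test_name, "date": when.isoformat(), "values": values}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_snapshots(path):
    """Read back every snapshot in the order it was recorded."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_changes(snapshots, channel, tolerance):
    """List (earlier_test, later_test, delta) wherever `channel` moved
    by more than `tolerance` between two consecutive snapshots."""
    with_channel = [s for s in snapshots if channel in s["values"]]
    changes = []
    for before, after in zip(with_channel, with_channel[1:]):
        delta = after["values"][channel] - before["values"][channel]
        if abs(delta) > tolerance:
            changes.append((before["test"], after["test"], delta))
    return changes
```

With a record like this, the question "between which two tests did the sensor break?" becomes a lookup instead of a reason to repeat a vacuum test.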
Double check interior space before adding components
  • A temperature sensor was epoxied onto the CPU in such an orientation that we couldn't close the box. We had to (carefully) scrape off the epoxy and move the sensor.
Have somebody else double-check your wiring connections
  • One of the wires that ran telemetry data from the Power tray to the CPU was omitted during our construction. Since we didn't have a list of expected values, and hadn't done a detailed telemetry analysis (are you detecting a theme?), this mistake wasn't caught for six months.
Confirm that the space-rated chips fit into the sockets you've created for them - this applies to all components, actually
  • We have had months of redesigns (and then a very time-consuming soldering job - thank you, Godwin Zhang!!!) because the space-rated chips donated to us weren't in the package that we expected.
Queue up jobs so that you can use expensive materials in one sitting
  • We have used three bags of thermal epoxy for about 1/4 bag's worth of epoxying because the parts weren't ready at the same time. We've wasted probably 2/3 of our conformal coat for the same reasons. I could tell similar stories about our other epoxies and RTV.
Finish whole boxes, whole boards
  • For our CPU, TNC and telemetry boards, we had all but one or two chips conditioned for flight (back to getting those space-rated chips to fit). It is not an easy task to just "go back" and touch up the boards once the chips are ready. We thought it would be; we were wrong. What happens is that you stall and wait for that piece to be done - or you let other parts of the project slide because you've got that piece of unfinished business. Closing up boxes and locking down trays builds excitement and momentum that is easily killed by waiting for "just one or two" parts.
Discretely group signals and connections coming off the tray
  • The simplest and often most effective debugging option is to check the pins coming off the connector. If you've got power and signals on the same connector with everything gooped up, it takes work to both power the component and check the output. Specifically, the temperature sensor on the Comm tray wasn't doing what was expected of it. I wanted to stick a voltmeter on the pins coming out, but power for the sensor was supplied through the same connector. I ended up having to scrape some of the RTV away to find contact points, which means I had to remember to go back and add RTV. (On our other trays, this is not an issue, since the power harness has a separate connector.)
A functional engineering model (with the same wiring), a CPU debug port, a power supply that measures current, and a scanner are all invaluable diagnostic tools
  • I cannot begin to describe how much time these items save, but I'll try. So many times we have swapped a questionable component onto our engineering model and been able to quickly diagnose a problem. So many times I have directly connected to the CPU to confirm whether the code was working - often it was shown to be a TNC connection problem. So many times I have been able to look at the current supplied to the satellite to confirm whether or not the camera is turning on or if the satellite has reset or (unfortunately) if there's a short. With the radio frequency scanner, we can listen in to the transmitter packets - and could have realized that we had set up our ground station incorrectly, corrected it, and continued the thermal cycle test. Instead, we went home, discovered the problem, and had to go back the next week to try again.
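The supply-current trick can be written down as a simple lookup, sketched below. Every current level here is a made-up placeholder in amps; the real bands would have to be measured on the engineering model, and this is only an illustration of the idea, not the procedure used on Sapphire.

```python
# All current levels are hypothetical placeholders, in amps.
# Measure the real bands on the engineering model before trusting any of them.
STATES = [
    (0.00, 0.02, "no draw: satellite off or harness disconnected"),
    (0.02, 0.30, "idle: CPU running, camera off"),
    (0.30, 0.60, "camera on"),
]
SHORT_THRESHOLD = 1.0  # assumed level above which a short is suspected


def classify_current(amps):
    """Map a supply-current reading onto a likely satellite state."""
    if amps >= SHORT_THRESHOLD:
        return "possible short: cut power and investigate"
    for low, high, label in STATES:
        if low <= amps < high:
            return label
    return "unrecognized level: compare against the engineering model"
```

Posting a table like `STATES` next to the power supply serves the same purpose as the code: anyone watching the meter can tell at a glance whether the camera actually turned on.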
Build with a modular architecture concept
  • The ease with which I can pull out the TNC, which is acting weirdly again, and put in our engineering model's TNC, which I can already confirm is working properly, to isolate problems is a tremendous troubleshooting tool. It saves time, every time. On top of that, we can go from separate boxes on the benchtop to the fully assembled satellite in about thirty minutes.

Lessons Learned, Part Two: Operational Experience

And if that list weren't enough, we made more mistakes on-orbit!
Recommendation Our "Lesson"
Have automatic, timely reboots of key components
  • This lesson was painfully learned in April/May 2002. Sapphire is talking (beacon is broadcasting, which indicates that most of the vehicle is working), but he isn't listening. We don't know if it's a problem with the TNC or the receiver or the receive antenna. And we have no way of knowing until the spacecraft reboots -- which won't happen until the software gets enough bit flips to crash (could be 1 week, could be 8 months). If there are components that are left on all the time, you should set some rules to reboot them. Maybe every 24 hours after the last contact, whatever works for your mission.
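The "reboot every 24 hours after the last contact" rule could be sketched as the small timer below. The class name, method names, and the 24-hour default are assumptions for illustration; on real flight hardware this logic would arguably belong in an independent timer rather than in the software it is supposed to protect.

```python
REBOOT_AFTER_SILENCE_S = 24 * 60 * 60  # assumed: 24 hours without a command


class ContactWatchdog:
    """Tracks time since the last ground command and says when to reboot."""

    def __init__(self, now_s, timeout_s=REBOOT_AFTER_SILENCE_S):
        self.timeout_s = timeout_s
        self.last_event_s = now_s

    def command_received(self, now_s):
        # Any valid command from the ground proves the receive chain works.
        self.last_event_s = now_s

    def should_reboot(self, now_s):
        # True once the satellite has heard nothing for the full timeout.
        return now_s - self.last_event_s >= self.timeout_s

    def reboot_done(self, now_s):
        # Restart the countdown so reboots repeat once per timeout period.
        self.last_event_s = now_s
```

A main loop would call `should_reboot()` periodically and power-cycle the receiver, TNC, or CPU when it returns true, so a deaf satellite gets another chance every day instead of waiting for a random crash.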
Have a Beacon
  • It takes days to weeks for the 'public' databases (NORAD/Goddard) to get a good, stable set of orbital elements for a new satellite. (For the first four weeks of flight, NORAD's orbital elements for Sapphire were actually for a different spacecraft.) Unless you want to wait a few days to get good contacts, you need some way for your satellite to say "here I am"! A low-power CW beacon works very well; you can manually zero your antenna in on the source. Sapphire's beacon only broadcasts 2-second blips every 60 seconds, which isn't quite enough time.
Use the beacon for health monitoring
  • If you ever have trouble contacting your spacecraft, it would be very helpful for your satellite to be sending you unprompted health information. If you're using amateur packet, the easiest way to do this is to change the TNC beacon message. (Also, Hams all over the world will gladly point their radio rigs at your spacecraft and e-mail you the results. It's free global tracking!) The Opal team did this and they were grateful for it during the first few weeks after launch when they couldn't get a decent radio contact. We use a CW beacon on Sapphire and it's much less useful.
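As a rough sketch of what a health-reporting beacon message might contain, the code below packs a few values into a short text line and parses it back on the ground. The field names, the format, and the placeholder callsign `N0CALL` are all invented for illustration; this is not the format Opal or Sapphire used.

```python
def format_health_beacon(callsign, uptime_s, bus_volts, cpu_temp_c, resets):
    """Build a short, human-readable beacon line from a few health values."""
    hours = uptime_s // 3600
    return (
        f"{callsign} UP={hours}h V={bus_volts:.1f} "
        f"T={cpu_temp_c:+.0f}C RST={resets}"
    )


def parse_health_beacon(text):
    """Split a beacon line back into (callsign, {field: value-as-text})."""
    callsign, *fields = text.split()
    return callsign, dict(field.split("=", 1) for field in fields)
```

For example, `format_health_beacon("N0CALL", 7300, 12.34, 21.6, 3)` gives `N0CALL UP=2h V=12.3 T=+22C RST=3`. A plain-text line like this can be read by any Ham who receives it, without special decoding software.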
Operate on the ground as you will in orbit
  • This is something that we got right. We used the SSDL ground station for most of our pre-flight testing. Our first contacts after launch went smoothly in part because we had already made all of our communications mistakes before flight. Imagine you are having trouble contacting, but you don't know if it's an antenna fault, a radio misconfiguration, a TNC misconfiguration, or a problem with the satellite! (An oft-proven rule with Sapphire is that 90% of all communications errors are a problem with the ground equipment; in fact, until the April 2002 anomaly, there were no vehicle communications errors!)
Establish basic software capabilities and don't change them
  • Software engineers hate "feature creep", where the users keep changing their minds about the software requirements. It is one of the leading causes of software errors -- because the programmers have to patch/adjust the architecture, often at the last minute. So agree upon a bare-bones functionality and allow that to be hard-coded and tested thoroughly -- and make this your 'boot code', the code that runs after a CPU reset.
Make it easy to change flight software
  • Yes, this seems to be a contradiction to the previous lesson. But as the mission progresses and as the operators become familiar with the system, they will want to do new things. If you have a small and useable boot code, then you have more freedom to make adjustments. Trust me, you'll want to make adjustments; now that Sapphire is in orbit, there are some very basic capabilities we would love to have -- and we have no way to implement them.