Advice for Integrating Satellites [updated 5/13/2002]
The following is a list of the foibles and missteps we made while integrating
Sapphire for the shake test. We are being up front about our mistakes in
the hope that others will read this, believe us, and avoid the same sorts
of errors.
After all, we thought we were smart students and didn't need all the
overbearing documentation that makes industry satellites so slow and expensive
(Dinosaurs! Dinosaurs!). We didn't need other people telling us
how to run our project. Here's where we could have used their advice:
| Recommendation |
Our "Lesson" |
| Get Spares of Everything |
This is a general rule; I've never known our project to have "too many"
of anything. Extra tools always come in handy if you want several people
to dismantle the side panels. Extra fasteners mean you're not scrounging
around. Extra L-brackets mean that when one breaks (see below), you don't
have to worry about making more. |
| Check every fastener and hole for fit and proper size/number of items |
- We had to build a second set of side panels for the shake test, because
our solar panels were not finished. Unfortunately, the new panels were a bit of a
rush job, and the holes did not line up exactly. I personally stripped and
popped out a good half-dozen PEM nuts from our flight structure trying
to tighten down the panels. When it's 9pm, the shake test is at 9am,
and you start breaking flight hardware, let me tell you, the lab is not
a fun place to be.
- The second happy story concerns our CPU box, which had to be filed out
a bit because its fasteners couldn't line up with the tray. Of course,
we caught that problem at 7pm...
|
| Drill all holes at the same time - especially if they're supposed to
line up |
We learned this one on our solar panels: the cells were bonded
to the panels and soldered together before we noticed that
one of the holes was missing. On top of that, none of the holes were aligned
with the top honeycomb panel - they were all fractions of an inch off.
Two students, two files, one afternoon. Not a lot of fun. |
| Test the expected telemetry values - do this as soon as the electronics
are wired up |
- Four months after establishing the telemetry list and reading the values,
somebody finally realized that the outputs of the THD sensors (our main payload)
were misnumbered. (Why? Well, the sensors don't do much that's
interesting in the lab, so we didn't pay close attention to the signals.
Moreover, we hadn't sat down with the PI and defined the expected telemetry
output.)
- Second, six months into the functional tests of the flight hardware, we realized
that nobody had checked some of the temperature sensors (the sensors hadn't
been integrated, though their signal conditioning electronics were on board).
A miswiring had damaged one of the boards, and we didn't catch it until
after shake. (That has since been fixed, but it was a bit of a panic trying
to figure out what went wrong.)
- What this means is that even if the proper hardware is not in place, you should wire
together some sort of kluge to verify that what you DO have is working
properly. That way, you can keep an eye on it and save trouble when
full integration comes around.
|
| Prominently post telemetry values and/or go over them with the pertinent
people |
This relates to how long it took us to realize that
the THDs were mistakenly labeled. In addition, the expected values are
essential for troubleshooting problems, as the next example illustrates. |
| Start a binder with sample telemetry from every major test or reconfiguration |
Not only did the THDs have the wrong outputs labeled, but
somewhere between September 1995 and March 1996, one of the sensors broke.
(Yes, this is our primary payload. Thanks for bringing that up.) We had
gone through three thermal cycles and two vacuum tests in that time, but
hadn't really checked the telemetry for long-term variations. (Why? Because,
again, we hadn't sat down with the PI to get expected values.) And when
we had taken data, we hadn't kept it, so we now don't know when or how
the sensor broke, and we'll have to do another vacuum test to make
sure the sensors will work. |
| Double check interior space before adding components |
A temperature sensor was epoxied onto the CPU in such an
orientation that we couldn't close the box. We had to (carefully) scrape
off the epoxy and move the sensor. |
| Have somebody else double-check your wiring connections |
One of the wires that ran telemetry data from the Power
tray to the CPU was omitted during our construction. Since we didn't have
a list of expected values, and hadn't done a detailed telemetry analysis
(are you detecting a theme?), this mistake wasn't caught for six months. |
| Confirm that the space-rated chips fit into the sockets you've created
for them - this applies to all components, actually |
We have had months of redesigns (and then a very
time-consuming soldering job - thank you, Godwin Zhang!!!) because the
space-rated chips donated to us weren't in the package that we expected. |
| Queue up jobs so that you can use expensive materials at one sitting. |
We have used three bags of thermal epoxy for about 1/4
bag's worth of epoxying because the parts weren't ready at the same time.
We've probably wasted 2/3 of our conformal coat for the same reason. I
could tell similar stories about our other epoxies and RTV. |
| Finish whole boxes, whole boards |
For our CPU, TNC and telemetry boards, we had all but one
or two chips conditioned for flight (back to getting those space-rated
chips to fit). It is not an easy task to just "go back" and touch
up a board once the last parts are ready. We thought it would be; we were wrong.
What happens is that you stall and wait for that piece to be done, or
you let other parts of the project slide because you've got that piece of
unfinished business. Closing up boxes and locking down trays builds excitement
and momentum that is easily killed by waiting for "just one or two" parts. |
| Discretely group signals and connections coming off the tray |
The simplest and often most effective debugging option
is to check the pins coming off the connector. If you've got power and
signals on the same connector with everything gooped up, it takes work
to both power the component and check the output. Specifically, the temperature
sensor on the Comm tray wasn't doing what was expected of it. I wanted
to stick a voltmeter on the pins coming out, but power for the sensor was
supplied through the same connector. I ended up having to scrape some of
the RTV away to find contact points, which means I had to remember to go
back and add RTV. (On our other trays, this is not an issue, since the
power harness has a separate connector.) |
| A functional engineering model (with the same wiring), a CPU debug
port, a power supply that measures current, and a scanner are all invaluable
diagnostic tools |
I cannot begin to describe how much time these items save,
but I'll try. So many times we have swapped a questionable component onto
our engineering model and been able to quickly diagnose a problem. So many
times I have directly connected to the CPU to confirm whether the code
was working - often it was shown to be a TNC connection problem. So many
times I have been able to look at the current supplied to the satellite
to confirm whether or not the camera is turning on or if the satellite
has reset or (unfortunately) if there's a short. With the radio frequency
scanner, we can listen in on the transmitted packets. During one thermal cycle
test, that would have told us that we had set up our ground station incorrectly,
and we could have corrected it and continued the test. Instead, we went home,
discovered the problem, and had to go back the next week to try again. |
| Build with a modular architecture concept |
Being able to pull out the TNC (which is acting weirdly
again) and swap in our engineering model's TNC, which I already know
is working properly, is a tremendous tool for isolating problems.
It saves time, every time. On top of that, we can go from separate
boxes on the benchtop to the fully assembled satellite in about thirty
minutes. |
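The telemetry lessons above (test expected values early, post them, keep sample data from every test) come down to keeping a written table of expected ranges and checking every test run against it. Below is a minimal Python sketch, assuming each telemetry frame arrives as a dictionary of channel values; the channel names, limits, and the helper names `check_frame` and `archive_frame` are hypothetical, not Sapphire's actual telemetry list.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExpectedRange:
    """Expected range for one telemetry channel, agreed on with the PI."""
    low: float
    high: float
    units: str


# Hypothetical expected-value table: substitute your own channel names
# and the limits you worked out with the PI.
EXPECTED = {
    "thd_1_volts": ExpectedRange(0.0, 5.0, "V"),
    "thd_2_volts": ExpectedRange(0.0, 5.0, "V"),
    "comm_tray_temp_c": ExpectedRange(-20.0, 50.0, "degC"),
    "bus_current_ma": ExpectedRange(100.0, 800.0, "mA"),
}


def check_frame(frame: dict[str, float]) -> list[str]:
    """Compare one telemetry frame against EXPECTED and list every problem.

    Flags channels that are missing, channels outside their expected range,
    and channels nobody has written an expected range for yet.
    """
    problems = []
    for name, rng in EXPECTED.items():
        if name not in frame:
            problems.append(f"{name}: missing")
        elif not (rng.low <= frame[name] <= rng.high):
            problems.append(
                f"{name}: {frame[name]} {rng.units} outside [{rng.low}, {rng.high}]"
            )
    for name in frame:
        if name not in EXPECTED:
            problems.append(f"{name}: no expected range on file")
    return problems


def archive_frame(path: str, test_name: str, frame: dict[str, float]) -> None:
    """Append a frame to a CSV 'binder' so later tests can be compared."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for name, value in sorted(frame.items()):
            writer.writerow([stamp, test_name, name, value])
```

Running a check like this after every test, and keeping the archived frames, is the kind of habit that could have flagged the misnumbered THD outputs and the missing telemetry wire much sooner, and shown roughly when the broken sensor failed.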
Lessons Learned, Part Two: Operational Experience
And if that list weren't enough, we made more mistakes on-orbit!
| Recommendation |
Our "Lesson" |
| Have automatic, timely reboots of key components |
This lesson was painfully learned in April/May 2002. Sapphire is talking (its beacon is broadcasting, which
indicates that most of the vehicle is working), but he isn't listening. We don't know if it's a problem with
the TNC, the receiver, or the receive antenna. And we have no way of knowing until the spacecraft reboots, which won't
happen until the software accumulates enough bit flips to crash (that could take 1 week or 8 months). If there are components
that are left on all the time, you should set some rules to reboot them: perhaps 24 hours after the last contact, or
whatever interval works for your mission. |
| Have a Beacon |
It takes days to weeks for the 'public' databases (NORAD/Goddard) to get a
good, stable set of orbital elements for a new satellite. (For the first four weeks of flight,
NORAD's orbital elements for Sapphire were actually for a different spacecraft.) Unless you want
to wait a few days to get good contacts, you need some way for your satellite to say "here I am"!
A low-power CW beacon works very well; you can manually zero in your antenna on the source. Sapphire's
beacon only broadcasts 2-second blips every 60 seconds, which isn't quite enough time to do that. |
| Use the beacon for health monitoring |
If you ever have trouble contacting your spacecraft, it would be very helpful for your satellite
to be sending you unprompted health information. If you're using amateur packet, the easiest way to
do this is to change the TNC beacon message. (Also, hams all over the world will gladly point their radio
rigs at your spacecraft and e-mail you the results. It's free global tracking!) The Opal team did this, and they
were grateful for it during the first few weeks after launch, when they couldn't get a decent radio contact.
We use a CW beacon on Sapphire, and it's much less useful. |
| Operate on the ground as you will in orbit |
This is something that we got right. We used the SSDL ground station for most of
our pre-flight testing. Our first contacts after launch went smoothly in part because
we had already made all of our communications mistakes before flight. Imagine having
trouble making contact without knowing whether it's an antenna fault, a radio misconfiguration,
a TNC misconfiguration, or a problem with the satellite! (An oft-proven rule with Sapphire
is that 90% of all communications errors are a problem with the ground equipment; in fact,
until the April 2002 anomaly, there were no vehicle communications errors!) |
| Establish basic software capabilities and don't change them |
Software engineers hate "feature creep", where the users keep changing
their minds about the software requirements. It is one of the leading causes of
software errors, because the programmers have to patch or adjust the architecture, often
at the last minute. So agree upon a bare-bones functionality, hard-code it, and test it
thoroughly. Make this your 'boot code', the code that runs
after a CPU reset.
|
| Make it easy to change flight software |
Yes, this seems to be a contradiction to the previous lesson. But as the mission
progresses and as the operators become familiar with the system, they will want to do
new things. If you keep your boot code small and usable, then you have more freedom
to make adjustments. Trust me, you'll want to make adjustments; now that Sapphire is
in orbit, there are some very basic capabilities we would love to have, and we have
no way to implement them. |
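The reboot and health-beacon lessons can be sketched in a few lines. This is a minimal Python sketch, not Sapphire's flight software: it assumes the 24-hour interval suggested above, times read from a monotonic onboard clock, and hypothetical names (`ContactWatchdog`, `health_beacon`, and the beacon field names).

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed policy: reboot after 24 hours with no ground contact.
REBOOT_AFTER_S = 24 * 60 * 60


@dataclass
class ContactWatchdog:
    """Decides when to power-cycle a component that is normally left on.

    Times are seconds from a monotonic onboard clock. The caller passes
    them in, so the same logic can be exercised on the ground.
    """
    last_contact_s: float
    timeout_s: float = REBOOT_AFTER_S

    def record_contact(self, now_s: float) -> None:
        """Call whenever a valid command is received from the ground."""
        self.last_contact_s = now_s

    def should_reboot(self, now_s: float) -> bool:
        """True once the timeout has passed without a contact."""
        return now_s - self.last_contact_s >= self.timeout_s

    def rebooted(self, now_s: float) -> None:
        """Restart the countdown so the component is cycled once per
        timeout period rather than on every pass of the main loop."""
        self.last_contact_s = now_s


def health_beacon(callsign: str, fields: dict[str, float]) -> str:
    """Build a short, human-readable beacon string from health values.

    Keeping it plain text means any ham station that copies the beacon
    can read it and e-mail it back.
    """
    body = " ".join(f"{key}={value:g}" for key, value in fields.items())
    return f"{callsign} {body}"
```

In a main loop, the flight software would call `record_contact()` on each valid uplink and, once `should_reboot()` returns true, power-cycle the TNC and receiver (a hardware-specific step not shown) and then call `rebooted()`. How the string from `health_beacon()` gets loaded as the TNC's beacon text depends on the TNC and is also not shown.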