August 23 [updated: 08/28]

This week I did lots more testing and debugging, and gave my final presentation to the group!

Ethernet vs Serial

The supposedly “different” issues I identified last week (Arduino data looks like garbage without the instrumentation amplifier; the Ethernet and Serial data don’t match) ended up being caused by the same problem.

Last week I thought the crappy data was caused by sampling too fast — it turns out that was not the case! I was rather relieved to find that the entire project wasn’t a failure… Phew!

I tested various variables to find the cause. My first thought was that perhaps packets were being dropped, which could explain the blips in the Ethernet data compared to the Serial data. While experimenting, I saw things like this:


I was packing the voltage readings into UDP packets that were 8KB in size, which UDP permits. The Ethernet MTU, though, is 1500 bytes, so my packets were getting fragmented into smaller chunks. Seems reasonable… yet the fragmented packets did not have the correct flags set when examined with Wireshark. It turns out that the W5100 chip does not support IP fragmentation. It’s even listed on the datasheet (page 5) as a feature. Awesome.
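For reference, here is how the 1500-byte limit translates into a UDP payload budget: subtracting the 20-byte IPv4 header (assuming no IP options) and the 8-byte UDP header leaves 1472 bytes per datagram. Below is a minimal Python sketch of splitting a buffer of readings so that no datagram needs fragmentation. The function name and the 2-bytes-per-reading packing are assumptions for illustration, not my Arduino code.

```python
MTU = 1500            # standard Ethernet MTU, in bytes
IPV4_HEADER = 20      # minimal IPv4 header (no options)
UDP_HEADER = 8
MAX_UDP_PAYLOAD = MTU - IPV4_HEADER - UDP_HEADER  # 1472 bytes


def split_into_datagrams(payload, bytes_per_sample=2):
    """Split a byte buffer into pieces that each fit in one unfragmented UDP datagram.

    The piece size is rounded down to a whole number of samples so that no
    reading is split across two packets (assumes fixed-size samples).
    """
    piece = MAX_UDP_PAYLOAD - (MAX_UDP_PAYLOAD % bytes_per_sample)
    return [payload[i:i + piece] for i in range(0, len(payload), piece)]


# An 8 KB buffer, like the packets I was originally sending, becomes 6 datagrams.
buf = bytes(8192)
parts = split_into_datagrams(buf)
print(len(parts), [len(p) for p in parts])
```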

I verified this was the problem by instead making UDP packets that were around 1000 bytes in size. This data looked fine!

[ Who wins the bet, then? 🙂 ]


So… given that I had to limit my UDP packets to under 1500 bytes, how much would the overall sampling rate and throughput suffer? The answer surprised me: not at all. I decreased the UDP packet size to around 1200 bytes, benchmarked, and got the same speed as before.

This means that I could now redo the sine wave test and get data that didn’t look like garbage. It also made me think about the DAQ vs Arduino test I did a few weeks ago: if the same thing was happening then (and I don’t see why it wouldn’t have been), redoing that test would likely improve the Arduino results too. Unfortunately, Shane had already packed up the computer I tested on, so it was impossible to redo the test. ):


I worked on the slides for my final presentation and presented on Wednesday over Google Hangout.

Sine wave test

The goal of this test was to more thoroughly compare the Arduino data to the DAQ. I would generate a sine wave on the signal generator, match the Arduino and DAQ’s sampling rates as before, and capture with both at once.

I would then strip the DAQ’s data of 6 bits of precision — the DAQ has a 16-bit ADC and the Arduino’s is 10-bit. I would compute an initial alignment and shift one of the traces over so they were aligned. The final step would be to align each 1-second chunk of data and compute the sum of squared differences.
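The three steps above (drop precision, align, compute the SSD per chunk) can be sketched in Python. This is an illustrative sketch, not the Matlab code I actually used: the brute-force lag search, the `max_lag` parameter, and all function names are assumptions, and it assumes both traces are raw integer ADC counts at the same sampling rate, with the second trace lagging the first.

```python
def to_10_bit(daq_counts):
    """Drop the 6 least significant bits of unsigned 16-bit readings (16 - 6 = 10)."""
    return [c >> 6 for c in daq_counts]


def ssd(a, b):
    """Sum of squared differences over the overlapping part of two traces."""
    n = min(len(a), len(b))
    return sum((a[i] - b[i]) ** 2 for i in range(n))


def best_shift(ref, other, max_lag):
    """Return the delay in [0, max_lag] of `other` that gives the lowest SSD.

    Shifting is done by slicing, so samples fall off the front of `other`
    instead of wrapping around. Only one direction of delay is searched.
    """
    return min(range(max_lag + 1), key=lambda lag: ssd(ref, other[lag:]))


def chunk_ssds(ref, other, chunk_len, max_lag):
    """Align each chunk separately and return (chunk number, SSD, shift) rows."""
    rows = []
    for k, start in enumerate(range(0, len(ref) - chunk_len + 1, chunk_len), 1):
        ref_chunk = ref[start:start + chunk_len]
        window = other[start:start + chunk_len + max_lag]
        lag = best_shift(ref_chunk, window, max_lag)
        rows.append((k, ssd(ref_chunk, window[lag:]), lag))
    return rows


# Synthetic check: a sawtooth and a copy of it delayed by 7 samples.
ref = [(i % 50) * 20 for i in range(600)]
delayed = [0] * 7 + ref
for row in chunk_ssds(ref, delayed, chunk_len=100, max_lag=20):
    print(row)  # every chunk reports SSD 0 at shift 7
```

For real data, `chunk_len` would be one second of samples, and `max_lag` has to be large enough to cover the drift between the traces.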

Here is a table of the results. The third column is the shift in samples; its units are 1/35635 of a second.

Chunk number   SSD      Shifted by (samples)
1              63.702   1
2              59.046   1
3              55.146   1
4              53.039   1
5              50.901   1
6              50.044   1
7              50.077   22
8              49.73    50
9              50.164   77
10             49.808   107
11             49.501   135
…              …        …
273            32.192   8019
274            31.697   8045
275            32.039   8079
276            31.567   8105
277            31.715   8137
278            31.117   8164
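To put the shift column into perspective, it can be converted to time. Here is a trivial Python helper, assuming the 35635 Hz sampling rate implied by the units stated above (the function name is just for illustration):

```python
SAMPLE_RATE_HZ = 35635  # one unit in the "Shifted by" column is 1/35635 s


def shift_to_ms(shift_samples):
    """Convert a shift counted in samples into milliseconds."""
    return 1000.0 * shift_samples / SAMPLE_RATE_HZ


for chunk, shift in [(1, 1), (11, 135), (278, 8164)]:  # rows from the table
    print(f"chunk {chunk:>3}: {shift:>4} samples = {shift_to_ms(shift):7.3f} ms")
```

By chunk 278 the offset is about 229 ms. Between chunks 11 and 278 the shift grows by roughly 30 samples per 1-second chunk.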

Here is the last chunk, aligned using the offset my code reported, to verify that the alignment worked (green = Arduino, blue = DAQ). Unfortunately, it seems that the Arduino sine wave has a slightly larger amplitude:

Last chunk

Some observations about the data:

– The SSD generally decreases for each subsequent chunk, and the offset increases. A possible explanation is that because of the larger amplitude in the Arduino sine wave, they are initially aligned well but become less and less aligned as time goes by, and must be shifted more in order to be aligned again.

– When I shift one of the traces to try aligning it, I use circshift, since there doesn’t seem to be a non-circular shift in Matlab. This means I must delete the data that “wrapped around” to the front of the time series. Thus, the larger the shift, the more data is cut off, and the less data there is to compute the SSD with. This may explain why the SSD decreases for each chunk.

– What does this really tell us about how similar they are? I’m not sure how to evaluate SSD… The highest SSD I got was around 60. Is that good? I’m not sure. One thing I learned is that for some reason, the Arduino sine wave has a larger amplitude. This is something to investigate further.
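On the circshift point: one way to avoid creating (and then deleting) wrapped-around data is to shift by slicing, so that only the overlap is kept. Below is a minimal sketch in Python rather than Matlab; the function name is hypothetical. In Matlab the same idea is plain indexing, for example comparing `a(1:end-k)` with `b(1+k:end)` for equal-length vectors.

```python
def overlap_after_shift(a, b, k):
    """Return the aligned, overlapping parts of a and b when b lags a by k samples.

    Same result as circularly shifting b left by k and then deleting the k
    samples that wrapped around to the end, but without creating them.
    Assumes 0 <= k < len(b).
    """
    n = min(len(a), len(b) - k)
    return a[:n], b[k:k + n]


a = [1, 2, 3, 4, 5]
b = [9, 9, 1, 2, 3, 4, 5]  # the same signal, delayed by 2 samples
print(overlap_after_shift(a, b, 2))  # ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

Either way, the overlap still shrinks as the shift grows, so the number of samples behind each SSD value should be reported alongside it.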

Arduino vs DAQ test: redo

I couldn’t completely redo this test, though I would have liked to now that I’ve fixed the issue with the Ethernet sending code.

I could, however, strip the DAQ data of 6 bits of precision like I did for the sine wave test, and then re-run the data through Weka. [updated] Here are the results:

DAQ, 10-bit

             BayesNet   3NN      RandomForest
Accuracy:    95.91%     97.49%   97.52%
Precision:   98.00%     98.00%   98.00%
Recall:      94.00%     97.00%   97.00%

DAQ, 16-bit

             BayesNet   3NN      RandomForest
Accuracy:    95.89%     97.59%   97.60%
Precision:   98.00%     98.00%   98.00%
Recall:      94.00%     97.00%   97.00%

There doesn’t seem to be much difference between these two sets of results. This leads me to believe that the performance gap between the DAQ and the Arduino is not caused by the lower ADC resolution, but rather by (1) the packet fragmentation bug I fixed and (2) noise from the instrumentation amplifier.

The end?
I guess this is the end of my summer at SPQR. Thanks to everyone for making this such a memorable experience 🙂 I had a lot of fun!


August 16

This week I was at HealthTech and I did some more testing of my code.


HealthTech was really great! I have never been to a conference or a workshop before, and it was really inspiring (but also intimidating) to be around so many smart people.

HealthTech '13!

I made some notes as I listened to the presenters.

– 50% of people surveyed about their habits of sharing medical information on social networks thought that sharing medical record data with the entire social network is an “extreme risk”. I would expect this percentage to be much higher! (paper)

– It costs between $1,500 and $15,000 to re-image a medical device. There is currently no real way to assess whether a device needs to be re-imaged, e.g. is this malware infection actually dangerous or not? Would it be worse to take the device offline for some amount of time?

– Existing health IT systems are rigid and do not allow for using alternatives or equivalent actions (e.g. a patient is prescribed 1x 10mg of some medication a day, the pharmacy only has 5mg pills). (paper)

Poster session

Before the posters were put up

WattsUpDoc

I talked with approx. 10 people at the poster session. There were some useful suggestions:

– Look into the board used in the “Skynet” paper presented at WOOT a few years ago

– Possible contacts with access to medical devices for testing

RIP Teensyduino

Shane and I tried to repeat last week’s Arduino test on the Teensy: collect data with the DAQ and the Teensy at the same time. Unfortunately, the Teensy was only giving us ADC values between 1 and 3, no matter what we did. Even touching it or rotating it (with nothing connected to the inputs) did not change these values. Previously, doing this caused visible fluctuations in the value because of the floating input.

We concluded that we must have fried the ADC somehow. This means we can’t test the data from the Teensy ): unless we buy another one…

More Arduino testing

We decided to do a more thorough test of the Arduino. The plan was to generate a sine wave with the signal generator and collect it with the DAQ and the Arduino at the same time. I would then align these and compute the sum of squared differences for each 1-second chunk to get a measure of how different the traces are.

This time, we connected the signal generator to the Arduino without the instrumentation amplifier, which we hadn’t done before. The results were extremely disappointing: the noise that the signal generator was introducing completely masked the fact that the Arduino was returning really crappy data! Instead of a smooth sine wave I observed really jagged, sawtooth-like edges. [image soon]

Debugging mode engaged, I tried changing various variables to see what was causing this. I finally discovered the reason: I was sampling it too fast. When I included use of the serial port whilst sampling, the result was a smooth sine wave.

I can’t express how glad I am that I failed to accomplish the primary goal of this project!

ugh

Arduino: Serial vs Ethernet

I did another test to make sure that the data I was sending over Ethernet was correct. I sent the data over serial and over Ethernet and then compared the two. There were some blips in the Ethernet data — perhaps lost packets? I’ll have to continue to investigate next week.
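Comparing the two captures sample by sample is easy to script. Here is a minimal Python sketch, assuming both captures have already been decoded into lists of integer readings that start at the same sample; the function names are hypothetical. If a packet really were dropped, every later sample would likely be misaligned, which would show up as one long run of mismatches rather than short blips.

```python
def find_mismatches(serial, ethernet):
    """Return (diffs, length difference) for two captures of the same signal.

    diffs holds (index, serial_value, ethernet_value) for every differing
    sample in the overlapping range. The length difference is reported
    separately, since lost packets would also shorten one capture.
    """
    n = min(len(serial), len(ethernet))
    diffs = [(i, serial[i], ethernet[i]) for i in range(n) if serial[i] != ethernet[i]]
    return diffs, len(serial) - len(ethernet)


def mismatch_runs(diffs):
    """Group consecutive mismatching indices into (start, end) runs ("blips")."""
    runs = []
    for i, _, _ in diffs:
        if runs and runs[-1][1] == i - 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


serial = [1, 2, 3, 4, 5, 6, 7, 8]
ethernet = [1, 2, 9, 9, 5, 6, 0, 8]
diffs, extra = find_mismatches(serial, ethernet)
print(mismatch_runs(diffs), extra)  # [(2, 3), (6, 6)] 0
```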

Arduino IDE

As a side note… the Arduino IDE is very picky about filenames. Filenames with dashes are not allowed… I figured this out the hard way.


I also started working on my slides for my final presentation.

August 9

This week I tested how the Arduino data compares to the DAQ data, and did some other miscellaneous things.

DAQ vs Arduino

I gathered power traces of a desktop computer sitting idle and running the ramnit malware. Yes, it’s not a medical device, but it’s simple to collect traces of this machine (it has Deep Freeze installed on it, so rebooting restores it to a clean state) and it provides a useful test case.

I gathered these at the same time with the DAQ and the Arduino, so they were collecting the same data. I also lowered the DAQ’s sampling rate to match the Arduino’s — 35kHz. Qualitatively, the traces looked similar.


The next step was to parse these traces: split them into 5-second chunks, do feature extraction, and then build ARFF files for use with Weka. This was something I did regularly a few months ago, but I didn’t remember much. Figuring out which scripts to use and fighting with some hard-coded assumptions took some time.
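The chunk, extract, write-ARFF pipeline might look roughly like the following. This is a hedged Python sketch, not the lab’s scripts: the four statistical features, the function names, the relation name, and the toy traces are all placeholders made up for illustration. Only the file layout (@RELATION, @ATTRIBUTE, @DATA) follows Weka’s ARFF format.

```python
from statistics import mean, pstdev


def chunk_trace(samples, rate_hz, seconds=5):
    """Split a trace into non-overlapping chunks of `seconds` seconds (drops the tail)."""
    n = rate_hz * seconds
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def features(chunk):
    """Hypothetical feature vector: mean, standard deviation, min, max."""
    return [mean(chunk), pstdev(chunk), min(chunk), max(chunk)]


def to_arff(rows, labels, relation="power_traces"):
    """Render (feature vector, label) pairs as the text of a Weka ARFF file."""
    names = ["mean", "stdev", "min", "max"]
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in names]
    lines.append("@ATTRIBUTE class {" + ",".join(sorted(set(labels))) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        lines.append(",".join(f"{v:g}" for v in row) + f",{label}")
    return "\n".join(lines) + "\n"


# Toy traces at a fake 4 Hz rate, just to exercise the pipeline (not real data).
idle = [100, 102, 98, 101] * 10
busy = [140, 180, 120, 160] * 10
rows, labels = [], []
for trace, label in ((idle, "idle"), (busy, "ramnit")):
    for chunk in chunk_trace(trace, rate_hz=4, seconds=5):
        rows.append(features(chunk))
        labels.append(label)
arff_text = to_arff(rows, labels)
print(arff_text)
```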


I tested with the Bayes Network, 3NN, and RandomForest classifiers. Here are the results:

DAQ

             BayesNet   3NN      RandomForest
Accuracy:    95.89%     97.59%   97.60%
Precision:   98.00%     98.00%   98.00%
Recall:      94.00%     97.00%   97.00%

Arduino

             BayesNet   3NN      RandomForest
Accuracy:    87.68%     90.08%   90.25%
Precision:   90.00%     91.00%   90.00%
Recall:      85.00%     89.00%   90.00%

The slight drops all across the board are to be expected. These numbers are actually fairly good — I have to say I was pleasantly surprised! The Teensy should perform even better because of its higher sampling rate.

Teensy vs DAQ

Shane and I had issues hooking up the Teensy and were getting a lot of garbage data from it. The reason for this is unclear — it was working before, and we didn’t change much! Shane concluded that the instrumentation amplifier might be fried, so we ordered some more.


I printed my poster and got a poster tube for it. The colors turned out differently than they looked on the screen — they were a little too saturated.


I’ve been slowly cleaning up the code in my repo and making it nicer. I fixed some bugs in my Ruby code for reading from the serial port; apparently the Ruby serialport gem is pretty picky. I had to set a timeout on the serial port to be able to receive binary data.

August 2

This week I worked on my poster and did some additional testing of the Arduino and Teensy.

Instrumentation amplifier

Shane ordered some more instrumentation amplifiers, to make sure that the ones we were testing with weren’t simply defective. Unfortunately, we got the same result as before when we tested with the new ones.

Comparing data to DAQ

On Friday Shane and I got some real data with the Arduino and the DAQ. We collected traces of a desktop computer sitting idle and running ramnit malware, recording with both the Arduino and the DAQ at the same time.

On Monday I will compare the quality of the DAQ data to the Arduino data. I’ll also try running the Arduino data through Weka and seeing if it can differentiate between the malware and the normal behavior.

I spent a lot of time working on my poster this week — and also a lot of time staring at it and panicking. Figuring out how to organize the text and images while minimizing whitespace was tougher than I had imagined.

Finally, I showed my poster to the other members of the lab, and we made some more changes. I also learned about the proper usage of different kinds of dashes.

We also went briefly to the UMass REU poster session, so I got to see some other posters made by undergraduates. This was a necessary albeit nerve-wracking experience, given that I had never been to a poster session before. I did not end up working up the courage to talk to anyone there, which I regret.

This coming week I will need to practice, practice, practice…

July 26

This week I slightly modified my code and tried it on a Teensyduino. I also discovered some things about the Arduino and the instrumentation amplifier.


Shane decided to buy a Teensyduino, because it supposedly supports serial communication at a rate of 12 Mbit/sec. That would greatly simplify this project — we could just send the power data over the serial port, and hopefully less optimizing would be required to achieve this speed.

I experimented a bit, and then slightly modified my sampling and sending code for the Teensy. The ADC sampling part was exactly the same, but I had to write the data to the serial port instead. Writing 128 bytes at a time, I got a total throughput of 71 kHz. Not bad!

The data needs to be unpacked on the other side, and I wrote a Ruby script to do that. It’s not very fast, but the speed can definitely be improved.
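As an illustration of the unpacking step, here is a Python sketch (my actual script was in Ruby). It assumes each 10-bit reading is sent as a 2-byte little-endian unsigned integer; the real packing in my sampling code may differ, and the function name is hypothetical.

```python
import struct


def unpack_readings(data):
    """Decode little-endian uint16 readings, returning (readings, leftover bytes).

    A serial read can end in the middle of a reading, so an odd trailing byte
    is returned to be prepended to the next read instead of being dropped.
    """
    usable = len(data) - (len(data) % 2)
    count = usable // 2
    readings = list(struct.unpack(f"<{count}H", data[:usable]))
    return readings, data[usable:]


readings, rest = unpack_readings(b"\xff\x03\x00\x02\x07")
print(readings, rest)  # [1023, 512] b'\x07'
```

Under the same 2-bytes-per-reading assumption, 71 kHz works out to about 1.1 Mbit/s, well under the Teensy’s 12 Mbit/s rating.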

Open source…

Unfortunately, Teensyduino is not an open source project, and thus does not really fulfill the design goals for this project. I will still include the code for sampling with the Teensy in my git repo.

It is also worth mentioning that the Ethernet transfer is probably more useful in the general case.


I asked Shane to look at my sampling and sending code using interrupts, since I felt it was not coded very well. I refactored it: now the Arduino goes to sleep in ADC noise reduction mode — a mode specifically for improving resolution of ADC readings — when it is not processing an interrupt. This should improve data quality and potentially Ethernet throughput as well.

Now I get an overall throughput of 35 kHz (as opposed to the 39 kHz I reported last week, but I am more confident in this result now that my code has been improved).

Signal generator

It was time to start finishing the coding portion of the project and start doing some final tests. We decided to hook up both the Teensyduino and the Arduino to the signal generator once again to test my code.

We initially got some garbage results, and it took some troubleshooting to figure out what was wrong. We discovered the following:

Arduino reference voltage: 2.45 V, as opposed to 5V as the datasheet claims, even when using an external power supply (i.e. not powering it via USB).

Instrumentation amplifier: This piece of hardware isn’t actually correct for what we want to do. It appears to be zeroing the negative voltage values in a really ugly way. What we really want is to maintain the same structure, just “translate” all of the values so that they become positive.
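Both findings affect how readings turn into volts. Assuming an ATmega328-based board such as the Uno, the 10-bit ADC maps 0 to Vref onto 0 to 1023 (ADC = Vin × 1024 / Vref), so readings must be scaled by the measured 2.45 V rather than the nominal 5 V. The second function shows the kind of “translation” we want instead of clipping: scale a signal that swings between -peak and +peak, then add Vref/2 so that it fits into 0 to Vref. Both functions and the `peak` parameter are a hypothetical sketch, not part of the project’s code.

```python
VREF_MEASURED = 2.45  # volts, as measured (not the nominal 5 V)


def counts_to_volts(reading, vref=VREF_MEASURED):
    """Convert a 10-bit ADC reading to volts (ADC = Vin * 1024 / Vref)."""
    return reading * vref / 1024


def level_shift(v, peak, vref=VREF_MEASURED):
    """Map a signal in [-peak, +peak] volts into [0, vref] without clipping.

    Scales by vref / (2 * peak) and adds vref / 2, so 0 V lands mid-range and
    the shape of the waveform is preserved (unlike zeroing negative values).
    """
    return (v / (2 * peak) + 0.5) * vref


print(counts_to_volts(512))                                   # mid-scale reading
print([level_shift(v, peak=1.0) for v in (-1.0, 0.0, 1.0)])   # bottom, middle, top
```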


I’ve been working on my poster and I have a draft ready. It needs more images, but I don’t know what else to put on it!

July 19

This week I sped up the Ethernet throughput on the Arduino by making some additional modifications. I also tried sampling and sending at the same time.

Ethernet throughput

Last week, the maximum Ethernet throughput I achieved was ~7kHz. This week I managed to increase that by another factor of 10.

send_data_processing_offset: This is the function that I ended up calling to write to the TX buffer. Last week, I overlooked the fact that this function takes a pointer to the data to write, as well as the data length. So it is possible to give it a pointer to an array, along with the array length, and write more than 1 byte at a time.

Figuring out the maximum permissible size of this array took some testing: I had to keep in mind the SRAM limit on the Arduino (2KB). I ended up with a maximum size of 1400 bytes.

SPI clock divider: I was not aware that the clock rate of the SPI controller (the underlying hardware that does all of the writing to the TX buffer) could be changed. By default, it is set to 4 MHz. Shane explained to me that the SPI clock is set with respect to the hardware it is communicating with, and that if I increased the SPI clock speed, the W5100 (Ethernet) chip might not be able to keep up.

I did some digging in the W5100 datasheet to figure out the maximum speed it can handle. Page 66 describes the SPI timing. Many actions take up to 70ns, but the slowest action overall is the read cycle time, which takes 80ns. This means the chip can handle an SPI clock of up to ~12.5 MHz (1/80ns). Decreasing the SPI clock divider from 4 to 2 would result in an SPI clock of 8 MHz, so there should be no problems.
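The clock arithmetic above can be checked quickly. This assumes a 16 MHz board (e.g. the Uno), where the SPI clock is the system clock divided by the SPI clock divider, and takes 80 ns as the W5100’s slowest SPI cycle.

```python
F_CPU_HZ = 16_000_000        # assumed 16 MHz board, e.g. an Uno
W5100_SLOWEST_CYCLE_NS = 80  # read cycle time from the W5100 SPI timing table

max_spi_hz = 1e9 / W5100_SLOWEST_CYCLE_NS  # 12.5 MHz


def spi_clock_hz(divider):
    """SPI clock produced by a given SPI clock divider on the assumed board."""
    return F_CPU_HZ / divider


for divider in (4, 2):  # the default divider, then the smaller one
    hz = spi_clock_hz(divider)
    print(f"divider {divider}: {hz / 1e6:.1f} MHz, within W5100 limit: {hz <= max_spi_hz}")
```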

With these improvements in place, I increased the Ethernet throughput to 66 kHz!

Given that we can sample the ADC at 77 kHz, now was the time to try putting the pieces together.

Sampling and sending

I wrote some code to sample the ADC as well as send data over Ethernet. I save ADC readings to an array of size 1400, as mentioned before. When this is full, I write the data to the TX buffer. Only when the TX buffer is full is a packet actually sent. Unfortunately, there was a speed decrease to about 30 kHz.

I also made use of my interrupt code for sampling the ADC, hoping that it would be useful now. When paired with the sending code, the overall throughput was 39 kHz, a slight improvement.


I have also started planning out my poster. Looking at the examples, I see that there is not a lot of text. I think my primary challenge will be deciding what is critical to include and what is not, and making sure there is not too much text on my poster.

July 12

This week I worked on speeding up the Ethernet throughput for the Arduino.

Signal generator

The instrumentation amplifiers arrived, so Shane and I hooked up the Arduino to the signal generator. We confirmed the correct output with the DAQ, and then measured with the Arduino using both single-conversion and free-running modes. The output was nearly identical, indicating that my sampling code is correct!

I also tested my fast sampling code that samples at a rate of 77 kHz, and this also looked nearly identical. This indicates that speeding up the ADC clock to 1 MHz does not significantly affect the output.
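Here is where the 77 kHz figure comes from. On the ATmega328, a normal ADC conversion takes 13 ADC clock cycles, and the ADC clock is the 16 MHz system clock divided by a prescaler. Arduino’s default prescaler is 128, while a prescaler of 16 gives the 1 MHz ADC clock mentioned above. The datasheet recommends 50 to 200 kHz for full 10-bit resolution, which is why it was worth checking that the output still looked right. Below is a quick Python check, assuming a 16 MHz board and ignoring the longer first conversion.

```python
F_CPU_HZ = 16_000_000
CYCLES_PER_CONVERSION = 13  # normal conversion on the ATmega328 ADC


def max_sample_rate_hz(prescaler):
    """Upper bound on the free-running sample rate for a given ADC prescaler."""
    adc_clock_hz = F_CPU_HZ / prescaler
    return adc_clock_hz / CYCLES_PER_CONVERSION


for prescaler in (128, 16):  # Arduino default, then the faster setting
    print(prescaler, round(max_sample_rate_hz(prescaler)))
```

The default prescaler caps conversions at about 9.6 kHz, which fits the ~9 kHz I measured with analogRead().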

Graphs of voltage over time, using a sine wave from the signal generator (it looks like the upper half of a sine wave, since the Arduino can’t see negative values):

Arduino function analogRead(), sampling rate 9 kHz:

Free-running mode, sampling rate 77 kHz:


Using the built-in Arduino function println (which sends a message to all connected clients), I got a sampling rate of 1 kHz last week. I investigated, and discovered that println sends a TCP packet every time it is called. Clearly, this is wasteful.


Using UDP: The Arduino UDP library allows for buffering data into a transmit buffer and actually sending the packet whenever you want, by using the beginPacket(), write(), and endPacket() functions.
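The difference between println and the UDP approach is essentially one packet per call versus one packet per batch. Here is a Python analogue of the buffer-then-send pattern over loopback. The class and its method names are my own illustration, not the Arduino library’s API, and the 2-byte packing of readings is an assumption.

```python
import socket
import struct


class BatchingUdpSender:
    """Buffer readings in memory and send them as one datagram per end_packet().

    Loosely mirrors the beginPacket()/write()/endPacket() pattern: write()
    only appends to a buffer, and nothing is sent until end_packet().
    """

    def __init__(self, sock, addr):
        self.sock = sock
        self.addr = addr
        self.buf = bytearray()

    def write(self, reading):
        self.buf += struct.pack("<H", reading)  # assumed: 2-byte little-endian readings

    def end_packet(self):
        sent = self.sock.sendto(bytes(self.buf), self.addr)
        self.buf.clear()
        return sent


# Loopback demo: 500 readings go out as one 1000-byte datagram, not 500 packets.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
sender = BatchingUdpSender(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), receiver.getsockname())
for reading in range(500):
    sender.write(reading)
sender.end_packet()
data, _ = receiver.recvfrom(2048)
print(len(data))
```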

Transmit buffer size: The W5100 Ethernet chip has 8KB of transmit memory and 8KB of receive memory, split by default into 2KB per socket for 4 sockets (16KB in total). Writing to the buffer should be faster than sending data over Ethernet, so increasing this buffer size should increase performance. The maximum transmit buffer size for a single socket is 8KB, which means using only one socket. It may be possible to use the 8KB receive buffer for transmitting as well, with some hacking.

UDP packet max size: I set the UDP packet size to be equal to the transmit buffer size, because it was initially only a fraction of that size. The goal was to only send a packet when the transmit buffer is full of data.

UDP library, under the hood: From my previous experience with the Arduino library, I know that some of the “nice” functions are pretty inefficient. I followed the trail of breadcrumbs from the UDP write function, and ended up at a much lower level function that interfaces directly with the W5100 chip: send_data_processing_offset. One level lower, the W5100 chip is interfacing directly with the SPI controller. This was about as low as I could go. However, it’s unclear whether the SPI function is synchronous or not, so I stayed with send_data_processing_offset to avoid problems.

Using this function required me to make some other modifications to the UDP library — I had to make the private socket object public.


When using all of my improvements, I managed to speed up the sampling rate from 1 kHz to 7.3 kHz. This is still too slow. I will talk with Shane on Monday about where to go from here.