Tuesday, January 26, 2016

BeagleBone DMA notes

I've been transferring data between the BeagleBone's PRUs and main memory.  If I use the PRU's SBBO instruction to store a range of PRU registers to main (DDR) memory, I find that I get around 600MB/s -- not bad!

But surprisingly, when I try to read that data back, the main CPU seems to go much slower.

I wrote some sample code for the main CPU to sum up all the bytes in a big (many MB) ordinary buffer.  I got ~300MB/sec.  Using the LDM instruction I got that up to 600MB/sec and in one case over 1GB/sec.  So in general the main CPU seems to have no trouble accessing main memory.

But when I run the same code on the buffer allocated by the uio_pruss kernel module, I get only about a tenth of that: 30MB/s, or closer to 40MB/s when using LDM.

Kumar Abhishek from the BeagleLogic project helped me understand what was going on.  The uio_pruss module allocates that buffer using dma_alloc_coherent(), which is the standard way that Linux kernel modules talk to DMA-based peripherals when they need to exchange smallish amounts of data quickly.  It tells the kernel that somebody else is going to be writing to main memory via DMA, so for this block of data, make sure we bypass the cache for every single memory access.

For larger blocks of data travelling in one direction, CPU -> peripheral or peripheral -> CPU, the Dynamic DMA mapping Guide says the standard approach is to kmalloc() the memory in the kernel module, then use functions like dma_map_single(), dma_unmap_single(), dma_sync_single_for_cpu(), and dma_sync_single_for_device() to make sure entire buffers are safe for access by the peripheral or CPU.

That way, rather than every memory access having to bypass the cache, the kernel can make sure it's safe for the CPU to access everything in the block now that the peripheral is done with it, or vice versa.

Unfortunately, though, on the ARM Cortex-A8 CPU in the beaglebone, making sure a buffer doesn't already have stale data in the cache (which can happen unexpectedly due to things like speculative preloading) requires the kernel to walk cache line by cache line through the entire buffer, taking longer than the memory transfer it's preparing for!

Kumar reports that he gets upward of 200MB/sec using this approach, dominated by the dma_* kernel calls.  I tried it myself with a simple kernel module and got a bit over 100MB/sec, so it seems plausible to me.
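As a rough sanity check on those numbers, here is a back-of-envelope sketch.  It assumes (and this is only an assumption, nothing measured above) that the cache maintenance and the cached read happen strictly one after the other, and that the cached read runs at the 600MB/s LDM figure from the ordinary buffer.  The function name is made up for illustration.

```python
def implied_sync_rate(effective_mb_s, cached_read_mb_s):
    """Solve 1/effective = 1/sync + 1/read for the sync rate in MB/s.

    Assumes the sync pass and the read pass are strictly serial,
    which is a simplification of what the kernel and CPU really do.
    """
    sync_time_per_mb = 1.0 / effective_mb_s - 1.0 / cached_read_mb_s
    if sync_time_per_mb <= 0:
        raise ValueError("effective rate must be lower than the cached read rate")
    return 1.0 / sync_time_per_mb


if __name__ == "__main__":
    # 200 and 100 MB/s are the overall rates quoted above; 600 MB/s is the
    # assumed cached read rate.
    for effective in (200.0, 100.0):
        rate = implied_sync_rate(effective, 600.0)
        print(f"{effective:.0f} MB/s overall -> sync pass at about {rate:.0f} MB/s")
```

Under those assumptions, 200MB/s overall works out to a sync pass covering about 300MB/s, and 100MB/s overall to about 120MB/s.  Either way the sync is slower than the read it prepares for.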

This thread, "dma_sync_single_for_cpu takes a really long time", is worth reading all the way through.

The only other way I can think of to get faster CPU access to big chunks of data from the PRUs is to tell the L1 and L2 caches to flush themselves, then access the data without calling the dma_sync_* functions at all.  The danger there is that it's very much tied to the specific CPU architecture and is very much not the recommended approach, so nobody's going to sympathize if you get corrupt data, and the only way to know if you've done it right is to try to test all the edge cases you can think of.

Monday, December 21, 2015

BeagleBone Access Point (and working around udhcpd)

Warning: it's easy to screw this stuff up and lose the ability to ssh into the beaglebone when you reboot.  For me the fix was usually as simple as manually setting the IP address on my laptop, but you may not be so lucky.  You may not want to attempt this if you don't have a good handle on TCP/IP networking.

My Keebox W150NU seems to be doing a good job with a BeagleBone Black as a wifi access point.  (I get about 3MB/s beaglebone -> laptop).  Beware that lots of other adapters (e.g., Edimax and D-Link) work really poorly or not at all with the BeagleBone.

With a newer BeagleBone green, the W150NU was recognized out of the box, but on an older BBB with another adapter I had to update the kernel first:
Update kernel if your wifi adapter isn't detected (or if you just want to be up to date):
First I did 'sudo apt-get update ; sudo apt-get dist-upgrade'
Then I upgraded the kernel so it'd recognize the USB wifi adapter: 'cd /opt/scripts/tools ; git pull ; ./update_kernel.sh'
On BeagleBone Green they tweaked a file to say "BBG" instead of "BBB", so I had to revert it with: 'cd /opt/scripts ; git checkout tools/eMMC/init-eMMC-flasher-v3.sh' then 'git pull' again before I could run the 'update_kernel.sh' script.
After rebooting, the W150NU appeared as wifi2.

Next, I followed the instructions here to set up hostapd.  

First, 'sudo apt-get install dnsmasq hostapd'

Here's my /etc/hostapd/hostapd.conf (beware leading and trailing spaces, or hostapd will refuse to start):

### Wireless interface name ###
interface=wlan0
#
### Set your bridge name ###
#bridge=br0

#driver
driver=nl80211

country_code=US

ssid=beaglebone

channel=7

hw_mode=g

# # Static WPA2 key configuration
# #1=wpa1, 2=wpa2, 3=both
wpa=2

wpa_passphrase=yourpassword

## Key management algorithms ##
wpa_key_mgmt=WPA-PSK
#
## Set cipher suites (encryption algorithms) ##
## TKIP = Temporal Key Integrity Protocol
## CCMP = AES in Counter mode with CBC-MAC
wpa_pairwise=TKIP
#rsn_pairwise=CCMP
#
## Authentication algorithms: 1 = open system, 2 = shared key (WEP) ##
auth_algs=1
## Accept all MAC address ###
macaddr_acl=0
#enables/disables broadcasting the ssid
ignore_broadcast_ssid=0
# Needed for Windows clients
eapol_key_index_workaround=0

And don't forget to set this in /etc/default/hostapd:

DAEMON_CONF="/etc/hostapd/hostapd.conf"

I couldn't get dnsmasq or isc-dhcp-server to work consistently, though.  It turns out udhcpd was the problem: 'netstat -nlp' showed it binding to 0.0.0.0 on port 67 (which is a bug, since it ignores the "interface" option), so the other DHCP servers couldn't start.
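
If netstat isn't handy, a quick way to see whether something is already sitting on the DHCP server port is to try binding it yourself.  This is only a sketch: the function name is made up, it has to run as root for port 67, and unlike 'netstat -nlp' it won't tell you which process holds the port.

```python
import errno
import socket


def udp_port_taken(port, address="0.0.0.0"):
    """Return True if binding a UDP socket to (address, port) fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # anything else (e.g. EACCES when not root) is passed on
    finally:
        sock.close()
    return False


if __name__ == "__main__":
    try:
        # 67 is the UDP port DHCP servers listen on.
        print("UDP port 67 already taken:", udp_port_taken(67))
    except PermissionError:
        print("need root to try binding port 67")
```

Note that this also reports True when a server is bound to just one interface's address on that port, so treat it as a first hint rather than a diagnosis.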

Hint: /var/log/daemon.log is where a lot of the error messages show up.

I fixed that with 'mv /usr/sbin/udhcpd /usr/sbin/udhcpd.disabled', although it would probably have been better to 'apt-get purge udhcpd'.

Here's my /etc/dnsmasq.conf:

interface=usb0
dhcp-range=192.168.7.1,192.168.7.1,4h

interface=wlan0

dhcp-range=192.168.4.2,192.168.4.10,4h

And I also added this to /etc/network/interfaces:
auto wlan0
iface wlan0 inet static
    address 192.168.4.1
    netmask 255.255.255.0
    network 192.168.4.0
    gateway 192.168.4.1
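
Since a mismatch between the static address and the dnsmasq dhcp-range is an easy way to lock yourself out, here is a small sanity check using Python's standard ipaddress module.  The helper name is hypothetical; the values in the example are the wlan0 ones from the two config files above.

```python
import ipaddress


def range_fits_interface(address, netmask, first, last):
    """Check that the DHCP pool [first, last] lies inside the interface's
    subnet and does not include the interface's own address."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)
    in_subnet = lo in iface.network and hi in iface.network
    excludes_self = not (lo <= iface.ip <= hi)
    return lo <= hi and in_subnet and excludes_self


if __name__ == "__main__":
    # wlan0: address 192.168.4.1, netmask 255.255.255.0,
    # dhcp-range=192.168.4.2,192.168.4.10
    print(range_fits_interface("192.168.4.1", "255.255.255.0",
                               "192.168.4.2", "192.168.4.10"))
```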

That seems to do it, except that I have to "ifup wlan0" after startup on my BeagleBone Green.  The Black doesn't seem to need that for some reason I haven't figured out yet.

Getting BeagleBone to recognize wifi adapters by upgrading the kernel

My beaglebone black wasn't recognizing my wifi adapter.  apt-get update ; apt-get dist-upgrade didn't help, and I noticed that it wasn't upgrading the kernel.

Looks like the way to get kernel updates is to use /opt/scripts/tools/update_kernel.sh.  When I first tried it, I got errors like "The certificate of `rcn-ee.net' is not trusted".

So the first step was to "git pull" down the latest version of the update_kernel.sh script, then run it.  Upon reboot, it recognized the wifi adapter.

Also note that beaglebone doesn't always do USB hotplug right, so I made sure to reboot after plugging in the adapter.

Also, even after updating the kernel, my Edimax and D-Link adapters show up but won't associate to an access point.  The Keebox W150NU seems to be working well, though.

Wednesday, December 16, 2015

BeagleBone Black/Green bus speeds

USB Host (big type A jack): 20MB/s writing to a Seagate USB3 2TB portable (spinning) hard disk (required plugging a 5V 4A power supply into the BeagleBone Black's power jack).  On BeagleBone Green, I got corruption with the Seagate disk, even when I powered the board from a bench supply.  With this Samsung 64GB USB flash drive I get 14-18MB/s write on both BeagleBone Green and Black.

Disk: 4.3MB/s writing to onboard flash, and 7.1MB/s to a SanDisk Ultra 64GB microSD card.  On BeagleBone Green, I get 9.4MB/s to onboard flash, and 6.8MB/s to the same SanDisk microSD card.

Network: Using netcat with the USB ethernet interface, I get 7.6MB/s upstream (to my laptop).  With the 100baseT jack I get 11.2MB/s upstream.  If I use ssh with its default cipher, I get about 10MB/s, but that goes back up to 11.1MB/s if I use "-c arcfour".

Compression: gzip -1 gives me 4.1MB/s on text generated by "cat /dev/urandom data | od -x".  I tried lz4 as well and it was almost exactly the same speed.
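
For anyone wanting to repeat the compression measurement without the command-line tools, here is a rough Python sketch.  It is not what produced the numbers above: it uses zlib at level 1 (the same deflate algorithm as gzip, but not the gzip binary), the hex text only approximates od -x output (no offsets or line breaks), and both function names are made up.

```python
import os
import time
import zlib


def hex_dump_text(n_random_bytes):
    """Random bytes rendered as space-separated 16-bit hex words, roughly like od -x."""
    raw = os.urandom(n_random_bytes)
    words = [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]
    return " ".join(words).encode("ascii")


def compress_rate_mb_s(data, level=1):
    """Megabytes of input consumed per second by zlib at the given level."""
    start = time.perf_counter()
    zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data) / 1e6 / elapsed


if __name__ == "__main__":
    text = hex_dump_text(4 * 1024 * 1024)
    print(f"zlib level 1: {compress_rate_mb_s(text):.1f} MB/s of input")
```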

BeagleBone Black and Green microSD and onboard flash performance

Looks like I get about 4.3MB/sec when writing to the onboard flash on my BeagleBone Black, and about 7.1MB/sec when writing to a SanDisk Ultra 64GB microSD card.

On BeagleBone Green, I get 9.4MB/s to onboard flash, and 6.8MB/s to the same SanDisk microSD card.

I used this command to test:
$ time ( dd of=foo if=/dev/zero bs=1M count=100 ; sync )

I ignored dd's own report, since it doesn't include the time spent in sync, and instead divided 100 by the elapsed time reported by the time command.
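
The same measurement can be sketched in Python.  This is an approximation rather than the command actually used: os.fsync() only flushes the one file, whereas sync flushes everything, and the function name and defaults are made up.  The point it shares with the shell version is that the flush is inside the timed region.

```python
import os
import tempfile
import time


def write_rate_mb_s(path, total_mb=100, block_mb=1):
    """Write zeros to path, fsync, and return MB/s including the flush time."""
    block = bytes(block_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # stands in for the sync in the shell command
    elapsed = time.perf_counter() - start
    return total_mb / elapsed


if __name__ == "__main__":
    # Uses the default temp directory; point the path at a file on the
    # flash or microSD filesystem you actually want to measure.
    with tempfile.TemporaryDirectory() as tmp:
        print(f"{write_rate_mb_s(os.path.join(tmp, 'foo')):.1f} MB/s")
```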

Sunday, September 27, 2015

Closeups of an LED printer head

I picked up some printer heads for an Okidata LED printer and checked them out under the microscope.  

This ebay auction shows what the complete head looks like.  An LED printer is basically a laser printer, except that instead of scanning a laser beam across the page to make the toner stick to the page, an array of LEDs does the work.

Here's the lens assembly and LED array removed from the housing:


The lens assembly has two staggered rows of lenslets.  The head has some sort of tilt arrangement that I suspect they use to vibrate the lens assembly back and forth and then power the LEDs when the lenses are in the desired position.  (But don't quote me on that.)


Putting the LED array under the microscope, we can see where the PCB is wire bonded to the LED driver circuitry.  Normally wirebonding is used inside a chip to go from the wafer to the pins, and then the whole chip is sealed up in plastic or ceramic.  But here the tiny gold wires are exposed, making them very easy to damage (which I did when removing the board from the head).



Below is a closeup.  The wires at the top are all going to a common trace on the upper part of the PCB.  The wires at the bottom are address/control lines going to the green/purple wafers.  At first I thought this was the LED array, but it's just the control circuitry.  That wafer is then wirebonded to the actual LED array, which just looks like a black line with dark gray squares between the top and middle rows of wires.

So you can see they had to run a wire for each and every LED in the array, and they're too densely packed to be able to run the wires to pads on the PCB, so instead they go wafer to wafer.  Then they just need to run control lines out to the PCB so it can tell the control wafer which LEDs to turn on.


Wednesday, September 16, 2015

BeagleBone maximum PWM frequency

Using a PWM channel to get square waves (don't care about duty cycle) from my BeagleBone, looks like I can get up to 50MHz with:

root@beaglebone:~# echo 10 > /sys/devices/ocp.3/pwm_test_P9_14.12/period
root@beaglebone:~# echo 5 > /sys/devices/ocp.3/pwm_test_P9_14.12/duty