Tuesday, October 23, 2012

Importing Excel data in R with XLConnect

I recently had to migrate to the 64-bit version of R due to the 4 GB memory limit imposed by the 32-bit version.  In doing so, the package I was using for importing Excel files (xlsReadWrite) was rendered incompatible.  Thus, I've moved over to XLConnect: http://cran.r-project.org/package=XLConnect

XLConnect looks very nice: it is fully featured, and is implemented as objects.  It uses a Java library to do the Excel imports, apparently.  This would seem like a detail, but it ends up adding a very important wrinkle.  According to the developer, Java's "ToString" function adds a ".0" to any values that look numeric.  So, even if you tell XLConnect's readWorksheet function to read the column in as a character data type (using the colTypes argument), it will add a ".0" to every value that looks to Java like a number.

Here's an example.  Let's make an excel document called XLConnect_test.xlsx:



Now, let's try to import it via XLConnect:
> require(XLConnect)
> xlc_df = readWorksheetFromFile(file="XLConnect_test.xlsx", sheet=1)
> xlc_df

    Col1 Col2
1    ABC  1.0
2  123.0  2.0
3 123ABC  ABC

> xlc_df = readWorksheetFromFile(file="XLConnect_test.xlsx", sheet=1, colTypes=c("character", "character"))
> xlc_df

    Col1 Col2
1    ABC  1.0
2  123.0  2.0
3 123ABC  ABC
Notice how XLConnect adds ".0" to each value, even though those columns are supposed to be character strings?  That's not good.  The nicely-communicative developer says that they're working on this for a future version.  In the meantime, here's the hack I've written so the rest of us can move forward:
xlc_dot_zero_hack <- function(df) {
    # looks for all numerics that had a ".0" added to them by XLConnect, and removes the ".0"
        return(
            as.data.frame( lapply( df, FUN = function(x) sub("^(\\d*)\\.0$", "\\1", x, perl=TRUE) ) )
        )
}
Just import your Excel data, and then run the function above on your imported dataframe.  Don't forget to reassign the result to a dataframe.  In our example, we would write:
> xlc_df = xlc_dot_zero_hack(xlc_df)
> xlc_df

    Col1 Col2
1    ABC    1
2    123    2
3 123ABC  ABC
VoilĂ !  Note that if your data has any strings that *actually* have a ".0" at the end, this hack will have unexpected results.  Specifically, it will remove *all* ".0"s that come after a string of digits.

I hope this might help others out there with the same conundrum!


Wednesday, September 5, 2012

Cross-References don't show up in Word's dialog box?

I was having an issue with a manuscript I'm putting together recently.  I was passing it back and forth with my advisor, and I was using Track Changes.  As I added some figures with captions, I then wanted to refer to those figures in the text via a cross-reference.

When I clicked on Insert -> Cross-Reference in Word 2010, the recently-inserted captions weren't in the list!



Ugh, *another* Word screwup with complex documents.

Well, after a lost day of screwing around with the problem (and sleeping on it), I came across the solution.

The problem seems to be that Word doesn't like you to insert your figures and captions as a "tracked change".  I went back and "accepted" all my insertions of figures and captions.  And, voila!  The figures now show up in the cross-reference dialog box.  Now, whenever I insert a new figure, I'll either turn off track changes for the moment, or just go back and accept the insertion.

Hope this helped somebody else!

Wednesday, May 9, 2012

pushd / popd for R

Hi Folks,

I haven't gotten around to putting my R code in nice objects yet. So, my code is procedural, and it's nasty. It sources one script to the next, processing data, analyzing data, etc. As the scripts get run around the codebase, they often change their working directory. So, I wanted access to a bash-equivalent pushd and popd. I'll run these at the beginning and end of each script.

These are not battle-tested yet, and there's really no error-checking or redundancy in them. So, use with care, and they'll probably evolve over time. Feel free to add to them in the comments!

Enjoy!


#  push getwd() strings into a FIFO vector

dir_fifo = c()

pushd <- function(cd) {
    # usage: pushd("directory to change to")
    dir_fifo = append(dir_fifo, getwd(), 0)
    assign("dir_fifo", dir_fifo, envir = .GlobalEnv)
    setwd(cd)
}

popd <- function() {
    # usage: popd()
    setwd(dir_fifo[1])
    dir_fifo = dir_fifo[-1]
    assign("dir_fifo", dir_fifo, envir = .GlobalEnv)
}

Friday, March 30, 2012

Do you miss seeing your Lenovo drivers ordered by release date?

IBM used to have a page where you could view all the drivers for your
system ordered by their release date. Since the IBM/Lenovo "System
Update" app has always been pretty unreliable, seeing the driver
releases ordered by date was always (for me, anyway) the best way to
make sure my system was up to date.

Lenovo, in all their wisdom, got rid of the ability to order drivers by
release dates. But, thankfully, there's a workaround!

Here's what you do. Go to the Lenovo support site and enter your system
type here: http://support.lenovo.com/en_US/downloads/default.page .
Then, click on the "Subscribe to driver updates" link in the upper-right
corner. That will take you to an RSS page. Don't subscribe to it if
you don't want to! But from here, you'll be able to see all the driver
releases ordered by date. Voilá!

Wednesday, March 7, 2012

Uninstalling HP's SmartPrintButton Extension from Firefox

I detest printer drivers. Well, not so much the drivers themselves, but all the useless junk that always comes with them. Sometimes you can find drivers that come separately from the entire software suite, but more often than not, you're stuck installing some sort of crap in order to be able to print to your new printer. Ugh.

Case-in-point: I just bought an HP Laserjet 1606dn. I chose the absolute minimum installation necessary in order to print to the thing. But, I still ended up with a "SmartPrintButton" extension, uninvited, in my firefox add-on panel. Ugh (again)!

Even worse, this extension was "locked" - i.e., there was no "remove" button I could click to get rid of thing. I tried removing it by starting firefox in safe mode, but that didn't get the job done.

Finally, I went to my firefox extensions folder (C:\Users\username\AppData\Mozilla\Firefox\yourprofile\extensions) to track down the culprit and exterminate him. It turned out to be not so easy as that, though. It seems that in newer firefox installations, extensions just put an encoded .xpi file in your extensions directory. The extension files themselves then seem to be scattered across your hard drive.

Since I had recently installed my printer, I looked for the .xpi file with the most recent modification date (you can click on the date column header in windows explorer to sort by date). I moved the file named "{578e7caa-210f-4967-a0d3-88fe5b59a39f}.xpi" out of the extensions directory, restarted firefox, and voila! No more SmartPrintButton extension. Phew.

Your SmartPrintButton file might be named something different than mine. And if you didn't install your printer recently, you might then have to take out extension .xpi files in groups, restarting firefox each time, to figure out which file corresponds to SmartPrintButton. But armed with these techniques, you should be able to hunt down the right file.

I hope this helps some folks out there!