XLConnect looks very nice: it is fully featured, and is implemented as objects. It uses a Java library to do the Excel imports, apparently. This would seem like a detail, but it ends up adding a very important wrinkle. According to the developer, Java's "ToString" function adds a ".0" to any values that look numeric. So, even if you tell XLConnect's readWorksheet function to read the column in as a character data type (using the colTypes argument), it will add a ".0" to every value that looks to Java like a number.
Here's an example. Let's make an excel document called XLConnect_test.xlsx:
Now, let's try to import it via XLConnect:
> require(XLConnect)Notice how XLConnect adds ".0" to each value, even though those columns are supposed to be character strings? That's not good. The nicely-communicative developer says that they're working on this for a future version. In the meantime, here's the hack I've written so the rest of us can move forward:
> xlc_df = readWorksheetFromFile(file="XLConnect_test.xlsx", sheet=1)
> xlc_df
Col1 Col2
1 ABC 1.0
2 123.0 2.0
3 123ABC ABC
> xlc_df = readWorksheetFromFile(file="XLConnect_test.xlsx", sheet=1, colTypes=c("character", "character"))
> xlc_df
Col1 Col2
1 ABC 1.0
2 123.0 2.0
3 123ABC ABC
xlc_dot_zero_hack <- function(df) {Just import your Excel data, and then run the function above on your imported dataframe. Don't forget to reassign the result to a dataframe. In our example, we would write:
# looks for all numerics that had a ".0" added to them by XLConnect, and removes the ".0"
return(
as.data.frame( lapply( df, FUN = function(x) sub("^(\\d*)\\.0$", "\\1", x, perl=TRUE) ) )
)
}
> xlc_df = xlc_dot_zero_hack(xlc_df)Voilà! Note that if your data has any strings that *actually* have a ".0" at the end, this hack will have unexpected results. Specifically, it will remove *all* ".0"s that come after a string of digits.
> xlc_df
Col1 Col2
1 ABC 1
2 123 2
3 123ABC ABC
I hope this might help others out there with the same conundrum!