Tuesday, August 13, 2019

R Exporting Data to Excel, CSV, SAS, STATA, Text File

How to Export Data from R

In this tutorial, we will learn how to export data from R environment to different formats.
To export data to the hard drive, you need the file path and an extension. First of all, the path is the location where the data will be stored. In this tutorial, you will see how to store data on:
  • The hard drive
  • Google Drive
  • Dropbox
Secondly, R allows the users to export the data into different types of files. We cover the essential file's extension:
  • csv
  • xlsx
  • RDS
  • SAS
  • SPSS
  • STATA
Overall, it is not difficult to export data from R.

Export to Hard drive

To begin with, you can save the data directly into the working directory. The following code prints the path of your working directory:
directory <-getwd()
directory
Output:
## [1] "/Users/15_Export_to_do" 
By default, the file will be saved in the below path.
For Mac OS:
/Users/USERNAME/Downloads/ 
For Windows:
C:\Users\USERNAME\Documents\
You can, of course, set a different path. For instance, you can change the path to the download folder.

Create data frame

First of all, let's import the mtcars dataset and get the mean of mpg and disp grouped by gear.
library(dplyr)
df <-mtcars % > %
    select(mpg, disp, gear) % > %
    group_by(gear) % > %
    summarize(mean_mpg = mean(mpg), mean_disp = mean(disp))
df
Output::
## # A tibble: 3 x 3
## gear mean_mpg mean_disp
## <dbl> <dbl> lt;dbl>
## 1 3 16.10667  326.3000
## 2  4 24.53333  123.0167
## 3 5 21.38000  202.4800
The table contains three rows and three columns. You can create a CSV file with the function write.csv().

Export CSV

The basic syntax is:
write.csv(df, path)
arguments
-df: Dataset to save. Need to be the same name of the data frame in the environment.
-path: A string. Set the destination path. Path + filename + extension i.e. "/Users/USERNAME/Downloads/mydata.csv" or the filename + extension if the folder is the same as the working directory
Example:
write.csv(df, "table_car.csv")
Code Explanation
  • write.csv(df, "table_car.csv"): Create a CSV file in the hard drive:
    • df: name of the data frame in the environment
    • "table_car.csv": Name the file table_car and store it as csv
Note: You can use the function write.csv2() to separate the rows with a semicolon.
write.csv2(df, "table_car.csv")
Note: For pedagogical purpose only, we created a function called open_folder() to open the directory folder for you. You just need to run the code below and see where the csv file is stored. You should see a file names table_car.csv.
# Run this code to create the function
open_folder <-function(dir){
 if (.Platform['OS.type'] == "windows"){
 shell.exec(dir)  
 } else {
 system(paste(Sys.getenv("R_BROWSER"), dir))
  }
}
# Call the function to open the folder
open_folder(directory)

Export to Excel file

Export data to Excel is trivial for Windows users and trickier for Mac OS user. Both users will use the library xlsx to create an Excel file. The slight difference comes from the installation of the library. Indeed, the library xlsx uses Java to create the file. Java needs to be installed if not present in your machine.
Windows users
If you are a Windows user, you can install the library directly with conda:
conda install -c r r-xlsx
Once the library installed, you can use the function write.xlsx(). A new Excel workbook is created in the working directory
library(xlsx)
write.xlsx(df, "table_car.xlsx")
If you are a Mac OS user, you need to follow these steps:
  • Step 1: Install the latest version of Java
  • Step 2: Install library rJava
  • Step 3: Install library xlsx
Step 1) You could download Java from official Oracle site and install it.
You can go back to Rstudio and check which version of Java is installed.
system("java -version")
At the time of the tutorial, the latest version of Java is 9.0.4.
Step 2) You need to install rjava in R. We recommended you to install R and Rstudio with Anaconda. Anaconda manages the dependencies between libraries. In this sense, Anaconda will handle the intricacies of rJava installation.
First of all, you need to update conda and then install the library. You can copy and paste the next two lines of code in the terminal.
conda - conda update
conda install -c r r-rjava
Next, open rjava in Rstudio
library(rJava)
Step 3) Finally, it is time to install xlsx. Once again, you can use conda to do it:
conda install -c r r-xlsx
Just as the windows users, you can save data with the function write.xlsx()
library(xlsx)
Output:
## Loading required package: xlsxjars
write.xlsx(df, "table_car.xlsx")

Export to different software

Exporting data to different software is as simple as importing them. The library "haven" provides a convenient way to export data to
  • spss
  • sas
  • stata
First of all, import the library. If you don't have "haven", you can go here to install it.
library(haven)  

SPSS file

Below is the code to export the data to SPSS software:
write_sav(df, "table_car.sav") 

Export SAS file

Just as simple as spss, you can export to sas
write_sas(df, "table_car.sas7bdat")

Export STATA file

Finally, haven library allows writing .dta file.
write_dta(df, "table_car.dta")

R

If you want to save a data frame or any other R object, you can use the save() function.
save(df, file ='table_car.RData')
You can check the files created above in the present working directory

Interact with the Cloud Services

Last but not least, R is equipped with fantastic libraries to interact with the cloud computing services. The last part of this tutorial deals with export/import files from:
  • Google Drive
  • Dropbox
Note: This part of the tutorial assumes you have an account with Google and Dropbox. If not, you can quickly create one for – Google Drive: https://accounts.google.com/SignUp?hl=en - Dropbox: https://www.dropbox.com/h

Google Drive

You need to install the library googledrive to access the function allowing to interact with Google Drive.
The library is not yet available at Anaconda. You can install it with the code below in the console.
install.packages("googledrive") 
and you open the library.
library(googledrive)
For non-conda user, installing a library is easy, you can use the function install.packages('NAME OF PACKAGE) with the name of the package inside the parenthesis. Don't forget the ' '. Note that, R is supposed to install the package in the `libPaths() automatically. It is worth to see it in action.

Upload to Google Drive

To upload a file to Google drive, you need to use the function drive_upload().
Each time you restart Rstudio, you will be prompted to allow access tidyverse to Google Drive.
The basic syntax of drive_upload() is
drive_upload(file, path = NULL, name = NULL)
arguments:
- file: Full name of the file to upload (i.e., including the extension)
- path: Location of the file- name: You can rename it as you wish. By default, it is the local name. 
After you launch the code, you need to confirm several questions
drive_upload%<("table_car.csv", name ="table_car")
Output:
## Local file: 
## * table_car.csv 
## uploaded into Drive file: 
## * table_car: 1hwb57eT-9qSgDHt9CrVt5Ht7RHogQaMk 
## with MIME type: 
## * text/csv
You type 1 in the console to confirm the access
Then, you are redirected to Google API to allow the access. Click Allow.
Once the authentication is complete, you can quit your browser.
In the Rstudio's console, you can see the summary of the step done. Google successfully uploaded the file located locally on the Drive. Google assigned an ID to each file in the drive.
You can see this file in Google Spreadsheet.
drive_browse("table_car")
Output:
You will be redirected to Google Spreadsheet

Import from Google Drive

Upload a file from Google Drive with the ID is convenient. If you know the file name, you can get its ID as follow:
Note: Depending on your internet connection and the size of your Drive, it takes times.
x <-drive_get("table_car")
as_id(x)
You stored the ID in the variable x. The function drive_download() allows downloading a file from Google Drive.
The basic syntax is:
drive_download(file, path = NULL, overwrite = FALSE)
arguments:
- file:  Name or id of the file to download
-path: Location to download the file. By default, it is downloaded to the working directory and the name as in Google Drive
-overwrite = FALSE: If the file already exists, don't overwrite it. If set to TRUE, the old file is erased and replaced by the new one.
You can finally download the file:
download_google & lt; - drive_download(as_id(x), overwrite = TRUE)
Code Explanation
  • drive_download(): Function to download a file from Google Drive
  • as_id(x): Use the ID to browse the file in Google Drive
  • overwrite = TRUE: If file exists, overwrite it, else execution halted To see the name of the file locally, you can use:
Output:
The file is stored in your working directory. Remember, you need to add the extenstion of the file to open it in R. You can create the full name with the function paste() (i.e. table_car.csv)
google_file <-download_google$local_path
google_file
path <-paste(google_file, ".csv", sep = "")
google_table_car <-read.csv(path)
google_table_car
Output:
##   X gear mean_mpg mean_disp
## 1 1    3 16.10667  326.3000
## 2 2    4 24.53333  123.0167
## 3 3    5 21.38000  202.4800
Finally, you can remove the file from your Google drive.
## remove file
drive_find("table_car") %>%drive_rm()
Output:
It's a slow process. Takes time to delete

Export to Dropbox

R interacts with Dropbox via the rdrop2 library. The library is not available at Anaconda as well. You can install it via the console
install.packages('rdrop2')
library(rdrop2)
You need to provide temporary access to Dropbox with your credential. After the identification is done, R can create, remove upload and download to your Dropbox.
First of all, you need to give access to your account. The credentials are cached during all session.
drop_auth()
You will be redirected to Dropbox to confirm the authentication.
You will get a confirmation page. You can close it and return to R
You can create a folder with the function drop_create().
  • drop_create('my_first_drop'): Create a folder in the first branch of Dropbox
  • drop_create('First_branch/my_first_drop'): Create a folder inside the existing First_branch folder.
drop_create('my_first_drop')
Output:
In DropBox
To upload the .csv file into your Dropbox, use the function drop_upload().
Basic syntax:
drop_upload(file, path = NULL, mode = "overwrite")
arguments:
- file: local path
- path: Path on Dropbox 
- mode = "overwrite":  By default, overwrite an existing file. If set to `add`, the upload is not completed.
drop_upload('table_car.csv', path = "my_first_drop")
Output:
At DropBox
You can read the csv file from Dropbox with the function drop_read_csv()
dropbox_table_car <-drop_read_csv("my_first_drop/table_car.csv")
dropbox_table_car
Output:
##   X gear mean_mpg mean_disp
## 1 1    3 16.10667  326.3000
## 2 2    4 24.53333  123.0167
## 3 3    5 21.38000  202.4800
When you are done using the file and want to delete it. You need to write the path of the file in the function drop_delete()
drop_delete('my_first_drop/table_car.csv')
Output:
It is also possible to delete a folder
drop_delete('my_first_drop')
Output:

Summary

We can summarize all the functions in the table below
Library
Objective
Function
base
Export csv
write.csv()
xlsx
Export excel
write.xlsx()
haven
Export spss
write_sav()
haven
Export sas
write_sas()
haven
Export stata
write_dta()
base
Export R
save()
googledrive
Upload Google Drive
drive_upload()
googledrive
Open in Google Drive
drive_browse()
googledrive
Retrieve file ID
drive_get(as_id())
googledrive
Dowload from Google Drive
download_google()
googledrive
Remove file from Google Drive
drive_rm()
rdrop2
Authentification
drop_auth()
rdrop2
Create a folder
drop_create()
rdrop2
Upload to Dropbox
drop_upload()
rdrop2
Read csv from Dropbox
drop_read_csv
rdrop2
Delete file from Dropbox
drop_delete()

No comments:

Post a Comment

No String Argument Constructor/Factory Method to Deserialize From String Value

  In this short article, we will cover in-depth the   JsonMappingException: no String-argument constructor/factory method to deserialize fro...