Overview

The synapser package provides an interface to Synapse, a collaborative workspace for reproducible, data-intensive research projects. Synapse supports:

  • integrated presentation of data, code and text
  • fine-grained access control
  • provenance tracking

The synapser package lets you communicate with the Synapse platform to create collaborative data analysis projects and access data using the R programming language. Other Synapse clients exist for Python, Java, and the web browser.

If you’re just getting started with Synapse, have a look at the Getting Started guides for Synapse.

Installation

synapser is available as a ready-built package for Microsoft Windows and Mac OS X. On Linux, it can be installed from source. To install or upgrade it, use the standard install.packages() command and add the Sage Bionetworks R Archive Network (RAN) to the repository list, e.g.:

install.packages("synapser", repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))

Alternatively, edit your ~/.Rprofile and configure your default repositories:

options(repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))

after which you may run install.packages without specifying the repositories:

install.packages("synapser")

Please see the Troubleshooting vignettes for more information.
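
In a script, it can be convenient to install a package only when it is missing. The following is a minimal sketch of that idea; the helper name ensureInstalled is hypothetical and not part of synapser, and the repository URLs are the ones listed above.

```r
# Repository list taken from the installation instructions above.
synapseRepos <- c("http://ran.synapse.org", "http://cran.fhcrc.org")

# Hypothetical helper: install `pkg` only when it is not already available.
# Returns TRUE if the package can be loaded afterwards, FALSE otherwise.
ensureInstalled <- function(pkg, repos = synapseRepos) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    utils::install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (commented out because it may contact the network):
# ensureInstalled("synapser")
```

Calling the helper for a package that is already installed does nothing beyond the availability check.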

Connecting to Synapse

To use Synapse, you’ll need to register for an account. The Synapse website can authenticate using a Google account, but you’ll need to take the extra step of creating a Synapse password to use the programmatic clients.

Once that’s done, you’ll be able to load the library and log in:

library(synapser)
## 
## TERMS OF USE NOTICE:
##   When using Synapse, remember that the terms and conditions of use require that you:
##   1) Attribute data contributors when discussing these data or results from these data.
##   2) Not discriminate, identify, or recontact individuals or groups represented by the data.
##   3) Use and contribute only data de-identified to HIPAA standards.
##   4) Redistribute data only under these same terms of use.
synLogin()
## Welcome, test!
## NULL

For more ways to manage your Synapse credentials, please see the Manage Synapse Credentials vignette and the native reference documentation.

Accessing Data

To make the example below print useful information, we first create a project and upload a small tab-delimited file to it:

# use hex_digits to generate random string
hex_digits <- c(as.character(0:9), letters[1:6])
projectName <- sprintf("My unique project %s", paste0(sample(hex_digits, 32, replace = TRUE), collapse = ""))
project <- Project(projectName)
project <- synStore(project)

# Create some files
filePath <- tempfile()
connection <- file(filePath)
writeChar("a \t b \t c \n d \t e \t f \n", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = project)
file <- synStore(file)
## ################################################## Uploading file to Synapse storage ##################################################
## Uploading [--------------------]0.00%   0.0bytes/23.0bytes  file438d14633fa8
## Uploading [####################]100.00%   23.0bytes/23.0bytes (132.0bytes/s) file438d14633fa8 Done...
synId <- file$properties$id

Synapse identifiers refer to projects and data, which are represented by entity objects. For example, the entity above represents a tab-delimited file containing a 2 by 3 matrix. Getting the entity retrieves an object that holds metadata describing the file and also downloads the file to a local cache:

fileEntity <- synGet(synId)

View the entity’s metadata in the R console:

print(fileEntity)
## File: file438d14633fa8 (syn10110418)
##   md5=8465d33d9f407ef250ce519e92f300fb
##   fileSize=23
##   contentType=application/octet-stream
##   externalURL=None
##   cacheDir=/tmp/RtmprvAnGQ
##   files=['file438d14633fa8']
##   path=/tmp/RtmprvAnGQ/file438d14633fa8
##   synapseStore=True
## properties:
##   concreteType=org.sagebionetworks.repo.model.FileEntity
##   createdBy=3323858
##   createdOn=2019-07-02T21:16:24.532Z
##   dataFileHandleId=707297
##   etag=4a12488b-3566-4257-9209-0b5fbc371564
##   id=syn10110418
##   modifiedBy=3323858
##   modifiedOn=2019-07-02T21:16:24.613Z
##   name=file438d14633fa8
##   parentId=syn10110417
##   versionLabel=1
##   versionNumber=1
## annotations:

This is one simple way to read in a small matrix (we load just the first few rows):

read.table(fileEntity$path, nrows = 2)
##   V1 V2 V3
## 1  a  b  c
## 2  d  e  f
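
The same file can also be read with an explicit tab separator and converted to a matrix. The sketch below recreates the example content locally, so it runs without a Synapse connection; the variable names are illustrative only.

```r
# Recreate the example file locally (same content as the uploaded file).
path <- tempfile(fileext = ".tsv")
cat("a \t b \t c \n d \t e \t f \n", file = path)

# Read it as tab-delimited text, trimming the padding around each field.
tbl <- read.table(path, sep = "\t", strip.white = TRUE, header = FALSE,
                  colClasses = "character", stringsAsFactors = FALSE)

# Convert the data frame to a plain character matrix without dimnames.
m <- unname(as.matrix(tbl))
print(dim(m))
print(m)
```

With a downloaded entity, fileEntity$path would take the place of the local path.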

View the entity in the browser:

synOnweb(synId)

Download Location

By default, files are downloaded to the Synapse cache. To download a file to a different folder, specify the downloadLocation parameter:

entity <- synGet("syn00123", downloadLocation = "/path/to/folder")
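
If the target folder might not exist yet, it can be created before calling synGet. This is a minimal sketch under the assumption that creating the folder up front is desirable; whether synGet creates a missing folder itself is not stated here, and the helper name prepareDownloadDir is hypothetical.

```r
# Hypothetical helper: create the target folder if needed and return its path.
prepareDownloadDir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  normalizePath(path)
}

downloadDir <- prepareDownloadDir(file.path(tempdir(), "synapse_downloads"))

# With a logged-in session, the folder could then be passed to synGet
# (the Synapse ID is the placeholder used in the text above):
# entity <- synGet("syn00123", downloadLocation = downloadDir)
```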

For more details, see the native reference documentation.

Organizing Data in a Project

You can create your own projects and upload your own data sets. Synapse stores entities in a hierarchical or tree structure. Projects are at the top level and must be uniquely named:

project <- Project(projectName)
project <- synStore(project)

Creating a folder:

dataFolder <- Folder("Data", parent = project)
dataFolder <- synStore(dataFolder)

Adding files to the project:

filePath <- tempfile()
connection <- file(filePath)
writeChar("this is the content of the file", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = dataFolder)
file <- synStore(file)
## ################################################## Uploading file to Synapse storage ##################################################
## Uploading [--------------------]0.00%   0.0bytes/31.0bytes  file438d768289fd
## Uploading [####################]100.00%   31.0bytes/31.0bytes (165.3bytes/s) file438d768289fd Done...

You can print the properties of an entity (such as the file we just created):

file$properties
## {
##   "concreteType": "org.sagebionetworks.repo.model.FileEntity",
##   "createdBy": "3323858",
##   "createdOn": "2019-07-02T21:16:26.138Z",
##   "dataFileHandleId": "707298",
##   "etag": "a62f9634-4193-4dc8-8126-44e2d5625fb8",
##   "id": "syn10110420",
##   "modifiedBy": "3323858",
##   "modifiedOn": "2019-07-02T21:16:26.206Z",
##   "name": "file438d768289fd",
##   "parentId": "syn10110419",
##   "versionLabel": "1",
##   "versionNumber": 1
## }

Most properties are immutable, but you can change an entity’s name:

file$properties$name <- "different name"

Update Synapse with the change:

file <- synStore(file)
file$properties
## {
##   "concreteType": "org.sagebionetworks.repo.model.FileEntity",
##   "createdBy": "3323858",
##   "createdOn": "2019-07-02T21:16:26.138Z",
##   "dataFileHandleId": "707298",
##   "etag": "b6070237-b7d2-4fe8-b858-49b9f0ce0fc8",
##   "id": "syn10110420",
##   "modifiedBy": "3323858",
##   "modifiedOn": "2019-07-02T21:16:26.590Z",
##   "name": "different name",
##   "parentId": "syn10110419",
##   "versionLabel": "2",
##   "versionNumber": 2
## }

You can list all children of an entity:

children <- synGetChildren(project$properties$id)
as.list(children)
## [[1]]
## [[1]]$modifiedBy
## [1] "3323858"
## 
## [[1]]$type
## [1] "org.sagebionetworks.repo.model.Folder"
## 
## [[1]]$benefactorId
## [1] 10110417
## 
## [[1]]$versionNumber
## [1] 1
## 
## [[1]]$createdOn
## [1] "2019-07-02T21:16:25.115Z"
## 
## [[1]]$modifiedOn
## [1] "2019-07-02T21:16:25.206Z"
## 
## [[1]]$versionLabel
## [1] "1"
## 
## [[1]]$name
## [1] "Data"
## 
## [[1]]$id
## [1] "syn10110419"
## 
## [[1]]$createdBy
## [1] "3323858"
## 
## 
## [[2]]
## [[2]]$modifiedBy
## [1] "3323858"
## 
## [[2]]$type
## [1] "org.sagebionetworks.repo.model.FileEntity"
## 
## [[2]]$benefactorId
## [1] 10110417
## 
## [[2]]$versionNumber
## [1] 1
## 
## [[2]]$createdOn
## [1] "2019-07-02T21:16:24.532Z"
## 
## [[2]]$modifiedOn
## [1] "2019-07-02T21:16:24.613Z"
## 
## [[2]]$versionLabel
## [1] "1"
## 
## [[2]]$name
## [1] "file438d14633fa8"
## 
## [[2]]$id
## [1] "syn10110418"
## 
## [[2]]$createdBy
## [1] "3323858"

You can also filter by type:

filesAndFolders <- synGetChildren(project$properties$id, includeTypes = c("file", "folder"))
as.list(filesAndFolders)
## [[1]]
## [[1]]$modifiedBy
## [1] "3323858"
## 
## [[1]]$type
## [1] "org.sagebionetworks.repo.model.Folder"
## 
## [[1]]$benefactorId
## [1] 10110417
## 
## [[1]]$versionNumber
## [1] 1
## 
## [[1]]$createdOn
## [1] "2019-07-02T21:16:25.115Z"
## 
## [[1]]$modifiedOn
## [1] "2019-07-02T21:16:25.206Z"
## 
## [[1]]$versionLabel
## [1] "1"
## 
## [[1]]$name
## [1] "Data"
## 
## [[1]]$id
## [1] "syn10110419"
## 
## [[1]]$createdBy
## [1] "3323858"
## 
## 
## [[2]]
## [[2]]$modifiedBy
## [1] "3323858"
## 
## [[2]]$type
## [1] "org.sagebionetworks.repo.model.FileEntity"
## 
## [[2]]$benefactorId
## [1] 10110417
## 
## [[2]]$versionNumber
## [1] 1
## 
## [[2]]$createdOn
## [1] "2019-07-02T21:16:24.532Z"
## 
## [[2]]$modifiedOn
## [1] "2019-07-02T21:16:24.613Z"
## 
## [[2]]$versionLabel
## [1] "1"
## 
## [[2]]$name
## [1] "file438d14633fa8"
## 
## [[2]]$id
## [1] "syn10110418"
## 
## [[2]]$createdBy
## [1] "3323858"

You can avoid reading all children into memory at once by iterating through one at a time:

children <- synGetChildren(project$properties$id)
tryCatch({
  while (TRUE) {
    child <- nextElem(children)
    print(child)
  }
}, error = function(e) {
    print("Reached end of list.")
})
## $modifiedBy
## [1] "3323858"
## 
## $type
## [1] "org.sagebionetworks.repo.model.Folder"
## 
## $benefactorId
## [1] 10110417
## 
## $versionNumber
## [1] 1
## 
## $createdOn
## [1] "2019-07-02T21:16:25.115Z"
## 
## $modifiedOn
## [1] "2019-07-02T21:16:25.206Z"
## 
## $versionLabel
## [1] "1"
## 
## $name
## [1] "Data"
## 
## $id
## [1] "syn10110419"
## 
## $createdBy
## [1] "3323858"
## 
## $modifiedBy
## [1] "3323858"
## 
## $type
## [1] "org.sagebionetworks.repo.model.FileEntity"
## 
## $benefactorId
## [1] 10110417
## 
## $versionNumber
## [1] 1
## 
## $createdOn
## [1] "2019-07-02T21:16:24.532Z"
## 
## $modifiedOn
## [1] "2019-07-02T21:16:24.613Z"
## 
## $versionLabel
## [1] "1"
## 
## $name
## [1] "file438d14633fa8"
## 
## $id
## [1] "syn10110418"
## 
## $createdBy
## [1] "3323858"
## 
## [1] "Reached end of list."
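
The loop above can be wrapped in a reusable function. The sketch below illustrates the pattern with a toy iterator written in base R, so it runs without a Synapse connection; makeToyIterator and forEachElem are hypothetical names, and the toy iterator only stands in for the object returned by synGetChildren. As in the example above, any error is treated as the end of the list.

```r
# Toy stand-in for an iterator: a closure that yields one element per call
# and signals an error once it is exhausted.
makeToyIterator <- function(items) {
  i <- 0
  list(nextElem = function() {
    if (i >= length(items)) stop("StopIteration")
    i <<- i + 1
    items[[i]]
  })
}

# Visit each element in turn with `visit` and return how many were seen.
# Only one element is held at a time; an error (or NULL) ends the loop.
forEachElem <- function(iterator, visit) {
  count <- 0
  repeat {
    child <- tryCatch(iterator$nextElem(), error = function(e) NULL)
    if (is.null(child)) break
    visit(child)
    count <- count + 1
  }
  count
}

children <- makeToyIterator(list(list(name = "Data"), list(name = "file.txt")))
seen <- character(0)
n <- forEachElem(children, function(child) seen <<- c(seen, child$name))
```

After the loop, n holds the number of children visited and seen holds their names.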

You can move files to a different parent:

newFolder <- Folder("New Parent", parent = project)
newFolder <- synStore(newFolder)

file <- synMove(file, newFolder)

Content can be deleted:

synDelete(file)
## NULL

Deletion of a project will also delete its contents, in this case the folder:

folderId <- dataFolder$properties$id
synDelete(project)
## NULL
tryCatch(
  synGet(folderId),
  error = function(e) {
    message(sprintf("Retrieving a deleted folder causes: %s", as.character(e)))
  }
)
## Retrieving a deleted folder causes: Error in value[[3L]](cond): 404 Client Error: 
## Entity syn10110419 is in trash can.
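
To report only the error message, without the call prefix, the handler can use conditionMessage(). The sketch below uses a stand-in function that always fails, so it runs without a Synapse connection; fetchDeleted and its message text are made up for illustration.

```r
# Stand-in for a call that fails, e.g. fetching an entity that was deleted.
fetchDeleted <- function(id) {
  stop(sprintf("404 Client Error: Entity %s is in trash can.", id))
}

result <- tryCatch(
  fetchDeleted("syn10110419"),
  error = function(e) {
    # conditionMessage() returns only the message text of the condition.
    sprintf("Retrieving a deleted folder causes: %s", conditionMessage(e))
  }
)
print(result)
```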

In addition to simple data storage, Synapse entities can be annotated with key/value metadata, described in markdown documents (wikis), and linked together in provenance graphs to create a reproducible record of a data analysis pipeline.

For more details, see the native reference documentation.

Annotating Synapse Entities

# (We use a time stamp just to help ensure uniqueness.)
projectName <- sprintf("My unique project created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y"))
project <- Project(projectName)
project <- synStore(project)
synSetAnnotations(project, list(annotationName = "annotationValue"))
## $annotationName
## $annotationName[[1]]
## [1] "annotationValue"
project <- synGet(project$properties$id)
project$annotations
## {
##   "annotationName": [
##     "annotationValue"
##   ]
## }
synGetAnnotations(project)
## $annotationName
## $annotationName[[1]]
## [1] "annotationValue"

synSetAnnotations replaces all existing annotations with the given annotations. Hence, to add a new annotation to those already defined for an entity, we must first retrieve the existing annotations, add the new one, and store the combined list:

annotations <- synGetAnnotations(project)
annotations[["numeric_annotation_name"]] <- 42
annotations <- synSetAnnotations(project, annotations)
annotations
## $annotationName
## $annotationName[[1]]
## [1] "annotationValue"
## 
## 
## $numeric_annotation_name
## $numeric_annotation_name[[1]]
## [1] 42
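
When several annotations are added at once, the retrieved list and the new values can be merged before storing. This is a minimal sketch using base R's modifyList on plain lists shaped like the output above; it runs without a Synapse connection, and the variable names are illustrative.

```r
# Annotations shaped like the output above: a named list whose values are lists.
existing <- list(annotationName = list("annotationValue"))

# New annotations to add.
additions <- list(numeric_annotation_name = 42)

# modifyList() returns a copy of `existing` updated with `additions`.
merged <- modifyList(existing, additions)
str(merged)

# With a logged-in session, the merged list would then be stored:
# synSetAnnotations(project, merged)
```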

Provenance

Synapse provides tools for tracking ‘provenance’, or the transformation of raw data into processed results, by linking derived data objects to source data and the code used to perform the transformation.

The Activity object represents the source of a data set or the data processing steps used to produce it. Using W3C provenance ontology terms, a result is generated by a combination of data and code which are either used or executed.

Creating an activity object:

act <- Activity(
  name = "clustering",
  description = "whizzy clustering",
  used = c("syn1234", "syn1235"),
  executed = "syn4567")

Here, syn1234 and syn1235 might be two types of measurements on a common set of samples. Some whizzy clustering code might be referred to by syn4567.

Alternatively, you can build an activity up piecemeal:

act <- Activity(name = "clustering", description = "whizzy clustering")
act$used(c("syn1234", "syn1235"))
## NULL
act$executed("syn4567")
## NULL

The used and executed lists can reference Synapse entities or URLs.

Entity examples:

  act$used("syn12345")
## NULL
  act$used(project)
## NULL
  act$used(target = "syn12345", targetVersion = 2)
## NULL

URL examples:

  act$used("http://mydomain.com/my/awesome/data.RData")
## NULL
  act$used(url = "http://mydomain.com/my/awesome/data.RData", name = "Awesome Data")
## NULL
  act$used(url = "https://github.com/joe_hacker/code_repo", name = "Gnarly hacks", wasExecuted = TRUE)
## NULL

Storing entities with provenance

The activity can be passed in when storing an Entity to set the Entity’s provenance:

project <- synGet(project$properties$id)
project <- synStore(project, activity = act)

We’ve now recorded that ‘project’ is the output of syn4567 applied to the data stored in syn1234 and syn1235.

Recording data source

synStore() has shortcuts for specifying the used and executed lists directly. For example, when storing a data entity, it’s a good idea to record its source:

project <- synStore(
  project,
  activityName = "data-r-us",
  activityDescription = "downloaded from data-r-us",
  used = "http://data-r-us.com/excellent/data.xyz")

Tables

Tables can be built up by adding sets of rows that follow a user-defined schema and queried using a SQL-like syntax. Please visit the Table vignettes for more information.

Wikis

Wiki pages can be attached to a Synapse entity (e.g. a project, folder, or file). Text and graphics can be composed in markdown and rendered in the web view of the object.

Creating a Wiki

project <- synGet(project$properties$id)
content <- "
# My Wiki Page

Here is a description of my **fantastic** project!
"
# attachment
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is the content of the file", connection, eos = NULL)
close(connection)

wiki <- Wiki(owner = project,
             title = "My Wiki Page",
             markdown = content,
             attachments = list(filePath))
wiki <- synStore(wiki)
## 
## Uploading [--------------------]0.00%   0.0bytes/31.0bytes  file438d39b4e446
## Uploading [####################]100.00%   31.0bytes/31.0bytes (217.1bytes/s) file438d39b4e446 Done...

Updating a Wiki

project <- synGet(project$properties$id)
wiki <- synGetWiki(project)
## 
## Downloading  [####################]100.00%   86.0bytes/86.0bytes (77.5kB/s) null_markdown.txt Done...
wiki$markdown <- "
# My Wiki Page

Here is a description of my **fantastic** project! Let's
*emphasize* the important stuff.
"

wiki <- synStore(wiki)

Evaluations

An Evaluation is a Synapse construct useful for building processing pipelines and for scoring predictive modeling and data analysis challenges.

Creating an Evaluation:

eval <- Evaluation(
  name = sprintf("My unique evaluation created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y")),
  description = "testing",
  contentSource = project$properties$id,
  submissionReceiptMessage = "Thank you for your submission!",
  submissionInstructionsMessage = "This evaluation only accepts files.")
eval <- synStore(eval)

Retrieving the created Evaluation:

eval <- synGetEvaluation(eval$id)
eval
## {
##   "contentSource": "syn10110422",
##   "createdOn": "2019-07-02T21:16:39.389Z",
##   "description": "testing",
##   "etag": "a2dec332-b3b5-4b5e-aca3-1dd8e9338bc2",
##   "id": "9607856",
##   "name": "My unique evaluation created on Tue Jul 02 211639.2370 2019",
##   "ownerId": "3323858",
##   "status": "OPEN",
##   "submissionInstructionsMessage": "This evaluation only accepts files.",
##   "submissionReceiptMessage": "Thank you for your submission!"
## }

Submitting a file to an existing Evaluation:

# first create a file to submit
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is my first submission", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = project)
file <- synStore(file)
## ################################################## Uploading file to Synapse storage ##################################################
## Uploading [--------------------]0.00%   0.0bytes/27.0bytes  file438d7f294cb1
## Uploading [####################]100.00%   27.0bytes/27.0bytes (130.1bytes/s) file438d7f294cb1 Done...
# submit the created file
submission <- synSubmit(eval, file)
## Thank you for your submission!

List submissions:

submissions <- synGetSubmissionBundles(eval)
as.list(submissions)
## [[1]]
## [[1]][[1]]
## {
##   "contributors": [
##     {
##       "createdOn": "2019-07-02T21:16:44.772Z",
##       "principalId": "3323858"
##     }
##   ],
##   "createdOn": "2019-07-02T21:16:44.772Z",
##   "entityBundleJSON": "{\"entityType\":\"org.sagebionetworks.repo.model.FileEntity\",\"fileHandles\":[{\"contentMd5\":\"3f466b7f85d184292a68cea1c4f7cfc2\",\"bucketName\":\"devdata.sagebase.org\",\"fileName\":\"file438d7f294cb1\",\"createdBy\":\"3323858\",\"contentSize\":27,\"concreteType\":\"org.sagebionetworks.repo.model.file.S3FileHandle\",\"etag\":\"4a03dd3f-a57c-4a11-8976-621d51a1ce65\",\"id\":\"707302\",\"storageLocationId\":1,\"createdOn\":\"2019-07-02T21:16:40.000Z\",\"contentType\":\"application/octet-stream\",\"key\":\"3323858/ebaa0d21-5c80-4d63-90c9-844d269c890c/file438d7f294cb1\"}],\"annotations\":{\"longAnnotations\":{},\"blobAnnotations\":{},\"stringAnnotations\":{},\"etag\":\"00000000-0000-0000-0000-000000000000\",\"id\":\"syn10110423\",\"dateAnnotations\":{},\"doubleAnnotations\":{}},\"entity\":{\"dataFileHandleId\":\"707302\",\"modifiedOn\":\"2019-07-02T21:16:42.710Z\",\"versionLabel\":\"1\",\"createdBy\":\"3323858\",\"name\":\"file438d7f294cb1\",\"concreteType\":\"org.sagebionetworks.repo.model.FileEntity\",\"etag\":\"fce9b337-7529-49fb-82dd-4a76945b559e\",\"modifiedBy\":\"3323858\",\"id\":\"syn10110423\",\"createdOn\":\"2019-07-02T21:16:41.052Z\",\"parentId\":\"syn10110422\",\"versionNumber\":1}}",
##   "entityId": "syn10110423",
##   "evaluationId": "9607856",
##   "id": "9622286",
##   "name": "file438d7f294cb1",
##   "userId": "3323858",
##   "versionNumber": 1
## }
## 
## [[1]][[2]]
## {
##   "entityId": "syn10110423",
##   "etag": "02eba7da-2041-4318-81ae-3bcbeb0607ff",
##   "id": "9622286",
##   "modifiedOn": "2019-07-02T21:16:44.772Z",
##   "status": "RECEIVED",
##   "statusVersion": 0,
##   "versionNumber": 1
## }

Retrieving submission by id:

# Not evaluating this section because of SYNPY-235
submission <- synGetSubmission(submission$id)
submission

Retrieving the submission status:

submissionStatus <- synGetSubmissionStatus(submission)
submissionStatus
## {
##   "entityId": "syn10110423",
##   "etag": "02eba7da-2041-4318-81ae-3bcbeb0607ff",
##   "id": "9622286",
##   "modifiedOn": "2019-07-02T21:16:44.772Z",
##   "status": "RECEIVED",
##   "statusVersion": 0,
##   "versionNumber": 1
## }

To view the annotations:

submissionStatus$annotations
## NULL

To update an annotation:

submissionStatus$annotations["doubleAnnos"] <- list(c("rank" = 3))
synStore(submissionStatus)

Query an evaluation:

queryString <- sprintf("query=select * from evaluation_%s LIMIT %s OFFSET %s", eval$id, 10, 0)
synRestGET(paste0("/evaluation/submission/query?", URLencode(queryString)))
## $totalNumberOfResults
## [1] 0
## 
## $rows
## list()
## 
## $headers
## list()

To learn more about writing an evaluation query, please see: http://docs.synapse.org/rest/GET/evaluation/submission/query.html
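
The request path for such a query can be assembled and checked locally before it is sent. The sketch below only builds and encodes the string, so it runs without a Synapse connection; the evaluation ID is the one shown in the output above, and the variable names are illustrative.

```r
evaluationId <- "9607856"  # evaluation ID from the output above
limit <- 10L
offset <- 0L

# Build the query, then percent-encode characters such as spaces.
queryString <- sprintf("query=select * from evaluation_%s LIMIT %d OFFSET %d",
                       evaluationId, limit, offset)
uri <- paste0("/evaluation/submission/query?", URLencode(queryString))
print(uri)

# With a logged-in session, the path would then be requested:
# synRestGET(uri)
```

Decoding the path with URLdecode() recovers the original query, which is a quick way to confirm that nothing was mangled.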

Sharing Access to Content

By default, data sets in Synapse are private to your user account, but they can easily be shared with specific users, groups, or the public.

Retrieve the sharing setting on an entity:

synGetPermissions(project, principalId = 273950)
## list()

The first time an entity is shared, an ACL object is created for that entity. Let’s make the project public:

acl <- synSetPermissions(project, principalId = 273950, accessType = list("READ"))
acl
## $etag
## [1] "8f4f93b5-61be-444d-be2a-0ccb13c5bbcb"
## 
## $id
## [1] "syn10110422"
## 
## $creationDate
## [1] "2019-07-02T21:16:30.412Z"
## 
## $resourceAccess
## $resourceAccess[[1]]
## $resourceAccess[[1]]$accessType
## $resourceAccess[[1]]$accessType[[1]]
## [1] "READ"
## 
## 
## $resourceAccess[[1]]$principalId
## [1] 273950
## 
## 
## $resourceAccess[[2]]
## $resourceAccess[[2]]$accessType
## $resourceAccess[[2]]$accessType[[1]]
## [1] "READ"
## 
## $resourceAccess[[2]]$accessType[[2]]
## [1] "MODERATE"
## 
## $resourceAccess[[2]]$accessType[[3]]
## [1] "CHANGE_SETTINGS"
## 
## $resourceAccess[[2]]$accessType[[4]]
## [1] "UPDATE"
## 
## $resourceAccess[[2]]$accessType[[5]]
## [1] "CHANGE_PERMISSIONS"
## 
## $resourceAccess[[2]]$accessType[[6]]
## [1] "DELETE"
## 
## $resourceAccess[[2]]$accessType[[7]]
## [1] "DOWNLOAD"
## 
## $resourceAccess[[2]]$accessType[[8]]
## [1] "CREATE"
## 
## 
## $resourceAccess[[2]]$principalId
## [1] 3323858

Now the public can read the project:

synGetPermissions(project, principalId = 273950)
## [[1]]
## [1] "READ"
synDelete(project)
## NULL

File Views

A file view is defined by its scope. It lets you query the FileEntity objects within that scope using a SQL-like syntax. Please visit the File View vignettes for more information.

Accessing the API Directly

Methods such as synRestGET, used above, give direct access to the Synapse REST(ish) API while taking care of details like endpoints and authentication. See the REST API documentation.

Synapse Utilities

We provide some utility functions in the synapserutils package:

  • Copy Files, Folders, Tables, Links, Projects, and Wiki Pages.
  • Upload data to Synapse in bulk.
  • Download data from Synapse in bulk.

Please visit the synapserutils GitHub repository for instructions on how to download it.

More information

For more information see the Synapse User Guide.