The synapser package provides an interface to Synapse, a collaborative workspace for reproducible, data-intensive research projects.
The synapser package lets you communicate with the Synapse platform to create collaborative data analysis projects and access data using the R programming language. Other Synapse clients exist for Python, Java, and the web browser.
If you’re just getting started with Synapse, have a look at the Getting Started guides for Synapse.
Good example projects are:
synapser is available as a ready-built package for Microsoft Windows and macOS. For Linux systems, it is available to install from source. It can be installed or upgraded using the standard install.packages() command, adding the Sage Bionetworks R Archive Network (RAN) to the repository list, e.g.:
install.packages("synapser", repos = c("http://ran.synapse.org", "https://cloud.r-project.org"))
Alternatively, edit your ~/.Rprofile to configure your default repositories.
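A minimal sketch of such an ~/.Rprofile entry, assuming the same two repository URLs as the command above. The labels RAN and CRAN are illustrative names chosen for this sketch, not something the package requires:

```r
# Put the Sage Bionetworks RAN ahead of a CRAN mirror in the default
# repository list. The names RAN and CRAN are labels chosen for this sketch.
options(repos = c(
  RAN = "http://ran.synapse.org",
  CRAN = "https://cloud.r-project.org"
))
```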
With the default repositories configured, you can run install.packages without specifying them:
install.packages("synapser")
For a detailed installation guide, see the installation vignette. If you run into problems, refer to the Troubleshooting guide.
To use Synapse, you’ll need to register for an account. The Synapse website can authenticate using a Google account. If you authenticate using a Google account, you’ll need to create a personal access token to log in to Synapse through the programmatic clients. See the Manage Synapse Credentials vignette for more information.
Once that’s done, you’ll be able to load the library and log in:
library(synapser)
##
## TERMS OF USE NOTICE:
## When using Synapse, remember that the terms and conditions of use require that you:
## 1) Attribute data contributors when discussing these data or results from these data.
## 2) Not discriminate, identify, or recontact individuals or groups represented by the data.
## 3) Use and contribute only data de-identified to HIPAA standards.
## 4) Redistribute data only under these same terms of use.
synLogin()
## NULL
For more ways to manage your Synapse credentials, please see the Manage Synapse Credentials vignette, and the native reference documentation:
?synLogin
?synLogout
To make the example below print useful information, we prepare a file:
# use hex_digits to generate random string
hex_digits <- c(as.character(0:9), letters[1:6])
projectName <- sprintf("My unique project %s", paste0(sample(hex_digits, 32, replace = TRUE), collapse = ""))
project <- Project(projectName)
project <- synStore(project)
# Create some files
filePath <- tempfile()
connection <- file(filePath)
writeChar("a \t b \t c \n d \t e \t f \n", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = project)
# Add a version comment
file$properties$versionComment <- 'Some sort of comment about the new version of the file.'
file <- synStore(file)
synId <- file$properties$id
Synapse identifiers are used to refer to projects and data which are represented by entity objects. For example, the entity above represents a tab-delimited file containing a 2 by 3 matrix. Getting the entity retrieves an object that holds metadata describing the matrix, and also downloads the file to a local cache:
fileEntity <- synGet(synId)
View the entity’s metadata in the R console:
print(fileEntity)
## File(cacheDir='/tmp/Rtmp80GXo3', path='/tmp/Rtmp80GXo3/filec1c87d9109a9', _file_handle={'id': '146071290', 'etag': '1a85dde3-ce5e-4919-a468-7c1fba003c04', 'createdBy': '3434599', 'createdOn': '2024-08-21T00:58:00.000Z', 'modifiedOn': '2024-08-21T00:58:00.000Z', 'concreteType': 'org.sagebionetworks.repo.model.file.S3FileHandle', 'contentType': 'application/octet-stream', 'contentMd5': '8465d33d9f407ef250ce519e92f300fb', 'fileName': 'filec1c87d9109a9', 'storageLocationId': 1, 'contentSize': 23, 'status': 'AVAILABLE', 'bucketName': 'proddata.sagebase.org', 'key': '3434599/a0bd2d54-9bca-478d-8cba-f48095a50b4f/filec1c87d9109a9', 'isPreview': False, 'externalURL': None}, isLatestVersion=True, dataFileHandleId='146071290', versionComment='Some sort of comment about the new version of the file.', files=['filec1c87d9109a9'], synapseStore=True, parentId='syn62283513', versionNumber=1, modifiedOn='2024-08-21T00:57:59.924Z', id='syn62283514', name='filec1c87d9109a9', createdOn='2024-08-21T00:57:59.924Z', concreteType='org.sagebionetworks.repo.model.FileEntity', etag='2e738235-7137-4265-bcbf-fed0313c3942', createdBy='3434599', modifiedBy='3434599', versionLabel='1')
This is one simple way to read in a small matrix (we load just the first few rows):
read.table(fileEntity$path, nrows = 2)
## V1 V2 V3
## 1 a b c
## 2 d e f
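To see why this yields a 2 by 3 table, the same content can be written and parsed locally, without Synapse. This is only a sketch: the names localPath and localMatrix are made up for illustration, and it relies on read.table's default behaviour of splitting on any run of whitespace.

```r
# Write the same tab-delimited text used for the uploaded file to a temp file.
localPath <- tempfile()
connection <- file(localPath)
writeChar("a \t b \t c \n d \t e \t f \n", connection, eos = NULL)
close(connection)

# With the default sep = "", spaces and tabs both count as separators,
# so each line splits into three fields.
localMatrix <- read.table(localPath, nrows = 2)
dim(localMatrix)
```

The extra spaces around each tab are absorbed by the whitespace splitting, which is why the cells come back as single letters.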
View the entity in the browser:
synOnweb(synId)
By default, files are downloaded to the Synapse cache. To download to a different location, specify the downloadLocation parameter:
entity <- synGet("syn00123", downloadLocation = "/path/to/folder")
For more details see the native reference documentation, e.g.:
?synGet
?synOnweb
You can create your own projects and upload your own data sets. Synapse stores entities in a hierarchical or tree structure. Projects are at the top level and must be uniquely named:
Creating a folder:
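The code for this step appears to be missing here. Judging from the later use of dataFolder and the folder named "Data" in the listings further down, it was probably along the following lines. This is a hedged sketch, not the original code: create_data_folder is a hypothetical helper, and calling it needs a logged-in synapser session and an already stored project.

```r
# Hypothetical helper wrapping the two calls used elsewhere in this guide:
# Folder() builds the entity locally, synStore() saves it to Synapse.
create_data_folder <- function(project, name = "Data") {
  folder <- synapser::Folder(name, parent = project)
  synapser::synStore(folder)
}
```

With a stored project in hand, dataFolder <- create_data_folder(project) would give the folder that the file upload step refers to.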
Adding files to the project:
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is the content of the file", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = dataFolder)
file <- synStore(file)
You can print the properties of an entity (such as the file we just created):
file$properties
## Dict (13 items)
Most other properties are immutable, but you can change an entity’s name:
file$properties$name <- "different name"
Update Synapse with the change:
file <- synStore(file)
file$properties
## Dict (13 items)
You can list all children of an entity:
children <- synGetChildren(project$properties$id)
as.list(children)
## [[1]]
## [[1]]$name
## [1] "Data"
##
## [[1]]$id
## [1] "syn62283515"
##
## [[1]]$type
## [1] "org.sagebionetworks.repo.model.Folder"
##
## [[1]]$versionNumber
## [1] 1
##
## [[1]]$versionLabel
## [1] "1"
##
## [[1]]$isLatestVersion
## [1] TRUE
##
## [[1]]$benefactorId
## [1] 62283513
##
## [[1]]$createdOn
## [1] "2024-08-21T00:58:01.127Z"
##
## [[1]]$modifiedOn
## [1] "2024-08-21T00:58:01.127Z"
##
## [[1]]$createdBy
## [1] "3434599"
##
## [[1]]$modifiedBy
## [1] "3434599"
##
##
## [[2]]
## [[2]]$name
## [1] "filec1c87d9109a9"
##
## [[2]]$id
## [1] "syn62283514"
##
## [[2]]$type
## [1] "org.sagebionetworks.repo.model.FileEntity"
##
## [[2]]$versionNumber
## [1] 1
##
## [[2]]$versionLabel
## [1] "1"
##
## [[2]]$isLatestVersion
## [1] TRUE
##
## [[2]]$benefactorId
## [1] 62283513
##
## [[2]]$createdOn
## [1] "2024-08-21T00:57:59.924Z"
##
## [[2]]$modifiedOn
## [1] "2024-08-21T00:57:59.924Z"
##
## [[2]]$createdBy
## [1] "3434599"
##
## [[2]]$modifiedBy
## [1] "3434599"
You can also filter by type:
filesAndFolders <- synGetChildren(project$properties$id, includeTypes = c("file", "folder"))
as.list(filesAndFolders)
## [[1]]
## [[1]]$name
## [1] "Data"
##
## [[1]]$id
## [1] "syn62283515"
##
## [[1]]$type
## [1] "org.sagebionetworks.repo.model.Folder"
##
## [[1]]$versionNumber
## [1] 1
##
## [[1]]$versionLabel
## [1] "1"
##
## [[1]]$isLatestVersion
## [1] TRUE
##
## [[1]]$benefactorId
## [1] 62283513
##
## [[1]]$createdOn
## [1] "2024-08-21T00:58:01.127Z"
##
## [[1]]$modifiedOn
## [1] "2024-08-21T00:58:01.127Z"
##
## [[1]]$createdBy
## [1] "3434599"
##
## [[1]]$modifiedBy
## [1] "3434599"
##
##
## [[2]]
## [[2]]$name
## [1] "filec1c87d9109a9"
##
## [[2]]$id
## [1] "syn62283514"
##
## [[2]]$type
## [1] "org.sagebionetworks.repo.model.FileEntity"
##
## [[2]]$versionNumber
## [1] 1
##
## [[2]]$versionLabel
## [1] "1"
##
## [[2]]$isLatestVersion
## [1] TRUE
##
## [[2]]$benefactorId
## [1] 62283513
##
## [[2]]$createdOn
## [1] "2024-08-21T00:57:59.924Z"
##
## [[2]]$modifiedOn
## [1] "2024-08-21T00:57:59.924Z"
##
## [[2]]$createdBy
## [1] "3434599"
##
## [[2]]$modifiedBy
## [1] "3434599"
You can avoid reading all children into memory at once by iterating through one at a time:
children <- synGetChildren(project$properties$id)
tryCatch({
  while (TRUE) {
    child <- nextElem(children)
    print(child)
  }
}, error = function(e) {
  print("Reached end of list.")
})
## $name
## [1] "Data"
##
## $id
## [1] "syn62283515"
##
## $type
## [1] "org.sagebionetworks.repo.model.Folder"
##
## $versionNumber
## [1] 1
##
## $versionLabel
## [1] "1"
##
## $isLatestVersion
## [1] TRUE
##
## $benefactorId
## [1] 62283513
##
## $createdOn
## [1] "2024-08-21T00:58:01.127Z"
##
## $modifiedOn
## [1] "2024-08-21T00:58:01.127Z"
##
## $createdBy
## [1] "3434599"
##
## $modifiedBy
## [1] "3434599"
##
## $name
## [1] "filec1c87d9109a9"
##
## $id
## [1] "syn62283514"
##
## $type
## [1] "org.sagebionetworks.repo.model.FileEntity"
##
## $versionNumber
## [1] 1
##
## $versionLabel
## [1] "1"
##
## $isLatestVersion
## [1] TRUE
##
## $benefactorId
## [1] 62283513
##
## $createdOn
## [1] "2024-08-21T00:57:59.924Z"
##
## $modifiedOn
## [1] "2024-08-21T00:57:59.924Z"
##
## $createdBy
## [1] "3434599"
##
## $modifiedBy
## [1] "3434599"
##
## [1] "Reached end of list."
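The loop above relies on an error to signal the end of the iterator. The same pattern can be wrapped in a small reusable helper. The sketch below is illustrative only: drain and make_toy_iterator are hypothetical names, and a toy iterator stands in for the Synapse one so that the code runs without a session.

```r
# Hypothetical helper: call next_fn() until it signals an error, applying
# `visit` to each element and returning how many elements were seen.
# Note: a NULL element would also end the loop.
drain <- function(next_fn, visit) {
  count <- 0L
  repeat {
    item <- tryCatch(next_fn(), error = function(e) NULL)
    if (is.null(item)) break
    visit(item)
    count <- count + 1L
  }
  count
}

# Toy stand-in for a child iterator: yields each item once, then errors.
make_toy_iterator <- function(items) {
  i <- 0L
  function() {
    if (i >= length(items)) stop("StopIteration")
    i <<- i + 1L
    items[[i]]
  }
}

toy <- make_toy_iterator(list(list(name = "Data"), list(name = "filec1c87d9109a9")))
seen <- character(0)
n <- drain(toy, function(child) seen <<- c(seen, child$name))
```

With a real iterator from synGetChildren(), the equivalent call would be drain(function() nextElem(children), print).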
You can move files to a different parent:
newFolder <- Folder("New Parent", parent = project)
newFolder <- synStore(newFolder)
file <- synMove(file, newFolder)
Content can be deleted:
synDelete(file)
## NULL
Deletion of a project will also delete its contents, in this case the folder:
folderId <- dataFolder$properties$id
synDelete(project)
## NULL
tryCatch(
  synGet(folderId),
  error = function(e) {
    message(sprintf("Retrieving a deleted folder causes: %s", as.character(e)))
  }
)
## Retrieving a deleted folder causes: Error in value[[3L]](cond): 404 Client Error:
## Entity syn62283515 is in trash can.
In addition to simple data storage, Synapse entities can be annotated with key/value metadata, described in markdown documents (wikis), and linked together in provenance graphs to create a reproducible record of a data analysis pipeline.
For more details see the native reference documentation, e.g.:
?Project
?Folder
?File
?Link
?synStore
# (We use a time stamp just to help ensure uniqueness.)
projectName <- sprintf("My unique project created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y"))
project <- Project(projectName)
# This will erase all existing annotations
project$annotations <- list(annotationName = "annotationValue")
project <- synStore(project)
project <- synGet(project$properties$id)
project$annotations
## {
## "annotationName": [
## "annotationValue"
## ]
## }
synGetAnnotations(project)
## $annotationName
## [1] "annotationValue"
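Because assigning to project$annotations replaces every existing annotation, one way to add a key without losing the others is to merge the old and new values before assigning. The sketch below shows only the merge step, on plain R lists with base R's modifyList. The example keys are made up, and whether the merged list can be assigned back unchanged depends on how your annotations are represented, so treat that part as an assumption.

```r
# Existing annotations, in the shape returned by synGetAnnotations() above.
existing <- list(annotationName = "annotationValue")

# New or changed keys (hypothetical examples).
updates <- list(species = "human", annotationName = "newValue")

# modifyList keeps keys that appear only in `existing`, overwrites shared keys
# with the value from `updates`, and appends keys that are new.
merged <- modifyList(existing, updates)
```

Assigning merged to project$annotations and calling synStore(project) would then store both keys.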
Synapse provides tools for tracking ‘provenance’, or the transformation of raw data into processed results, by linking derived data objects to source data and the code used to perform the transformation.
The Activity object represents the source of a data set or the data processing steps used to produce it. Using W3C provenance ontology terms, a result is generated by a combination of data and code which are either used or executed.
act <- Activity(
  name = "clustering",
  description = "whizzy clustering",
  used = c("syn1234", "syn1235"),
  executed = "syn4567")
Here, syn1234 and syn1235 might be two types of measurements on a common set of samples. Some whizzy clustering code might be referred to by syn4567.
Alternatively, you can build an activity up piecemeal:
act <- Activity(name = "clustering", description = "whizzy clustering")
act$used(c("syn12345", "syn12346"))
act$executed("syn4567")
The used and executed lists can reference either Synapse entities or URLs.
Entity examples:
act$used("syn12345")
act$used(project)
act$used(target = "syn12345", targetVersion = 2)
URL examples:
act$used("http://mydomain.com/my/awesome/data.RData")
act$used(url = "http://mydomain.com/my/awesome/data.RData", name = "Awesome Data")
act$used(url = "https://github.com/joe_hacker/code_repo", name = "Gnarly hacks", wasExecuted = TRUE)
The activity can be passed in when storing an Entity to set the Entity’s provenance:
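The code for this step seems to have been lost. A hedged sketch follows, assuming synStore() accepts the activity through an argument named activity (check ?synStore to confirm). store_with_provenance is a hypothetical wrapper, and calling it needs a logged-in session.

```r
# Hypothetical wrapper: store an entity and attach an Activity as its
# provenance in one call. The `activity` argument name is an assumption.
store_with_provenance <- function(entity, act) {
  synapser::synStore(entity, activity = act)
}
```

For the running example, the call would be project <- store_with_provenance(project, act).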
We’ve now recorded that ‘project’ is the output of syn4567 applied to the data stored in syn1234 and syn1235.
The synStore() function has shortcuts for specifying the used and executed lists directly. For example, when storing a data entity, it’s a good idea to record its source:
project <- synStore(
  project,
  activityName = "data-r-us",
  activityDescription = "downloaded from data-r-us",
  used = "http://data-r-us.com/excellent/data.xyz")
For more information:
?Activity
?synDeleteProvenance
Tables can be built up by adding sets of rows that follow a user-defined schema and queried using a SQL-like syntax. Please visit the Table vignettes for more information.
Wiki pages can be attached to a Synapse entity (e.g. a project, folder, or file). Text and graphics can be composed in markdown and rendered in the web view of the object.
Creating a Wiki
project <- synGet(project$properties$id)
content <- "
# My Wiki Page
Here is a description of my **fantastic** project!
"
# attachment
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is the content of the file", connection, eos = NULL)
close(connection)
wiki <- Wiki(owner = project,
             title = "My Wiki Page",
             markdown = content,
             attachments = list(filePath))
wiki <- synStore(wiki)
Updating a Wiki
project <- synGet(project$properties$id)
wiki <- synGetWiki(project)
wiki$markdown <- "
# My Wiki Page
Here is a description of my **fantastic** project! Let's
*emphasize* the important stuff.
"
wiki <- synStore(wiki)
For more information:
?Wiki
?synGetWiki
An Evaluation is a Synapse construct useful for building processing pipelines and for scoring predictive modeling and data analysis challenges.
Creating an Evaluation:
eval <- Evaluation(
  name = sprintf("My unique evaluation created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y")),
  description = "testing",
  contentSource = project$properties$id,
  submissionReceiptMessage = "Thank you for your submission!",
  submissionInstructionsMessage = "This evaluation only accepts files.")
eval <- synStore(eval)
Retrieving the created Evaluation:
eval <- synGetEvaluation(eval$id)
eval
## {
## "contentSource": "syn62283518",
## "createdOn": "2024-08-21T00:58:08.585Z",
## "description": "testing",
## "etag": "5e6ddd44-eb1e-4b43-b271-d42a479d56e4",
## "id": "9615691",
## "name": "My unique evaluation created on Wed Aug 21 005808.5339 2024",
## "ownerId": "3434599",
## "submissionInstructionsMessage": "This evaluation only accepts files.",
## "submissionReceiptMessage": "Thank you for your submission!"
## }
Submitting a file to an existing Evaluation:
# first create a file to submit
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is my first submission", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = project)
file <- synStore(file)
# submit the created file
submission <- synSubmit(eval, file)
List submissions:
submissions <- synGetSubmissionBundles(eval)
as.list(submissions)
## [[1]]
## [[1]][[1]]
## {
## "contributors": [
## {
## "createdOn": "2024-08-21T00:58:09.785Z",
## "principalId": "3434599"
## }
## ],
## "createdOn": "2024-08-21T00:58:09.785Z",
## "entityBundleJSON": "{\"entity\":{\"name\":\"filec1c817a4c11\",\"id\":\"syn62283519\",\"etag\":\"0d647c72-4afc-47c2-b76b-7ad6fa0efa3b\",\"createdOn\":\"2024-08-21T00:58:09.555Z\",\"modifiedOn\":\"2024-08-21T00:58:09.555Z\",\"createdBy\":\"3434599\",\"modifiedBy\":\"3434599\",\"parentId\":\"syn62283518\",\"concreteType\":\"org.sagebionetworks.repo.model.FileEntity\",\"versionNumber\":1,\"versionLabel\":\"1\",\"isLatestVersion\":true,\"dataFileHandleId\":\"146071298\"},\"entityType\":\"file\",\"annotations\":{\"id\":\"syn62283519\",\"etag\":\"00000000-0000-0000-0000-000000000000\",\"annotations\":{}},\"fileHandles\":[{\"id\":\"146071298\",\"etag\":\"333c469e-cad9-4db7-8e96-473a2aaade8d\",\"createdBy\":\"3434599\",\"createdOn\":\"2024-08-21T00:58:09.000Z\",\"modifiedOn\":\"2024-08-21T00:58:09.000Z\",\"concreteType\":\"org.sagebionetworks.repo.model.file.S3FileHandle\",\"contentType\":\"application/octet-stream\",\"contentMd5\":\"3f466b7f85d184292a68cea1c4f7cfc2\",\"fileName\":\"filec1c817a4c11\",\"storageLocationId\":1,\"contentSize\":27,\"status\":\"AVAILABLE\",\"bucketName\":\"proddata.sagebase.org\",\"key\":\"3434599/d1f78b1a-1f0c-4001-b107-91eb12bfe5b3/filec1c817a4c11\",\"isPreview\":false}]}",
## "entityId": "syn62283519",
## "evaluationId": "9615691",
## "id": "9748839",
## "name": "filec1c817a4c11",
## "userId": "3434599",
## "versionNumber": 1
## }
##
## [[1]][[2]]
## {
## "entityId": "syn62283519",
## "etag": "8c9b81ac-85aa-4ef7-ae9d-b6a058e1acc4",
## "id": "9748839",
## "modifiedOn": "2024-08-21T00:58:09.785Z",
## "status": "RECEIVED",
## "statusVersion": 0,
## "submissionAnnotations": {},
## "versionNumber": 1
## }
Retrieving submission by id:
# Not evaluating this section because of SYNPY-235
submission <- synGetSubmission(submission$id)
submission
Retrieving the submission status:
submissionStatus <- synGetSubmissionStatus(submission)
submissionStatus
## {
## "entityId": "syn62283519",
## "etag": "8c9b81ac-85aa-4ef7-ae9d-b6a058e1acc4",
## "id": "9748839",
## "modifiedOn": "2024-08-21T00:58:09.785Z",
## "status": "RECEIVED",
## "statusVersion": 0,
## "submissionAnnotations": {},
## "versionNumber": 1
## }
To view the annotations:
submissionStatus$submissionAnnotations
## {}
To update an annotation:
Query an evaluation:
queryString <- sprintf("query=select * from evaluation_%s LIMIT %s OFFSET %s", eval$id, 10, 0)
synRestGET(paste0("/evaluation/submission/query?", URLencode(queryString)))
## $headers
## [1] "scopeId" "entityId" "userId" "createdOn"
## [5] "versionNumber" "modifiedOn" "submitterAlias" "cancelControl"
## [9] "submitterId" "teamId" "name" "cancelRequested"
## [13] "objectId" "status" "canCancel"
##
## $rows
## $rows[[1]]
## $rows[[1]]$values
## $rows[[1]]$values[[1]]
## [1] "9615691"
##
## $rows[[1]]$values[[2]]
## [1] "syn62283519"
##
## $rows[[1]]$values[[3]]
## [1] "3434599"
##
## $rows[[1]]$values[[4]]
## [1] "1724201889785"
##
## $rows[[1]]$values[[5]]
## [1] "1"
##
## $rows[[1]]$values[[6]]
## [1] "1724201889785"
##
## $rows[[1]]$values[[7]]
## NULL
##
## $rows[[1]]$values[[8]]
## [1] "{\"submissionId\":\"9748839\",\"userId\":\"3434599\",\"canCancel\":false,\"cancelRequested\":false}"
##
## $rows[[1]]$values[[9]]
## [1] "3434599"
##
## $rows[[1]]$values[[10]]
## NULL
##
## $rows[[1]]$values[[11]]
## [1] "filec1c817a4c11"
##
## $rows[[1]]$values[[12]]
## [1] "false"
##
## $rows[[1]]$values[[13]]
## [1] "9748839"
##
## $rows[[1]]$values[[14]]
## [1] "RECEIVED"
##
## $rows[[1]]$values[[15]]
## [1] "false"
##
##
##
##
## $totalNumberOfResults
## [1] 1
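The createdOn and modifiedOn values in the row above (for example "1724201889785") look like Unix timestamps in milliseconds. Under that assumption, they can be converted to a date-time with base R. The variable names here are illustrative.

```r
# Value copied from the query result above; it arrives as a string.
createdOnMs <- "1724201889785"

# Convert milliseconds to seconds, then to a UTC date-time.
createdOn <- as.POSIXct(as.numeric(createdOnMs) / 1000,
                        origin = "1970-01-01", tz = "UTC")
format(createdOn, "%Y-%m-%dT%H:%M:%S")
```

The result agrees, to the second, with the createdOn value 2024-08-21T00:58:09.785Z reported for the submission earlier.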
To learn more about writing an evaluation query, please see: http://docs.synapse.org/rest/GET/evaluation/submission/query.html
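The query string has to be URL-encoded before it is appended to the REST path. The sketch below shows what URLencode() produces for a query of this shape, using the evaluation id from the output above as an example value. It makes no network call, and evaluationId and uri are illustrative names.

```r
evaluationId <- "9615691"  # example id taken from the output above
queryString <- sprintf("query=select * from evaluation_%s LIMIT %s OFFSET %s",
                       evaluationId, 10, 0)

# URLencode() leaves reserved characters such as "=" and "*" alone and
# percent-encodes the spaces.
uri <- paste0("/evaluation/submission/query?", URLencode(queryString))
uri
```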
For more information, please see:
?synGetEvaluation
?synSubmit
?synGetSubmissionBundles
?synGetSubmission
?synGetSubmissionStatus
By default, data sets in Synapse are private to your user account, but they can easily be shared with specific users, groups, or the public.
Retrieve the sharing setting on an entity:
synGetAcl(project, principal_id = "273950")
## list()
The first time an entity is shared, an ACL object is created for that entity. Let’s make the project public:
acl <- synSetPermissions(project, principalId = 273949, accessType = list("READ"))
acl
## $id
## [1] "syn62283518"
##
## $creationDate
## [1] "2024-08-21T00:58:04.779Z"
##
## $etag
## [1] "37684725-f7e8-45a6-a40b-8f82f995afc5"
##
## $resourceAccess
## $resourceAccess[[1]]
## $resourceAccess[[1]]$principalId
## [1] 3434599
##
## $resourceAccess[[1]]$accessType
## [1] "MODERATE" "READ" "CHANGE_PERMISSIONS"
## [4] "UPDATE" "CHANGE_SETTINGS" "CREATE"
## [7] "DELETE" "DOWNLOAD"
##
##
## $resourceAccess[[2]]
## $resourceAccess[[2]]$principalId
## [1] 273949
##
## $resourceAccess[[2]]$accessType
## [1] "READ"
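The returned ACL is a nested list, as the printed output shows. As a small illustration of navigating that structure, the sketch below looks up the access types granted to one principal. The exampleAcl object is a hand-built stand-in mirroring the printed fields, and access_for is a hypothetical helper, not a synapser function.

```r
# Hand-built stand-in with the same shape as the printed ACL.
exampleAcl <- list(
  id = "syn62283518",
  resourceAccess = list(
    list(principalId = 3434599,
         accessType = c("MODERATE", "READ", "CHANGE_PERMISSIONS", "UPDATE",
                        "CHANGE_SETTINGS", "CREATE", "DELETE", "DOWNLOAD")),
    list(principalId = 273949, accessType = "READ")
  )
)

# Return the access types for one principal, or an empty character vector
# if the principal has no entry in the ACL.
access_for <- function(acl, principalId) {
  for (entry in acl$resourceAccess) {
    if (entry$principalId == principalId) return(entry$accessType)
  }
  character(0)
}

access_for(exampleAcl, 273949)
```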
Now the public can read:
synGetAcl(project, principal_id = 273950)
## [1] "READ"
synGetPermissions() returns a more human-readable view of an entity’s permissions:
permissions <- synGetPermissions(project)
permissions$can_view
## [1] TRUE
For more information:
?synGetAcl
?synSetPermissions
?synGetPermissions
synDelete(project)
## NULL
A file view is defined by its scope and allows querying the FileEntity objects within that scope using a SQL-like syntax. Please visit the Views vignettes for more information.
These methods enable access to the Synapse REST(ish) API, taking care of details like endpoints and authentication. See the REST API documentation.
?synRestGET
?synRestPOST
?synRestPUT
?synRestDELETE
We provide some utility functions in the synapserutils package. Please visit the synapserutils GitHub repository for installation instructions.
For more information see the Synapse User Guide.