Overview

The synapser package provides an interface to Synapse, a collaborative workspace for reproducible, data-intensive research projects that supports:

  • integrated presentation of data, code and text
  • fine-grained access control
  • provenance tracking

The synapser package lets you communicate with the Synapse platform to create collaborative data analysis projects and access data using the R programming language. Other Synapse clients exist for Python, Java, and the web browser.

If you’re just getting started with Synapse, have a look at the Getting Started guides for Synapse.

Good example projects are:

Installation

synapser is available as a prebuilt package for Microsoft Windows and macOS. On Linux systems, it is installed from source. It can be installed or upgraded using the standard install.packages() command, adding the Sage Bionetworks R Archive Network (RAN) to the repository list, e.g.:
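A minimal sketch of such a call. The RAN address is http://ran.synapse.org; the CRAN mirror shown here is an assumption, so substitute the mirror you normally use:

```r
# Install (or upgrade) synapser, searching the Sage Bionetworks RAN first.
# The CRAN mirror URL is an assumption; any CRAN mirror should work.
install.packages(
  "synapser",
  repos = c("http://ran.synapse.org", "https://cloud.r-project.org")
)
```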

Alternatively, edit your ~/.Rprofile and configure your default repositories:
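For instance, a line like the following could be added to ~/.Rprofile. The vector names (SageRAN, CRAN) and the CRAN mirror are illustrative assumptions:

```r
# Make the Sage Bionetworks RAN a default repository alongside CRAN.
# The CRAN mirror URL is an assumption; use your preferred mirror.
options(repos = c(
  SageRAN = "http://ran.synapse.org",
  CRAN    = "https://cloud.r-project.org"
))
```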

after which you may run install.packages without specifying the repositories:
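With the default repositories configured in ~/.Rprofile, the call reduces to:

```r
# Repositories are taken from getOption("repos").
install.packages("synapser")
```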

Connecting to Synapse

To use Synapse, you’ll need to register for an account. The Synapse website can authenticate using a Google account, but you’ll need to take the extra step of creating a Synapse password to use the programmatic clients.

Once that’s done, you’ll be able to load the library and log in:

library(synapser)
synLogin("me@nowhere.com", "secret")

You can also create a file .synapseConfig in your home directory containing login credentials:

[authentication]
username=me@nowhere.com
password=secret

after which you can log in without typing your credentials:
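A short sketch, assuming the .synapseConfig file described above exists in your home directory:

```r
library(synapser)

# With no arguments, synLogin() falls back to stored credentials.
synLogin()
```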

For more details see the native reference documentation, e.g. ?synLogin.

Accessing Data

To make the example below print useful information, we prepare a file:
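One way to prepare such a file is sketched here. The project name and its random suffix are hypothetical, and the code assumes you are already logged in:

```r
library(synapser)

# Write a small tab-delimited 2 by 3 matrix to a temporary file.
filePath <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc", "d\te\tf"), filePath)

# Project names must be unique, so a random suffix is appended (hypothetical name).
project <- Project(name = sprintf("test project %s", sample(100000, 1)))
project <- synStore(project)

# Upload the file into the project and keep its Synapse identifier.
file <- File(path = filePath, parent = project)
file <- synStore(file)
synId <- file$properties$id
```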

Synapse identifiers are used to refer to projects and data, which are represented by entity objects. For example, the entity above represents a tab-delimited file containing a 2 by 3 matrix. Getting the entity retrieves an object that holds metadata describing the matrix, and also downloads the file to a local cache:
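A minimal sketch, assuming synId holds the identifier of a file entity you are allowed to read:

```r
library(synapser)

# Fetch the entity's metadata and download the file to the local cache.
fileEntity <- synGet(synId)

# Local path of the downloaded copy.
fileEntity$path
```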

View the entity’s metadata in the R console:
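For example, again assuming synId identifies the file entity:

```r
library(synapser)

# The file is not downloaded again if the cached copy is current.
fileEntity <- synGet(synId)
print(fileEntity)
```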

This is one simple way to read in a small matrix (we load just the first few rows):
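A sketch using base R's read.table(), assuming synId identifies a tab-delimited file:

```r
library(synapser)

fileEntity <- synGet(synId)

# Read at most the first two rows of the downloaded file.
read.table(fileEntity$path, sep = "\t", nrows = 2)
```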

View the entity in the browser:

synOnweb(synId)

Download Location

By default, files are downloaded to the Synapse cache. To download to a different folder, specify the downloadLocation parameter:

entity <- synGet("syn00123", downloadLocation = "/path/to/folder")

For more details see the native reference documentation, e.g. ?synGet and ?synOnweb.

Organizing Data in a Project

You can create your own projects and upload your own data sets. Synapse stores entities in a hierarchical or tree structure. Projects are at the top level and must be uniquely named:
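A sketch of creating a project. The name is hypothetical; a timestamp is appended only to make it likely to be unique:

```r
library(synapser)

# Build a name that is unlikely to clash with an existing project.
projectName <- sprintf(
  "My unique project created on %s",
  format(Sys.time(), "%Y-%m-%d %H%M%S")
)

project <- Project(name = projectName)
project <- synStore(project)
```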

Creating a folder:

dataFolder <- Folder("Data", parent = project)
dataFolder <- synStore(dataFolder)

Adding files to the project:
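A sketch that uploads a small temporary file into the folder created above. The file contents are placeholders:

```r
library(synapser)

# Create a small placeholder file to upload.
filePath <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc", "d\te\tf"), filePath)

# Attach the file to the folder and store it in Synapse.
file <- File(path = filePath, parent = dataFolder)
file <- synStore(file)
```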

You can print the properties of an entity (such as the file we just created):
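For example, assuming file is a stored File entity:

```r
# Properties include the id, name, parentId and version information.
file$properties
```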

Most other properties are immutable, but you can change an entity’s name:
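For example (the new name is hypothetical, and the change is local until the entity is stored again):

```r
# Rename the entity locally.
file$properties$name <- "translated name"
```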

Update Synapse with the change:
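Storing the modified entity pushes the change to Synapse:

```r
library(synapser)

# Keep the returned object: it reflects the updated state on the server.
file <- synStore(file)
```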

You can list all children of an entity:
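A sketch using synGetChildren(), assuming project is a stored Project entity:

```r
library(synapser)

# synGetChildren() returns an iterator; as.list() reads every child into memory.
children <- synGetChildren(project$properties$id)
as.list(children)
```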

You can also filter by type:
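For example, restricting the listing to files and folders with the includeTypes argument:

```r
library(synapser)

filesAndFolders <- synGetChildren(
  project$properties$id,
  includeTypes = c("file", "folder")
)
as.list(filesAndFolders)
```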

You can avoid reading all children into memory at once by iterating through one at a time:
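A sketch of one-at-a-time iteration. It assumes the nextElem() function that synapser provides for its iterators, which signals an error once the iterator is exhausted; check your installed version if this call is unavailable:

```r
library(synapser)

children <- synGetChildren(project$properties$id)

# Pull children one by one; the error raised at the end of the
# iterator is used here as the stop condition.
tryCatch({
  while (TRUE) {
    child <- nextElem(children)
    print(child)
  }
}, error = function(e) {
  message("Reached end of list.")
})
```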

You can move files to a different parent:

newFolder <- Folder("New Parent", parent = project)
newFolder <- synStore(newFolder)

file <- synMove(file, newFolder)

Content can be deleted:
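For example, deleting the file entity used above:

```r
library(synapser)

# Removes the entity from Synapse.
synDelete(file)
```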

Deletion of a project will also delete its contents, in this case the folder:
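For example:

```r
library(synapser)

# Deleting the project also deletes the folders and files it contains.
synDelete(project)
```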

In addition to simple data storage, Synapse entities can be annotated with key/value metadata, described in markdown documents (wikis), and linked together in provenance graphs to create a reproducible record of a data analysis pipeline.

For more details see the native reference documentation, e.g. ?Project, ?Folder, ?File, and ?synStore.

Provenance

Synapse provides tools for tracking ‘provenance’, or the transformation of raw data into processed results, by linking derived data objects to source data and the code used to perform the transformation.

The Activity object represents the source of a data set or the data processing steps used to produce it. Using W3C provenance ontology terms, a result is generated by a combination of data and code which are either used or executed.

Creating an activity object:
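A sketch using placeholder Synapse identifiers:

```r
library(synapser)

# "used" lists input data; "executed" lists the code that was run.
act <- Activity(
  name = "clustering",
  description = "whizzy clustering",
  used = c("syn1234", "syn1235"),
  executed = "syn4567"
)
```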

Here, syn1234 and syn1235 might be two types of measurements on a common set of samples. Some whizzy clustering code might be referred to by syn4567.

Alternatively, you can build an activity up piecemeal:

act <- Activity(name = "clustering", description = "whizzy clustering")
act$used(c("syn1234", "syn1235"))
## NULL
act$executed("syn4567")
## NULL

The used and executed lists can reference entities in Synapse or URLs.

Entity examples:
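A few sketches, assuming act is an Activity and project is a stored entity. The identifiers are placeholders:

```r
# Reference an entity by its Synapse identifier.
act$used("syn12345")

# Reference an entity object directly.
act$used(project)

# Reference a specific version of an entity.
act$used(target = "syn12345", targetVersion = 2)
```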

URL examples:
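A few sketches, assuming act is an Activity. The URLs and names are hypothetical:

```r
# Reference external data by URL.
act$used("https://example.com/my/awesome/data.RData")

# Give the URL a human-readable name.
act$used(url = "https://example.com/my/awesome/data.RData", name = "Awesome Data")

# Reference externally hosted code that was executed.
act$executed(url = "https://example.com/my/script.R", name = "Analysis script")
```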

Storing entities with provenance

The activity can be passed in when storing an Entity to set the Entity’s provenance:

project <- synGet(project$properties$id)
project <- synStore(project, activity = act)

We’ve now recorded that ‘project’ is the output of syn4567 applied to the data stored in syn1234 and syn1235.

Recording data source

synStore() has shortcuts for specifying the used and executed lists directly. For example, when storing a data entity, it’s a good idea to record its source:
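A sketch, assuming file is a File entity. The source URL and the activity name and description are hypothetical:

```r
library(synapser)

# Record where the data came from while storing the entity.
file <- synStore(
  file,
  activityName = "data download",
  activityDescription = "downloaded from an external source",
  used = "https://example.com/excellent/data.tsv"
)
```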

For more information, see ?Activity and ?synStore.

Tables

Tables can be built up by adding sets of rows that follow a user-defined schema and queried using a SQL-like syntax. Please visit the Table vignettes for more information.

Evaluations

An Evaluation is a Synapse construct useful for building processing pipelines and for scoring predictive modeling and data analysis challenges.

Creating an Evaluation:
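A sketch, assuming project is a stored Project entity. The name, description and messages are hypothetical:

```r
library(synapser)

eval <- Evaluation(
  name = sprintf("My unique evaluation created on %s",
                 format(Sys.time(), "%Y-%m-%d %H%M%S")),
  description = "testing",
  contentSource = project$properties$id,
  submissionReceiptMessage = "Thank you for your submission!",
  submissionInstructionsMessage = "This evaluation only accepts files."
)
eval <- synStore(eval)
```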

Retrieving the created Evaluation:
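For example, looking the evaluation up again by its id:

```r
library(synapser)

eval <- synGetEvaluation(eval$id)
```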

Submitting a file to an existing Evaluation:
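A sketch, assuming eval is an existing Evaluation and file is a stored File entity. The submission name is hypothetical:

```r
library(synapser)

submission <- synSubmit(eval, file, name = "My first submission")
```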

List submissions:

submissions <- synGetSubmissionBundles(eval)
as.list(submissions)
## [[1]]
## [[1]][[1]]
## {
##   "contributors": [
##     {
##       "createdOn": "2018-10-02T00:35:36.961Z",
##       "principalId": "1"
##     }
##   ],
##   "createdOn": "2018-10-02T00:35:36.961Z",
##   "entityBundleJSON": "{\"entityType\":\"org.sagebionetworks.repo.model.FileEntity\",\"fileHandles\":[{\"contentMd5\":\"3f466b7f85d184292a68cea1c4f7cfc2\",\"bucketName\":\"devdata.sagebase.org\",\"fileName\":\"fileb4ea74c7439c\",\"createdBy\":\"1\",\"contentSize\":27,\"concreteType\":\"org.sagebionetworks.repo.model.file.S3FileHandle\",\"etag\":\"8e8f84d7-1fb9-4325-aef2-448e30d7caf9\",\"id\":\"233563\",\"storageLocationId\":1,\"createdOn\":\"2018-10-02T00:35:35.000Z\",\"contentType\":\"application/octet-stream\",\"key\":\"1/ff233d0d-273d-4fe2-ab1a-f7c438eed113/fileb4ea74c7439c\"}],\"annotations\":{\"longAnnotations\":{},\"blobAnnotations\":{},\"stringAnnotations\":{},\"etag\":\"00000000-0000-0000-0000-000000000000\",\"id\":\"syn9774250\",\"creationDate\":\"1538440535899\",\"uri\":\"/entity/syn9774250/annotations\",\"dateAnnotations\":{},\"doubleAnnotations\":{}},\"entity\":{\"accessControlList\":\"/repo/v1/entity/syn9774250/acl\",\"entityType\":\"org.sagebionetworks.repo.model.FileEntity\",\"annotations\":\"/repo/v1/entity/syn9774250/annotations\",\"uri\":\"/repo/v1/entity/syn9774250\",\"createdOn\":\"2018-10-02T00:35:35.899Z\",\"parentId\":\"syn9774249\",\"versionNumber\":1,\"dataFileHandleId\":\"233563\",\"modifiedOn\":\"2018-10-02T00:35:36.058Z\",\"versionLabel\":\"1\",\"createdBy\":\"1\",\"versions\":\"/repo/v1/entity/syn9774250/version\",\"name\":\"fileb4ea74c7439c\",\"concreteType\":\"org.sagebionetworks.repo.model.FileEntity\",\"etag\":\"fbbebba6-73e9-470d-a0e7-2af39f95318c\",\"modifiedBy\":\"1\",\"id\":\"syn9774250\",\"versionUrl\":\"/repo/v1/entity/syn9774250/version/1\"}}",
##   "entityId": "syn9774250",
##   "evaluationId": "9604225",
##   "id": "9608897",
##   "name": "fileb4ea74c7439c",
##   "userId": "1",
##   "versionNumber": 1
## }
## 
## [[1]][[2]]
## {
##   "entityId": "syn9774250",
##   "etag": "838fb4a3-5ab9-4567-8b9d-e3ee34f46124",
##   "id": "9608897",
##   "modifiedOn": "2018-10-02T00:35:36.961Z",
##   "status": "RECEIVED",
##   "statusVersion": 0,
##   "versionNumber": 1
## }

Retrieving submission by id:
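For example, assuming submission was returned by an earlier synSubmit() call:

```r
library(synapser)

submission <- synGetSubmission(submission$id)
```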

Retrieving the submission status:
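For example, assuming submission is an existing submission:

```r
library(synapser)

submissionStatus <- synGetSubmissionStatus(submission)
submissionStatus
```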

To view the annotations:

To update an annotation:

Query an evaluation:

To learn more about writing an evaluation query, please see: http://docs.synapse.org/rest/GET/evaluation/submission/query.html

For more information, please see ?Evaluation, ?synSubmit, and ?synGetSubmissionBundles.

Sharing Access to Content

By default, data sets in Synapse are private to your user account, but they can easily be shared with specific users, groups, or the public.

Retrieve the sharing setting on an entity:

synGetPermissions(project, principalId = 273950)
## list()

The first time an entity is shared, an ACL object is created for that entity. Let’s make project public:
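A sketch using synSetPermissions(), with the same principalId (273950) that the surrounding examples use for the public:

```r
library(synapser)

# Grant read access to the public principal.
acl <- synSetPermissions(project, principalId = 273950, accessType = list("READ"))
```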

Now public can read:

synGetPermissions(project, principalId = 273950)
## [[1]]
## [1] "READ"

File Views

A file view can be defined by its scope. It allows querying for FileEntity within the scope using a SQL-like syntax. Please visit the File View vignettes for more information.

Accessing the API Directly

The synRestGET, synRestPOST, synRestPUT, and synRestDELETE methods enable access to the Synapse REST(ish) API, taking care of details like endpoints and authentication. See the REST API documentation.
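As an illustration, a GET request against the entity service might look like this sketch. It assumes project is a stored entity; the request path is relative to the default repository endpoint:

```r
library(synapser)

# Fetch the raw entity record; the response is parsed into an R list.
response <- synRestGET(sprintf("/entity/%s", project$properties$id))
response$name
```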

Synapse Utilities

We provide some utility functions in the synapserutils package:

  • Copy Files, Folders, Tables, Links, Projects, and Wiki Pages.
  • Upload data to Synapse in bulk.
  • Download data from Synapse in bulk.

Please visit the synapserutils GitHub repository for installation instructions.

More information

For more information see the Synapse User Guide.