User Manual

This section describes how your users can communicate with the API once it is deployed.

The examples use curl, but users are free to use any HTTP library.

The integration tests located at src/tests/integration also provide plenty of examples.

Base routes

  • OpenAPI schema

curl ${SERVER_ENDPOINT}/schema

Returns the full API schema in JSON form.

  • Liveness endpoint

curl ${SERVER_ENDPOINT}/live

Returns live with a 200 status code.

  • Login

curl ${SERVER_ENDPOINT}/login

Returns a Keycloak login URL. Visiting it and authenticating there gives you a JSON Web Token (JWT) that you must provide in the Authorization header to reach protected routes; see Authenticated next.

  • Authenticated

export TOKEN=ey....
curl -S "${SERVER_ENDPOINT}/authenticated"\
     -H "Authorization: Bearer ${TOKEN}"

This route checks the token and returns the requesting user's info: username, groups and projects.
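The same call can be sketched in Python using only the standard library. The helper names `authenticated_request` and `fetch_user_info` are hypothetical and exist only for illustration; the /authenticated route and the Bearer header are the only parts taken from the description above. The response body is returned as raw text, since its exact format is not specified here.

```python
import urllib.request


def authenticated_request(server_endpoint: str, token: str) -> urllib.request.Request:
    """Build a GET request for /authenticated carrying the JWT as a Bearer token."""
    return urllib.request.Request(
        f"{server_endpoint.rstrip('/')}/authenticated",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_user_info(server_endpoint: str, token: str) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(authenticated_request(server_endpoint, token)) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the URL and headers before hitting a live server.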

  • Syn Ack

This route is the callback login route located at ${SERVER_ENDPOINT}/syn_ack and it is not meant to be accessed directly.

Entity routes

For each entity being managed by a ResourceController, the following routes are supported.

Note

  • As per REST standard, each entity is accessible under a resource prefix which is the name of the entity in plural form.

  • URLs end without trailing slashes

  • In the case of a multi-indexed entity (i.e. composite primary key), {id} refers to the primary key elements joined by underscores (_).
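As a minimal sketch of these conventions, the helper below builds a resource URL with no trailing slash and joins composite primary key elements with underscores. The function name `resource_url` and the `datasets` prefix used in the usage lines are assumptions made for illustration.

```python
def resource_url(server_endpoint: str, prefix: str, *primary_key) -> str:
    """Return the URL of one resource, or of the whole collection if no key is given."""
    url = f"{server_endpoint.rstrip('/')}/{prefix}"
    if primary_key:
        # Composite primary keys are joined with underscores, e.g. (1, 2) -> "1_2".
        url += "/" + "_".join(str(part) for part in primary_key)
    return url


# Hypothetical usage with an assumed "datasets" prefix.
collection = resource_url("http://localhost:8000", "datasets")
one_version = resource_url("http://localhost:8000", "datasets", 1, 2)
```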

  • POST

curl -d "${JSON_OBJECT}"\
     ${SERVER_ENDPOINT}/my_resources

Supports submitting a single resource or a list of resources, each with nested resources.

Flexible write:

This endpoint works as a flexible write operation. It supports a mixed input of old and new data:

  • new data shall comply with the resource Schema ruleset.

  • reference old data by setting (at least) primary key values in the dict.

    • Other fields will be applied as an update.
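To make the mixed input concrete, here is a sketch of a payload that combines one new resource with one reference to an existing resource. The field names `id` and `name` are hypothetical; the actual fields depend on your resource Schema.

```python
import json

payload = [
    # New data: has to satisfy the resource Schema ruleset.
    {"name": "brand_new_resource"},
    # Old data: referenced by its primary key; "name" is applied as an update.
    {"id": 42, "name": "renamed_resource"},
]

# Serialized body, ready to be sent with POST to the resource prefix.
body = json.dumps(payload)
```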

  • GET

one

curl ${SERVER_ENDPOINT}/my_resources/{id}

or all

curl ${SERVER_ENDPOINT}/my_resources
  • PUT

Not available for versioned resources, see Versioning below.

curl -X PUT\
     -H "Content-Type: application/json"\
     -d "${UPDATED_JSON_OBJECT}"\
     ${SERVER_ENDPOINT}/my_resources/{id}
  • DELETE

curl -X DELETE\
     ${SERVER_ENDPOINT}/my_resources/{id}

Groups

A group's key is its full path starting from the top level group. Since / is a reserved route character, it is replaced by a double underscore: __ (with no prefix).

E.g. parent__child__grandchild
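A small sketch of that conversion, assuming group paths use / as separator and may carry a leading slash that should be dropped. The function name `group_key` is hypothetical.

```python
def group_key(path: str) -> str:
    """Turn a group path into the key used in routes."""
    # Drop any leading or trailing slash, then swap the separator.
    return path.strip("/").replace("/", "__")
```

For instance, `group_key("parent/child/grandchild")` gives the key shown in the example above.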

Versioning

When a table inherits from Versioned (e.g. Dataset in our demo), the associated controller exposes an extra route: POST /my_versioned_resources/{id}_{version}/release.

This triggers creation of a new row with a version increment.

Note

POST /release is the way to update versioned resources. The PUT / endpoint (a.k.a. update) is available, but it is meant only for updating the nested objects and collections of that resource. Thus, any attempt at updating the versioned resource itself through either PUT / or POST / raises an error.

E.g.

curl -X POST ${SERVER_ENDPOINT}/my_file_resources/{id}_{version}/release

Or, to pass in an update for the new version:

curl -d '{"name": "new_name"}' ${SERVER_ENDPOINT}/my_file_resources/{id}_{version}/release

Note

In the case of a resource that is both Versioned and S3File, POST /release generates a new upload form and sets the ready flag to false.
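The release call can also be prepared from Python. This sketch only builds the request; the helper name `release_request` is hypothetical, and sending the update as JSON with an explicit Content-Type header is an assumption rather than a documented requirement.

```python
import json
import urllib.request
from typing import Optional


def release_request(
    server_endpoint: str,
    prefix: str,
    resource_id,
    version,
    update: Optional[dict] = None,
) -> urllib.request.Request:
    """Build a POST request to {prefix}/{id}_{version}/release, with an optional update."""
    url = f"{server_endpoint.rstrip('/')}/{prefix}/{resource_id}_{version}/release"
    data = json.dumps(update).encode("utf-8") if update is not None else None
    # Assumption: the update body is declared as JSON.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it to the server.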

Filtering

When requesting all resources under a prefix (i.e. GET /my_resources), you can filter the results by appending a query string that starts with ? and is followed by:

  • field=value pairs, separated by &

    • Use field=val1,val2,val3 to OR between multiple values

    • Use nested.field=val to select on a nested attribute field

    • Use * in a string attribute for wildcards

  • numeric operators field.op([value])

    • [lt, le, gt, ge] are supported with a value.

    • [min, max] are supported without a value

Note

When querying with curl, don’t forget to escape the & symbol or enclose the whole URL in quotes; otherwise your shell may interpret it as several commands.
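The filter syntax can be assembled programmatically, which also sidesteps shell quoting issues. This is a sketch under assumptions: the helper name `build_filter_query` and the field names in the usage lines are hypothetical, operators are written as `field.op(value)` following the notation above (with empty parentheses for min and max), and the characters *, comma and parentheses are deliberately left unescaped.

```python
from urllib.parse import quote


def build_filter_query(equals=None, operators=None) -> str:
    """Build a filter query string.

    equals:    mapping of field -> value, or field -> list of values (OR).
    operators: iterable of (field, op, value); use value=None for min/max.
    """
    parts = []
    for field, values in (equals or {}).items():
        if isinstance(values, (list, tuple)):
            values = ",".join(str(v) for v in values)  # comma means OR
        # Keep wildcard and separator characters readable in the URL.
        parts.append(f"{field}={quote(str(values), safe='*,')}")
    for field, op, value in (operators or []):
        arg = "" if value is None else str(value)
        parts.append(f"{field}.{op}({arg})")
    return "?" + "&".join(parts) if parts else ""


# Hypothetical usage: wildcard, OR on a nested field, and two numeric operators.
query = build_filter_query(
    {"name": "ds_*", "owner.username": ["alice", "bob"]},
    [("version", "ge", 2), ("id", "max", None)],
)
```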

Query a nested collection

Alternatively, you may get a nested collection of a resource like this:

curl ${SERVER_ENDPOINT}/my_resources/{id}/{collection}

It also supports partial results, i.e. appending ?fields=f1,...,fn restricts the output to those fields.

File management

Files are stored in an S3 bucket instance. Uploads and downloads are performed directly against the bucket through boto3 presigned URLs.

  • Upload

When creating a new /file resource, you must pass in the file size in bytes, which you can obtain from its descriptor.

The created resource contains a nested dictionary called upload, composed of parts, each containing a presigned form for direct file upload.

Next we distinguish two cases:

Small files

In the case of a small file, i.e. less than 100MB, there is a single part containing a presigned POST, and you may simply use the form to perform the upload.

The following snippet demonstrates how to do this in python:

upload_small_file.py
import requests

# obtained from file['upload']['parts'][0]['form'] creation response
post = {'url': ..., 'fields': ...}

file_path = "/path/to/my_file.ext"
file_name = "my_file.ext"

with open(file_path, 'rb') as f:
    files = {'file': (file_name, f)}
    http_response = requests.post(
        post['url'],
        data=post['fields'],
        files=files,
        verify=True,
        allow_redirects=True)
    assert http_response.status_code == 201

Upon completion, BioDM is notified via a callback, so the file is immediately available.

Large files

For large files, several parts will be present, each allowing you to upload a chunk of 100MB, possibly less for the last one.
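To anticipate how a file will be split, the following sketch computes the expected size of each part from the total file size. It assumes that 100MB means 100 * 1024**2 bytes; the function name `expected_part_sizes` is hypothetical.

```python
from typing import List

CHUNK_SIZE = 100 * 1024**2  # Assumption: 100MB expressed in bytes.


def expected_part_sizes(file_size: int, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the size in bytes of each part, the last one possibly being smaller."""
    full_parts, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full_parts
    if remainder:
        sizes.append(remainder)
    return sizes
```

The total size can be read with `os.path.getsize(path)`, and the length of the returned list should match the number of parts in the upload dictionary if the assumption about the chunk size holds.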

For each part successfully uploaded, the bucket returns an ETag that you have to keep track of and associate with the correct part_number.

Ultimately, the process has to be completed by submitting that mapping, so that the bucket can aggregate all chunks into a single stored file. The bucket does not support passing a callback for a part upload.

Similarly, here is an example using python:

upload_large_file.py
import json

import requests

CHUNK_SIZE = 100*1024**2 # 100MB
parts_etags = []
host: str = ... # Server instance endpoint
file_id = ... # obtained from file['id']
upload_forms = [{'part_number': 1, 'form': ...}, ...] # obtained from file['upload']['parts']
big_file_path = "/path/to/my_big_file.ext" # local path of the file to upload

# Upload file
with open(big_file_path, 'rb') as file:
    for part in upload_forms:
        part_data = file.read(CHUNK_SIZE) # Fetch one chunk.
        response = requests.put(
            part['form'], data=part_data, headers={'Content-Encoding': 'gzip'}
        )
        assert response.status_code == 200

        # Get etag and remove surrounding quotes to not disturb subsequent (json) loading.
        etag = response.headers.get('ETag', "").replace('"', '')
        # Build mapping.
        parts_etags.append({'PartNumber': part['part_number'], 'ETag': etag})

# Send completion notice with the mapping.
complete = requests.put(
    f"{host}/files/{file_id}/complete_multipart",
    data=json.dumps(parts_etags).encode('utf-8')
)
assert complete.status_code == 201
assert 'Completed.' in complete.text

Note

The example above is a rather naive approach. For very large files, you should use a concurrency library (such as concurrent.futures or multiprocessing in python) to speed up the process, as parts can be uploaded in any order.
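A minimal sketch of the concurrent variant with concurrent.futures is given below. To keep it independent from any HTTP library, the actual upload is delegated to a `put_part` callable that you supply; that callable, like the function names, is hypothetical. In practice it would wrap the requests.put call from the example above and return the cleaned ETag.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 100 * 1024**2  # Assumption: 100MB expressed in bytes.


def read_chunk(path, part_number, chunk_size=CHUNK_SIZE):
    """Read the chunk that belongs to a given (1-based) part number."""
    with open(path, "rb") as f:
        f.seek((part_number - 1) * chunk_size)
        return f.read(chunk_size)


def upload_parts_concurrently(path, upload_forms, put_part,
                              max_workers=4, chunk_size=CHUNK_SIZE):
    """Upload all parts in parallel and return the PartNumber/ETag mapping.

    put_part(form, data) must upload one chunk and return its ETag.
    """
    def task(part):
        data = read_chunk(path, part["part_number"], chunk_size)
        return {"PartNumber": part["part_number"], "ETag": put_part(part["form"], data)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, upload_forms))
    # Parts may finish in any order; sort the mapping before completion.
    return sorted(results, key=lambda p: p["PartNumber"])
```

Threads are used here because the work is dominated by network I/O. Each worker opens the file on its own and seeks to its chunk, so no file handle is shared between threads. Keep in mind that up to max_workers chunks are held in memory at once.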

  • Download

Calling GET /my_file_resources only returns the associated metadata (and the upload form(s) while the file is still in pending state).

To download a file use the following endpoint.

curl ${SERVER_ENDPOINT}/my_file_resources/{id}/download

That returns a URL from which the file can be downloaded directly with a GET request.

Note

Download URLs come back with a redirect header, so you may use the allow_redirects=True flag (or its equivalent in your HTTP library) when visiting this route to download in one go.
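As a sketch of the one-go download, the standard library can be used as well, since urllib.request.urlopen follows HTTP redirects by default. The helper names are hypothetical, and no authentication header is sent here, so this only applies as written to files you are allowed to download without a token.

```python
import shutil
import urllib.request


def download_url(server_endpoint: str, prefix: str, resource_id) -> str:
    """Build the URL of the download route for one file resource."""
    return f"{server_endpoint.rstrip('/')}/{prefix}/{resource_id}/download"


def download_file(url: str, destination: str) -> None:
    """Fetch the file, following the redirect, and stream it to disk."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```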

User permissions

When a Composition/One-to-Many relationship is flagged with permissions, as described in Fine: Dynamic user owned permissions, a new field perm_{relationship_name} is available for that resource.

E.g. the Dataset resource in our example would have an extra field perm_files.

A Permission holds a ListGroup object for each enabled verb. ListGroup is a route-less core table that manages lists of groups.

E.g. in our example, CREATE/READ/DOWNLOAD are enabled, so a JSON representation of a dataset with its permissions looks like the following. Leaving “read” empty means that only decorator permissions are taken into account, if provided; otherwise the resource is left public.

{
    "name": "ds_test",
    "owner": {
        "username": "my_dataset_owner"
    },
    "perm_files": {
        "write": {
            "groups": [
                {"path": "genomics_team"},
                {"path": "IT_team"},
                {"..."}
            ]
        },
        "download": {
            "groups": [{"..."}]
        }
    }
}

Note

  • Passing a top level group allows all of its descendant groups for that verb/resource tuple.

  • Permissions are taken into account if and only if Keycloak functionalities are enabled.

    • Without Keycloak there is no token exchange, hence no way of getting protected data back.
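The inheritance rule for top level groups can be pictured with the sketch below. It is purely illustrative and not the server's implementation; it assumes group paths use / as separator, and both function names are hypothetical.

```python
def is_covered(user_group: str, permitted_group: str) -> bool:
    """True if user_group is permitted_group itself or one of its descendants."""
    user = user_group.strip("/")
    permitted = permitted_group.strip("/")
    # The trailing slash avoids matching siblings that merely share a name prefix.
    return user == permitted or user.startswith(permitted + "/")


def any_group_allowed(user_groups, permitted_groups) -> bool:
    """True if at least one of the user's groups is covered by a permitted group."""
    return any(is_covered(u, p) for u in user_groups for p in permitted_groups)
```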