Minimal Demo
Say you or your organization needs to store Datasets, each containing a set of Files.
We will go over the following minimal example.
import marshmallow as ma
from marshmallow import fields as mf
import sqlalchemy as sa
from sqlalchemy import orm as sao
from typing import List
import uvicorn
import biodm as bd
from biodm import config
from biodm.components.controllers import ResourceController, S3Controller
# Tables
class Dataset(bd.components.Versioned, bd.components.Base):
    # SQLite cannot autoincrement a column that is part of a composite primary key.
    id = sa.Column(sa.Integer, primary_key=True, autoincrement="sqlite" not in str(config.DATABASE_URL))
    name: sao.Mapped[str] = sa.Column(sa.String(50), nullable=False)
    description: sao.Mapped[str] = sa.Column(sa.String(500), nullable=False)
    # Usernames are strings, matching username_owner in DatasetSchema.
    username_owner: sao.Mapped[str] = sa.Column(sa.ForeignKey("USER.username"), nullable=False)
    owner: sao.Mapped["User"] = sao.relationship(foreign_keys=[username_owner])
    files: sao.Mapped[List["File"]] = sao.relationship(back_populates="dataset")


class File(bd.components.S3File, bd.components.Base):
    id = sa.Column(sa.Integer, primary_key=True)
    dataset_id = sa.Column(sa.ForeignKey("DATASET.id"), nullable=False)
    dataset: sao.Mapped["Dataset"] = sao.relationship(back_populates="files", single_parent=True, foreign_keys=[dataset_id])

# Schemas
class DatasetSchema(ma.Schema):
    id = mf.Integer()
    version = mf.Integer()
    name = mf.String(required=True)
    description = mf.String(required=False)
    username_owner = mf.String(required=True, load_only=True)
    owner = mf.Nested("UserSchema")
    files = mf.List(mf.Nested("FileSchema"))


class FileSchema(ma.Schema):
    id = mf.Integer()
    filename = mf.String(required=True)
    extension = mf.String(required=True)
    size = mf.Integer(required=True)
    url = mf.String(dump_only=True)
    ready = mf.Bool(dump_only=True)
    dataset_id = mf.Integer(required=True, load_only=True)
    dataset = mf.Nested("DatasetSchema")

# Controllers
class DatasetController(ResourceController):
    def __init__(self, app) -> None:
        super().__init__(app=app)


class FileController(S3Controller):
    def __init__(self, app) -> None:
        super().__init__(app=app)

# Server
def main():
    return bd.Api(debug=True, controllers=[DatasetController, FileController])


if __name__ == "__main__":
    uvicorn.run(
        f"{__name__}:main", factory=True,
        host=bd.config.SERVER_HOST, port=bd.config.SERVER_PORT,
        loop="uvloop", log_level="debug", access_log=False,
    )
And voilà! If your use case is very basic, it is as simple as that. This tiny codebase
deploys a server with two new RESTful resources, accessible at /datasets and /files.
Importantly, /schema, /files/schema and /datasets/schema will let you, or an
OpenAPI Schema v3.0.0 compliant tool, discover all possible routes.
Moreover, it comes with two preset resources, /users and /groups, that are required for
permission management down the road.
All incoming requests are logged in the History table.
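To illustrate what route discovery from the schema endpoints could look like, here is a minimal sketch that lists the routes declared in an OpenAPI v3 document. It assumes the JSON served at /schema exposes the standard top-level "paths" object; the sample dictionary is hand-written for illustration and is not actual BioDM output, and list_routes is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# HTTP verbs that may appear as keys under an OpenAPI "paths" entry.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_routes(openapi_doc: Dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs found in an OpenAPI v3 document, sorted by path."""
    routes = []
    for path, item in openapi_doc.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


# Hand-written sample standing in for the document served at /schema (assumption).
sample = {
    "openapi": "3.0.0",
    "paths": {
        "/datasets": {"get": {}, "post": {}},
        "/files/{id}/download": {"get": {}},
    },
}

for method, path in list_routes(sample):
    print(method, path)
```

In practice the dictionary would come from parsing the body of a GET request on /schema with any HTTP client.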
Let’s examine some key points:
Naming convention
Sticking to the simple naming convention introduced above for the three required components
of a new resource lets BioDM easily infer their relationships through name lookup in its registries.
Table: name of the resource, in singular form
Schema: same name, suffixed with Schema
Controller: same name, suffixed with Controller
Note
This is the Zen approach. You may, however, name those as you please and manually set relationships
in the Controller’s __init__ method.
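The name lookup described above can be pictured with a small sketch. This is not BioDM's actual implementation: the registries, the stand-in classes and the resolve_components helper are all hypothetical, and only the naming pattern (Dataset, DatasetSchema, DatasetController) comes from the example.

```python
from typing import Dict, Tuple


# Stand-in classes; in a real application these would be the Schema and
# Controller classes defined in the demo.
class DatasetSchema: ...
class DatasetController: ...
class FileSchema: ...
class FileController: ...


# Hypothetical registries mapping class names to classes.
schemas: Dict[str, type] = {cls.__name__: cls for cls in (DatasetSchema, FileSchema)}
controllers: Dict[str, type] = {cls.__name__: cls for cls in (DatasetController, FileController)}


def resolve_components(table_name: str) -> Tuple[type, type]:
    """Find the schema and controller matching a table by name alone."""
    schema = schemas[f"{table_name}Schema"]
    controller = controllers[f"{table_name}Controller"]
    return schema, controller


schema_cls, controller_cls = resolve_components("Dataset")
print(schema_cls.__name__, controller_cls.__name__)
```

A component named outside the convention would raise a KeyError in this sketch, which mirrors why such components need their relationships set manually.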
Base Resource
For a resource that does not interact with an external service, this is covered by pairing
BioDM’s SQLAlchemy Declarative Base and ResourceController components.
File management
Note
At the moment, only the S3 protocol is supported, using pre-signed URLs.
The S3File base class, set on a table, populates it with a set of Column fields essential for the task.
All but the ready flag may be seen on FileSchema.
S3Controller will then populate the upload_form field when creating a new resource at /files.
This is a stringified form for direct upload to the storage bucket.
Once the file is uploaded, the ready flag is set to true.
From that point on, URLs to download the file can be obtained by visiting
GET /files/{id}/download.
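As a rough sketch of the first step of that flow, the snippet below builds a request body for creating a file resource from a local file. It assumes /files accepts a JSON object with the loadable fields of FileSchema; whether filename should carry the extension, and whether extension should include the leading dot, are assumptions, and file_payload is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def file_payload(path: Path, dataset_id: int) -> dict:
    """Build a body with the required, loadable fields of FileSchema."""
    return {
        "filename": path.stem,                    # assumption: name without extension
        "extension": path.suffix.lstrip("."),     # assumption: no leading dot
        "size": path.stat().st_size,              # size in bytes
        "dataset_id": dataset_id,
    }


# Create a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "reads.fastq"
    sample.write_bytes(b"ACGT" * 4)
    payload = file_payload(sample, dataset_id=1)

print(json.dumps(payload))
```

The response to the creation request would then carry the upload_form to use for the direct upload; the url and ready fields are dump-only and therefore never sent by the client.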
Versioning
Because Dataset inherits from Versioned, it is populated with an extra version column
that is part of the primary key, making the overall key ('id', 'version',).
Versioned resources are read-only: any update has to go through the
PUT /datasets/{id}_{version}/release
route, which produces a new resource with an incremented version.
Note
Nothing prevents you from expanding further on that primary key in your table class.
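The release semantics can be sketched with an in-memory model keyed on (id, version). This only illustrates the behaviour described above, not BioDM's code: DatasetRow, release and release_path are hypothetical names, and starting versions at 1 is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)  # frozen mirrors the read-only nature of a versioned row
class DatasetRow:
    id: int
    version: int
    name: str


def release_path(id: int, version: int) -> str:
    """Build the release route for a given composite key."""
    return f"/datasets/{id}_{version}/release"


def release(store: Dict[Tuple[int, int], DatasetRow], id: int, version: int, **changes) -> DatasetRow:
    """Copy an existing row with changes applied and its version incremented."""
    current = store[(id, version)]
    new_row = replace(current, version=current.version + 1, **changes)
    store[(new_row.id, new_row.version)] = new_row
    return new_row


store = {(1, 1): DatasetRow(id=1, version=1, name="raw")}
released = release(store, 1, 1, name="cleaned")
print(release_path(1, 1), released)
```

The original row stays untouched under its own key, so both versions remain addressable.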
Warning
SQLite doesn’t support autoincrement in the case of a composite primary key.
BioDM will populate the canonical leading id column at the cost of an extra request
to fetch the max id before inserting. Other configurations will yield errors.
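The workaround described in this warning can be reproduced with the standard sqlite3 module. The sketch below is a plain-SQL illustration of fetching the max id before inserting, not BioDM's implementation; the table layout and the insert_dataset helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: SQLite only allows AUTOINCREMENT on a single
# INTEGER PRIMARY KEY column, so id has to be assigned by hand here.
conn.execute(
    "CREATE TABLE dataset ("
    " id INTEGER NOT NULL,"
    " version INTEGER NOT NULL,"
    " name TEXT NOT NULL,"
    " PRIMARY KEY (id, version))"
)


def insert_dataset(conn: sqlite3.Connection, name: str, version: int = 1) -> int:
    """Insert a row, using one extra query to compute the next id."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM dataset").fetchone()
    new_id = max_id + 1
    conn.execute(
        "INSERT INTO dataset (id, version, name) VALUES (?, ?, ?)",
        (new_id, version, name),
    )
    return new_id


first = insert_dataset(conn, "raw reads")
second = insert_dataset(conn, "alignments")
print(first, second)
```

Note that this two-step approach is only safe if concurrent writers are serialized, since two inserts could otherwise read the same max id.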