CLAM Data API

The CLAM Data API is at the heart of CLAM. It contains the data structures CLAM uses, such as Profiles, Input Templates, Output Templates and Metadata. CLAM uses this API internally, but it is also designed to be used in your system wrapper scripts and clients.

class clam.common.data.AbstractConverter(id, **kwargs)
acceptforinput = []
acceptforoutput = []
convertforinput(filepath, metadata)

Convert from the target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata argument already describes the to-be-generated file. ‘filepath’ is both the source and the target file: the source file will be erased and overwritten with the conversion result!

convertforoutput(outputfile)

Convert from one of the source formats into the target format. Relevant if converters are used in OutputTemplates. The outputfile argument is a CLAMOutputFile instance.

label = '(ERROR: label not overriden from AbstractConverter!)'
class clam.common.data.AbstractMetaField(key, value=None)

This abstract class is the basis for derived classes representing metadata fields of particular types. A metadata field is in essence a (key, value) pair. These classes are used in output templates (described by the XML tag meta). They are not used by CLAMMetaData.

static fromxml(node)

Static method returning a MetaField instance (any subclass of AbstractMetaField) from the given XML description. Node can be a string or an etree._Element.

resolve(data, parameters, parentfile, relevantinputfiles)
xml(operator='set', indent='')

Serialize the metadata field to XML

class clam.common.data.Action(*args, **kwargs)

The action paradigm allows you to specify actions. Each action ties a URL to a script or Python function, and may take a number of parameters you explicitly specify. Each action is strictly independent of other actions and completely separate from the projects, and by extension also from any files within projects and any profiles. Unlike projects, which may run over a long time period and are suited for batch processing, actions are intended for real-time communication. Typically they should return an answer in at most a couple of seconds.

Positional Arguments:

  • a Parameter instance or a Viewer instance.

Keyword arguments:

  • id - The ID of the action (mandatory)

  • name - A human readable name, used in the interface

  • description - A human readable description of the action, used in the interface

  • command - The command to run, this is analogous to the COMMAND in the service configuration file and may contain parameters (most notably $PARAMETERS)

  • function - The python function to call (use either this or command)

  • parameters - List of parameter instances. By default, they will be passed in the order defined to the command or function.

  • parameterstyle - Set to positional (default) or keywords. Changes the way arguments are passed to the function.

  • viewers - List of viewer instances.

  • mimetype - The mimetype of the output (when no viewers are used).

  • method - The HTTP Method to allow, set to string GET, POST or the None value to allow all methods.

  • returncodes404 - A list of command exit codes that will be mapped to HTTP 404 Not Found (defaults to: [4])

  • returncodes403 - A list of command exit codes that will be mapped to HTTP 403 Permission Denied (defaults to: [3])

  • returncodes200 - A list of command exit codes that will be mapped to HTTP 200 Ok (defaults to: [0])

  • allowanonymous - Boolean indicating whether this action can be used without any authentication.
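
The exit-code mapping behind returncodes200, returncodes403 and returncodes404 can be pictured with a small stand-alone sketch. The function name http_status_for_exit_code is hypothetical, and the fallback to HTTP 500 for unlisted exit codes is an assumption that the list above does not state.

```python
def http_status_for_exit_code(exitcode, returncodes200=(0,),
                              returncodes403=(3,), returncodes404=(4,)):
    """Map a command exit code to an HTTP status, using the documented defaults."""
    if exitcode in returncodes200:
        return 200
    if exitcode in returncodes403:
        return 403
    if exitcode in returncodes404:
        return 404
    # Assumption: any exit code not listed counts as a server-side failure.
    return 500
```

A script that exits with code 4 would then surface as HTTP 404 unless returncodes404 is overridden.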

static fromxml(node)

Static method returning an Action instance from the given XML description. Node can be a string or an etree._Element.

xml(indent='')
exception clam.common.data.AuthRequired(msg='')

Raised on HTTP 401 - Authentication Required errors. The service requires authentication; pass user credentials to the CLAMClient constructor.

exception clam.common.data.AuthenticationRequired

This Exception is raised when authentication is required but has not been provided

exception clam.common.data.BadRequest

Raised on HTTP 400 - Bad Request errors

class clam.common.data.CLAMData(xml, client=None, localroot=False, projectpath=None, loadmetadata=True)

Instances of this class hold all the CLAM Data that is automatically extracted from CLAM XML responses. Its member variables are:

  • baseurl - The base URL to the service (string)

  • projecturl - The full URL to the selected project, if any (string)

  • status - Can be: clam.common.status.READY (0), clam.common.status.RUNNING (1), or clam.common.status.DONE (2)

  • statusmessage - The latest status message (string)

  • completion - An integer between 0 and 100 indicating the percentage towards completion.

  • parameters - List of parameters (but use the methods instead)

  • profiles - List of profiles ([ Profile ])

  • program - A Program instance (or None). Describes the expected outputfiles given the uploaded inputfiles. This is the concretisation of the matching profiles.

  • input - List of input files ([ CLAMInputFile ]); use inputfiles() instead for easier access

  • output - List of output files ([ CLAMOutputFile ])

  • projects - List of project IDs ([ string ])

  • corpora - List of pre-installed corpora

  • errors - Boolean indicating whether there are errors in parameter specification

  • errormsg - String containing an error message

  • oauth_access_token - OAuth2 access token (empty if not used, string)

Note that, depending on the current status of the project, not all of these members may be available.

baseurl

String containing the base URL of the webservice

commandlineargs()

Obtain a string of all parameters, using the parameter flags they were defined with, in order to pass to an external command. This is shell-safe by definition.
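
As an illustration of what a shell-safe argument string means here, the following stand-alone sketch joins (flag, value) pairs using the standard library's shlex.quote. It is not CLAM's implementation; the function name build_commandline and the (flag, value) input format are assumptions made for the example.

```python
import shlex


def build_commandline(parameters):
    """Join (flag, value) pairs into one shell-safe argument string.

    A value of True stands for a bare switch; None or False omits the parameter.
    """
    parts = []
    for flag, value in parameters:
        if value is None or value is False:
            continue
        if value is True:
            parts.append(flag)
        else:
            # shlex.quote wraps anything the shell could interpret in single quotes.
            parts.append(flag + " " + shlex.quote(str(value)))
    return " ".join(parts)


print(build_commandline([("-l", "en"), ("-v", True), ("-t", "hello world; rm -rf /")]))
# prints: -l en -v -t 'hello world; rm -rf /'
```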

corpora

List of pre-installed corpora

errormsg

String containing an error message if an error occurred

errors

Boolean indicating whether there are errors in parameter specification

get(parameter_id, default=None)
input

List of input files ([ CLAMInputFile ])

inputfile(inputtemplate=None)

Return the inputfile for the specified inputtemplate. If inputtemplate=None, the inputfile is returned regardless of inputtemplate. This function returns exactly one file and raises an error when multiple input files match; use inputfiles() instead in that case.

inputfiles(inputtemplate=None)

Generator yielding all inputfiles for the specified inputtemplate, if inputtemplate=None, inputfiles are returned regardless of inputtemplate.

inputtemplate(template_id)

Return the inputtemplate with the specified ID. This is used to resolve an inputtemplate ID to an InputTemplate object instance.

inputtemplates()

Return all input templates as a list (of InputTemplate instances)

loadmetadata

Automatically load metadata for input and output files? (default: True)

matchingprofiles()

Generator yielding all matching profiles

output

List of output files ([ CLAMOutputFile ])

outputtemplate(template_id)

Get an output template by ID

parameter(parameter_id)

Return the specified global parameter (the entire object, not just the value)

parametererror()

Return the first parameter error, or False if there is none

parameters

This contains a list of (parametergroup, [parameters]) tuples.

parseresponse(xml, localroot=False)

Parses CLAM XML, there’s usually no need to call this directly

passparameters()

Return all parameters as {id: value} dictionary

profiles

List of profiles ([ Profile ])

program

Program instance. Describes the expected outputfiles given the uploaded inputfiles. This is the concretisation of the matching profiles.

projects

List of projects ([ string ])

projecturl

String containing the full URL to the project, if a project was indeed selected

status

The current status of the service, returns clam.common.status.READY (0), clam.common.status.RUNNING (1), or clam.common.status.DONE (2)

statusmessage

The current status of the service in a human readable message

class clam.common.data.CLAMFile(projectpath, filename, loadmetadata=True, client=None, requiremetadata=False)
attachviewers(profiles)

Attach viewers and converters to file, automatically scan all profiles for outputtemplate or inputtemplate

basedir = ''
copy(target, timeout=500)

Copy or download this file to a new local file

delete()

Delete this file

exists()
loadmetadata()

Load metadata for this file. This is usually called automatically upon instantiation, except if explicitly disabled. Works both locally as well as for clients connecting to a CLAM service.

metafilename()

Returns the filename for the metadata file (not full path). Only used for local files.

read()

Loads all lines in memory

readlines()

Loads all lines in memory

store(fileid=None, keep=False)

Put a file in temporary public storage, returns the ID if the file is local, returns a dictionary with keys ‘id’, ‘filename’ and ‘url’ if the file is remote.

validate()

Validate this file. Returns a boolean.

class clam.common.data.CLAMInputFile(projectpath, filename, loadmetadata=True, client=None, requiremetadata=False)
basedir = 'input'
class clam.common.data.CLAMMetaData(file, **kwargs)

A simple hash structure to hold arbitrary metadata. This is the basis for format classes.

allowcustomattributes = True
attributes = None
classmethod formatxml(indent='')

Render an XML representation of the format class

static fromxml(node, file=None)

Read metadata from XML. Static method returning a CLAMMetaData instance (or rather, the appropriate subclass of CLAMMetaData) from the given XML description. Node can be a string or an etree._Element.

httpheaders()

HTTP headers to output for this format. Yields (key,value) tuples. Should be overridden in sub-classes!

items()

Returns all items as (key, value) tuples

mimetype = 'text/plain'
save(filename)

Save metadata to XML file

schema = ''
validate()

Validate the metadata. Possibly extracts additional metadata from the actual file into the metadata file. This method calls a format’s custom validator() function, which you can override per format; additionally, it validates any constraints that are set. The validation method implements some caching so your validator() function is never called more than once.

validateconstraints()

Validates the constraints (if any). Called by validate(), no need to invoke directly

validator()

This method can be overridden in derived classes and has no implementation here. It should return True or False. Additionally, if there is metadata IN the actual file, this method should extract it and assign it to this object. It will be called automatically from the constructor. Note that the file (CLAMFile) is accessible through self.file, which is guaranteed to exist when this method is called.
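
The interplay between validate() and validator() can be sketched without CLAM itself. The classes below are hypothetical stand-ins, not CLAM's code: a plain string takes the place of the CLAMFile in self.file, and both constraint checking and the automatic call from the constructor are left out.

```python
class MetaDataSketch:
    """Stand-in for a metadata/format class (not CLAM's actual code)."""

    def __init__(self, file):
        self.file = file      # the file this metadata describes
        self.data = {}        # arbitrary key/value metadata
        self._valid = None    # cached validation result

    def validator(self):
        """Format-specific check, meant to be overridden in subclasses."""
        return True

    def validate(self):
        # Run validator() only on the first call and remember the outcome.
        if self._valid is None:
            self._valid = bool(self.validator())
        return self._valid


class PlainTextSketch(MetaDataSketch):
    calls = 0  # counts how often validator() actually runs

    def validator(self):
        PlainTextSketch.calls += 1
        # Extract metadata found in the file itself and store it on the object.
        self.data["lines"] = len(self.file.splitlines())
        return len(self.file) > 0
```

Calling validate() twice on the same PlainTextSketch instance runs validator() once; the second call returns the cached result.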

xml(indent='')

Render an XML representation of the metadata

class clam.common.data.CLAMOutputFile(projectpath, filename, loadmetadata=True, client=None, requiremetadata=False)
basedir = 'output'
class clam.common.data.CLAMProvenanceData(serviceid, servicename, serviceurl, outputtemplate_id, outputtemplate_label, inputfiles, parameters=None, timestamp=None)

Holds provenance data

static fromxml(node)

Return a CLAMProvenanceData instance from the given XML description. Node can be a string or an lxml.etree._Element.

xml(indent='')

Serialise provenance data to XML. This is included in CLAM Metadata files

exception clam.common.data.ConfigurationError

This Exception is raised when there is an error in the configuration

class clam.common.data.Constraint(constrainttype, **kwargs)
static fromxml(node)

Static method returns a Constraint instance from the given XML description. Node can be a string or an etree._Element.

test(metadata)
xml(indent='')

Produce Constraint XML

class clam.common.data.CopyMetaField(key, value=None)

In CopyMetaField, the value is in the form of templateid.keyid, denoting where to copy from. If not keyid but only a templateid is specified, the keyid of the metafield itself will be assumed.
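
The templateid.keyid convention can be illustrated with a short stand-alone sketch. The helper name split_copy_source is hypothetical, and splitting on the first dot is an assumption about how the value is parsed.

```python
def split_copy_source(key, value):
    """Split a 'templateid.keyid' value; fall back to the field's own key."""
    if "." in value:
        # Assumption: the first dot separates the template ID from the key ID.
        templateid, keyid = value.split(".", 1)
    else:
        templateid, keyid = value, key
    return templateid, keyid


print(split_copy_source("language", "inputtext.language"))  # ('inputtext', 'language')
print(split_copy_source("encoding", "inputtext"))           # ('inputtext', 'encoding')
```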

resolve(data, parameters, parentfile, relevantinputfiles)
xml(indent='')

Serialize the metadata field to XML

class clam.common.data.ForbidMeta(**kwargs)
exception clam.common.data.FormatError(value)

This Exception is raised when the CLAM response is not in the valid CLAM XML format

class clam.common.data.Forwarder(id, name, url, description='', type='zip', tmpstore=True, encodeurl=True)
exception clam.common.data.HTTPError

This Exception is raised when certain data (such as metadata) can’t be retrieved over HTTP

class clam.common.data.InputSource(**kwargs)
check()

Checks if this inputsource is usable in INPUTSOURCES

isdir()
isfile()
xml(indent='')
class clam.common.data.InputTemplate(template_id, formatclass, label, *args, **kwargs)

This class represents an input template: a slot with a certain format and function to which input files can be uploaded.

static fromxml(node)

Static method returning an InputTemplate instance from the given XML description. Node can be a string or an etree._Element.

generate(file, validatedata=None, inputdata=None, user=None)

Convert the template into instantiated metadata, validating the data in the process and returning errors otherwise. inputdata is a dictionary-compatible structure, such as the relevant postdata. Returns (success, metadata, parameters); error messages can be extracted from parameters[].error. validatedata is an (errors, parameters) tuple that can be passed if you did validation in a prior stage; if not specified, validation is done automatically.

json()

Produce a JSON representation for the web interface

match(metadata, user=None)

Does the specified metadata match this template? Returns (success, metadata, parameters)

matchingfiles(projectpath)

Checks if the input conditions are satisfied, i.e. the required input files are present. We use the symbolic links .*.INPUTTEMPLATE.id.seqnr to determine this. Returns a list of matching results (seqnr, filename, inputtemplate).
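
To make the link naming scheme concrete, here is a stand-alone sketch that takes such a link name apart. It is not CLAM's code: the helper parse_link_name is hypothetical, and it assumes that the template ID contains no dots and that seqnr is a decimal integer.

```python
import re

# Pattern for names of the form .<filename>.INPUTTEMPLATE.<id>.<seqnr>
LINK_PATTERN = re.compile(
    r"^\.(?P<filename>.+)\.INPUTTEMPLATE\.(?P<id>[^.]+)\.(?P<seqnr>\d+)$"
)


def parse_link_name(linkname):
    """Return (seqnr, filename, inputtemplate_id), or None if the name does not match."""
    match = LINK_PATTERN.match(linkname)
    if match is None:
        return None
    return int(match.group("seqnr")), match.group("filename"), match.group("id")


print(parse_link_name(".doc.txt.INPUTTEMPLATE.textinput.1"))  # (1, 'doc.txt', 'textinput')
```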

validate(postdata, user=None)

Validate posted data against the inputtemplate

xml(indent='')

Produce Template XML

exception clam.common.data.NoConnection

Raised when a connection can’t be established

exception clam.common.data.NotFound(msg='')

Raised on HTTP 404 - Not Found Errors

class clam.common.data.OutputTemplate(template_id, formatclass, label, *args, **kwargs)
findparent(inputtemplates)

Find the most suitable parent, that is: the first matching unique/multi inputtemplate

static fromxml(node)

Static method returning an OutputTemplate instance from the given XML description. Node can be a string or an etree._Element.

generate(profile, parameters, projectpath, inputfiles, provenancedata=None)

Yields (inputtemplate, inputfilename, inputmetadata, outputfilename, metadata) tuples

generatemetadata(parameters, parentfile, relevantinputfiles, provenancedata=None)

Generate metadata, given a filename, parameters and a dictionary of inputdata (necessary in case we copy from it)

getparent(profile)

Resolve a parent ID

xml(indent='')

Produce Template XML

class clam.common.data.ParameterCondition(**kwargs)
allpossibilities()

Returns all possible outputtemplates that may occur (recursively applied)

evaluate(parameters)

Returns False if there’s no match, or whatever the ParameterCondition evaluates to (recursively applied!)

static fromxml(node)

Static method returning a ParameterCondition instance from the given XML description. Node can be a string or an etree._Element.

match(parameters)
xml(indent='')
exception clam.common.data.ParameterError(msg='')

Raised on Parameter Errors, i.e. when a parameter does not validate, is missing, or is otherwise set incorrectly.

class clam.common.data.ParameterMetaField(key, value=None)
resolve(data, parameters, parentfile, relevantinputfiles)
xml(indent='')

Serialize the metadata field to XML

exception clam.common.data.PermissionDenied(msg='')

Raised on HTTP 403 - Permission Denied Errors (but only if no CLAM XML response is provided)

class clam.common.data.Profile(*args)
static fromxml(node)

Return a profile instance from the given XML description. Node can be a string or an etree._Element.

generate(projectpath, parameters, serviceid, servicename, serviceurl)

Generate output metadata on the basis of input files and parameters. Projectpath must be absolute. Returns a Program instance.

match(projectpath, parameters)

Check if the profile matches all inputdata and produces output given the set parameters. Returns a boolean

matchingfiles(projectpath)

Return a list of all inputfiles matching the profile (filenames)

out(indent='')
outputtemplates()

Returns all outputtemplates, resolving ParameterConditions to all possibilities

xml(indent='')

Produce XML output for the profile

class clam.common.data.Program(projectpath, matchedprofiles=None)

A Program is the concretisation of Profile. It describes the exact output files that will be created on the basis of what input files. This is in essence a dictionary structured as follows: {outputfilename: (outputtemplate, inputfiles)} in which inputfiles is a dictionary {inputfilename: inputtemplate}
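
The dictionary layout described above can be sketched with plain Python data. Template IDs are shown as strings purely for illustration, the file names are made up, and the two helper functions are stand-ins that only mimic the iteration order of the mapping.

```python
# {outputfilename: (outputtemplate, {inputfilename: inputtemplate})}
program = {
    "doc.tok.txt": ("tokenised", {"doc.txt": "plaintext"}),
    "summary.csv": ("statistics", {"doc.txt": "plaintext", "extra.txt": "plaintext"}),
}


def outputpairs(program):
    """Yield (outputfilename, outputtemplate) pairs."""
    for outputfilename, (outputtemplate, _inputfiles) in program.items():
        yield outputfilename, outputtemplate


def inputpairs(program, outputfilename):
    """Yield (inputfilename, inputtemplate) pairs for one output file."""
    _outputtemplate, inputfiles = program[outputfilename]  # KeyError if unknown
    for inputfilename, inputtemplate in inputfiles.items():
        yield inputfilename, inputtemplate


print(list(outputpairs(program)))
print(list(inputpairs(program, "summary.csv")))
```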

add(outputfilename, outputtemplate, inputfilename=None, inputtemplate=None)

Add a new path to the program

getinputfile(outputfile, loadmetadata=True, client=None, requiremetadata=False)

Grabs one input file for the specified output filename (raises a KeyError exception if there is no such output, StopIteration if there are no input files for it). Shortcut for getinputfiles()

getinputfiles(outputfile, loadmetadata=True, client=None, requiremetadata=False)

Iterates over all input files for the specified outputfile (you may pass a CLAMOutputFile instance or a filename string). Yields (CLAMInputFile,str:inputtemplate_id) tuples. The last three arguments are passed to its constructor.

getoutputfile(loadmetadata=True, client=None, requiremetadata=False)

Grabs one output file (raises a StopIteration exception if there is none). Shortcut for getoutputfiles()

getoutputfiles(loadmetadata=True, client=None, requiremetadata=False)

Iterates over all output files and their output template. Yields (CLAMOutputFile, str:outputtemplate_id) tuples. The last three arguments are passed to its constructor.

inputpairs(outputfilename)

Iterates over all (inputfilename, inputtemplate) pairs for a specific output filename

outputpairs()

Iterates over all (outputfilename, outputtemplate) pairs

update([E, ]**F)

Update D from dict/iterable E and F. Returns None.

  • If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

  • If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

  • In either case, this is followed by: for k in F: D[k] = F[k]
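
These are the semantics of Python's built-in dict.update(). Assuming Program inherits the method unchanged from dict, as the inherited docstring suggests, the behaviour can be checked on an ordinary dictionary:

```python
d = {"a": 1}
d.update({"b": 2})                # E has .keys(): each key is copied
d.update([("c", 3)])              # E lacks .keys(): iterated as (key, value) pairs
d.update({"a": 10}, a=20, e=5)    # keyword arguments F are applied last
print(d)  # {'a': 20, 'b': 2, 'c': 3, 'e': 5}
```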

class clam.common.data.RawXMLProvenanceData(data)
xml()
class clam.common.data.RequireMeta(**kwargs)
exception clam.common.data.ServerError(msg='')

Raised on HTTP 500 - Internal Server Error. Indicates that something went wrong on the server side.

class clam.common.data.SetMetaField(key, value=None)
resolve(data, parameters, parentfile, relevantinputfiles)
xml(indent='')

Serialize the metadata field to XML

exception clam.common.data.TimeOut

Raised when a connection times out

class clam.common.data.UnsetMetaField(key, value=None)
resolve(data, parameters, parentfile, relevantinputfiles)
xml(indent='')

Serialize the metadata field to XML

exception clam.common.data.UploadError(msg='')

Raised when something fails during upload

clam.common.data.buildarchive(project, path, fmt)

Build a download archive, returns the full file path

clam.common.data.escape(s, quote)
clam.common.data.escapeshelloperators(s)
clam.common.data.getclamdata(filename, custom_formats=None, custom_viewers=None)

This function reads the CLAM Data from an XML file. Use this to read the clam.xml file from your system wrapper. It returns a CLAMData instance.

If you make use of CUSTOM_FORMATS, you need to pass the CUSTOM_FORMATS list as the second argument.
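
A system wrapper might combine getclamdata() with the CLAMData methods documented above roughly as follows. This is a sketch only: the function name summarise_project and the shape of the returned dictionary are made up for the example, and it uses only calls listed in this reference.

```python
def summarise_project(datafile):
    """Read a clam.xml file and summarise what the wrapper has to process."""
    # Imported lazily so that this sketch can be loaded without CLAM installed.
    import clam.common.data

    clamdata = clam.common.data.getclamdata(datafile)
    parameters = clamdata.passparameters()      # {id: value} dictionary
    inputfiles = list(clamdata.inputfiles())    # CLAMInputFile instances
    return {
        "parameters": parameters,
        "inputcount": len(inputfiles),
        "commandline": clamdata.commandlineargs(),
    }
```

In a real wrapper the path to clam.xml would typically come from the command-line arguments that the service configuration passes to the script.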

clam.common.data.getformats(profiles)
clam.common.data.loadconfig(callername, required=True)

This function loads an external configuration file. It is called directly by the service configuration script and complements the configuration specified there. The function in turn automatically searches for an appropriate configuration file (in several paths). Host and system specific configuration files are prioritised over more generic ones.

  • callername - A string representing the name of the settings module. This is typically set to __name__

Example:

loadconfig(__name__)
clam.common.data.loadconfigfile(configfile, settingsmodule)

This function loads an external configuration file. It is usually not invoked directly but through loadconfig() which handles searching for the right configuration file in the right paths, with fallbacks.

clam.common.data.parsexmlstring(node)
clam.common.data.processhttpcode(code, allowcodes=None)

Return the success code or raise the appropriate exception when the code represents an HTTP error code
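
The idea of turning HTTP status codes into exceptions can be sketched as follows. The exception classes are local stand-ins named after the ones documented above, the function name check_http_code is hypothetical, and two points are assumptions: that codes listed in allowcodes are passed through as successes, and that an unmapped error code raises a generic RuntimeError.

```python
class BadRequest(Exception): pass
class AuthRequired(Exception): pass
class PermissionDenied(Exception): pass
class NotFound(Exception): pass
class ServerError(Exception): pass

# Status codes and exception names as paired in the exception descriptions above.
ERRORS = {
    400: BadRequest,
    401: AuthRequired,
    403: PermissionDenied,
    404: NotFound,
    500: ServerError,
}


def check_http_code(code, allowcodes=None):
    """Return the code on success, raise a matching exception otherwise."""
    if allowcodes and code in allowcodes:
        return code  # assumption: explicitly allowed codes count as success
    if 200 <= code < 300:
        return code
    if code in ERRORS:
        raise ERRORS[code]("HTTP %d" % code)
    # Assumption: any other code is reported as a generic error.
    raise RuntimeError("Unexpected HTTP status %d" % code)
```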

clam.common.data.processparameter(postdata, parameter, user=None)
clam.common.data.processparameters(postdata, parameters, user=None)
clam.common.data.profiler(profiles, projectpath, parameters, serviceid, servicename, serviceurl, printdebug=None)

Given input files and parameters, produce metadata for outputfiles. Returns a list of matched profiles (empty if none match), and a program.

clam.common.data.resolveconfigvariables(value, settingsmodule)

Resolves standard environment variables, encoded in curly braces
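
A minimal sketch of curly-brace substitution is shown below. It is not CLAM's implementation: the function name resolve_braced_variables is hypothetical, the settingsmodule argument is ignored, and leaving unknown variables untouched is an assumption.

```python
import os
import re


def resolve_braced_variables(value, environ=None):
    """Replace {NAME} in value with the corresponding environment variable."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        name = match.group(1)
        # Assumption: unknown variables are left as they are.
        return environ.get(name, match.group(0))

    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, value)


print(resolve_braced_variables("{HOME}/clam", {"HOME": "/home/alice"}))  # /home/alice/clam
```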

clam.common.data.resolveinputfilename(filename, parameters, inputtemplate, nextseq=0, project=None)
clam.common.data.resolveoutputfilename(filename, globalparameters, localparameters, outputtemplate, nextseq, project, inputfilename)
clam.common.data.sanitizeparameters(parameters)

Construct a dictionary of parameters, for internal use only

clam.common.data.shellsafe(s, quote='', doescape=True)

Returns the value string, wrapped in the specified quotes (if not empty), but checks and raises an Exception if the string is at risk of causing code injection
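
The quote-or-reject idea can be illustrated with a deliberately strict stand-alone sketch. It is not CLAM's implementation: the function name quote_or_reject and the character sets are assumptions, and the doescape option is left out, so any risky character leads to an exception.

```python
# Characters the shell may interpret in each quoting context (a conservative guess).
UNSAFE_UNQUOTED = set("|&;<>()$`\\\"' \t\n*?[]#~=%!{}")
UNSAFE_IN_DOUBLE = set('"$`\\!')
UNSAFE_IN_SINGLE = set("'")


def quote_or_reject(s, quote=""):
    """Wrap s in the given quotes, or raise ValueError if it could break out."""
    if quote == "":
        unsafe = UNSAFE_UNQUOTED
    elif quote == '"':
        unsafe = UNSAFE_IN_DOUBLE
    elif quote == "'":
        unsafe = UNSAFE_IN_SINGLE
    else:
        raise ValueError("quote must be empty, a single quote or a double quote")
    bad = sorted(set(s) & unsafe)
    if bad:
        raise ValueError("Value contains unsafe characters: %r" % bad)
    return quote + s + quote


print(quote_or_reject("hello world", "'"))  # 'hello world'
```

Rejecting rather than escaping keeps the sketch short; a value such as "a; rm -rf /" passed without quotes raises ValueError instead of reaching the shell.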

clam.common.data.unescapeshelloperators(s)