CLAM Formats

class clam.common.formats.AlpinoXMLFormat(file, **kwargs)
attributes = {}
mimetype = 'text/xml'
name = 'Alpino XML'
schema = ''
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.BinaryDataFormat(file, **kwargs)
attributes = {}
mimetype = 'application/octet-stream'
name = 'Application-specific Binary Data'
schemaorg_type = 'DigitalDocument'
class clam.common.formats.CSVFormat(file, **kwargs)
attributes = {'encoding': StringParameter encoding, 'language': StringParameter language}
mimetype = 'text/csv'
name = 'Comma Separated Values'
schemaorg_type = 'SpreadsheetDigitalDocument'
class clam.common.formats.DCOIFormat(file, **kwargs)
attributes = {}
mimetype = 'text/xml'
name = 'DCOI format'
schema = ''
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.DjVuFormat(file, **kwargs)
attributes = {}
mimetype = 'image/x-djvu'
name = 'DjVu format'
schemaorg_type = 'ImageObject'
class clam.common.formats.ExampleFormat(file, **kwargs)

This is an example format; inspect its source code if you want to create custom formats.

allowcustomattributes = True
attributes = {}
httpheaders()

HTTP headers to output for this format. Yields (key,value) tuples.

mimetype = 'text/plain'
schema = None
schemaorg_type = 'DigitalDocument'
validator()

Implement your validator here; it should return True or False. Additionally, if there is metadata in the actual file, this method should extract it and assign it to this object. It is called automatically from the constructor. Note that the file (CLAMFile) is accessible through self.file, which is guaranteed to exist when this method is called.
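
The validator contract described above can be sketched without CLAM installed. This is an illustration only: `FormatSketch` is a hypothetical stand-in for the real CLAM base class, `WordCountFormat` and the `wordcount` key are invented for the example, and `io.StringIO` stands in for a CLAMFile, whose actual interface may differ.

```python
import io


class FormatSketch:
    """Hypothetical stand-in for the CLAM base class (not the real API)."""

    attributes = {}
    allowcustomattributes = True
    mimetype = "text/plain"

    def __init__(self, file, **kwargs):
        self.file = file                # available before validator() runs
        self.data = dict(kwargs)        # metadata attributes passed by the caller
        self.valid = self.validator()   # called automatically from the constructor

    def validator(self):
        # No checks in the stand-in base class.
        return True


class WordCountFormat(FormatSketch):
    """Hypothetical custom format: non-empty text, word count kept as metadata."""

    def validator(self):
        text = self.file.read()
        if not text.strip():
            return False                # empty input is rejected
        # Metadata found in the file is assigned to this object.
        self.data["wordcount"] = len(text.split())
        return True


fmt = WordCountFormat(io.StringIO("hello CLAM world"), encoding="utf-8")
print(fmt.valid, fmt.data)
```

In an actual custom format you would derive from the CLAM class instead of the stand-in; the source of ExampleFormat remains the authoritative template.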

class clam.common.formats.FoLiAXMLFormat(file, **kwargs)
attributes = {'chunk-annotation': StringParameter chunk-annotation, 'entity-annotation': StringParameter entity-annotation, 'lemma-annotation': StringParameter lemma-annotation, 'paragraph-annotation': StringParameter paragraph-annotation, 'pos-annotation': StringParameter pos-annotation, 'relation-annotation': StringParameter relation-annotation, 'sense-annotation': StringParameter sense-annotation, 'sentence-annotation': StringParameter sentence-annotation, 'syntax-annotation': StringParameter syntax-annotation, 'text-annotation': StringParameter text-annotation, 'token-annotation': StringParameter token-annotation, 'version': StringParameter version}
mimetype = 'text/xml'
name = 'FoLiA XML'
schema = ''
schemaorg_type = 'TextDigitalDocument'
validator()

This method can be overridden in derived classes and has no implementation here; it should return True or False. Additionally, if there is metadata in the actual file, this method should extract it and assign it to this object. It is called automatically from the constructor. Note that the file (CLAMFile) is accessible through self.file, which is guaranteed to exist when this method is called.

class clam.common.formats.FrogTSVFormat(file, **kwargs)
attributes = {'chunking': ChoiceParameter chunking, 'lemmatisation': ChoiceParameter lemmatisation, 'morphologicalanalysis': ChoiceParameter morphologicalanalysis, 'mwudetection': ChoiceParameter mwudetection, 'namedentities': ChoiceParameter namedentities, 'parsing': ChoiceParameter parsing, 'postagging': ChoiceParameter postagging, 'tokenisation': StaticParameter tokenisation: yes}
mimetype = 'text/plain'
name = 'Frog Tab Separated Values'
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.GifImageFormat(file, **kwargs)
attributes = {}
mimetype = 'image/gif'
name = 'Gif Image'
schemaorg_type = 'ImageObject'
class clam.common.formats.HTMLFormat(file, **kwargs)

HTML Format Definition. This format has one required attribute: encoding

attributes = {'encoding': StringParameter encoding, 'language': StringParameter language}
httpheaders()

HTTP headers to output for this format. Yields (key,value) tuples.

mimetype = 'text/html'
name = 'HTML Format'
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.JSONFormat(file, **kwargs)
mimetype = 'application/json'
name = 'JSON Format (generic, not further specified)'
schemaorg_type = 'DigitalDocument'
class clam.common.formats.JpegImageFormat(file, **kwargs)
attributes = {}
mimetype = 'image/jpeg'
name = 'Jpeg Image'
schemaorg_type = 'ImageObject'
class clam.common.formats.KBXMLFormat(file, **kwargs)
mimetype = 'text/xml'
name = 'Koninklijke Bibliotheek XML-formaat'
schema = ''
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.MP3AudioFormat(file, **kwargs)
attributes = {}
mimetype = 'audio/mpeg'
name = 'MP3 Audio File'
schemaorg_type = 'AudioObject'
class clam.common.formats.MSWordFormat(file, **kwargs)
attributes = {}
mimetype = 'application/msword'
name = 'Microsoft Word format'
schema = ''
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.MpegVideoFormat(file, **kwargs)
attributes = {}
mimetype = 'video/mpeg'
name = 'Mpeg Video'
schemaorg_type = 'VideoObject'
class clam.common.formats.OggAudioFormat(file, **kwargs)
attributes = {}
mimetype = 'audio/vorbis'
name = 'Ogg Vorbis Audio File'
schemaorg_type = 'AudioObject'
class clam.common.formats.OggVideoFormat(file, **kwargs)
attributes = {}
mimetype = 'audio/ogg'
name = 'Ogg Video File'
schemaorg_type = 'VideoObject'
class clam.common.formats.OpenDocumentTextFormat(file, **kwargs)
attributes = {}
mimetype = 'application/vnd.oasis.opendocument.text'
name = 'Open Document Text Format'
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.PDFFormat(file, **kwargs)
attributes = {}
mimetype = 'application/pdf'
name = 'PDF'
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.PlainTextFormat(file, **kwargs)

Plain Text Format Definition. This format has one required attribute: encoding

attributes = {'encoding': StringParameter encoding, 'language': StringParameter language}
httpheaders()

HTTP headers to output for this format. Yields (key,value) tuples.
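
A method that "yields (key,value) tuples" is a generator. The sketch below shows one plausible shape for such a method. `PlainTextSketch` is a hypothetical stand-in, not the real PlainTextFormat, and folding the encoding attribute into the Content-Type header is an assumption made for illustration.

```python
class PlainTextSketch:
    """Hypothetical stand-in; not the real PlainTextFormat."""

    mimetype = "text/plain"

    def __init__(self, **attributes):
        self.data = attributes  # e.g. encoding="utf-8"

    def httpheaders(self):
        """Yield (key, value) tuples for the HTTP response."""
        content_type = self.mimetype
        # Assumption: the encoding attribute, when set, becomes the charset.
        if "encoding" in self.data:
            content_type += "; charset=" + self.data["encoding"]
        yield ("Content-Type", content_type)


# The generator output can be collected into a dict of headers.
headers = dict(PlainTextSketch(encoding="utf-8").httpheaders())
print(headers)
```

Because the method is a generator, a subclass could yield further headers after the first one without building a list.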

mimetype = 'text/plain'
name = 'Plain Text Format'
class clam.common.formats.PngImageFormat(file, **kwargs)
attributes = {}
mimetype = 'image/png'
name = 'PNG Image'
schemaorg_type = 'ImageObject'
class clam.common.formats.TICCLShadowOutputXML(file, **kwargs)
mimetype = 'text/xml'
name = 'Ticcl Shadow Output'
schema = ''
schemaorg_type = 'TextDigitalDocument'
class clam.common.formats.TICCLVariantOutputXML(file, **kwargs)
mimetype = 'text/xml'
name = 'Ticcl Variant Output'
schema = ''
schemaorg_type = 'TextDigitalDocument'
clam.common.formats.TadpoleFormat

alias of FrogTSVFormat

class clam.common.formats.TiffImageFormat(file, **kwargs)
attributes = {}
mimetype = 'image/tiff'
name = 'Tiff Image'
schemaorg_type = 'ImageObject'
clam.common.formats.UndefinedXMLFormat

alias of XMLFormat

class clam.common.formats.WaveAudioFormat(file, **kwargs)
attributes = {}
mimetype = 'audio/vnd.wave'
name = 'Wave Audio File'
schemaorg_type = 'AudioObject'
class clam.common.formats.XMLFormat(file, **kwargs)
mimetype = 'text/xml'
name = 'XML Format (generic, not further specified)'
schema = ''
schemaorg_type = 'DigitalDocument'
class clam.common.formats.XMLStyleSheet(file, **kwargs)
attributes = {}
mimetype = 'application/xslt+xml'
name = 'XML Stylesheet'
schemaorg_type = 'DigitalDocument'
class clam.common.formats.ZIPFormat(file, **kwargs)
attributes = {}
mimetype = 'application/zip'
name = 'ZIP Archive'
schemaorg_type = 'Dataset'