CLAM Formats

class clam.common.converters.AbstractConverter(id, **kwargs)
acceptforinput = []
acceptforoutput = []
convertforinput(filepath, metadata)

Convert from the target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata passed in already describes the to-be-generated file. 'filepath' is both the source and the target file: the source file will be erased and overwritten with the conversion result!

convertforoutput(outputfile)

Convert from one of the source formats into the target format. Relevant if converters are used in OutputTemplates. 'outputfile' is a CLAMOutputFile instance.

label = '(ERROR: label not overriden from AbstractConverter!)'
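The in-place contract of convertforinput described above (the file at 'filepath' is replaced by the conversion result) can be illustrated with a minimal sketch. LineEndingConverter is a hypothetical stand-in that only mirrors the attribute and method names listed here; it does not import or subclass anything from CLAM, and the boolean return value is an assumption, since this reference does not state what convertforinput returns.

```python
import os
import tempfile


class LineEndingConverter:
    """Hypothetical stand-in mirroring the AbstractConverter interface."""

    # In a real converter these would list the accepted format classes.
    acceptforinput = []
    acceptforoutput = []
    label = "Normalize line endings"

    def __init__(self, id, **kwargs):
        self.id = id

    def convertforinput(self, filepath, metadata=None):
        # Read the source file in full before touching it.
        with open(filepath, "rb") as f:
            data = f.read()
        converted = data.replace(b"\r\n", b"\n")

        # Write the result to a temporary file in the same directory and
        # swap it in, so 'filepath' ends up holding the conversion result.
        directory = os.path.dirname(os.path.abspath(filepath))
        fd, tmppath = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(converted)
        os.replace(tmppath, filepath)
        return True  # assumption: signal success to the caller
```

Writing to a temporary file first is a design choice of this sketch: because the original is erased, a failure halfway through a direct overwrite would leave neither the source nor a usable result.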
class clam.common.converters.CharEncodingConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]
acceptforoutput = [<class 'clam.common.formats.PlainTextFormat'>]
convertforinput(filepath, metadata=None)

Convert from the target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata passed in already describes the to-be-generated file.

convertforoutput(outputfile)

Convert from one of the source formats into the target format. Relevant if converters are used in OutputTemplates. 'outputfile' is a CLAMOutputFile instance.

label = 'CharEncodingConverter'
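As a rough illustration of what a character-encoding conversion on an uploaded plain-text file involves, the following sketch re-encodes a file in place using only the Python standard library. The function name reencode_in_place and its parameters are hypothetical; how CharEncodingConverter itself is told the source encoding is not shown in this reference.

```python
def reencode_in_place(filepath, source_encoding, target_encoding="utf-8"):
    """Rewrite the file at 'filepath' in another character encoding."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(filepath, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    # The original file is overwritten with the re-encoded text.
    with open(filepath, "w", encoding=target_encoding, newline="") as f:
        f.write(text)
```

A call such as reencode_in_place("upload.txt", "latin-1") would leave upload.txt encoded as UTF-8; a wrong source_encoding either raises UnicodeDecodeError or silently produces mangled text, depending on the encoding.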
class clam.common.converters.MSWordConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]
convertforinput(filepath, metadata=None)
converttool = 'catdoc'
class clam.common.converters.PDFtoHTMLConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.HTMLFormat'>]
convertforinput(filepath, metadata=None)
converttool = 'pdftohtml'
class clam.common.converters.PDFtoTextConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]
convertforinput(filepath, metadata=None)
converttool = 'pdftotext'
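The three converters above each name an external program in converttool. A minimal sketch of that general pattern, assuming the tool accepts the input path as its last argument and writes the converted document to standard output, might look as follows. The helper convert_with_tool is hypothetical, and the actual command lines CLAM uses for catdoc, pdftohtml and pdftotext are not shown in this reference.

```python
import shutil
import subprocess


def convert_with_tool(filepath, command):
    """Run an external converter and overwrite 'filepath' with its output.

    'command' is a list such as ["sometool", "--some-flag"]; the input
    path is appended as the final argument (an assumption of this sketch).
    """
    # Refuse to run if the executable cannot be found.
    if shutil.which(command[0]) is None:
        return False
    result = subprocess.run(command + [filepath], capture_output=True)
    if result.returncode != 0:
        return False
    # Replace the uploaded file with whatever the tool printed.
    with open(filepath, "wb") as f:
        f.write(result.stdout)
    return True
```

Checking the return code before overwriting matters here: the original upload is destroyed by a successful conversion, so it should survive a failed one.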