CLAM Formats

class clam.common.converters.CharEncodingConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]
acceptforoutput = [<class 'clam.common.formats.PlainTextFormat'>]
convertforinput(filepath, metadata=None)

Convert from the target format into one of the source formats. Relevant when the converter is used in an InputTemplate. The metadata argument already describes the file that is to be generated.

convertforoutput(outputfile)

Convert from one of the source formats into the target format. Relevant when the converter is used in an OutputTemplate. The outputfile argument is a CLAMOutputFile instance.

label = 'CharEncodingConverter'
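
As an illustration of what a character-encoding conversion on an input file involves, here is a minimal, self-contained sketch. It is not CLAM's implementation: the function name reencode_in_place and its parameters are hypothetical, and it assumes the caller knows the source encoding and wants the file rewritten in place.

```python
import os
import tempfile


def reencode_in_place(filepath, source_encoding, target_encoding="utf-8"):
    """Rewrite `filepath` so that its text is stored in `target_encoding`.

    Hypothetical helper for illustration only; not part of CLAM.
    """
    # Decode with the encoding the file is assumed to be in.
    # newline="" keeps the original line endings untouched.
    with open(filepath, "r", encoding=source_encoding, newline="") as f:
        text = f.read()

    # Write to a temporary file in the same directory and swap it in,
    # so a failure midway does not leave a half-written file behind.
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmppath = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=target_encoding, newline="") as f:
            f.write(text)
        os.replace(tmppath, filepath)
    except BaseException:
        if os.path.exists(tmppath):
            os.remove(tmppath)
        raise
```

A call such as reencode_in_place("input.txt", "iso-8859-1") would leave input.txt encoded as UTF-8. Writing through a temporary file is a design choice of this sketch, made so that a decoding error cannot destroy the original.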

class clam.common.converters.MSWordConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]
convertforinput(filepath, metadata=None)

Convert from the target format into one of the source formats. Relevant when the converter is used in an InputTemplate. The metadata argument already describes the file that is to be generated. ‘filepath’ is both the source and the target file: the source file will be erased and overwritten with the conversion result!

converttool = 'catdoc'
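
The in-place behaviour described above (the source file is overwritten with the conversion result) can be sketched with an external tool that prints its result to standard output, as catdoc does. This is a hedged illustration, not CLAM's code: convert_in_place and its command parameter are hypothetical names, and the sketch assumes the tool exits with a non-zero status on failure.

```python
import os
import subprocess
import tempfile


def convert_in_place(filepath, command):
    """Run `command` and overwrite `filepath` with what it wrote to stdout.

    Hypothetical helper for illustration only; not part of CLAM.
    `command` is an argument list, e.g. ["catdoc", filepath].
    """
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmppath = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            # check=True raises CalledProcessError on a non-zero exit
            # status, in which case the original file is left untouched.
            subprocess.run(command, stdout=out, check=True)
        # Only replace the source once the conversion has succeeded.
        os.replace(tmppath, filepath)
    except BaseException:
        if os.path.exists(tmppath):
            os.remove(tmppath)
        raise
    return filepath
```

With catdoc installed, convert_in_place("upload.doc", ["catdoc", "upload.doc"]) would replace the Word document with its plain-text rendering under the same path. Callers that still need the original should copy it first, since the overwrite is not reversible.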

class clam.common.converters.PDFtoHTMLConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.HTMLFormat'>]
convertforinput(filepath, metadata=None)

Convert from the target format into one of the source formats. Relevant when the converter is used in an InputTemplate. The metadata argument already describes the file that is to be generated. ‘filepath’ is both the source and the target file: the source file will be erased and overwritten with the conversion result!

converttool = 'pdftohtml'

class clam.common.converters.PDFtoTextConverter(id, **kwargs)
acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]
convertforinput(filepath, metadata=None)

Convert from the target format into one of the source formats. Relevant when the converter is used in an InputTemplate. The metadata argument already describes the file that is to be generated. ‘filepath’ is both the source and the target file: the source file will be erased and overwritten with the conversion result!

converttool = 'pdftotext'