CLAM Formats

class clam.common.converters.CharEncodingConverter(id, **kwargs)

acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]

acceptforoutput = [<class 'clam.common.formats.PlainTextFormat'>]

convertforinput(filepath, metadata=None): Convert from target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata argument already describes the file that is to be generated.

convertforoutput(outputfile): Convert from one of the source formats into target format. Relevant if converters are used in OutputTemplates. The outputfile argument is a CLAMOutputFile instance.

label = 'CharEncodingConverter'
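
To illustrate the kind of in-place conversion a character-encoding converter performs, here is a minimal standalone sketch. It does not use CLAM and is not CLAM's implementation: the function name `reencode_in_place` and its `source_charset` and `target_charset` parameters are hypothetical, and UTF-8 as the default target encoding is an assumption.

```python
import os
import tempfile


def reencode_in_place(filepath, source_charset, target_charset="utf-8"):
    """Rewrite filepath so that its text is stored in target_charset.

    Hypothetical helper for illustration only; not part of CLAM.
    """
    # newline="" disables newline translation so only the encoding changes.
    with open(filepath, "r", encoding=source_charset, newline="") as f:
        text = f.read()

    # Write to a temporary file in the same directory, then swap it in,
    # so a failure while writing does not leave a half-written file.
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=target_charset, newline="") as f:
            f.write(text)
        os.replace(tmp_path, filepath)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as `reencode_in_place("upload.txt", "iso-8859-1")` would leave `upload.txt` at the same path, now encoded as UTF-8.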

class clam.common.converters.MSWordConverter(id, **kwargs)

acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]

convertforinput(filepath, metadata=None): Convert from target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata argument already describes the file that is to be generated. Note that ‘filepath’ is both the source and the target file: the source file is erased and overwritten with the conversion result.

converttool = 'catdoc'
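
The overwrite behaviour described above can be sketched with a small standalone helper that runs an external tool and replaces the source file with the tool's output. This is an assumption-laden illustration, not CLAM's code: `convert_in_place` and its `command` parameter are hypothetical, and the sketch assumes the tool takes the input path as its last argument and writes the result to standard output.

```python
import subprocess


def convert_in_place(filepath, command):
    """Run command + [filepath] and overwrite filepath with the tool's stdout.

    Hypothetical helper for illustration only; not part of CLAM.
    """
    # Run the tool first. check=True raises CalledProcessError on a non-zero
    # exit status, so the source file is left untouched if the tool fails.
    result = subprocess.run(command + [filepath], stdout=subprocess.PIPE, check=True)

    # Only after a successful run is the source file replaced by the result.
    with open(filepath, "wb") as f:
        f.write(result.stdout)
```

For example, `convert_in_place("upload.doc", ["catdoc"])` would replace the Word document with its extracted text, assuming `catdoc` is installed and on the PATH. Because the original file is lost, keep a copy elsewhere if it is still needed.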

class clam.common.converters.PDFtoHTMLConverter(id, **kwargs)

acceptforinput = [<class 'clam.common.formats.HTMLFormat'>]

convertforinput(filepath, metadata=None): Convert from target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata argument already describes the file that is to be generated. Note that ‘filepath’ is both the source and the target file: the source file is erased and overwritten with the conversion result.

converttool = 'pdftohtml'

class clam.common.converters.PDFtoTextConverter(id, **kwargs)

acceptforinput = [<class 'clam.common.formats.PlainTextFormat'>]

convertforinput(filepath, metadata=None): Convert from target format into one of the source formats. Relevant if converters are used in InputTemplates. The metadata argument already describes the file that is to be generated. Note that ‘filepath’ is both the source and the target file: the source file is erased and overwritten with the conversion result.

converttool = 'pdftotext'
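
Since the `converttool` values above name external programs, it can be useful to check that they are installed before relying on these converters. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of CLAM.

```python
import shutil


def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# The converttool values listed in this section.
print(missing_tools(["catdoc", "pdftohtml", "pdftotext"]))
```

An empty list means all three tools were found; any names printed would need to be installed before the corresponding converter can work.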
