Source code for inscriptis.annotation.output
r""":class:`AnnotationProcessor`\s transform annotations to an output format.
All AnnotationProcessors implement the :class:`AnnotationProcessor` interface
by overriding the class's :meth:`AnnotationProcessor.__call__` method.
.. note::
    1. The AnnotationExtractor class must be put into a package with the
       extractor's name (e.g., :mod:`inscriptis.annotation.output.*package*`)
       and be named :class:`*PackageExtractor*` (see the examples below).
    2. The overridden :meth:`__call__` method may either extend the original
       dictionary which contains the extracted text and annotations (e.g.,
       :class:`~inscriptis.annotation.output.surface.SurfaceExtractor`) or
       may replace it with a custom output (e.g.,
       :class:`~inscriptis.annotation.output.html.HtmlExtractor` and
       :class:`~inscriptis.annotation.output.xml.XmlExtractor`).
Currently, Inscriptis supports the following built-in AnnotationProcessors:
1. :class:`~inscriptis.annotation.output.html.HtmlExtractor` provides an
   annotated HTML output format.
2. :class:`~inscriptis.annotation.output.xml.XmlExtractor` yields an output
   which marks annotations with XML tags.
3. :class:`~inscriptis.annotation.output.surface.SurfaceExtractor` adds the
   key `surface` to the result dictionary which contains the surface forms
   of the extracted annotations.
"""
from typing import Dict, Any
class AnnotationProcessor:
    """An AnnotationProcessor is called for formatting annotations."""

    def __call__(self, annotated_text: Dict[str, str]) -> Any:
        """Format the given text and annotations.

        Args:
            annotated_text: a dictionary that contains the converted text and
                            all annotations that have been found.

        Returns:
            An output representation that has been changed according to the
            AnnotationProcessor's design.
        """
        raise NotImplementedError
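
The two behaviours the module docstring describes for `__call__` (extending the input dictionary, or replacing it with a custom output) can be sketched as follows. This is only an illustrative sketch, not part of Inscriptis: `UppercaseLabelExtractor` and `BracketExtractor` are hypothetical names, the base class is redefined locally so the example runs on its own, and the input is assumed to carry a `text` key plus a `label` key holding `(start, end, label)` triples.

```python
from typing import Any, Dict, List


class AnnotationProcessor:
    """Local stand-in for the interface above so the sketch is self-contained."""

    def __call__(self, annotated_text: Dict[str, Any]) -> Any:
        raise NotImplementedError


class UppercaseLabelExtractor(AnnotationProcessor):
    """Hypothetical processor that extends the input dictionary."""

    def __call__(self, annotated_text: Dict[str, Any]) -> Dict[str, Any]:
        text = annotated_text["text"]
        # Assumed layout: each annotation is a (start, end, label) triple.
        annotated_text["upper"] = [
            (label, text[start:end].upper())
            for start, end, label in annotated_text["label"]
        ]
        return annotated_text


class BracketExtractor(AnnotationProcessor):
    """Hypothetical processor that replaces the dictionary with a string."""

    def __call__(self, annotated_text: Dict[str, Any]) -> str:
        text = annotated_text["text"]
        parts: List[str] = []
        cursor = 0
        # Assumes annotations do not overlap; walk them from left to right.
        for start, end, label in sorted(annotated_text["label"]):
            parts.append(text[cursor:start])
            parts.append(f"[{label}: {text[start:end]}]")
            cursor = end
        parts.append(text[cursor:])
        return "".join(parts)


example = {"text": "Chur is a town", "label": [(0, 4, "city")]}
extended = UppercaseLabelExtractor()(dict(example))  # dictionary with an extra key
replaced = BracketExtractor()(example)               # plain string output
```

Because both classes only override `__call__`, a caller can treat any processor as a plain callable and does not need to know whether it returns the enriched dictionary or a different representation.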