.. _ctlr-article-label:

Article package
===============

.. toctree::
    :maxdepth: 1
    :titlesonly:
    :caption: Full API

    article.api.rst

The ``article`` package is responsible for handling the articles you
have collected from your website.

.. hint:: In case you think you have found a mistake in this package,
          contact your assistant. Those who considerably improve this module
          will get additional bonuses.

``article`` module
------------------

``article`` module represents the methods to work with the
:py:class:`core_utils.ctlr.article.article.Article` abstraction.

:py:class:`core_utils.ctlr.article.article.Article` class is responsible
for storing article raw, meta and ``.conllu`` data and working with it.
During the implementation of Lab 5 and Lab 6, you should use the methods of this class.
The lab descriptions will contain hints on when to use one or another class method,
but we advise you to study all the methods: they may be useful to you.

.. note:: Do not forget to create a new instance of the
          :py:class:`core_utils.ctlr.article.article.Article` class to use its methods.

In addition to the ``Article`` class, the module has:

1. :py:func:`core_utils.ctlr.article.article.date_from_meta` function which converts
   text date to ``datetime`` object.
2. :py:func:`core_utils.ctlr.article.article.get_article_id_from_filepath` function which
   extracts the article id from its path.
3. :py:class:`core_utils.ctlr.article.article.ArtifactType` abstraction which provides
   types of artifacts that can be created by text processing pipelines, such as ``CLEANED``,
   ``UDPIPE_CONLLU``, and ``STANZA_CONLLU``.

The description of each artifact you can find in :ref:`dataset-label`.

.. note:: You should utilize attributes of :py:class:`core_utils.ctlr.article.article.ArtifactType`
          in order to save processed versions of files. Otherwise, if you pass a string
          itself to some saving function, your code will be much more fragile.

``io`` module
-------------

``io`` module provide functions to work with input/output operations for
the :py:class:`core_utils.ctlr.article.article.Article` abstraction.

It consists of the following functions, which are grouped by usage in the labs:

Lab_5
~~~~~

-  :py:func:`core_utils.ctlr.article.io.to_raw` - use to save raw texts of each article;
-  :py:func:`core_utils.ctlr.article.io.to_meta` - use to save meta-information about each article.

Lab_6
~~~~~

-  :py:func:`core_utils.ctlr.article.io.from_raw` - use to load raw texts and
   create the :py:class:`core_utils.ctlr.article.article.Article` abstraction;
-  :py:func:`core_utils.ctlr.article.io.to_cleaned` - use to save cleaned texts
   of each article, i.e. lowercased texts with no punctuation;
-  :py:func:`core_utils.ctlr.article.io.to_meta` - use to save POS information
   and required syntactic patterns from each article;
-  :py:func:`core_utils.ctlr.article.io.from_meta` - use to load
   meta-information about each article and create the
   :py:class:`core_utils.ctlr.article.article.Article` abstraction;