Article package

Full API

The article package is responsible for handling the articles you have collected from your website.

Hint

If you think you have found a mistake in this package, contact your assistant. Students who considerably improve this module will receive additional bonuses.

article module

The article module provides methods for working with the core_utils.ctlr.article.article.Article abstraction.

The core_utils.ctlr.article.article.Article class stores an article's raw text, metadata, and .conllu data, and provides methods for working with them. Use the methods of this class when implementing Lab 5 and Lab 6. The lab descriptions contain hints on when to use each method, but we advise you to study all of them, as they may be useful to you.

Note

Do not forget to create a new instance of the core_utils.ctlr.article.article.Article class to use its methods.
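
To illustrate the note, below is a minimal stand-in for the Article abstraction. It assumes an article is identified by a URL and a numeric id and keeps its raw text, metadata, and .conllu data as attributes. The class SketchArticle, its fields, and its methods are hypothetical and only mirror the responsibilities described above; consult Full API for the real constructor and method names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchArticle:
    """Hypothetical stand-in for core_utils.ctlr.article.article.Article."""

    url: Optional[str]      # assumed: source URL of the article
    article_id: int         # assumed: numeric id used in file names
    text: str = ""          # raw article text
    meta: dict = field(default_factory=dict)  # title, author, date, ...
    conllu: str = ""        # .conllu representation after processing

    def get_raw_text(self) -> str:
        # Instance method: it works only on an object created beforehand
        return self.text

    def get_meta(self) -> dict:
        # Build a new dict combining the id, the URL, and the stored metadata
        return {"id": self.article_id, "url": self.url, **self.meta}


# Create an instance first, then call its methods
article = SketchArticle(url="https://example.com/news/1", article_id=1)
article.text = "Sample text."
article.meta["title"] = "News 1"
print(article.get_raw_text())
print(article.get_meta())
```

Calling SketchArticle.get_raw_text() on the class itself, without an instance, raises a TypeError because no object has been passed as self.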

In addition to the Article class, the module has:

  1. The core_utils.ctlr.article.article.date_from_meta() function, which converts a date stored as text into a datetime object.

  2. The core_utils.ctlr.article.article.get_article_id_from_filepath() function, which extracts the article id from the article's file path.

  3. The core_utils.ctlr.article.article.ArtifactType abstraction, which provides the types of artifacts that text processing pipelines can create, such as CLEANED, UDPIPE_CONLLU, and STANZA_CONLLU.
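
The first two helpers can be sketched as follows. The sketch assumes that dates are stored in metadata as "YYYY-MM-DD HH:MM:SS" strings and that article files carry the id as a prefix before the first underscore, such as 1_raw.txt. Both formats and the sketch_-prefixed names are assumptions; the real functions may differ.

```python
import datetime
from pathlib import Path
from typing import Union

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of dates in meta files


def sketch_date_from_meta(date_txt: str) -> datetime.datetime:
    """Convert a text date taken from metadata into a datetime object."""
    return datetime.datetime.strptime(date_txt, DATE_FORMAT)


def sketch_get_article_id_from_filepath(path: Union[str, Path]) -> int:
    """Extract the numeric article id from a name such as '1_raw.txt'."""
    # Path.stem drops the directories and the final suffix, e.g. '12_meta'
    return int(Path(path).stem.split("_")[0])


print(sketch_date_from_meta("2023-03-15 10:30:00"))
print(sketch_get_article_id_from_filepath("tmp/articles/12_meta.json"))
```

Parsing with a single fixed format keeps the conversion strict: a date written in any other layout raises a ValueError instead of being misread.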

You can find a description of each artifact in Dataset requirements.

Note

Use the attributes of core_utils.ctlr.article.article.ArtifactType when saving processed versions of files. Passing a plain string to a saving function instead makes your code much more fragile: a mistyped string goes unnoticed, while a mistyped attribute name raises an error immediately.
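
The sketch below shows why an enumeration is safer than plain strings. It assumes ArtifactType behaves like a Python Enum whose member values are used as parts of file names; the member values and the sketch_artifact_path helper are hypothetical.

```python
from enum import Enum
from pathlib import Path


class ArtifactType(Enum):
    """Hypothetical mirror of core_utils.ctlr.article.article.ArtifactType."""

    CLEANED = "cleaned"
    UDPIPE_CONLLU = "udpipe_conllu"
    STANZA_CONLLU = "stanza_conllu"


def sketch_artifact_path(base: Path, article_id: int, kind: ArtifactType) -> Path:
    """Build a file path for a processed artifact (naming scheme assumed)."""
    extension = "txt" if kind is ArtifactType.CLEANED else "conllu"
    return base / f"{article_id}_{kind.value}.{extension}"


print(sketch_artifact_path(Path("tmp"), 1, ArtifactType.CLEANED))
print(sketch_artifact_path(Path("tmp"), 1, ArtifactType.STANZA_CONLLU))

try:
    ArtifactType.CLAENED  # a typo in the member name is caught at once
except AttributeError as error:
    print("Caught:", error)
```

A mistyped string such as "claened" would instead produce a wrong file name without any error.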

io module

The io module provides functions for input/output operations on the core_utils.ctlr.article.article.Article abstraction.
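
As a rough illustration of what a pair of input/output functions does, the sketch below writes an article's raw text to a file and reads it back. The names sketch_to_raw and sketch_from_raw, the <id>_raw.txt naming scheme, and passing an id and a string instead of an Article instance are all assumptions; consult Full API for the real functions.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def sketch_to_raw(article_id: int, text: str, base: Path) -> Path:
    """Save raw article text as '<id>_raw.txt' (hypothetical naming)."""
    path = base / f"{article_id}_raw.txt"
    path.write_text(text, encoding="utf-8")  # explicit encoding for non-ASCII text
    return path


def sketch_from_raw(path: Path) -> Tuple[int, str]:
    """Load raw text back and recover the article id from the file name."""
    article_id = int(path.stem.split("_")[0])
    return article_id, path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    saved = sketch_to_raw(7, "Sample article text.", Path(tmp))
    result = sketch_from_raw(saved)

print(result)
```

Keeping writing and reading symmetric (same naming scheme, same encoding) is what makes a saved article recoverable later in the pipeline.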

It consists of the following functions, which are grouped by usage in the labs:

Lab_5

Lab_6