Article package
Full API
The article package is responsible for handling the articles you have collected from your website.
Hint
If you think you have found a mistake in this package, contact your assistant. Those who considerably improve this module will receive additional bonuses.
article module
The article module provides methods for working with the core_utils.ctlr.article.article.Article abstraction.
The core_utils.ctlr.article.article.Article class is responsible for storing the raw text, meta-information and .conllu data of an article and for working with them.
While implementing Lab 5 and Lab 6, you should use the methods of this class.
The lab descriptions contain hints on when to use a particular method,
but we advise you to study all of them: they may be useful to you.
Note
Do not forget to create a new instance of the core_utils.ctlr.article.article.Article class to use its methods.
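To make the idea of the abstraction concrete, here is a minimal, hypothetical stand-in for an article object. It is not the real Article class: the class name ArticleSketch, the field names (url, article_id, text, title, author, date, conllu) and the get_meta() helper are assumptions made for illustration. Check the Full API for the actual constructor and methods.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ArticleSketch:
    """Hypothetical stand-in: raw text, meta-information and CoNLL-U data."""

    url: Optional[str]  # assumed: the address the article was collected from
    article_id: int  # assumed: an integer identifying the article
    text: str = ""  # raw text of the article
    title: str = ""  # meta-information fields (assumed names)
    author: list = field(default_factory=list)
    date: Optional[datetime] = None
    conllu: str = ""  # .conllu representation produced by a pipeline

    def get_meta(self) -> dict:
        """Collect the meta-information into a JSON-friendly dictionary."""
        return {
            "id": self.article_id,
            "url": self.url,
            "title": self.title,
            "author": self.author,
            "date": self.date.strftime("%Y-%m-%d %H:%M:%S") if self.date else None,
        }


# As with the real class, create an instance before using its methods.
article = ArticleSketch(url="https://example.com/news/1", article_id=1)
article.text = "Some raw text."
article.title = "Example"
print(article.get_meta())
```

Keeping the raw text, meta-information and CoNLL-U data on one object means every later step (saving, cleaning, parsing) only needs to be handed a single article.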
In addition to the Article class, the module provides:
- the core_utils.ctlr.article.article.date_from_meta() function, which converts a text date to a datetime object;
- the core_utils.ctlr.article.article.get_article_id_from_filepath() function, which extracts the article id from its path;
- the core_utils.ctlr.article.article.ArtifactType abstraction, which provides the types of artifacts that can be created by text processing pipelines, such as CLEANED, UDPIPE_CONLLU, and STANZA_CONLLU.
You can find the description of each artifact in Dataset requirements.
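The behaviour of the two helper functions can be approximated as follows. This is a hypothetical sketch, not the package's code: the names date_from_meta_sketch and article_id_from_path_sketch are made up, and the sketch assumes that dates are stored as text in the format YYYY-MM-DD HH:MM:SS and that file names start with the article id followed by an underscore (for example, 1_raw.txt). Check the Full API and Dataset requirements for the real formats.

```python
from datetime import datetime
from pathlib import Path
from typing import Union

# Assumed date format; the real package may use a different one.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def date_from_meta_sketch(date_txt: str) -> datetime:
    """Convert a text date (in the assumed format above) into a datetime object."""
    return datetime.strptime(date_txt, DATE_FORMAT)


def article_id_from_path_sketch(path: Union[str, Path]) -> int:
    """Extract the article id, assuming file names like '<id>_<kind>.<ext>'."""
    stem = Path(path).stem  # e.g. '15_meta' for 'tmp/articles/15_meta.json'
    return int(stem.split("_")[0])


print(date_from_meta_sketch("2023-05-01 12:30:00"))
print(article_id_from_path_sketch("tmp/articles/15_meta.json"))
```

Parsing dates into datetime objects, rather than comparing strings, lets you sort or filter articles by date reliably.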
Note
Use the attributes of core_utils.ctlr.article.article.ArtifactType
to save processed versions of files. If you pass a plain string
to a saving function instead, your code will be much more fragile: a typo in a string
is not caught, whereas a misspelled attribute name raises an error immediately.
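The difference between attributes and plain strings can be seen in a small, self-contained sketch. ArtifactTypeSketch mirrors the three artifact types named above, but its string values, the artifact_path() helper and the '<id>_<type>.<ext>' naming scheme are hypothetical, chosen for illustration rather than taken from the package.

```python
from enum import Enum
from pathlib import Path


class ArtifactTypeSketch(Enum):
    """Hypothetical mirror of ArtifactType; the values are assumed."""

    CLEANED = "cleaned"
    UDPIPE_CONLLU = "udpipe_conllu"
    STANZA_CONLLU = "stanza_conllu"


def artifact_path(base: Path, article_id: int, kind: ArtifactTypeSketch) -> Path:
    """Build a file path for a processed artifact (assumed naming scheme)."""
    if not isinstance(kind, ArtifactTypeSketch):
        # Rejecting plain strings keeps callers from passing arbitrary text.
        raise TypeError(f"expected ArtifactTypeSketch, got {type(kind).__name__}")
    extension = "txt" if kind is ArtifactTypeSketch.CLEANED else "conllu"
    return base / f"{article_id}_{kind.value}.{extension}"


print(artifact_path(Path("tmp"), 1, ArtifactTypeSketch.UDPIPE_CONLLU))

# A misspelled attribute fails immediately with AttributeError...
try:
    ArtifactTypeSketch.UDPIPE_CONNLU
except AttributeError as error:
    print("caught:", error)
# ...whereas a misspelled string such as "udpipe_connlu" would silently
# produce a wrong file name if the function accepted strings.
```

Because the set of valid artifact types lives in one place, renaming or adding a type requires changing only the enumeration, not every string scattered across your code.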
io module
The io module provides functions for input/output operations on
the core_utils.ctlr.article.article.Article abstraction.
It consists of the following functions, grouped by the lab in which they are used:
Lab_5
- core_utils.ctlr.article.io.to_raw(): use to save the raw text of each article;
- core_utils.ctlr.article.io.to_meta(): use to save meta-information about each article.
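As an illustration of what the Lab 5 saving step involves, the sketch below writes an article's raw text and meta-information to disk. It does not call to_raw() or to_meta(): the function names save_raw_sketch and save_meta_sketch, the directory argument and the '<id>_raw.txt' / '<id>_meta.json' naming scheme are hypothetical. In the lab itself, call the io functions listed above.

```python
import json
import tempfile
from pathlib import Path


def save_raw_sketch(base: Path, article_id: int, text: str) -> Path:
    """Write the raw text of one article to '<id>_raw.txt' (assumed name)."""
    path = base / f"{article_id}_raw.txt"
    path.write_text(text, encoding="utf-8")
    return path


def save_meta_sketch(base: Path, article_id: int, meta: dict) -> Path:
    """Write the meta-information of one article to '<id>_meta.json' (assumed name)."""
    path = base / f"{article_id}_meta.json"
    with path.open("w", encoding="utf-8") as file:
        # ensure_ascii=False keeps Cyrillic and other non-ASCII text readable
        json.dump(meta, file, ensure_ascii=False, indent=4)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        save_raw_sketch(base, 1, "Текст статьи.")
        save_meta_sketch(base, 1, {"id": 1, "title": "Заголовок"})
        print(sorted(p.name for p in base.iterdir()))
```

Writing files with an explicit utf-8 encoding avoids platform-dependent defaults that can corrupt non-ASCII article texts.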
Lab_6
- core_utils.ctlr.article.io.from_raw(): use to load raw texts and create the core_utils.ctlr.article.article.Article abstraction;
- core_utils.ctlr.article.io.to_cleaned(): use to save the cleaned text of each article, i.e. the lowercased text with no punctuation;
- core_utils.ctlr.article.io.to_meta(): use to save POS information and the required syntactic patterns from each article;
- core_utils.ctlr.article.io.from_meta(): use to load meta-information about each article and create the core_utils.ctlr.article.article.Article abstraction;