article package
Submodules
core_utils.ctlr.article.article module
Article implementation.
- class core_utils.ctlr.article.article.Article(url: str | None, article_id: int)
Bases: object
Article class implementation.
- get_conllu_info() → str
Get the sentences from ConlluArticle.
- Returns:
Sentences from ConlluArticle
- Return type:
str
- get_file_path(kind: ArtifactType) → Path
Get the file path for an artifact of the given kind belonging to an Article instance.
- Parameters:
kind (ArtifactType) – A variant of a file
- Returns:
Path to the Article instance's file of the requested kind
- Return type:
Path
- get_meta_file_path() → Path
Get the path to the requested article’s meta info.
- Returns:
Path to the requested article’s meta info
- Return type:
Path
- get_raw_text_path() → Path
Get the path to the requested raw article.
- Returns:
Path to the requested raw article
- Return type:
Path
- set_conllu_info(info: str) → None
Set the CoNLL-U sentences attribute.
- Parameters:
info (str) – CoNLL-U sentences
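To show how the documented Article methods fit together, here is a minimal stand-in with the same signatures. It is a hypothetical sketch, not the core_utils implementation: the storage root ASSETS_PATH, the private _conllu_info attribute, and the <id>_<kind> file naming scheme are all assumptions.

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path


class ArtifactType(Enum):
    """Mirror of the documented ArtifactType members."""
    CLEANED = "cleaned"
    STANZA_CONLLU = "stanza_conllu"
    UDPIPE_CONLLU = "udpipe_conllu"


# Hypothetical storage root; the real module decides where files live.
ASSETS_PATH = Path("tmp") / "articles"


class Article:
    """Minimal stand-in exposing the documented method signatures."""

    def __init__(self, url: str | None, article_id: int) -> None:
        self.url = url
        self.article_id = article_id
        self._conllu_info = ""  # assumed attribute name

    def set_conllu_info(self, info: str) -> None:
        # Store CoNLL-U sentences produced by a pipeline
        self._conllu_info = info

    def get_conllu_info(self) -> str:
        return self._conllu_info

    def get_raw_text_path(self) -> Path:
        # Assumed naming scheme: <id>_raw.txt
        return ASSETS_PATH / f"{self.article_id}_raw.txt"

    def get_meta_file_path(self) -> Path:
        # Assumed naming scheme: <id>_meta.json
        return ASSETS_PATH / f"{self.article_id}_meta.json"

    def get_file_path(self, kind: ArtifactType) -> Path:
        # Assumed: CoNLL-U artifacts get .conllu, everything else .txt
        suffix = ".conllu" if kind.value.endswith("conllu") else ".txt"
        return ASSETS_PATH / f"{self.article_id}_{kind.value}{suffix}"
```

With this sketch, Article(None, 1).get_file_path(ArtifactType.UDPIPE_CONLLU) points at a file named 1_udpipe_conllu.conllu, so every artifact of one article shares the same id prefix.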
- class core_utils.ctlr.article.article.ArtifactType(value)
Bases: Enum
Types of artifacts that can be created by text processing pipelines.
- CLEANED = 'cleaned'
- STANZA_CONLLU = 'stanza_conllu'
- UDPIPE_CONLLU = 'udpipe_conllu'
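The ArtifactType(value) signature means a member can be looked up from its string value. The snippet below redefines an enum with the same members (a local copy, since core_utils is not imported here) and relies only on standard enum.Enum behavior.

```python
from enum import Enum


class ArtifactType(Enum):
    """Same members and values as the documented enum."""
    CLEANED = "cleaned"
    STANZA_CONLLU = "stanza_conllu"
    UDPIPE_CONLLU = "udpipe_conllu"


# Look a member up by its string value, e.g. when parsing a file name
kind = ArtifactType("stanza_conllu")
print(kind is ArtifactType.STANZA_CONLLU)  # True

# Iterate over members to list every artifact a pipeline may produce
print([member.value for member in ArtifactType])
# ['cleaned', 'stanza_conllu', 'udpipe_conllu']
```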
- core_utils.ctlr.article.article.date_from_meta(date_txt: str) → datetime
Convert a text date to a datetime object.
- Parameters:
date_txt (str) – Date in text format
- Returns:
Datetime object
- Return type:
datetime
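A conversion like date_from_meta can be sketched with datetime.strptime. The date format below (DATE_FORMAT, "%Y-%m-%d %H:%M:%S") is an assumption for illustration; the format actually stored in meta files is not stated here.

```python
from datetime import datetime

# Assumed format of dates in meta info; the real files may differ.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def date_from_meta(date_txt: str) -> datetime:
    """Parse a date string from meta info into a datetime object."""
    return datetime.strptime(date_txt, DATE_FORMAT)


print(date_from_meta("2024-03-01 12:30:00"))  # 2024-03-01 12:30:00
```

strptime raises ValueError when the text does not match the format, which surfaces malformed meta files early instead of silently storing a wrong date.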
- core_utils.ctlr.article.article.get_article_id_from_filepath(path: Path) → int
Extract the article id from its path.
- Parameters:
path (pathlib.Path) – Path to article
- Returns:
Article id
- Return type:
int
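One way to extract an id from a path is to read the numeric prefix of the file name. This sketch assumes a hypothetical <id>_<suffix>.<ext> naming scheme such as 12_meta.json; the real convention may differ.

```python
from pathlib import Path


def get_article_id_from_filepath(path: Path) -> int:
    """Return the numeric prefix of a file name such as '12_meta.json'."""
    # Assumed naming scheme: <id>_<suffix>.<ext>; the stem drops the extension
    return int(path.stem.split("_")[0])


print(get_article_id_from_filepath(Path("tmp/12_meta.json")))  # 12
```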
- core_utils.ctlr.article.article.split_by_sentence(text: str) → list[str]
Split the given text by sentence separators.
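Sentence splitting can be approximated with a regular expression. This sketch assumes that ".", "!" and "?" followed by whitespace are the separators; the separators used by core_utils are not listed in this reference, so treat it as a hypothetical illustration.

```python
from __future__ import annotations

import re


def split_by_sentence(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when whitespace follows."""
    # The lookbehind keeps the punctuation attached to its sentence
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input is empty
    return [sentence for sentence in sentences if sentence]


print(split_by_sentence("Hello there. How are you? Fine!"))
# ['Hello there.', 'How are you?', 'Fine!']
```

A rule this simple also splits after abbreviations such as "e.g.", which is one reason real pipelines often delegate sentence segmentation to a tokenizer.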
core_utils.ctlr.article.io module
I/O operations for Article.
- core_utils.ctlr.article.io.from_meta(path: Path | str, article: Article | None = None) → Article
Load a meta.json file into the Article abstraction.
- Parameters:
path (Union[pathlib.Path, str]) – Path to meta info
article (Optional[Article]) – Article instance
- Returns:
Article instance
- Return type:
Article
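The optional article parameter suggests two modes: fill an existing Article or create a new one. The sketch below assumes a hypothetical meta.json schema with "id", "url" and "title" keys and uses a minimal stand-in Article; neither is taken from core_utils.

```python
from __future__ import annotations

import json
from pathlib import Path


class Article:
    """Hypothetical stand-in with the documented constructor."""

    def __init__(self, url: str | None, article_id: int) -> None:
        self.url = url
        self.article_id = article_id
        self.title = ""


def from_meta(path: Path | str, article: Article | None = None) -> Article:
    """Fill an Article from a meta.json file, creating one if none is given."""
    with open(path, encoding="utf-8") as meta_file:
        meta = json.load(meta_file)
    if article is None:
        # Assumed keys "id" and "url"; the real schema may differ
        article = Article(url=meta.get("url"), article_id=int(meta["id"]))
    else:
        article.url = meta.get("url", article.url)
    # Assumed optional key "title"
    article.title = meta.get("title", "")
    return article
```

Returning the same instance that was passed in lets a caller chain loaders, for example from_raw first and from_meta second, on one Article object.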
- core_utils.ctlr.article.io.from_raw(path: Path | str, article: Article | None = None) → Article
Load raw text and create an Article with it.
- Parameters:
path (Union[pathlib.Path, str]) – Path to article raw text
article (Optional[Article]) – Article instance
- Returns:
Article instance
- Return type:
Article
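A raw-text loader can follow the same pattern. In this sketch the Article stand-in, its text attribute, and the <id>_raw.txt naming scheme used to recover the id are hypothetical assumptions, not the documented behavior.

```python
from __future__ import annotations

from pathlib import Path


class Article:
    """Hypothetical stand-in with the documented constructor."""

    def __init__(self, url: str | None, article_id: int) -> None:
        self.url = url
        self.article_id = article_id
        self.text = ""


def from_raw(path: Path | str, article: Article | None = None) -> Article:
    """Read raw text and attach it to an Article, creating one if needed."""
    raw_path = Path(path)  # accept both str and Path, as documented
    text = raw_path.read_text(encoding="utf-8")
    if article is None:
        # Assumed naming scheme <id>_raw.txt: the id is the part before "_"
        article_id = int(raw_path.stem.split("_")[0])
        article = Article(url=None, article_id=article_id)
    article.text = text
    return article
```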
- core_utils.ctlr.article.io.to_cleaned(article: Article) → None
Save cleaned text.
- Parameters:
article (Article) – Article instance
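Saving cleaned text amounts to resolving the CLEANED artifact path and writing to it. This sketch is hypothetical throughout: the storage_dir class attribute, the get_cleaned_text method, and its cleaning rule (lowercase, drop punctuation) are assumptions made for illustration.

```python
from __future__ import annotations

import re
from enum import Enum
from pathlib import Path


class ArtifactType(Enum):
    CLEANED = "cleaned"


class Article:
    """Hypothetical stand-in exposing only what to_cleaned needs."""

    storage_dir = Path("tmp")  # assumed location; override as needed

    def __init__(self, url: str | None, article_id: int) -> None:
        self.url = url
        self.article_id = article_id
        self.text = ""

    def get_cleaned_text(self) -> str:
        # Assumed rule: lowercase, then remove anything that is neither
        # a word character nor whitespace (i.e. punctuation)
        return re.sub(r"[^\w\s]", "", self.text.lower()).strip()

    def get_file_path(self, kind: ArtifactType) -> Path:
        return self.storage_dir / f"{self.article_id}_{kind.value}.txt"


def to_cleaned(article: Article) -> None:
    """Write the article's cleaned text to its CLEANED artifact path."""
    path = article.get_file_path(ArtifactType.CLEANED)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(article.get_cleaned_text(), encoding="utf-8")
```

Routing the write through get_file_path keeps the naming convention in one place, so I/O helpers never build artifact paths themselves.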