Short summary of lectures

Lecture 1. Introduction to technical track

Web scraping as a craft. Place of technical track in overall discipline: conceptually and assessment formula. Technical track overview. Programming assignment overview. Client-server architecture in World Wide Web. Request. Response. Python package manager pip. requirements.txt as a manifest of project dependencies.

Lecture 2. Headers. HTML structure.

Types of HTTP methods: GET, POST, DELETE, PUT. Idea of mimicking to human-made requests. Tip no. 1: random timeouts among calls. Tip no. 2: sending requests with headers from browser. Obtaining headers from browser. Basics of HTML structure: hierarchical form, tag as a basic element, properties of tags.

Lecture 3. Search in HTML page

Check for request status: implicit cast to bool, check for status code, switch on exception raising. Introduction to HTML scraping. Key strategies for finding elements: by id, by class, by attribute, by tag name, by child-parent relations, and by combination of aforementioned approaches. Making requests with requests API. bs4: installation, basic API. Finding elements in HTML page with find, find_all.