Course Content
KM-01: Overview of Artificial Intelligence
This module introduces learners to the fundamental concepts of Artificial Intelligence (AI) and its growing role in modern technology, business, and society. Learners will explore the evolution of AI, key definitions, and different types of artificial intelligence, as well as related fields such as machine learning, deep learning, neural networks, data science, automation, and robotics. The module also examines how AI is applied in real-world environments, including industries such as healthcare, finance, agriculture, manufacturing, and digital services. In addition, learners will understand the strategic advantages of AI in business, including automation, improved decision-making, and increased productivity. By the end of the module, learners will have a foundational understanding of AI technologies, their applications, and their impact on the Fourth Industrial Revolution (4IR). This knowledge prepares learners for further study and practical skills development within the Artificial Intelligence Software Developer qualification at NQF Level 4.
KM-02: Introduction to Mathematics and Statistics for Artificial Intelligence
This module introduces learners to the essential mathematical and statistical concepts required for understanding Artificial Intelligence, Machine Learning, Deep Learning, and Data Analytics. It provides foundational knowledge in areas such as basic mathematics, linear algebra, binary number systems, scientific notation, probability, and statistics. Learners will explore how mathematical principles are used to represent data, perform calculations, and analyze patterns in AI systems. The module also develops problem-solving skills through practical applications including coordinate systems, matrix operations, and probability models used in modern AI technologies.
KM-03: Analytical Thinking and Problem Solving
This module focuses on developing the learner’s ability to analyse problems logically and design structured solutions. Learners are introduced to analytical thinking techniques, critical thinking skills, and problem-solving methods used in artificial intelligence development. The module teaches how to break down complex problems, evaluate possible solutions, and apply structured reasoning when designing AI-based systems. By the end of the module, learners will understand how to approach real-world problems systematically and use analytical tools such as decision trees and critical thinking methods to support AI problem solving.
KM-04: Data, Databases and Data Visualisation
This module introduces learners to the fundamental concepts of data, database systems, and data visualisation, which are essential components in modern artificial intelligence and data-driven technologies. The module focuses on helping learners understand how data is collected, processed, analysed, stored, and transformed into meaningful insights for decision-making.

Learners begin by exploring the value of data and the role of data analysis, including how reliable data sources are identified and how raw data is refined by handling missing values, correcting misalignments, and eliminating irrelevant information. The module also explains common flaws and limitations in data collection, such as bias, omission, and errors that may affect the quality and reliability of data.

The module then moves into practical data handling using spreadsheets, where learners study techniques for analysing and presenting data. This includes creating reports, sorting and filtering datasets, using pivot tables and dashboards, importing data from files and databases, and visualising results using charts and analytical tools.

Learners are also introduced to databases and Structured Query Language (SQL), which allow large volumes of data to be stored, managed, and retrieved efficiently. In addition, the module explores data mining techniques used to identify patterns and relationships within datasets.

Finally, the module highlights the importance of data visualisation and data security, teaching learners how to present information clearly using AI-assisted tools while ensuring that sensitive information is protected from misuse or unauthorized access.

Overall, this module equips learners with the knowledge required to manage data effectively, perform analysis, create meaningful visualisations, and maintain data integrity and security, which are critical skills for professionals working in artificial intelligence, data science, and software development environments.
KM-05: Computing Theory
This module introduces learners to the fundamentals of computing theory and computational thinking. Programming is the process of writing instructions that tell a computer how to perform tasks. These instructions are written using programming languages such as Python, Java, or C++. In this module, learners will develop an understanding of how computers interpret instructions, how algorithms are used to solve problems, and how basic programming structures work. The module also introduces the core principles of software development and provides an entry-level understanding of Python programming. By the end of the module, learners will understand how software systems are designed, how algorithms are created to solve problems, and how programming languages are used to build modern digital solutions, including artificial intelligence systems. The module covers the following key topics:

  • Introduction to programming languages
  • Introduction to algorithms
  • Programming basics
  • Solution development
  • Introduction to Python

These concepts provide the theoretical foundation needed before learners begin writing real programs in practical learning modules.
KM-06: Introduction to Artificial Intelligence, Machine Learning, Deep Learning
The main focus of the learning in this knowledge module is to build an understanding of the relationship between Artificial Intelligence, Machine Learning and Deep Learning, as well as the application of such systems to create a set of instructions to perform a programming task. Learners will explore how AI technologies are used across industries such as healthcare, finance, education, and automation. The module also introduces ethical considerations, responsible AI use, and the impact of AI on society and employment. By the end of this module, learners will understand how artificial intelligence systems work, the different types of AI technologies, and how these technologies are applied in modern software development environments.
KM-07: Artificial Intelligence Frameworks and Data Scraping
This module introduces learners to Artificial Intelligence frameworks and their role in developing intelligent systems. Learners will explore how frameworks such as TensorFlow, Keras, PyTorch and IBM Watson help developers design, train and deploy AI models efficiently. The module also introduces the concept of data scraping, explaining how AI technologies can be used to collect and extract information from websites. Learners will understand the tools, procedures, and legal considerations involved in web scraping and how this data can be used for analytics and decision-making. By the end of the module, learners will understand the structure of AI frameworks, their advantages, practical applications, and how AI techniques can be used to automate data extraction processes.
KM-08: Machine Learning
The main focus of this knowledge module is to build an understanding of the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning, as well as the application of machine learning to create a set of instructions that can perform programming tasks. This module introduces learners to the types of machine learning models, machine learning algorithm classifications, common machine learning algorithms, and the machine learning workflow process used to develop intelligent systems. Learners will also explore how machine learning can support business decision-making and improve business performance. The module further explains how machine learning systems use data, features, and labels to identify patterns, make predictions, and automate tasks. By understanding these concepts, learners will gain the foundational knowledge required to work with machine learning technologies and apply them in real-world applications and business environments.
KM-09: Deep Learning (DL)
This module introduces learners to the concept of Deep Learning, an advanced area of Artificial Intelligence that builds on Machine Learning techniques to create intelligent systems capable of learning complex patterns from large datasets. The module focuses on understanding the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) and how deep learning technologies are used to develop intelligent applications.

Learners will explore how neural networks are structured and how they function, including the roles of input layers, hidden layers, and output layers in deep learning systems. The module also introduces different neural network architectures such as convolutional neural networks, recurrent neural networks, and recursive neural networks, which are widely used in fields such as computer vision, natural language processing, and speech recognition.

In addition, the module covers activation functions used in deep learning models, including functions such as Sigmoid, Tanh, Softmax, and ReLU. Learners will also study how deep learning networks are built, trained, and tuned to improve performance. These concepts help developers design more accurate and efficient models for solving complex computational problems.

The module further introduces advanced Python concepts for deep learning, including decorators, context managers, exception handling, and Python package management. These programming techniques are important for developing scalable deep learning applications.

Finally, learners will explore TensorFlow and Keras, two of the most widely used frameworks for deep learning development. These tools allow developers to build, train, and deploy neural networks efficiently using modern machine learning libraries and APIs.
By the end of this module, learners will understand the core concepts of deep learning, neural network architecture, advanced Python programming for AI development, and the use of TensorFlow and Keras to build deep learning models.
KM-10: Introduction to Governance, Legislation and Ethics
This module introduces learners to the principles of governance, legislation, ethics, workplace security, and business practices that influence organisations and employees. The module focuses on understanding how legal frameworks and ethical standards guide behaviour in the workplace and ensure accountability, transparency, and responsible decision-making. Learners will explore important workplace legislation such as the Labour Relations Act (LRA), the Protection of Personal Information Act (POPIA), and other regulatory frameworks that affect employees and employers. The module also introduces key ethical principles, including professional conduct, fairness, honesty, and accountability in professional environments. In addition, the module examines workplace security, performance management, business planning, and costing concepts that influence organisational efficiency and sustainability. By the end of the module, learners will understand how governance, ethics, legislation, and management practices contribute to a responsible and productive workplace environment.
KM-11: Fundamentals of Design Thinking and Innovation
This module introduces learners to the principles of design thinking, creativity, and innovation in the workplace. It focuses on solving problems using a human-centered approach, where user needs are prioritised through observation, empathy, and iterative development. Learners will explore key concepts such as design thinking methodology, creativity, innovation types, and application in real-world environments, including software development and business. The module also highlights how organisations use design thinking to improve products, processes, and services while fostering innovation. By the end of this module, learners will understand how to apply design thinking to solve complex problems and drive innovation effectively in the workplace.
KM-12: Fundamentals of Research and Information Analysis
This module focuses on developing an understanding of research principles, information gathering, and data analysis techniques. It equips learners with the ability to collect, evaluate, interpret, and apply information effectively in problem-solving and decision-making contexts.
Artificial Intelligence Software Developer

Lesson Overview

This lesson focuses on how Artificial Intelligence can be used in data scraping and web scraping. The KM-07 documents define this topic through the following learning outcomes: concept and definition, purpose of data scraping, data scraping tools, legal issues, web scraping procedure, and libraries used for web scraping.

Web scraping involves writing a software robot that can automatically collect data from webpages. The documents explain that simple bots may do basic extraction, while more sophisticated bots use AI to find the correct data on a page and copy it into suitable data fields for processing by analytics applications. AI and ML can enhance the web scraping value chain, especially where the work is tedious, repetitive, and requires governance and quality assurance.

1. Concept and Definition of Data Scraping

Data scraping, in its general form, refers to a technique in which a computer program extracts data from output generated by another program. In the KM-07 documents, this is closely linked with web scraping, which is the process of using an application or bot to extract useful information from a website.

Web scraping is more than just copying what appears on a page. The documents explain that, unlike screen scraping, which only copies visible pixels displayed on a screen, web scraping extracts the underlying HTML code and, with it, data stored in a database. Once extracted, this data can be replicated, reformatted, stored, and used elsewhere.

This means web scraping is not simply a manual reading exercise. It is a technical process that allows machines to gather structured and unstructured information from websites automatically.
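
To make the distinction concrete, here is a minimal sketch using only Python's standard-library html.parser and a made-up HTML snippet. On screen, a visitor would only see the product names, but the underlying HTML also carries a machine-readable price in a data-price attribute, which a scraper can extract directly:

```python
from html.parser import HTMLParser

# A made-up product listing: the visible text shows only the name,
# but the underlying HTML also carries a machine-readable price.
PAGE = """
<ul>
  <li class="product" data-price="19.99">Blue Widget</li>
  <li class="product" data-price="4.50">Red Widget</li>
</ul>
"""

class ProductParser(HTMLParser):
    """Collects (name, price) pairs from <li class="product"> elements."""
    def __init__(self):
        super().__init__()
        self.products = []
        self._price = None  # price of the <li> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._price = float(attrs["data-price"])

    def handle_data(self, data):
        if self._price is not None and data.strip():
            self.products.append((data.strip(), self._price))
            self._price = None

parser = ProductParser()
parser.feed(PAGE)
print(parser.products)  # [('Blue Widget', 19.99), ('Red Widget', 4.5)]
```

Screen scraping would only ever see the rendered text "Blue Widget"; working with the HTML source recovers the structured price data as well.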

2. Is Web Scraping Part of AI?

The documents directly explain that AI and ML can be used to enhance several processes along the web scraping value chain. This is especially useful in tasks that are time-consuming and tedious, and in tasks that require governance and quality assurance.

In practical terms, this means AI can help scraping systems:

  • identify the correct content on complex pages,
  • distinguish useful data from irrelevant page content,
  • improve extraction accuracy,
  • automate repetitive scraping processes,
  • support analytics once the data is collected.

So while traditional web scraping can be rule-based, AI-enhanced scraping can make the process smarter, faster, and more adaptive.

3. Purpose of Data Scraping

The KM-07 documents explain that one of the reasons scraping exists is because many companies do not want their unique content to be downloaded and reused for unauthorized purposes. As a result, they do not always expose all data through APIs or other easy-to-consume sources. Scraper bots, however, try to obtain website data despite those limitations. This creates what the documents describe as a cat-and-mouse game between scraper bots and website protection strategies.

Scraper bots may be created for several purposes, including:

Content scraping

This is when content is pulled from a website and reused elsewhere. The documents give the example of scraping reviews from a site like Yelp and reproducing them on another site.

Price scraping

Competitors may scrape pricing information in order to compare prices and create a market advantage.

Contact scraping

Scrapers may collect email addresses and phone numbers from websites such as online directories. The documents note that this is often used for bulk mailing lists, robocalls, spam, or malicious social engineering attempts.

Beyond these examples, the documents also show a simpler educational example: scraping product information from an e-commerce site into an Excel spreadsheet.

4. Why Scrape Website Data?

The KM-07 material explains that scraping is often used because websites do not always provide their data in a directly consumable format. If an organization wants data for analysis, business intelligence, market comparison, product monitoring, or research, scraping may be the only practical method of collecting that information at scale.

The importance of scraping website data includes:

  • collecting large amounts of web information quickly,
  • reducing manual copying,
  • enabling structured analysis,
  • supporting decision-making,
  • gathering public market or product data,
  • feeding AI and analytics systems with real-world data.

5. Legal Issues in Web Scraping

The documents make it clear that web scraping is not illegal by itself. It is legal when you scrape data that is publicly available on the internet. However, certain kinds of data are protected by regulations, so caution is required when scraping:

  • personal data,
  • intellectual property,
  • confidential data.

The material also explains that developers should respect target websites and use empathy to create ethical scrapers. This means legality depends not only on the act of scraping, but also on:

  • what kind of data is being scraped,
  • whether the data is protected,
  • how the scraped data will be used,
  • whether website rules or regulations are being violated.

So learners must understand that scraping has both technical and ethical dimensions.
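
One practical way to respect a target website is to check its robots.txt rules before fetching a page. The sketch below uses Python's standard-library urllib.robotparser with a made-up set of rules; a real scraper would instead load the rules from the site's /robots.txt URL:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules. A real scraper would fetch them with:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("my-bot", "https://example.com/products"))   # True
print(rp.can_fetch("my-bot", "https://example.com/private/x"))  # False
```

Checking robots.txt does not settle every legal question (personal data, intellectual property, and confidential data remain protected regardless), but it is a simple, concrete way to honour a website's stated wishes.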

6. The Web Scraping Procedure

The KM-07 documents present a clear scraping workflow. To extract data using web scraping with Python, the basic steps are: find the URL, inspect the page, find the data to extract, write the code, run the code, and store the data in the required format.

Let’s break that down in detail.

Step 1: Find the URL to scrape

The first step is to identify the exact webpage that contains the target information. Without the correct URL, the scraper has no source to work from.

Step 2: Inspect the page

The developer then inspects the page structure, usually using browser developer tools, in order to understand the HTML layout and identify where the target data is stored.

Step 3: Find the data you want to extract

Once the page is inspected, the relevant HTML elements must be located. This may include product titles, prices, tables, reviews, headings, or links.

Step 4: Write the code

A scraping script is then written using suitable libraries or tools. This code sends requests to the website, retrieves the page content, and parses it for the required data.

Step 5: Run the code and extract the data

The script is executed so the program can fetch and process the content automatically.

Step 6: Store the data in the required format

Finally, the extracted data is saved in a useful structure such as a spreadsheet, CSV file, database, or another required format.

The facilitator guide also summarizes the scraping process in three broader stages:

  • the scraper bot sends an HTTP GET request,

  • the website responds and the scraper parses the HTML,

  • the extracted data is converted into the required output format.
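
The three stages above can be sketched end to end in Python using only the standard library. To keep the example self-contained, it parses a made-up HTML response instead of performing a live HTTP GET (the commented urllib call shows where the real request would go); the parsing and export stages are the same either way:

```python
import csv
import io
from html.parser import HTMLParser

# Stage 1: the scraper would send an HTTP GET request, e.g.:
#   from urllib.request import urlopen
#   html_text = urlopen("https://example.com/products").read().decode()
# Here a made-up response stands in for the website's reply.
html_text = """
<table>
  <tr><td>Blue Widget</td><td>19.99</td></tr>
  <tr><td>Red Widget</td><td>4.50</td></tr>
</table>
"""

# Stage 2: parse the HTML and pull out the table cells.
class CellParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr":
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

parser = CellParser()
parser.feed(html_text)

# Stage 3: convert the extracted data into the required output format (CSV).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["name", "price"])
writer.writerows(parser.rows)
print(out.getvalue())
```

In practice, the hand-written parser class would usually be replaced by one of the dedicated libraries discussed later in this lesson; the overall request–parse–export shape stays the same.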

7. Types of Data Extracted Through Web Scraping

The documents explain that web scraping extracts underlying HTML code and data stored in a database, rather than just visible content.

Examples of data that can be scraped include:

  • product information,
  • prices,
  • reviews,
  • contact information,
  • website content,
  • data tables,
  • text from HTML pages.

Because the scraper works with HTML and database-backed page data, the extracted information can be much richer and more structured than simple copy-and-paste.

8. Scraping a URL

The KM-07 summative memorandum defines scraping a URL as using bots to extract content and data from a website. It emphasizes that web scraping works on the underlying HTML and stored data, not just what appears on the surface.

This is important because it shows that a URL is not just a webpage address; it is the entry point into a structured information source that can be processed programmatically.

9. Code Scraping

The KM-07 summative memorandum defines code scraping, or data scraping, as a technique where a computer program extracts data from human-readable output coming from another program.

This means that scraping is not limited to webpages alone. It can also apply to other software outputs, as long as the data can be captured and transformed into a useful format.

10. Libraries Used for Web Scraping

The KM-07 learner guide includes a section on Python libraries used for web scraping. It notes that web scraping can extract both structured and unstructured data from the web and export it into a useful format.

Requests

Requests is described as the most basic Python library for web scraping. It is used for making HTTP requests such as GET and POST. It is simple and easy to use, which is why it is sometimes described as “HTTP for Humans.” However, Requests does not parse HTML on its own.

Advantages of Requests:

  • simple,
  • basic/digest authentication,
  • international domains and URLs,
  • chunked requests,
  • HTTP(S) proxy support.

Disadvantages of Requests:

  • retrieves only static content,
  • cannot parse HTML,
  • cannot handle websites built purely with JavaScript.
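
As a minimal illustration of what Requests does (and does not do), the sketch below starts a throwaway local web server so the GET request works without network access. Note that Requests hands back the raw HTML text and leaves parsing to other libraries:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party: pip install requests

# A tiny local server standing in for a real website, so the
# example runs offline.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>Hello, scraper</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/"
response = requests.get(url)   # the HTTP GET request
print(response.status_code)   # 200
print(response.text)          # raw HTML string; Requests does not parse it
server.shutdown()
```

The scraper still has to extract data from `response.text` itself, which is why Requests is typically paired with a parser such as lxml or Beautiful Soup.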

lxml

The learner guide explains that lxml is a fast, production-quality HTML and XML parsing library. It works especially well when scraping large datasets and is often combined with Requests. It supports XPath and CSS selectors for extracting information.

Advantages of lxml:

  • faster than many other parsers,
  • lightweight,
  • uses element trees,
  • Pythonic API.

Disadvantages of lxml:

  • does not work well with poorly designed HTML,
  • official documentation may be difficult for beginners.
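
A minimal sketch of lxml's element-tree and XPath style, run on a made-up HTML fragment rather than a page fetched with Requests:

```python
from lxml import html  # third-party: pip install lxml

# A made-up fragment standing in for a downloaded page; in a real
# scraper this string would come from response.text.
page = html.fromstring("""
<div>
  <p class="price">19.99</p>
  <p class="price">4.50</p>
</div>
""")

# XPath selects every <p> element whose class attribute is "price".
prices = [float(p.text) for p in page.xpath('//p[@class="price"]')]
print(prices)  # [19.99, 4.5]
```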

Beautiful Soup

Beautiful Soup is described as one of the most widely used Python libraries for web scraping. It builds a parse tree for HTML and XML documents and is considered beginner-friendly. It can also be combined with other parsers like lxml.

The guide notes that Beautiful Soup is easier to work with, has strong documentation, and works well with poorly designed HTML, but it is slower than pure lxml.
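
Beautiful Soup's parse-tree style can be sketched as follows, here using the standard-library html.parser backend and a made-up HTML fragment (passing "lxml" instead would select the faster parser mentioned above):

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# A made-up fragment standing in for a downloaded page; in a real
# scraper this string would come from response.text.
soup = BeautifulSoup("""
<ul>
  <li class="product">Blue Widget</li>
  <li class="product">Red Widget</li>
</ul>
""", "html.parser")

# Walk the parse tree: find every <li class="product"> and read its text.
names = [li.get_text() for li in soup.find_all("li", class_="product")]
print(names)  # ['Blue Widget', 'Red Widget']
```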

Selenium

The learner guide explains that Selenium is especially useful for dynamically populated websites, where data is loaded through JavaScript. Other libraries may struggle with such pages, but Selenium can render web pages, click elements, fill forms, scroll pages, and perform actions much like a human user.

Advantages of Selenium:

  • beginner-friendly,
  • automated web scraping,
  • can scrape dynamically populated pages,
  • automates browsers,
  • can do many actions on a web page.

Disadvantages of Selenium:

  • very slow,
  • difficult to set up,
  • high CPU and memory usage,
  • not ideal for large projects.

Lesson Summary

This lesson explained how AI can be used in data scraping and web scraping. The documents define web scraping as the use of software robots to automatically collect data from webpages, and they show that AI and ML can improve this process where accuracy, governance, and efficiency are important.

The lesson also covered:

  • the purpose of data scraping,
  • why websites are scraped,
  • legal and ethical considerations,
  • the web scraping procedure,
  • types of data extracted,
  • code scraping,
  • and key Python scraping libraries such as Requests, lxml, Beautiful Soup, and Selenium.