Lesson Overview
This lesson introduces learners to data manipulation techniques used in Robotic Process Automation (RPA) environments. Learners will explore how automation workflows collect, process, format, transform, validate, and manage data during workflow execution. The lesson also examines data tables, collections, filtering, sorting, and common data manipulation activities used in automation projects and business processes.
Lesson Outcomes
After completing this lesson, learners will be able to:
- Define data manipulation and explain its importance in automation
- Explain how workflows process and transform data
- Describe data tables and collections
- Explain filtering and sorting operations
- Describe data validation methods
- Explain data formatting and conversion techniques
- Apply good practices for handling data in workflows
KT0401: Introduction to Data Manipulation
Data manipulation refers to the process of changing, organising, formatting, processing, or transforming information so that it can be used effectively within workflows and business systems.
In RPA environments, workflows constantly manipulate data while performing tasks such as:
- Reading files
- Processing invoices
- Updating databases
- Generating reports
- Validating information
- Sending emails
Data manipulation is important because raw information is often incomplete, inconsistent, or unsuitable for direct processing.
Automation workflows must therefore prepare and organise information before it can be used.
Examples of data manipulation activities include:
- Filtering records
- Sorting data
- Removing duplicates
- Formatting dates
- Combining text
- Splitting values
- Validating information
Good data manipulation improves workflow accuracy and automation reliability.
KT0402: Data Tables
A data table is a structured collection of information organised into rows and columns.
Data tables are commonly used in RPA because many business processes involve spreadsheet-style information.
Example of a data table:
| Invoice Number | Supplier | Amount |
|---|---|---|
| INV001 | ABC Ltd | R5000 |
| INV002 | XYZ Ltd | R3200 |
Data tables allow workflows to:
- Store large sets of information
- Process records efficiently
- Perform calculations
- Filter and sort information
- Generate reports
Automation platforms often provide activities specifically designed for data table manipulation.
Working with Data Tables
Common data table activities include:
- Reading data tables
- Adding rows
- Removing rows
- Updating values
- Filtering records
- Exporting information
Data tables improve workflow efficiency because bots can process multiple records automatically.
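The activities above can be sketched in plain Python. This is only an illustration, not how any particular RPA platform implements data tables: the table is modelled as a list of rows, each row is a dictionary keyed by column name, and the third invoice is a hypothetical record added for the example.

```python
# A data table sketched as a list of rows; each row is a dict keyed by column name.
invoices = [
    {"Invoice Number": "INV001", "Supplier": "ABC Ltd", "Amount": 5000},
    {"Invoice Number": "INV002", "Supplier": "XYZ Ltd", "Amount": 3200},
]

# Add a row (hypothetical third invoice, for illustration only).
invoices.append({"Invoice Number": "INV003", "Supplier": "ABC Ltd", "Amount": 750})

# Update a value: change the amount on INV002.
for row in invoices:
    if row["Invoice Number"] == "INV002":
        row["Amount"] = 3500

# Remove a row: drop INV001.
invoices = [row for row in invoices if row["Invoice Number"] != "INV001"]

# Perform a calculation across the remaining rows.
total_amount = sum(row["Amount"] for row in invoices)
print(total_amount)  # 4250
```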
KT0403: Collections and Lists
Collections are groups of related items stored together within workflows.
Common collection types include:
- Lists
- Arrays
- Queues
- Dictionaries
Collections are useful when workflows must process multiple items.
Examples include:
- Email addresses
- File names
- Invoice numbers
- Customer records
Lists
A list stores multiple values in a sequence.
Example:
customers = ["Sarah", "John", "Ahmed"]
Lists allow workflows to iterate through values using loops.
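A minimal sketch of iterating through the list above with a loop. The greeting text is a hypothetical processing step chosen only for illustration.

```python
customers = ["Sarah", "John", "Ahmed"]

greetings = []
for customer in customers:
    # Each pass through the loop handles one value from the list.
    greetings.append(f"Dear {customer}")

print(greetings)  # ['Dear Sarah', 'Dear John', 'Dear Ahmed']
```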
Arrays
Arrays are similar to lists but are often fixed in size and usually hold values of a single type.
Example:
invoice_numbers = [101, 102, 103]
Collections improve workflow flexibility because automation can process groups of data dynamically.
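Queues and dictionaries, the other two collection types listed earlier, can be sketched in the same way. The file names are hypothetical, and Python's `collections.deque` is used here as one possible queue; automation platforms may provide their own queue types.

```python
from collections import deque

# Queue: items are processed in the order they arrived (first in, first out).
pending_files = deque(["jan.xlsx", "feb.xlsx"])  # hypothetical file names
pending_files.append("mar.xlsx")                 # a new item joins the back
next_file = pending_files.popleft()              # the oldest item leaves the front

# Dictionary: each value is stored under a key and looked up by that key.
supplier_by_invoice = {"INV001": "ABC Ltd", "INV002": "XYZ Ltd"}
supplier = supplier_by_invoice["INV002"]

print(next_file)  # jan.xlsx
print(supplier)   # XYZ Ltd
```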
KT0404: Filtering Data
Filtering is the process of selecting only specific records that meet defined conditions.
Example:
A workflow may filter a list of invoices to keep only those with an amount greater than R5000.
Filtering helps workflows:
- Reduce unnecessary processing
- Focus on relevant information
- Improve reporting accuracy
- Support business rules
Example filtering conditions include:
- Status equals “Approved”
- Amount greater than 1000
- Department equals “Finance”
Filtering is commonly used in:
- Reports
- Databases
- Data tables
- Automation workflows
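The example conditions above can be combined into a single filter. This is a sketch using a Python list comprehension; the records and field names are hypothetical.

```python
# Hypothetical invoice records.
invoices = [
    {"number": "INV001", "status": "Approved", "department": "Finance", "amount": 5000},
    {"number": "INV002", "status": "Pending", "department": "Finance", "amount": 3200},
    {"number": "INV003", "status": "Approved", "department": "Sales", "amount": 800},
]

# Keep only records that meet all three conditions.
approved_finance = [
    inv for inv in invoices
    if inv["status"] == "Approved"
    and inv["department"] == "Finance"
    and inv["amount"] > 1000
]

print([inv["number"] for inv in approved_finance])  # ['INV001']
```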
KT0405: Sorting Data
Sorting arranges information in a specific order.
Data may be sorted by different kinds of values:
- Alphabetically
- Numerically
- By date
Each of these can be applied in ascending or descending order.
Example:
| Before Sorting | After Sorting |
|---|---|
| 300 | 100 |
| 100 | 200 |
| 200 | 300 |
Sorting improves:
- Data readability
- Reporting
- Workflow organisation
- Search efficiency
Automation workflows often sort information before generating reports or processing records.
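A short sketch of sorting in Python, using the values from the table above. The invoice records and their dates are hypothetical.

```python
from datetime import date

amounts = [300, 100, 200]
ascending = sorted(amounts)                 # smallest to largest
descending = sorted(amounts, reverse=True)  # largest to smallest

# Sorting records by date: the key function picks the value to sort on.
invoices = [
    {"number": "INV002", "issued": date(2026, 5, 20)},
    {"number": "INV001", "issued": date(2026, 4, 2)},
]
by_date = sorted(invoices, key=lambda inv: inv["issued"])

print(ascending)   # [100, 200, 300]
print(descending)  # [300, 200, 100]
print([inv["number"] for inv in by_date])  # ['INV001', 'INV002']
```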
KT0406: Data Validation
Data validation checks whether information is accurate, complete, and acceptable before processing.
Validation is important because incorrect data may cause workflow failures or inaccurate outputs.
Examples of validation checks include:
| Validation Type | Example |
|---|---|
| Required Field | Email address cannot be blank |
| Numeric Validation | Amount must be a valid number |
| Date Validation | Date format must be correct |
| Length Validation | ID number must contain the required number of digits |
Validation improves:
- Workflow reliability
- Data accuracy
- Process consistency
- Error reduction
Automation workflows often validate data before continuing processing.
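The four validation types in the table can be sketched as one function that returns a list of problems. The function name `validate_record`, the field names, the `YYYY/MM/DD` date format, and the 13-digit ID length are all assumptions made for this example; real rules depend on the business process.

```python
from datetime import datetime

def validate_record(record):
    """Return a list of problems found in the record (empty if it is valid)."""
    errors = []

    # Required field: the email address cannot be blank.
    if not (record.get("email") or "").strip():
        errors.append("Email address cannot be blank")

    # Numeric validation: the amount must convert to a number.
    try:
        float(record.get("amount"))
    except (TypeError, ValueError):
        errors.append("Amount must be a number")

    # Date validation: the date must match the assumed YYYY/MM/DD format.
    try:
        datetime.strptime(record.get("date"), "%Y/%m/%d")
    except (TypeError, ValueError):
        errors.append("Date must use the YYYY/MM/DD format")

    # Length validation: the ID number is assumed to need exactly 13 digits.
    id_number = record.get("id_number") or ""
    if not (id_number.isdigit() and len(id_number) == 13):
        errors.append("ID number must contain 13 digits")

    return errors
```

A workflow could call `validate_record` on each record and continue processing only when the returned list is empty.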
KT0407: Data Formatting and Conversion
Data formatting changes the appearance or structure of information.
Examples include:
- Formatting dates
- Formatting currency values
- Adjusting decimal places
- Changing text case
Example:
| Original Value | Formatted Value |
|---|---|
| 2026/05/20 | 20 May 2026 |
| john smith | John Smith |
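The two rows in the table, plus a decimal-place adjustment, can be sketched in Python. The month name produced by `%B` assumes the default English locale, and the amount is a hypothetical value.

```python
from datetime import datetime

# Date formatting: parse the original text, then write it in the new layout.
raw_date = "2026/05/20"
parsed_date = datetime.strptime(raw_date, "%Y/%m/%d")
formatted_date = parsed_date.strftime("%d %B %Y")

# Text case: capitalise the first letter of each word.
formatted_name = "john smith".title()

# Decimal places: show a hypothetical amount with exactly two decimals.
amount = 1234.5
formatted_amount = f"{amount:.2f}"

print(formatted_date)    # 20 May 2026
print(formatted_name)    # John Smith
print(formatted_amount)  # 1234.50
```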
Data Conversion
Data conversion changes information from one type to another.
Examples include:
- Text to number
- Number to string
- Date to text
- Boolean conversion
Example:
invoice_total = int("500")
Proper formatting and conversion improve workflow compatibility and processing accuracy.
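The other conversions listed above can be sketched in the same way. The accepted "yes" values for the Boolean conversion are an assumption; in Python, `bool("False")` is `True` because any non-empty string counts as true, so text is compared explicitly instead.

```python
from datetime import date
from decimal import Decimal

invoice_total = int("500")                   # text to whole number
exact_amount = Decimal("500.50")             # text to exact decimal number
total_text = str(invoice_total)              # number to string
issued_text = date(2026, 5, 20).isoformat()  # date to text

# Boolean conversion: compare the text against assumed "true" values.
paid_text = "Yes"
is_paid = paid_text.strip().lower() in ("yes", "true", "1")

print(total_text)   # 500
print(issued_text)  # 2026-05-20
print(is_paid)      # True
```

Note that `int("500.50")` raises a `ValueError`, so text that may contain decimals needs `Decimal` or `float` rather than `int`.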
KT0408: Removing Duplicate Data
Duplicate data occurs when the same information appears multiple times.
Duplicate records may cause:
- Incorrect reporting
- Repeated processing
- Data inconsistencies
- Workflow inefficiencies
Example:
| Customer ID |
|---|
| 1001 |
| 1001 |
| 1002 |
Automation workflows may remove duplicate records before processing information.
Removing duplicates improves:
- Data quality
- Workflow efficiency
- Reporting accuracy
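A minimal sketch of removing duplicates in Python while keeping the original order. The customer records and names in the second half are hypothetical, and keeping the first occurrence of each ID is a design choice; some processes may need the latest record instead.

```python
# Simple values: dict keys are unique and keep their insertion order.
customer_ids = [1001, 1001, 1002]
unique_ids = list(dict.fromkeys(customer_ids))

# Records: keep the first row seen for each customer ID.
records = [
    {"customer_id": 1001, "name": "Sarah"},
    {"customer_id": 1001, "name": "Sarah"},
    {"customer_id": 1002, "name": "John"},
]
seen_ids = set()
unique_records = []
for record in records:
    if record["customer_id"] not in seen_ids:
        seen_ids.add(record["customer_id"])
        unique_records.append(record)

print(unique_ids)           # [1001, 1002]
print(len(unique_records))  # 2
```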
KT0409: Data Manipulation in Automation Workflows
Data manipulation is essential in automation because workflows continuously process information from multiple systems.
Bots may manipulate data while:
- Reading spreadsheets
- Extracting emails
- Processing invoices
- Updating databases
- Generating reports
- Handling customer information
Example workflow:
- Read invoice spreadsheet
- Remove duplicates
- Validate invoice amounts
- Filter unpaid invoices
- Sort invoices by date
- Generate report
Without data manipulation, automation workflows would not be able to process business information effectively.
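The example workflow above can be sketched end to end in Python. To keep the sketch self-contained, the spreadsheet is replaced by a hypothetical in-memory list of rows, and the validation rule (the amount must be whole-number text) is an assumption.

```python
from datetime import date

# Stand-in for rows read from an invoice spreadsheet (hypothetical data).
rows = [
    {"number": "INV003", "amount": "1200", "paid": False, "issued": date(2026, 5, 20)},
    {"number": "INV001", "amount": "5000", "paid": False, "issued": date(2026, 4, 2)},
    {"number": "INV001", "amount": "5000", "paid": False, "issued": date(2026, 4, 2)},
    {"number": "INV002", "amount": "abc", "paid": False, "issued": date(2026, 4, 15)},
    {"number": "INV004", "amount": "3200", "paid": True, "issued": date(2026, 3, 9)},
]

# Remove duplicates: keep the first row for each invoice number.
seen_numbers = set()
unique_rows = []
for row in rows:
    if row["number"] not in seen_numbers:
        seen_numbers.add(row["number"])
        unique_rows.append(row)

# Validate invoice amounts: keep rows whose amount is whole-number text.
valid_rows = [row for row in unique_rows if row["amount"].isdigit()]

# Filter unpaid invoices.
unpaid_rows = [row for row in valid_rows if not row["paid"]]

# Sort invoices by date, oldest first.
unpaid_rows.sort(key=lambda row: row["issued"])

# Generate report: one line of text per invoice.
report_lines = [
    f'{row["issued"].isoformat()} {row["number"]} R{row["amount"]}'
    for row in unpaid_rows
]
print("\n".join(report_lines))
```

With the sample rows, the report contains INV001 followed by INV003: the duplicate INV001 row, the invalid INV002 amount, and the paid INV004 invoice are all left out.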
KT0410: Best Practices for Data Manipulation
Good data handling practices improve workflow reliability and maintainability.
Best practices include:
Validate Data Before Processing
Workflows should check information before using it.
Use Consistent Formatting
Data should follow standard formats throughout workflows.
Remove Duplicate Records
Duplicate information should be identified and removed.
Handle Exceptions Properly
Workflows should manage missing or invalid information safely.
Use Meaningful Variable Names
Variables and collections should have clear descriptive names.
Protect Sensitive Information
Sensitive data should be handled securely and according to organisational policies.
Good data manipulation practices improve workflow quality and operational efficiency.
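The "Handle Exceptions Properly" practice could look like the sketch below, where one invalid value is set aside instead of stopping the whole run. The helper name `parse_amount` and the sample values are hypothetical.

```python
def parse_amount(text):
    """Convert amount text such as 'R5000' to a whole number of rand."""
    return int(text.strip().lstrip("R"))

# Hypothetical raw values; the second one is missing.
raw_amounts = {"INV001": "R5000", "INV002": "", "INV003": "R3200"}

processed_amounts = {}
rejected_invoices = {}
for invoice_number, amount_text in raw_amounts.items():
    try:
        processed_amounts[invoice_number] = parse_amount(amount_text)
    except ValueError as error:
        # Record the problem and continue with the next invoice.
        rejected_invoices[invoice_number] = str(error)

print(processed_amounts)        # {'INV001': 5000, 'INV003': 3200}
print(list(rejected_invoices))  # ['INV002']
```

Collecting rejected records separately lets the workflow finish the valid work and report the exceptions for follow-up.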
Data Manipulation in RPA Environments
In RPA environments, bots interact with large amounts of information across multiple systems.
Data manipulation allows workflows to:
- Organise information
- Transform data formats
- Validate business data
- Generate accurate outputs
- Support automation decisions
Efficient data manipulation is essential for successful automation projects and reliable business operations.
Key Notes
- Data manipulation involves processing and transforming information.
- Data tables organise information into rows and columns.
- Collections and lists store groups of related values.
- Filtering selects records based on conditions.
- Sorting arranges information in a specific order.
- Data validation checks information accuracy and completeness.
- Formatting and conversion improve workflow compatibility.
- Removing duplicates improves data quality and reporting accuracy.
- Good data manipulation practices improve automation reliability and efficiency.