Ray Poynter, 21 July 2025
Agents are one of the key AI developments reshaping the way we conduct business and run processes. An agent is a software tool capable of tackling a task without user supervision. In this post, I will illustrate agents with an example that takes data (for example, from a quantitative survey) and checks its quality. I will include the information you need to build this Data Checker yourself.
Gems in Gemini
For this example, I created the agent in Gemini (but I could have created an almost identical agent in ChatGPT, Copilot, or Claude).
In Gemini, agents are called Gems. You will find the Gems option on the panel on the left, as in the image below. To create a Gem, click on Explore Gems.

Once you are in the Gem Manager, you will see a list of Gems created by Google, your own Gems, and the button to create a New Gem.

A Simple Agent / Gem
Agents can be quite complex, but this example is very straightforward and is based on a set of instructions. The Gem is created by giving it a name and then defining the Instructions. The instructions make the agent useful. Each time the agent is run, the instructions define what should happen.
In the case of this Data Checker, the instructions tell the agent to take an uploaded data file, describe its key features (e.g. number of records and number of fields), and then list potential issues, such as missing data, duplicates, and data in the wrong format. Later in the blog I will share the current iteration of the instructions.
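For readers who think in code, the "key features" step can be pictured as a few lines of Python. This is only an illustrative sketch, not what Gemini runs behind the scenes: the function name `profile_csv` and the tiny sample are made up for the example, and the sample simply borrows the field names that appear later in this post.

```python
import csv
import io


def profile_csv(text):
    """Return row count, column count, and field names for CSV text with a header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    # Count data rows only; csv.reader yields [] for blank lines, which are skipped.
    n_rows = sum(1 for row in reader if row)
    return {"rows": n_rows, "columns": len(header), "fields": header}


# Tiny made-up sample (hypothetical values, real field names from the post).
sample = (
    "Review_ID,Rating,Year_Month,Reviewer_Location,Review_Text,Branch\n"
    "1,5,2019-04,Australia,Great day out,Disneyland_HongKong\n"
    "2,4,missing,United States,Long queues,Disneyland_California\n"
)
profile = profile_csv(sample)
print(profile)
```

The appeal of the Gem is that none of this has to be written by hand; the sketch just makes explicit what "number of records and number of fields" means.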
Examples of Using the Data Checker
To test the Data Checker, I downloaded just over 42,000 reviews of Disney amusement parks from Kaggle.com as a CSV file.
I launched the Gem by clicking on it in my list of Gems. I typed “I want to check a file”. The Gem replied, “Please provide me with the file you would like me to check.”
I uploaded the file and pressed the process arrow.
After a period of processing, the Gem responded with:
“Here is the quality report for the file Disney Data from Kaggle as csv.csv:
- File Name: Disney Data from Kaggle as csv.csv
- Number of Rows: 42656
- Number of Columns/Fields: 6
- Names of the Fields: Review_ID,Rating,Year_Month,Reviewer_Location,Review_Text,Branch
Here are the data quality issues identified in the file:
- Duplicate Rows:
  - There are 12 completely duplicate rows.
  - There are 20 duplicate values in the Review_ID field, which is intended to be a unique identifier.
- Missing Data:
  - While there are no standard null values, the string ‘missing’ is used as a placeholder:
    - 2613 cells in the Year_Month column contain the string ‘missing’.
    - 448 cells in the Review_Text column contain the string ‘missing’.
- Data Type and Format Conformance:
  - Year_Month: As noted above, 2613 cells do not conform to the expected YYYY-MM format because they contain the string ‘missing’.
  - Rating: All values are within the expected range of 1 to 5.
- Personally Identifiable Information (PII):
  - There are 4 potential email addresses found in the Review_Text column, which could be considered PII.
- Other Data Quality Issues:
  - The Branch field contains consistent and expected values (Disneyland_HongKong, Disneyland_California, Disneyland_Paris).
I can help fix some of these issues, such as removing duplicate rows and duplicate Review_ID entries, and replacing the ‘missing’ string with actual null values or removing rows with ‘missing’ dates. I can also help you anonymize the PII data, by removing the email addresses, or masking them.
Would you like me to proceed with fixing these issues and then export the updated file for you?”
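Counts like the ones in this report can be recomputed outside the Gem with a short script. The following is a minimal sketch under stated assumptions: the function name `count_issues` and the sample data are hypothetical, and it is not known how the Gem itself counted duplicates. Here, every repeat after the first occurrence counts as one duplicate.

```python
import csv
import io
from collections import Counter

PLACEHOLDER = "missing"  # the placeholder string named in the Gem's report


def count_issues(text, id_field="Review_ID"):
    """Count duplicate rows, duplicate IDs, and placeholder cells in CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [tuple(row) for row in reader if row]
    id_index = header.index(id_field)

    # Assumption: each occurrence after the first counts as one duplicate.
    row_counts = Counter(rows)
    id_counts = Counter(row[id_index] for row in rows)

    placeholders = Counter()
    for row in rows:
        for name, value in zip(header, row):
            if value.strip().lower() == PLACEHOLDER:
                placeholders[name] += 1

    return {
        "duplicate_rows": sum(n - 1 for n in row_counts.values()),
        "duplicate_ids": sum(n - 1 for n in id_counts.values()),
        "placeholders": dict(placeholders),
    }


# Made-up sample: one repeated row, two repeated IDs, two placeholder cells.
sample = (
    "Review_ID,Rating,Year_Month,Review_Text\n"
    "1,5,2019-04,Great day out\n"
    "1,5,2019-04,Great day out\n"
    "2,4,missing,Long queues\n"
    "2,3,2018-11,missing\n"
    "3,5,2019-01,Loved it\n"
)
issues = count_issues(sample)
print(issues)
```

Note that a fully duplicated row also repeats its ID, so the two duplicate counts overlap rather than add up.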
I would then ask Gemini to remove the duplicates, remove the cases where the date field contained the word “missing”, and remove the four email addresses. I would then export the file for future use and reference, before proceeding to do any analysis or investigation.
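The same three clean-up steps can be sketched in Python for anyone who prefers to do them by hand. This is an illustration under assumptions, not the Gem's actual method: the function name `clean_rows`, the replacement text `[email removed]`, and the simple email pattern are all choices made for this example. It only drops exact duplicate rows; rows that share a Review_ID but differ elsewhere need a judgment call and are left alone.

```python
import re

# Deliberately simple pattern; it will not catch every valid email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def clean_rows(rows, date_field="Year_Month", text_field="Review_Text"):
    """Drop exact duplicates and rows with a 'missing' date, then mask emails."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of a row seen earlier
        seen.add(key)
        if row[date_field].strip().lower() == "missing":
            continue  # date placeholder, so the row is removed
        fixed = dict(row)
        fixed[text_field] = EMAIL_RE.sub("[email removed]", row[text_field])
        cleaned.append(fixed)
    return cleaned


# Made-up records using field names from the post.
rows = [
    {"Review_ID": "1", "Year_Month": "2019-04", "Review_Text": "Great day out"},
    {"Review_ID": "1", "Year_Month": "2019-04", "Review_Text": "Great day out"},
    {"Review_ID": "2", "Year_Month": "missing", "Review_Text": "Long queues"},
    {"Review_ID": "3", "Year_Month": "2018-11", "Review_Text": "Email me at jo@example.com for tips"},
]
cleaned = clean_rows(rows)
for row in cleaned:
    print(row)
```

Masking rather than deleting the whole review keeps the rest of the text available for analysis.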
The Instructions
Below are the instructions I used for this Agent. The first version I created was simpler, but over time, I spotted cases where the instructions missed things I wanted it to detect. I iterated on the instructions to improve them, and the version below is my current iteration.
“You are an expert data processor, and you have been chosen to check the quality of data files. The types of files you should expect are those where the first row serves as a header, naming the fields. The remaining rows should be one for each respondent in the data.
You should check for the following:
1) Are there any duplicate rows? Check that any fields intended to be unique identifiers (e.g., ‘ID’, ‘Review_ID’) do not contain duplicate values.
2) Are there any cells with missing data?
3) Identify any cells whose values don’t conform to the expected data type or format for their column. For example, dates not matching YYYY-MM (or zero-padded), numeric fields containing letters or symbols, text fields containing placeholder strings like “missing,” emails in free-text reviews, etc.
4) Is there any personally identifiable information?
5) Are there any other data quality issues?
Report to the user:
1) The names of the file
2) The number of rows
3) The number of columns/fields
4) The names of the fields
Then report any data quality issues.
Offer to fix the issues if you can, and then offer to export the updated file.”
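Checks 3 and 4 in these instructions are the most rule-like, so they translate naturally into code. The sketch below is one possible reading, not a description of how the Gem works: the function name `check_row` is hypothetical, "YYYY-MM (or zero-padded)" is interpreted as accepting both 2019-4 and 2019-04, the 1 to 5 rating range is taken from the Gem's report above, and the email pattern is deliberately simple.

```python
import re

# Four-digit year, a hyphen, then a month from 1 to 12 with optional zero padding.
YEAR_MONTH_RE = re.compile(r"\d{4}-(0?[1-9]|1[0-2])")
# Simple pattern for spotting likely email addresses in free text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
VALID_RATINGS = {"1", "2", "3", "4", "5"}


def check_row(row):
    """Return a list of issue descriptions for one record (a dict of strings)."""
    issues = []
    if not YEAR_MONTH_RE.fullmatch(row["Year_Month"]):
        issues.append("Year_Month not in YYYY-MM format")
    if row["Rating"] not in VALID_RATINGS:
        issues.append("Rating outside 1 to 5")
    if EMAIL_RE.search(row["Review_Text"]):
        issues.append("possible email address in Review_Text")
    return issues


# Made-up records: one clean, one that trips all three checks.
good = {"Year_Month": "2019-04", "Rating": "5", "Review_Text": "Great day out"}
bad = {"Year_Month": "missing", "Rating": "7", "Review_Text": "Write to jo@example.com"}
print(check_row(good))
print(check_row(bad))
```

Hard-coded rules like these only find what they were written to find, whereas the fifth, open-ended check in the instructions is the part that is difficult to express as code.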
If you want to replicate this agent in Claude, Gemini, Copilot, or ChatGPT, simply copy these instructions and paste them in as the instructions in your chosen LLM.
Want to learn more about Agents?

I am hosting a webinar on how to start using agents, with examples that you can create using ChatGPT, Copilot, Claude, and Gemini.
Wednesday, 30 July, 2025
Broadcast time 3pm London (10am New York, 9am Chicago)