Is Your Data Ready for AI?

These days, everyone either is trying AI (rare), is considering AI (most common), has tried it before they were ready (with mixed results), or is just AI-curious (which doesn’t necessarily preclude the other possibilities). Sooner or later, your organization is going to be in the trying category, and then you will be in either the group that excelled with it or the group that stumbled. One of the key factors that will determine that result is the quality of your data going in and the integrity of your data moving forward.

Let’s take a little time now to consider the relationship between data quality, data integrity, and generative technologies, and then think about how to improve the odds of landing on the successful-adopter side of the coming AI divide.

The Human Edge: Fuzzy Thinking and Pattern Recognition

The current differentiation between AI and human intelligence lies in our capacity for fuzzy thinking and nuanced pattern recognition. Humans possess an innate ability to identify when information doesn’t fit a pattern or context, a skill that AI systems are still developing. While AI can process vast amounts of data at incredible speeds, it may struggle with contextual understanding and adaptability in novel situations.

This limitation in AI’s cognitive flexibility can lead to inefficiencies, particularly when dealing with complex, real-world scenarios. As AI systems attempt to process and make sense of imperfect or inconsistent data, they will consume more computational resources, leading to higher operational costs.

The Rising Costs of Using AI Inefficiently

The inefficiencies in AI processing are already manifesting at a macro level. Major tech companies and AI research institutions are reporting significant increases in power consumption as they scale up their AI offerings and user base. These escalating costs will (eventually and inevitably) be passed on to consumers, likely in the form of changes to service billing structures. Consider the current practice of paying per token: either the cost per token will go up, or the number of tokens required to complete common operations will go up, or both. Think of how coffee used to be sold in 1-lb bags, and now we pay more per bag when the bag holds only 10 ounces. AI may become the first digital form of shrinkflation.
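To make that arithmetic concrete, here is a minimal Python sketch. The bag sizes come from the coffee example above; the shelf price, the per-token price, and the token counts are invented numbers for illustration only, and `cost_per_operation` is a hypothetical helper rather than any vendor’s billing formula.

```python
def unit_price_increase(old_size: float, new_size: float,
                        old_price: float, new_price: float) -> float:
    """Return the fractional increase in price per unit of product."""
    old_unit = old_price / old_size
    new_unit = new_price / new_size
    return new_unit / old_unit - 1.0


def cost_per_operation(price_per_token: float, tokens_per_operation: int) -> float:
    """Hypothetical billing model: cost scales with both price and token count."""
    return price_per_token * tokens_per_operation


# Coffee: a 16 oz bag shrinks to 10 oz. The shelf price is held constant
# here (an assumption), so any real price rise would only add to the result.
coffee_increase = unit_price_increase(16, 10, old_price=10.0, new_price=10.0)

# Tokens (all numbers invented): the per-token price stays flat, but a
# common operation now needs 1,600 tokens instead of 1,000.
before = cost_per_operation(0.00001, 1_000)
after = cost_per_operation(0.00001, 1_600)
token_increase = after / before - 1.0

print(f"coffee: {coffee_increase:.0%} more per ounce")
print(f"tokens: {token_increase:.0%} more per operation")
```

The point of the sketch is that the bill can grow without the headline price changing at all, which is what makes this kind of increase easy to miss.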

Garbage In, Garbage Out…More Garbage In?

Recognizing these challenges, forward-thinking organizations are prioritizing data cleanup as an important first step on their AI adoption journey. However, it’s important to note that data integrity is not the result of a one-time effort. It requires ongoing policies, procedures, and processes to support what is likely the most important commodity any organization owns.

When data stores are initially created, they are typically clean and well-structured (don’t get me started on garbage test data, that is a separate article…coming soon!). The data becomes messy over time (how much time depends on many factors) simply through regular use (and sometimes irregular, but that is also beyond the scope of this post). When AI is added to that use, and trained on that same use, the data will get messier faster unless the processes that led to the mess are also addressed.

It may be tempting to consider this a training issue. Inadequate training can certainly lead to bad data, but good training may not be sufficient to correct the problem. This is because training is costly to create, costly to deliver, will need to be delivered again for every new team member, will likely need to be repeated periodically for all team members, and still may not always be remembered or followed.

The most reliable and cost-effective way to improve those processes is to automate those that can be automated. Automation may cost more to create than the training process, but then it is one-and-done until the process itself needs to change. The key to cost-effective automation is determining when it is still OK to kick an edge case out for a human to deal with, and having a good process for notifying that human and tracking the task to completion.

Automation offers several advantages over traditional training methods:

  1. Consistency: Automated processes perform tasks the same way every time, reducing human error.
  2. Scalability: Once implemented, automated processes can handle increasing volumes of data without proportional increases in cost.
  3. Long-term cost-effectiveness: While initial implementation may be costly, automation provides ongoing benefits without the need for repeated training sessions.
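The idea of automating the routine fix while kicking edge cases out to a tracked human queue can be sketched in a few lines of Python. This is only an illustration under assumptions: the record fields, the email rule, and the `ReviewQueue` and `clean_record` names are all hypothetical, and a real system would send a notification where the comment indicates.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewTask:
    """A record kicked out of automation for a human to resolve."""
    record: dict
    reason: str
    done: bool = False


@dataclass
class ReviewQueue:
    """Tracks hand-offs so that none are silently dropped."""
    tasks: list = field(default_factory=list)

    def add(self, record: dict, reason: str) -> ReviewTask:
        task = ReviewTask(record, reason)
        self.tasks.append(task)
        # A real system would notify the owner here (email, ticket, chat).
        return task

    def open_tasks(self) -> list:
        return [t for t in self.tasks if not t.done]


def clean_record(record: dict, queue: ReviewQueue):
    """Apply the safe, repeatable fix; hand anything ambiguous to a human."""
    cleaned = dict(record)
    # Automated fix: trim whitespace and lowercase the same way every time.
    email = str(cleaned.get("email", "")).strip().lower()
    cleaned["email"] = email
    # Edge case: do not guess at a malformed address, route it to a person.
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        queue.add(record, "email needs manual review")
        return None
    return cleaned


queue = ReviewQueue()
records = [
    {"name": "Ada", "email": "  Ada@Example.COM "},
    {"name": "Bob", "email": "bob-at-example.com"},
]
clean = [r for r in (clean_record(rec, queue) for rec in records) if r is not None]
print(clean)
print([t.reason for t in queue.open_tasks()])
```

Keeping the queue as data, rather than just logging a warning, is what lets someone check later that every kicked-out record was actually closed.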

Moving Forward

Once the organization’s data has been cleaned up and processes put in place to maintain the integrity of that data, automated where possible, then the opportunity to get ahead of the competition through generative technologies is real for your organization. Like many adventures into new territory, there will be plenty of new challenges that will require urgent attention and decisive action. Preparing for what is known and predictable first will leave more resources for managing the unexpected.

And remember, most people heading into new territory seek the help of an experienced guide. Being new territory, it isn’t so important that the guide be experienced with the specific territory, but that they have experience of venturing into other new areas and have lived to tell about it.

Shout out to Jon Ewoniuk and his new podcast The 360 Salesforce Mastermind Podcast. This article was inspired by his first episode, where his guest spoke about niches (mine being leadership in digital innovation and automation adoption) and the importance of good data to support generative technologies.

© Scott S. Nelson

If it is not written down, it does not exist

I can’t find a solid attribution for the title of this post; I only know it isn’t me. But I wish it were. I read it first in a Tom Clancy novel, where they were talking about the stock market. I don’t say it nearly enough, and I’m talking about IT documentation.

Say what you want about “self-documenting”. Until that becomes a solid feature of an “AI” product, self-documenting isn’t. If you don’t believe me, find some code or configuration file you wrote 3 years ago and haven’t looked at since and tell me what it does and why. Yes, there will be a small percentage of cases where this will be perfectly clear, and to those few I say “now hand it to someone you have never met and see if they get the same information”. That percentage drops drastically.

Here are some of my thoughts on documentation that I hope will inspire you to create more and help you to make it as useful as possible with the least effort necessary.

Identify Customer-Required Documents at the Start of the Project

Customers often do not look for documentation gaps until near the time when they expect the documents to be completed. Ask for their specific documentation requirements up front. They may later want to expand that list, but acquiring the list at the beginning and publishing it for customer review early in the project will allow for better prioritization when the project is nearing the end.

Start Documentation Early

Many initially successful projects can later be viewed as poorly delivered when proper documentation for enhancements and maintenance is incomplete or non-existent. Begin your documentation as early as possible with a full outline and then fill in the details as time permits. This helps to reduce the project wrap-up stress of starting documentation from scratch at the 11th hour.

Use Technical Deliverable Templates

Templates help to standardize documents for easier reference and provide reminders of sections to include and approaches for presenting specific topics.

Start from a Blank Template

While it is tempting to start from an existing document, this can result in bad data being left in and details that differ between designs not being properly verified. One example is a project that had scheduled jobs that ended up overlapping because the team kept creating new documents from the existing documents and leaving the schedule table unverified.

Write Instructional Material with the “Eye of the Beginner”

There are few things more frustrating to someone trying to accomplish a specific task according to instructions than getting stuck because their current results don’t match the instructions. Who they blame for this is often more of a reflection on their temperament than the instructions, and in IT it is fair to say that most of the time it is the fault of the instructions. Other than true technical writers (there are many in tech writing roles who are really either copywriters or editors and not true technical writers), most people who write documentation do so from the perspective of having already accomplished the task. While being able to do the task can be very useful in writing documentation, knowing how to look at the task through the eyes of someone who cannot is the key to being a good technical writer. Follow the steps of your own documentation exactly from the point of view of having never done it before (VMs are great for this), and revise your documentation every time you find a missing step or a concept that is only clear with experience of having already accomplished the step or task. If you have a hard time doing that, have people with no understanding of your subject follow your documentation (without you present!) and provide feedback.

Avoid Unspecified Nouns

Unspecified nouns are words such as this, that, they, and it. The use of unspecified nouns can be confusing if there is more than one possible interpretation. While you, as the communicator, may only see one meaning for what is referred to by the unspecified noun, a reader or listener may not.

For example, “they may not understand the purpose of the link” is clearer as “stakeholders may not understand the purpose of the link” or “users may not understand the purpose of the link”, both of which may have different meanings.

While it may seem excessive and repetitive to do so, repeat the name of the thing you are talking about rather than assuming that everyone will know what is meant by “this” or “that”.

Note, this has nothing to do with use of preferred pronouns. I was using they and them for years and wish I had contact information for the teachers and editors who kept insisting I use he and his when referring to an unspecified actor or reader so I could send them an I-told-you-so emoji.

Validate Guide Documents

The value of a guide is that people should be able to use the guide to complete the tasks it describes without assistance. First validate it yourself by following your document exactly as you have written it. Then have someone that will be a consumer of the document go through and follow the guide, having them make notes with Track Changes turned on. Bonus points for getting someone that has little understanding of the supporting technologies to do the validation.

A Picture Should Be Worth a Thousand Words

Screen shots should be used when they enhance understanding, allowing fewer words to describe the full details. They are not a replacement for all descriptions. At a minimum they should have a caption.

When values need to be entered in a screen shot, always include the values as text so that readers can copy/paste the values. Command line entries should use a style indicating they are commands rather than a screen shot showing the command, again to facilitate copy/paste and also because:

[Image: Screen Shot with Text Example]

Is not clearer than:

Type this here

Update the Document Reference Links as You Work

Throughout the project you will need to find information online relevant to the solution. In addition to maintaining your own reference bookmarks, whenever you find a URL particularly useful to the solution add it to the Reference Links section of the project documentation. If you do not own the document, send the links to the team member that does for inclusion.

Share Your Knowledge Regularly

Hopefully your employer and/or client have a Knowledge Management system. At a minimum, the following milestones should be a reminder to contribute to this shared knowledge base:

  • Completion of a project
  • Completion of a key deliverable
  • Solving a difficult issue

If you find it difficult to remember to make KM contributions following these events, set a regular reminder (weekly if not daily) to think about your activities and accomplishments and if there is something that is worth sharing, then do so.


Achieving Artificial Intelligence with the Reverse

I believe that AI is taking the wrong path. AI researchers are trying to work top down because that’s how humans who are generally successful at solving problems approach solving problems. Problems that are somewhat familiar are best served with a rigorous approach that leads to a planned conclusion. Still, many of the greatest breakthroughs have come about by accident, i.e., not following the general path at first. Penicillin and Post-It Notes come immediately to mind.

But humans only solve most problems the usual way because they have a solid groundwork to start from. The accidents create a new groundwork for development. Accidents come about by not following the normal path, and I think the solution to AI is in taking a different path that will lead to a groundwork that will support the entire field.

AI should take the approach of developing siloed expert systems. Make lots of them and keep refining until commoditized. Then start working on higher systems that can merge related systems together through interfaces like web services (but more efficient). Then build ever higher systems until a small set of controlling systems can leverage the legacy systems. The legacy systems, though truly failures to create AI and products of the wrong accepted path, will provide an infrastructure that will support a true AI solution.
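The layering described above can be sketched as a toy in Python. Everything here is an assumption made for illustration: the `ExpertSystem` interface, the two trivial experts, and the `Coordinator` are hypothetical stand-ins, and in-process method calls stand in for whatever interface (web services or something more efficient) would actually connect the systems.

```python
from typing import Optional, Protocol


class ExpertSystem(Protocol):
    """Shared interface every siloed expert system exposes (hypothetical)."""
    domain: str

    def answer(self, question: str) -> Optional[str]: ...


class UnitConverter:
    """A narrow expert that knows one fact about units."""
    domain = "units"

    def answer(self, question: str) -> Optional[str]:
        return "12" if question == "inches in a foot" else None


class Calendar:
    """A narrow expert that knows one fact about calendars."""
    domain = "calendar"

    def answer(self, question: str) -> Optional[str]:
        return "7" if question == "days in a week" else None


class Coordinator:
    """A higher system that only knows the shared interface.

    It exposes the same interface itself, so coordinators can be
    stacked to build ever higher systems.
    """
    domain = "general"

    def __init__(self, experts: list):
        self.experts = experts

    def answer(self, question: str) -> Optional[str]:
        # Ask each lower system in turn; the first real answer wins.
        for expert in self.experts:
            result = expert.answer(question)
            if result is not None:
                return result
        return None


lower = Coordinator([UnitConverter(), Calendar()])
top = Coordinator([lower])  # a controlling system built on a lower layer
print(top.answer("days in a week"))
```

Because the coordinator never looks inside an expert, any of the lower systems can be refined or replaced without touching the layers above it.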
