TL;DR: Fostering a culture of trust that leads to calm collaboration up front will yield the benefits that Agile principles promise.
Preface: While agile is in the title of this post, no claim is made that the post is about how to do agile or how SAFe is or is not agile. It is about how the Manifesto for Agile Software Development is self-clarifying in that it concludes with “while there is value in the items on the right, we value the items on the left more” (italics mine), and how the value of the items on either side should be measured by their effectiveness in a given organization and the organization’s influence on the “self-organizing teams” referenced in the Principles behind the Agile Manifesto. That said…
The value of architecture, documentation, and design reviews in SAFe was illustrated in a scenario that played out over several weeks.
The situation started with the discovery that a particular value coming from SAP had two sources. Well, not a particular value from the perspective of the source. The value had the same name, was constrained to the same list of options, but could and did have different values depending on the source, both of which were related to the same physical asset. For numerous reasons not uncommon to SAP implementations that have evolved for over a decade, it was much more prudent to fetch these values from SAP in batches and store them locally.
The issue of the incorrect source was identified by someone outside the development team when the value was found to be commonly missing from the source selected for work prioritization. For various reasons that will be common across a variety of applications that support human workflow, this was considered something that needed to be addressed urgently.
The developer who had implemented the fetch from the correct source was tapped to come up with a solution. Now, one thing about this particular application is that it was a rewrite of a previous version where the value of “Working software over comprehensive documentation” was adhered to without considering the contextual reality that the team developing release one would neither be the team working on the inevitable enhancements nor ever meet that team. The rewrite came about when the system was on its third generation of developers and every enhancement was slowed because there was no way to regression test all of the undocumented parts. Unsurprisingly, the organizational context that resulted in the first version missing documentation also resulted in some table schemas being copied wholesale from the original application and not reviewed because requirements were late, resources were late, and the timeline was unchanged. So, with no understanding of why not to, the developer provided a temporary solution of copying the data from one table to the other, because it had only been communicated that the data from one source was the correct data for the prioritization filter. Users were able to get their correctly prioritized assignments and the long-term fix went to the backlog.
As luck and timing would have it, when the design phase of the long-term fix was picked up by the architect, the developer was on vacation. Further, while this particular developer had often made time to document his designs, the particular service the long-term fix depended on was one of the few that were not documented. Still further, it had been redesigned after another service was discovered to obtain the same data more reliably. But all of the data currently loaded was from the previous version, so even reverse engineering the service to get sample data for evaluation was not possible. These kinds of issues can lead to frustration, which in turn dampens creative thinking. That is to say, had the architect looked at the data instead of following the assumption from the story that the data wasn’t yet readily available, he would have discovered that it was already present.
Eventually the source of the correct value was identified and a design was created that would favor the correct value over the incorrect value but use the incorrect value if the correct one was not available. This would allow assignments to continue, because sometimes the two actual values were the same (which is inspiration for a future post discussing the value of MDM). The design also included updating to the correct value if it became available after the initial values were set. The architect, being thorough, noted in the design a concern about what should be done when the correct value came into the system after the record that was prioritized based on that value had been assigned and processed by a user. After much back and forth, it was finally communicated that while the data was retrieved from the same system and labeled with the same name, the two values did not differ because one was incorrect but because they were in fact two separate values meant for two different viewpoints. This meant that the design of attempting to choose and store a single correct value in both tables was invalid and that the records altered for the work-around were now (potentially) invalid. This made the correct solution a (relatively) simple change to the sorting query.
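To make that last point concrete, here is a minimal sketch of the shape such a fix might take. It is an illustration under stated assumptions, not the actual change: the field names (`planning_value`, `reporting_value`), the option list, and the rule that records missing the prioritization value sort last are all hypothetical. The idea shown is only that both values stay stored separately and the sort reads the one meant for the prioritization viewpoint.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ranking of the shared option list; lower ranks sort first.
PRIORITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass
class AssetRecord:
    asset_id: str
    # Hypothetical name: the value from the source meant for work prioritization.
    planning_value: Optional[str]
    # Hypothetical name: the same-named value from the other source,
    # kept as-is because it serves a different viewpoint.
    reporting_value: Optional[str]


def prioritization_key(record: AssetRecord):
    # Sort only on the prioritization viewpoint's value.
    # Assumption: a missing or unknown value sorts after all known ranks.
    rank = PRIORITY_RANK.get(record.planning_value, len(PRIORITY_RANK))
    return (rank, record.asset_id)


def prioritized(records):
    # Returns a new sorted list; no stored value is copied or overwritten.
    return sorted(records, key=prioritization_key)
```

Because nothing is copied between fields, there is no question of what to do when a value arrives late from the other source: each value only ever means what its own source says it means.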
With the full 20/20 vision of hindsight, it is now clear that if the team had not felt that every issue needed to be treated as an emergency, and all of the product, design, and development stakeholders had discussed the issue prior to taking action, about 80 hours of work would have been reduced to 4 hours. Yes, there were other factors that contributed to needing 80 hours to deal with what is a fairly minor flaw, but those factors would not have come into play had the questions been asked up front and clarity reached through collaboration.
© Scott S. Nelson