The anatomy of a defect

Posted by

The following defect is to some extent typical of those encountered by connector developers1 at Tasktop, inasmuch as there exists a typical defect. The ‘to some extent’ qualifier is there as most defects are either too straightforward to write a blog about, or conversely too complex and/or delve too deeply into our code internals to make for a concise narrative. Nevertheless, this defect should still give a good idea of the type of challenges we face in our day-to-day work and how we address these challenges.

This particular defect arrived from the Solutions team on behalf of a customer. The customer had several Tasktop Integration Hub2 integrations between Salesforce Sales Cloud and some other tools, and occasionally got the following error from Sales Cloud:

“(HTTP 400: Bad Request): errorCode: INVALID_QUERY_LOCATOR”. 

Luckily this defect wasn’t particularly severe, as each time the error would eventually disappear and the integration would continue to work. However this error still happened periodically to the customer, so we needed to look into it.

The reason why the Salesforce API would throw this error was readily available online, and well documented3. Essentially, queries to the API do not return all the results at once, but rather return results in paginated fashion with 200 results per page by default. To keep track of which page of results the user is at, the system maintains a “query cursor”. Additionally, the system limits any one user to ten query cursors simultaneously and will forget the oldest cursor when the user creates an eleventh cursor. This means that if you have eleven queries going at once and request another page of the earliest query, the API would give you this INVALID_QUERY_LOCATOR error.

Given that Hub asynchronously does a large number of operations at once, trying to ensure that we don’t use more than ten cursors at once seemed rather challenging. The customer could impose a concurrency limit on the connector, limiting the number of threads using the Salesforce Sales Cloud connector, but this was an imperfect solution. While it certainly would work, it came at the cost of potentially needlessly slowing down the integrations, which might or might not be acceptable for this customer and any others that could run into this issue.

Alternatively, I considered having the connector keep track of the number of paginated calls that were being made using semaphores and having threads wait if there were ten cursors already being used, but even a brief look at this approach uncovered tremendous difficulties. Not only was this fundamentally rather complicated, but it also didn’t mesh well with the existing code that managed paginated calls, and there was no way to know ahead of time if a query would have enough results to use up a cursor.

Mitigating the problem by increasing the page size to result in fewer cursors being used was another obvious possibility, though yet again not without cost. Results are paginated for a reason and having very large pages would also have a performance impact. To evaluate this option, I needed to know what queries the connector was making that were causing this INVALID_QUERY_LOCATOR error.

If those queries were returning more than 2000 results, which was the upper limit to the size of a Salesforce page, increasing the page size wouldn’t do any good. Looking at the customer’s Hub logs, I noted that the INVALID_QUERY_LOCATOR error was not being thrown by the queries I expected. I had expected that queries searching for all cases in the system or other queries with a similar scope would be the ones to throw the error, as they are the ones that would typically have the most results and accordingly take the most time. However, it was the call to retrieve the ‘related objects’ (links) of an object that generated most of the errors.

The possibility that the customer had a large number of Salesforce objects with more than 200 links (which is what would be needed to use up a cursor) seemed on the face of it unlikely, but some of our customers did have very complicated set-ups. I somewhat dubiously asked Solutions to relay an inquiry to the customer about the number of links on their objects and started looking at the Hub configurations. What was even more puzzling was that the customer did not have any multi-links in their integration mappings, which means that Hub shouldn’t have even needed to query for links in the first place.

Word from the customer came back: their Salesforce objects had much fewer than 200 links.

I continued to look at both the configuration and the logs, trying to understand why Hub was making these queries. After looking at the stack trace for a while, I noticed that the Salesforce objects did have a single link mapped to some sort of ‘component’ object, but I wasn’t quite sure about how this was causing the problem.

The defect had been filed as a defect with the Salesforce Sales Cloud connector which had seemed reasonable at the time, but now it looked more like a Hub defect, so I sought the opinion of someone more familiar with the Hub codebase. They helped me find where the linked component was being retrieved, and there we saw that instead of just retrieving the ID of the component, which is what was needed, Hub did a full retrieve of the component including every single link it had. And these components were linked to potentially hundreds or thousands of Salesforce objects. Furthermore, the component’s links were being retrieved every time we retrieved an object. This was what was using up all the cursors.

The fix for this defect was one line long. Or rather it could have been, but with the associated refactorings, test cases needing updating, test cases to add, etc. it ended up at 59 lines long which is much less satisfying. But it nevertheless ended up being a tidy fix to the underlying issue and came with some performance improvements to boot.

References

  1. At Tasktop, the developers that write the ‘connectors’ that, well, connect with the tools that we support are referred to as Connector Developers. Check out the ever-growing number of integrations we support
  2. Tasktop Integration Hub is our company’s flagship product and uses the connectors to integrate the various tools and team that plan, build and deliver software together. In doing so, Tasktop automates the real-time flow of product-critical information across software delivery value stream from ideation to operation and back through the customer feedback loop.
  3. This Salesforce API error is well documented. Unfortunately, most of the time API errors are not.

Leave a Reply

Your email address will not be published. Required fields are marked *