SharePoint Syntex (formerly part of Project Cortex) was released publicly at Microsoft Ignite 2020. Syntex is Microsoft’s first serious foray into infusing artificial intelligence functionality into SharePoint, the document management system many of us have come to use and (sort of) love in the corporate world over the last 15 years. It’s a critical tool for both managing documents and collaborating on content among teams. So with all the changes Microsoft has introduced into SharePoint in the last few years, an AI component was all but inevitable. And it’s a cool little tool – if you have the right use cases for it.

This blog post is dedicated to giving you the down-and-dirty intro to Syntex, including what it is, applicable scenarios, and why even bother with it. There’s already a lot written about how to configure and set it up (not the least of which is provided by – ahem – Microsoft), so let’s focus on what matters from a business (and analyst) perspective.

The landing page for the Syntex Content Center looks A LOT like . . . SharePoint!

How Will It Improve My Current SharePoint Experience?

In testing Syntex, I found many features that created genuine improvements to the SharePoint experience. After working through the initial configuration and learning the cycle for creating document processing models, I can see how this might benefit organizations in the future – especially if those organizations are open to business change.

Automation of Repetitive Tasks

Rather than having users enter document library metadata manually, Syntex basically lets them upload documents and walk away. The model fills in the library columns on its own, which saves time and effort.

A Little Bit of Auto-Pilot Mode for Information Architecture

The model design works with SharePoint to create new content types and site columns, all of which are available in the Content Type Gallery in the SharePoint admin center. You can also let Syntex work with your Term Store to auto-tag documents based on your taxonomy, and even auto-apply retention labels to the different content types as documents come in.

Disseminate Information Faster

By tying Syntex models to document libraries, content and metadata are automatically indexed across your tenant so they are searchable and usable. This means teams and business units can find what they need more easily.

Syntex Shortcomings

In my configuration and trial adventures with Syntex, it wasn’t all peaches ‘n cream. As with my usual experience with Microsoft 365, the vendor tends to over-market its products as simple, intuitive, and far-reaching in terms of improving the digital workplace experience. A few things left me wanting a better user experience, as well as clearer information as a consumer of this new product.

The accuracy of your document models is highly nuanced

As discussed, you could provide documents of many types and build highly trained, complex processing models with very accurate classifiers and extractors. But accuracy results will differ based on a number of factors. This is even more true on a customer-to-customer basis. Consumers need to be aware that what works for a colleague or a partner organization may not work as effectively for another. It's all about sandboxing, sandboxing, and more sandboxing.

Syntex has a user learning curve and requires ongoing commitment to maintenance

Microsoft brands Syntex as low-code, easy to learn, and quick to jump in. I beg to differ, even as an analyst with a technical background. It will take practice and a deep understanding of the model learning process, the user interface, and information architecture in general. For modeling to be successful, digital workplace teams will need to dedicate resources internally to learn, build, and curate processing models so that users can generate value out of Syntex and its capabilities.

Read the Fine Print for the Licensing Model of Syntex

The Syntex advertisements and marketing material have warts – or at least the way they're contextualized does. Signing up for a Syntex license only provides the Document Understanding processing model, not the Form Processing Model – although you may be led to believe they're one and the same. These are two different technologies with different investment costs. Read the fine print, do your research, and make sure you know what you're getting with a Syntex license (and what you're not).

Good Use Cases and Not-So-Good Use Cases

As with any piece of software or functionality, there are limitations with Syntex – and when I say “Syntex”, I mean the Document Understanding Processing Model, not the Form Processing Model (for reasons I will explain later). Most people would agree that automating documents and their classification is a good thing. But some documents make sense with Syntex, and some don’t. Additionally, Form Processing is good for certain documents, but it isn’t necessarily something you get with a basic Syntex license. Below is a list of documents you might find useful if Syntex is on your radar:

Document Types That Syntex Could Understand Quite Well
Some Things That Might Not Play Nicely with Syntex

50 Shades of Document Understanding

You’ll notice I put asterisks next to Accounting and Shipping Documents, and that they’re included on both sides. That’s not a mistake. These documents could work quite well – or not. The grey area lies in the “semi-structured” nature of these documents: they are somewhat visual in their layout but consistently share some labels and regional similarities. These can include Purchase Orders, Invoices, Journal Entries, Credit Memos, and other types of accounting documents.

It is worth sandboxing with these documents as you build your AI models and seeing what results work for you. You’ll need to be absolutely accurate with your labels and their regionality/settings, so don’t be afraid to get your hands dirty with multiple rounds of testing. It may take a bit of time.

Accounting and shipping documents are two document types that can fall into that “grey zone”. It’s best just to try them and see how accurately Syntex works with your documents.

Why Am I Not Talking About the Form Processing Model?

As mentioned above, the form processing model is great for applying AI to structured documents with consistent layouts, but the difference between wanting it and enabling it is about $500 USD/month. This is no small cost for businesses that are operating on tight budgets and don’t quite know if they need that level of AI power.

Additionally, the Form Processing Model is less of a structural integration into SharePoint than the Document Understanding Model. What I mean by this is that form processing is handled by the PowerApps AI Builder. You simply apply the Builder to a document library and boom! – the documents are scanned with more horsepower to handle structured content. But it doesn’t manage the content types, the content type columns, or the Power Automate part. You wouldn’t see your AI Builder model in the Content Type Gallery in the SharePoint admin center, and you would certainly need to be familiar with PowerApps and its graphical user interface to maintain the model.

There is also no sense in using Form Processing unless your goals include specific use cases around object, language, and key phrase detection or sentiment analysis. In other words, you’d probably want to know exactly what you want to do with AI technology before you commit to spending that kind of money – especially if the Document Understanding Processing Model in Syntex might be able to handle most of the basic document automation needs you encounter day to day.

The AI Builder in PowerApps is a wonderfully powerful tool and will probably play a larger role in your Microsoft 365 experience in the future; but for document library metadata automation, it’s likely overkill in most cases.

Making the Most of Syntex

So what really works and what doesn’t? I have sandboxed with SharePoint Syntex considerably over the last few weeks since its release. Here’s what I’ve found as the rules of thumb:

Ensure the right word markers, labels, and expected data are in predictable places. Regional consistency of your data in Syntex is the key to its success.

Helping the model understand what you want to scan is really on you. By familiarizing yourself with the Explanations features (Phrases, Patterns, and Proximities), you can make the Classifier and Extractor steps much less daunting and dramatically improve your model’s accuracy scores.

For best results, mix one part document type with at least five parts document samples; in other words, give the model at least five sample files for each document type. For extra accuracy, you’ll probably want to feed it at least 20. Document Understanding thrives on learning and finding patterns and consistencies. Microsoft recommends at least five, but don’t be afraid to give it more – especially for content that may vary in its structure.

The font type and size in your documents can affect the accuracy of the models you build. Make sure to use clear, readable fonts that aren’t obscured by or overlapped with other page elements. It’s best to ensure the documents you manage are set up for success before you bring them into Syntex at all.

And if you are going to attempt modeling documents with handwriting, at least require printed letters – don’t even bother with cursive or chicken scratch. You’re not going to enjoy the results.

Change management will be critical to undertake, as adopting Syntex not only requires a change of behavior (especially uprooting hard-baked SharePoint UX expectations), but also a change of “trust” in automation technology to essentially cover off menial tasks. Communicate, demonstrate, and have a method of taking in feedback or criticism of processing models so that they can be improved over the long run.

Having someone dedicated as the Syntex Model Admin is not just good governance, it’s good business. Your documents and what you need from them will change over time, and you’ll need to ensure you have the resources available to make sure those models are still effective.

Much like a car, you’ll need to maintain your models to get the best mileage – and keep driving in the right direction.

Templates that have been adjusted over time (e.g. new layouts) may cause problems. You may need to build a new model to handle new document layouts.