While we have yet to achieve truly human-like artificial intelligence, we are already seeing important breakthroughs in natural language processing. How can the audit profession leverage these for its own transformation?


Natural language processing (‘NLP’) is a subfield of artificial intelligence that helps computers to understand, interpret and manipulate human languages, and it has the potential to transform the audit profession. The history of NLP is generally thought to have started in the 1950s.¹ NLP has developed rapidly over the last decade and is now widely adopted across industries, including audit. Four major applications of NLP that auditors can consider are:

Text classification

Know-your-client (‘KYC’) is a standard procedure in the audit industry that verifies the identity of a client and assesses potential risks in the business relationship. One step of KYC is to screen for recent negative news about key personnel in the firm. Traditionally, auditors search for the names of key personnel through search engines and inspect each result manually. NLP’s text classification can label news items as carrying negative or positive sentiment. Text classification models can currently achieve accuracy above 90% on some benchmark datasets,² so an accurate model can play a key role in automating the screening for negative news.
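To make the negative-news screening step more concrete, here is a minimal sketch of a text classifier in Python. It is an illustration under stated assumptions, not a production KYC tool: the headlines are invented, the two labels are our own, and a simple Naive Bayes model stands in for the far larger models a real screening system would likely use.

```python
import math
from collections import Counter, defaultdict

# Invented headlines for illustration only; they describe no real person or firm.
TRAINING_DATA = [
    ("director charged with fraud and bribery", "negative"),
    ("chief executive faces lawsuit over accounting scandal", "negative"),
    ("regulator fines company founder for misconduct", "negative"),
    ("founder wins award for innovation", "positive"),
    ("chief executive praised for strong growth", "positive"),
    ("director donates to charity and wins recognition", "positive"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label; these counts are the whole Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
search_results = [
    "director charged with fraud",
    "founder wins award",
]
# Keep only the results an auditor should review manually.
flagged = [headline for headline in search_results if classify(headline, model) == "negative"]
print(flagged)
```

In practice the training set would contain many thousands of labelled articles, and the flagged items would still go to an auditor for review rather than being treated as conclusions.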

NLP can also process textual information in many languages, which can save the auditor translation time or reduce the cost of employing a professional translator.


Information retrieval

Vouching, or examining documentary evidence to verify the accuracy and occurrence of a transaction, is a basic audit practice and probably one of the most tedious. Such repetitive tasks do not require advanced audit skills, but they are time-consuming.

Now, with optical character recognition (‘OCR’) converting hard copies into machine-readable formats and NLP retrieving key information from documents such as invoices and delivery orders, it is possible to automate the vouching task and free up auditors’ time for higher-value tasks. By automating information extraction and validation, NLP can not only boost audit efficiency but also reduce human error and increase the accuracy of data entry.
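The extract-then-validate flow described above can be sketched in a few lines of Python. This is a simplified illustration only: the invoice text, the field labels and the ledger entry are all invented, and regular expressions stand in for the trained extraction models that a real solution would more likely use on OCR output.

```python
import re
from decimal import Decimal

# Invented OCR output for a hypothetical invoice.
OCR_TEXT = """
ACME SUPPLIES PTE LTD
Invoice No: INV-2019-0042
Invoice Date: 18/06/2019
Total Amount: SGD 1,250.00
"""

# Invented ledger entry that the auditor wants to vouch against the document.
LEDGER_ENTRY = {
    "invoice_no": "INV-2019-0042",
    "date": "18/06/2019",
    "amount": Decimal("1250.00"),
}

# One pattern per field; the labels are assumptions about the invoice layout.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Invoice Date:\s*(\d{2}/\d{2}/\d{4})"),
    "amount": re.compile(r"Total Amount:\s*[A-Z]{3}\s*([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Pull the key fields out of the OCR text; missing fields become None."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    if fields["amount"] is not None:
        # Remove thousands separators and use Decimal to avoid float rounding.
        fields["amount"] = Decimal(fields["amount"].replace(",", ""))
    return fields


def vouch(ledger_entry, document_fields):
    """Return the names of fields where the ledger and the document disagree."""
    return [
        name
        for name, expected in ledger_entry.items()
        if document_fields.get(name) != expected
    ]


document_fields = extract_fields(OCR_TEXT)
mismatches = vouch(LEDGER_ENTRY, document_fields)
print(mismatches or "ledger entry agrees with the invoice")
```

Only the entries with mismatches, or documents where a field could not be extracted, would then need an auditor's attention.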


Natural language generation (‘NLG’)

NLG is a subfield of NLP and usually refers to computer systems that can produce understandable texts in human languages.³ One application of NLG in audit is report generation.

While most audit software provides a report-generating function, it still relies on a lot of human input. NLG can automate or partially automate these tasks. For example, IBM’s OpenPages leverages IBM Watson NLP and machine learning capabilities to provide a one-click audit report generation function.⁴ Such a function can deliver both cost and efficiency benefits.

Another application of NLG, though not directly related to audit, is dynamic narrative generation for interactive dashboards. Many audit firms now analyse and present audit and value-added results using business intelligence tools such as Tableau and Power BI. In such cases, NLG can turn the analysis of structured data behind charts and graphs into text for greater clarity. For instance, the Wordsmith extension for Tableau generates concise and digestible descriptions that adjust based on users’ interactions with dashboards. This makes the analysis easier for users to understand and act on.
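As a rough illustration of how a narrative can be generated from the data behind a chart, the sketch below uses simple template-based generation in Python. The function name, the wording template and the figures are our own assumptions; commercial NLG tools are considerably more sophisticated, and this is not how any particular product works.

```python
def describe_trend(metric, periods, values):
    """Turn a series of values into a one-sentence narrative."""
    if len(periods) != len(values) or len(values) < 2:
        raise ValueError("need at least two periods, each with one value")

    first, last = values[0], values[-1]
    if last > first:
        verb = "rose"
    elif last < first:
        verb = "fell"
    else:
        verb = "was unchanged"

    sentence = f"{metric} {verb}"
    if first != 0 and last != first:
        # Percentage change relative to the first period.
        pct = abs(last - first) / abs(first) * 100
        sentence += f" {pct:.1f}%"
    sentence += f" from {first:,.0f} in {periods[0]} to {last:,.0f} in {periods[-1]}"

    # Mention the highest point in the series (the first one if there is a tie).
    peak_index = max(range(len(values)), key=values.__getitem__)
    sentence += f", peaking at {values[peak_index]:,.0f} in {periods[peak_index]}."
    return sentence


# Invented figures, as if a user had selected four quarters on a dashboard.
print(describe_trend("Revenue", ["Q1", "Q2", "Q3", "Q4"], [100, 110, 130, 120]))
# Narrowing the selection to two quarters regenerates the narrative.
print(describe_trend("Revenue", ["Q3", "Q4"], [130, 120]))
```

Calling the function again whenever the user changes a filter is what makes the narrative follow the dashboard interaction.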


Natural language understanding (‘NLU’)

We believe the ultimate role of auditors is to be business advisors who help their clients grow their businesses. To achieve this, an auditor needs both financial and non-financial knowledge, which requires extra time and effort to find and analyse information related to the client’s company and industry. Given that 2.5 quintillion bytes⁵ of data are created each day, it is crucial for us to get the right information efficiently. This is where NLU plays a role. An advanced application of NLP, NLU aims to capture the meaning of a text rather than just its wording. With such functionality, it can take in a large amount of information, filter out what is irrelevant and “feed” us the content that really matters.
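The filtering idea can be illustrated with a much simpler technique than NLU. The Python sketch below ranks articles against a topic using TF-IDF keyword weighting, which matches words rather than meaning; it is shown only to make the collect, score and filter flow concrete. The articles and the query are invented.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]


def rank_by_relevance(query, documents):
    """Score each document against the query with TF-IDF and sort by score."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in doc_tokens:
        doc_freq.update(set(tokens))

    def idf(term):
        # Smoothed inverse document frequency; rarer terms weigh more.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

    query_terms = set(tokenize(query))
    scored = []
    for doc, tokens in zip(documents, doc_tokens):
        counts = Counter(tokens)
        score = 0.0
        if tokens:
            score = sum(counts[term] / len(tokens) * idf(term) for term in query_terms)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Invented articles for illustration only.
articles = [
    "Semiconductor demand is rising as chip makers expand capacity in Singapore.",
    "Local football club wins the weekend match.",
    "New export rules may affect semiconductor supply chains.",
]
ranked = rank_by_relevance("semiconductor supply chain risk", articles)
# Drop anything that shares no terms with the topic of interest.
relevant = [doc for score, doc in ranked if score > 0]
for doc in relevant:
    print(doc)
```

A keyword approach like this misses synonyms and context (here it does not even match "chains" to "chain"), which is the gap that NLU techniques aim to close.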

With more than 80%⁶ of data in unstructured formats, NLP could potentially transform the audit industry. It is particularly powerful in automating repetitive tasks, especially when integrated with other technologies such as OCR and machine learning. With advanced technologies reducing the time and resources spent on repetitive tasks, auditors now have an opportunity to play a bigger role as business advisors. They also need to keep a close eye on technological developments and upskill accordingly to truly benefit from such advancements.


This article was written by Data Analytics Specialist Zhang Yuchen of our Technology, Media & Telecommunications practice.


1 Natural language processing, Wikipedia, last modified 18 June 2019, https://en.wikipedia.org/wiki/Natural_language_processing#History_of_NLP
2 Bag of Tricks for Efficient Text Classification, Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 9 Aug 2016, https://arxiv.org/pdf/1607.01759.pdf
3 Building Natural Language Generation Systems (Cambridge University Press, 2000), Ehud Reiter, Robert Dale, http://www.ling.helsinki.fi/kit/2004s/ctl310gen/ReiterDale/ContentsIntro.pdf
4 IBM OpenPages with Watson, IBM, 2018, https://www.ibm.com/
5 How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read, Bernard Marr, 21 May 2018, https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#3b6a688a60ba
6 The Big (Unstructured) Data Problem, Juliette Rizkallah, 5 Jun 2017, https://www.forbes.com/sites/forbestechcouncil/2017/06/05/the-big-unstructured-data-problem/#668dff3f493a