In large corporations, millions of cash transactions are booked via cash management software (CMS) each month. Most CMS systems adopt a keyword (search string) based matching logic for booking: the system checks whether the cash transaction description contains a specific search string and, if so, books the transaction to the appropriate general ledger account (GL-account) according to a booking rule. However, because transaction descriptions are free text and cash transactions are diverse, this matching often fails due to data corruption (truncation, insertions, spelling errors), paraphrasing, or the lack of a reusable keyword in the description, and the failures require significant manual intervention by accountants. Every month, accountants handle these CMS booking failures manually in spreadsheets. We present two machine learning models, a GL-account classification model and a search string extraction model, to alleviate this manual process. These two models, backed by retrieval-augmented large language models, can automate booking for a substantial portion of the manually handled transactions. Our approach is robust to common data issues in transaction descriptions. Unlike typical deep-learning models, our models are interpretable and explainable. For GL-account classification, our approach achieves accuracy close to that of human experts. For search string extraction, our approach produces reliable results that are closer to those of accountants than the results of other methods, such as fine-tuning transformers for extraction tasks.
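To make the keyword-based booking logic and its failure modes concrete, the following is a minimal sketch, not the implementation of any particular CMS. The rule table `BOOKING_RULES`, the function `book_transaction`, the search strings, and the GL-account codes are all hypothetical names and values chosen for illustration.

```python
from typing import Dict, Optional

# Hypothetical booking rules: search string -> GL-account (illustrative values only).
BOOKING_RULES: Dict[str, str] = {
    "ACME CORP": "GL-400100",
    "PAYROLL": "GL-600200",
}


def book_transaction(description: str,
                     rules: Dict[str, str] = BOOKING_RULES) -> Optional[str]:
    """Return the GL-account of the first rule whose search string occurs
    in the description, or None if no rule matches (a booking failure)."""
    text = description.upper()  # case-insensitive substring match (an assumption)
    for search_string, gl_account in rules.items():
        if search_string in text:
            return gl_account
    return None  # would be left for manual handling by an accountant


# A clean description matches its rule.
clean = book_transaction("Wire transfer from Acme Corp, invoice 1234")
# A truncated description and a misspelled one both fail exact matching.
truncated = book_transaction("Wire transfer from Acme Cor")
misspelled = book_transaction("Wire transfer from Acne Corp")
```

In this sketch, `clean` resolves to a GL-account, while `truncated` and `misspelled` are `None`: exact substring matching has no tolerance for the truncation and spelling errors described above, which is why such transactions end up in the manual process.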