As part of one of the most successful franchise businesses, Inter IKEA Group wishes to implement state-of-the-art technology to improve its business processes and ensure efficiency. The suborganization IKEA Supply is responsible for sourcing, supplying, selling, and distributing IKEA products across the world. To optimize the supply chain, it is essential to track income and costs related to different categories within the chain, such as transportation, renovations, and consultant services, so that decisions can be based on accurate information. Today, financial transactions between IKEA Supply and other parties are tracked and categorized using an external tool. This tool classifies the transactions using a four-level hierarchy of labels, where the classification at each level depends on the classifications obtained at the previous levels. The problem IKEA Supply faces is that the classifications made by this tool are not sufficiently accurate, which results in financial reports of poor quality. Therefore, IKEA Supply and its Department of Data and Technology would like to evaluate an alternative approach based on a large language model. This thesis evaluates how a large language model can be implemented to classify financial transactions related to IKEA Supply in a hierarchical structure. The final goal is to achieve better accuracy with a large language model than with the currently used classification tool, which was developed by an external party.
The thesis uses a large language model developed by OpenAI. The data is provided by IKEA Supply and contains financial transactions together with the corresponding true classifications for all four levels. Features are selected through interviews and backward selection. A grid search is performed to identify the best combination of hyperparameters for this particular classification problem. Further, different prompting techniques, such as zero-shot, one-shot, and few-shot prompting, are evaluated and compared to understand which formulation results in the most accurate classifications. The best-performing prompt is then implemented according to the Chain-of-Thought prompting structure, where the classification problem is broken down into four sub-problems to utilize the hierarchical structure of the labels in the dataset. Since the large language model is a licensed product, the costs are analyzed continuously to ensure cost efficiency throughout the project.
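To make the decomposition into four sub-problems concrete, the following is a minimal Python sketch, not the implementation used in the thesis. It assumes one sub-problem per hierarchy level and one prompt per sub-problem, with the candidate labels at each level restricted by the labels already chosen. The label tree, the prompt wording, and the names `HIERARCHY`, `build_prompt`, `classify`, and `stub_model` are all hypothetical; `stub_model` is a keyword-matching stand-in so the example runs without any call to a real language model.

```python
from typing import Callable, List

# Hypothetical four-level label tree (illustrative only, not IKEA Supply's labels).
HIERARCHY: dict = {
    "Logistics": {"Transportation": {"Road": {"Domestic": {}, "International": {}}}},
    "Services": {"Consultants": {"IT": {"Development": {}, "Support": {}}}},
}


def build_prompt(transaction: str, path: List[str], candidates: List[str]) -> str:
    """Build a zero-shot prompt for the next level, given the labels chosen so far."""
    chosen = " > ".join(path) if path else "(none yet)"
    return (
        f"Transaction: {transaction}\n"
        f"Labels chosen so far: {chosen}\n"
        f"Choose exactly one label for level {len(path) + 1} from: {', '.join(candidates)}\n"
        "Answer with the label only."
    )


def classify(transaction: str, ask_model: Callable[[str], str],
             tree: dict = HIERARCHY) -> List[str]:
    """Walk the hierarchy level by level; each answer narrows the next candidate set."""
    path: List[str] = []
    node = tree
    while node:  # an empty dict marks a leaf
        candidates = sorted(node)
        answer = ask_model(build_prompt(transaction, path, candidates)).strip()
        if answer not in node:
            break  # stop instead of guessing when the answer is not a valid label
        path.append(answer)
        node = node[answer]
    return path


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: pick the first option named in the transaction."""
    lines = prompt.splitlines()
    text = lines[0].lower()
    options = lines[2].split("from: ", 1)[1].split(", ")
    for option in options:
        if option.lower() in text:
            return option
    return options[0]


if __name__ == "__main__":
    print(classify("Services invoice: IT consultants, support contract", stub_model))
```

In practice `ask_model` would wrap a call to the licensed model. Note that this design issues one model call per level, which matters for the cost analysis.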
The results indicate that zero-shot prompting generates the most accurate classifications. Implementing it in the Chain-of-Thought prompting structure, which guides the model in its reasoning process and facilitates its understanding of the task, further improves model performance. Apart from achieving higher evaluation metrics, the large language model provides a classification for each transaction at all four levels of the hierarchy. This result is considered to outperform the classifications made by the currently used classification tool. However, the cost analysis makes clear that using the large language model implies significantly higher costs than expected. Although additional adjustments are needed before the model can be implemented for business purposes, the results show the potential of further customizing a large language model for this particular classification problem.
Using a large language model to classify financial transactions within IKEA Supply can be considered beneficial, since the model has been trained on publicly available data that IKEA does not possess. However, based on the results obtained, implementing the large language model may imply a significant cost increase, since IKEA Supply handles a large number of transactions and each run of the model, being a licensed product, comes at a cost. Therefore, it is relevant to explore how this solution can complement other, more cost-efficient classification models implemented for this purpose.
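The way the cost scales with transaction volume can be illustrated with a short Python sketch. It assumes token-based pricing and, as a further assumption, one model call per hierarchy level. The class `CostAssumptions`, the function `estimated_cost`, and every number in the example are hypothetical placeholders, not figures from the thesis or from any price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    """Hypothetical inputs for a rough cost estimate; all values are placeholders."""
    prompt_tokens_per_call: int
    completion_tokens_per_call: int
    price_per_1k_prompt_tokens: float
    price_per_1k_completion_tokens: float
    calls_per_transaction: int = 4  # assumption: one call per hierarchy level


def estimated_cost(n_transactions: int, a: CostAssumptions) -> float:
    """Estimate the total cost of classifying n_transactions under the assumptions."""
    per_call = (
        a.prompt_tokens_per_call * a.price_per_1k_prompt_tokens
        + a.completion_tokens_per_call * a.price_per_1k_completion_tokens
    ) / 1000
    return n_transactions * a.calls_per_transaction * per_call


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the arithmetic.
    example = CostAssumptions(500, 10, 0.002, 0.004)
    print(f"{estimated_cost(100_000, example):.2f}")
```

Under these assumptions the cost grows linearly in both the number of transactions and the number of calls per transaction, which is one reason to consider sending only some transactions to the large language model and leaving the rest to a cheaper classifier.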