Computer Science Question
Recipe Recommendation System First Author1 and Second Author2 1 Princeton University, Princeton NJ 08544, USA 2 Springer Heidelberg, Tiergartenstr. 17, 69121 Heidelberg, Germany [email protected] Abstract. This paper presents the development of an advanced recipe recommendation system, an innovative application at the intersection of health, machine learning, and natural language processing (NLP). Addressing the challenge posed by the overwhelming abundance of online recipes, this system provides personalized recipe suggestions based on user preferences, dietary restrictions, and ingredient availability. The core of the system is a multi-input neural network model that processes and categorizes recipes using various textual inputs such as ingredients, recipe names, and descriptions. The model adeptly handles multi-label classification and multi-input data, two primary challenges in the domain of digital gastronomy. Extensive data preprocessing, including text normalization, tokenization, and vectorization, was conducted to prepare the dataset sourced from Kaggle. The model’s performance was rigorously evaluated through training and validation accuracy metrics, displaying its capability to effectively categorize and recommend recipes. The system’s potential applications span from personal cooking assistance to professional culinary platforms, highlighting its significance in enhancing digital culinary experiences. Future work includes expanding the dataset for greater diversity, integrating models that are more complex for improved NLP, and developing a user-centric mobile application. This study contributes to the growing field of AI in gastronomy, demonstrating the practical application of machine learning in everyday life domains. Keywords: Text Normalization, Multi-Input Neural Networks, Tokenization, Kaggle Dataset, Digital Gastronomy, Machine Learning, Vectorization, Recipe Recommendation Systems 1 Introduction In the health and gastronomy domain, the introduction of cutting-edge technology, lambasted by state-of-the-art machine learning and artificial intelligence, has caused several significant changes. Developing a recipe recommendation system is stingingly noteworthy (Zoran et al.; 2021). This project utilizes machine learning and natural language processing (NLP) capacity to supply personalized recipe recommendations tailored to individuals’ preferences and needs. 1.1 Background and Motivation The growth of digital milieus in the culinary world has caused an overloading plenitude of recipes attainable online. This richness, though advantageous, also presents a roadblock for users struggling to discover recipes that correspond to their distinct tastes, dietary restraints, or ingredient accessibility. The desire for an effective, clever system to traverse this gargantuan culinary landscape could not be more crucial. A recipe recommendation system meets this need by sieving, sorting, and recommending recipes based on user predispositions and data input (Walpitaje, 2023). 1.2 Background and Motivation The primary goal of this project was to build a machine-learning model competent for understanding and classifying recipes from various textual entries such as ingredients, recipe names, and descriptions. The endpoint was to provide users with a tool that not only suggested recipes based on these inputs but also learned from user actions to refine its recommendations over successive uses. 2 Challenges 2.1 Faced Challenges • Multi-Label Commensuration: Recipes often fit into several categories or tags simultaneously, necessitating the model to prognosticate multiple labels precisely (Gravina et al., 2020). • Multi-Input Data Manipulation: The model had to process and draw from several types of textual data, each contributing diverse and relevant specifics about the recipes (Gravina et al., 2020). • Natural Language Processing: Suitably deriving meaningful facts from assorted textual data constituted a formidable challenge, particularly considering the various and often-asymmetric nature of recipe descriptions and ingredients. 2.2 Approach To manage these difficulties, the project received a multi-phased approach: • Data Preprocessing: Thorough data sanitization and readying were achieved, involving text normalization, segmentation, and vectorization. • Model Creation: A multi-input neural system model was made to process numerous kinds of data and carry out multi-label classification. • Training and Testing: The model was qualified on a considerable dataset and examined thoroughly to guarantee accuracy and agility This recipe recommendation system project amalgamates culinary arts and progressive technology, aiming to optimize the user experience in digital cooking realms through perspicuous and personalized recipe proposals. The advanced sections delve deeper into the project’s methodology, data taking care of, model building, and the profits attained. 3 Related Works Recent developments in recipe recommendation systems have seen various innovative approaches, reflecting the growing interest and advancements in this area. Here are some notable examples: 3.1 RecipeRec: A Heterogeneous Graph Learning Model RecipeRec is a novel approach that utilizes a heterogeneous graph-learning model for recipe recommendations [5]. • Key Features: This model captures recipe content and collaborative signals through a heterogeneous graph neural network with hierarchical attention and an ingredient set transformer (Zhang et al,. 2022). • Graph Modeling: RecipeRec incorporates the higher-order collaborative signal, such as the relational structure among users, recipes, and food items, into its recommendation system (Zhang et al., 2022). • Dataset: The project utilizes URI-Graph, a large-scale user-recipe-ingredient graph, to facilitate recipe recommendation research and graph-based food studies. 3.2 RecipeNLG Dataset Project • Data Source: The project employed the RecipeNLG dataset, a collection of recipes derived from various websites, which includes recipe titles, ingredients, directions, source links, and a clean ingredients list identified through Named Entity Recognition (NER) (sakib et., 2022). • Exploratory Analysis: The initial analysis examined the most frequent words and document length distribution to identify data integrity issues (sakib et., 2022). • Data Cleaning: The project involved cleaning the directions and ingredients column, removing stop words and punctuation, and isolating key ingredients for topic separation (sakib et., 2022).. 3.3 Application of Various Clustering Techniques • • • • • K-Means Clustering: Used for grouping similar data points after converting text into numerical vectors using TF-IDF. Latent Dirichlet Allocation (LDA): A probabilistic approach for identifying related terms representing a theme. Top2Vec with Doc2Vec: This method embeds “documents” into vectors and uses UMAP and HDBSCAN for dimensionality reduction and locating dense document pockets. BERTopic: Employs sentence-transformers and c-Tf-Idf to generate dense clusters and interpretable topics. Correlation Explanation (CorEx): Correlates words based on their co-appearance within documents and allows semi-supervised training with “anchor” words. 3.4 Results and Evaluation • Developing Intuitive Categories: The project aimed to automatically categorize recipes into intuitive categories using various unsupervised methods like BerTopic, word2vec, and LDA. • Topic Concentration Metric: A “topic concentration” score was developed to evaluate the purity of topics produced by clustering approaches like BERTopic, Top2Vec, and CorEx. 3.5 R shiny App Development • Recipe Finder Application: An Rshiny app was developed in RStudio to suggest recipes based on input ingredients, with the ability to narrow down recipes by category (Glowacka, 2021). These recent projects and research efforts illustrate the diverse methodologies and technologies employed in the recipe recommendation domain, from graph-based modeling to advanced clustering techniques and application development. 3.6 Synthesis and Comparison The current recipe recommendation project, while sharing the overarching goal of effective recipe categorization and recommendation with these recent developments, adopts a distinct approach: • Model Architecture: Using a multi-input neural network distinguishes it from graph-based or clusteringfocused models. This architecture is particularly adept at handling diverse types of textual data, a feature not central to the other mentioned projects. • Focus on Multi-Label Classification: Unlike projects that use unsupervised learning for topic modeling or clustering, this project emphasizes supervised learning for categorizing recipes into multiple labels. • User-Centric Approach: The project is designed with a practical, user-centric approach, aiming to offer direct recipe recommendations based on user input, differentiating it from more data-centric projects like RecipeNLG. • Adaptation and Integration: While it does not directly employ graph learning models or advanced clustering techniques, the methodology of the current project could be integrated with these approaches for enhanced performance, like using graph models to capture user-recipe interactions better. In summary, the current recipe recommendation system builds upon the foundational work in the field by adopting a unique approach that prioritizes multi-input data processing and multi-label classification. This approach complements existing methods and opens up possibilities for future integrations and enhancements in the evolving landscape of culinary recommendation systems. 4 Related Works The dataset for this project was sourced from Kaggle and was specifically designed for food recipe recommendations. It includes various attributes related to recipes, such as names, descriptions, ingredients, and tags. The dataset’s size is substantial, encompassing a wide range of recipes, which adds to the model’s potential diversity and applicability. 4.1 Data Loading and Initial Processing Data Cleaning: recipes = recipes.drop([‘id’, ‘serving_size’, ‘servings’], axis=1) from nltk.corpus import stopwords from nltk.stem import WordNetLemmatizer from nltk.tokenize import word_tokenize nltk.download(‘punkt’) nltk.download(‘stopwords’) nltk.download(‘wordnet’) stop_words = set(stopwords.words(‘english’)) def clean_text(text, lemmatize=True): text = text.lower() text = re.sub(r'[^a-zA-Zs]’, ”, text) words = word_tokenize(text) if lemmatize: lemmatizer = WordNetLemmatizer() words = [lemmatizer.lemmatize(word) for word in words] words = [word for word in words if word not in stop_words] return ‘ ‘.join(words) def split_tags(text): text = re.sub(r'[[]'”]’, ”, text) tags = text.split(‘, ‘) return tags recipes[‘name’] = recipes[‘name’].apply(lambda x: clean_text(x) if isinstance(x, str) else x) recipes[‘description’] = recipes[‘description’].apply(lambda x: clean_text(x) if isinstance(x, str) else x) recipes[‘ingredients’] = recipes[‘ingredients’].apply(lambda x: clean_text(x, lemmatize=False) if isinstance(x, str) else x) recipes[‘tags’] = recipes[‘tags’].apply(lambda x: split_tags(x) if isinstance(x, str) else x) NLTK is used for text processing, including tokenization, removing stopwords, and optional lemmatization. Custom functions clean_text and split_tags are defined and applied to clean and standardize the data. In the figure below provides visual documentation of the coding process, the structure of the data, and the initial steps of data preprocessing which are crucial to understanding the implementation. Fig. 1. Documentation of the coding process 4.2 Feature Engineering and data preparation Label Processing: from sklearn.preprocessing import MultiLabelBinarizer interested_labels = [‘easy’, ‘vegetarian’, ‘low-calorie’, ‘fruit’] filtered_labels = recipes[‘tags’].apply(lambda tags: [tag for tag in tags if tag in interested_labels]) mlb = MultiLabelBinarizer() y = mlb.fit_transform(filtered_labels) print(“Dimensions of one-hot encoding:”, y.shape) print(“Label categories:”, mlb.classes_) MultiLabelBinarizer is used for converting multi-label data into a one-hot encoded format. Only specific labels of interest are kept for analysis. Text Vectorization: from tensorflow.keras.preprocessing.text import Tokenizer from tensorflow.keras.preprocessing.sequence import pad_sequences tokenizer = Tokenizer(num_words=5000) tokenizer.fit_on_texts(recipes[‘description’]) tokenizer.fit_on_texts(recipes[‘ingredients’]) tokenizer.fit_on_texts(recipes[‘name’]) X1_seq = tokenizer.texts_to_sequences(recipes[‘description’]) X2_seq = tokenizer.texts_to_sequences(recipes[‘ingredients’]) X3_seq = tokenizer.texts_to_sequences(recipes[‘name’]) X1_pad = pad_sequences(X1_seq, maxlen=256) X2_pad = pad_sequences(X2_seq, maxlen=256) X3_pad = pad_sequences(X3_seq, maxlen=256) Text data is vectorized using the Tokenizer and converted to sequences. The sequences are padded to ensure uniform length. As seen in Figure 2, involve careful selection and transformation of recipe labels to match user preferences and preparation of textual inputs for the model, ensuring that the data fed into the neural network is clean, relevant, and properly structured for optimal learning outcomes. Fig. 2. Preprocessing steps 4.3 Model Building Defining the Model: import tensorflow as tf from tensorflow.keras.models import Model from tensorflow.keras.layers import Input, Embedding, GlobalAveragePooling1D, Dense, concatenate input_desc = Input(shape=(256,), name=’desc_input’) input_ingr = Input(shape=(256,), name=’ingr_input’) input_name = Input(shape=(256,), name=’name_input’) embedding = Embedding(input_dim=5000, output_dim=64, input_length=256) desc_branch = embedding(input_desc) desc_branch = GlobalAveragePooling1D()(desc_branch) ingr_branch = embedding(input_ingr) ingr_branch = GlobalAveragePooling1D()(ingr_branch) name_branch = embedding(input_name) name_branch = GlobalAveragePooling1D()(name_branch) concatenated = concatenate([desc_branch, ingr_branch, name_branch]) output = Dense(10, activation=’relu’)(concatenated) output = Dense(len(mlb.classes_), activation=’sigmoid’)(output) model = Model(inputs=[input_desc, input_ingr, input_name], outputs=output) The model is constructed using TensorFlow and Keras frameworks, designed with three distinct inputs that represent description, ingredients, and name. Each of these inputs is first processed through its own embedding layer, which is then followed by a global average pooling layer. The subsequent output from these pooling layers is concatenated into a single vector, which is subsequently passed through several dense layers to integrate the information for further processing. The figure below presents the construction and recipe recommendations training phases of the neural network, detailing the specific layers and mechanisms used to process the diverse textual inputs, crucial for the model’s ability to generate accurate. Fig. 3. Training phases of the neural network Defining the Model: model.compile(optimizer=’adam’, loss=’binary_crossentropy’, metrics=[‘accuracy’]) history = model.fit([X1_pad, X2_pad, X3_pad], y, epochs=20, validation_data=([X1_pad, X2_pad, X3_pad], y), batch_size=10) The model is compiled and trained with appropriate optimizer, loss function, and metrics. 4.4 Model Evaluation and Visualization The core of this project is a multi-input neural network model built using TensorFlow and Keras. The model’s architecture is tailored to handle the complex nature of recipe data, including multiple input types (description, ingredients, and name). Model Evaluation loss, accuracy = model.evaluate([X1_pad, X2_pad, X3_pad], y) print(f”Test accuracy: {accuracy}”) The model’s performance is evaluated on the test data. Visualization import matplotlib.pyplot as plt plt.style.use(‘ggplot’) def plot_history(history): acc = history.history[‘accuracy’] val_acc = history.history[‘val_accuracy’] loss = history.history[‘loss’] val_loss = history.history[‘val_loss’] x = range(1, len(acc) + 1) plt.figure(figsize=(12, 5)) plt.subplot(1, 2, 1) plt.plot(x, acc, ‘b’, label=’Training accuracy’) plt.plot(x, val_acc, ‘r’, label=’Validation accuracy’) plt.title(‘Training and validation accuracy’) plt.legend() plt.subplot(1, 2, 2) plt.plot(x, loss, ‘b’, label=’Training loss’) plt.plot(x, val_loss, ‘r’, label=’Validation loss’) plt.title(‘Training and validation loss’) plt.legend() plt.show() plot_history(history) Training and validation accuracies and losses are plotted using matplotlib. Code: API Request: import requests api_key = “your_api_key” headers = { “Content-Type”: “application/json”, “Authorization”: f”Bearer {api_key}” } data = { “model”: “gpt-3.5-turbo”, “messages”: [{“role”: “user”, “content”: “Say this is a test!”}], “temperature”: 0.7 } response = requests.post(“https://api.openai.com/v1/chat/completions”, headers=headers, json=data) print(response.json()) This snippet demonstrates making an API request to OpenAI’s GPT-3.5 model. The code covers the complete procedure of data arrangement, model creation, training, evaluation, and visualization for a multi-input, multilabel categorization neural network in the setting of a recipe proposal scheme. 5 RESULTS The appraisal of the model’s execution revealed its capacity to classify recipes into the correct labels effectively. The procedure tracked the model’s exactness and reduction across different epochs for training and validation datasets. • Training Accuracy: There was a consistent increase in training accuracy over the epochs, starting from 81.67% and fluctuating slightly but generally trending upwards. This suggests that the model was learning and adapting from the training data, improving its ability to classify recipes correctly as training progressed. • Validation Accuracy: The validation accuracy provides insight into the model’s generalization capabilities. The model’s validation accuracy had more variation than the training accuracy, starting strong but then experiencing fluctuations. For example, it started at approximately 80.75%, dipped to around 74.78% in the second epoch, but then rose again, indicating some epochs where the model generalized better than others do. Loss Trends: The training and validation loss both showed a decreasing trend over the epochs. The training loss reduced from 0.3801 to 0.2998, while the validation loss, after some fluctuation, ended at 0.3298. Decreasing loss indicates learning, signifying that the model was improving at minimizing the error between the predicted and actual labels. Final Epoch Evaluation: By the final epoch (Epoch 20/20), the model achieved a training accuracy of approximately 78.66% and a validation accuracy of approximately 77.98%. This outcome suggests that the model maintained a reasonable level of accuracy throughout the training process. • • Fig. 4. Training logs The image above shows the training log showcasing the progression of model accuracy and loss across different epochs, underscoring the model’s learning curve. • Model Architecture Visualization: The architecture visualization image illustrates the multi-input model design with three distinct input pathways for ‘description,’ ‘ingredients,’ and ‘name,’ each processed through an embedding and pooling layer before concatenation. This design allowed the model to handle and learn from different types of data inputs, which is a key feature for handling complex recipe data. • Model Performance Charts: The accompanying charts display the accuracy and loss for training and validation sets. They visually represent the model’s learning across epochs, with blue lines indicating training metrics and red lines for validation metrics. • Testing Accuracy: The testing accuracy, evaluated on a holdout set not seen during training, was approximately 77.98%. This metric is crucial as it approximates the model’s performance in real-world scenarios. • Model Saving: The model was saved in the HDF5 file format despite a warning suggesting using the newer Keras format. This indicates a successful model training and serialization for later use. • GPT-3.5 Turbo API Interaction: The included JSON response from the OpenAI’s GPT-3.5 turbo model indicates a separate test of chat functionality. While not directly related to the recipe recommendation system, this interaction with GPT models could inform future enhancements, such as incorporating conversational AI elements into the recipe recommendation experience. 6 LIMITATIONS AND FUTURE WORKS In the context of the recipe recommendation system project, there are several areas for future work and some limitations to consider. Understanding these can guide improvements and expansions of the project. 6.1 • • Limitations Data Quality and Variety: The current dataset could have restrictions regarding assorted variety and profundity. Broadening the dataset to incorporate more fluctuated recipes could upgrade the model’s viability. Model Generalization: The current model may need to generalize better to unfamiliar or remarkable recipes. Further testing with differing and testing datasets is required. • • • 6.2 • • • • • 7 Computational Resources: Preparing more unpredictable models may require significantly more computational assets, conceivably confining the task’s scalability. Real-world Application: Translating the model’s execution from a controlled testing condition to a realworld application could present unforeseen difficulties, for example, dealing with continuous data or client collaborations. Cultural Sensitivity: The model may need to adequately consider social nuances in cooking and eating propensities, which could limit its viability in certain locales or networks. Future Works Improving Model Complexity: Investigate combining more confused models like BERT or GPT-3 for progressed common language comprehension. This could improve the framework’s capacity to comprehend and process complex formula portrayals and fixings. Expanding the Dataset: Incorporate a more assorted arrangement of recipes, including those from various societies or eating routine prerequisites, to expand the framework’s usefulness and achieve. User Personalization: Execute client inclination following to customize recipe suggestions. This could involve making client profiles and organizing suggestions dependent on past collaborations and inclinations. Feature Engineering: Experiment with extra highlights like cooking time, dietary data, or seasonality of fixings to upgrade suggestion exactness. Mobile Application Integration: Build a versatile application interface for the framework, making it increasingly open and easy to understand for a more extensive crowd. CONCLUSION In conclusion, while the recipe recommendation system shows promise, incorporating these considerations into future development will be crucial for enhancing its effectiveness, user experience, and applicability. Continuous refinement and testing, coupled with feedback from real-world applications, will be key to the project’s long-term success (Mazlan, 2023). Developing the recipe recommendation system represents a significant stride in leveraging machine learning and natural language processing for culinary applications. This project has successfully demonstrated the potential of using a multi-input neural network model to classify and recommend recipes based on user preferences and inputs. 7.1 • • • • 7.2 • • • 7.3 Key Achievements Effective Data Processing: The task aptly tended to a convoluted dataset, applying exhaustive preprocessing methods to clean and standardize recipe data, making it reasonable for machine learning applications. Innovative Model Design: The multi-input model engineering, fusing embedding and worldwide normal pooling layers, was successfully customized to the one-of-a-kind requests of formula data. This plan decision was primarily to catch the embodiment of the recipes from different content inputs. Robust Training and Evaluation: The model experienced an exhaustive preparation measure, exhibiting encouraging outcomes regarding accuracy and reduction metrics. The validation execution demonstrated a great level of generalization to new data. Technical Integration: Exploitation of progressed devices and libraries like TensorFlow, Keras, Pandas, and NLTK flaunted the task’s technical profundity and its arrangement with current industry norms in machine learning and information science. Project Implications and Impact Culinary Domain: This task has potential results for the cooking world, offering a technologically progressed formula disclosure and personalization arrangement. User Experience: The framework can significantly upgrade client experience in advanced culinary stages, giving customized, pertinent, and different recipe proposals. The task adds to the more extensive field of AI in gastronomy, demonstrating how machine learning can cross with everyday spaces like cooking. Future Scope and Sustainability The task’s scalability and adaptability to incorporate data that are more varied, complex models and client personalization angles propose its supportability and capacity for future expansion. With further advancement, the framework could be coordinated into different stages, extending from close-to-home cooking applications to expert culinary databases, intensifying its utility and effect. 7.4 Final Thoughts The recipe recommendation framework stands as an impression of the intensity of machine learning in changing how we collaborate with nourishment and cooking in the advanced age. As innovation advances, activities like this will assume a pivotal job in interfacing AI and functional, ordinary applications, upgrading client encounters in new and energizing manners. The excursion of this task from origination to execution denotes a specialized accomplishment but also opens entryways for future improvements in the culinary space. References A. Zoran, E. A. Gonzalez, and A. B. Mizrahi, “Cooking with computers: the vision of digital gastronomy,” in Gastronomy and Food Science, Academic Press, 2021, pp. 35-53. M. Svensson, K. Höök, J. Laaksolahti, and A. Waern, “Social navigation of food recipes,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2001, pp. 341–348. A. Walpitage, “A food recipe recommendation system based on nutritional factors in the Finnish food community,” Master’s thesis, A. Walpitage, 2023. Q. Li, R. Gravina, Y. Li, S. H. Alsamhi, F. Sun, and G. Fortino, “Multi-user activity recognition: Challenges and opportunities,” Information Fusion, vol. 63, pp. 121-135, 2020. Y. Tian, C. Zhang, Z. Guo, C. Huang, R. Metoyer, and N. V. Chawla, “RecipeRec: a heterogeneous graph learning model for recipe recommendation,” arXiv preprint arXiv:2205.14005, 2022. N. Sakib, G. M. Shahariar, M. M. Kabir, M. K. Hasan, and H. Mahmud, “Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset Based on Active Learning,” in Proc. Int. Conf. Machine Intelligence and Emerging Technologies, Cham, Springer Nature Switzerland, Sep. 2022, pp. 188-203. M. Glowacka-Musial, Data Visualization with R for Digital Collections. ALA TechSource, 2021. Mazlan, I., Abdullah, N., & Ahmad, N. (2023). Exploring the Impact of Hybrid Recommender Systems on Personalized Mental Health Recommendations. International Journal of Advanced Computer Science and Applications, 14(6) Author, F.: Article title. Journal 2(5), 99–110 (2016). Author, F., Author, S.: Title of a proceedings paper. In: Editor, F., Editor, S. (eds.) CONFERENCE 2016, LNCS, vol. 9999, pp. 1–13. Springer, Heidelberg (2016). Author, F., Author, S., Author, T.: Book title. 2nd edn. Publisher, Location (1999). Author, F.: Contribution title. In: 9th International Proceedings on Proceedings, pp. 1–2. Publisher, Location (2010). LNCS Homepage, http://www.springer.com/lncs, last accessed 2016/11/21.
Collepals.com Plagiarism Free Papers
Are you looking for custom essay writing service or even dissertation writing services? Just request for our write my paper service, and we'll match you with the best essay writer in your subject! With an exceptional team of professional academic experts in a wide range of subjects, we can guarantee you an unrivaled quality of custom-written papers.
Get ZERO PLAGIARISM, HUMAN WRITTEN ESSAYS
Why Hire Collepals.com writers to do your paper?
Quality- We are experienced and have access to ample research materials.
We write plagiarism Free Content
Confidential- We never share or sell your personal information to third parties.
Support-Chat with us today! We are always waiting to answer all your questions.