
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

basic machine learning research papers

Large language models (LLMs) can potentially democratize access to medical knowledge.


TaskWeaver: A Code-First Agent Framework

microsoft/taskweaver • 29 Nov 2023

TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance

Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity.


GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation


We justify that the refined 3D geometric priors aid in the 3D-aware capability of 2D diffusion priors, which in turn provides superior guidance for the refinement of 3D geometric priors.

On Bringing Robots Home

We use the Stick to collect 13 hours of data in 22 homes of New York City, and train Home Pretrained Representations (HPR).

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?

LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS

Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations.

Qwen Technical Report

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.


Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.

YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

In this work, we develop and release Yuan 2.0, a series of large language models with parameters ranging from 2.1 billion to 102.6 billion.


  • Review Article
  • Published: 22 March 2021

Machine Learning: Algorithms, Real-World Applications and Research Directions

  • Iqbal H. Sarker   ORCID: 1 , 2  

SN Computer Science, volume 2, Article number: 160 (2021)


In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), and particularly machine learning (ML), is the key. Various types of machine learning algorithms exist, such as supervised, unsupervised, semi-supervised, and reinforcement learning. Besides, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view on these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.



We live in the age of data, where everything around us is connected to a data source, and everything in our lives is digitally recorded [ 21 , 103 ]. For instance, the current electronic world has a wealth of various kinds of data, such as Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, as discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”, and their volume is increasing day by day. Insights extracted from these data can be used to build various intelligent applications in the relevant domains. For instance, to build a data-driven automated and intelligent cybersecurity system, the relevant cybersecurity data can be used [ 105 ]; to build personalized context-aware smart mobile applications, the relevant mobile data can be used [ 103 ], and so on. Thus, data management tools and techniques that can extract insights or useful knowledge from the data in a timely and intelligent way are urgently needed, since real-world applications are built on them.

figure 1

The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement) in a range of 0 (min) to 100 (max) over time where x-axis represents the timestamp information and y-axis represents the corresponding score

Artificial intelligence (AI), and particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, and typically allows applications to function in an intelligent manner [ 95 ]. ML usually provides systems with the ability to learn and improve from experience automatically without being specifically programmed, and it is generally regarded as one of the most popular recent technologies in the fourth industrial revolution (4IR or Industry 4.0) [ 103 , 105 ]. “Industry 4.0” [ 114 ] typically refers to the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are the key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [ 75 ], discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”. The popularity of these approaches to learning is increasing day by day, as shown in Fig. 1 , which is based on data collected from Google Trends [ 4 ] over the last five years. The x-axis of the figure indicates the specific dates, and the y-axis shows the corresponding popularity score within the range of 0 (minimum) to 100 (maximum). According to Fig. 1 , the popularity values for these learning types are low in 2015 and increase steadily afterwards. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.

In general, the effectiveness and the efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms. In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [ 41 , 125 ]. Besides, deep learning, which originated from the artificial neural network and is known as part of a wider family of machine learning approaches, can be used to intelligently analyze data [ 96 ]. Thus, selecting a proper learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that different learning algorithms serve different purposes, and even the outcomes of different learning algorithms in a similar category may vary depending on the data characteristics [ 106 ]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more that are explained briefly in Sect. “ Applications of Machine Learning ”.

Based on the importance and potential of “Machine Learning” to analyze the data mentioned above, in this paper, we provide a comprehensive view on various types of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, the key contribution of this study is explaining the principles and potential of different machine learning techniques, and their applicability in various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for those in academia and industry who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.

The key contributions of this paper are listed as follows:

To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.

To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.

To discuss the applicability of machine learning-based solutions in various real-world application domains.

To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.

The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. In the subsequent section, we briefly discuss and explain different machine learning algorithms, after which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.

Types of Real-World Data and Machine Learning Techniques

Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.

Types of Real-World Data

Usually, the availability of data is considered as the key to construct a machine learning model or data-driven real-world systems [ 103 , 105 ]. Data can be of various forms, such as structured, semi-structured, or unstructured [ 41 , 72 ]. Besides, the “metadata” is another type that typically represents data about the data. In the following, we briefly discuss these types of data.

Structured: Structured data have a well-defined structure and conform to a data model following a standard order; they are highly organized and can be easily accessed and used by an entity or a computer program. Structured data are typically stored in well-defined schemes such as relational databases, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, geolocation, etc. are examples of structured data.

Unstructured: On the other hand, there is no pre-defined format or organization for unstructured data, making it much more difficult to capture, process, and analyze, mostly containing text and multimedia material. For example, sensor data, emails, blog entries, wikis, and word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered as unstructured data.

Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but they do have certain organizational properties that make them easier to analyze. HTML, XML, JSON documents, NoSQL databases, etc., are some examples of semi-structured data.

Metadata: It is not the normal form of data, but “data about data”. The primary difference between “data” and “metadata” is that data are simply the material that can classify, measure, or even document something relative to an organization’s data properties. On the other hand, metadata describes the relevant data information, giving it more significance for data users. A basic example of a document’s metadata might be the author, the file size, the date the document was generated, keywords that describe the document, etc.

In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 119 ], UNSW-NB15 [ 76 ], ISCX’12 [ 1 ], CIC-DDoS2019 [ 2 ], Bot-IoT [ 59 ], etc., smartphone datasets such as phone call logs [ 84 , 101 ], SMS Log [ 29 ], mobile application usages logs [ 137 ] [ 117 ], mobile phone notification logs [ 73 ] etc., IoT data [ 16 , 57 , 62 ], agriculture and e-commerce data [ 120 , 138 ], health data such as heart disease [ 92 ], diabetes mellitus [ 83 , 134 ], COVID-19 [ 43 , 74 ], etc., and many more in various application domains. The data can be in different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain, and to extract the insights or useful knowledge from the data for building the real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, which is discussed in the following.

Types of Machine Learning Techniques

Machine Learning algorithms are mainly divided into four categories: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning [ 75 ], as shown in Fig. 2 . In the following, we briefly discuss each type of learning technique with the scope of their applicability to solve real-world problems.

figure 2

Various types of machine learning techniques

Supervised: Supervised learning is typically the task of machine learning to learn a function that maps an input to an output based on sample input-output pairs [ 41 ]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are identified to be accomplished from a certain set of inputs [ 105 ], i.e., a task-driven approach . The most common supervised tasks are “classification” that separates the data, and “regression” that fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning.

Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. This is widely used for extracting generative features, identifying meaningful trends and structures, groupings in results, and exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.

Semi-supervised: Semi-supervised learning can be defined as a hybridization of the above-mentioned supervised and unsupervised methods, as it operates on both labeled and unlabeled data [ 41 , 105 ]. Thus, it falls between learning “without supervision” and learning “with supervision”. In the real world, labeled data could be rare in several contexts, and unlabeled data are numerous, where semi-supervised learning is useful [ 75 ]. The ultimate goal of a semi-supervised learning model is to provide a better outcome for prediction than that produced using the labeled data alone from the model. Some application areas where semi-supervised learning is used include machine translation, fraud detection, labeling data and text classification.

Reinforcement: Reinforcement learning is a type of machine learning algorithm that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve their efficiency [ 52 ], i.e., an environment-driven approach . This type of learning is based on reward or penalty, and its ultimate goal is to use insights obtained from interacting with the environment to take actions that increase the reward or minimize the risk [ 75 ]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving tasks, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.

Thus, to build effective models in various application areas different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier, and the target outcome. In Table 1 , we summarize various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.

Machine Learning Tasks and Algorithms

In this section, we discuss various machine learning algorithms that include classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, as well as deep learning methods. A general structure of a machine learning-based predictive model has been shown in Fig. 3 , where the model is trained from historical data in phase 1 and the outcome is generated in phase 2 for the new test data.

figure 3

A general structure of a machine learning based predictive model considering both the training and testing phase

Classification Analysis

Classification is regarded as a supervised learning method in machine learning, referring to a problem of predictive modeling as well, where a class label is predicted for a given example [ 41 ]. Mathematically, it learns a mapping function ( f ) from input variables ( X ) to output variables ( Y ), which are the targets, labels, or categories. It can be carried out on structured or unstructured data to predict the class of given data points. For example, spam detection such as “spam” and “not spam” in email service providers can be a classification problem. In the following, we summarize the common classification problems.

Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.

Multiclass classification: Traditionally, this refers to those classification tasks having more than two class labels [ 41 ]. The multiclass classification does not have the principle of normal and abnormal outcomes, unlike binary classification tasks. Instead, examples are classified as belonging to one class within a range of specified classes. For example, it can be a multiclass classification task to classify various types of network attacks in the NSL-KDD [ 119 ] dataset, where the attack categories are classified into four class labels, such as DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Remote to Local Attack), and Probing Attack.

Multi-label classification: In machine learning, multi-label classification is an important consideration where an example is associated with several classes or labels. Thus, it is a generalization of multiclass classification, where the classes involved in the problem are hierarchically structured, and each example may simultaneously belong to more than one class in each hierarchical level, e.g., multi-level text classification. For instance, Google news can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification includes advanced machine learning algorithms that support predicting various mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ].

Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.

Naive Bayes (NB): The naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between each pair of features [ 51 ]. It works well and can be used for both binary and multi-class categories in many real-world situations, such as document or text classification, spam filtering, etc. The NB classifier can be used to effectively classify noisy instances in the data and to construct a robust prediction model [ 94 ]. The key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters, and it does so quickly [ 82 ]. However, its performance may suffer due to its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [ 82 ].
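As an illustration of the independence assumption, the following is a minimal sketch of the Gaussian variant written with the Python standard library only. The function names (`fit_gaussian_nb`, `predict_gaussian_nb`) and the toy data are hypothetical and chosen for illustration; a production system would more likely rely on an existing library implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log-posterior.

    Features are treated as independent given the class, so their
    log-likelihoods are simply added together.
    """
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical toy data: two well-separated classes with two features each.
X_train = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y_train = ["a", "a", "a", "b", "b", "b"]
nb_model = fit_gaussian_nb(X_train, y_train)
print(predict_gaussian_nb(nb_model, [1.1, 2.0]))
```

Only a prior, a mean, and a variance per feature are estimated for each class, which is why comparatively little training data is needed.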

Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class conditional densities to data and applying Bayes’ rule [ 51 , 82 ]. This method is also known as a generalization of Fisher’s linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model’s computational costs. The standard LDA model usually fits a Gaussian density to each class, assuming that all classes share the same covariance matrix [ 82 ]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.

Logistic regression (LR): Another common probabilistic statistical model used to solve classification problems in machine learning is Logistic Regression (LR) [ 64 ]. Logistic regression typically uses a logistic function to estimate the probabilities, which is also referred to as the mathematically defined sigmoid function in Eq. 1 : \(g(z) = \frac{1}{1 + \exp(-z)}\). It works well when the dataset can be separated linearly, but it can overfit high-dimensional datasets. The regularization (L1 and L2) techniques [ 82 ] can be used to avoid over-fitting in such scenarios. The assumption of linearity between the dependent and independent variables is considered a major drawback of Logistic Regression. It can be used for both classification and regression problems, but it is more commonly used for classification.
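To make the role of the sigmoid function and of L2 regularization concrete, here is a small sketch that fits a one-feature logistic regression by plain gradient descent. The function names, the hyperparameters (`lr`, `l2`, `epochs`), and the toy data are assumptions made for illustration, not values taken from the paper.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, l2=0.01, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent.

    The l2 term penalizes a large weight, which is one way to limit over-fitting.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n + l2 * w
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data centered on zero.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(sigmoid(w * -1.0 + b), sigmoid(w * 1.0 + b))
```

The predicted probability can be thresholded at 0.5 to obtain a class label.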

K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based learning” or non-generalizing learning, also known as a “lazy learning” algorithm. It does not focus on constructing a general internal model; instead, it stores all instances corresponding to training data in n -dimensional space. KNN uses data and classifies new data points based on similarity measures (e.g., Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. It is quite robust to noisy training data, and accuracy depends on the data quality. The biggest issue with KNN is to choose the optimal number of neighbors to be considered. KNN can be used both for classification as well as regression.
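The majority-vote rule described above can be sketched in a few lines. This example assumes Euclidean distance and Python 3.8 or later (for `math.dist`); the function name `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # No model is built: every stored instance is compared at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters in the plane.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["x", "x", "x", "o", "o", "o"]
print(knn_predict(points, labels, (1.5, 1.5)))
```

Because all work happens at query time, the choice of `k` and of the distance function are the main design decisions.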

Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is a support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane, which has the greatest distance from the nearest training data points in any class, achieves a strong separation since, in general, the greater the margin, the lower the classifier’s generalization error. It is effective in high-dimensional spaces and can behave differently based on different mathematical functions known as the kernel. Linear, polynomial, radial basis function (RBF), sigmoid, etc., are the popular kernel functions used in SVM classifier [ 82 ]. However, when the data set contains more noise, such as overlapping target classes, SVM does not perform well.

Decision tree (DT): Decision tree (DT) [ 88 ] is a well-known non-parametric supervised learning method. DT learning methods are used for both the classification and regression tasks [ 82 ]. ID3 [ 87 ], C4.5 [ 88 ], and CART [ 20 ] are well-known DT algorithms. Moreover, recently proposed BehavDT [ 100 ], and IntrudTree [ 97 ] by Sarker et al. are effective in the relevant application domains, such as user behavior analytics and cybersecurity analytics, respectively. By sorting down the tree from the root to some leaf nodes, as shown in Fig. 4 , DT classifies the instances. Instances are classified by checking the attribute defined by that node, starting at the root node of the tree, and then moving down the tree branch corresponding to the attribute value. For splitting, the most popular criteria are “gini” for the Gini impurity and “entropy” for the information gain, which can be expressed mathematically as [ 82 ]:

\(H(x) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)\) (2)

\(Gini(E) = 1 - \sum_{i=1}^{c} p_i^2\) (3)

where \(p(x_i)\) in Eq. 2 and \(p_i\) in Eq. 3 both denote the proportion of examples at the node that belong to class i.
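The two splitting criteria named above can be computed directly from the class proportions at a node. The sketch below uses the standard definitions of entropy, Gini impurity, and information gain; the function names are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

node = ["a", "a", "b", "b"]
print(entropy(node), gini(node))
print(information_gain(node, ["a", "a"], ["b", "b"]))
```

A split that separates the classes perfectly leaves both children pure, so its information gain equals the entropy of the parent node.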

figure 4

An example of a decision tree structure

figure 5

An example of a random forest structure considering multiple decision trees

Random forest (RF): A random forest classifier [ 19 ] is well known as an ensemble classification technique that is used in the field of machine learning and data science in various application areas. This method uses “parallel ensembling” which fits several decision tree classifiers in parallel, as shown in Fig. 5 , on different data set sub-samples and uses majority voting or averages for the outcome or final result. It thus minimizes the over-fitting problem and increases the prediction accuracy and control [ 82 ]. Therefore, the RF learning model with multiple decision trees is typically more accurate than a single decision tree based model [ 106 ]. To build a series of decision trees with controlled variation, it combines bootstrap aggregation (bagging) [ 18 ] and random feature selection [ 11 ]. It is adaptable to both classification and regression problems and fits well for both categorical and continuous values.
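The combination of bootstrap sampling and majority voting can be sketched as follows. This is a deliberately simplified illustration, not a full random forest: each "tree" is a one-feature decision stump, labels are assumed to be 0 or 1, and random feature selection is omitted because there is only one feature. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Pick the 1-D threshold rule that misclassifies the fewest points."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):  # label predicted on each side of t
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def forest_predict(forest, x):
    """Final prediction is the majority vote over all stumps."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class 0 on the left, class 1 on the right.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(xs, ys)
print(forest_predict(forest, 1), forest_predict(forest, 9))
```

Because every stump sees a different resample of the data, individual stumps may place their thresholds differently, and the vote averages out those differences.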

Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve weak classifiers by learning from their errors. It was developed by Yoav Freund et al. [ 35 ] and is also known as “meta-learning”. Unlike the random forest that uses parallel ensembling, AdaBoost uses “sequential ensembling”. It combines many poorly performing classifiers to obtain a strong classifier of high accuracy. In that sense, AdaBoost is called an adaptive classifier because it significantly improves the efficiency of the classifier, but in some instances it can cause overfitting. AdaBoost is best used to boost the performance of decision trees, the base estimator [ 82 ], on binary classification problems; however, it is sensitive to noisy data and outliers.

Extreme gradient boosting (XGBoost): Gradient Boosting, like Random Forests [ 19 ] above, is an ensemble learning algorithm that generates a final model based on a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [ 41 ] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [ 82 ]. It computes second-order gradients of the loss function to minimize the loss and applies advanced regularization (L1 and L2) [ 82 ], which reduces over-fitting and improves model generalization and performance. XGBoost is fast to interpret and can handle large-sized datasets well.

Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [ 41 ] is an iterative method for optimizing an objective function with appropriate smoothness properties, where the word ‘stochastic’ refers to random probability: each update uses a gradient estimated from a randomly selected training example (or small subset) rather than from the entire dataset. This reduces the computational burden, particularly in high-dimensional optimization problems, allowing for faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function, measuring the degree of change in one variable in response to changes in another. Mathematically, the gradient is the vector of partial derivatives of a function with respect to its input parameters. Let \(\alpha\) be the learning rate and \(J_i\) the cost of the \(i^\mathrm{th}\) training example; then Eq. ( 4 ), \(w_j := w_j - \alpha \, \frac{\partial J_i}{\partial w_j}\), represents the stochastic gradient descent weight update at the \(j^\mathrm{th}\) iteration. In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [ 82 ]. However, SGD is sensitive to feature scaling and needs a range of hyperparameters, such as the regularization parameter and the number of iterations.
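The per-example weight update can be illustrated with a small sketch that fits a straight line by SGD. The squared-error cost \(J_i = \frac{1}{2}(w x_i + b - y_i)^2\), the learning rate, the number of epochs, and the noise-free toy data are all assumptions chosen for illustration.

```python
import random

def sgd_linear(xs, ys, alpha=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by updating on one randomly ordered example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # the "stochastic" part: random example order
        for i in order:
            # For J_i = 0.5 * error**2, dJ_i/dw = error * x_i and dJ_i/db = error.
            error = (w * xs[i] + b) - ys[i]
            w -= alpha * error * xs[i]
            b -= alpha * error
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
w, b = sgd_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update touches a single example, so the cost per step does not grow with the size of the dataset; the trade-off is a noisier path toward the minimum.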

Rule-based classification : The term rule-based classification can be used to refer to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms such as Zero-R [ 125 ], One-R [ 47 ], decision trees [ 87 , 88 ], DTNB [ 110 ], Ripple Down Rule learner (RIDOR) [ 125 ], Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [ 126 ] exist with the ability of rule generation. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easier to interpret; the ability to handle high-dimensional data; simplicity and speed; good accuracy; and the capability to produce classification rules that are clear and understandable to humans [ 127 ] [ 128 ]. The decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [ 106 ]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system including the entities and their relationships.

figure 6

Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables

Regression Analysis

Regression analysis includes several machine learning methods that predict a continuous result variable ( y ) based on the value of one or more predictor variables ( x ) [ 41 ]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression models. Some overlaps are often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting or prediction, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso and ridge regression, etc., which are explained briefly in the following.

Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable ( Y ) and one or more independent variables ( X ) (also known as regression line) using the best fit straight line [ 41 ]. It is defined by the following equations:

\[ y = a + bx + e \qquad (5) \]

\[ y = a + b_1x_1 + b_2x_2 + \cdots + b_nx_n + e \qquad (6) \]

where a is the intercept, b is the slope of the line (with one slope \(b_i\) per predictor \(x_i\) in Eq. 6 ), and e is the error term. This equation can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression is an extension of simple linear regression that allows two or more predictor variables to model a response variable, y, as a linear function [ 41 ] defined in Eq. 6 , whereas simple linear regression has only 1 independent variable, defined in Eq. 5 .
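As a minimal sketch of how the intercept a and slope b of the simple linear model can be estimated, the following uses the ordinary least-squares closed form. The function name and the toy data are illustrative assumptions, not part of the original study.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) minimizing the sum of squared errors of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the means
    return a, b


# Toy data generated from y = 1 + 2x without noise (illustrative values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_simple_linear_regression(xs, ys)
prediction = a + b * 5.0  # predicted y for an unseen x = 5
```

On this noise-free data the fit recovers the generating line, so the error term e is zero for every point; with real data the residuals would be non-zero.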

Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is not linear, but is modeled as an \(n^\mathrm{th}\)-degree polynomial in x [ 82 ]. The equation for polynomial regression is derived from the linear regression equation (polynomial regression of degree 1), and is defined as below:

\[ y = b_0 + b_1x + b_2x^2 + b_3x^3 + \cdots + b_nx^n + e \]

Here, y is the predicted/target output, \(b_0, b_1,... b_n\) are the regression coefficients, x is the independent/input variable, and e is the error term. In simple words, if the data are not distributed linearly but instead follow an \(n^\mathrm{th}\)-degree polynomial, then we use polynomial regression to get the desired output.

LASSO and ridge regression: LASSO and Ridge regression are well known as powerful techniques which are typically used for building learning models in the presence of a large number of features, due to their capability of preventing over-fitting and reducing the complexity of the model. The LASSO (least absolute shrinkage and selection operator) regression model uses the L 1 regularization technique [ 82 ], which applies shrinkage by penalizing the “absolute value of magnitude of coefficients” ( L 1 penalty). As a result, LASSO can shrink some coefficients to exactly zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. On the other hand, ridge regression uses L 2 regularization [ 82 ], which penalizes the “squared magnitude of coefficients” ( L 2 penalty). Thus, ridge regression forces the weights to be small but never sets a coefficient value to zero, and yields a non-sparse solution. Overall, LASSO regression is useful to obtain a subset of predictors by eliminating less important features, and ridge regression is useful when a data set has “multicollinearity”, which refers to predictors that are correlated with other predictors.
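The contrast between the two penalties can be seen in the simplest possible setting: one predictor and no intercept. The sketch below assumes the objectives stated in the comments (a particular scaling of the penalty chosen for illustration); the function names and data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coefficient(xs, ys, lam):
    # Minimizes 0.5 * sum((y - b*x)^2) + lam * |b|   (L1 penalty).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx


def ridge_coefficient(xs, ys, lam):
    # Minimizes 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2   (L2 penalty).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# A weakly informative predictor: sum(x*y) is about 1.4, sum(x*x) is 14.
xs = [1.0, 2.0, 3.0]
ys = [0.1, 0.2, 0.3]
lam = 2.0
b_lasso = lasso_coefficient(xs, ys, lam)  # set to exactly zero
b_ridge = ridge_coefficient(xs, ys, lam)  # shrunk, but still non-zero
```

With the same penalty strength, the L 1 penalty removes the weak predictor entirely, while the L 2 penalty only shrinks its coefficient.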

Cluster Analysis

Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for the specific outcome. It groups a collection of objects in such a way that objects in the same group, called a cluster, are in some sense more similar to each other than to objects in other groups [ 41 ]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.

Partitioning methods: Based on the features and similarities in the data, this clustering approach categorizes the data into multiple groups or clusters. The data scientists or analysts typically determine the number of clusters to produce, either dynamically or statically, depending on the nature of the target applications. The most common clustering algorithms based on partitioning methods are K-means [ 69 ], K-medoids [ 80 ], CLARA [ 55 ], etc.

Density-based methods: To identify distinct groups or clusters, these methods use the concept that a cluster in the data space is a contiguous region of high point density, isolated from other such clusters by contiguous regions of low point density. Points that are not part of a cluster are considered noise. The typical clustering algorithms based on density are DBSCAN [ 32 ], OPTICS [ 12 ], etc. The density-based methods typically struggle with clusters of varying density and with high-dimensional data.

Hierarchical-based methods: Hierarchical clustering typically seeks to construct a hierarchy of clusters, i.e., a tree structure. Strategies for hierarchical clustering generally fall into two types: (i) Agglomerative, a “bottom-up” approach in which each observation begins in its own cluster and pairs of clusters are merged as one moves up the hierarchy, and (ii) Divisive, a “top-down” approach in which all observations begin in one cluster and splits are performed recursively as one moves down the hierarchy, as shown in Fig. 7 . Our earlier proposed BOTS technique, Sarker et al. [ 102 ], is an example of a hierarchical, particularly bottom-up, clustering algorithm.

Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.

Model-based methods: There are mainly two types of model-based clustering algorithms: one that uses statistical learning, and the other based on a method of neural network learning [ 130 ]. For instance, GMM [ 89 ] is an example of a statistical learning method, and SOM [ 22 ] [ 96 ] is an example of a neural network learning method.

Constraint-based methods: Constraint-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.

Fig. 7 A graphical interpretation of the widely-used hierarchical clustering (Bottom-up and top-down) technique

Many clustering algorithms with the ability to group data have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

K-means clustering: K-means clustering [ 69 ] is a fast, robust, and simple algorithm that provides reliable results when the clusters in a data set are well separated from each other. In this algorithm, the data points are allocated to clusters in such a way that the sum of the squared distances between the data points and their cluster centroids is as small as possible. In other words, the K-means algorithm identifies k centroids and then assigns each data point to the nearest centroid, keeping the clusters as compact as possible. Since it begins with a random selection of cluster centers, the results can be inconsistent between runs. Since extreme values can easily affect a mean, the K-means clustering algorithm is sensitive to outliers. K-medoids clustering [ 91 ] is a variant of K-means that is more robust to noise and outliers.
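The alternating assignment and update steps can be sketched in a few lines. This is a minimal illustration for 2D points, assuming the initial centroids are passed in explicitly (rather than chosen at random) so that the result is reproducible; the names and data are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c  # an empty cluster keeps its previous centroid
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Two well-separated groups of three points each (illustrative values).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Because the two groups are well separated, the assignment stabilizes after the first iteration; with overlapping groups or a poor initialization, different starting centroids can lead to different final clusters.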

Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover “blobs” in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points in a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean Shift has the disadvantage of being computationally expensive. Moreover, in cases of high dimension, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.

DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [ 32 ] is a base algorithm for density-based clustering that is widely used in data mining and machine learning. It is a non-parametric density-based clustering technique that separates high-density regions, which form the clusters, from low-density regions, for use in model building. DBSCAN’s main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in a vast volume of data that is noisy and contains outliers. DBSCAN, unlike k-means, does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although k-means is much faster than DBSCAN, DBSCAN is efficient at finding high-density regions and outliers, i.e., it is robust to outliers.
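The idea that a point joins a cluster when it is close to many points of that cluster can be sketched as follows. This is a simplified illustration for 2D points with a brute-force neighbor search; the parameter names `eps` and `min_pts` and the toy data are assumptions made for the example.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2 <= eps ** 2]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),           # dense group A
          (5, 5),                                   # isolated point
          (10, 10), (10, 11), (11, 10), (11, 11)]   # dense group B
labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is never specified; it emerges from the density parameters, and the isolated point is reported as noise rather than forced into a cluster.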

GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering, which is a distribution-based clustering algorithm. A Gaussian mixture model is a probabilistic model in which all the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the likelihood that a data point belongs to one of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.

Agglomerative hierarchical clustering: The most common method of hierarchical clustering used to group objects in clusters based on their similarity is agglomerative clustering. This technique uses a bottom-up approach, where each object is first treated as a singleton cluster by the algorithm. Following that, pairs of clusters are merged one by one until all clusters have been merged into a single large cluster containing all objects. The result is a dendrogram, which is a tree-based representation of the elements. Single linkage [ 115 ], Complete linkage [ 116 ], BOTS [ 102 ] etc. are some examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structure hierarchy generated by agglomerative clustering is more informative than the unstructured collection of flat clusters returned by k-means, which can help to make better decisions in the relevant application areas.

Dimensionality Reduction and Feature Learning

In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Thus, dimensionality reduction, which is an unsupervised learning technique, is important because it leads to better human interpretation and lower computational costs, and it avoids overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between them is that “feature selection” keeps a subset of the original features [ 97 ], while “feature extraction” creates brand new ones [ 98 ]. In the following, we briefly discuss these techniques.

Feature selection: The selection of features, also known as the selection of variables or attributes in the data, is the process of choosing a subset of relevant features (variables, predictors) to use in building a machine learning and data science model. It decreases a model’s complexity by eliminating the irrelevant or less important features and allows for faster training of machine learning algorithms. An optimal subset of the selected features in a problem domain can reduce overfitting by simplifying and generalizing the model, and can increase the model’s accuracy [ 97 ]. Thus, “feature selection” [ 66 , 99 ] is considered one of the primary concepts in machine learning that greatly affects the effectiveness and efficiency of the target machine learning model. The chi-squared test, the analysis of variance (ANOVA) test, Pearson’s correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.

Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and a way to reduce computational cost or training time. The aim of “feature extraction” [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the original features. The majority of the information found in the original set of features can then be summarized using this new reduced set of features. For instance, principal components analysis (PCA) is often used as a dimensionality-reduction technique to extract a lower-dimensional space, creating brand new components from the existing features in a dataset [ 98 ].

Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

Variance threshold: A simple basic approach to feature selection is the variance threshold [ 82 ]. This excludes all features of low variance, i.e., all features whose variance does not exceed the threshold. It eliminates all zero-variance characteristics by default, i.e., characteristics that have the same value in all samples. This feature selection algorithm looks only at the ( X ) features, not the ( y ) outputs needed, and can, therefore, be used for unsupervised learning.
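A minimal sketch of this filter is shown below. It assumes the data are given as a list of rows and uses the population variance; the function name and the toy matrix are illustrative.

```python
def variance_threshold(rows, threshold=0.0):
    """Return the indices of the columns whose variance exceeds the threshold."""
    n = len(rows)
    kept = []
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n  # population variance
        if variance > threshold:
            kept.append(col)
    return kept


# Column 0 is constant, so it carries no information and is dropped by default.
X = [[0, 2.0, 1], [0, 1.0, 1], [0, 3.0, 0], [0, 2.0, 1]]
kept_default = variance_threshold(X)        # zero-variance features removed
kept_strict = variance_threshold(X, 0.2)    # low-variance features removed too
```

Only the feature matrix is inspected; no target values are needed, which is why the method also applies in the unsupervised setting.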

Pearson correlation: Pearson’s correlation is another method to understand a feature’s relation to the response variable and can be used for feature selection [ 99 ]. This method is also used for finding the association between the features in a dataset. The resulting value lies in \([-1, 1]\) , where \(-1\) means perfect negative correlation, \(+1\) means perfect positive correlation, and 0 means that the two variables do not have a linear correlation. If X and Y represent two random variables, then the correlation coefficient between X and Y is defined as [ 41 ]

\[ r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]

where \(\bar{X}\) and \(\bar{Y}\) are the sample means of X and Y .
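As a small illustration, Pearson's coefficient can be computed as the co-variation of the two variables divided by the product of their individual variations. The function name and the sample values are assumptions for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # y rises with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r([-1, 0, 1], [1, 0, 1])            # symmetric, no linear trend
```

The last case shows the limitation mentioned above: the two variables are clearly related (y equals the absolute value of x), yet the coefficient is zero because the relationship is not linear.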

ANOVA: Analysis of variance (ANOVA) is a statistical tool used to test whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, and that the variables are normally distributed. To statistically test the equality of means, the ANOVA method utilizes F tests. For feature selection, the resulting ‘ANOVA F value’ [ 82 ] of this test can be used, where features that are independent of the target variable can be omitted.

Chi square: The chi-square \({\chi }^2\) [ 82 ] statistic measures the difference between the observed and expected frequencies of a series of events or variables. The value of \({\chi }^2\) depends on the magnitude of the difference between the observed and expected values, the degrees of freedom, and the sample size. The chi-square \({\chi }^2\) is commonly used for testing relationships between categorical variables. If \(O_i\) represents the observed value and \(E_i\) represents the expected value, then

\[ {\chi }^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]
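The statistic is straightforward to compute from observed and expected frequencies, summing the squared differences scaled by the expected values. The counts below are made up purely for illustration.

```python
def chi_square(observed, expected):
    """Sum of squared differences between observed and expected counts,
    each scaled by the expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 100 observations over four categories, expected to be uniform (25 each).
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
statistic = chi_square(observed, expected)
```

A larger value indicates a larger departure of the observed frequencies from the expected ones; whether it is statistically significant depends on the degrees of freedom.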

Recursive feature elimination (RFE): Recursive Feature Elimination (RFE) is a brute force approach to feature selection. RFE [ 82 ] repeatedly fits the model and removes the weakest feature (or features) until the specified number of features is reached. Features are ranked by the coefficients or feature importances of the model. By recursively removing a small number of features per iteration, RFE aims to remove dependencies and collinearity in the model.

Model-based selection: To reduce the dimensionality of the data, linear models penalized with L 1 regularization can be used. Least absolute shrinkage and selection operator (Lasso) regression is a type of linear regression that has the property of shrinking some of the coefficients to zero [ 82 ]. The corresponding features can therefore be removed from the model. Thus, the penalized lasso regression method is often used in machine learning to select a subset of variables. Extra Trees Classifier [ 82 ] is an example of a tree-based estimator that can be used to compute impurity-based feature importances, which can then be used to discard irrelevant features.

Principal component analysis (PCA): Principal component analysis (PCA) is a well-known unsupervised learning approach in the field of machine learning and data science. PCA is a mathematical technique that transforms a set of correlated variables into a set of uncorrelated variables known as principal components [ 48 , 81 ]. Figure 8 shows an example of the effect of PCA in spaces of different dimensions, where Fig. 8 a shows the original features in 3D space, and Fig. 8 b shows the projection onto a 2D plane spanned by the principal components PC1 and PC2, and onto a 1D line along the principal component PC1, respectively. Thus, PCA can be used as a feature extraction technique that reduces the dimensionality of the datasets, and to build an effective machine learning model [ 98 ]. Technically, PCA identifies the eigenvectors with the highest eigenvalues of the covariance matrix and then uses those to project the data into a new subspace of equal or fewer dimensions [ 82 ].
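The eigenvector view of PCA can be sketched for the 2D case without any libraries. The example below assumes power iteration as the way to find the leading eigenvector of the covariance matrix (one of several possible methods); the function name and data are illustrative.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, projected) for the leading principal component of 2D data."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    centered = [(r[0] - mx, r[1] - my) for r in rows]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue, provided the
    # starting vector is not orthogonal to it.
    vx, vy = 1.0, 0.0
    for _ in range(iterations):
        wx, wy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    # Project the centered data onto the component: 2D -> 1D.
    projected = [x * vx + y * vy for x, y in centered]
    return (vx, vy), projected


# Perfectly correlated features: all points lie on a line with slope 1.
data = [(2.0, 1.0), (3.0, 2.0), (4.0, 3.0), (5.0, 4.0)]
direction, projected = first_principal_component(data)
```

Since the two features here are perfectly correlated, a single component along the diagonal captures all of the variance, and the 1D projection loses no information.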

Fig. 8 An example of a principal component analysis (PCA) and created principal components PC1 and PC2 in different dimension space

Association Rule Learning

Association rule learning is a rule-based machine learning approach to discover interesting relationships, “IF-THEN” statements, between variables in large datasets [ 7 ]. One example is that “if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time”. Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of things within or across transactions. A common way of measuring the usefulness of association rules is to use the parameters ‘support’ and ‘confidence’, which were introduced in [ 7 ].
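To make the two measures concrete, the sketch below computes support and confidence for the laptop and anti-virus rule on a handful of made-up transactions. The helper names and the data are illustrative assumptions.

```python
transactions = [
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"mouse"},
    {"laptop", "antivirus"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Rule: IF laptop THEN antivirus
rule_support = support({"laptop", "antivirus"}, transactions)         # 3 of 5
rule_confidence = confidence({"laptop"}, {"antivirus"}, transactions)  # 3 of 4
```

Here the rule holds in three of the five transactions (support) and in three of the four transactions that contain a laptop (confidence).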

In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.

AIS and SETM: AIS is the first algorithm proposed by Agrawal et al. [ 7 ] for association rule mining. The AIS algorithm’s main downside is that too many candidate itemsets are generated, requiring more space and wasting a lot of effort. This algorithm calls for too many passes over the entire dataset to produce the rules. Another approach SETM [ 49 ] exhibits good performance and stable behavior with execution time; however, it suffers from the same flaw as the AIS algorithm.

Apriori: For generating association rules for a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These algorithms outperform the AIS and SETM mentioned above due to the Apriori property of frequent itemsets [ 8 ]. The term ‘Apriori’ usually refers to having prior knowledge of frequent itemset properties. Apriori uses a “bottom-up” approach, where it generates candidate itemsets of increasing size. To reduce the search space, Apriori uses the property “all subsets of a frequent itemset must be frequent; and if an itemset is infrequent, then all its supersets must also be infrequent”. Another approach, predictive Apriori [ 108 ], can also generate rules; however, it can produce unexpected results as it combines both the support and confidence. Apriori [ 8 ] is the most widely applicable technique for mining association rules.
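The level-wise candidate generation and the pruning property can be sketched as follows. This is a simplified illustration of frequent-itemset mining only (rule generation is omitted), with support given as an absolute count; the function name and the transactions are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    items = {item for t in transactions for item in t}
    frequent = {}
    # Level 1: frequent single items.
    level = {c: n for c, n in count({frozenset([i]) for i in items}).items()
             if n >= min_support}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: n for c, n in count(candidates).items() if n >= min_support}
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]
frequent = apriori(transactions, min_support=3)
```

With a minimum support count of 3, all four single items are frequent but only one pair survives, so no 3-item candidate is ever counted; this is the search-space reduction the Apriori property provides.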

ECLAT: This technique was proposed by Zaki et al. [ 131 ] and stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori [ 8 ] algorithm, which represents data in a horizontal pattern, it represents data vertically. Hence, the ECLAT algorithm is more efficient and scalable in the area of association rule learning. This algorithm is better suited for small and medium datasets whereas the Apriori algorithm is used for large datasets.

FP-Growth: Another common association rule learning technique is Frequent Pattern Growth, known as FP-Growth, which is based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ]. The key difference from Apriori is that, while generating rules, the Apriori algorithm [ 8 ] generates frequent candidate itemsets; the FP-growth algorithm [ 42 ], on the other hand, avoids candidate generation and instead builds a tree using a ‘divide and conquer’ strategy. Due to its sophistication, however, the FP-tree is challenging to use in an interactive mining environment [ 133 ]. Moreover, the FP-tree may not fit into memory for massive data sets, making it challenging to process big data as well. Another solution is RARM (Rapid Association Rule Mining), proposed by Das et al. [ 26 ], but it faces a related FP-tree issue [ 133 ].

ABC-RuleMiner: This is a rule-based machine learning method, recently proposed in our earlier paper, Sarker et al. [ 104 ], to discover interesting non-redundant rules for providing real-world intelligent services. This algorithm effectively identifies the redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. This algorithm first constructs an association generation tree (AGT), a top-down approach, and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more potent than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment, where human or user preferences are involved.

Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.

Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment using input from its actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method is based on interacting with the environment. The problem to be solved in reinforcement learning (RL) is defined as a Markov Decision Process (MDP) [ 86 ], i.e., it is all about making decisions sequentially. An RL problem typically includes four elements: Agent, Environment, Rewards, and Policy.

RL can be split roughly into Model-based and Model-free techniques. Model-based RL is the process of inferring optimal behavior from a model of the environment by performing actions and observing the results, which include the next state and the immediate reward [ 85 ]. AlphaZero and AlphaGo [ 113 ] are examples of model-based approaches. On the other hand, a model-free approach does not use the transition probability distribution and the reward function associated with the MDP. Q-learning, Deep Q Network, Monte Carlo Control, SARSA (State–Action–Reward–State–Action), etc. are some examples of model-free algorithms [ 52 ]. The key difference between the two is the model of the environment, i.e., the transition and reward functions, which is required for model-based RL but not for model-free RL. In the following, we discuss the popular RL algorithms.

Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a wide category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and generating draws from a probability distribution are the three problem classes where Monte Carlo techniques are most commonly used.
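A classic illustration of using randomness for a deterministic problem is estimating \(\pi\) by numerical integration: sample points uniformly in the unit square and count the fraction that falls inside the quarter circle. The function name, sample size, and seed below are arbitrary choices for the sketch.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area (pi/4) divided by square area (1) ~ inside / total.
    return 4.0 * inside / num_samples


pi_estimate = estimate_pi(100_000)
```

The estimate improves only slowly with the number of samples (the error shrinks roughly with the square root of the sample size), which is typical of Monte Carlo methods.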

Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of actions, which tells an agent what action to take under what conditions [ 52 ]. It does not need a model of the environment (hence the term “model-free”), and it can deal with stochastic transitions and rewards without the need for adaptations. The ‘Q’ in Q-learning usually stands for quality, as the algorithm calculates the maximum expected rewards for a given action in a given state.
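The tabular update at the heart of Q-learning can be sketched on a tiny, made-up environment: five states on a line, with a reward only for reaching the rightmost one. The environment, the hyperparameter values, and the purely random exploration policy are all assumptions chosen to keep the example short (Q-learning is off-policy, so it can still learn the greedy values from random behavior).

```python
import random

N_STATES, GOAL = 5, 4   # states 0..4 on a line; state 4 is terminal
ACTIONS = (0, 1)        # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state
    for _ in range(episodes):
        state = 0
        for _ in range(200):                   # cap on episode length
            action = rng.choice(ACTIONS)       # purely random exploration
            next_state, reward = step(state, action)
            # Move Q(s, a) toward r + gamma * max over a' of Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q


q_table = q_learning()
# Greedy policy: in each non-terminal state, pick the action with the highest Q.
greedy_policy = [row.index(max(row)) for row in q_table[:GOAL]]
```

No transition or reward model is consulted during learning; the agent only uses the sampled next state and reward, and the learned Q-values decay by the discount factor with each step of distance from the goal.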

Deep Q-learning: The basic working step in Deep Q-Learning [ 52 ] is that the current state is fed into the neural network, which returns the Q-values of all possible actions as an output. Q-learning works well when the setting is reasonably simple. However, when the number of states and actions becomes large, a deep neural network can be used as a function approximator.

Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.

Artificial Neural Network and Deep Learning

Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture by combining several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning compared with machine learning as the amount of data increases. However, it may vary depending on the data characteristics and experimental setup.

Fig. 9 Machine learning and deep learning performance in general with the amount of data

The most common deep learning algorithms are: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN, or ConvNet), Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.

Fig. 10 A structure of an artificial neural network modeling with multiple processing layers

MLP: The base architecture of deep learning, which is also known as the feed-forward artificial neural network, is called a multilayer perceptron (MLP) [ 82 ]. A typical MLP is a fully connected network consisting of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 10 . Each node in one layer connects to each node in the following layer with a certain weight. MLP utilizes the “Backpropagation” technique [ 41 ], the most “fundamental building block” in a neural network, to adjust the weight values internally while building the model. MLP is sensitive to feature scaling and requires a variety of hyperparameters to be tuned, such as the number of hidden layers, neurons, and iterations, which can result in a computationally costly model.
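The fully connected, layer-by-layer structure can be illustrated with a forward pass through a tiny network. In this sketch the weights are hand-picked rather than learned by backpropagation, and the ReLU activation is an assumption; the network has two inputs, two hidden neurons, and one output, and happens to compute XOR.

```python
def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every output through a weight."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hand-picked (not learned) weights, chosen so that the network computes XOR.
W1, B1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]  # input -> hidden (2 neurons)
W2, B2 = [[1.0, -2.0]], [0.0]                   # hidden -> output (1 neuron)


def mlp_forward(x):
    hidden = relu(dense(x, W1, B1))   # hidden layer with non-linearity
    return dense(hidden, W2, B2)[0]   # linear output layer


outputs = [mlp_forward(x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```

XOR is not linearly separable, so the hidden layer with its non-linear activation is what makes this mapping possible; training would consist of adjusting `W1`, `B1`, `W2`, and `B2` by backpropagation instead of setting them by hand.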

CNN or ConvNet: The convolutional neural network (CNN) [ 65 ] enhances the design of the standard ANN, consisting of convolutional layers, pooling layers, as well as fully connected layers, as shown in Fig. 11 . As it takes advantage of the two-dimensional (2D) structure of the input data, it is widely used in several areas such as image and video recognition, image processing and classification, medical image analysis, natural language processing, etc. While CNN has a greater computational burden, it has the advantage of automatically detecting the important features without any manual intervention, and hence CNN is considered to be more powerful than conventional ANN. A number of advanced deep learning models based on CNN can be used in the field, such as AlexNet [ 60 ], Xception [ 24 ], Inception [ 118 ], Visual Geometry Group (VGG) [ 44 ], ResNet [ 45 ], etc.

LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. LSTM has feedback links, unlike normal feed-forward neural networks. LSTM networks are well-suited for analyzing and learning sequential data, such as classifying, processing, and predicting data based on time series data, which differentiates it from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time, sentence, etc., and commonly applied in the area of time-series analysis, natural language processing, speech recognition, etc.

Fig. 11 An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers

In addition to the most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist in the area for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data by a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique that is widely used for dimensionality reduction as well as feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBM) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, and a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to the actual data input. Transfer learning, which typically means re-using a pre-trained model on a new problem, is currently very common because it can train deep neural networks with comparatively little data [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is given in our earlier paper Sarker et al. [ 96 ].

Overall, based on the learning techniques discussed above, we can conclude that various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, and dimensionality reduction, association rule learning, reinforcement learning, or deep learning techniques, can play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.

Applications of Machine Learning

In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its ability to learn from the past and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.

Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making by data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict the unknown outcome [ 41 ]. Examples include identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens. In another application, machine learning algorithms can assist retailers in better understanding consumer preferences and behavior, managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing in e-commerce. Various machine learning algorithms such as decision trees, support vector machines, artificial neural networks, etc. [ 106 , 125 ] are commonly used in the area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.
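As a minimal sketch of this idea, the snippet below fits a one-variable least-squares line to past observations and uses it to predict an outcome that has not yet been observed. The data (promotion spend versus units sold) and the names `fit_line` and `predict` are hypothetical and chosen only for illustration; they do not come from the cited works.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: explanatory variable (promotion spend) and
# predicted variable (units sold) recorded for previous events.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(spend, units)
forecast = predict(model, 6.0)  # outcome for a spend level not yet observed
print(model, forecast)
```

In practice the decision trees, support vector machines, or neural networks mentioned above would replace the straight line, but the workflow of fitting on previous events and predicting the unknown outcome stays the same.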

Cybersecurity and threat intelligence: Cybersecurity is one of the most essential areas of Industry 4.0 [ 114 ], and is typically the practice of protecting networks, systems, hardware, and data from digital attacks [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where bad neighborhoods are online, keep people safe while browsing, or secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. To detect various types of cyber-attacks or intrusions, machine learning classification models that take into account the impact of security features are useful [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building a rule-based security system [ 105 ]. Thus, we can say that the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.
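To illustrate how clustering can surface cyber-anomalies, the sketch below runs a small k-means on hypothetical connection records and scores each record by its distance to the nearest cluster centre; the record with the largest score is the most suspicious. The two features (kilobytes sent, session minutes), the data values, and the deterministic initialisation are assumptions made for this example, not a method taken from the cited studies.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic, evenly spaced initialisation."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centre as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def anomaly_scores(points, centroids):
    """Distance from each point to its nearest cluster centre."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


# Hypothetical records: (kilobytes sent, session minutes).
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),          # light sessions
    (10, 10), (10, 11), (11, 10), (11, 11),  # heavy sessions
    (30, 2),                                 # unusual: large transfer, short session
]

centroids = kmeans(points, k=2)
scores = anomaly_scores(points, centroids)
suspect = max(range(len(points)), key=scores.__getitem__)
print(points[suspect])
```

A real deployment would choose the number of clusters and the alert threshold from validation data rather than simply flagging the single farthest record.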

Internet of things (IoT) and smart cities: Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ], which turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart home, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT’s core fields of application, using technologies to enhance city services and residents’ living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating the total energy usage of citizens for a particular period, and making context-aware and timely decisions for people are some of the tasks that can be solved using machine learning techniques according to the current needs of the people.

Traffic prediction and transportation: Transportation systems have become a crucial component of every country’s economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO\(_2\) pollution, accidents, emergencies, and a decline in modern society’s quality of life [ 40 ]. Thus, an intelligent transportation system that predicts future traffic is important and is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize these issues [ 17 , 30 , 31 ]. For example, based on travel history and trends of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and in recommending that their customers take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.

Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. For the COVID-19 pandemic, learning techniques are used to identify patients at high risk, estimate their mortality rate, and detect other anomalies [ 61 ]. They can also be used to better understand the virus’s origin and to support COVID-19 outbreak prediction as well as disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread, and notify those regions so that they can make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help to fight the COVID-19 virus and the pandemic, as well as support intelligent clinical decision-making in the domain of healthcare.

E-commerce and product recommendations: Product recommendation is one of the most well-known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing lies in the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to retain existing customers while attracting new ones.
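One simple way to turn purchasing histories into suggestions is item co-occurrence counting: items that were often bought together with what a customer already owns are recommended first. The sketch below is a toy version of that idea under stated assumptions; the purchase histories and the helper name `recommend` are hypothetical, and production recommenders are considerably more elaborate.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories, one set of items per customer.
histories = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"laptop", "keyboard", "monitor"},
    {"laptop", "mouse"},
    {"phone", "case"},
]

# co_counts[a][b] = number of histories that contain both a and b.
co_counts = defaultdict(Counter)
for basket in histories:
    for a, b in permutations(basket, 2):
        co_counts[a][b] += 1


def recommend(owned, k=2):
    """Rank items not yet owned by how often they co-occur with owned items."""
    scores = Counter()
    for item in owned:
        for other, n in co_counts[item].items():
            if other not in owned:
                scores[other] += n
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


print(recommend({"laptop"}))
print(recommend({"phone"}, k=1))
```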

NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language through the medium of a computer [ 79 , 103 ]. Thus, NLP helps computers, for instance, to read a text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, all tasks where machine learning techniques can be used. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and views within a given text from blogs, reviews, social media, forums, news, etc. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is considered a machine learning task that analyzes texts for polarity, such as “positive”, “negative”, or “neutral”, along with more fine-grained emotions such as very happy, happy, sad, very sad, angry, interested, or not interested.
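Polarity classification can be sketched with a multinomial naive Bayes model over word counts with add-one smoothing. The four training sentences below are invented for illustration, and the function name `classify` is an assumption of this example; a realistic system would need far more data and proper text normalization.

```python
import math
from collections import Counter

# Hypothetical labelled reviews.
train = [
    ("great product love it", "positive"),
    ("happy with this great service", "positive"),
    ("terrible product hate it", "negative"),
    ("sad and angry about bad service", "negative"),
]

word_counts = {}        # label -> Counter of word frequencies
doc_counts = Counter()  # label -> number of training texts
for text, label in train:
    doc_counts[label] += 1
    word_counts.setdefault(label, Counter()).update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}


def classify(text):
    """Return the label with the highest log-posterior under naive Bayes."""
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(doc_counts[label] / len(train))  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(classify("love this great service"))
print(classify("terrible and bad service"))
```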

Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread example of machine learning in the real world, which can identify an object in a digital image. For instance, labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media, e.g., Facebook, are common examples of image recognition. Speech recognition [ 23 ], which typically uses sound and linguistic models, is also very popular, e.g., Google Assistant, Cortana, Siri, Alexa, etc. [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., image analysis. Several machine learning techniques such as classification, feature selection, clustering, or sequence labeling methods are used in the area.

Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while also reducing negative impacts on the environment [ 5 , 25 , 109 ]. Sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to enhance their decisions to adopt sustainable agriculture practices utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture: in the pre-production phase, for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase, for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in the processing phase, for demand estimation, production planning, etc.; and in the distribution phase, for inventory management, consumer analysis, etc.

User behavior analytics and context-aware smartphone applications: Context-awareness is a system’s ability to capture knowledge about its surroundings at any moment and modify behaviors accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has changed greatly with the power of AI, particularly machine learning techniques, through their ability to learn from contextual data [ 103 , 136 ]. Thus, the developers of mobile apps can rely on machine learning to create smart apps that can understand human behavior and support and entertain users [ 107 , 137 , 140 ]. Machine learning techniques are applicable to building various personalized data-driven context-aware systems, such as smart interruption management, smart mobile recommendation, context-aware smart searching, and decision-making, that intelligently assist end mobile phone users in a pervasive computing environment. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]. Clustering approaches are useful in capturing users’ diverse behavioral activities by taking into account data in time series [ 102 ]. To predict future events in various contexts, classification methods can be used [ 106 , 139 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can help to build context-aware adaptive and smart applications according to the preferences of mobile phone users.
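A context-aware association rule such as "in a meeting implies reject the call" is usually judged by its support and confidence. The sketch below computes both measures on a handful of hypothetical phone call records; the context labels, the data, and the function names are illustrative assumptions and do not reproduce the rule-mining method of [ 104 ].

```python
# Hypothetical call log: each record is the set of contexts plus the user's action.
records = [
    {"at_office", "in_meeting", "reject"},
    {"at_office", "in_meeting", "reject"},
    {"at_office", "in_meeting", "accept"},
    {"at_office", "accept"},
    {"at_home", "accept"},
]


def support(itemset):
    """Fraction of records that contain every item in the itemset."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


antecedent, consequent = {"in_meeting"}, {"reject"}
rule_support = support(antecedent | consequent)
rule_confidence = confidence(antecedent, consequent)
print(rule_support, round(rule_confidence, 3))
```

A rule miner would enumerate many candidate rules and keep only those whose support and confidence exceed user-chosen thresholds.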

In addition to these application areas, machine learning-based models can also be applied to several other domains such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.

Challenges and Research Directions

Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.

In general, the effectiveness and the efficiency of a machine learning-based solution depend on the nature and characteristics of the data and on the performance of the learning algorithms. Collecting data in a relevant domain, such as cybersecurity, IoT, healthcare, and agriculture, discussed in Sect. “ Applications of Machine Learning ”, is not straightforward, although the current cyberspace enables the production of a huge amount of data with very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing those data are important for further analysis. Therefore, a more in-depth investigation of data collection methods is needed when working on real-world data. Moreover, historical data may contain many ambiguous values, missing values, outliers, and meaningless data. The machine learning algorithms discussed in Sect. “ Machine Learning Tasks and Algorithms ” are highly affected by the quality and availability of the training data, and so is the resultant model. Thus, accurately cleaning and pre-processing diverse data collected from diverse sources is a challenging task. Therefore, effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is required to use the learning algorithms effectively in the associated application domain.
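As a small illustration of the cleaning step described above, the sketch below fills missing values with the median and replaces extreme outliers using the median absolute deviation (MAD). The sensor readings, the cut-off `k = 5.0`, and the function name `clean` are assumptions chosen for this example; appropriate rules depend on the domain and the data source.

```python
from statistics import median


def clean(values, k=5.0):
    """Replace missing values (None) and MAD-based outliers with the median."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median([abs(v - med) for v in observed])
    cleaned = []
    for v in values:
        is_outlier = v is not None and mad > 0 and abs(v - med) > k * mad
        cleaned.append(med if v is None or is_outlier else v)
    return cleaned


# Hypothetical temperature readings with two gaps and one faulty spike.
readings = [21.0, 22.0, None, 23.0, 95.0, 22.0, None, 21.0]
cleaned = clean(readings)
print(cleaned)
```

Using the median rather than the mean keeps the imputed value from being dragged toward the very outliers that are being removed.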

To analyze the data and extract insights, there exist many machine learning algorithms, summarized in Sect. “ Machine Learning Tasks and Algorithms ”. Thus, selecting a proper learning algorithm that is suitable for the target application is challenging. The reason is that the outcome of different learning algorithms may vary depending on the data characteristics [ 106 ]. Selecting a wrong learning algorithm would produce unexpected outcomes that may lead to wasted effort as well as reduced model effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can directly be used to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. “ Applications of Machine Learning ”. However, hybrid learning models, e.g., ensembles of methods, modification or enhancement of the existing learning techniques, or the design of new learning methods, could be potential future work in the area.
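One common way to choose among candidate algorithms is to compare them on held-out data. The sketch below uses leave-one-out cross-validation to compare a majority-class baseline with a 1-nearest-neighbour classifier on a tiny hypothetical dataset; the data, the two candidate models, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def majority_fit(train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_nn_fit(train):
    """1-nearest neighbour: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda row: math.dist(row[0], x))[1]


def loo_accuracy(fit, data):
    """Leave-one-out cross-validation accuracy of a fitting procedure."""
    hits = 0
    for i, (x, y) in enumerate(data):
        model = fit(data[:i] + data[i + 1:])  # train on everything except item i
        hits += model(x) == y
    return hits / len(data)


# Hypothetical labelled points: two well-separated groups of unequal size.
data = [
    ((1, 1), 0), ((1, 2), 0), ((2, 1), 0), ((2, 2), 0), ((1.5, 1.5), 0),
    ((8, 8), 1), ((8, 9), 1), ((9, 8), 1),
]

results = {
    "majority": loo_accuracy(majority_fit, data),
    "1-NN": loo_accuracy(one_nn_fit, data),
}
best = max(results, key=results.get)
print(results, best)
```

On data with different characteristics the ranking could reverse, which is why the comparison has to be repeated for each target application.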

Thus, the ultimate success of a machine learning-based solution and its corresponding applications mainly depends on both the data and the learning algorithms. If the data are poorly suited to learning, for example non-representative, of poor quality, containing irrelevant features, or insufficient in quantity for training, then the machine learning models may become useless or produce lower accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are important for a machine learning-based solution and, eventually, for building intelligent applications.

Conclusion

In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. In line with our goal, we have briefly discussed how various types of machine learning methods can be used to build solutions to various real-world issues. A successful machine learning model depends on both the data and the performance of the learning algorithms. The sophisticated learning algorithms then need to be trained on the collected real-world data and knowledge related to the target application before the system can assist with intelligent decision-making. We also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we have summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. The challenges identified create promising research opportunities in the field, which must be addressed with effective solutions in various application areas. Overall, we believe that our study on machine learning-based solutions opens up a promising direction and can be used as a reference guide for potential research and applications for academia and industry professionals, as well as for decision-makers, from a technical point of view.

References

Canadian institute of cybersecurity, university of new brunswick, iscx dataset, (Accessed on 20 October 2019).

Cic-ddos2019 [online]. available: (Accessed on 28 March 2020).

World health organization: WHO.

Google trends. 2019.

Adnan N, Nordin Shahrina Md, Rahman I, Noor A. The effects of knowledge transfer on farmers decision making toward sustainable agriculture practices. World J Sci Technol Sustain Dev. 2018.

Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 1998; 94–105

Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: ACM SIGMOD Record. ACM. 1993;22: 207–216

Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of the International Joint Conference on Very Large Data Bases, Santiago Chile. 1994; 1215: 487–499.

Aha DW, Kibler D, Albert M. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66.

Alakus TB, Turkoglu I. Comparison of deep learning approaches to predict covid-19 infection. Chaos Solit Fract. 2020;140:

Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Comput. 1997;9(7):1545–88.

Ankerst M, Breunig MM, Kriegel H-P, Sander J. Optics: ordering points to identify the clustering structure. ACM Sigmod Record. 1999;28(2):49–60.

Anzai Y. Pattern recognition and machine learning. Elsevier; 2012.

Ardabili SF, Mosavi A, Ghamisi P, Ferdinand F, Varkonyi-Koczy AR, Reuter U, Rabczuk T, Atkinson PM. Covid-19 outbreak prediction with machine learning. Algorithms. 2020;13(10):249.

Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, 2012; 37–49.

Balducci F, Impedovo D, Pirlo G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines. 2018;6(3):38.

Boukerche A, Wang J. Machine learning-based traffic prediction models for intelligent transportation systems. Comput Netw. 2020;181

Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.

Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press; 1984.

Cao L. Data science: a comprehensive overview. ACM Comput Surv (CSUR). 2017;50(3):43.

Carpenter GA, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Graph Image Process. 1987;37(1):54–115.

Chiu C-C, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A, Weiss RJ, Rao K, Gonina E, et al. State-of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018 pages 4774–4778. IEEE.

Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017.

Cobuloglu H, Büyüktahtakın IE. A stochastic multi-criteria decision analysis for sustainable biomass crop selection. Expert Syst Appl. 2015;42(15–16):6065–74.

Das A, Ng W-K, Woon Y-K. Rapid association rule mining. In: Proceedings of the tenth international conference on Information and knowledge management, pages 474–481. ACM, 2001.

de Amorim RC. Constrained clustering with minkowski weighted k-means. In: 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI), pages 13–17. IEEE, 2012.

Dey AK. Understanding and using context. Person Ubiquit Comput. 2001;5(1):4–7.

Eagle N, Pentland AS. Reality mining: sensing complex social systems. Person Ubiquit Comput. 2006;10(4):255–68.

Essien A, Petrounias I, Sampaio P, Sampaio S. Improving urban traffic speed prediction using data source fusion and deep learning. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE. 2019: 1–8.

Essien A, Petrounias I, Sampaio P, Sampaio S. A deep-learning model for urban traffic flow prediction with traffic events mined from twitter. In: World Wide Web, 2020: 1–24.

Ester M, Kriegel H-P, Sander J, Xiaowei X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd. 1996;96:226–31.

Fatima M, Pasha M, et al. Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl. 2017;9(01):1.

Flach PA, Lachiche N. Confirmation-guided discovery of first-order rules with tertius. Mach Learn. 2001;42(1–2):61–95.

Freund Y, Schapire RE, et al. Experiments with a new boosting algorithm. In: Icml, Citeseer. 1996; 96: 148–156

Fujiyoshi H, Hirakawa T, Yamashita T. Deep learning-based image recognition for autonomous driving. IATSS Res. 2019;43(4):244–52.

Fukunaga K, Hostetler L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inform Theory. 1975;21(1):32–40.

Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: MIT Press; 2016.

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014: 2672–2680.

Guerrero-Ibáñez J, Zeadally S, Contreras-Castillo J. Sensor technologies for intelligent transportation systems. Sensors. 2018;18(4):1212.

Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.

Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM Sigmod Record, ACM. 2000;29: 1–12.

Harmon SA, Sanford TH, Sheng X, Turkbey EB, Roth H, Ziyue X, Yang D, Myronenko A, Anderson V, Amalou A, et al. Artificial intelligence for the detection of covid-19 pneumonia on chest ct using multinational datasets. Nat Commun. 2020;11(1):1–7.

He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.

He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770–778.

Hinton GE. A practical guide to training restricted boltzmann machines. In: Neural networks: Tricks of the trade. Springer. 2012; 599-619

Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn. 1993;11(1):63–90.

Hotelling H. Analysis of a complex of statistical variables into principal components. J Edu Psychol. 1933;24(6):417.

Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In: Data Engineering, 1995. Proceedings of the Eleventh International Conference on, IEEE.1995:25–33.

Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, La Spada L, Mirmozafari M, Dehghani M, et al. Artificial intelligence and covid-19: deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8:109581–95.

John GH, Langley P. Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc. 1995; 338–345

Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.

Kamble SS, Gunasekaran A, Gawankar SA. Sustainable industry 4.0 framework: a systematic literature review identifying the current trends and future perspectives. Process Saf Environ Protect. 2018;117:408–25.

Kamble SS, Gunasekaran A, Gawankar SA. Achieving sustainable performance in a data-driven agriculture supply chain: a review for research and applications. Int J Prod Econ. 2020;219:179–94.

Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons; 2009.

Keerthi SS, Shevade SK, Bhattacharyya C, Radha Krishna MK. Improvements to platt’s smo algorithm for svm classifier design. Neural Comput. 2001;13(3):637–49.

Khadse V, Mahalle PN, Biraris SV. An empirical comparison of supervised machine learning algorithms for internet of things data. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE. 2018; 1–6

Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.

Koroniotis N, Moustafa N, Sitnikova E, Turnbull B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: bot-iot dataset. Fut Gen Comput Syst. 2019;100:779–96.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, 2012: 1097–1105

Kushwaha S, Bahl S, Bagha AK, Parmar KS, Javaid M, Haleem A, Singh RP. Significant applications of machine learning for covid-19 pandemic. J Ind Integr Manag. 2020;5(4).

Lade P, Ghosh R, Srinivasan S. Manufacturing analytics and industrial internet of things. IEEE Intell Syst. 2017;32(3):74–9.

Lalmuanawma S, Hussain J, Chhakchhuak L. Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: a review. Chaos Sol Fract. 2020:110059.

LeCessie S, Van Houwelingen JC. Ridge estimators in logistic regression. J R Stat Soc Ser C (Appl Stat). 1992;41(1):191–201.

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

Liu H, Motoda H. Feature extraction, construction and selection: A data mining perspective, vol. 453. Springer Science & Business Media; 1998.

López G, Quesada L, Guerrero LA. Alexa vs. siri vs. cortana vs. google assistant: a comparison of speech-based natural user interfaces. In: International Conference on Applied Human Factors and Ergonomics, Springer. 2017; 241–250.

Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, 1998.

MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967;volume 1, pages 281–297. Oakland, CA, USA.

Mahdavinejad MS, Rezvan M, Barekatain M, Adibi P, Barnaghi P, Sheth AP. Machine learning for internet of things data analysis: a survey. Digit Commun Netw. 2018;4(3):161–75.

Marchand A, Marx P. Automated product recommendations with preference-based explanations. J Retail. 2020;96(3):328–43.

McCallum A. Information extraction: distilling structured data from unstructured text. Queue. 2005;3(9):48–57.

Mehrotra A, Hendley R, Musolesi M. Prefminer: mining user’s preferences for intelligent mobile notification management. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September, 2016; pp. 1223–1234. ACM, New York, USA.

Mohamadou Y, Halidou A, Kapen PT. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of covid-19. Appl Intell. 2020;50(11):3913–25.

Mohammed M, Khan MB, Bashier Mohammed BE. Machine learning: algorithms and applications. CRC Press; 2016.

Moustafa N, Slay J. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 military communications and information systems conference (MilCIS), 2015;pages 1–6. IEEE.

Nilashi M, Ibrahim OB, Ahmadi H, Shahmoradi L. An analytical method for diseases prediction using machine learning techniques. Comput Chem Eng. 2017;106:212–23.

Yujin O, Park S, Ye JC. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans Med Imaging. 2020;39(8):2688–700.

Otter DW, Medina JR, Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst. 2020.

Park H-S, Jun C-H. A simple and fast algorithm for k-medoids clustering. Expert Syst Appl. 2009;36(2):3336–41.

Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901;2(11):559–72.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

Perveen S, Shahbaz M, Keshavjee K, Guergachi A. Metabolic syndrome and development of diabetes mellitus: predictive modeling based on machine learning techniques. IEEE Access. 2018;7:1365–75.

Santi P, Ram D, Rob C, Nathan E. Behavior-based adaptive call predictor. ACM Trans Auton Adapt Syst. 2011;6(3):21:1–21:28.

Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: applications on robotics. J Intell Robot Syst. 2017;86(2):153–73.

Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.

Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.

Quinlan JR. C4.5: programs for machine learning. Mach Learn. 1993.

Rasmussen C. The infinite gaussian mixture model. Adv Neural Inform Process Syst. 1999;12:554–60.

Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Syst. 2015;89:14–46.

Rokach L. A survey of clustering algorithms. In: Data mining and knowledge discovery handbook, pages 269–298. Springer, 2010.

Safdar S, Zafar S, Zafar N, Khan NF. Machine learning based decision support systems (dss) for heart disease diagnosis: a review. Artif Intell Rev. 2018;50(4):597–623.

Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.

Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.

Sarker IH. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021.

Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021.

Sarker IH, Abushark YB, Alsolami F, Khan A. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.

Sarker IH, Abushark YB, Khan A. Contextpca: predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.

Sarker IH, Alqahtani H, Alsolami F, Khan A, Abushark YB, Siddiqui MK. Context pre-modeling: an empirical analysis for classification based user-centric context-aware predictive modeling. J Big Data. 2020;7(1):1–23.

Sarker IH, Alan C, Jun H, Khan AI, Abushark YB, Khaled S. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2019; 1–11.

Sarker IH, Colman A, Kabir MA, Han J. Phone call log as a context source to modeling individual user behavior. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp): Adjunct, Germany, pages 630–634. ACM, 2016.

Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J Oxf Univ UK. 2018;61(3):349–68.

Sarker IH, Hoque MM, MdK Uddin, Tawfeeq A. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl, pages 1–19, 2020.

Sarker IH, Kayes ASM. Abc-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020; page 102762

Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.

Sarker IH, Watters P, Kayes ASM. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.

Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet Things. 2019;8:

Scheffer T. Finding association rules that trade support optimally against confidence. Intell Data Anal. 2005;9(4):381–95.

Sharma R, Kamble SS, Gunasekaran A, Kumar V, Kumar A. A systematic literature review on machine learning applications for sustainable agriculture supply chain performance. Comput Oper Res. 2020;119:

Shengli S, Ling CX. Hybrid cost-sensitive decision tree, knowledge discovery in databases. In: PKDD 2005, Proceedings of 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Lecture Notes in Computer Science, volume 3721, 2005.

Shorten C, Khoshgoftaar TM, Furht B. Deep learning applications for covid-19. J Big Data. 2021;8(1):1–54.

Gökhan S, Nevin Y. Data analysis in health and big data: a machine learning medical diagnosis model based on patients’ complaints. Commun Stat Theory Methods. 2019;1–10

Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. Mastering the game of go with deep neural networks and tree search. nature. 2016;529(7587):484–9.

Ślusarczyk B. Industry 4.0: Are we ready? Polish J Manag Stud. 17, 2018.

Sneath Peter HA. The application of computers to taxonomy. J Gen Microbiol. 1957;17(1).

Sorensen T. Method of establishing groups of equal amplitude in plant sociology based on similarity of species. Biol Skr. 1948; 5.

Srinivasan V, Moghaddam S, Mukherji A. Mobileminer: mining your frequent patterns on your phone. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13-17 September, pp. 389–400. ACM, New York, USA. 2014.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015; pages 1–9.

Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the kdd cup 99 data set. In. IEEE symposium on computational intelligence for security and defense applications. IEEE. 2009;2009:1–6.

Tsagkias M. Tracy HK, Surya K, Vanessa M, de Rijke M. Challenges and research opportunities in ecommerce search and recommendations. In: ACM SIGIR Forum. volume 54. NY, USA: ACM New York; 2021. p. 1–23.

Wagstaff K, Cardie C, Rogers S, Schrödl S, et al. Constrained k-means clustering with background knowledge. Icml. 2001;1:577–84.

Wang W, Yang J, Muntz R, et al. Sting: a statistical information grid approach to spatial data mining. VLDB. 1997;97:186–95.

Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.

Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.

Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.

Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.

Wu C-C, Yen-Liang C, Yi-Hung L, Xiang-Yu Y. Decision tree induction with a constrained number of leaf nodes. Appl Intell. 2016;45(3):673–85.

Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, et al. Top 10 algorithms in data mining. Knowl Inform Syst. 2008;14(1):1–37.

Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.

Xu D, Yingjie T. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–93.

Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.

Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22–32.

Zhao Q, Bhowmick SS. Association rule mining: a survey. Singapore: Nanyang Technological University; 2003.

Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.

Zheng Y, Rajasegarar S, Leckie C. Parking availability prediction for sensor-enabled car parks in smart cities. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2015 IEEE Tenth International Conference on. IEEE, 2015; pages 1–6.

Zhu H, Cao H, Chen E, Xiong H, Tian J. Exploiting enriched contextual information for mobile app classification. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012; pages 1617–1621

Zhu H, Chen E, Xiong H, Kuifei Y, Cao H, Tian J. Mining mobile user preferences for personalized context-aware recommendation. ACM Trans Intell Syst Technol (TIST). 2014;5(4):58.

Zikang H, Yong Y, Guofeng Y, Xinyu Z. Sentiment analysis of agricultural product ecommerce review data based on deep learning. In: 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), IEEE, 2020; pages 1–7

Zulkernain S, Madiraju P, Ahamed SI. A context aware interruption management system for mobile devices. In: Mobile Wireless Middleware, Operating Systems, and Applications. Springer. 2010; pages 221–234

Zulkernain S, Madiraju P, Ahamed S, Stamm K. A mobile intelligent interruption management system. J UCS. 2010;16(15):2060–80.

Download references

Author information

Authors and affiliations.

Swinburne University of Technology, Melbourne, VIC, 3122, Australia

Iqbal H. Sarker

Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, 4349, Chattogram, Bangladesh

Corresponding author

Correspondence to Iqbal H. Sarker.

Ethics declarations

Conflict of interest.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

About this article

Cite this article.

Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN COMPUT. SCI. 2, 160 (2021).

Received: 27 January 2021

Accepted: 12 March 2021

Published: 22 March 2021


  • Machine learning
  • Deep learning
  • Artificial intelligence
  • Data science
  • Data-driven decision-making
  • Predictive analytics
  • Intelligent applications


Top 12 Research Papers to Study If You Are Interested in Machine Learning

Machine learning is a fast-evolving field that has helped almost every type of industry, including healthcare, finance, and technology. Over the past few years, ML has become a popular research topic and has produced numerous breakthroughs.

This article discusses 10 research articles, plus two review papers, that a researcher should read to keep up with the latest ML approaches and concepts in 2023.

  • Novel Machine Learning Algorithm Can Identify Patients at Risk of Poor Overall Survival Following Curative Resection for Colorectal Liver Metastases

(Amygdalos et al., 2023)

Field of Research: Medical and Healthcare

This paper demonstrates the power of a gradient-boosted decision tree model in identifying high-risk patients for poor overall survival after curative resection for colorectal liver metastases. The findings suggest potential benefits of closer follow-up and aggressive treatment strategies for these patients, paving the way for more personalized healthcare interventions.

  • A Novel Study on Machine Learning Algorithm-Based Cardiovascular Disease Prediction

(Khan, Qureshi, Daniyal, & Tawiah, 2023)

Field of Research: Healthcare and Medical 

Investigating machine learning methods, such as decision trees and random forests, this study focuses on accurate prediction and decision-making for cardiovascular disease patients. The research aims to improve patient diagnosis and treatment through the integration of predictive algorithms, contributing to the advancement of precision medicine.

  • A Novel Machine Learning Algorithm for Interval Systems Approximation Based on Artificial Neural Network

(Zerrougui, Adamou-Mitiche, & Mitiche, 2023)

Field of Research: Structural Dynamics and AI-Based Order Reduction

This paper introduces a novel artificial intelligence technique that uses artificial neural networks to reduce the order of large interval systems. The method maintains system stability while achieving an acceptable approximation, presenting a promising approach for more efficient and reliable structural dynamics analysis.

  • A Novel Machine Learning Approach for Solar Radiation Estimation

(Hissou, Benkirane, Guezzaz, Azrour, & Beni-Hssane, 2023)

Field of Research: Renewable Energy and Climate Modeling

Addressing solar energy variability, this paper explores a multivariate time series model with recursive feature elimination to estimate solar radiation. The findings contribute to renewable energy planning and climate modeling efforts, enabling better utilization of solar resources and facilitating sustainable energy transitions.

  • A Novel Machine Learning Method for Evaluating the Impact of Emission Sources on Ozone Formation

(Cheng et al., 2023)

Field of Research: Environmental Science and Pollution Control

Using positive matrix factorization and SHapley Additive exPlanations (SHAP), this research assesses the impact of emission sources on ozone formation. The findings provide valuable insights into pollution control strategies, aiding in the design of effective air quality management policies.

  • AcrPred: A Hybrid Optimization with Enumerated Machine Learning Algorithm to Predict Anti-CRISPR Proteins

(Dao et al., 2023)

Field of Research: Biotechnology and Gene Editing

CRISPR-Cas is a powerful gene editing tool, and this paper focuses on predicting anti-CRISPR proteins using a high-accuracy predictive model based on machine learning. The research provides potential tools for gene editing regulation, opening up new avenues for precision gene therapies.

  • Modeling the Mechanical Properties of Recycled Aggregate Concrete Using Hybrid Machine Learning Algorithms

(Peng & Unluer, 2023)

Field of Research: Civil Engineering and Sustainable Construction

This study employs hybrid machine learning models to predict the mechanical properties of recycled aggregate concrete. The research contributes to sustainable construction practices and material optimization, offering greener and more efficient building solutions.

  • Deep MCANC: A Deep Learning Approach to Multi-Channel Active Noise Control

(Zhang & Wang, 2023)

Field of Research: Acoustics and Noise Control

This paper introduces deep MCANC, a deep learning-based approach for multi-channel active noise control. The research demonstrates the effectiveness of the method in wideband noise reduction and real-time applications, potentially revolutionizing noise reduction technologies.

  • Wheel Defect Detection Using a Hybrid Deep Learning Approach 

(Shaikh, Hussain, & Chowdhry, 2023)

Field of Research: Railway Transportation and Vibration Analysis

Addressing the issue of defective wheels in railway transportation, this research proposes a hybrid deep learning approach using accelerometer data for accurate detection of various wheel defects. The findings contribute to improved operational safety and maintenance, ensuring smoother and safer railway operations.

  • An Improved Forest Fire Detection Method Based on the Detectron2 Model and a Deep Learning Approach

(Abdusalomov, Islam, Nasimov, Mukhiddinov, & Whangbo, 2023)

Field of Research: Environmental Monitoring and Fire Safety

In this study, an improved forest fire detection method is proposed, utilizing deep learning approaches and the Detectron2 model. The research demonstrates high precision in detecting fires, which is critical for timely response and fire control, providing a reliable and scalable solution for fire management systems.

These ten research papers present significant contributions to the field of machine learning, showcasing the transformative potential of this technology across various domains. From improving healthcare outcomes and advancing renewable energy utilization to enhancing environmental monitoring and safety measures, machine learning continues to revolutionize industries and shape a more sustainable and efficient future.

As aspiring researchers and practitioners delve into these informative papers, they will gain a deeper appreciation for the multifaceted applications of machine learning, inspiring further exploration and innovation in this dynamic and evolving field. Embracing the insights from these research papers, the machine learning community can continue pushing the boundaries of AI technology, ushering in a new era of intelligent solutions for complex challenges.

Additional Reading: Review Papers on Machine Learning

  • Machine Learning and Deep Learning: A Review of Methods and Applications

(Sharifani & Amini, 2023)

Field of Research: Artificial Intelligence and Machine Learning

This comprehensive review article explores the methods and applications of both machine learning and deep learning. It delves into the strengths and weaknesses of these technologies and discusses their potential future directions. Additionally, the article addresses the challenges associated with data privacy, ethical considerations, and the importance of transparency in decision-making processes. As machine learning and deep learning continue to reshape industries and human-computer interactions, this review provides a valuable resource for understanding their transformative impact and potential for future innovation.

  • Machine Learning Algorithms to Forecast Air Quality: A Survey

(Méndez, Merayo, & Núñez, 2023)

Field of Research: Environmental Science and Air Quality Forecasting

Air pollution poses significant health risks, making the accurate forecasting of pollutant concentrations crucial for public health and environmental management. This survey reviews machine learning algorithms, particularly deep learning models, that have been applied to air quality forecasting from 2011 to 2021. The paper classifies the contributions based on geographical distribution, predicted values, predictor variables, evaluation metrics, and machine learning models used, providing valuable insights into the state-of-the-art techniques in this important environmental domain.

The two review papers complement the top 10 research papers by giving readers a broader view of the state of the art in machine learning and its real-world applications. As the field continues to evolve, staying updated with both cutting-edge research and broader trends will be essential for leveraging machine learning's potential to tackle complex problems and create a positive impact on society.

Abdusalomov, A. B., Islam, B. M. S., Nasimov, R., Mukhiddinov, M., & Whangbo, T. K. (2023). An Improved Forest Fire Detection Method Based on the Detectron2 Model and a Deep Learning Approach. Sensors, 23(3), 1512.

Amygdalos, I., Müller‐Franzes, G., Bednarsch, J., Czigany, Z., Ulmer, T. F., Bruners, P., … Lang, S. A. (2023). Novel machine learning algorithm can identify patients at risk of poor overall survival following curative resection for colorectal liver metastases. Journal of Hepato-Biliary-Pancreatic Sciences, 30(5), 602–614.

Cheng, Y., Huang, X.-F., Peng, Y., Tang, M.-X., Zhu, B., Xia, S.-Y., & He, L.-Y. (2023). A novel machine learning method for evaluating the impact of emission sources on ozone formation. Environmental Pollution, 316, 120685.

Dao, F.-Y., Liu, M.-L., Su, W., Lv, H., Zhang, Z.-Y., Lin, H., & Liu, L. (2023). AcrPred: A hybrid optimization with enumerated machine learning algorithm to predict Anti-CRISPR proteins. International Journal of Biological Macromolecules, 228, 706–714.

Hissou, H., Benkirane, S., Guezzaz, A., Azrour, M., & Beni-Hssane, A. (2023). A Novel Machine Learning Approach for Solar Radiation Estimation. Sustainability, 15(13), 10609.

Khan, A., Qureshi, M., Daniyal, M., & Tawiah, K. (2023). A Novel Study on Machine Learning Algorithm-Based Cardiovascular Disease Prediction. Health & Social Care in the Community, 2023, 1–10.

Méndez, M., Merayo, M. G., & Núñez, M. (2023). Machine learning algorithms to forecast air quality: a survey. Artificial Intelligence Review, 56(9), 10031–10066.

Peng, Y., & Unluer, C. (2023). Modeling the mechanical properties of recycled aggregate concrete using hybrid machine learning algorithms. Resources, Conservation and Recycling, 190, 106812.

Shaikh, K., Hussain, I., & Chowdhry, B. S. (2023). Wheel Defect Detection Using a Hybrid Deep Learning Approach. Sensors, 23(14), 6248.

Sharifani, K., & Amini, M. (2023). Machine Learning and Deep Learning: A Review of Methods and Applications. World Information Technology and Engineering Journal, 10(7), 3897–3904. Retrieved from

Zerrougui, R., Adamou-Mitiche, A. B. H., & Mitiche, L. (2023). A novel machine learning algorithm for interval systems approximation based on artificial neural network. Journal of Intelligent Manufacturing, 34(5), 2171–2184.

Zhang, H., & Wang, D. (2023). Deep MCANC: A deep learning approach to multi-channel active noise control. Neural Networks, 158, 318–327.

A Complete Guide To Customer Acquisition For Startups

Customers keep any business alive, so a business constantly needs a way to bring in new ones. A sound customer acquisition strategy is therefore very important.

So, if you are just starting your business, or planning to expand it, read on to learn more about this concept.

The problem with customer acquisition

As an organization, when working in a diverse and competitive market like India, you need to have a well-defined customer acquisition strategy to attain success. However, this is where most startups struggle. Now, you may have a great product or service, but if you are not in the right place targeting the right demographic, you are not likely to get the results you want.

To resolve this, companies typically invest money, but the spending is futile if it is not channelled properly.

So, the best way out of this dilemma is to have a clear customer acquisition strategy in place.

How can you create the ideal customer acquisition strategy for your business?

  • Define what your goals are

Define your goals so that you can meet your revenue expectations for the current fiscal year. To do that, you need to put a value on the following metrics:

  • MRR – Monthly recurring revenue, the income you can expect to receive every month from all your revenue channels.
  • CLV – Customer lifetime value, the total amount a customer is expected to spend with your business over the duration of your relationship.
  • CAC – Customer acquisition cost, the amount your organization needs to spend to acquire a new customer.
  • Churn rate – The rate at which customers stop doing business with you.

Together, these metrics tell you how well you will be able to grow your business and revenue.
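As a rough illustration of how these metrics relate, the sketch below computes them from made-up monthly figures. The formulas are common simplifications (for example, CLV taken as monthly revenue per customer divided by monthly churn) and all numbers and function names are hypothetical, not taken from any real business.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers who stopped doing business during the period."""
    return customers_lost / customers_at_start


def customer_acquisition_cost(total_spend: float, new_customers: int) -> float:
    """Average acquisition spend per newly acquired customer."""
    return total_spend / new_customers


def customer_lifetime_value(avg_monthly_revenue: float, monthly_churn: float) -> float:
    """Simplified CLV: monthly revenue per customer times expected lifetime (1 / churn)."""
    return avg_monthly_revenue / monthly_churn


# Hypothetical figures for one month (illustrative assumptions only).
customers_at_start = 200
customers_lost = 10
new_customers = 25
acquisition_spend = 50_000.0
avg_monthly_revenue = 1_000.0  # revenue per customer per month

mrr = customers_at_start * avg_monthly_revenue
churn = churn_rate(customers_at_start, customers_lost)
cac = customer_acquisition_cost(acquisition_spend, new_customers)
clv = customer_lifetime_value(avg_monthly_revenue, churn)

print(f"MRR: {mrr:.0f}, churn: {churn:.1%}, CAC: {cac:.0f}, CLV: {clv:.0f}")
```

A frequently used sanity check is to compare CLV with CAC: if acquiring a customer costs more than that customer is expected to bring in, the acquisition channel is losing money.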

  • Identify your ideal customers

You need to understand who your current customers are and who your target customers are. Once you know your customer base, you can focus your energies in that direction and maximize the sales of your products or services. You can also use analytics to understand what your customers require, and then position your products or services to address those needs.

  • Choose your channels for customer acquisition

How you acquire customers will eventually determine the scale and rate at which you need to expand your business. You could market and sell your products on social media channels like Instagram, Facebook and YouTube, or invest in paid marketing like Google Ads. You need to develop a unique strategy for each of these channels.

  • Communicate with your customers

If you know exactly what your customers have in mind, you can develop your customer strategy from a clear perspective. You can find out through surveys or customer opinion forms, email contact forms, blog posts and social media posts. After that, you just need to measure the analytics, understand the insights, and improve your strategy accordingly.

Combining these strategies with your long-term business plan will bring results. However, there will be challenges on the way, where you need to adapt as per the requirements to make the most of it. At the same time, introducing new technologies like AI and ML can also solve such issues easily. To learn more about the use of AI and ML and how they are transforming businesses, keep referring to the blog section of E2E Networks.

Image-based 3D Object Reconstruction State-of-the-Art and trends in the Deep Learning Era

3D reconstruction is one of the most complex problems for deep learning systems. It has been researched extensively across computer vision, computer graphics and machine learning, with limited success. More recently, convolutional neural networks (CNNs) have been applied to the problem and have yielded some success.

The Main Objective of the 3D Object Reconstruction

The aim of this deep learning technology is to infer the shape of 3D objects from 2D images. To do so, you need the following:

  • Well-calibrated cameras that photograph the object from various angles.
  • Large training datasets from which the geometry of the object to be reconstructed can be predicted. These datasets can be collected from a database of images, or sampled from a video.

With this apparatus and these datasets, you can proceed with the 3D reconstruction from 2D images.

State-of-the-art Technology Used by the Datasets for the Reconstruction of 3D Objects

The technology used for this purpose works within the following parameters:

Training uses one or more RGB images, for which the 3D ground truth needs to be segmented. The input can be a single image, multiple images or even a video stream.

Testing uses the same kinds of input, with the object shown against a uniform background, a cluttered background, or both.

The volumetric output can be produced at both high and low resolution, and the surface output is generated through parameterisation, template deformation or a point cloud. Both direct and intermediate outputs are produced this way.
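To make the difference between a point-cloud (surface) output and a volumetric output at low or high resolution more concrete, here is a minimal sketch that converts a small point cloud into a set of occupied voxels. The `voxelize` helper and the sample points are hypothetical, and the code assumes all coordinates lie in the unit cube; it is not taken from any specific reconstruction system.

```python
def voxelize(points, resolution):
    """Map points in the unit cube [0, 1]^3 to a set of occupied voxel indices."""
    occupied = set()
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        idx = tuple(min(int(c * resolution), resolution - 1) for c in (x, y, z))
        occupied.add(idx)
    return occupied


# Hypothetical point cloud: four points near one corner, one far away.
cloud = [
    (0.05, 0.05, 0.05),
    (0.10, 0.05, 0.05),
    (0.20, 0.20, 0.20),
    (0.30, 0.10, 0.10),
    (0.90, 0.90, 0.90),
]

low = voxelize(cloud, resolution=2)   # coarse 2 x 2 x 2 grid
high = voxelize(cloud, resolution=8)  # finer 8 x 8 x 8 grid

print(len(low), "occupied voxels at low resolution")
print(len(high), "occupied voxels at high resolution")
```

The coarse grid merges nearby points into the same cell, while the finer grid keeps more of the shape at the cost of more cells to store and process, which is the usual trade-off between low- and high-resolution volumetric outputs.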

  • Network architecture used

The architecture used in training is 3D-VAE-GAN, which has an encoder and a decoder, with TL-Net and conditional GAN. The testing architecture is 3D-VAE, which also has an encoder and a decoder.

  • Training used

The degree of supervision (2D versus 3D supervision, or weak supervision) and the loss functions have to be specified for this system. The training procedure is adversarial training with joint 2D and 3D embeddings. The network architecture also strongly affects the processing speed and the quality of the output images.

  • Practical applications and use cases

The reconstruction can be done with either volumetric or surface representations, and it requires powerful computer systems.

Given below are some of the places where 3D Object Reconstruction Deep Learning Systems are used:

  • Police departments can use 3D reconstruction technology to reconstruct the faces of criminals from crime-scene images in which their faces are not completely revealed.
  • It can be used for re-modelling ruins at ancient architectural sites. The rubble or the remaining stubs of structures can be used to recreate the entire building and get an idea of how it looked in the past.
  • It can be used in plastic surgery where organs, the face, limbs or any other portion of the body has been damaged and needs to be rebuilt.
  • It can be used in airport security, where concealed shapes can help to judge whether a person is armed or carrying explosives.
  • It can also help in completing DNA sequences.

So, if you are planning to implement this technology, then you can rent the required infrastructure from E2E Networks and avoid investing in it. And if you plan to learn more about such topics, then keep a tab on the blog section of the website.

A Comprehensive Guide To Deep Q-Learning For Data Science Enthusiasts

For all data science enthusiasts who would love to dig deep, we have composed a write-up about Q-Learning specifically for you. Deep Q-Learning and Reinforcement Learning (RL) are extremely popular these days. They are commonly implemented with Python libraries such as TensorFlow 2 and OpenAI's Gym environments.

So, read on to know more.

What is Deep Q-Learning?

Deep Q-Learning applies the principles of Q-learning, but replaces the Q-table with a neural network. The network takes a state as input and outputs the estimated Q-value of every possible action. The agent stores each past experience in memory as a tuple of:

State > Action > Reward > Next state

Experience replay means storing these previous experiences and training on random batches drawn from them, which makes the neural network's training more stable. A separate target network is used to compute the target Q-values against which the Q-network's predicted Q-values are trained. The example environment is Taxi-v3, which is provided by OpenAI Gym.
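The experience replay idea described above can be sketched with a small buffer built on the standard library. This is a minimal illustration, not the implementation of any particular library: the `ReplayBuffer` class, the `Experience` tuple and the extra `done` flag are assumptions made for the example.

```python
import random
from collections import deque, namedtuple

# One stored experience; the `done` flag is an added assumption that marks
# whether the episode ended after this step.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size memory that discards the oldest experiences when full."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # A random batch breaks the correlation between consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage: push five steps into a buffer that only holds three.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=-1.0, next_state=step + 1, done=False)

batch = buffer.sample(batch_size=2)
print(len(buffer), [e.state for e in batch])
```

In a full Deep Q-Learning agent, each sampled batch would be fed to the Q-network for one training step, with the target network supplying the target Q-values.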

Now, any understanding of Deep Q-Learning is incomplete without talking about Reinforcement Learning.

What is Reinforcement Learning?

Reinforcement learning is a subfield of ML in which an agent acts in an environment within a reward-based system and learns to maximize the rewards it receives. It differs from supervised and unsupervised learning because it does not require labelled input/output pairs. It also needs fewer explicit corrections, since the agent learns from the rewards it receives.

Now, the understanding of reinforcement learning is incomplete without knowing about the Markov Decision Process (MDP). In an MDP, each state the environment presents results from the state before it. The information from both states is gathered and passed to the decision process. The agent's task is to maximize the rewards. The MDP framework is used to optimize the actions and construct the optimal policy.

To solve the MDP, you can follow the Q-Learning algorithm, which is an extremely important part of data science and machine learning.

What is the Q-Learning Algorithm?

Q-Learning learns from scratch, through experience. It involves defining the parameters, choosing actions from the current and previous states, and then building a Q-table that is used to maximize the output rewards.

The 4 steps involved in Q-Learning are:

  • Initializing parameters – The RL (reinforcement learning) model learns the set of actions that the agent requires in the state, environment and time.
  • Identifying the current state – The model stores prior records to define the optimal action for maximizing the results. To act in the present state, the state must first be identified, and an action combination is then performed for it.
  • Choosing the optimal action set and gaining the relevant experience – A Q-table is generated from the data with a set of specific states and actions, and the weight of this data is calculated to update the Q-table in the following step.
  • Updating Q-table rewards and determining the next state – After the relevant experience is gained, the agent starts receiving records from the environment. The reward amplitude helps to determine the subsequent step.
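The four steps above can be sketched as a small tabular Q-learning loop. To stay self-contained, the example uses a hypothetical five-state corridor instead of the Taxi-v3 environment; the `step` function, the reward values and the hyperparameters are all assumptions chosen for illustration. The update applied is the standard Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', ·) − Q(s, a)].

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 10.0 if done else -1.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the parameters and an all-zero Q-table.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False  # Step 2: identify the current state.
        while not done:
            # Step 3: choose an action (epsilon-greedy) and gain experience.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Step 4: update the Q-table, then move on to the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-goal states: the action with the highest Q-value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should choose "move right" in every non-goal state, since that is the shortest path to the reward. The Q-table here has only 5 × 2 entries; the table grows with the number of states and actions, which is what motivates replacing it with a neural network.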

If the Q-table is very large, generating the model becomes a time-consuming process. This is the situation in which Deep Q-learning is needed.

Hopefully, this write-up has provided an outline of Deep Q-Learning and its related concepts. If you wish to learn more about such topics, keep an eye on the blog section of the E2E Networks website.

GAUDI: A Neural Architect for Immersive 3D Scene Generation

The evolution of artificial intelligence in the past decade has been staggering, and the focus is now shifting towards AI and ML systems that can understand and generate 3D spaces. As a result, there has been extensive research on 3D generative models. In this regard, Apple’s AI and ML scientists have developed GAUDI, a method built specifically for this job.

An introduction to GAUDI

The creators of GAUDI named it after the famous architect Antoni Gaudí. The model uses a camera pose decoder, which lets it predict the possible camera poses of a scene. This in turn makes it possible to render the 3D scene from almost any angle.

What does GAUDI do?

GAUDI can perform multiple functions –

  • Extending generative models to 3D has a tremendous effect on ML and computer vision. In practice, such models are highly useful. They are applied in model-based reinforcement learning and planning, world models, SLAM, and 3D content creation.
  • Generative modelling for 3D objects and scenes has been done with GRAF, pi-GAN, and GSN, which incorporate a GAN (Generative Adversarial Network). In these models, the generator encodes a radiance field. Given a point in the 3D space of the scene and a camera pose, the radiance field returns a density scalar and an RGB value for that specific point, and these are used to render the image seen from that camera. This can be learned from 2D camera views, by imposing 3D structure on those 2D shots. It isolates various objects and scenes and combines them to render a new scene altogether.
  • GAUDI also avoids GAN pathologies such as mode collapse.
  • GAUDI also uses this to train data on a canonical coordinate system. You can compare it by looking at the trajectory of the scenes.
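The idea that a radiance field returns a density scalar and an RGB value for each 3D point, and that an image is rendered from a camera view by combining these along each ray, can be sketched as follows. This is not GAUDI's code: `toy_radiance_field` is a hypothetical hand-written field (a red sphere), and `render_ray` uses the standard volume-rendering quadrature common to NeRF-style models, assumed here purely for illustration.

```python
import math


def toy_radiance_field(point):
    """Hypothetical field: a solid red sphere of radius 1 centred at the origin.

    Returns (density, rgb) for a 3D point, as a radiance field would.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    density = 5.0 if inside else 0.0
    rgb = (1.0, 0.0, 0.0)
    return density, rgb


def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Composite the colour seen along one camera ray.

    Each sample contributes its colour weighted by its own opacity and by
    how much light survives (transmittance) on the way to the camera.
    """
    delta = (far - near) / n_samples
    transmittance = 1.0
    colour = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        density, rgb = toy_radiance_field(point)
        alpha = 1.0 - math.exp(-density * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            colour[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(colour), 1.0 - transmittance


# A ray through the sphere's centre and a ray that misses the sphere.
hit_colour, hit_opacity = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_colour, miss_opacity = render_ray((3.0, 0.0, -2.0), (0.0, 0.0, 1.0))
print(hit_colour, hit_opacity)
print(miss_colour, miss_opacity)
```

Rendering a full image repeats this for one ray per pixel, with the ray origins and directions determined by the camera pose. In a learned model, the hand-written field would be replaced by a neural network.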

How is GAUDI applied to the content?

The steps for applying GAUDI are given below:

  • Each trajectory, which consists of a sequence of posed images from a 3D scene, is encoded into a latent representation. This representation captures the radiance field (what we refer to as the 3D scene) and the camera path in a disentangled way. The latents are treated as free parameters, and the problem is formulated as the optimization of a reconstruction objective.
  • This simple training process is then scaled to thousands of trajectories, which together provide a large number of views. New radiance fields are sampled entirely from the prior distribution that the model has learned.
  • Scenes can also be synthesized by interpolation within the latent space.
  • Scaling to 3D scenes means training on many scenes that contain thousands of images. During training, there is no issue related to canonical orientation or mode collapse.
  • A novel denoising optimization technique is used to find latent representations that jointly model the camera poses and the radiance field. This gives state-of-the-art performance in generating 3D scenes across multiple datasets, in a setup that can be conditioned on images and text.
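Synthesizing scenes by interpolation in the latent space, as described in the steps above, can be illustrated with a small sketch. It assumes simple linear interpolation between two latent vectors; the function name `interpolate_latents` and the two example latents are hypothetical, and in a real system each interpolated latent would be passed to the trained decoder (not shown here) to produce a radiance field and camera path.

```python
def interpolate_latents(z_start, z_end, n_steps):
    """Return n_steps latents moving linearly from z_start to z_end."""
    if len(z_start) != len(z_end):
        raise ValueError("latents must have the same dimensionality")
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    path = []
    for k in range(n_steps):
        w = k / (n_steps - 1)  # 0.0 at the start, 1.0 at the end
        path.append([(1.0 - w) * a + w * b for a, b in zip(z_start, z_end)])
    return path


# Two made-up 2-dimensional latents standing in for two encoded scenes.
z_a = [0.0, 1.0]
z_b = [1.0, 3.0]
latent_path = interpolate_latents(z_a, z_b, 5)
print(latent_path[2])  # → [0.5, 2.0]
```

Decoding every latent along `latent_path` would give a sequence of scenes that gradually changes from the first scene to the second.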

To conclude, GAUDI has further capabilities and can also be used for sampling from various image and video datasets. Furthermore, it is likely to make a foray into AR (augmented reality) and VR (virtual reality). With GAUDI in hand, the sky is the limit in the field of media creation. So, if you enjoy reading about the latest developments in the field of AI and ML, keep an eye on the blog section of the E2E Networks website.

Journal of Machine Learning Research

The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.

  • 2023.01.20: Volume 23 completed; Volume 24 began.
  • 2022.07.20: New special issue on climate change.
  • 2022.02.18: New blog post: Retrospectives from 20 Years of JMLR.
  • 2022.01.25: Volume 22 completed; Volume 23 began.
  • 2021.12.02: Message from outgoing co-EiC Bernhard Schölkopf.
  • 2021.02.10: Volume 21 completed; Volume 22 began.

Latest papers

Bagging in overparameterized learning: Risk characterization and risk monotonization Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla , 2023. [ abs ][ pdf ][ bib ]

Operator learning with PCA-Net: upper and lower complexity bounds Samuel Lanthaler , 2023. [ abs ][ pdf ][ bib ]

Mixed Regression via Approximate Message Passing Nelvin Tan, Ramji Venkataramanan , 2023. [ abs ][ pdf ][ bib ]      [ code ]

The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima Peter L. Bartlett, Philip M. Long, Olivier Bousquet , 2023. [ abs ][ pdf ][ bib ]

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Fast Expectation Propagation for Heteroscedastic, Lasso-Penalized, and Quantile Regression Jackson Zhou, John T. Ormerod, Clara Grazian , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Zeroth-Order Alternating Gradient Descent Ascent Algorithms for A Class of Nonconvex-Nonconcave Minimax Problems Zi Xu, Zi-Qi Wang, Jun-Lin Wang, Yu-Hong Dai , 2023. [ abs ][ pdf ][ bib ]

The Measure and Mismeasure of Fairness Sam Corbett-Davies, Johann D. Gaebler, Hamed Nilforoshan, Ravi Shroff, Sharad Goel , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Microcanonical Hamiltonian Monte Carlo Jakob Robnik, G. Bruno De Luca, Eva Silverstein, Uroš Seljak , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Prediction Equilibrium for Dynamic Network Flows Lukas Graf, Tobias Harks, Kostas Kollias, Michael Markl , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Dimension Reduction and MARS Yu Liu LIU, Degui Li, Yingcun Xia , 2023. [ abs ][ pdf ][ bib ]

Nevis'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuan Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc’Aurelio Ranzato , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Fast Screening Rules for Optimal Design via Quadratic Lasso Reformulation Guillaume Sagnol, Luc Pronzato , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Multi-Consensus Decentralized Accelerated Gradient Descent Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang , 2023. [ abs ][ pdf ][ bib ]

Continuous-in-time Limit for Bayesian Bandits Yuhua Zhu, Zachary Izzo, Lexing Ying , 2023. [ abs ][ pdf ][ bib ]

Two Sample Testing in High Dimension via Maximum Mean Discrepancy Hanjia Gao, Xiaofeng Shao , 2023. [ abs ][ pdf ][ bib ]

Random Feature Amplification: Feature Learning and Generalization in Neural Networks Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett , 2023. [ abs ][ pdf ][ bib ]

Pivotal Estimation of Linear Discriminant Analysis in High Dimensions Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao , 2023. [ abs ][ pdf ][ bib ]

Learning Optimal Feedback Operators and their Sparse Polynomial Approximations Karl Kunisch, Donato Vásquez-Varas, Daniel Walter , 2023. [ abs ][ pdf ][ bib ]

Sensitivity-Free Gradient Descent Algorithms Ion Matei, Maksym Zhenirovskyy, Johan de Kleer, John Maxwell , 2023. [ abs ][ pdf ][ bib ]

A PDE approach for regret bounds under partial monitoring Erhan Bayraktar, Ibrahim Ekren, Xin Zhang , 2023. [ abs ][ pdf ][ bib ]

A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning Arrasy Rahman, Ignacio Carlucho, Niklas Höpner, Stefano V. Albrecht , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Causal Bandits for Linear Structural Equation Models Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, Ali Tajer , 2023. [ abs ][ pdf ][ bib ]

High-Dimensional Inference for Generalized Linear Models with Hidden Confounding Jing Ouyang, Kean Ming Tan, Gongjun Xu , 2023. [ abs ][ pdf ][ bib ]

Weibull Racing Survival Analysis with Competing Events, Left Truncation, and Time-Varying Covariates Quan Zhang, Yanxun Xu, Mei-Cheng Wang, Mingyuan Zhou , 2023. [ abs ][ pdf ][ bib ]

Erratum: Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm Louis-Philippe Vignault, Audrey Durand, Pascal Germain , 2023. [ abs ][ pdf ][ bib ]

Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models Molei Liu, Yi Zhang, Katherine P. Liao, Tianxi Cai , 2023. [ abs ][ pdf ][ bib ]

From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms Weijie Zheng, Benjamin Doerr , 2023. [ abs ][ pdf ][ bib ]

A Unified Analysis of Multi-task Functional Linear Regression Models with Manifold Constraint and Composite Quadratic Penalty Shiyuan He, Hanxuan Ye, Kejun He , 2023. [ abs ][ pdf ][ bib ]

Deletion and Insertion Tests in Regression Models Naofumi Hama, Masayoshi Mase, Art B. Owen , 2023. [ abs ][ pdf ][ bib ]

Deep Neural Networks with Dependent Weights: Gaussian Process Mixture Limit, Heavy Tails, Sparsity and Compressibility Hoil Lee, Fadhel Ayed, Paul Jung, Juho Lee, Hongseok Yang, Francois Caron , 2023. [ abs ][ pdf ][ bib ]      [ code ]

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits Yasin Abbasi-Yadkori, András György, Nevena Lazić , 2023. [ abs ][ pdf ][ bib ]

Universal Approximation Property of Invertible Neural Networks Isao Ishikawa, Takeshi Teshima, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama , 2023. [ abs ][ pdf ][ bib ]

Low Tree-Rank Bayesian Vector Autoregression Models Leo L Duan, Zeyu Yuwen, George Michailidis, Zhengwu Zhang , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Generic Unsupervised Optimization for a Latent Variable Model With Exponential Family Observables Hamid Mousavi, Jakob Drefs, Florian Hirschberger, Jörg Lücke , 2023. [ abs ][ pdf ][ bib ]      [ code ]

A Complete Characterization of Linear Estimators for Offline Policy Evaluation Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade , 2023. [ abs ][ pdf ][ bib ]

Near-Optimal Weighted Matrix Completion Oscar López , 2023. [ abs ][ pdf ][ bib ]

Community models for networks observed through edge nominations Tianxi Li, Elizaveta Levina, Ji Zhu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

The Bayesian Learning Rule Mohammad Emtiyaz Khan, Håvard Rue , 2023. [ abs ][ pdf ][ bib ]

Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD Kun Yuan, Sulaiman A. Alghunaim, Xinmeng Huang , 2023. [ abs ][ pdf ][ bib ]

Sparse Markov Models for High-dimensional Inference Guilherme Ost, Daniel Y. Takahashi , 2023. [ abs ][ pdf ][ bib ]

Distinguishing Cause and Effect in Bivariate Structural Causal Models: A Systematic Investigation Christoph Käding, Jakob Runge , 2023. [ abs ][ pdf ][ bib ]

Elastic Gradient Descent, an Iterative Optimization Method Approximating the Solution Paths of the Elastic Net Oskar Allerbo, Johan Jonasson, Rebecka Jörnsten , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On Biased Compression for Distributed Learning Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan , 2023. [ abs ][ pdf ][ bib ]

Adaptive Clustering Using Kernel Density Estimators Ingo Steinwart, Bharath K. Sriperumbudur, Philipp Thomann , 2023. [ abs ][ pdf ][ bib ]

A Continuous-time Stochastic Gradient Descent Method for Continuous Data Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb , 2023. [ abs ][ pdf ][ bib ]

Online Non-stochastic Control with Partial Feedback Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou , 2023. [ abs ][ pdf ][ bib ]

Distributed Sparse Regression via Penalization Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa , 2023. [ abs ][ pdf ][ bib ]

Causal Discovery with Unobserved Confounding and Non-Gaussian Data Y. Samuel Wang, Mathias Drton , 2023. [ abs ][ pdf ][ bib ]

Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation Xiao-Tong Yuan, Ping Li , 2023. [ abs ][ pdf ][ bib ]

Dynamic Ranking with the BTL Model: A Nearest Neighbor based Rank Centrality Method Eglantine Karlé, Hemant Tyagi , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Revisiting minimum description length complexity in overparameterized models Raaz Dwivedi, Chandan Singh, Bin Yu, Martin Wainwright , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach Dimitris Bertsimas, Ryan Cory-Wright, Nicholas A. G. Johnson , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On the Estimation of Derivatives Using Plug-in Kernel Ridge Regression Estimators Zejian Liu, Meng Li , 2023. [ abs ][ pdf ][ bib ]

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction Jue Hou, Zijian Guo, Tianxi Cai , 2023. [ abs ][ pdf ][ bib ]

ProtoryNet - Interpretable Text Classification Via Prototype Trajectories Dat Hong, Tong Wang, Stephen Baek , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Distributed Algorithms for U-statistics-based Empirical Risk Minimization Lanjue Chen, Alan T.K. Wan, Shuyi Zhang, Yong Zhou , 2023. [ abs ][ pdf ][ bib ]

Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training? Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su , 2023. [ abs ][ pdf ][ bib ]

Nearest Neighbor Dirichlet Mixtures Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Learning to Rank under Multinomial Logit Choice James A. Grant, David S. Leslie , 2023. [ abs ][ pdf ][ bib ]

Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance Ray Bai, Mary R. Boland, Yong Chen , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Multi-view Collaborative Gaussian Process Dynamical Systems Shiliang Sun, Jingjing Fei, Jing Zhao, Liang Mao , 2023. [ abs ][ pdf ][ bib ]

Fairlearn: Assessing and Improving Fairness of AI Systems Hilde Weerts, Miroslav Dudík, Richard Edgar, Adrin Jalali, Roman Lutz, Michael Madaio , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures Mike Heddes, Igor Nunes, Pere Vergés, Denis Kleyko, Danny Abraham, Tony Givargis, Alexandru Nicolau, Alexander Veidenbaum , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

skrl: Modular and Flexible Library for Reinforcement Learning Antonio Serrano-Muñoz, Dimitrios Chrysostomou, Simon Bøgh, Nestor Arana-Arexolaleiba , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model Alexandra Sasha Luccioni, Sylvain Viguier, Anne-Laure Ligozat , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Adaptive False Discovery Rate Control with Privacy Guarantee Xintao Xia, Zhanrui Cai , 2023. [ abs ][ pdf ][ bib ]

Atlas: Few-shot Learning with Retrieval Augmented Language Models Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Convex Reinforcement Learning in Finite Trials Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli , 2023. [ abs ][ pdf ][ bib ]

Unbiased Multilevel Monte Carlo Methods for Intractable Distributions: MLMC Meets MCMC Tianze Wang, Guanyang Wang , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Improving multiple-try Metropolis with local balancing Philippe Gagnon, Florian Maire, Giacomo Zanella , 2023. [ abs ][ pdf ][ bib ]

Importance Sparsification for Sinkhorn Algorithm Mengyu Li, Jun Yu, Tao Li, Cheng Meng , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Graph Attention Retrospective Kimon Fountoulakis, Amit Levi, Shenghao Yang, Aseem Baranwal, Aukosh Jagannath , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Confidence Intervals and Hypothesis Testing for High-dimensional Quantile Regression: Convolution Smoothing and Debiasing Yibo Yan, Xiaozhou Wang, Riquan Zhang , 2023. [ abs ][ pdf ][ bib ]

Selection by Prediction with Conformal p-values Ying Jin, Emmanuel J. Candes , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics Kamélia Daudel, Joe Benton, Yuyang Shi, Arnaud Doucet , 2023. [ abs ][ pdf ][ bib ]

Sparse Graph Learning from Spatiotemporal Time Series Andrea Cini, Daniele Zambon, Cesare Alippi , 2023. [ abs ][ pdf ][ bib ]

Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning Zhuang Yang , 2023. [ abs ][ pdf ][ bib ]

PaLM: Scaling Language Modeling with Pathways Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel , 2023. [ abs ][ pdf ][ bib ]

Leaky Hockey Stick Loss: The First Negatively Divergent Margin-based Loss Function for Classification Oh-Ran Kwon, Hui Zou , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Efficient Computation of Rankings from Pairwise Comparisons M. E. J. Newman , 2023. [ abs ][ pdf ][ bib ]

Scalable Computation of Causal Bounds Madhumitha Shridharan, Garud Iyengar , 2023. [ abs ][ pdf ][ bib ]

Neural Q-learning for solving PDEs Samuel N. Cohen, Deqing Jiang, Justin Sirignano , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models Ziyue Wang, Zhiqiang Tan , 2023. [ abs ][ pdf ][ bib ]      [ code ]

MultiZoo and MultiBench: A Standardized Toolkit for Multimodal Deep Learning Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Strategic Knowledge Transfer Max Olan Smith, Thomas Anthony, Michael P. Wellman , 2023. [ abs ][ pdf ][ bib ]

Lifted Bregman Training of Neural Networks Xiaoyu Wang, Martin Benning , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Statistical Comparisons of Classifiers by Generalized Stochastic Dominance Christoph Jansen, Malte Nalenz, Georg Schollmeyer, Thomas Augustin , 2023. [ abs ][ pdf ][ bib ]

Sample Complexity for Distributionally Robust Learning under chi-square divergence Zhengyu Zhou, Weiwei Liu , 2023. [ abs ][ pdf ][ bib ]

Interpretable and Fair Boolean Rule Sets via Column Generation Connor Lawless, Sanjeeb Dash, Oktay Gunluk, Dennis Wei , 2023. [ abs ][ pdf ][ bib ]

On the Optimality of Nuclear-norm-based Matrix Completion for Problems with Smooth Non-linear Structure Yunhua Xiang, Tianyu Zhang, Xu Wang, Ali Shojaie, Noah Simon , 2023. [ abs ][ pdf ][ bib ]

Autoregressive Networks Binyan Jiang, Jialiang Li, Qiwei Yao , 2023. [ abs ][ pdf ][ bib ]

Merlion: End-to-End Machine Learning for Time Series Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Limits of Dense Simplicial Complexes T. Mitchell Roddenberry, Santiago Segarra , 2023. [ abs ][ pdf ][ bib ]

RankSEG: A Consistent Ranking-based Framework for Segmentation Ben Dai, Chunlin Li , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data Bingqing Hu, Bin Nan , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees Mo Zhou, Jianfeng Lu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices Doudou Zhou, Tianxi Cai, Junwei Lu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Functional L-Optimality Subsampling for Functional Generalized Linear Models with Massive Data Hua Liu, Jinhong You, Jiguo Cao , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Adaptation Augmented Model-based Policy Optimization Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang , 2023. [ abs ][ pdf ][ bib ]

GANs as Gradient Flows that Converge Yu-Jui Huang, Yuchong Zhang , 2023. [ abs ][ pdf ][ bib ]

Random Forests for Change Point Detection Malte Londschien, Peter Bühlmann, Solt Kovács , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Least Squares Model Averaging for Distributed Data Haili Zhang, Zhaobo Liu, Guohua Zou , 2023. [ abs ][ pdf ][ bib ]

An Empirical Investigation of the Role of Pre-training in Lifelong Learning Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications Marcel Wienöbst, Max Bannach, Maciej Liśkiewicz , 2023. [ abs ][ pdf ][ bib ]      [ code ]

An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity Wei Liu, Xin Liu, Xiaojun Chen , 2023. [ abs ][ pdf ][ bib ]

Entropic Fictitious Play for Mean Field Optimization Problem Fan Chen, Zhenjie Ren, Songbo Wang , 2023. [ abs ][ pdf ][ bib ]

GFlowNet Foundations Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio , 2023. [ abs ][ pdf ][ bib ]

LibMTL: A Python Library for Deep Multi-Task Learning Baijiong Lin, Yu Zhang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Minimax Risk Classifiers with 0-1 Loss Santiago Mazuelas, Mauricio Romero, Peter Grunwald , 2023. [ abs ][ pdf ][ bib ]

Augmented Sparsifiers for Generalized Hypergraph Cuts Nate Veldt, Austin R. Benson, Jon Kleinberg , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Non-stationary Online Learning with Memory and Non-stochastic Control Peng Zhao, Yu-Hu Yan, Yu-Xiang Wang, Zhi-Hua Zhou , 2023. [ abs ][ pdf ][ bib ]

L0Learn: A Scalable Package for Sparse Learning using L0 Regularization Hussein Hazimeh, Rahul Mazumder, Tim Nonet , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Buffered Asynchronous SGD for Byzantine Learning Yi-Rui Yang, Wu-Jun Li , 2023. [ abs ][ pdf ][ bib ]

A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points Lili Su, Jiaming Xu, Pengkun Yang , 2023. [ abs ][ pdf ][ bib ]

Multiplayer Performative Prediction: Learning in Decision-Dependent Games Adhyyan Narang, Evan Faulkner, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Variational Inverting Network for Statistical Inverse Problems of Partial Differential Equations Junxiong Jia, Yanni Wu, Peijun Li, Deyu Meng , 2023. [ abs ][ pdf ][ bib ]

Model-based Causal Discovery for Zero-Inflated Count Data Junsouk Choi, Yang Ni , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity Ali Kara, Naci Saldi, Serdar Yüksel , 2023. [ abs ][ pdf ][ bib ]

CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges Adrien Pavao, Isabelle Guyon, Anne-Catherine Letournel, Dinh-Tuan Tran, Xavier Baro, Hugo Jair Escalante, Sergio Escalera, Tyler Thomas, Zhen Xu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency Tetiana Gorbach, Xavier de Luna, Juha Karvanen, Ingeborg Waernbaum , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data Vaidotas Simkus, Benjamin Rhodes, Michael U. Gutmann , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Clustering and Structural Robustness in Causal Diagrams Santtu Tikka, Jouni Helske, Juha Karvanen , 2023. [ abs ][ pdf ][ bib ]      [ code ]

MMD Aggregated Two-Sample Test Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Divide-and-Conquer Fusion Ryan S.Y. Chan, Murray Pollock, Adam M. Johansen, Gareth O. Roberts , 2023. [ abs ][ pdf ][ bib ]

PAC-learning for Strategic Classification Ravi Sundaram, Anil Vullikanti, Haifeng Xu, Fan Yao , 2023. [ abs ][ pdf ][ bib ]

Insights into Ordinal Embedding Algorithms: A Systematic Evaluation Leena Chennuru Vankadara, Michael Lohaus, Siavash Haghiri, Faiz Ul Wahab, Ulrike von Luxburg , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maximilian Teegen, Ulrike von Luxburg , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Random Feature Neural Networks Learn Black-Scholes Type PDEs Without Curse of Dimensionality Lukas Gonon , 2023. [ abs ][ pdf ][ bib ]

The Proximal ID Algorithm Ilya Shpitser, Zach Wood-Doughty, Eric J. Tchetgen Tchetgen , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Quantifying Network Similarity using Graph Cumulants Gecia Bravo-Hermsdorff, Lee M. Gunderson, Pierre-André Maugis, Carey E. Priebe , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Learning an Explicit Hyper-parameter Prediction Function Conditioned on Tasks Jun Shu, Deyu Meng, Zongben Xu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity Rodrigue Siry, Ryan Webster, Loic Simon, Julien Rabin , 2023. [ abs ][ pdf ][ bib ]

Metrizing Weak Convergence with Maximum Mean Discrepancies Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey , 2023. [ abs ][ pdf ][ bib ]

Quasi-Equivalence between Width and Depth of Neural Networks Fenglei Fan, Rongjie Lai, Ge Wang , 2023. [ abs ][ pdf ][ bib ]

Naive regression requires weaker assumptions than factor models to adjust for multiple cause confounding Justin Grimmer, Dean Knox, Brandon Stewart , 2023. [ abs ][ pdf ][ bib ]

Factor Graph Neural Networks Zhen Zhang, Mohammed Haroon Dupty, Fan Wu, Javen Qinfeng Shi, Wee Sun Lee , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Dropout Training is Distributionally Robust Optimal José Blanchet, Yang Kang, José Luis Montiel Olea, Viet Anh Nguyen, Xuhui Zhang , 2023. [ abs ][ pdf ][ bib ]

Variational Inference for Deblending Crowded Starfields Runjing Liu, Jon D. McAuliffe, Jeffrey Regier, The LSST Dark Energy Science Collaboration , 2023. [ abs ][ pdf ][ bib ]      [ code ]

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha , 2023. [ abs ][ pdf ][ bib ]

Comprehensive Algorithm Portfolio Evaluation using Item Response Theory Sevvandi Kandanaarachchi, Kate Smith-Miles , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Evaluating Instrument Validity using the Principle of Independent Mechanisms Patrick F. Burauel , 2023. [ abs ][ pdf ][ bib ]

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity Kaiqing Zhang, Sham M. Kakade, Tamer Basar, Lin F. Yang , 2023. [ abs ][ pdf ][ bib ]

Posterior Consistency for Bayesian Relevance Vector Machines Xiao Fang, Malay Ghosh , 2023. [ abs ][ pdf ][ bib ]

From Classification Accuracy to Proper Scoring Rules: Elicitability of Probabilistic Top List Predictions Johannes Resin , 2023. [ abs ][ pdf ][ bib ]

Beyond the Golden Ratio for Variational Inequality Algorithms Ahmet Alacaoglu, Axel Böhm, Yura Malitsky , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Incremental Learning in Diagonal Linear Networks Raphaël Berthier , 2023. [ abs ][ pdf ][ bib ]

Small Transformers Compute Universal Metric Embeddings Anastasis Kratsios, Valentin Debarnot, Ivan Dokmanić , 2023. [ abs ][ pdf ][ bib ]      [ code ]

DART: Distance Assisted Recursive Testing Xuechan Li, Anthony D. Sung, Jichun Xie , 2023. [ abs ][ pdf ][ bib ]

Inference on the Change Point under a High Dimensional Covariance Shift Abhishek Kaul, Hongjin Zhang, Konstantinos Tsampourakis, George Michailidis , 2023. [ abs ][ pdf ][ bib ]

Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo , 2023. [ abs ][ pdf ][ bib ]      [ code ]

A Parameter-Free Conditional Gradient Method for Composite Minimization under Hölder Condition Masaru Ito, Zhaosong Lu, Chuan He , 2023. [ abs ][ pdf ][ bib ]

Robust Methods for High-Dimensional Linear Learning Ibrahim Merad, Stéphane Gaïffas , 2023. [ abs ][ pdf ][ bib ]

A Framework and Benchmark for Deep Batch Active Learning for Regression David Holzmüller, Viktor Zaverkin, Johannes Kästner, Ingo Steinwart , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Preconditioned Gradient Descent for Overparameterized Nonconvex Burer--Monteiro Factorization with Global Optimality Certification Gavin Zhang, Salar Fattahi, Richard Y. Zhang , 2023. [ abs ][ pdf ][ bib ]

Flexible Model Aggregation for Quantile Regression Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani , 2023. [ abs ][ pdf ][ bib ]      [ code ]

q-Learning in Continuous Time Yanwei Jia, Xun Yu Zhou , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Multivariate Soft Rank via Entropy-Regularized Optimal Transport: Sample Efficiency and Generative Modeling Shoaib Bin Masud, Matthew Werenski, James M. Murphy, Shuchin Aeron , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Infinite-dimensional optimization and Bayesian nonparametric learning of stochastic differential equations Arnab Ganguly, Riten Mitra, Jinpu Zhou , 2023. [ abs ][ pdf ][ bib ]

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees Hamid Reza Feyzmahdavian, Mikael Johansson , 2023. [ abs ][ pdf ][ bib ]

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(epsilon^(-7/4)) Complexity Huan Li, Zhouchen Lin , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Integrating Random Effects in Deep Neural Networks Giora Simchoni, Saharon Rosset , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Adaptive Data Depth via Multi-Armed Bandits Tavor Baharav, Tze Leung Lai , 2023. [ abs ][ pdf ][ bib ]

Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees Jonathan Brophy, Zayd Hammoudeh, Daniel Lowd , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Consistent Model-based Clustering using the Quasi-Bernoulli Stick-breaking Process Cheng Zeng, Jeffrey W Miller, Leo L Duan , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Selective inference for k-means clustering Yiqun T. Chen, Daniela M. Witten , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Generalization error bounds for multiclass sparse linear classifiers Tomer Levy, Felix Abramovich , 2023. [ abs ][ pdf ][ bib ]

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Controlling Wasserstein Distances by Kernel Norms with Application to Compressive Statistical Learning Titouan Vayer, Rémi Gribonval , 2023. [ abs ][ pdf ][ bib ]

Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition Zhishuai Guo, Yan Yan, Zhuoning Yuan, Tianbao Yang , 2023. [ abs ][ pdf ][ bib ]

Stochastic Optimization under Distributional Drift Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui , 2023. [ abs ][ pdf ][ bib ]

Off-Policy Actor-Critic with Emphatic Weightings Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning Bokun Wang, Zhuoning Yuan, Yiming Ying, Tianbao Yang , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering Noirrit Kiran Chandra, Antonio Canale, David B. Dunson , 2023. [ abs ][ pdf ][ bib ]

Large sample spectral analysis of graph-based multi-manifold clustering Nicolas Garcia Trillos, Pengfei He, Chenghui Li , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On Tilted Losses in Machine Learning: Theory and Applications Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Optimal Convergence Rates for Distributed Nystroem Approximation Jian Li, Yong Liu, Weiping Wang , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Jump Interval-Learning for Individualized Decision Making with Continuous Treatments Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games Ben Hambly, Renyuan Xu, Huining Yang , 2023. [ abs ][ pdf ][ bib ]

Asymptotics of Network Embeddings Learned via Subsampling Andrew Davison, Morgane Austern , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks Hui Jin, Guido Montufar , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection Wenhao Li, Ningyuan Chen, L. Jeff Hong , 2023. [ abs ][ pdf ][ bib ]

Sparse GCA and Thresholded Gradient Descent Sheng Gao, Zongming Ma , 2023. [ abs ][ pdf ][ bib ]

MARS: A Second-Order Reduction Algorithm for High-Dimensional Sparse Precision Matrices Estimation Qian Li, Binyan Jiang, Defeng Sun , 2023. [ abs ][ pdf ][ bib ]

Exploiting Discovered Regression Discontinuities to Debias Conditioned-on-observable Estimators Benjamin Jakubowski, Sriram Somanchi, Edward McFowland III, Daniel B. Neill , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Generalized Linear Models in Non-interactive Local Differential Privacy with Public Data Di Wang, Lijie Hu, Huanyu Zhang, Marco Gaboardi, Jinhui Xu , 2023. [ abs ][ pdf ][ bib ]

A Rigorous Information-Theoretic Definition of Redundancy and Relevancy in Feature Selection Based on (Partial) Information Decomposition Patricia Wollstadt, Sebastian Schmitt, Michael Wibral , 2023. [ abs ][ pdf ][ bib ]

Combinatorial Optimization and Reasoning with Graph Neural Networks Quentin Cappart, Didier Chételat, Elias B. Khalil, Andrea Lodi, Christopher Morris, Petar Veličković , 2023. [ abs ][ pdf ][ bib ]

A First Look into the Carbon Footprint of Federated Learning Xinchi Qiu, Titouan Parcollet, Javier Fernandez-Marques, Pedro P. B. Gusmao, Yan Gao, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane , 2023. [ abs ][ pdf ][ bib ]

An Eigenmodel for Dynamic Multilayer Networks Joshua Daniel Loyal, Yuguo Chen , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Graph Clustering with Graph Neural Networks Anton Tsitsulin, John Palowitch, Bryan Perozzi, Emmanuel Müller , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Euler-Lagrange Analysis of Generative Adversarial Networks Siddarth Asokan, Chandra Sekhar Seelamantula , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Statistical Robustness of Empirical Risks in Machine Learning Shaoyan Guo, Huifu Xu, Liwei Zhang , 2023. [ abs ][ pdf ][ bib ]

HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation Weijie J. Su, Yuancheng Zhu , 2023. [ abs ][ pdf ][ bib ]

Benign overfitting in ridge regression Alexander Tsigler, Peter L. Bartlett , 2023. [ abs ][ pdf ][ bib ]

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities Brian R. Bartoldson, Bhavya Kailkhura, Davis Blalock , 2023. [ abs ][ pdf ][ bib ]

Minimal Width for Universal Property of Deep RNN Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang , 2023. [ abs ][ pdf ][ bib ]

Maximum likelihood estimation in Gaussian process regression is ill-posed Toni Karvonen, Chris J. Oates , 2023. [ abs ][ pdf ][ bib ]

An Annotated Graph Model with Differential Degree Heterogeneity for Directed Networks Stefan Stein, Chenlei Leng , 2023. [ abs ][ pdf ][ bib ]

A Unified Framework for Optimization-Based Graph Coarsening Manoj Kumar, Anurag Sharma, Sandeep Kumar , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Deep linear networks can benignly overfit when shallow ones do Niladri S. Chatterji, Philip M. Long , 2023. [ abs ][ pdf ][ bib ]      [ code ]

SQLFlow: An Extensible Toolkit Integrating DB and AI Jun Zhou, Ke Zhang, Lin Wang, Hua Wu, Yi Wang, ChaoChao Chen , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Learning Good State and Action Representations for Markov Decision Process via Tensor Decomposition Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Mengdi Wang, Anru R. Zhang , 2023. [ abs ][ pdf ][ bib ]

Generalization Bounds for Adversarial Contrastive Learning Xin Zou, Weiwei Liu , 2023. [ abs ][ pdf ][ bib ]

The Implicit Bias of Benign Overfitting Ohad Shamir , 2023. [ abs ][ pdf ][ bib ]

The Hyperspherical Geometry of Community Detection: Modularity as a Distance Martijn Gösgens, Remco van der Hofstad, Nelly Litvak , 2023. [ abs ][ pdf ][ bib ]      [ code ]

FLIP: A Utility Preserving Privacy Mechanism for Time Series Tucker McElroy, Anindya Roy, Gaurab Hore , 2023. [ abs ][ pdf ][ bib ]

A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates Yann Fraboni, Richard Vidal, Laetitia Kameni, Marco Lorenzi , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Dimensionless machine learning: Imposing exact units equivariance Soledad Villar, Weichi Yao, David W. Hogg, Ben Blum-Smith, Bianca Dumitrascu , 2023. [ abs ][ pdf ][ bib ]

Bayesian Calibration of Imperfect Computer Models using Physics-Informed Priors Michail Spitieris, Ingelin Steinsland , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Risk Bounds for Positive-Unlabeled Learning Under the Selected At Random Assumption Olivier Coudray, Christine Keribin, Pascal Massart, Patrick Pamphile , 2023. [ abs ][ pdf ][ bib ]

Concentration analysis of multivariate elliptic diffusions Lukas Trottner, Cathrine Aeckerle-Willems, Claudia Strauch , 2023. [ abs ][ pdf ][ bib ]

Knowledge Hypergraph Embedding Meets Relational Algebra Bahare Fatemi, Perouz Taslakian, David Vazquez, David Poole , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Intrinsic Gaussian Process on Unknown Manifolds with Probabilistic Metrics Mu Niu, Zhenwen Dai, Pokman Cheung, Yizhu Wang , 2023. [ abs ][ pdf ][ bib ]

Sparse Training with Lipschitz Continuous Loss Functions and a Weighted Group L0-norm Constraint Michael R. Metel , 2023. [ abs ][ pdf ][ bib ]

Learning Optimal Group-structured Individualized Treatment Rules with Many Treatments Haixu Ma, Donglin Zeng, Yufeng Liu , 2023. [ abs ][ pdf ][ bib ]

Inference for Gaussian Processes with Matern Covariogram on Compact Riemannian Manifolds Didong Li, Wenpin Tang, Sudipto Banerjee , 2023. [ abs ][ pdf ][ bib ]

FedLab: A Flexible Federated Learning Framework Dun Zeng, Siqi Liang, Xiangjing Hu, Hui Wang, Zenglin Xu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity Artem Vysogorets, Julia Kempe , 2023. [ abs ][ pdf ][ bib ]

An Analysis of Robustness of Non-Lipschitz Networks Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, Hongyang Zhang , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Fitting Autoregressive Graph Generative Models through Maximum Likelihood Estimation Xu Han, Xiaohui Chen, Francisco J. R. Ruiz, Li-Ping Liu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization Jianhao Ma, Salar Fattahi , 2023. [ abs ][ pdf ][ bib ]

Statistical Inference for Noisy Incomplete Binary Matrix Yunxiao Chen, Chengcheng Li, Jing Ouyang, Gongjun Xu , 2023. [ abs ][ pdf ][ bib ]

Faith-Shap: The Faithful Shapley Interaction Index Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar , 2023. [ abs ][ pdf ][ bib ]

Decentralized Learning: Theoretical Optimality and Practical Improvements Yucheng Lu, Christopher De Sa , 2023. [ abs ][ pdf ][ bib ]

Non-Asymptotic Guarantees for Robust Statistical Learning under Infinite Variance Assumption Lihu Xu, Fang Yao, Qiuran Yao, Huiming Zhang , 2023. [ abs ][ pdf ][ bib ]

Recursive Quantile Estimation: Non-Asymptotic Confidence Bounds Likai Chen, Georg Keilbar, Wei Biao Wu , 2023. [ abs ][ pdf ][ bib ]

Outlier-Robust Subsampling Techniques for Persistent Homology Bernadette J. Stolz , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Neural Operator: Learning Maps Between Function Spaces With Applications to PDEs Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data Yuqi Gu, Elena E. Erosheva, Gongjun Xu, David B. Dunson , 2023. [ abs ][ pdf ][ bib ]

Gaussian Processes with Errors in Variables: Theory and Computation Shuang Zhou, Debdeep Pati, Tianying Wang, Yun Yang, Raymond J. Carroll , 2023. [ abs ][ pdf ][ bib ]

Learning Partial Differential Equations in Reproducing Kernel Hilbert Spaces George Stepaniants , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Doubly Robust Stein-Kernelized Monte Carlo Estimator: Simultaneous Bias-Variance Reduction and Supercanonical Convergence Henry Lam, Haofeng Zhang , 2023. [ abs ][ pdf ][ bib ]

Online Optimization over Riemannian Manifolds Xi Wang, Zhipeng Tu, Yiguang Hong, Yingyi Wu, Guodong Shi , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees William J. Wilkinson, Simo Särkkä, Arno Solin , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality Ning Ning, Edward L. Ionides , 2023. [ abs ][ pdf ][ bib ]

Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics Gaetano Romano, Idris A. Eckley, Paul Fearnhead, Guillem Rigaill , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Temporal Abstraction in Reinforcement Learning with the Successor Representation Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling , 2023. [ abs ][ pdf ][ bib ]

Approximate Post-Selective Inference for Regression with the Group LASSO Snigdha Panigrahi, Peter W MacDonald, Daniel Kessler , 2023. [ abs ][ pdf ][ bib ]

Towards Learning to Imitate from a Single Video Demonstration Glen Berseth, Florian Golemo, Christopher Pal , 2023. [ abs ][ pdf ][ bib ]

A Likelihood Approach to Nonparametric Estimation of a Singular Distribution Using Deep Generative Models Minwoo Chae, Dongha Kim, Yongdai Kim, Lizhen Lin , 2023. [ abs ][ pdf ][ bib ]

A Randomized Subspace-based Approach for Dimensionality Reduction and Important Variable Selection Di Bo, Hoon Hwangbo, Vinit Sharma, Corey Arndt, Stephanie TerMaath , 2023. [ abs ][ pdf ][ bib ]

Intrinsic Persistent Homology via Density-based Metric Learning Ximena Fernández, Eugenio Borghini, Gabriel Mindlin, Pablo Groisman , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Privacy-Aware Rejection Sampling Jordan Awan, Vinayak Rao , 2023. [ abs ][ pdf ][ bib ]

Inference for a Large Directed Acyclic Graph with Unspecified Interventions Chunlin Li, Xiaotong Shen, Wei Pan , 2023. [ abs ][ pdf ][ bib ]      [ code ]

How Do You Want Your Greedy: Simultaneous or Repeated? Moran Feldman, Christopher Harshaw, Amin Karbasi , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Kernel-Matrix Determinant Estimates from stopped Cholesky Decomposition Simon Bartels, Wouter Boomsma, Jes Frellsen, Damien Garreau , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Optimizing ROC Curves with a Sort-Based Surrogate Loss for Binary Classification and Changepoint Detection Jonathan Hillman, Toby Dylan Hocking , 2023. [ abs ][ pdf ][ bib ]      [ code ]

When Locally Linear Embedding Hits Boundary Hau-Tieng Wu, Nan Wu , 2023. [ abs ][ pdf ][ bib ]

Distributed Nonparametric Regression Imputation for Missing Response Problems with Large-scale Data Ruoyu Wang, Miaomiao Su, Qihua Wang , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching Eliezer de Souza da Silva, Tomasz Kuśmierczyk, Marcelo Hartmann, Arto Klami , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Posterior Contraction for Deep Gaussian Process Priors Gianluca Finocchio, Johannes Schmidt-Hieber , 2023. [ abs ][ pdf ][ bib ]

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Fundamental limits and algorithms for sparse linear regression with sublinear sparsity Lan V. Truong , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results Marcelo Arenas, Pablo Barcelo, Leopoldo Bertossi, Mikael Monet , 2023. [ abs ][ pdf ][ bib ]

Monotonic Alpha-divergence Minimisation for Variational Inference Kamélia Daudel, Randal Douc, François Roueff , 2023. [ abs ][ pdf ][ bib ]

Density estimation on low-dimensional manifolds: an inflation-deflation approach Christian Horvat, Jean-Pascal Pfister , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Provably Sample-Efficient Model-Free Algorithm for MDPs with Peak Constraints Qinbo Bai, Vaneet Aggarwal, Ather Gattami , 2023. [ abs ][ pdf ][ bib ]

Topological Convolutional Layers for Deep Learning Ephy R. Love, Benjamin Filippenko, Vasileios Maroulas, Gunnar Carlsson , 2023. [ abs ][ pdf ][ bib ]

Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval Yan Shuo Tan, Roman Vershynin , 2023. [ abs ][ pdf ][ bib ]

Tree-AMP: Compositional Inference with Tree Approximate Message Passing Antoine Baker, Florent Krzakala, Benjamin Aubin, Lenka Zdeborová , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On the geometry of Stein variational gradient descent Andrew Duncan, Nikolas Nüsken, Lukasz Szpruch , 2023. [ abs ][ pdf ][ bib ]

Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches Shaogao Lv, Xin He, Junhui Wang , 2023. [ abs ][ pdf ][ bib ]

Contextual Stochastic Block Model: Sharp Thresholds and Contiguity Chen Lu, Subhabrata Sen , 2023. [ abs ][ pdf ][ bib ]

VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback Kirthevasan Kandasamy, Joseph E Gonzalez, Michael I Jordan, Ion Stoica , 2023. [ abs ][ pdf ][ bib ]

Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems Kunal Pattanayak, Vikram Krishnamurthy , 2023. [ abs ][ pdf ][ bib ]

Online Change-Point Detection in High-Dimensional Covariance Structure with Application to Dynamic Networks Lingjun Li, Jun Li , 2023. [ abs ][ pdf ][ bib ]

Convergence Rates of a Class of Multivariate Density Estimation Methods Based on Adaptive Partitioning Linxi Liu, Dangna Li, Wing Hung Wong , 2023. [ abs ][ pdf ][ bib ]

Reinforcement Learning for Joint Optimization of Multiple Rewards Mridul Agarwal, Vaneet Aggarwal , 2023. [ abs ][ pdf ][ bib ]

On the Convergence of Stochastic Gradient Descent with Bandwidth-based Step Size Xiaoyu Wang, Ya-xiang Yuan , 2023. [ abs ][ pdf ][ bib ]

A Group-Theoretic Approach to Computational Abstraction: Symmetry-Driven Hierarchical Clustering Haizi Yu, Igor Mineyev, Lav R. Varshney , 2023. [ abs ][ pdf ][ bib ]

The d-Separation Criterion in Categorical Probability Tobias Fritz, Andreas Klingler , 2023. [ abs ][ pdf ][ bib ]

The multimarginal optimal transport formulation of adversarial multiclass classification Nicolás García Trillos, Matt Jacobs, Jakwang Kim , 2023. [ abs ][ pdf ][ bib ]

Robust Load Balancing with Machine Learned Advice Sara Ahmadian, Hossein Esfandiari, Vahab Mirrokni, Binghui Peng , 2023. [ abs ][ pdf ][ bib ]

Benchmarking Graph Neural Networks Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson , 2023. [ abs ][ pdf ][ bib ]      [ code ]

A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data Shaowu Pan, Steven L. Brunton, J. Nathan Kutz , 2023. [ abs ][ pdf ][ bib ]      [ code ]

On Batch Teaching Without Collusion Shaun Fallat, David Kirkpatrick, Hans U. Simon, Abolghasem Soltani, Sandra Zilles , 2023. [ abs ][ pdf ][ bib ]

Sensing Theorems for Unsupervised Learning in Linear Inverse Problems Julián Tachella, Dongdong Chen, Mike Davies , 2023. [ abs ][ pdf ][ bib ]

First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems Michael I. Jordan, Tianyi Lin, Manolis Zampetakis , 2023. [ abs ][ pdf ][ bib ]

Ridges, Neural Networks, and the Radon Transform Michael Unser , 2023. [ abs ][ pdf ][ bib ]

Label Distribution Changing Learning with Sample Space Expanding Chao Xu, Hong Tao, Jing Zhang, Dewen Hu, Chenping Hou , 2023. [ abs ][ pdf ][ bib ]

Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers? Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan , 2023. [ abs ][ pdf ][ bib ]

Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond Anna Hedström, Leander Weber, Daniel Krakowczyk, Dilyara Bareeva, Franz Motzkus, Wojciech Samek, Sebastian Lapuschkin, Marina M.-C. Höhne , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Gap Minimization for Knowledge Sharing and Transfer Boyu Wang, Jorge A. Mendez, Changjian Shui, Fan Zhou, Di Wu, Gezheng Xu, Christian Gagné, Eric Eaton , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Sparse PCA: a Geometric Approach Dimitris Bertsimas, Driss Lahlou Kitane , 2023. [ abs ][ pdf ][ bib ]

Labels, Information, and Computation: Efficient Learning Using Sufficient Labels Shiyu Duan, Spencer Chang, Jose C. Principe , 2023. [ abs ][ pdf ][ bib ]

Attacks against Federated Learning Defense Systems and their Mitigation Cody Lewis, Vijay Varadharajan, Nasimul Noman , 2023. [ abs ][ pdf ][ bib ]      [ code ]

HiClass: a Python Library for Local Hierarchical Classification Compatible with Scikit-learn Fábio M. Miranda, Niklas Köhnecke, Bernhard Y. Renard , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping Xuran Meng, Jeff Yao , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time Raj Agrawal, Tamara Broderick , 2023. [ abs ][ pdf ][ bib ]

Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels Hao Wang, Rui Gao, Flavio P. Calmon , 2023. [ abs ][ pdf ][ bib ]

Discrete Variational Calculus for Accelerated Optimization Cédric M. Campos, Alejandro Mahillo, David Martín de Diego , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Calibrated Multiple-Output Quantile Regression with Representation Learning Shai Feldman, Stephen Bates, Yaniv Romano , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Bayesian Data Selection Eli N. Weinstein, Jeffrey W. Miller , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Lower Bounds and Accelerated Algorithms for Bilevel Optimization Kaiyi Ji, Yingbin Liang , 2023. [ abs ][ pdf ][ bib ]

Graph-Aided Online Multi-Kernel Learning Pouya M. Ghari, Yanning Shen , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Interpolating Classifiers Make Few Mistakes Tengyuan Liang, Benjamin Recht , 2023. [ abs ][ pdf ][ bib ]

Regularized Joint Mixture Models Konstantinos Perrakis, Thomas Lartigue, Frank Dondelinger, Sach Mukherjee , 2023. [ abs ][ pdf ][ bib ]      [ code ]

An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Learning Mean-Field Games with Discounted and Average Costs Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi , 2023. [ abs ][ pdf ][ bib ]

Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application to Credit-Risk Evaluation Cynthia Rudin, Yaron Shaposhnik , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions Jon Vadillo, Roberto Santana, Jose A. Lozano , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Python package for causal discovery based on LiNGAM Takashi Ikeuchi, Mayumi Ide, Yan Zeng, Takashi Nicholas Maeda, Shohei Shimizu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Adaptation to the Range in K-Armed Bandits Hédi Hadiji, Gilles Stoltz , 2023. [ abs ][ pdf ][ bib ]

Learning-augmented count-min sketches via Bayesian nonparametrics Emanuele Dolera, Stefano Favaro, Stefano Peluchetti , 2023. [ abs ][ pdf ][ bib ]

Optimal Strategies for Reject Option Classifiers Vojtech Franc, Daniel Prusa, Vaclav Voracek , 2023. [ abs ][ pdf ][ bib ]

A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees Michael J. O'Neill, Stephen J. Wright , 2023. [ abs ][ pdf ][ bib ]

Sampling random graph homomorphisms and applications to network data analysis Hanbaek Lyu, Facundo Memoli, David Sivakoff , 2023. [ abs ][ pdf ][ bib ]      [ code ]

A Relaxed Inertial Forward-Backward-Forward Algorithm for Solving Monotone Inclusions with Application to GANs Radu I. Bot, Michael Sedlmayer, Phan Tu Vuong , 2023. [ abs ][ pdf ][ bib ]

On Distance and Kernel Measures of Conditional Dependence Tianhong Sheng, Bharath K. Sriperumbudur , 2023. [ abs ][ pdf ][ bib ]

AutoKeras: An AutoML Library for Deep Learning Haifeng Jin, François Chollet, Qingquan Song, Xia Hu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Cluster-Specific Predictions with Multi-Task Gaussian Processes Arthur Leroy, Pierre Latouche, Benjamin Guedj, Servane Gey , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Efficient Structure-preserving Support Tensor Train Machine Kirandeep Kour, Sergey Dolgov, Martin Stoll, Peter Benner , 2023. [ abs ][ pdf ][ bib ]      [ code ]

Bayesian Spiked Laplacian Graphs Leo L Duan, George Michailidis, Mingzhou Ding , 2023. [ abs ][ pdf ][ bib ]      [ code ]

The Brier Score under Administrative Censoring: Problems and a Solution Håvard Kvamme, Ørnulf Borgan , 2023. [ abs ][ pdf ][ bib ]

Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search Benjamin Moseley, Joshua R. Wang , 2023. [ abs ][ pdf ][ bib ]

A list of research papers in the domain of machine learning, deep learning and related fields.


I have curated a list of research papers that I come across and read. I'll keep updating the list of papers and their summaries as I read them every week.

How to read a Research Paper

Professor Andrew Ng gave some awesome tips on how to read a research paper. I have summarised the tips in this PDF.

Table of Contents

The list of papers can be viewed by different criteria, such as conference venue, year published, topic covered, and author.

The following filtered views of the paper list are available:

  • Read and Summarised Papers
  • Conference-wise Filtered Papers
  • Year-wise Filtered Papers
  • Topic-wise Filtered Papers
  • Category-wise Filtered Papers
  • Author-wise Filtered Papers

Top Machine Learning Papers to Read in 2023

These curated papers will step up your machine learning knowledge.

Machine learning is a big field, with new research coming out frequently. It is a hot field in which academia and industry keep experimenting with new ideas to improve our daily lives.

In recent years, generative AI applications built on machine learning, such as ChatGPT and Stable Diffusion, have been changing the world. Even with 2023 dominated by generative AI, there are many more machine learning breakthroughs worth knowing about.

Here are the top machine learning papers to read in 2023 so you will not miss the upcoming trends.

1) Learning the Beauty in Songs: Neural Singing Voice Beautifier

Singing Voice Beautifying (SVB) is a novel task in generative AI that aims to turn an amateur singing voice into a beautiful one. This is exactly the aim of Liu et al. (2022), who proposed a new generative model called Neural Singing Voice Beautifier (NSVB).

NSVB is a semi-supervised learning model that uses a latent-mapping algorithm to act as a pitch corrector and to improve vocal tone. The work promises to benefit the music industry and is worth checking out.

2) Symbolic Discovery of Optimization Algorithms

Deep neural network models have become bigger than ever, and much research has been conducted to simplify the training process. Recent research by a Google team (Chen et al. (2023)) proposes a new optimizer for neural networks called Lion (EvoLved Sign Momentum). The paper reports that Lion is more memory-efficient than Adam and requires a smaller learning rate. It’s promising research that you should not miss.
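To make the "sign momentum" idea concrete, here is a minimal pure-Python sketch of a single Lion-style update, based on my reading of the update rule in Chen et al. (2023). The function name `lion_step`, the list-based parameters, and the default hyperparameter values are illustrative assumptions, not the authors' reference implementation.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,           # assumed default, for illustration only
    beta1: float = 0.9,         # assumed default
    beta2: float = 0.99,        # assumed default
    weight_decay: float = 0.0,  # assumed default
) -> Tuple[List[float], List[float]]:
    """Apply one Lion-style update and return (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend the stored momentum with the current gradient; only the sign is kept.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay plus a step of magnitude lr in each coordinate.
        new_params.append(p - lr * (update + weight_decay * p))
        # The stored momentum is refreshed with a second decay rate, beta2.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


params, momentum = lion_step([1.0, -2.0], [0.5, -0.1], [0.0, 0.0], lr=0.1)
print(params, momentum)
```

Two things in this sketch line up with the claims above. Only one buffer (`momentum`) is stored per parameter, whereas Adam keeps two, which is where the memory saving would come from. And because every non-zero coordinate moves by the full `lr`, a smaller learning rate than Adam's is the natural choice.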

3) TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Time series analysis is a common use case in many businesses, for example in price forecasting and anomaly detection. However, analyzing temporal data in its original one-dimensional (1D) form poses many challenges. That is why Wu et al. (2023) propose a new method called TimesNet, which transforms the 1D data into 2D data and achieves strong performance in their experiments. You should read the paper to better understand this new method, as it could help much future time series analysis.
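As a rough illustration of what "transforming 1D data into 2D data" can mean, the sketch below estimates a dominant period with a naive discrete Fourier transform and folds the series into rows of that length. This is a simplified toy built on my understanding of the paper's idea; the function names `dominant_period` and `fold_to_2d`, the zero padding, and the row layout are assumptions for illustration, and the actual TimesNet model does considerably more (several periods, learned 2D convolutions).

```python
import cmath
import math
from typing import List


def dominant_period(series: List[float]) -> int:
    """Estimate a period from the strongest non-zero DFT frequency."""
    n = len(series)
    best_freq, best_amp = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT coefficient for frequency index k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)
        )
        if abs(coeff) > best_amp:
            best_freq, best_amp = k, abs(coeff)
    # A frequency index k corresponds to a period of roughly n / k samples.
    return n // best_freq


def fold_to_2d(series: List[float], period: int) -> List[List[float]]:
    """Zero-pad the series to a multiple of `period` and split it into rows."""
    padded = list(series) + [0.0] * ((-len(series)) % period)
    return [padded[i:i + period] for i in range(0, len(padded), period)]


# A sine wave that repeats every 6 samples, observed for 24 samples.
wave = [math.sin(2 * math.pi * t / 6) for t in range(24)]
period = dominant_period(wave)
grid = fold_to_2d(wave, period)
print(period, len(grid), len(grid[0]))
```

In the folded grid, moving along a row follows the signal within one cycle, while moving down a column compares the same phase across cycles. For real data, a library FFT such as `numpy.fft.rfft` would be the usual choice instead of the naive loop.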

4) OPT: Open Pre-trained Transformer Language Models

We are currently in a generative AI era in which companies are intensively developing large language models. Most of this research either does not release its models or makes them available only commercially. The Meta AI research group (Zhang et al. (2022)) does the opposite by publicly releasing the Open Pre-trained Transformers (OPT) models, which are comparable to GPT-3. The paper is a great starting point for understanding the OPT models and the research details, as the group logs all the details in the paper.

5) REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers

Generative models are not limited to producing text or pictures; they can also generate tabular data, often called synthetic data. Many models have been developed to generate synthetic tabular data, but almost none generate synthetic relational tabular data. This is exactly the aim of the research by Solatorio and Dupriez (2023), who created a model called REaLTabFormer for synthetic relational data. The experiments show results that are close to those of existing synthetic data models, and the approach could be extended to many applications.

6) Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Conceptually, reinforcement learning is an excellent choice for natural language processing tasks, but does that hold in practice? This is the question that Ramamurthy et al. (2022) try to answer. The researchers introduce a library and algorithms that show where reinforcement learning techniques have an edge over supervised methods in NLP tasks. It’s a recommended paper to read if you want an alternative for your skill set.

7) Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Text-to-image generation was big in 2022, and 2023 is projected to be the year of text-to-video (T2V) capability. Research by Wu et al. (2022) shows how T2V generation can be extended in several directions. The research proposes a new Tune-A-Video method that supports T2V tasks such as subject and object change, style transfer, and attribute editing. It’s a great paper to read if you are interested in text-to-video research.

8) PyGlove: Efficiently Exchanging ML Ideas as Code

Efficient collaboration is key to the success of any team, especially given the increasing complexity of machine learning. To foster efficiency, Peng et al. (2023) present PyGlove, a library for sharing ML ideas easily. The PyGlove concept is to capture the process of ML research as a list of patching rules. The list can then be reused in any experimental setting, which improves the team's efficiency. The research tackles a machine learning problem that few have addressed, so it’s worth reading.
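To make the idea of a reusable list of patching rules more concrete, here is a minimal sketch in plain Python. It does not use the PyGlove API; the names `set_value` and `apply_rules`, and the nested-dictionary configs, are hypothetical stand-ins chosen only for illustration.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List

# Hypothetical types for this sketch; PyGlove itself works on symbolic objects.
Config = Dict[str, Any]
Rule = Callable[[Config], Config]


def set_value(path: str, value: Any) -> Rule:
    """Build a rule that sets a dotted path (e.g. "model.activation") to a value."""
    keys = path.split(".")

    def rule(config: Config) -> Config:
        patched = deepcopy(config)  # leave the input experiment untouched
        node = patched
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create missing sections on the way
        node[keys[-1]] = value
        return patched

    return rule


def apply_rules(config: Config, rules: List[Rule]) -> Config:
    """Apply each rule in order and return the patched experiment config."""
    for rule in rules:
        config = rule(config)
    return config


# An "idea" captured as a list of rules, independent of any one experiment.
idea = [
    set_value("model.activation", "swish"),
    set_value("optimizer.name", "adam"),
]

baseline = {
    "model": {"activation": "relu", "layers": 4},
    "optimizer": {"name": "sgd", "lr": 0.1},
}
patched = apply_rules(baseline, idea)

# The same list is reused on a different experiment.
other = apply_rules({"model": {"activation": "tanh", "layers": 12}}, idea)
```

Because the rules are plain values, a teammate could apply the same `idea` list to their own experiment without copying or editing the original code, which is the kind of reuse the paragraph above describes.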

9) How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

ChatGPT has changed the world considerably. It’s safe to say that the trend will continue upward from here, as the public already favors using ChatGPT. However, how do ChatGPT's current results compare with those of human experts? That is exactly the question Guo et al. (2023) try to answer. The team collected answers from experts and from ChatGPT prompts and compared them. The results show that there are implicit differences between ChatGPT and experts. I feel this question will keep being asked as generative AI models continue to grow, so it’s worth reading.

2023 is a great year for machine learning research, as the current trends show, especially in generative AI such as ChatGPT and Stable Diffusion. There is much promising research that I feel we should not miss because it has shown results that might change the current standard. In this article, I have shown you 9 top ML papers to read, ranging from generative models and time series models to workflow efficiency. I hope it helps.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media.
