“If I had invested my money in real estate back then, I would be a billionaire now.”

“There’s nothing special about Jimmy; we both saw the bright future of AI. He’s ahead only because he graduated two years earlier than I did. Otherwise, I would be the CEO of a biotech-AI startup now!”

I group these into two major types of complaints about “why I’m one step behind true success.”

1) I was able to, but I didn’t see it.

2) I did see it, but I was not able to.

The *basic assumption* of these complaints is that the people who…

When I saw my two dogs sleeping in their bed, playing chasing games in their dreams, I wished I were a dog like them.

I was not able to solve the problems in my life. I even thought it would be better if I didn’t wake up one day.

However, I have never seen my dogs get up from their bed unhappy. Perhaps that is where the magic comes from.

So, I decided to switch beds with my dogs.

Thankfully, I found the magic of life there.

I got up at 7 am.

I used to get up…

The fact that COVID-19-specific features can be observed on chest X-ray and CT images has inspired many AI scientists to focus on algorithm development.

Recently, researchers from Northwestern University published a novel classifier, DeepCOVID-XR, that diagnoses “COVID-19 positive cases” from chest radiographs. Its performance is reported to be similar to that of experienced thoracic radiologists.

Many studies have been published on similar topics, but the authors claim that DeepCOVID-XR was trained on “*the largest clinical dataset of chest radiographs from the COVID-19 era*”.

As a data scientist, let me…

Recently, **DrBioRight**, an artificial intelligence-driven analytics platform for biomedical research, was published in *Cancer Cell*, one of the top scientific journals.

The main function of this tool is to use natural language processing (*NLP*) technology to enable biomedical researchers **without** expertise in…

Specifically, DrBioRight allows the user to **“talk”** to it in…

The generalized linear model (*GLM*) is popular because it can handle a wide range of data with different response variable types (such as *binomial*, *Poisson*, or *multinomial*).

Compared to *non-linear* models, such as *neural networks* or *tree-based* models, linear models may be less powerful for prediction. But their ease of *interpretation* keeps them attractive, especially when we need to understand how each predictor influences the outcome.
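To make the interpretability point concrete, here is a minimal sketch of fitting a Poisson GLM with a log link by iteratively reweighted least squares (IRLS), using only NumPy. The simulated data, the "true" coefficients, and the function name `fit_poisson_glm` are hypothetical choices for illustration, not taken from the original article. Because the link is log, `exp(beta)` reads directly as a multiplicative effect on the expected count.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with a log link by IRLS.

    X: (n, p) design matrix including an intercept column.
    y: (n,) non-negative counts.
    """
    mu = y + 0.5                      # start from the data (avoids log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response
        XtW = X.T * mu                # IRLS weights equal mu for the Poisson/log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)              # inverse of the log link
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical simulated data
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=2000)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print(beta_hat)                # should be close to the simulated [0.5, 0.8]
print(np.exp(beta_hat[1]))     # multiplicative change in the expected count per unit increase in x
```

In practice a library such as statsmodels or R's `glm` does this for you; the sketch only shows that each coefficient keeps a direct, readable meaning.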

The shortcomings of the GLM are as obvious as its advantages. The linear relationship (on the scale of the link function) may not always hold, and the model is quite sensitive to outliers…
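As a small illustration of the outlier sensitivity, the sketch below uses the simplest GLM, the Gaussian family with an identity link (ordinary least squares), on hypothetical, perfectly linear toy data, and then replaces a single point with an outlier.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                     # hypothetical, perfectly linear data
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

y_outlier = y.copy()
y_outlier[-1] = 60.0                  # one point (originally 19.0) becomes an outlier
slope_out, intercept_out = np.polyfit(x, y_outlier, deg=1)

print(f"clean:   slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"outlier: slope={slope_out:.2f}, intercept={intercept_out:.2f}")
```

A single bad point at the edge of the x range is enough to roughly double the fitted slope, because least squares penalizes large residuals quadratically.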

The popular multinomial logistic regression is an extension of the binomial logistic regression model that handles more than two possible discrete outcomes.

However, multinomial logistic regression is not designed to be a general multi-class classifier; it is designed specifically for **nominal** multinomial data.

Note that **nominal** and **ordinal** data are the two major categories of multinomial data. The difference is that the categories of nominal data have no order, whereas the categories of ordinal data do.
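The difference shows up directly in how class probabilities are computed. Below is a minimal sketch, with hypothetical parameters, comparing the multinomial (nominal) logistic model, which gives every class its own coefficient vector and applies a softmax, with the proportional-odds (cumulative logit) model, one common choice for ordinal data, which shares a single slope vector across classes and uses ordered cut-points.

```python
import numpy as np

def nominal_probs(x, W, b):
    """Multinomial (nominal) logistic regression: one coefficient vector per class, softmax over scores."""
    scores = W @ x + b
    scores = scores - scores.max()        # subtract the max for numerical stability
    expo = np.exp(scores)
    return expo / expo.sum()

def ordinal_probs(x, beta, thresholds):
    """Proportional-odds (cumulative logit) model for ordered classes.

    P(Y <= k) = sigmoid(thresholds[k] - x @ beta); thresholds must be increasing.
    """
    cdf = 1.0 / (1.0 + np.exp(-(thresholds - x @ beta)))
    cdf = np.concatenate([[0.0], cdf, [1.0]])
    return np.diff(cdf)                   # P(Y = k) as differences of cumulative probabilities

# Hypothetical parameters for K = 3 classes and p = 2 predictors
x = np.array([0.5, -1.0])
W = np.array([[0.2, -0.1], [1.0, 0.3], [-0.5, 0.8]])   # K x p = 6 slopes
b = np.zeros(3)
beta = np.array([0.7, -0.4])                            # p = 2 shared slopes
thresholds = np.array([-0.5, 1.0])                      # K - 1 = 2 ordered cut-points

p_nominal = nominal_probs(x, W, b)
p_ordinal = ordinal_probs(x, beta, thresholds)
print(p_nominal, p_nominal.sum())
print(p_ordinal, p_ordinal.sum())
```

The ordinal model needs only p + (K - 1) parameters instead of K x p (or (K - 1) x p once one class is fixed as the reference), because it exploits the ordering of the classes.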

For example, if our goal is to distinguish the three classes…

The Poisson regression model naturally arises when we want to model the average number of occurrences per unit of time or space. Examples include the incidence of a rare cancer, the number of cars crossing an intersection, or the number of earthquakes.
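The "per unit of time or space" part is usually handled with an exposure offset: log(mu) = log(exposure) + beta0 + beta1 * x, so the model describes a rate. The sketch below assumes a hypothetical rate of 30 cars per hour observed over 2 hours; the function names and numbers are illustrative only.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), computed in log space for numerical stability (mu > 0)."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def expected_count(exposure, beta0, beta1, x):
    """Poisson regression mean with an exposure offset:
    log(mu) = log(exposure) + beta0 + beta1 * x, i.e. mu = exposure * exp(beta0 + beta1 * x).
    """
    return exposure * math.exp(beta0 + beta1 * x)

# Hypothetical: 30 cars per hour (beta0 = log 30, no covariate effect), counted over 2 hours
mu_cars = expected_count(exposure=2.0, beta0=math.log(30), beta1=0.0, x=0.0)
prob_at_most_50 = sum(poisson_pmf(k, mu_cars) for k in range(51))
print(f"expected count = {mu_cars:.1f}, P(Y <= 50) = {prob_at_most_50:.3f}")
```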

One feature of the Poisson distribution is that the **mean** equals the **variance**. However, *overdispersion* or *underdispersion*, where the variance is larger or smaller than the mean, respectively, often occurs in Poisson models. In practice, overdispersion is the more common of the two, particularly with a limited amount of data.
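A quick way to see the mean-variance property is to compare the variance-to-mean ratio of genuinely Poisson counts with that of overdispersed counts. The sketch below simulates the overdispersed case with a negative binomial distribution (a swapped-in choice for illustration) tuned to the same mean of 4 but a variance of 12. This raw ratio ignores covariates; for a fitted model, the Pearson chi-square statistic divided by the residual degrees of freedom plays the same role.

```python
import numpy as np

rng = np.random.default_rng(42)
n, mean = 5000, 4.0

poisson_counts = rng.poisson(mean, size=n)

# NumPy's negative_binomial(r, p) has mean r(1-p)/p and variance r(1-p)/p^2.
r = 2.0
p = r / (r + mean)                    # mean = 4.0, variance = mean + mean**2 / r = 12.0
nb_counts = rng.negative_binomial(r, p, size=n)

ratios = {}
for name, counts in [("Poisson", poisson_counts), ("Negative binomial", nb_counts)]:
    ratios[name] = counts.var(ddof=1) / counts.mean()
    print(f"{name:>17}: mean={counts.mean():.2f}, variance/mean={ratios[name]:.2f}")
```

A ratio near 1 is consistent with the Poisson assumption; a ratio well above 1 signals overdispersion.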

The overdispersion issue affects the interpretation of the model. It is necessary to…

Many events in our daily life follow the binomial distribution, which describes the number of successes in a fixed number of independent Bernoulli trials.

For example, assuming that James Harden’s probability of making a shot is constant and each shot is independent, the number of field goals he makes out of a given number of attempts follows a binomial distribution.
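Under those two assumptions, the probability of any number of makes follows from the binomial formula P(Y = k) = C(n, k) p^k (1 - p)^(n - k). The 20 attempts and the 0.45 make probability below are hypothetical numbers, not actual statistics.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical: 20 shot attempts with a constant make probability of 0.45
n_shots, p_make = 20, 0.45
probs = [binomial_pmf(k, n_shots, p_make) for k in range(n_shots + 1)]
expected_makes = n_shots * p_make          # the mean of Binomial(n, p) is n * p
p_at_least_10 = sum(probs[10:])
print(f"expected makes = {expected_makes}, P(at least 10 makes) = {p_at_least_10:.3f}")
```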

If we want to relate the success probability (*p*) of a binomially distributed variable *Y* to a set of independent variables *x*, the binomial regression model is among our top choices.

The link function is the major difference between a binomial regression and a…
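The excerpt stops before naming the comparison, but the link functions commonly used in binomial regression can be sketched on their own: each inverse link maps the linear predictor eta = x . beta to a probability in (0, 1). The three below (logit, probit, complementary log-log) are standard choices; the function names are illustrative.

```python
import math
from statistics import NormalDist

def inv_logit(eta):
    """Logistic regression: log(p / (1 - p)) = eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Probit regression: Phi^{-1}(p) = eta, where Phi is the standard normal CDF."""
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    """Complementary log-log: log(-log(1 - p)) = eta."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta={eta:+.1f}  logit={inv_logit(eta):.3f}  "
          f"probit={inv_probit(eta):.3f}  cloglog={inv_cloglog(eta):.3f}")
```

At eta = 0 the logit and probit links both give p = 0.5, while the complementary log-log gives 1 - e^(-1), about 0.632, reflecting its asymmetry.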

I’ve tried several different types of NBA analytics articles with my readers, a group of true basketball fans. I found that the most popular articles are **not** those with state-of-the-art machine learning technologies, but those with straightforward and meaningful graphs.

At a certain stage of my career as a data scientist, I realized that delivering information is more important than showing off fancy models. Perhaps that’s why linear regression is still one of the most popular models in the finance world.

In this post, I am going to talk about a simple topic. It is *how…*

Supervised learning and unsupervised learning are the two major tasks in machine learning. Supervised learning models are used when the output (label) of every instance is available, whereas unsupervised learning is applied when we don’t have the “true labels”.

Even though unsupervised learning has huge potential for future research, supervised learning still dominates the field. However, we often need to build a supervised learning model without sufficient labeled samples in our data.

In such a case, semi-supervised learning is worth considering. The idea is to build a supervised…
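The excerpt is truncated, so the author's exact approach is unknown. One common semi-supervised technique is self-training: fit a model on the few labeled samples, pseudo-label the unlabeled samples it is most confident about, add them to the training set, and repeat. The sketch below assumes a simple nearest-centroid classifier, a "best minus runner-up score" confidence margin, and hypothetical two-blob data; all of these are illustrative choices.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Return the class labels and the mean point (centroid) of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_scores(X, classes, centroids):
    """Negative squared distance to each centroid (higher means closer)."""
    return -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

def self_train(X_lab, y_lab, X_unlab, n_rounds=5, frac_per_round=0.2):
    """Self-training: repeatedly pseudo-label the most confident unlabeled points."""
    X_lab, y_lab, remaining = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_rounds):
        if len(remaining) == 0:
            break
        classes, centroids = nearest_centroid_fit(X_lab, y_lab)
        scores = nearest_centroid_scores(remaining, classes, centroids)
        top2 = np.sort(scores, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]          # gap between best and runner-up class
        n_take = max(1, int(frac_per_round * len(remaining)))
        confident = np.argsort(margin)[-n_take:]  # indices of the most confident points
        pseudo = classes[np.argmax(scores[confident], axis=1)]
        X_lab = np.vstack([X_lab, remaining[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        remaining = np.delete(remaining, confident, axis=0)
    return nearest_centroid_fit(X_lab, y_lab)

rng = np.random.default_rng(0)

def make_blobs(n_per_class):
    """Hypothetical data: two Gaussian blobs centred at (-2, -2) and (2, 2)."""
    X = np.vstack([rng.normal(-2, 1, size=(n_per_class, 2)),
                   rng.normal(2, 1, size=(n_per_class, 2))])
    return X, np.repeat([0, 1], n_per_class)

X_lab, y_lab = make_blobs(5)          # only 10 labeled samples
X_unlab, _ = make_blobs(200)          # labels discarded
X_test, y_test = make_blobs(200)

classes, centroids = self_train(X_lab, y_lab, X_unlab)
pred = classes[np.argmax(nearest_centroid_scores(X_test, classes, centroids), axis=1)]
accuracy = (pred == y_test).mean()
print(f"test accuracy: {accuracy:.3f}")
```

Adding only the most confident pseudo-labels each round limits the risk that early mistakes snowball; any classifier that produces a confidence score could replace the nearest-centroid model.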

Ph.D., Data Scientist, and Bioinformatician. A true lover of data and basketball. Understanding is the path to eliminating discrimination.