Submitted by jimliu741523 t3_114de9s in MachineLearning

Hi there,

I am a research data scientist, and I am excited to release a new feature engineering library designed to streamline the machine learning process. Headjack is an open library that provides ML feature transformations based on self-supervised learning models. It works as a hub, similar to Hugging Face, but currently focuses on exchanging features for tabular data models.

Tabular data differ from textual data in that each dataset has a different number of columns and different attributes, so it cannot be represented consistently the way token embeddings are in NLP tasks. Headjack therefore differs from NLP pre-trained models, which transform within a single domain: it performs transformations between two different domains. In other words, we can transform features between two domains that do not share a key value. This also releases the potential of data that is not typically used. For example, you can enhance a Boston housing price prediction task with features from the Titanic domain, or enhance a customer churn prediction task with features from the African traffic domain, and so on.
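To make the general idea concrete, here is a toy sketch in plain Python. It is not Headjack's API and not its actual method: the `adapter` and `W_source` matrices are hypothetical, randomly initialised stand-ins for a learned cross-domain mapping and a pre-trained source-domain encoder, and the "Iris" rows are random placeholders. It only illustrates the shape of the operation, which is to append features produced by a model from another domain to each row of the target table.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def augment(X_target, adapter, W_source):
    """Append source-domain encoder outputs to every target-domain row."""
    # Map the target columns to the input width the source encoder expects.
    projected = matmul(X_target, adapter)
    # Hypothetical frozen encoder: one linear layer followed by tanh.
    encoded = [[math.tanh(v) for v in row] for row in matmul(projected, W_source)]
    # Keep the original columns and add the new ones on the right.
    return [list(x) + z for x, z in zip(X_target, encoded)]

rng = random.Random(0)
X_iris = random_matrix(150, 4, rng)   # placeholder for 150 rows with 4 columns
adapter = random_matrix(4, 8, rng)    # assumed: 4 target columns -> 8 source columns
W_source = random_matrix(8, 3, rng)   # assumed: source encoder emits 3 features

X_aug = augment(X_iris, adapter, W_source)
```

Each row of `X_aug` keeps its 4 original columns and gains 3 extra ones, which a downstream model can then train on.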

Github

Introduction

The IRIS dataset with California House Price Feature Transformation

The IRIS dataset with Titanic Feature Transformation

The IRIS dataset with KPMG Customer Demography Feature Transformation

56

Comments

ekbravo t1_j8w5lir wrote

Interesting concept, but I'm not sure a corporate dataset would be allowed to be released into the wild. Plus, one has to not only create an account on their website but also use that account info every time the code runs. Not for business use.

4

jimliu741523 OP t1_j8x9047 wrote

Thanks for your kind words. This is an open version, not meant for enterprise use. The enterprise version does not release the dataset into the wild; the feature model is only stored in your private pool. In a future version, we will consider replacing account info with an API key.

3

jimliu741523 OP t1_j8xatge wrote

We designed the HeadJack framework ourselves. We have a paper that has been submitted to an ML conference and is in the double-blind review process, so it is not convenient to make it public right now. The framework is based on a GAN with cross-domain and self-supervised learning. We will open it in the future : )

1