Row2Vec: Learn Embeddings from Tabular Data#
Welcome
Row2Vec is a Python library for generating low-dimensional vector embeddings from tabular datasets. It combines neural networks with classical methods (PCA, t-SNE, UMAP) to produce dense representations of your data.
What is Row2Vec?#
Row2Vec transforms tabular data into meaningful vector representations (embeddings) that capture the essential characteristics of your data. Instead of feeding raw data directly into models, you can create compressed, information-rich representations that models can easily process.
Key Features#
Five Powerful Modes: Neural networks, PCA, t-SNE, UMAP, and target-based embeddings
Intelligent Preprocessing: Automatic missing value imputation and feature encoding
Simple API: One function, learn_embedding(), handles everything
Model Persistence: Save and load trained models for production use
Production Ready: 92% test coverage, type safety, modern build system
Quick Example#
import pandas as pd
from row2vec import learn_embedding, generate_synthetic_data
# Generate sample data
df = generate_synthetic_data(num_records=100, seed=42)
print(f"Data shape: {df.shape}")
print(df.head(3))
Data shape: (100, 3)
  Country Product       Sales
0     USA       A  103.047171
1  Mexico       B  448.000795
2  Canada       B   80.489648
# Generate unsupervised embeddings
embeddings = learn_embedding(df, mode="unsupervised", embedding_dim=2, max_epochs=10, verbose=False)
print(f"\nEmbedding shape: {embeddings.shape}")
print("\nFirst 5 embeddings:")
print(embeddings.head())
Model: "functional"
Layer (type)               Output Shape    Param #
input_layer (InputLayer)   (None, 10)      0
dense (Dense)              (None, 128)     1,408
dropout (Dropout)          (None, 128)     0
embedding (Dense)          (None, 2)       258
dense_1 (Dense)            (None, 128)     384
dropout_1 (Dropout)        (None, 128)     0
dense_2 (Dense)            (None, 10)      1,290
Total params: 3,340 (13.05 KB)
Trainable params: 3,340 (13.05 KB)
Non-trainable params: 0 (0.00 B)
Embedding shape: (100, 2)
First 5 embeddings:
   embedding_0  embedding_1
0    -0.052516     0.156451
1     0.094666     0.168608
2    -0.419540    -0.065374
3    -0.074119     0.141072
4    -0.153844    -0.141039
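The parameter counts in the model summary above can be checked by hand. The sketch below assumes each Dense layer stores one weight per input-output pair plus one bias per output unit, and that parameters are held as 4-byte float32 values (the usual Keras default, not something this page states). The helper name dense_params is illustrative only.

```python
# Layer widths read off the model summary: 10 inputs -> 128 -> 2 -> 128 -> 10.
# Dropout layers have no parameters, so only the Dense layers are counted.
layer_widths = [10, 128, 2, 128, 10]


def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


per_layer = [dense_params(a, b) for a, b in zip(layer_widths, layer_widths[1:])]
total = sum(per_layer)

print(per_layer)                     # [1408, 258, 384, 1290]
print(total)                         # 3340
print(f"{total * 4 / 1024:.2f} KB")  # 13.05 KB, assuming float32 parameters
```

The 2-unit "embedding" layer is the bottleneck of this encoder-decoder shape, which is why the output above has two columns per row.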
Why Use Row2Vec?#
Compared to Manual Implementation#
| Aspect | Row2Vec | Manual Neural Network |
|---|---|---|
| Lines of code | ~5 | ~200+ |
| Preprocessing | Automatic | Manual pipeline |
| Missing values | Handled | Manual imputation |
| Categorical encoding | Automatic | Manual encoding |
| Scaling | Built-in | Manual setup |
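To give a sense of the manual steps listed in the table, here is a minimal pandas sketch of imputation, scaling, and one-hot encoding. It is not Row2Vec's internal pipeline: the toy data, the median and most-frequent imputation choices, and min-max scaling are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame shaped like the Quick Example data, with gaps added.
df = pd.DataFrame({
    "Country": ["USA", "Mexico", None, "Canada"],
    "Product": ["A", "B", "B", None],
    "Sales": [103.0, 448.0, np.nan, 80.5],
})

numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.columns.difference(numeric_cols)

# 1. Impute: median for numeric columns, most frequent value for categoricals.
prepared = df.copy()
for col in numeric_cols:
    prepared[col] = prepared[col].fillna(prepared[col].median())
for col in categorical_cols:
    prepared[col] = prepared[col].fillna(prepared[col].mode().iloc[0])

# 2. Scale numeric columns to [0, 1]. A constant column would need a guard
#    against division by zero; that case is omitted here.
for col in numeric_cols:
    lo, hi = prepared[col].min(), prepared[col].max()
    prepared[col] = (prepared[col] - lo) / (hi - lo)

# 3. One-hot encode the categorical columns.
features = pd.get_dummies(prepared, columns=list(categorical_cols), dtype=float)

print(features.shape)
```

Even this small version needs separate handling per column type, and a real pipeline would also have to store the fitted medians, ranges, and category lists so that new rows are transformed the same way.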
Compared to Other Methods#
| Method | Use Case | Row2Vec Advantage |
|---|---|---|
| PCA | Linear reduction | Also offers non-linear (neural) options |
| t-SNE | Visualization | Unified interface with preprocessing |
| UMAP | General reduction | Consistent API across all methods |
| Manual NN | Custom embeddings | Automatic preprocessing, simpler API |
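For contrast with the neural options, the "linear reduction" that PCA performs can be sketched in a few lines of NumPy. This is a generic SVD-based PCA on random stand-in data, not Row2Vec's implementation; the function name pca_embed and its embedding_dim parameter are hypothetical and chosen only to echo the Quick Example.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))  # stand-in for 100 preprocessed rows, 10 features


def pca_embed(X: np.ndarray, embedding_dim: int = 2) -> np.ndarray:
    """Project rows onto the top principal components (a linear reduction)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:embedding_dim].T


embeddings = pca_embed(X, embedding_dim=2)
print(embeddings.shape)  # (100, 2)
```

Each embedding column here is a fixed weighted sum of the input features, so PCA cannot capture non-linear structure. That limitation is what the neural modes are meant to address.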
Installation#
pip install row2vec
Documentation Overview#
Installation: Setup and requirements
Quickstart: Get started in 5 minutes
Titanic Example: Complete walkthrough with the Titanic dataset
Adult Example: High-cardinality categorical features
Housing Example: Real estate price prediction features
Advanced Features: Neural architecture search, imputation strategies
CLI Guide: Command-line interface documentation
API Reference: Complete API documentation
Next Steps#
Ready to get started? Head to the Installation guide or jump straight to the Quickstart tutorial.
Questions or Issues?
Check the API Reference for detailed documentation
Report issues on GitHub
Join the conversation in GitHub Discussions