Since its release, we have heard a lot about ChatGPT: from dystopian visionary founders telling the world that humans are becoming useless, to curious people who are just experimenting, to debunkers eager to find and expose every weakness of … Read more
Fast Document Similarity in Python (MinHashLSH)
Why is document similarity more important than ever? In the big data era, companies increasingly need to detect similar items in their databases. Imagine platforms like Kijiji or Subito trying to detect people who constantly … Read more
Working with Date Intervals in Data Warehouses and Data Lakes
Working as a Data Engineer means working with date and time data a lot, especially now that companies want to be data-driven, software is event-driven, your coffee machine is data-driven, and AI and ML require tons of data to … Read more