Data Lake | SQUARE DKSR


A data lake is a method of storing data, typically in the cloud. It can hold large volumes of many different types of data without first converting them into a common format.

A data lake is generally larger than a data pool and usually serves more users, so it requires more storage capacity. Unlike a data pool, a data lake usually does not hold uniformly formatted data; the data is kept in its raw state. This allows it to be made available for many applications in different formats, but it usually has to be processed further first (e.g., cleaned, prepared, or transformed). If a data lake is not actively managed, for example through appropriate data governance or data quality mechanisms, it gradually turns into a data swamp and loses its value.
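The idea that raw data is stored as-is and only cleaned and transformed when it is used can be illustrated with a small sketch. It assumes a hypothetical in-memory list, `raw_lake`, standing in for the lake, and a hypothetical function, `read_temperatures`, that parses each raw entry according to its format and drops incomplete records at read time. It is an illustration of the principle, not a real data lake implementation.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for a data lake: raw payloads are stored
# exactly as they arrive, each tagged only with its source format.
raw_lake = [
    ("json", '{"sensor": "A1", "temp_c": "21.5"}'),
    ("csv", "sensor,temp_c\nB7,19.0\nB8,\n"),
    ("json", '{"sensor": "C3", "temp_c": null}'),
]


def read_temperatures(lake):
    """Parse, clean and transform raw entries only at the time of use."""
    rows = []
    for fmt, payload in lake:
        if fmt == "json":
            records = [json.loads(payload)]
        elif fmt == "csv":
            records = list(csv.DictReader(io.StringIO(payload)))
        else:
            continue  # unknown formats are skipped rather than guessed at
        for rec in records:
            value = rec.get("temp_c")
            if value in (None, ""):
                continue  # cleaning step: drop incomplete measurements
            # transformation step: unify field types across sources
            rows.append({"sensor": rec["sensor"], "temp_c": float(value)})
    return rows


print(read_temperatures(raw_lake))
```

Nothing was standardized when the data was stored; the effort of unifying formats and filtering out incomplete values is paid by whoever reads the data, which is exactly the processing step the paragraph above refers to.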

A data lake makes sense when many users provide large amounts of data whose use has not yet been fully defined. For a data provider, the advantage of a data lake over a data pool is that they do not have to standardize their data or bring it into a specific form before uploading it, a process that could discard important information.

If municipalities or businesses want to use data from a data lake, they need to create standardized formats and organizational schemas, just as they would for a data pool. This is where open urban data platforms are useful: they transform large quantities of heterogeneous data into a unified format for further processing. In DKSR's case, this structuring is performed by connectors.
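The role of connectors can be sketched as one small adapter per data source, each mapping that source's raw records into a shared target format. Everything below is hypothetical and chosen for illustration: the `Measurement` schema, the two example sources (a parking sensor and a weather station reporting in Fahrenheit), and the `ingest` function. It does not reflect how DKSR's connectors are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Measurement:
    """Hypothetical unified target format shared by all connectors."""
    source: str
    entity_id: str
    kind: str
    value: float
    unit: str


def parking_connector(raw: dict) -> Measurement:
    # Hypothetical source 1: a parking sensor reporting free spaces as a string.
    return Measurement("parking", raw["id"], "free_spaces", float(raw["free"]), "count")


def weather_connector(raw: dict) -> Measurement:
    # Hypothetical source 2: a weather station reporting degrees Fahrenheit;
    # the connector converts to degrees Celsius for the unified format.
    celsius = (float(raw["tempF"]) - 32) * 5 / 9
    return Measurement("weather", raw["station"], "temperature", round(celsius, 1), "degC")


# Registry mapping each source name to the connector that understands it.
CONNECTORS: Dict[str, Callable[[dict], Measurement]] = {
    "parking": parking_connector,
    "weather": weather_connector,
}


def ingest(source: str, records: List[dict]) -> List[Measurement]:
    """Run all raw records of one source through its connector."""
    connector = CONNECTORS[source]
    return [connector(r) for r in records]


unified = ingest("parking", [{"id": "P-12", "free": "34"}]) + ingest(
    "weather", [{"station": "W-3", "tempF": "68"}]
)
for m in unified:
    print(m)
```

Keeping one connector per source means that adding a new kind of raw data only requires a new adapter; consumers downstream see a single, consistent format regardless of where the data came from.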

Sources

Ai plain english, "Data Lake or Pool" (last accessed 20 July 2021)