As you know, you don't want your data lake to become a data swamp. I don't need to say again that you would like to have a minimum level of data organization and categorization in your data lake. To better understand what I mean by organization and categorization, let me compare it with a library. Imagine a library with thousands of books, where those books are all piled up together without any sort of categorization. As you may expect, it would be very hard to find books if they were organized like that. The most practical way for people to find books in libraries is to organize them under main topics that are both easy to memorize and reduce the number of titles to search when you're looking for something. That's why we have categories such as arts, comic books, romance, history, computer science, and so on. For centuries, libraries have been organized that way for a reason: it matches the access pattern used when someone is looking for a title. So if you're looking for a book of food recipes, it makes more sense to go to the cookbook section than to the science fiction section.

Now let's take the number of pages as another example of cataloging books. Imagine library shelves filled with books sorted by page count, so that a cookbook and a computer science book could sit next to each other if they have roughly the same number of pages. That way of cataloging books would make it easy for someone looking for books that can be read quickly, but it is not very practical in reality, because most of us look for books by subject or by a combination of title and author. As you see, one method of categorizing data is not necessarily better than another. It is all about finding the most appropriate method of categorizing according to the access pattern.

The very same principle applies to your data lake. Just as you have people looking for books in libraries, in your data lake you will have computer systems or AWS services retrieving data to process. You should consider classifying that data according to the access patterns those consumers need.

For your data lake, you may have auto-generated data, such as data generated by IoT devices or server logs. That data often arrives via streaming and is typically unstructured, which makes it suitable to ingest with Amazon Kinesis, store in Amazon S3, catalog with AWS Glue, process with AWS Lambda, and query with Amazon Athena.

You may also have operational data, such as inventory and sales data, expense reports, and other inputs. That data likely comes in batches and is usually consumed by people who want to visualize graphs and have access to statistics. It may be suitable to ingest with Amazon API Gateway, store in S3, catalog with Glue, load into Amazon Elasticsearch Service, and visualize with Kibana.

In addition to that, you may have human-generated data, such as social media feeds, contact forms, call center audio, e-mails, and so on. For these, think about the access patterns needed by your data analysis services. You could ingest that data through direct S3 uploads, SFTP, or Amazon AppFlow, store it in S3, catalog it with Glue, and use a service like Amazon Comprehend, which is a natural language processing service that uses machine learning to find insights in text, such as sentiment analysis.

For each one of those scenarios, you may want to use a specific ingestion and processing layer that is the right tool for the job; the sketches below illustrate two of these pipelines.
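To make the streaming scenario a bit more concrete, here is a minimal sketch in Python with boto3. It assumes a Kinesis data stream, a Glue database and table already created by a crawler, and an S3 location for Athena query results; every name in it is a placeholder I made up for illustration, not part of a reference architecture.

```python
# A minimal sketch of the streaming pipeline, assuming a Kinesis data stream
# named "iot-events", a Glue database "datalake_db" containing a table
# "iot_events" created by a crawler, and an S3 bucket for Athena results.
# All names here are placeholders for illustration only.
import json

import boto3

kinesis = boto3.client("kinesis")
athena = boto3.client("athena")

# Ingest: push a single device reading into the stream.
kinesis.put_record(
    StreamName="iot-events",
    Data=json.dumps({"device_id": "sensor-42", "temperature": 21.7}).encode("utf-8"),
    PartitionKey="sensor-42",
)

# Query: once the records land in S3 and are cataloged by Glue, Athena can
# query them in place with standard SQL.
response = athena.start_query_execution(
    QueryString=(
        "SELECT device_id, AVG(temperature) AS avg_temp "
        "FROM iot_events GROUP BY device_id"
    ),
    QueryExecutionContext={"Database": "datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```

The specific calls matter less than the shape of the pipeline: ingestion (Kinesis) and consumption (Athena over the Glue catalog) are decoupled, with S3 holding the data in between.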
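And for the human-generated data path, here is a minimal sketch of calling Amazon Comprehend once a piece of text, say a contact-form message pulled from S3, is in hand. The example text is made up.

```python
# A minimal sketch of the human-generated data path: once a text, such as a
# contact-form message or a call transcript stored in S3, is in hand, Amazon
# Comprehend can analyze it. The example text below is made up.
import boto3

comprehend = boto3.client("comprehend")

text = "The support agent was friendly and solved my problem quickly."
result = comprehend.detect_sentiment(Text=text, LanguageCode="en")

print(result["Sentiment"])       # e.g. POSITIVE
print(result["SentimentScore"])  # confidence scores per sentiment class
```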
Depending on which processing tool you are going to use, you may want to shape your data to match the access pattern needed by the processing layer. That includes converting the data to a specific file format, compressing it, splitting it, aggregating it, or applying any other data transformation, as sketched below. Rest assured that we will cover more about data prep later in this course. Notice that I mentioned storing in S3 and cataloging with Glue in all the previous cases. That is because those services are specifically designed to handle storage and cataloging in a data-agnostic way, keeping your data organized and preventing the data swamp.
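As a small preview of that data prep, here is a minimal sketch of reshaping raw CSV into compressed, partitioned Parquet with pandas, which is one common way to match the access pattern of a query engine like Athena. The bucket, file, and column names are placeholders, and writing directly to S3 assumes the s3fs package is installed.

```python
# A minimal sketch of data prep: reshaping raw CSV into compressed,
# partitioned Parquet so that a query engine only scans the data it needs.
# The file, bucket, and column names are placeholders; writing straight to
# S3 assumes the s3fs package is installed.
import pandas as pd

df = pd.read_csv("raw/sales_2023.csv")

# Write Snappy-compressed Parquet partitioned by region, so queries that
# filter on region read only the matching S3 prefixes.
df.to_parquet(
    "s3://my-data-lake/curated/sales/",
    engine="pyarrow",
    compression="snappy",
    partition_cols=["region"],
)
```

Partitioning on a column you frequently filter by means the query layer reads only the matching prefixes, which is exactly the "match the access pattern" idea we have been discussing.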