askthedev.com

Asked: September 26, 2024 | In: AWS

Is it possible to remove specific rows of data from tables in Athena?

anonymous user

I’ve been diving into using AWS Athena for some data analysis projects, and I hit a bit of a snag that I could really use some help with. So, here’s my question: Is it possible to remove specific rows of data from tables in Athena?

I’ve come across some interesting datasets, and while I’m able to run queries and get results, sometimes I find that I want to exclude certain rows based on specific criteria. For instance, I have a table with user activity logs, and I want to remove entries from users who haven’t been active in the last year. I know Athena is built on Presto and is great for querying data, but I’m not too clear on the best way to “delete” rows, if that’s even possible.

What’s frustrating is that I’ve seen ways to select data and make new tables, but I’m not necessarily looking to create new tables each time I want to analyze a subset of data. I just want to refine what I’m working with on the fly without having to duplicate everything. Also, I assume that with Athena being a serverless query service, there might be limitations compared to traditional SQL databases where you can just run a DELETE command, right?

Some people have suggested using CTAS (Create Table As Select) for filtering the data, which sounds like a workaround, but it feels a bit cumbersome if I need to do it frequently. Plus, it leads to more tables cluttering up my S3 bucket, which isn’t ideal. Does anyone have experience with this kind of thing? Are there better approaches that you’ve found?

I’d love to hear how you all handle similar situations. What strategies do you use in Athena to manage or exclude specific rows from your analysis? Any tips or insights would be super helpful. Thanks!

Amazon S3
    2 Answers

    1. anonymous user
      Answered September 26, 2024 at 12:45 am



      AWS Athena Row Deletion Question

      Hey there!

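      So, I get where you’re coming from! Working with AWS Athena can be a bit tricky when it comes to managing data. You’re right that Athena behaves differently from a traditional SQL database here. For standard tables that point at plain files in S3 (Hive-style tables, which are the default), Athena doesn’t support DELETE, since it’s primarily designed for querying data where it sits. The exception is Apache Iceberg tables: for those, Athena does support row-level DELETE, UPDATE, and MERGE INTO.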

      When you want to filter out specific rows, the common approach is indeed to use CTAS (Create Table As Select). Sure, it might feel like a hassle or make things a little cluttered in your S3 bucket, but it’s one of the main ways people handle data curation in Athena.

      Here’s a simple example of how you might do it:

              CREATE TABLE new_table
              AS SELECT * FROM old_table
              WHERE activity_date >= date_add('year', -1, current_date)
          

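      This creates new_table containing only the rows whose activity_date falls within the last year (assuming activity_date is a DATE column; if it’s stored as a string, you’d need to cast or parse it first). Note that current_date is written without parentheses in Athena SQL. I know it can feel annoying to keep creating new tables, but sometimes it’s just part of the process with Athena!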
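      To keep CTAS output from scattering across your query-result location, you can tell Athena where to write it and in what format. This is a sketch: the bucket and prefix are hypothetical placeholders, and the table and column names follow the examples in this answer.

```sql
-- Hypothetical names: user_activity_logs / activity_date follow the examples
-- in this thread; the S3 bucket and prefix are placeholders to replace.
-- The external_location prefix must be empty, or the CTAS query fails.
CREATE TABLE active_user_activity
WITH (
  format = 'PARQUET',
  external_location = 's3://amzn-s3-demo-bucket/athena-scratch/active_user_activity/'
)
AS
SELECT *
FROM user_activity_logs
-- Keep only rows from the last year.
WHERE activity_date >= date_add('year', -1, current_date);
```

      Putting every scratch table under one prefix (here the hypothetical athena-scratch/) means cleanup is a single recursive delete of that prefix rather than a hunt through your bucket.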

      Another thing to consider is using views. A view stores only a query definition in the data catalog and writes no data to S3, so you can reference your filtered data without generating new tables or cluttering your bucket.

      Example for creating a view:

              CREATE VIEW active_users AS
              SELECT * FROM user_activity_logs
              WHERE activity_date >= date_add('year', -1, current_date)
          

      This way, whenever you query active_users, you get only the filtered rows, with no new tables to create. Keep in mind, though, that a view is re-evaluated every time you query it: Athena still scans the underlying table (and bills you for the data scanned), so a view is no faster or cheaper than running the same WHERE clause against user_activity_logs directly.
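      The view above filters individual log rows by date. If what you actually want is to exclude users who have been inactive for a year (while keeping all rows for users who were active recently), you can filter on each user’s most recent activity instead. This sketch assumes a hypothetical user_id column alongside activity_date.

```sql
-- Hypothetical schema: user_activity_logs(user_id, activity_date DATE, ...).
-- Keeps every row for users whose most recent activity falls within the last
-- year, i.e. drops all rows for users inactive for a year or more.
CREATE OR REPLACE VIEW recently_active_user_logs AS
SELECT l.*
FROM user_activity_logs AS l
WHERE l.user_id IN (
  -- Users whose latest activity_date is no older than one year ago.
  SELECT user_id
  FROM user_activity_logs
  GROUP BY user_id
  HAVING max(activity_date) >= date_add('year', -1, current_date)
);
```

      The subquery reads user_activity_logs a second time, so each query against this view scans more data than the simple date filter does.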

      Hope that helps a bit! Let me know if you have more questions or if there’s something else you’re stuck on!


    2. anonymous user
      Answered September 26, 2024 at 12:45 am


      AWS Athena is a powerful query service that operates on data stored in S3, but it has limits on data manipulation. For standard Hive-style tables, you cannot delete specific rows the way you would in a traditional SQL database: Athena can add data (with INSERT INTO or CTAS), but it cannot modify or remove individual rows in the underlying S3 files. Row-level DELETE is supported only on Apache Iceberg tables. For regular tables, you can get a similar result with Create Table As Select (CTAS): write a query that selects only the rows you want to keep, and Athena writes them out as a new table. This isn’t as direct as a DELETE command, but it gives you a refined dataset to work with going forward while leaving the original data untouched.
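      If you need true in-place deletes, one option is to keep the data in an Apache Iceberg table, where Athena supports DELETE. A rough sketch, assuming Athena engine version 3 (which supports CTAS into Iceberg) and a placeholder S3 location. Run the two statements separately, and expect to cast some columns if their types aren’t supported by Iceberg.

```sql
-- Sketch, assuming Athena engine version 3. The S3 location is a placeholder;
-- user_activity_logs / activity_date follow the question's example.
-- Statement 1 (one-time): copy the existing Hive-style table into Iceberg.
CREATE TABLE user_activity_iceberg
WITH (
  table_type = 'ICEBERG',
  location = 's3://amzn-s3-demo-bucket/iceberg/user_activity_iceberg/',
  is_external = false
)
AS
SELECT * FROM user_activity_logs;

-- Statement 2 (run separately, as often as needed): delete rows in place.
DELETE FROM user_activity_iceberg
WHERE activity_date < date_add('year', -1, current_date);
```

      Iceberg records deletes in separate delete files rather than rewriting data immediately; Athena’s OPTIMIZE ... REWRITE DATA USING BIN_PACK and VACUUM statements can compact and clean these up later.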

      To keep tables from cluttering your S3 bucket, consider a systematic naming convention or a dedicated scratch prefix for intermediate tables that you drop once you’re done. Note that for non-Iceberg tables, DROP TABLE removes only the table definition from the catalog; the data files stay in S3 until you delete them yourself. Athena also relies on the AWS Glue Data Catalog for table definitions, so Glue can help you manage schemas and catalogs as the number of tables grows.

      While CTAS may feel cumbersome for frequent filtering, it lets you work with subsets of your data without modifying the original datasets. Depending on your analysis needs, views (which store only a query, not data) or a different data organization strategy, such as partitioning the logs by date so queries can skip old data, may be more manageable. Overall, there are workarounds for row exclusion in Athena, but with plain CTAS it takes some discipline to avoid a proliferation of datasets in S3.
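      When you’re done with an intermediate table, cleanup takes two steps, because dropping a non-Iceberg table doesn’t delete its files. A sketch using a hypothetical scratch table named active_user_activity:

```sql
-- Hypothetical scratch table created earlier with CTAS.
-- Step 1: look up where its data lives (see the LOCATION clause in the output).
SHOW CREATE TABLE active_user_activity;

-- Step 2: remove the table from the catalog. For a non-Iceberg table this
-- leaves the data files at that LOCATION in S3 untouched.
DROP TABLE IF EXISTS active_user_activity;
```

      After dropping the table, delete the files under the LOCATION reported in step 1, for example in the S3 console or with `aws s3 rm <location> --recursive`. An S3 lifecycle rule that expires objects under a scratch prefix can automate this.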



    Related Questions

    • I'm having trouble figuring out how to transfer images that users upload from the frontend to the backend or an API. Can someone provide guidance or examples on how to ...
    • which statement accurately describes aws pricing
    • which component of aws global infrastructure does amazon cloudfront
    • why is aws more economical than traditional data centers
    • is the aws cloud practitioner exam hard
