Technical stuff, as simple as possible:
To get something from a service you have to make a request.
The service will reply with a response containing the requested data.
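The request/response round trip can be sketched with Python's standard library. To keep it self-contained, the "service" here is a toy in-process HTTP server, and the `/posts/1` path and JSON body are made up for illustration:

```python
import http.server
import threading
import urllib.request

class PostHandler(http.server.BaseHTTPRequestHandler):
    """Toy service: answers every GET with a fixed JSON body."""
    def do_GET(self):
        body = b'{"post": "hello"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# start the toy service on a random free local port
server = http.server.HTTPServer(("127.0.0.1", 0), PostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# make a request; the service replies with the requested data
url = f"http://127.0.0.1:{server.server_port}/posts/1"
with urllib.request.urlopen(url) as resp:
    data = resp.read()
print(data.decode())  # prints {"post": "hello"}

server.shutdown()
```

Every download of a post or comment is exactly this cycle, repeated once per item.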
To get a lot of data you have to make a lot of requests and receive a lot of responses.
Every time you make a request you consume the service's CPU and traffic.
CPU and traffic are what services like AWS sell for money.
If you continuously make a lot of requests back-to-back, creating a request storm, the service instance at AWS will spend all its CPU time and traffic dealing with your storm, and other users will have to wait a long time for their responses. The service becomes unavailable to ordinary users. This is how a Denial of Service (DoS) attack works.
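The "finite capacity" point can be simulated without any network at all. The 1 ms of busy-work per request below is an arbitrary stand-in for whatever CPU a real handler burns; the exact numbers are invented:

```python
import time

def handle_request():
    """Toy handler: each request burns a fixed 1 ms slice of CPU time."""
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < 0.001:
        pass

# one client sending requests back-to-back for a 100 ms window
start = time.perf_counter()
served = 0
while time.perf_counter() - start < 0.1:
    handle_request()
    served += 1

# roughly 100 requests fit in the window -- and that's ALL the server
# can do; any request beyond that capacity has to wait or be dropped
print(served)
```

One storming client can fill that whole budget by itself, which is exactly why everyone else's requests start queueing up.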
The service could have multiple CPUs and a thick traffic channel (much more than you could afford on your own), so you have to launch your request storm from multiple machines (using your shills or a botnet) to grab all the CPUs and all the traffic. That is a Distributed Denial of Service (DDoS) attack.
To prevent DoS and DDoS, any company like AWS has protection in place. It detects a storm of requests and drops them or slows down the stream.
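A common way such protection slows a stream is a token bucket: requests spend tokens, tokens refill at a fixed rate, and anything beyond that rate is dropped. This is a generic sketch, not AWS's actual implementation, and the numbers (10 requests/second, burst of 5) are made up:

```python
import time

class TokenBucket:
    """Allow up to `rate` requests/second, with bursts up to `burst`."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        # refill tokens for the time elapsed since the last request
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request passes through
        return False      # request is dropped (or queued)

bucket = TokenBucket(rate=10, burst=5)
results = [bucket.allow() for _ in range(20)]  # a 20-request burst
print(results.count(True))  # only the first ~5 get through
```

A legitimate user never notices the limiter; a storm hits it immediately, which is the whole point of the protection.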
To quickly download a huge amount of posts and comments you have to make a huge number of similar requests fast, using multiple copies of your crawler working from different places simultaneously. This is no different from a DDoS, so in a normal situation AWS's DDoS protection would be triggered and would not let your crawlers bother the service; you would not be able to download all the data and create a traffic spike. If you managed to do it anyway, it means the DDoS protection was disabled.
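One simple form of the detection that should catch such a crawler is counting requests per source over a time window and flagging anyone over a threshold. The IP addresses and threshold below are invented for illustration; real protection layers are far more sophisticated, but the principle is the same:

```python
from collections import Counter

# toy detector: flag any source sending more than THRESHOLD
# requests within one observation window
THRESHOLD = 100

# simulated window: one source storms, two browse normally
window = ["10.0.0.1"] * 500 + ["192.168.1.7"] * 3 + ["172.16.0.2"] * 2

counts = Counter(window)
flagged = {ip for ip, n in counts.items() if n > THRESHOLD}
print(flagged)  # prints {'10.0.0.1'} -- the storming source gets dropped
```

A fast mass-download crawler looks exactly like `10.0.0.1` here, from every source address it runs on, so an enabled detector should flag all of them.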
All of this can be found in any book or article about how the web works.
All the proof of a prepared and organized DDoS is at the link in the OP.