Cloud

What is cloud native?

Cloud native means software that is intended to be run in the cloud, first and foremost. It’s not supposed to be run on a home server or on-premises data center. Cloud native applications involve containers, microservices, agile, DevOps, and cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud. It’s possible to have stuff in the cloud that isn’t cloud-native. For example, migrating legacy apps from a local data center to the cloud is an example of non-native cloud computing.

While it’s true that this book focused more on traditional stuff, like desktop software development, LAMP web development, traditional web hosts and VMs, it’s also good to know cloud. Don’t get me wrong – if I didn’t have faith in Java/C++/LAMP/etc, I wouldn’t have written a book about it! But even though that stuff is around now, it’s important to see which way the winds are blowing. Cloud is the future.

LAMP is useful, but not super new. I think it’s stable and a safe bet. But you can’t ignore new stuff either. If you want to learn bleeding-edge cloud-native stuff, here’s some stuff you should look into:

Containers

Microservices – your first projects might be monolithic in nature, but it can be good to eventually learn about developing microservices. Microservices involve breaking up your application into smaller pieces and then creating loosely coupled services. You can have many different containers and servers, with granular provisioning (meaning more containers of a certain type when you need more and fewer when you don’t need as many). But you shouldn’t start out with microservices and cloud native stuff. That should be an eventual goal. Stick with simpler stuff first, like a monolithic app in a single VM, because that’s easier to develop and understand.

Think of a restaurant. Instead of having one employee who cooks, cleans, and waits tables, there are many separate employees who have separate roles. A waiter or waitress waits tables. A chef cooks the food, sometimes with the help of line cooks. Someone else buses tables and washes dishes. The owner or manager might be in charge of things like the menu, buying ingredients, advertising, and things like that.

Having a monolithic app is like having a restaurant where one person does everything. Microservice architecture is like having separate employees for different things. “Loosely coupled” means that you can swap out one for another. So if one waiter/waitress at a restaurant called in sick, another waiter/waitress can take their shift. You can also scale out different services as needed. Maybe there are more people coming in on the weekend, so you scale out by having extra staff come in to address the extra demands of the customers, and then go back to normal when there are fewer customers.
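To make the restaurant analogy a bit more concrete, here is a minimal Python sketch of loose coupling and scaling out. Everything in it (the Waiter interface, Alice, Bob, serve_tables) is made up for illustration. Real microservices would talk to each other over a network rather than through function calls, so treat this as a picture of the idea, not of a real deployment.

```python
from typing import Protocol


class Waiter(Protocol):
    """Anything that can take an order counts as a waiter (hypothetical interface)."""

    def take_order(self, dish: str) -> str: ...


class Alice:
    def take_order(self, dish: str) -> str:
        return f"Alice took an order for {dish}"


class Bob:
    def take_order(self, dish: str) -> str:
        return f"Bob took an order for {dish}"


def serve_tables(staff: list[Waiter], dishes: list[str]) -> list[str]:
    """Spread orders across whoever is on shift, round-robin."""
    return [staff[i % len(staff)].take_order(dish) for i, dish in enumerate(dishes)]


# Weekday: one waiter handles everything.
weekday = serve_tables([Alice()], ["soup", "pasta"])

# Weekend: scale out by adding staff. Bob could also cover alone if Alice is out sick.
weekend = serve_tables([Alice(), Bob()], ["soup", "pasta", "salad"])
```

The serve_tables function only cares that each staff member can take an order. It doesn’t care who they are, which is what makes them swappable.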

Kubernetes – a container orchestration platform originally made by Google and now open source. Very popular and used a lot in cloud-native development.

Docker – a popular tool for building and running containers. There are other container tools, but this is the one that really matters.

Alpine Linux – a very small Linux distribution that is often used as the base image for Docker containers.

Google Kubernetes Engine (GKE) – a way to run Kubernetes and containers in Google Cloud.

YAML – a human-readable data format used for configuration files, such as Docker Compose and Kubernetes configs.

Cloud platforms

Infrastructure as a Service – if you don’t want to be in charge of your own IT infrastructure, then look for companies that offer IaaS, which makes it easier for developers to have servers and network infrastructure without managing each and every aspect of it. Infrastructure is data centers, server racks, cables, switches, routers, servers, air conditioning, firewalls, and more. IaaS is easier than making your own infrastructure, but it leaves more work to you than PaaS or SaaS.

Platform as a Service – PaaS offers more stuff than IaaS. The idea is to do more stuff for you so that you don’t need to do as much system administration type stuff – operating systems, installing software, things like that. Heroku is an example of PaaS.

Software as a Service – unlike IaaS or PaaS, which are intended for software developers, SaaS is for end users. When you use Google Drive, you’re not setting up the Google Drive servers or software that runs on said servers. You’re merely using the software. It’s a service with a subscription cost (if you want more than the free tier). They do everything for you.

Infrastructure as code – config files for cloud-native stuff, auto configs, auto deployment, etc. That’s all the infrastructure many modern developers care about, since they use the cloud as opposed to on-prem data centers. If you can deploy new servers and containers in the cloud using software, then you can start seeing infrastructure as being code.
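One way to picture infrastructure as code: you write down what you want to exist, and software works out what to start or stop. Here is a rough Python sketch of that idea. The service names and the plan function are hypothetical, and real tools do far more than this (they actually talk to the cloud provider, for one thing).

```python
# What the config file says should exist (hypothetical services and counts).
desired = {"web": 3, "worker": 1}

# What is actually running right now.
running = {"web": 1, "db-old": 1}


def plan(desired: dict[str, int], running: dict[str, int]) -> list[str]:
    """Compare desired state to current state and list the changes needed."""
    actions = []
    for name in sorted(desired.keys() | running.keys()):
        diff = desired.get(name, 0) - running.get(name, 0)
        if diff > 0:
            actions.append(f"start {diff} x {name}")
        elif diff < 0:
            actions.append(f"stop {-diff} x {name}")
    return actions


actions = plan(desired, running)
```

The point is that the config is the source of truth. If you change a number in it and run the tool again, the infrastructure follows.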

Serverless architecture – unlike SaaS, serverless architecture is for developers. It’s a newer concept. If you want to write code and have it run in the cloud, but don’t want to worry about servers, then use serverless options such as AWS Lambda.

It might sound confusing, but serverless software does run on a server – they call it serverless because they mean that you don’t need to worry about setting up, securing, or updating the server at all. It has all been abstracted away from you. It’s just like how Java added garbage collection, meaning it abstracted away memory management, compared to older languages like C++ which require the developer to be in charge of that stuff.
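To give a feel for how little code that can be, here is a minimal sketch of a Python function written in the style AWS Lambda expects: a handler that takes an event and a context. The name field in the event is an assumption made up for this example, and the statusCode/body return shape is the one commonly used when the function sits behind an HTTP endpoint.

```python
import json


def lambda_handler(event, context):
    """Entry point that the platform calls, with the request data in `event`."""
    # `name` is a hypothetical field chosen for this example.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Locally, the handler is just a function, so you can try it without AWS.
response = lambda_handler({"name": "student"}, None)
```

Notice what’s missing: there is no web server setup, no OS configuration, and no port handling. That part is what has been abstracted away.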

Microsoft Azure – Microsoft’s cloud platform for developers. You can have a server in Azure, Google Cloud, or AWS, and unlike an old-school web host or VPS, it has more features or benefits. They might have better scalability, take care of configurations for you, or add features that older styles of server companies don’t have.

Firebase – Google’s mobile platform. Apps need servers too. Firebase is mobile-centric, but you can use other cloud platforms for mobile stuff too.

Instead of doing a lot of really in-depth back-end stuff, like Node.js or whatever, you could just use Firebase instead.

Instead of taking payments yourself, if you want to take payments online, you should look into Stripe’s API. It’s a REST API, so it’s good to be familiar with JSON if you want to use it. The point of something like Firebase is that it does more so you can do less.

Google Cloud – Google’s cloud offering.

Amazon Web Services (recommended) – the most popular cloud computing platform. Keep in mind that, when I say cloud, I am referring to cloud stuff for developers, not home users. Google Drive is “cloud storage” but not in the same way that AWS’s storage options are. There are many cloud options out there, but you can’t go wrong with AWS.

Heroku and DigitalOcean – smaller than AWS, but used by some people. Just like other cloud providers, they have many different services and the emphasis is to make it easy for a developer to get their app up and running on a server, spending less time on system administration and more time on the app itself.

GraphQL

XML and JSON can be used for APIs (SOAP and REST respectively) or for storing structured data, with key-value pairs. SQL is a query language for databases. But GraphQL is a query language and schema language for APIs. Schemas specify structure, but queries are when you want data. If you haven’t heard of GraphQL, that’s because it was made public in 2015. Facebook developed GraphQL earlier than that, but 2015 is when they decided to open source it. For the sake of comparison, SQL was made in the 1970s. New tech comes out all the time. GraphQL isn’t going to replace SQL, as they serve different purposes. But what I’m saying is that it’s okay to not be super familiar with all tech that’s out there, because there’s so much new stuff coming out all the time that it can be hard to stay on top of things. You can use GraphQL for a remote API or even as a way for microservices to communicate with one another.

Here is a simple GraphQL schema example:

type Student {
  firstName: String!
  lastName: String!
  studentId: ID!
  gpa: Float
  enrolled: Boolean!
  credits: Int!
  classSchedule: [Course]
}

! means it’s mandatory. In a GraphQL type schema, you’re specifying what something needs. Schemas of any sort specify the structure of something. You will notice that GraphQL looks similar to JSON, but it’s not exactly the same.

GraphQL usage consists of queries and responses. You send a query to a GraphQL API, and the server parses it (into something called an abstract syntax tree, which is just a structured representation of the query), validates it against the schema, and responds with data, usually as JSON shaped like the query you sent.

[Course] means an array of courses, specified by the Course type (which the programmer would have to make themselves). But because it doesn’t have !, that means it’s not mandatory. That’s because a student might not be enrolled in classes for a given semester.

When you get the results of a query, you can compare them to the schema. If they meet all the criteria specified in the schema, then they’re valid.
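To make the idea of mandatory and optional fields concrete, here is a minimal Python sketch that checks a dictionary against the Student type by hand. A real GraphQL server or client library does this kind of checking for you. The STUDENT_SCHEMA table and the validation_errors function are made up for illustration and are not part of GraphQL itself.

```python
# Field name -> (expected Python type, mandatory?), mirroring the Student type.
STUDENT_SCHEMA = {
    "firstName": (str, True),
    "lastName": (str, True),
    "studentId": (str, True),       # assumption: the ID arrives as a string
    "gpa": (float, False),
    "enrolled": (bool, True),
    "credits": (int, True),
    "classSchedule": (list, False),
}


def validation_errors(student: dict) -> list[str]:
    """Return a list of problems; an empty list means the data is valid."""
    errors = []
    for field, (expected_type, mandatory) in STUDENT_SCHEMA.items():
        value = student.get(field)
        if value is None:
            if mandatory:               # fields marked with ! must be present
                errors.append(f"{field} is mandatory")
        elif not isinstance(value, expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors


good = {
    "firstName": "Ada",
    "lastName": "Lovelace",
    "studentId": "s-1",
    "gpa": 3.9,
    "enrolled": True,
    "credits": 12,
    "classSchedule": None,   # allowed, because [Course] has no !
}
bad = {"firstName": "Ada", "credits": "twelve"}
```

Calling validation_errors(good) gives an empty list, while validation_errors(bad) reports the three missing mandatory fields and the wrong type for credits.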

There’s more to GraphQL than what I mentioned here, but this is fine for starting out, especially considering that you’re probably more likely to use JSON/REST instead of GraphQL. But now you know that, if someone mentions GraphQL, they’re talking about APIs.

But whether an API uses SOAP, REST, or GraphQL, all remote APIs are fundamentally the same: a way for a user and a server to interact in a limited way with specific rules. When querying an API, you don’t care how the API gets the data. All you care is that it does get the data for you. The implementation of an API is abstracted away from the user of it. Abstraction is a core tenet of object-oriented programming.

Performance

Web scale – having the ability to support millions of users on your website or app. Not everyone needs to be web scale. YouTube is an example of web scale infrastructure. Some YouTube videos have hundreds of millions or even billions of views. That wouldn’t be even remotely possible on a cheap web host. But at the same time, if you don’t have many users for your app, why waste time and money making something have unneeded capacity?

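Redis or Memcached – in-memory caches that speed things up by keeping frequently used data in RAM, so your app doesn’t have to ask the database every time.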

CDNs – Content Delivery Networks. Get more servers in different locations across the globe so that users can connect to one that is close to them.

Cloudflare – a well-known CDN.

Akamai – another well-known CDN.

Scalability – the ability for your software to handle lots of users. It’s not just about hardware.

Bottleneck – a slow thing that makes everything else in the system slow, even if the other parts are fast. Maybe your server has a fast CPU, enough RAM, lots of storage, but a slow connection. The network connection would be described as a bottleneck. Bottlenecks can also be software, not just hardware.

Elasticity – the ability to add additional CPU/RAM/storage/networking capacity whenever you need it. A part of elasticity is being able to scale down after going up temporarily.
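The caching idea behind tools like Redis and Memcached can be sketched in a few lines of Python. In this sketch a plain dictionary stands in for the cache, and slow_database_lookup is a hypothetical stand-in for a real query. It shows the general pattern (check the cache first, fall back to the database, remember the answer), not how either tool is actually used.

```python
import time

cache: dict[int, str] = {}   # stand-in for Redis or Memcached
db_calls = 0                 # counts how often the slow path is taken


def slow_database_lookup(user_id: int) -> str:
    """Hypothetical slow query; the sleep simulates disk and network time."""
    global db_calls
    db_calls += 1
    time.sleep(0.01)
    return f"user-{user_id}"


def get_user(user_id: int) -> str:
    if user_id in cache:              # cache hit: skip the database entirely
        return cache[user_id]
    value = slow_database_lookup(user_id)
    cache[user_id] = value            # remember the answer for next time
    return value


first = get_user(42)    # goes to the database
second = get_user(42)   # served from the cache
```

Both calls return the same value, but the database is only asked once. If the database was your bottleneck, that is where the speedup comes from.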

MEAN/MERN stack

MongoDB – a NoSQL database. Relational databases use SQL, and the name NoSQL sets these databases apart from relational ones. MongoDB is a document-based database, which is one kind of NoSQL database. NoSQL databases are more flexible than relational databases, which have rigid columns and relationships for data.

Express – a web application framework that runs on Node.js.

Angular – a web app framework that is also used in MEAN. Sometimes, React is used instead of Angular, and then it’s called the MERN stack.

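Google announced that AngularJS (the original 1.x version of the framework) would be winding down. It got security and stability fixes for a while, but long-term support ended at the end of 2021. The newer Angular (version 2 and later) is a separate rewrite and is still actively developed.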

I remember people telling me that the MEAN stack would replace LAMP, but it seems like LAMP will outlive MEAN. But then again, there are other stacks people can use too.

Node.js – a JavaScript runtime for servers. It fills a role kind of like Apache with PHP, but the code you write for it is in JavaScript (yes, on a server, not just in your browser). When I first used Node.js, I thought it was very strange for JavaScript to be on the back-end. But it’s very popular, at least among developers who insist on using newer stuff.

React – a JavaScript user interface framework.

MEAN stack is new(ish), but that doesn’t mean everyone uses it. Looking up jobs where I live, there are hundreds more results for MySQL than MongoDB. And that’s just on a single job-related site (indeed.com). Sure, MEAN is cool and new. But so is VR. VR is new and high-tech, but how many people actually use it? Not that many. Many companies use older tech. How many people do you know that use all brand new iPhones, brand new iPads, brand new websites and apps, all the time? The reality of the matter is that people use tech and stick with it for a while. It never hurts to learn new stuff, but if all you knew was bleeding edge new cloud stuff, and no older yet widely used stuff, that would be bad.

Cool stuff in AWS

DynamoDB – Amazon’s noSQL database. Kind of like MongoDB. One benefit of noSQL is the ability to easily scale horizontally (meaning across many different database servers). But as much as cloud engineers want to push noSQL, a lot of the tech the world relies on still uses SQL.

AWS RDS – Amazon’s managed relational database service. It runs database engines like MySQL for you.

S3 Buckets – Amazon’s simple storage service. If you make an app where people can upload photos, you might want to store them in an S3 bucket.

Amazon Glacier – slow, cheap, bulk storage (well, cheaper than other cloud stuff). Intended for backups, not storage that needs high performance.

Amazon Elastic Container Service (ECS) – a way to run containers in AWS. Containers are an alternative to virtual machines. In a VM, you need to care about the OS and security settings. In a container, you spend less time on infrastructure and more on your app.

Amazon Elastic Compute Cloud (EC2) – virtual machines in AWS, billed by how long they run. EC2 is intended to be scalable/elastic, meaning you make more instances when needed (like if your website or app suddenly gets more visitors/users), then scale back down when you don’t need the extra capacity.

Amazon Lightsail – AWS’s VPS offering. With a VPS, the provider does very little for you, which goes against what most cloud providers are all about. It’s not what AWS is most known for.

AWS Lambda – Amazon’s serverless offering. While Lightsail does almost nothing for you, Lambda is the opposite. They do all the configuration, server, and infrastructure stuff for you.

AWS Fargate – a way to run containers without as much management/configuration. An easier way to run containers.

Elastic Beanstalk – do you want to run some sort of back-end server platform like Apache/PHP, Node.js/JavaScript, Ruby/Rails, or Java/Tomcat? Do you want to focus on your app and not have to deal with server configuration? Then use Elastic Beanstalk. It’s basically managed hosting.

AWS Route 53 – Amazon’s DNS service, which also lets you register domain names, similar to GoDaddy or Namecheap. If you already have domain names from elsewhere, there’s no need to transfer them to Route 53, even though Amazon certainly provides information on how to do it. They just want you to be all-in with their cloud ecosystem. But I’d say it’s best not to put all your eggs in one basket.

Elastic IP address – a static public IP address that you can attach to your resources in AWS and move between them. Amazon really likes to use “elastic” as a buzzword, meaning that something is flexible and can be added or removed at any time.

AWS Elastic Load Balancer – a way to do load balancing in AWS. Not something a solo developer/student will need, at least not when starting out.

ElastiCache – AWS’s managed caching service, which runs Redis or Memcached for you. Speeds up database performance.

AWS CloudWatch – a way to monitor your servers in AWS.

CloudFront – AWS’s CDN, kind of like Cloudflare or Akamai.

AWS IAM – Identity and Access Management. If you have an AWS account that is used by a team or a company rather than an individual, you can make accounts for each person using IAM instead of giving everyone the Amazon account login info. If you’re a solo developer working on a personal project, you won’t really need to use this.

AWS regions – not all AWS servers are in the same place. It makes sense to use servers that are closest to you and your customers. If you have customers in many different areas, that’s where CloudFront can help.

AWS SDKs – Amazon offers Software Development Kits, or SDKs (specific to different technologies, such as PHP), so that you can better develop software that will run in AWS. There are many different SDKs, each for a different stack/platform, such as PHP/Apache, Node.js/JavaScript, and so on.

EC2 VM import/export – let’s say you want to have a testing environment instead of changing your production environment directly. You want to write code, test it, and make sure it’s ready in your test environment before it makes its way to your public server on the internet. AWS EC2 lets you import or export instances. You can set up a VM on a local hypervisor, such as if you use an old desktop computer and run ESXi on it and have it on your LAN, and then you can take that VM image and put it in EC2. You can also migrate out of EC2 by taking your EC2 instance and putting it into your local/on-premises infrastructure.

EBS – Elastic Block Store. A way of storing stuff in AWS. You can use EBS with EC2.

Ports to allow in AWS – 80 for HTTP, 443 for HTTPS, and 22 for SSH. The fewer ports you have open, the better. These are pretty standard ones though. But don’t get in the habit of opening too many ports on a server’s firewall. Every open port is a potential way to get hacked. Web servers need at least a couple open ports, as they have things called services which listen on ports, meaning there’s a process that is waiting for incoming requests from users who want to request pages from the server. But port and service scanning is useful for hackers because they can see what’s running on a server, and then see if there are any known vulnerabilities and exploits for it.

For example, if you’re running Apache Tomcat, and you’re using an old version like 9.0.0, they can find out what’s running on your server by doing a service scan in nmap, then just search for that version on exploit-db.com, and find something like this: https://www.exploit-db.com/exploits/42966

Then they can just copy/paste and run an exploit to use against your server, with very little effort.

SSH – when you set up an EC2 instance, make sure you limit which IP addresses can remotely log into it with SSH, otherwise hackers might automatically be scanning AWS for insecure EC2 instances that they can attempt to log in to.

Let’s say your public IP address is 123.45.67.89. Sometimes, ISPs will give you temporarily/dynamically-leased IP addresses, so one day your IP might be 123.45.67.89, but the next day it might be 123.45.67.92 or something like that. Because of that, you could set a range of acceptable IP addresses to allow your EC2 instance to accept SSH logins from. A subnet mask of 255.255.255.0 aka /24 (CIDR prefix notation) will mean that any address from 123.45.67.* can attempt to log in. In AWS, 0.0.0.0/0 means accept all IP addresses. For traffic to your website, that’s fine. But for private assets or SSH logins, you don’t want just any old IP address to access it.
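If the CIDR notation feels abstract, Python’s built-in ipaddress module can show what a range actually covers. This is just a sketch using the example addresses from above. The may_ssh function is a made-up helper for illustration, not something AWS provides.

```python
import ipaddress

# The /24 range from the example: any address matching 123.45.67.*
allowed = ipaddress.ip_network("123.45.67.0/24")

# 0.0.0.0/0 matches every IPv4 address.
everyone = ipaddress.ip_network("0.0.0.0/0")


def may_ssh(source_ip: str) -> bool:
    """Hypothetical check: is this address inside the allowed range?"""
    return ipaddress.ip_address(source_ip) in allowed


today = may_ssh("123.45.67.89")      # your address today
tomorrow = may_ssh("123.45.67.92")   # your address after the ISP changes it
stranger = may_ssh("123.45.68.1")    # someone outside the range

netmask = str(allowed.netmask)       # the subnet mask form of /24
range_size = allowed.num_addresses   # how many addresses the range covers
```

A /24 covers 256 addresses, while 0.0.0.0/0 covers over four billion. That difference is why you want a narrow range for SSH.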

AWS API – using an API to deal with AWS instead of the point-and-click web interface (the AWS management console).

AWS management console vs. AWS CLI – the AWS management console is like a hypervisor page that lets you look at VMs and settings and whatnot. If you’ve ever used virtual machine software like Virtualbox, Parallels, ESXi, or Proxmox, you’ll be vaguely familiar with some of the things you can do in the management console. The AWS CLI is a command-line tool that you run from a shell like bash or PowerShell, but specific to Amazon stuff.

AWS Marketplace – some stuff is from third parties rather than Amazon themselves. Think of it like an app store but for full stack web developers. For example, if you don’t like an Amazon AMI for EC2, you can always use another one, like a Bitnami image. Of course, be careful when using third party stuff. Some of it is trustworthy, but some of it is not.

Who needs to care about AWS security?

Some people have this naïve attitude towards security: “Why do I have to care about security? I’m not doing anything wrong. Nobody would target me. I’m just some random person.”

But that’s wrong! There’s money to be made in hacking servers, computers, phones, and accounts. If people can make money from it, they will try to do it. Do they care whose AWS EC2 instance they hack? No, just that it IS an AWS instance. Do they care whose identity they steal to commit fraud? No, just as long as it IS an identity they can steal.

Budgeting and problems associated with AWS bills

You can set budgets in AWS and get alerts when you’ve reached a certain threshold.

If your Amazon account gets hacked, someone could use AWS EC2 to spin up cryptominers so that the hacker can make money, but it will rack up a huge bill for you! That’s not just some hypothetical thing. It made the news when that happened to Tesla’s AWS account, but it happens to smaller companies and individual developers too. Be sure to set up budget notifications and also have good security for your account, like a strong password, no password reuse, 2-factor authentication, a password manager, a secure email address associated with the account (so an attacker can’t use it to reset your password), and things like that.

If a link to your website goes viral on social media, it can either result in your server crashing due to being overwhelmed with traffic, or it can run your AWS bill up a whole lot, if your app or website is set up to scale up when there’s more need for additional resources.

AWS EC2 reserved instances can be cheaper than on-demand. Reserved instances are longer-term, but cheaper per unit of time. On-demand resources have shorter time spans and can be useful for short stuff or things that are “spun up” and removed often, but not for things that you want to be running 24/7. On-demand resources in AWS can be cheaper in the short term but more expensive in the long run. Want to mess with something for a day? Use on-demand. Want a web server that is on 24/7 for years so that people can see your contact info and resume? Use reserved instances.
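Here is the arithmetic behind that advice as a small Python sketch. The prices are completely made up to keep the math simple. They are NOT real AWS rates, so check the actual pricing pages before deciding anything.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical prices for illustration only, NOT real AWS rates.
ON_DEMAND_PER_HOUR = 0.10    # pay only for the hours you use
RESERVED_PER_YEAR = 500.00   # flat commitment for the whole year


def on_demand_cost(hours: float) -> float:
    return hours * ON_DEMAND_PER_HOUR


def cheaper_option(hours: float) -> str:
    """Pick whichever option costs less for the given hours of use in a year."""
    return "on-demand" if on_demand_cost(hours) < RESERVED_PER_YEAR else "reserved"


one_day = cheaper_option(24)                 # messing with something for a day
always_on = cheaper_option(HOURS_PER_YEAR)   # a server that is on 24/7

# Past this many hours in the year, the reserved option wins.
break_even_hours = RESERVED_PER_YEAR / ON_DEMAND_PER_HOUR
```

With these made-up numbers, a day of tinkering costs a couple of dollars on-demand, while running all year on-demand would cost more than the reserved price.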

Dropbox used to be primarily AWS-based, but then they decided to leave AWS and build their own infrastructure. While AWS can be convenient, and cheap for small or short-term stuff, it can be expensive for bigger companies. Not only that, but you lose the ability to control your infrastructure when you use something like AWS rather than building your own data centers.

That’s a lot of stuff! What should I start with in AWS?

If you’re feeling overwhelmed by the sheer number of services AWS offers, you’re not alone! They have a zillion things you can use, and it can sometimes be hard for a beginner to figure out where to start. That said, I didn’t even mention everything that AWS offers, simply because many of their other services aren’t useful for beginners.

Start with EC2 or Elastic Beanstalk, in addition to RDS, EBS, and/or S3. You can configure all this stuff using the AWS management console, which is a web page. Yes, you really make and delete virtual servers in a browser.

EC2 vs. Elastic Beanstalk

Do you want to have greater control over your server, but also greater responsibility to protect it from hackers? Then use EC2. If you’d rather have something a little less complicated, then use Elastic Beanstalk. RDS is for your database. EBS and S3 are storage options.

EBS and S3 are both for storage. What’s the difference?

EBS is block storage. S3 is object storage. EBS is the best for rapidly-changing data. But S3 is the best for something like user-uploaded files, such as if you make a site where users can make accounts and then post photos.

Why would I use RDS?

If you have an app where people can sign up or make posts, you’d want a database. RDS is a relational database offering.

What if you want a domain like example.com instead of some-server-123.amazonaws.com?

You’d need to use an AWS Elastic IP address and a domain registrar such as Route 53 or Namecheap. You’d need to update the DNS info so that the domain name points to the AWS instance.

APIs

Social media APIs – think of how often you use big social media platforms vs. random small websites/apps. Most people spend a majority of their time in just a few things – Gmail, Facebook, Twitter, Snapchat, Instagram, TikTok, YouTube, Twitch, Reddit, WhatsApp, LinkedIn, WeChat, VKontakte, and things like that. If you do use some website or app, it’s probably something you found on social media rather than on its own. So apps need social media integration. One example of this is Facebook Platform, which is Facebook’s set of tools for developers to make software that integrates with Facebook.

OAuth – people hate making accounts. If your app forces someone to make an account, they might not use it. But if you’ve ever seen something like “Sign in with Facebook” or “Sign in with Google,” that can make it easier for people to use your app or website. This uses something called OAuth, a standard that lets users grant your app access through an account they already have, without giving your app their password.

GraphQL – a query language for APIs. You might be familiar with the term MySQL, but keep in mind that SQL is a query language for databases. GraphQL is for APIs, not databases.

SOAP – an old type of API. You might see it here and there, but the future is not SOAP.

REST – a common type of API.

JSON – JSON is very popular, even if newer data exchange formats are coming out.

Postman – a tool for testing and exploring APIs by sending requests and looking at the responses. You’ll often use other people’s APIs, but eventually you might need to build and test your own.

Mobile apps

Learn Swift and Kotlin if you want to make mobile apps. You should have an iOS version and Android version for your app. You’ll need Android Studio for Android apps, or Xcode for iOS. Xcode is only available for macOS.

Miscellaneous modern cool stuff

Travis CI – a continuous integration system. It makes it easier to automate things and deploy code faster.

Jenkins – an automation server for building, testing, and deploying software.

Chef – a configuration management tool that makes it easier to manage and deploy things. Are you noticing a trend here? Cloud is all about not having to do everything manually.

Puppet – automation/configuration software.

TensorFlow – a framework for machine learning. There are lots of false claims about AI and machine learning out there. But TensorFlow is real.

Webpack – kind of like a build tool, but for web development.

Gatsby – a static site generator which uses React and GraphQL.

On infrastructure, over-engineering, and scalability

Not too long ago, it seemed like developers didn’t really care that much about infrastructure, and solo apps would just use a simple Linux VPS or shared host. But nowadays, everyone’s gotta be an expert on Kubernetes/Docker/AWS/GKE/Firebase/Heroku/GraphQL/microservices/etc.

Modern concepts like IaaS, PaaS, autoconfigs for provisioning/deployment, YAML, elasticity/scalability, containers and orchestration, loose coupling, etc. seem cool, but I also can’t help but feel like it’s a lot of overengineering for things that don’t need all this complexity.

I thought developers were supposed to concentrate on more highly-abstracted concepts, like the app they’re working on. Everyone being into infrastructure and DevOps is like if someone with an MBA suddenly took an interest in the plumbing and wiring of their company’s building.

As important as infrastructure is, it merely facilitates higher-level objectives. But if all you do is obsess over infrastructure, then you’re not really doing that much. It’s a distraction from what you’re really trying to achieve.

Not only that, but the more complex your toolchain/stack/pipeline/infrastructure is, the harder it is to be secure. To secure something, you really need to know the ins and outs of it. And if you’re using a billion different complicated things together, it’s harder to keep track of how to configure it properly and monitor it to make sure it’s secure.

And many people use the argument of “but what if your app grows?” or “what if your site goes viral on social media?” to justify extreme over-engineering of scalability. But to put things in perspective, Stack Overflow, which is the most popular programming site in the world and the 34th most popular site on the internet, uses just 11 dedicated web servers for its millions of users. They also have 4 database servers, 2 Redis servers (for database caching, I guess), 3 “tag engine” servers, 4 load balancer servers, and use Cloudflare for CDN caching of static assets. So, yeah, that’s a lot of servers, but considering that it gets way more traffic than a vast majority of apps and websites, it’s not that much at all. Your app or website certainly won’t need that many resources. Sure, their servers are pretty fast, and Stack Overflow is a mostly text-centric site, as opposed to something like YouTube, which uses video, and would have increased performance requirements. But the point is that you don’t need your app to be infinitely scalable.

So while it’s good to know about modern cloud stuff, you don’t have to obsess over it.
