- Frontend – what the end user sees, such as on a website that they load in their web browser.
- Backend – the logic that drives the frontend, but runs on a server in a data center or at a cloud provider like Amazon Web Services. Quite often, the backend will take data from a database and combine it with a template to show the user a full page, like when you look up products on Amazon and sort them a certain way. The backend can also be in charge of things like monitoring, logins, and sessions. What happens on the backend is sometimes referred to as “business logic.”
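As a rough sketch of that “data plus template” idea, here is a small Python example. The products table, its rows, and the render_products function are all made up for illustration, and a real backend would normally use a web framework and a proper template engine instead of string.Template.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical templates: one for the whole page, one for each product row.
PAGE = Template("<h1>Products</h1>\n<ul>\n$items\n</ul>")
ITEM = Template("  <li>$name ($price USD)</li>")


def render_products(conn, sort_by="price"):
    """Read products from the database and combine them with the template."""
    # ORDER BY cannot use a ? placeholder, so only allow known column names.
    if sort_by not in {"name", "price"}:
        raise ValueError("unsupported sort column")
    rows = conn.execute(
        f"SELECT name, price FROM products ORDER BY {sort_by}"
    ).fetchall()
    items = "\n".join(
        ITEM.substitute(name=escape(name), price=f"{price:.2f}")
        for name, price in rows
    )
    return PAGE.substitute(items=items)


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Keyboard", 49.99), ("Mouse", 19.99), ("Monitor", 149.0)],
)
page = render_products(conn, sort_by="price")
print(page)
```

The user only ever sees the finished HTML. The database query and the sorting happen on the server.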
- Full stack – all of the software that is required for something. It is usually used in the context of web development. To say you’re a full-stack developer means you can do a lot. It’s a good skill set to have and can lead to very lucrative careers. But you need to know frontend development such as HTML5, CSS3, and JS. Then you also need to know API querying, perhaps like GraphQL, or at least JSON/REST and maybe SOAP. You also need to know backend server stuff like Django, Rails, Apache/PHP, or Node, along with database tech such as MySQL, PostgreSQL, MongoDB, MariaDB, or SQLite. You don’t need to know all of them, just one frontend framework, one server platform, and one database management system, along with SQL.
- There are many different stacks. For example, you might be a MEAN stack (MongoDB, Express, Angular, and Node) developer, but someone else might be a LAMP stack (Linux, Apache, MySQL, PHP) developer.
- Website vs. web app – a website can be a mere static document or blog. That’s just information, not interactive software. Sure, you might be able to log in and post a comment, but that’s the extent of it. A web app, by contrast, is software that is just built on web technology and accessible in a browser. They are more complex and capable. If you tell someone you’re a web developer, they might get the wrong idea and think you only make simple websites rather than web apps. But there is a big difference.
- You can be a web designer when you only know frontend tech, but full-stack web developers who make web apps need to know about security, infrastructure, backend, server platforms, and databases.
- Model-view-controller architecture – also called MVC. For a full stack web app, you will be separating the data (model), layout (view), and interactivity (controller) into three separate parts. This is called separation of concerns. This is an extreme oversimplification of it, but it can be hard to understand the specific details if you don’t know the basic idea first.
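Here is a very small Python sketch of that separation. The to-do list, the class names, and the function names are all hypothetical, and real MVC frameworks are much more involved than this.

```python
from html import escape


class TodoModel:
    """Model: owns the data and nothing else."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append(text)


def todo_view(items):
    """View: turns data into markup and knows nothing about user input."""
    return "<ul>" + "".join(f"<li>{escape(item)}</li>" for item in items) + "</ul>"


class TodoController:
    """Controller: reacts to user actions, updates the model, picks a view."""

    def __init__(self, model):
        self.model = model

    def handle_add(self, text):
        text = text.strip()
        if text:  # ignore empty submissions
            self.model.add(text)
        return todo_view(self.model.items)


controller = TodoController(TodoModel())
print(controller.handle_add("  buy milk "))
```

Because each part only does one job, you could swap the view for a JSON output or the model for a database without rewriting the other two.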
- URI – Uniform Resource Identifier. It’s a way to identify a resource. There are different types of URIs, but URL is the most common one.
- URL – Uniform Resource Locator, commonly known as a URL or link. All URLs are also URIs, but not all URIs are URLs.
- Here is an example of a URL: https://example.com/something/whatever.html
- Link rot – when you have certain pages on your website that no longer exist after you update the site, leading some visitors who click on links to see 404 not found pages instead of whatever it was they were trying to see.
- Protocol – a standard way for network communication to happen. The most well-known protocols you’ll use in a web browser are HTTP and HTTPS. In a URL, the protocol comes right before the :// delimiter, such as http:// or https://. HTTP is HyperText Transfer Protocol and HTTPS is HyperText Transfer Protocol Secure. Another protocol you can use in a browser is FTP, which means File Transfer Protocol. More secure alternatives to FTP include SFTP and SCP, which stand for SSH File Transfer Protocol and Secure Copy respectively.
- Domain name – if you want a website, you will need to get a domain name – unless you use a free host, like something.wordpress.com, or something.github.io. But not having your own domain name for your site will look bad. Examples of domain names include google.com, wikipedia.org, comcast.net, and theregister.co.uk. A Domain Name System (DNS) record consists of a domain name and an IP address that it points to, kind of like a phone book with names and phone numbers. When you go to a website, your device first asks a DNS server where that domain name is located, using something called a DNS query. The DNS server responds with the IP address that corresponds to the domain name you sent in the query. This all happens behind the scenes.
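To make the phone book comparison concrete, here is a toy Python sketch. The DNS_RECORDS dictionary and the resolve function are made up for illustration, and the IP addresses come from ranges reserved for documentation. Real DNS is a distributed system with many servers, not a single dictionary.

```python
# A toy "phone book" that maps domain names to IP addresses.
DNS_RECORDS = {
    "example.com": "192.0.2.10",
    "www.example.com": "192.0.2.10",
    "blog.example.com": "198.51.100.7",
}


def resolve(domain_name):
    """Answer a pretend DNS query; None means there is no such record."""
    # Domain names are case-insensitive and may end with a trailing dot.
    return DNS_RECORDS.get(domain_name.lower().rstrip("."))


print(resolve("www.example.com"))
print(resolve("missing.example.com"))
```

If you want to do a real lookup from Python, the standard library's socket.gethostbyname function asks your system's resolver instead of a hard-coded dictionary.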
- Subdomain – a prefix in front of a domain name, separated from it by a dot, that points to a separate section of a website.
- Some examples of subdomains:
- https://www.google.com/ – the subdomain is www
- https://en.wiktionary.org/ – the subdomain is en
- https://bb.siue.edu/ – the subdomain is bb
- Think of it this way: you can have a website that is used for many different things. For each specific thing, you can make a separate subdomain, in order to organize it better.
- Let’s look at siue.edu as an example:
- cs.siue.edu is the CS department’s site
- www.siue.edu is the main website
- office365.siue.edu is the Office 365 site (Microsoft Office, Outlook, etc.)
- webprint.siue.edu is the site for web-based printing
- bb.siue.edu is the site for Blackboard, which is software for classes
- Some sites use subdomains for their users. Tumblr and GitHub are examples of this. If you make a Tumblr account, it will be viewable at your_name_here.tumblr.com. If you make a GitHub Pages repository, people can view it at your_name_here.github.io. However, other sites like Twitter and Facebook use URL routes rather than subdomains, like facebook.com/username or twitter.com/username. URL routing (not to be confused with TCP/IP routing, which is separate) is when you use server back-end software to create a specific URL for a file. For instance, maybe you have a contact page at example.com/html/contact.html. But you’d rather clean it up and make a URL route so that you can access the same page by going to example.com/contact, which is easier for people to remember.
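The contact page example can be sketched as a simple lookup table in Python. The ROUTES dictionary and the resolve_route function are hypothetical. Real server software and frameworks have their own routing systems, but the basic idea is the same.

```python
# Map clean URL paths to the files that actually hold the pages.
ROUTES = {
    "/contact": "html/contact.html",
    "/about": "html/about.html",
}


def resolve_route(url_path):
    """Return the file to serve for a URL path, or None if nothing matches."""
    # Treat /contact and /contact/ as the same route.
    clean_path = url_path.rstrip("/") or "/"
    return ROUTES.get(clean_path)


print(resolve_route("/contact"))
print(resolve_route("/does-not-exist"))
```

When the lookup returns None, the server would typically respond with a 404 page.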
- Top-level domain – the last part of a domain name. Some examples of top-level domains, or TLDs, are .com, .org, .net, .edu, and .gov.
- Nowadays, there are newer TLDs that are mostly entire words, such as .network and .global.
- Some TLDs, like .gov(ernment) and .edu(cation), can only be reserved by specific organizations. And some (but not all) country code TLDs (ccTLDs), such as .us for the United States, require proof of residency to register. But some ccTLDs are open to the public. For example, many tech startups use the ccTLD .io. You might think it stands for input/output, but .io is actually the country code TLD for the British Indian Ocean Territory. Almost none of the .io domain names are registered by people who live there.
- I recommend sticking with .com. People aren’t used to sites ending in different things, so if you network with people and mention your site, people will have a hard time remembering some uncommon TLD. If it’s awkward to say out loud or to explain to someone, don’t use it. Of course, one problem with .com is that tons of good domains are already taken. So you’ll have to be creative and combine 3 words to find something that isn’t registered already. You could also optionally register two domain names, and have one redirect to the other. So if you like the .io TLD, you might register that, but also register the .com one.
- TLDs are separate. If someone owns* example.com, they might not also own example.net or example.xyz. Similarly, if you want to register a site called something.com, but that’s taken, you might look and see that something.network or something.global isn’t registered. The idea behind adding new TLDs is that it allows people to get more unique domain names instead of adding more words or numbers or something. But at the same time, you’re doing yourself a disservice if you use a TLD nobody is aware of.
- *Supposed ownership of domains is a tricky issue. If you forget to keep your domain registered, which is usually billed annually, someone else can register it. The moment your domain name expires, it’s fair game for anyone. So be sure to pay your registrar bills, or even set it up for autopay. But if your credit card on file with the registrar expires, and you forget about it, you could lose your domain name, sometimes after a short grace period. Someone else could then re-register the domain and they might even put it on the domain market to sell it. They might ask you for an exorbitant amount of money in return for transferring the domain name back to you, when domain names really cost next to nothing to register.
I once made a tech support website for my freelance computer repair services. Over time, I used it less and less, so I eventually decided to pull the plug on it and I intentionally let the domain name expire. Years later, I checked up on it and someone else registered it and had a similar tech support site. That’s a good example of what can happen. But sometimes, a domain squatter will register a domain, and when you try to visit the site, it’s just a landing page telling you that you can buy the domain name. Or, even worse, it might be used for bad things, like malware, bootleg pharmacies, and other unsavory content.
- There are actually laws against domain squatting, which is the act of registering a domain name that contains someone else’s trademark or other intellectual property in order to make money through selling it or just getting more traffic to your website. For example, you can’t just register a domain name with the name of a popular movie or company that isn’t yours.
- Aside from domain squatting, another issue with domain names is typosquatting. Typosquatting is when someone registers a domain name that is a common typo for a well-known site. Perhaps, instead of typing example.com, you typed exampl.com. Typosquatting domains are mostly used for scams and malware.
- Path – in https://example.com/something/whatever.html, /something/whatever.html is the path.
- Query string – in https://www.youtube.com/watch?v=erNuvARUji0, ?v=erNuvARUji0 is the query string. In another example, https://www.google.com/search?q=bliss&tbm=isch, there are multiple parameters in the query string, which are separated by an ampersand. The query string is ?q=bliss&tbm=isch.
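If you want to see how a URL breaks down into the pieces described above, Python's standard urllib.parse module can split one apart. The describe_url helper below is just a name I made up for this sketch.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url):
    """Split a URL into its protocol, host, path, and query parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs returns a list per parameter, since a name can repeat.
        "query": parse_qs(parts.query),
    }


info = describe_url("https://www.google.com/search?q=bliss&tbm=isch")
for key, value in info.items():
    print(key, "=", value)
```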
- CGI – Common Gateway Interface. It’s kind of old-school. A web page backed by a CGI script lets a user submit a form or click a button, and then the server runs an executable. A CGI script can be any executable in any language, running on a web server, but not in a user’s browser. If you see .cgi in a URL, it usually means the site is running older software, and when someone goes to the .cgi URL, the server runs some sort of script. You can use CGI in Apache, but again, it’s kind of old.
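For a sense of what a CGI script looks like, here is a minimal sketch in Python. It assumes the web server passes the query string in the QUERY_STRING environment variable, which is how CGI normally works. The build_response function and the name parameter are made up for this example.

```python
import os
from html import escape
from urllib.parse import parse_qs


def build_response(query_string):
    """Build a CGI response: headers, then a blank line, then the body."""
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]
    body = f"<p>Hello, {escape(name)}!</p>"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server runs this script once per request and sends
    # whatever it prints back to the browser.
    print(build_response(os.environ.get("QUERY_STRING", "")))
```

Note that the server starts a whole new process for every request, which is one reason CGI fell out of favor.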
- HTTP vs. HTTPS – HTTP is not secure. HTTPS is. If you go to a site and it doesn’t have a protocol listed in front of the address, it means it’s using HTTP. Back in the day, you’d always see http://example.com, but nowadays, browsers have hidden it. But HTTPS is always shown, at least for now.
- When you go to a site, and it says https://example.com, that means it’s using an SSL or TLS certificate, which means the site is using encryption for network traffic and a certificate authority, or CA, has verified it. SSL stands for Secure Sockets Layer and TLS stands for Transport Layer Security. A site using SSL means you can be pretty sure it’s the real deal. If you go to a website and you get a certificate error message, don’t just click through warnings. It could be an issue with malware trying to redirect you to a fake version of a website. It’s possible to be sent to a fake version of a website, with things like a malicious DNS server, your router getting hacked and using a bad DNS server, or man-in-the-browser (MITB) attacks.
- HTTP traffic can be viewed by anyone in between you and the destination server, such as people on your LAN or intermediary hop routers, such as those from internet service providers. HTTP is like a postcard because postal workers can read it. HTTPS is like a letter within an envelope, which is more private. The person who delivers your mail can’t see the contents in it.
Using plain old HTTP is really bad if you’re on a public network, such as the wifi at a hotel, coffee shop, or library. Not only can malicious attackers on these kinds of networks see what you’re doing, but it might even be possible for them to modify the contents of what you get, because HTTP doesn’t verify things in the same way HTTPS does.
One security tool that demonstrated the lack of security or privacy in HTTP was Firesheep, though there are other similar tools too.
I use SSL for all of my websites. Sometimes, installing an SSL certificate for a website can be confusing or time-consuming. And if you let your SSL certificate expire, like I did once, then someone who tries to visit your website will get a huge warning from the browser, saying something like “Warning: Potential Security Risk Ahead” in Firefox or “Your connection is not private” in Chrome. This scary warning might make people think the site is hacked or something, but it can also just mean that the SSL certificate (for HTTPS) is either expired or configured incorrectly. If you use HTTPS for your site, make sure that you don’t let the certificate expire.
- Domain registrar – a company that lets you pay to register domain names. GoDaddy and Namecheap are examples of domain registrars. Domain names are usually about $10 per year, which isn’t much. But keep in mind that a domain name by itself isn’t enough to have a website. You also need hosting. I don’t recommend getting your hosting from a domain registrar because they usually have poor pricing or performance. Instead, you can have your domain name point to a different server, such as from A2 Hosting or Amazon Web Services. This is achieved through DNS records.
- DNS record – a combination of an IP address and a domain name and possibly a subdomain. If you get a web server with a certain IP address, you will have to update your DNS records to have your domain name point to that IP.
- Embed – putting something on a web page without hosting it on your site. For example, you can embed a YouTube video on your website without hosting it yourself. Embedding means putting something in something else. One way to embed something is through <iframe> or <embed> tags. <iframe> is older and some people dislike it. <embed> is a newer HTML5 tag. Aside from big sites like YouTube that encourage sharing and embedding, don’t use content hosted on other sites unless you’re allowed to. This is because every time someone visits your site with content hosted elsewhere, it uses the other site’s server resources to send the data to a viewer. This is called hotlinking, and it’s not a good thing to do.
A better alternative to hotlinking is to save an image (or other kind of file) and then host it on your own server. But be careful about copyright, depending on what the file is.
- Cross-site request or cross-origin resource sharing (CORS) – If a website features content from other websites, such as YouTube, Google Analytics, Bootstrap, or Imgur, it will make cross-site requests to achieve this. Although cross-site request forgery is a type of malicious attack, not all cross-site requests are bad.
- Browser cache – instead of downloading everything again when you refresh a page, some assets are temporarily saved to your hard drive or solid state drive. That way, if a website has multiple pages with shared assets, such as the same CSS, favicon, or banner image, then you don’t need to re-download them every time.
- If a website looks broken, you can try clearing your browser’s cache to fix it. This used to be a more common problem than it is now, though.
- Analytics and tracking – websites have all sorts of different ways to track users. This is done in order to know their audience better. Things like browser, OS, resolution, time visited, number of pages visited, and things like that can be tracked very easily without any additional software aside from what comes with a web hosting package, but if you want more in-depth analytics about your site’s users, you can use something called Google Analytics.
- Aside from Google Analytics, another analytics tool I’ve used is AWStats. It has no relation to Amazon AWS even though it sounds similar. It’s a server-side, self-hosted tool on web servers, such as from a web host, that will give you information about traffic on your site, such as views, IP addresses, and things like that. It’s admittedly a little old and limited in functionality. I’ve used it to see how many views I get per month, what countries my visitors are from, what devices my sites are being seen on, and things like that. Google Analytics is more robust, but some people distrust Google and prefer to host their own analytics tools so their data isn’t going to any third party. However, there are many other analytics tools out there, so do your research and pick the one that’s right for you.
Skeleton screen – a website that has rectangles instead of text, loading animations, etc. to indicate that there’s more stuff loading, even though the page hasn’t finished loading yet. It looks unfinished, but it looks better than a completely blank screen. It’s also called “progressive loading” because it loads the page partially more quickly, rather than taking longer to load everything all at once. Some people criticize skeleton screens because you’re making the site load code to load a loading screen before loading the actual data for the site. It’s also an admission that the site doesn’t perform very well. If you put so much effort into loading screens, maybe your effort would be better spent in trying to make the site load faster to begin with.
Here is a video recommendation skeleton from YouTube:
The shapes indicate where the video thumbnail, title, stats, and uploader’s profile pic will be once they load. You will usually see a skeleton for anywhere from a fraction of a second to a few seconds.
- robots.txt – by default, search engines will use automatic “crawler” software to download web pages and then put them into search results. But if you don’t want your website to be indexed by search engines, you can use a robots.txt file, like example.com/robots.txt. A search engine “robot” will read the robots.txt file and ignore the pages you tell it to not put in its search results. You can tell search engines to ignore your entire site or just certain parts of it.
- One problem with robots.txt is that hackers might look at it to see your hidden or valuable assets on your site that you don’t want in search results. That makes it easy for them to find things like login pages, accidentally-public things (that don’t require a login but really should), and private APIs. In other words, a robots.txt file can help a hacker figure out what they should look for on your site. One solution for this is to restrict access to specific directories or files with the use of permissions. Then, if someone who isn’t logged in goes to a private page, they will get a 403 Forbidden error page instead of seeing something private. Some sites will even give a 404 Not Found if you try to access something private because they don’t want to hint at the existence of it, which is still a kind of privacy leak.
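Here is a small sketch of how a well-behaved crawler might read a robots.txt file, using Python's standard urllib.robotparser module. The file contents, the ExampleBot name, and the may_crawl helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides two directories from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url):
    """Return True if the rules allow our pretend crawler to fetch the URL."""
    return parser.can_fetch("ExampleBot", url)


print(may_crawl("https://example.com/index.html"))
print(may_crawl("https://example.com/admin/login"))
```

Notice that nothing here actually blocks access. The crawler is just choosing to follow the rules, and the file itself tells anyone who reads it that /admin/ exists, which is exactly the problem described above.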
- Google AdSense – an easy way to put advertisements on a website and make ad revenue.
- Managed/shared hosting – if you want a website up and running with minimal effort, and you don’t want to deal with doing all the lower-level OS, networking, and security stuff, just get managed hosting. For your first personal portfolio website, a managed host should be more than enough, because you can upload HTML, CSS, and JS files, which is what you should start with instead of trying to make something more complex (such as a “full stack” project). The hosting company will also typically let you create databases or even use PHP or Python, and sometimes even Node.js, though your storage and RAM and CPU usage will be quite limited (unless you get a more expensive plan). Managed hosts often use cPanel and Softaculous. Super slow shared hosting can cost only about $3/month, but decent shared hosting will run you about $10/month.
- My first websites used super cheap, bottom-of-the-barrel shared hosting, but even though it’s low cost, you don’t get much for it. It also makes you look bad if your site takes too long to load. People who visit your site will think less of you and whatever it is your site is about.
Modern alternatives to managed hosting include Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings from cloud providers like Microsoft Azure, Amazon Web Services, Google Cloud, DigitalOcean, and Heroku. They often have additional features that make them more compelling than managed hosting, but they also have added complexity, which can sometimes make them harder for beginners to use.
- VPS – Virtual Private Server. Unlike managed hosting, a VPS is more hands-off. A VPS is just an internet-facing virtual machine running on a tech company’s hypervisor that they let you use. VPSes are not dedicated servers, as there will be multiple VPSes on a single physical server, though there is logical separation using software (for privacy and security), so one VPS customer can’t see what’s running on another customer’s VPS even if they’re on the same physical server. VPSes, despite not being complete dedicated servers, are still usually more powerful than typical shared web hosting. The performance might be better in some cases, but due to the hands-off nature of a VPS, the company you get the VPS from won’t do much of anything for security. So if you want to use a VPS, you’d better know how to secure a Linux server by yourself. If that sounds too complicated for you, go with something easier like managed web hosting.
- A VPS can cost $30+ per month. It depends on the specs, such as CPU, RAM, storage, network speed, and monthly bandwidth caps. Location of the server can also be a factor in the price. VPS storage isn’t just measured in gigabytes. It also matters if it has hard drive or SSD storage. Hard drives are very slow compared to SSDs, which is why they’re cheaper.
- Dedicated server – a dedicated server in a data center means you are paying to use the entire server, and there are no other data center customers who are using it. It’s faster and possibly more secure than a VPS or shared host because there are sometimes security issues called “VM escape” vulnerabilities, where an attack on one virtual machine can break out of it and either use the underlying hypervisor or access other virtual machines. However, dedicated servers can be costly, and VM escape or VM breakout vulnerabilities are not very common. Additionally, any competent server host will patch their hypervisors quickly.
- The cost of a server, shared or dedicated, is more than the hardware. It’s also the rent or property tax for the data center, insurance, physical security like guards and security cameras, air conditioning for the server room, server rack space, electricity, marketing, backup power generators, IT services to make sure the server is running (and possibly also network or hypervisor security services), possibly an SOC (Security Operations Center), customer support, future investments to expand their capabilities, and more.
- You can buy a used Dell PowerEdge R710 server with 12 cores, 24GB RAM, and 600GB of disk space for less than $200 on eBay, but it doesn’t come with any of the things mentioned in the previous paragraph. Self-hosting might seem cheaper, but it’s inferior in many different ways. Having your own server can be useful for learning purposes or testing out software locally before pushing to an internet-facing server, but for the sake of having a website or app, self-hosting really isn’t viable.
One drawback of a Dell PowerEdge server, or any rackmount server, is the noise. Rackmount servers are very loud because they are very compact but need a lot of cooling, so they have really hardcore fans that sound almost like a jet engine. One alternative I recommend is to use a typical desktop computer as a home server because it will be much quieter. If you have a basement or spare room, you might be okay with something like a Dell PowerEdge. But if not, like if you want to put a server in your living room or bedroom, then buy or build a regular desktop computer.
Another drawback is the heat. If you build a server rack in your home like I did, having the servers all turned on will make your home/apartment very hot, especially in the summer. In the summer, I leave most of my servers off because of this. Even if the computers can stand the heat, I can’t.
Having your own servers will also run your electric bill up pretty quickly. It’s worth noting that some older hardware might be adequately fast, but it won’t be as energy-efficient as newer servers. To calculate the cost of a server over time, you need to include not only the hardware, but also the power required to keep it on. A “cheaper” server that uses a lot of power can end up costing you more than a more expensive power-efficient one.
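To put rough numbers on that, here is a back-of-the-envelope calculation in Python. Every figure in it is an assumption I picked for illustration: the wattages, the hardware prices, and the $0.15 per kWh electricity rate. Plug in your own values.

```python
def total_cost(hardware_price, watts, years, price_per_kwh=0.15):
    """Hardware price plus electricity for a server that is always on."""
    kwh_used = watts / 1000 * 24 * 365 * years
    return hardware_price + kwh_used * price_per_kwh


# Hypothetical comparison over three years.
old_server = total_cost(hardware_price=200, watts=250, years=3)
new_server = total_cost(hardware_price=600, watts=60, years=3)

print(f"Used rackmount server:   ${old_server:,.2f}")
print(f"Newer efficient machine: ${new_server:,.2f}")
```

With these made-up numbers, the machine that costs three times as much up front ends up cheaper after three years because it uses far less power.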
- A dedicated server from a server host can run you $100+ per month for something entry-level, and way more than that for beefier hardware.
- Another related concept is server colocation. When you rent a dedicated server, it’s someone else’s hardware and you don’t own it. But server colocation is a bring-your-own-server option. You buy and set up your own server with its own software on it, and all you’re paying for is to have it put into the data center to keep it up and running. It’s not cheap though.
- SLA – Service Level Agreement. When you pay for a server, there might be an SLA with guarantees about uptime or security. If the SLA is breached, you might be entitled to compensation.
- In most things in life, 99% sounds like a lot. But in the world of server uptime, if a server is up 99% of the time, that means it’s offline 1% of the time, which is over 3 days per year, which is terrible. Some SLAs might have something like 99.999% guarantees. That’s referred to as “five nines.” But don’t expect something like that for more budget-friendly hosting.
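The uptime math is easy to check yourself. This little Python sketch assumes a 365-day year, and the function name is just something I made up.

```python
def downtime_hours_per_year(uptime_percent):
    """Hours per 365-day year that a server may be down at a given uptime."""
    return (100 - uptime_percent) / 100 * 365 * 24


for uptime in (99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(uptime)
    print(f"{uptime}% uptime allows about {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")
```

At 99%, that works out to 87.6 hours, or about 3.65 days. At five nines, it’s only a little over 5 minutes for the whole year.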
- Server migration – moving your website or app from one server to another. It can be difficult, time-consuming, and annoying. Always do backups before attempting a server migration. If you have many users for your app, it can be useful to figure out a time when the lowest number of users will be online, to impact as few people as possible. You might even let people know about it through social media or a small notification on the site or app.
- If we’re only talking about simple web hosts, some hosting companies will offer server migration, sometimes for free — especially if you’re migrating from one hosting company to another. The new hosting company will offer free server migration as a way to get a new customer.
- Favicon – on a computer, a favicon is the little icon in a browser tab for a website. Or on mobile, it can be a favorite icon, like if you make a home screen shortcut to a website (rather than an app).
LAMP – a web development stack. It stands for Linux, Apache, MySQL, and PHP. Linux is the OS (technically Linux is a kernel, but you know what I mean), Apache is the web server software, MySQL is the database software, and PHP is the programming language you use for the server development.
Sometimes there are variations like LAMPPP, which means Linux, Apache, MySQL, PHP, Perl, and Python. Other variations of the stack include WAMP, MAMP, and XAMPP. WAMP uses Windows, Apache, MySQL, and PHP. MAMP is the Mac version. WAMP and MAMP are usually only suitable for development machines for developers or education, not production servers, which will almost always be running Linux. XAMPP means Cross-Platform, Apache, MySQL, PHP, and Perl.
LAMP is widely used, but it’s not exactly a trendy or cool stack anymore. Many people dislike it. Some speculate that PHP is on its way out, and that relational databases are inferior to newer NoSQL databases like MongoDB, Apache Cassandra, Amazon DynamoDB, or Google Firebase Firestore. But regardless of what you think of it, the fact of the matter is that lots of things use LAMP and there are lots of jobs that involve it.
- C10K problem – the challenge of getting a server to handle 10,000 clients connected at the same time. People want to handle more than that, so it’s called a problem because of issues of scalability. Some proposed solutions include load balancing, caching, and CDNs.
Horizontal scaling – having multiple servers that do the same thing, such as web servers that are all used for the same site, and it has a load balancer to split up the traffic. They don’t need to be super amazing individually, because they have strength in numbers, which contributes to their overall performance. When you scale horizontally, you are adding more servers.
Vertical scaling – having one really fast, decked out server. When you scale vertically, you are upgrading a single server to be faster, like a new processor, more RAM, more storage, or a faster network connection. Horizontal scaling is better, but there might be some cases where you’re not able to scale horizontally due to software limitations.
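Here is a minimal sketch of how a load balancer might split up traffic when you scale horizontally. It uses round-robin, which is just one simple strategy among several, and the server names and the RoundRobinBalancer class are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a repeating order so each gets an equal share."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
for request_number in range(1, 6):
    print(f"request {request_number} goes to {balancer.pick()}")
```

Adding a fourth server is just one more name in the list, which is why horizontal scaling tends to be easier to grow than upgrading a single machine.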
- Cookies – small pieces of data stored in a browser that are associated with a particular website. They can be used for active logins, web-based games, advertisement tracking, and more. If cookies get cleared, then they’re gone. Instead of storing things locally in someone’s browser cookies, you might prefer to create a system involving accounts and a database that stores the data in a more reliably-persistent way. But for more trivial things, cookies are fine.
You can view the cookies for the website you’re on by clicking on the lock icon. On desktop versions of Chrome and Firefox, it will show you the cookies that the site you’re on sets.
It’s possible to set up Firefox so that it deletes all cookies, temporarily cached files, and even browser history when you close the browser. This way, the only persistent stuff in your browser is bookmarks. But not everyone wants to do this because it means you’ll have to log back into sites again. While a password manager might make this slightly easier, most people prefer to stay logged in for long periods.
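Under the hood, cookies are just HTTP headers. A server sets one with a Set-Cookie header, and the browser sends it back in a Cookie header on later requests. This sketch uses Python's standard http.cookies module. The theme cookie and both helper functions are made up for the example.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name, value, max_age_seconds):
    """Build the header a server sends to store a cookie in the browser."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age_seconds  # how long the browser keeps it
    cookie[name]["httponly"] = True  # hide it from JavaScript on the page
    return cookie.output()


def read_cookie_header(header_value):
    """Parse the Cookie header that a browser sends back to the server."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


print(make_set_cookie_header("theme", "dark", 3600))
print(read_cookie_header("theme=dark; lang=en"))
```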
- JWT (JSON Web Token) – a signed token that can be used for login sessions. It’s often used as an alternative to cookie-based sessions and gets a similar result.
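To show roughly what is inside a JWT, here is a bare-bones sketch of signing and checking an HS256 token with only the Python standard library. It is for learning only. It skips expiry checks and header validation, the secret and payload are made up, and in a real project you should use a well-tested JWT library instead of writing your own.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data):
    """Base64url-encode bytes without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input, secret):
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def sign_token(payload, secret):
    """Return header.payload.signature, with each part base64url-encoded."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _sign(signing_input, secret)


def verify_token(token, secret):
    """Return the payload if the signature matches, otherwise None."""
    pieces = token.split(".")
    if len(pieces) != 3:
        return None
    header_b64, payload_b64, signature_b64 = pieces
    expected = _sign(header_b64 + "." + payload_b64, secret)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected.encode("utf-8"), signature_b64.encode("utf-8")):
        return None
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = sign_token({"user": "alice"}, b"demo-secret")
print(token)
print(verify_token(token, b"demo-secret"))
```

The server doesn’t have to store anything per user. It just checks the signature with its secret on every request.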
- HTTP requests – HTTP consists of requests and responses. Clients will make requests to a server, such as when you click on a link in a web browser. The server then sends a response to the client. Every response has a status code, and the most well-known one is 404, which means not found. Some important request methods to know about include GET and POST. GET means you are asking to get something from the web server. POST means you are sending something, such as a comment or photo.
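You can watch requests and responses happen with a tiny local experiment. This Python sketch starts a throwaway server on your own machine, then sends it two GET requests and one POST request. DemoHandler and run_demo are names I made up, and the sketch assumes your system lets a program open a local network port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """GET / returns 200, any other GET returns 404, POST echoes the body."""

    def do_GET(self):
        if self.path == "/":
            self._reply(200, b"hello")
        else:
            self._reply(404, b"not found")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(200, b"received: " + self.rfile.read(length))

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    results = []
    requests = [("GET", "/", None), ("GET", "/missing", None), ("POST", "/", b"a comment")]
    for method, path, body in requests:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request(method, path, body=body)
        response = conn.getresponse()
        results.append((method, path, response.status, response.read()))
        conn.close()
    server.shutdown()
    server.server_close()
    return results


if __name__ == "__main__":
    for method, path, status, body in run_demo():
        print(method, path, "->", status, body)
```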
- Session – on the web, the HTTP protocol is used. HTTP is stateless, so it doesn’t keep track of things, such as a user browsing the site, logging in, and things like that. So a server’s backend code will need to keep track of sessions. A session is the time a user spends on a site. PHP supports sessions, and so do other server-side languages/platforms.
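Here is a simplified sketch of how backend code can keep track of sessions even though HTTP itself remembers nothing. The server hands out a random session ID, which the browser would normally send back in a cookie, and looks it up on each request. The SESSIONS dictionary and both functions are hypothetical, and a real app would store sessions somewhere more durable than a dictionary in memory.

```python
import secrets

# Server-side storage: session ID -> data about that visitor.
SESSIONS = {}


def start_session(username):
    """Create a session after a successful login and return its ID."""
    session_id = secrets.token_hex(16)  # random and hard to guess
    SESSIONS[session_id] = {"user": username, "page_views": 0}
    return session_id


def handle_request(session_id):
    """Look up who is making the request based on the session ID they sent."""
    session = SESSIONS.get(session_id)
    if session is None:
        return "Please log in."
    session["page_views"] += 1
    return f"Hello {session['user']}, page view #{session['page_views']}"


sid = start_session("alice")
print(handle_request(sid))
print(handle_request(sid))
print(handle_request("made-up-id"))
```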
- Site map – a file, often XML, that lists all of the pages in a website. Among other things, a site map can make it easier for a search engine to find all the pages on the site, which is good if you want your website to appear in search results on Google (or other search engines).
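An XML site map is simple enough to generate with a few lines of Python. This sketch follows the basic urlset/url/loc layout from the sitemaps.org format. The page URLs and the build_sitemap function are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_sitemap(urls):
    """Return a minimal XML site map listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/contact",
])
print(sitemap)
```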
Bounce rate – the percent of people who leave your site after visiting just one page. If someone isn’t impressed, or they don’t like how slow it is, they will just give up. A site not rendering properly and looking broken will also lead to a high bounce rate. You can track bounce rate with analytics tools.
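The calculation itself is simple. Here is a small Python sketch where each number in the list is how many pages one visitor looked at. The data and the function name are made up.

```python
def bounce_rate(pages_per_visit):
    """Percent of visits that viewed exactly one page."""
    if not pages_per_visit:
        return 0.0
    bounces = sum(1 for pages in pages_per_visit if pages == 1)
    return 100 * bounces / len(pages_per_visit)


visits = [1, 3, 1, 5]  # two of these four visitors left after one page
print(f"Bounce rate: {bounce_rate(visits)}%")
```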
OAuth – have you ever been to a website and it said you could sign in using Google or Facebook instead of making an account for it? OAuth handles that. OAuth does not give that site your login information. Rather, it authorizes them so you can log in that way. It’s a way to make it easier for people to use a site or app without making an account. Nobody likes making an account. If it takes too long to sign up for a site or service, some people will just give up.
Advanced browser settings – if you want to change more in-depth options for your browser, go to about:config in Firefox or chrome://flags/ in Chrome. However, be aware that it’s possible to mess things up this way.
- PHP – a language used for backend web development. Some well-known uses of PHP include Facebook and WordPress. PHP has been criticized for its security issues, but people still use it anyway. There are some specific categories of security vulnerabilities associated with PHP, such as remote file inclusion, local file inclusion, and file upload vulnerabilities. A web shell is a kind of hacking tool used against insecure sites running PHP.
- There is significant fragmentation in PHP, reminiscent of Android or Python 2.7 vs. 3.X. Some legacy stuff refuses to be updated, despite all the warnings about potential hacking when the security support is discontinued. It’s weird to think that you can get hacked because your programming language has vulnerabilities rather than your code within it, but that’s the case here. It’s also the case for old versions of Python that use old urllib stuff. Anything over a network needs updates and support.
- When Python made significant changes from 2 to 3, you needed to make some changes to support the newer version, or so I’ve heard. I only started using Python 3, so I never dealt with whatever differences were in Python 2. But many people are naturally change-averse and didn’t want to embrace the new version and its syntactic sugar and whatnot. But over time, it made sense to jump ship, even if you weren’t the biggest fan. Differences in syntax are one thing, but security is always essential, and that will take precedence over pretty much everything else.
- I personally like PHP 7, which was the latest major version of PHP when I wrote this (PHP 8 has since been released). A lot of good changes have been made and it’s better than older versions.
- Flask – a more minimal Python web framework.
- IIS – Microsoft’s web server software.
- Apache – a free and open source web server application typically used in what is called the LAMP stack. Some people might say that the adoption of Linux in the old days was in part because of Apache.
- Nginx – another web server program.
- Jekyll – a static website generator written in Ruby.
- Jekyll is an appealing static page generator project, though not suitable for all types of websites due to its static nature. But it’s interesting in the sense that it reduces attack surface (at least compared to traditional content management systems) because the software runs on your own computer rather than on the web server, and merely generates static pages for you to commit to a Git repo, which can then correspond to a website (if you’re using something like GitHub Pages, which I am in this case).
- Jekyll posts are made with Markdown, which is also used for readmes on GitHub, among other things. It’s straightforward, though Jekyll’s config files are written in YAML. There are so many different markup languages that it’s hard to keep track of it all, but it’s good to learn new things.
- Jekyll is Ruby-based, and I’m honestly not a big fan of Ruby. If I had to choose a scripting language, I’d go with Python instead. I also noticed that setting up Jekyll’s many dependencies can be an annoying process on Windows, though macOS comes with a lot of programming-related things already installed, which simplifies the process significantly. MinGW is very unintuitive. But even so, I’ve set up Jekyll on both my Windows desktop and my MacBook Air. It’s good to get used to different OSes, IDEs, and other tools instead of getting into a comfort zone where you’re only good with one particular set of tools.
- One thing you might want to add to a Jekyll-based site is Disqus comments. That’s one way to add interactivity to an otherwise static website. Alternatively, you can always use WordPress, but there are lots of security issues with it. Of course, they do get patched, but there’s a lot of WordPress hacking that goes on. There is even a security tool called WPScan that is used for scanning WordPress sites to find security issues with them. Hackers and pen testers will use scanning tools like this and then use exploits for known security vulnerabilities, which will be a problem for your website if you use WordPress on it but don’t install software updates.
- CMS – A content management system. It makes it easier to make a web app or website without doing all of the coding from scratch. You can do some coding in a CMS to customize it. For example, in WordPress, you can either install or create a PHP plugin to extend its capabilities. Many websites are made with CMSes these days because there’s no sense in reinventing the wheel. For some applications, you want to do your own full stack app just using a server platform and not a CMS, but for many people, especially those who aren’t coders, using a CMS will be the best bet.
- cPanel – a control panel for web servers that you view in a web browser after logging in. You will be able to view information about traffic, resource usage (disk, RAM, and CPU), log in to webmail, change error pages for things like 404s, update software, change the version of PHP your server has installed, set up add-on domains or SSL, and things like that. Basically a Swiss army knife for web hosting. One web host I use features cPanel on a LAMP server running a Linux distro called CloudLinux.
- Softaculous – a web-based way of quickly installing CMSes on a website. If you pay for a shared web host, it will usually have Softaculous so that you can easily install things such as WordPress to get your site up and running very quickly without any advanced manual configuration or installation.
- Drupal – a content management system.
- Joomla – also a content management system. The only time I’ve ever used Joomla was during a security lab from Virtual Hacking Labs, where I uploaded a PHP web shell to a site running Joomla (it had a file upload vulnerability) and then used it to spawn a reverse shell, where I then used a privilege escalation exploit from exploit-db.com to get root on the server. Keep in mind that this was all in a VM lab for an online course, not someone’s actual website.
- WordPress – the most popular content management system. If you’d rather be a freelance web developer than work 9 to 5, you’ll eventually run into clients who want a WordPress site updated, or want a simple site with features that will be easy to set up in WordPress.
- Even if WordPress itself doesn’t have what you want built-in, there are tons of add-ons you can install to extend WordPress’s functionality. Some examples include a GDPR compliance banner, captcha for logins and comments, custom themes, layouts, contact forms, newsletter add-ons, paywall add-ons, or e-commerce capabilities. However, while WordPress’s add-ons might be convenient, they often introduce new security holes in your website. Most of the time, they’re accidental. But sometimes, add-ons are not trustworthy and might be doing something malicious in the background. Be careful about what you install.
- Wordfence – security software for WordPress. It’s one of the ways (but not the only way) I keep my WordPress websites secure. If you have a website, know that people will attempt to hack your site, if for nothing else than to grow the capabilities of botnets to launch DDoS attacks or distribute malware. You need to at least take the necessary security precautions. Wordfence gives you firewall and exploit protection options, as well as 2-factor authentication for your WordPress login, rate limiting for incorrect login attempts, logging and monitoring, and more. You can do things like block specific IP address ranges or countries. There’s a lot that it can do for a WordPress site, though there are other alternatives to using WordPress. I don’t recommend using WordPress with no security software or updates. They are both crucial.
- Adobe XD – a program for front-end UX design. If you’re less interested in coding and more interested in graphic design and what an app looks like, then UX design might be for you. However, it’s much less technical and more artsy than software development, and as such, is not the focus of this book.
Wireframe – prototypes and flowcharts, visually making a fake/mockup version of what an app is supposed to be like. It’s like the GUI equivalent to pseudocode.
- ModSecurity – a web application firewall that tries to protect your web app or website from hacking. Not perfect, but better than nothing. I use it for my websites. Keep in mind that a web application firewall, or WAF, is not the same thing as a network firewall, such as pfSense, iptables, FortiGate, etc.
- Vulnerability – a security problem in code. You will have to be aware of security when you make a website or web app. Anything on a server publicly available on the internet needs to be adequately secured. If you’re using outdated software, it might have known vulnerabilities, which is why software updates are critical. Furthermore, if you do some full-stack development, the code you write might also be insecure. For example, if you make a discussion board CMS, you might have to worry about things like cross-site request forgery, file inclusion, SQL injection, cross-site scripting, and things of that nature. No code is completely unassailable, but some things are less secure than others. If you get hacked, it’ll tarnish your reputation, so at least make sure there’s no obvious low-hanging fruit.
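To show what one of these vulnerabilities looks like in practice, here’s a small SQL injection sketch using Python’s built-in `sqlite3` module and a throwaway in-memory database. The table and the “attacker” input are made up. The point is the difference between pasting user input into the SQL text and passing it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

user_input = "nobody' OR '1'='1"  # what an attacker might type into a form

# Unsafe: the input becomes part of the SQL text, so it rewrites the query.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder keeps the input as plain data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # 2: every user leaked
print(len(safe_rows))    # 0: nobody is literally named that
```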
Bug bounty – if you find a security issue in someone else’s code (or just a configuration of a program), they might give you money for notifying them about it. This is done through something called a bug bounty program. Not every company has one though.
If you complete a bug bounty for a company, such as for a cross-site request forgery (CSRF) vulnerability, one stipulation of the payout might be that you have to keep quiet about the security vulnerability for a certain amount of time in order to allow them to patch it without incident. This is called responsible disclosure. A blithe attitude towards vulnerability disclosure won’t help you build any bridges in information security.
If you get into security research, there’s the right way to do it, involving VM labs on your computer or network that you own and are allowed to mess with, and the verboten way of messing with live production servers – which some do, but it can get you in a lot of legal trouble, so I can’t recommend that. Don’t ever mess with anyone’s actual machines.
There are companies like Bugcrowd and HackerOne that act as liaisons between companies with potentially insecure software and the researchers who find security flaws in it.
- Zero-day – a hitherto unknown security vulnerability. The name comes from the fact that defenders have had zero days to prepare for it. But to assuage any concerns regarding zero-days, they usually get patched very quickly after coming to light, and zero-days are much rarer than low-hanging fruit attack vectors, like phishing, weak passwords, or known vulnerabilities (CVEs) that were never patched. There’s also additional security software people can utilize to mitigate unpatched security exploits.
- Open Graph protocol – a standard created by Facebook that allows you to add images and text to links, as a sort of preview before you click on something. It is now used by pretty much all major social media platforms. If you write a website from scratch, it’s worth looking into Open Graph. If you’re using a CMS or site-building tool, it might do it for you.
Here is an example of Open Graph <meta> tags from one of my websites:
<!-- open graph protocol -->
<meta property="og:type" content="article" />
<meta property="og:title" content="Search" />
<meta property="og:url" content="https://saintlouissoftware.com/search.html" />
<meta property="og:image" content="https://saintlouissoftware.com/images/sls_og2.png" />
<meta property="og:description" content="Find content on SaintLouisSoftware.com" />
<meta property="og:site_name" content="Saint Louis Software" />
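If your pages are generated by code rather than written by hand, tags like these can be produced from a dictionary. This is just a sketch with a hypothetical helper function; the main thing it shows is escaping the values so that quotes or ampersands in a title don’t break the HTML.

```python
from html import escape


def open_graph_tags(properties):
    """Render a dict of Open Graph properties as <meta> tags, one per line."""
    return "\n".join(
        f'<meta property="og:{escape(name)}" content="{escape(value)}" />'
        for name, value in properties.items()
    )


tags = open_graph_tags({
    "type": "article",
    "title": 'Tips & "Tricks"',  # made-up title with characters that need escaping
})
print(tags)
```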
- Single-page application – a web app that doesn’t require going to different pages. An SPA might use Ajax and a frontend framework. Many things can happen in it, but it’s all on a single page. Fetching new content with Ajax looks better than refreshing a page or clicking and going to a different page, which can leave you looking at a blank screen for a couple of seconds. A single-page application leaves the basic page layout on the screen most or all of the time, and then fetches the new content when it needs to, which looks like a much more seamless experience. SPAs are more complicated to do though. A related concept is PWA, which stands for Progressive Web App.
RTSP – Real-Time Streaming Protocol. A way to livestream stuff.
WebSockets – a way to establish a persistent, two-way connection between a client and a server, rather than the typical stateless request-and-response pattern of HTTP.
WebRTC – enables real-time communication for web development.
- Infinite scrolling – instead of pagination, lots of modern stuff now uses infinite scroll, using Ajax to load additional content when you scroll down. It’s refreshing and modern in some ways, but the downside is that it can take forever to find old content, like if you’re scrolling through messages. The old style of doing things was with pages. Maybe there’s 100 messages per page, or 10 articles per page. Nowadays, you see less pagination and more infinite scrolling, but I’m personally not a fan of it because of the performance drawbacks.
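Under the hood, pagination and infinite scrolling both come down to fetching one slice of the content at a time. Here’s a minimal sketch of that slicing, with a hypothetical `get_page` function and fake messages. A paginated site shows one slice per page, while an infinite-scroll site requests the next slice when you reach the bottom.

```python
def get_page(items, page_number, per_page=10):
    """Return the items for a 1-based page number."""
    start = (page_number - 1) * per_page
    return items[start:start + per_page]


messages = [f"message {n}" for n in range(1, 26)]  # 25 fake messages

first_page = get_page(messages, 1)  # messages 1 through 10
last_page = get_page(messages, 3)   # messages 21 through 25, a short final page
print(first_page[0], "...", last_page[-1])
```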
- Minifiers are used legitimately, but they can also be used to make malware harder to detect. The thing is that in the process of making a file more compact, it also becomes unreadable, and it’s also different from what it used to be. So if security software is looking for a specific file or specific things within a file, and then it gets changed with a minifier, it will be harder to detect. Minifiers are essentially inadvertent obfuscation tools.
- Obfuscation – the process of making code harder to read. If a criminal is writing nefarious code, they might try to make it hard to read, especially if it’s browser-based malware, where the user can see what JS files are running. In an open source project, someone could theoretically hide an obfuscated security backdoor in it. People think open source means perfect security, but it doesn’t. So what you should do to protect against that is audit code on GitHub (or wherever it’s hosted) and then make sure there are no obfuscated parts. Any obfuscation is suspicious. Minifiers are the exception, but even then, minified code can be harder to trust because it’s harder to read and understand.
- Obfuscated code looks like gibberish. The code examples in this book are examples of readable code, not obfuscated or minified.
Here is an example of obfuscated code:
I can’t understand it just by reading it, and that’s entirely the point of obfuscation.
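For a small, harmless illustration in Python (a made-up snippet, not taken from any real malware), here is the same function written twice: once readably, and once obfuscated with meaningless names and hex-encoded strings. Both behave identically.

```python
# Readable version.
def is_weekend(day_name):
    """Return True for Saturday or Sunday."""
    return day_name.lower() in ("saturday", "sunday")


# Obfuscated version: same logic, but the method name and the strings are
# hidden as hex, and the variable names tell you nothing.
O0 = lambda l1: getattr(l1, bytes.fromhex("6c6f776572").decode())() in [
    bytes.fromhex(I1).decode() for I1 in ("7361747572646179", "73756e646179")
]

print(is_weekend("Sunday"), O0("Sunday"))
print(is_weekend("Monday"), O0("Monday"))
```

Real obfuscators go much further than this, but even this toy version takes some effort to decode by eye.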
- Captcha – when you make a web app, you might want to add a captcha to cut down on spambots. Ordinary people aren’t the only visitors to your site. You will also get attempted hackers, bots advertising sketchy goods, and other undesirable traffic. If you use a captcha, such as Google reCAPTCHA, you can keep much of that automated traffic out. Captchas are annoying to fill out, but they have many benefits from a developer’s standpoint. reCAPTCHA v3 doesn’t even require any user input, unlike older versions of captchas.
- Some people are apprehensive about Google’s captcha solution, citing concerns related to privacy or censorship. If you don’t like Google, then at least make your own self-hosted captcha instead. Having no captcha at all will make your site or app a magnet for spam.
Spam – undesirable messages, such as advertisements, scams, malware, and other automated posts. Spam is often posted by bots. In other words, it’s not a person writing and sending the message, but rather it’s software that automatically sends tons of spam messages. Trillions of spam emails and messages are sent every year. Most people ignore spam, or even use software to filter out spam messages so they don’t see them. Spam has a very low success/response rate, but what spammers lack in success rates, they more than make up for in message volume. Even if only one in a million spam messages gets a reply, that’s still a lot.
- Scalability – the ability for code to handle large-scale usage. A scalable web server is one that can adapt to a high number of users. Elasticity is a notable concept which pertains to scaling, especially in the cloud rather than an on-premises data center. If you have a company that sells holiday goods, and only gets a spike in traffic during a particular time of year, then you don’t just want the ability to handle lots of users and scale up – you want to be able to scale down too so that you can save costs. If a social media post goes viral and drives a lot of traffic to your site, you want it to scale up to meet resource demand for the moments it has peak traffic, but then scale back down to manageable cloud assets afterward because cloud computing is billed like a utility, based on usage.
Terms related to scalability include “webscale” and “hyperscale.” They just mean lots and lots of users.
- Overprovisioning – the older solution to peaks in traffic: making your infrastructure capable of handling tons of traffic for the small number of peak hours, though it’s idle the rest of the time. Think about a store with a gigantic parking lot that is empty most of the year, but maybe really packed during Black Friday sales. Most of the time, it’s a complete waste, but fortunately, there are better ways to address high demand these days, with elastic resources in the cloud. You can’t make an empty parking lot resize itself automatically, but you can do that with cloud resources.
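To put rough numbers on the difference between overprovisioning and elastic scaling, here’s a back-of-the-envelope sketch. The traffic figures and the capacity of 1,000 users per server are assumptions made up for the example.

```python
import math

USERS_PER_SERVER = 1_000  # assumed capacity of one server


def servers_needed(active_users, minimum=1):
    """Smallest number of servers that can handle the current load."""
    return max(minimum, math.ceil(active_users / USERS_PER_SERVER))


# Made-up traffic for six hours, with a spike in the middle.
hourly_users = [800, 1_200, 9_500, 10_000, 2_000, 500]

# Elastic: scale up and down every hour to match demand.
elastic = [servers_needed(users) for users in hourly_users]

# Overprovisioned: run enough servers for the peak the whole time.
overprovisioned = [max(elastic)] * len(hourly_users)

print(elastic)               # [1, 2, 10, 10, 2, 1]
print(sum(elastic))          # 26 server-hours
print(sum(overprovisioned))  # 60 server-hours
```

Since cloud resources are billed by usage, the gap between 26 and 60 server-hours is money saved.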
- ORM – Object Relational Mapping. A liaison between a database and a programming language that supports objects. Using an ORM, you can fetch a row from a database and convert it into an object in memory, for use in a program written in an OOP language. ORM comes up in web development.
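Here’s a hand-rolled sketch of the core idea, turning a database row into an object, using Python’s built-in `sqlite3` and a dataclass. The `Product` class and the table are made up. A real ORM (such as the one built into Django) does a great deal more than this, including writing the SQL for you.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str
    price: float


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", ("Keyboard", 49.99))


def get_product(product_id):
    """Fetch one row and map it onto a Product object (or None if missing)."""
    row = conn.execute(
        "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return Product(*row) if row else None


keyboard = get_product(1)
print(keyboard.name, keyboard.price)
```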
- Polyfill – when you write code for a website, it might not work the same way in all browsers. Web standards are just suggestions for how things should be implemented, but in reality, there are differences between how browsers render sites. Internet Explorer, Edge, Opera, Vivaldi, Brave, Firefox, Chrome, Safari… there are many different web browsers, and they are all slightly different. As such, when you make front-end code for a website, it might look fine in one browser, but broken in another. Not only are there differences between modern browsers, but many people use older devices with outdated versions of browsers. A new version of Firefox or Chrome will have newer web features that older versions don’t. So testing a site “in Firefox” doesn’t mean much, considering that there are many older versions of it too. If you only test a website in the latest versions of browsers, you might be missing compatibility/rendering issues in older ones.
- Polyfills allow you to make a website that still gets modern features even in older browsers. Polyfills are referred to as “fallbacks” because you hope the browser supports the features your code uses, but if not, it will resort to using a polyfill instead.
Web application security (a.k.a. AppSec) – one major problem these days is that many people who learn to code think of security as a separate area of study, not something that should be their core focus. “I’m not a security person; I’m a software developer!” This is a dangerous mindset that leads to developers writing insecure code, resulting in data breaches and other problems.
The first line of defense isn’t antivirus software, a network firewall or web application firewall, an intrusion detection system, traffic monitoring, or anything like that. The first line of defense is the developers who write the software other people rely on and expect to be secure.
I highly encourage you to at least familiarize yourself with the OWASP Top 10 list of security issues for web development. It covers the ten most critical categories of security risks in web applications. OWASP originally stood for the Open Web Application Security Project (it has since been renamed the Open Worldwide Application Security Project). Their website is https://www.owasp.org, and it contains tons of great educational resources for web developers and security researchers alike.
Here’s where you can see the most up-to-date version of the OWASP top 10:
Web API – interacting with APIs is important because you won’t be using everything yourself, and libraries are only for code. APIs are useful not only for features and code, but a lot of APIs are about data. API interaction is more important these days, as more and more programs are network-connected instead of being isolated and offline-only.
In my personal experience, I have only dealt with REST and JSON. SOAP is older than REST or JSON, and GraphQL is newer than all of them. SOAP is associated with XML, and REST is associated with JSON.
A web API might use one of the aforementioned categories of web service architecture to communicate with clients (in a client/server relationship), such as official or even third-party apps. For example, Twitter’s standard search API uses JSON, which can be used by third-party app developers who want to use data from Twitter. You craft a query using something like the command line tool cURL, and the responses from the API are JSON-structured data. Another example is Tweetbot, an alternative Twitter client that can do a lot of the same things the official app can do, such as communicating with Twitter’s servers to send and receive data. Tweetbot makes use of APIs in order to work.
Web APIs are not the only kind of APIs, but they are very common.
One example of how you could combine APIs with Python is to use the OpenWeatherMap API to get weather data, and then use Tweepy and Twitter’s API to make a Twitter bot that tweets out the weather.
Some web APIs are private, but some are public. Even on a public API, you might need to get an API key, which will let you interact with it. You need to keep your API key private, just like a password. Don’t accidentally put it on GitHub.
Some APIs are free, though they often have limitations like the number of requests you can use in a certain amount of time, like maybe only one request per second. Some APIs cost money. There are often “freemium” API business models, where the free version is limited, but the premium version has more features, or fewer restrictions such as the number of API calls you can perform.
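Here’s a rough sketch of what talking to a JSON web API involves in Python, without actually making a network call. The endpoint (api.example.com), the query parameters, the environment variable name, and the sample response are all made up for illustration, so check your API’s documentation for the real ones. Note that the key is read from an environment variable instead of being written into the code, which helps keep it out of GitHub.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request

# WEATHER_API_KEY is a hypothetical variable name; "demo-key" is just a fallback
# so the sketch runs without any setup.
api_key = os.environ.get("WEATHER_API_KEY", "demo-key")

# Build the request: a URL with query parameters, plus the key in a header.
query = urlencode({"city": "St. Louis", "units": "imperial"})
request = Request(
    f"https://api.example.com/v1/weather?{query}",
    headers={"Authorization": f"Bearer {api_key}"},
)
print(request.full_url)

# What a JSON response body from such an API might look like.
sample_response = '{"city": "St. Louis", "temp_f": 71.5, "conditions": "cloudy"}'

# json.loads turns the JSON text into ordinary Python dicts, lists, and numbers.
weather = json.loads(sample_response)
print(f"{weather['city']}: {weather['temp_f']} degrees F and {weather['conditions']}")
```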
WebAssembly: the future of the web (maybe)
- WebAssembly is basically bytecode for the web.
- Flash and Java applets tried to do this too, but as it turned out, they were very appealing for malware developers. Malware is why Java no longer runs in a browser client-side, and why Adobe Flash died too. WebAssembly has the potential to be great, but make no mistake: people WILL use it to write more nefarious browser-based malware.
Congratulations on completing section 9!
It’s a relatively short section, but still important if you want to delve deeper into web development.