Advanced Concepts

Stored procedures – basically a function in SQL. You can create a stored procedure and use it later. If you find yourself doing repetitive long queries, like complicated JOINs, you might want to make a stored procedure to do it more easily. Or, instead of using SQL for functions, you can just use your server-side language like PHP.

  1. Database schema – the structure of a database. UML is a mock-up for coding, and a schema is a mock-up/plan for a database. For data exchange formats like JSON, you can even use tools to validate data against a schema to make sure that data is in the correct format.
  2. Data breach – a data breach is usually when someone gets data from a database, such as a MySQL database. I guess a data breach could technically involve files rather than databases, but database breaches are very common, especially for things like payment information, usernames, passwords, and things like that. Hackers often sell the data they stole, such as on dark web forums, which are only accessible through something called Tor. When a hacker gets private data from a database or server onto their own machine, it’s referred to as data exfiltration. Because of how a lot of companies monitor network traffic, sometimes hackers resort to clever and weird ways of exfiltrating data rather than something super straightforward, as that could lead to them getting caught.
  3. MySQL user accounts and permissions – throughout this chapter, we’ve been using the root account for the sake of simplicity. It’s not good to do this in a real-world situation though. If you have some PHP code that takes input from a user and then creates SQL queries to send to your MySQL database to search through products on an ecommerce site, perhaps it should only do this with a new user account (which you can create) which has limited privileges and isn’t allowed to drop tables or databases. A lot of permissions in computing should be need-to-know or need-to-do basis. If something is only responsible for returning search results, does it need permission to delete or edit stuff? No.

Here’s an example of creating a limited user account and only allowing it to run SELECT and nothing else:

CREATE USER ‘limiteduser’@’localhost’ IDENTIFIED BY ‘password_goes_here’;

GRANT SELECT ON my_database_name.* TO ‘limiteduser’@’localhost’;

FLUSH PRIVILEGES;

  1. This relates to a more general concept which you’ll get more into as a developer or even IT professional: service accounts. User accounts aren’t just for people. They’re for software, too. When a user account is only used by code, and not an actual person, it’s called a service account. One example of a service account is the user called ‘apache’ in Linux when you set up a LAMP server. It is the account used by the Apache server program. Service accounts need to be monitored and secured properly.
  2. Service accounts are often targets for hacking. For example, I’ve done security labs against LAMP servers, and one lab I did involved remote file inclusion, and I used a web shell with it. A web shell is like a bash shell, but browser-based and it allows a hacker to execute shell commands on that server. When I did whoami, the command that tells you which user you are, it said apache. The apache account has more limited privileges than root, but if your server has an older version of Linux on it, someone might be able to find a local privilege escalation exploit, such as the infamous Dirty COW (Copy On Write) vulnerability which is freely available for download on GitHub or Exploit DB. So someone might only start with limited permissions on your server, but then eventually figure out which versions of software are running on it, and then downloand and run a relevant exploit that will let them do anything on it. Keep in mind that, even if the code you write for your web server is secure enough, there might be a vulnerability in your code’s dependencies, like Apache, PHP, MySQL, Linux, and so on.
  3. Keep in mind that a security lab for a class or workshop is not the same thing as malicious hacking, but they use similar techniques. The difference is that a security lab is legal and educational, whereas hacking is harmful and illegal.
  4. MATCH (column1, column2) AGAINST(‘string’ search_modifier) – search columns for a string. Some examples of the search_modifier include IN NATURAL LANGUAGE MODE and IN BOOLEAN MODE.
  5. IN NATURAL LANGUAGE MODE is just normal text search, but IN BOOLEAN MODE supports special characters which are almost like regular expressions, but slightly different.
  6. Basically, search columns for text, and if they match, they will be return with the query. MATCH() AGAINST() is not a query on its own, but it can be a part of something like a SELECT query.
  7. Here is an example of using MATCH() AGAINST() in a full query:

SELECT * FROM users WHERE MATCH(first_name) AGAINST(‘John’ IN NATURAL LANGUAGE MODE) ;

  1. AES_ENCRYPT() and AES_DECRYPT() – functions for encrypting and decrypting data. Encryption is different from hashing. Encryption is used for stuff that you want to see the original data of again, whereas hashing never needs to be un-hashed.
  2. HEX() and UNHEX() – turn strings into hexadecimal values, or vice versa.
  3. UPPER() and LOWER() – make something uppercase or lowercase.
  4. SELECT CONCAT(first_name, ‘ ‘, last_name) – concatenate strings.
  5. Collation – how characters are compared. Because there are many different character sets, each with their own binary values for representing characters, the way that strings get compared (such as for sorting) depends on the character set. For example, if you use US ASCII, the character set (or charset) will be ascii, and the collation will be ascii_general_ci. ascii is best suited for English and Spanish. If you want to support additional languages, you will need to use a different charset and collation. Keep in mind that accent marks are different letters. So even though
  6. If you want to be able to store non-ASCII stuff, such as emojis, Cyrillic, Hindi, or Chinese, then you will need to use something like utf8 as the charset and utf8_general_ci as the collation. utf8 covers lots of things and is pretty general, but there are some language-specific charsets, such as sjis for Japanese. But even though there are language-specific charsets, utf8 covers pretty much everything. So even though sjis is a Japanese-specific charset, utf8 also supports Japanese characters – just in a slightly different way. The characters it supports are the same, but the way they are represented under the hood is slightly different. It makes no difference to the end user though.
  7. 7-bit ASCII supports no special characters at all. 8-bit ASCII, a.k.a. extended ASCII, supports a couple special characters and accented characters. But for more accent support, even in the A-Z English/Latin alphabet, you will need utf8. For example, Polish and Vietnamese use A-Z, but with accent marks, like ź and ł in Polish, or ở and ặ in Vietnamese. These are not supported even in extended ASCII, but utf8 supports them.
  8. One caveat about Unicode is that it can make validation or spam filtering harder. Let’s say you have filters for profanities or spam/scam messages. Maybe this is an example of a comment you won’t accept on your website, because spammers were spamming your site with it:

You just won $1,000,000! Click here for more information.

  1. But then they can just change it slightly to get around your filters:

Yờu ??s? w0n̪ €1,O0O,O0o! Cł1ck ?3re fór morę lnformặti0n.

  1. FOREIGN KEY – instead of using JOIN queries, which only display results but don’t change anything within the database, you can use a FOREIGN KEY within one table to link it to another table. The PRIMARY KEY of one table can be the FOREIGN KEY of a different table that has related data.
  2. Address binding – you can tell your database which networks it’s allowed to accept connections from. In a typical single-server setup, where you have a Linux distro as your OS and then Apache, MySQL, and PHP on the same machine, then you’d want your bind-address to be set to 127.0.0.1. If you set up your address binding incorrectly, such as with 0.0.0.0, anything will be allowed to attempt to connect to it, so hackers could try and brute force the login info for your database. I’ve heard of data breaches occurring due to developers not setting up their bind address settings for their database correctly, whether MySQL or MongoDB.

If you would rather concentrate on developing your website or web app rather than configuring security settings, look into managed LAMP hosting. You can get the software you want on your server on the internet, but the hosting company will manage the settings for you, hopefully using secure settings. But whatever you use, make sure that they’re actually going to be responsible for security, if that’s what you want. Don’t just assume that they will, as some companies won’t.

OWASP lists “security misconfigurations” as #6 on their OWASP Top 10 list, which is a routinely-updated list of the most common categories of security vulnerabilities.

my.cnf – the MySQL config file for your MySQL settings.

  1. MySQL server logs – if you want to enable server logging, then run this query:

SET GLOBAL general_log = ‘ON’;

  1. The official MySQL reference manual says that the “general query log is a general record of what mysqld is doing.”
  2. Please note that general logging logs many different types of things, and can increase resource usage. But it can also be good if you want to log stuff like errors or logins. Maybe someone hacked your site and you want to see who did it. You’d need to check logs. Some tech companies even do automatic log monitoring to figure out when something has gone wrong or been hacked.
  3. This book didn’t cover logging too much, but even offline software can benefit from logging. Maybe you can make an app, and when something important happens, a note of what happened gets written to a log file. Maybe you can make it an optional feature, and then the log will only be written if it’s specifically enabled. Logging can help you find bugs, hacks, and many other things.
  4. But again, when you’re just making your first web projects, presumably on your own computer on a local WAMP stack rather than on a server on the internet, you won’t need to worry too much about all this.
  5. SQL injection and OWASP – let’s say you have an online shop where users can, among other things, search for products.
  6. Your website has a search feature that looks something like this:
  7. Search: _____________________________ [OK]
  8. And when someone types text in it, it gets put into a SQL query, like so:
  9. Search: earbuds_______________________[OK]

SELECT *

FROM sales

WHERE product_name = ‘earbuds’;

  1. The query starts with SELECT * FROM sales WHERE product_name = ‘ and then it puts in the string you submitted for searching the site. Then it ends it by adding ‘; to the end to signify the end of the string and the end of the SELECT query with the WHERE clause.
  2. Then, the results are sent back to the user. They can find all earbuds that are sold on the website.
  3. But what if they did something like this instead?
  4. Search: earbuds’; SELECT * FROM users;–[OK]

Here is what the above code would look like in a SQL query run by the DBMS, after being passed from the back-end code, such as PHP or something:

SELECT *

FROM sales

WHERE product_name = ‘earbuds’;

SELECT * FROM users;–‘;

There are many different types of SQL injection. The above example uses multiple queries at once. You are ending the first query and then running a second one. The — at the end means anything after it will be treated as a comment rather than SQL code. Some people online say that SQL injection by means of chaining multiple queries together doesn’t work in MySQL, but it can possibly work in other SQL variants. However, many of the most basic types of SQL injection are protected against either by the developer or by a WAF (Web Application Firewall). There are many different kinds of SQL injection, even if my above example doesn’t work on everything.

  1. This book isn’t trying to make you an expert on SQL injection. I just want you to be aware that it exists, and that you shouldn’t trust input from a user.
  2. SQL injection is illegal, but some people will still do it to your server anyway. Lauri Love, a UK man who performed SQL injection on many servers, had been caught, and the US wants to extradite him and give him up to 99 years in prison. While the UK has avoided extraditing him, he could still face a lengthy sentence. So even though you can pretty easily use online resources to learn how to do SQL injection, don’t actually perform SQL injection attacks!
  3. If you really want to learn about SQL injection in a legal way, then on a home server only (on your LAN, not the cloud or whatever), install something called DVWA, which stands for Damn Vulnerable Web App, which is an intentionally-insecure LAMP stack project. Then you can do manual SQL injection stuff by opening the local DVWA website in a browser, or by using SQL injection tools in Kali Linux, such as sqlmap or sqlninja. You can even use something like curl to craft HTTP requests yourself that perform SQL injection. But again, you can only do this on your own home network. Even if you pay money for a VPS or shared web hosting, you shouldn’t test its security there because you don’t own the hosting company’s servers.
  4. Protecting against SQL injection will be covered in the PHP chapter, because you will be using a MySQL database with an Apache web server with PHP code. Topics that come up with writing code that is safe from SQL injection include query parameterization, prepared statements, escaping input, and regular expressions.

If you would like to learn more about SQL injection, and just web security in general, check out the OWASP wiki. OWASP stands for the Open Web Application Security Project. You can find it here:

https://www.owasp.org/index.php/Main_Page

Password hashing – so earlier I mentioned how it’s not a good idea to store passwords in plain text. Plain text means non-encrypted, non-hashed textual data.

  1. Here’s what happens in a login system that doesn’t use hashing:
  1. A user signs up for a site. Their username and password are stored directly in a database, with no alterations.
  2. User submits login attempt, including username and password.
  3. Server compares password from user to password in database that corresponds to the row with the matching username.
  4. If the two strings match, the user can log in.
  1. Sounds fine, except there’s one big problem: if there’s some sort of security issue with the database, such as SQL injection, weak passwords (remember setting up your Bitnami WAMP root password?), and so on. If an attacker can exfiltrate data from a database, and the passwords are stored in plain text, then they will be able to log into anyone’s account on that server.
  2. Here’s what happens in a login system that uses hashing:
  1. When a user signs up, their password is hashed. Hashing tools are one-way functions that turn your actual password (such as “hunter2”) to a hashed version of it, which will look like gibberish, such as something like this: c543eaad558a6f4a1d4b10e807d56fd4. Hashes are usually represented with hexadecimal. You can use the SHA2() hashing function in SQL, or a hashing function in PHP instead.
  2. User submits login attempt, including username and password.
  3. The server hashes the password that was submitted by the user, and compares the newly-hashed password from the login attempt to the hashed password stored in the database, for the account with the specified username (which should be unique and possibly a primary key).
  4. If the two hashes match, the user gets to log in.
  1. If a hacker manages to view private data in the database, it’s still hashed. You can’t just submit a hash as a password attempt. It will be hashed again by the server. So knowing the password hash for the password “hunter2” doesn’t mean you can log in with it. The only way to use the hash would be to brute force it to try and figure out what the original value was. But, as hashing algorithms are supposedly one-way functions (meaning easy to do one way but impractically hard or impossible to undo it), it would take a very long time to crack a password hash… at least if a good hashing algorithm was used.
  2. If you’re wondering why I’ve repeatedly used “hunter2” as a password example, check this out:
  3. http://bash.org/?244321
  4. A concept related to hashing algorithms is salt. Cryptographic salt means you are customizing the way the hashing gets done. Think of it like this: if everyone only used the same hashing algorithm, then people could easily just compute an entire rainbow table, and then the values in it (of precomputed inputs to the hashing function and their resulting hashes), and then use that to crack anyone’s hashes. So applying a salt to a hash means that it will be hashed in a slightly different way from other hashes with different salt.

Security is difficult. There’s no getting around it. But the next chapter, which covers PHP, will provide an example of password hashing. The thing about databases and SQL is that, by itself, SQL isn’t very useful. But because you can do input validation, password hashing, and more using PHP, that makes a database more useful.

Memcached and Redis – these are two tools for doing in-memory caching. It can make a database faster, but also more complicated. It’s not recommended for beginners or people who only have websites or apps with a very small number of users. But if your personal app project ever makes it big, or use it at a company that has a large number of users, you might want to look into them. But here’s the thing: I’ve had this discussion with some people before. Often times, people have delusions of grandeur and think their website or app is gonna get a million users, even when it might only get like 500 or so. You think that you need to design it for scalability, but in reality you’re just wasting time overengineering something, based on a hypothetical thing that won’t actually happen (what if I suddenly get 1,000x more users?).

There are easier ways to get performance boosts for your database. For example, you can use SSD storage instead of hard drives. You can get a VPS with a better CPU, faster network connection, and so on. Another thing you can do is to optimize your SQL queries and back-end code. Maybe you have an online shop, and instead of using a wildcard SELECT, which can be slow when there are lots of rows in a table, maybe you can just use the LIMIT clause for pagination, such as only giving 10 product results per page for searching for things on the website. That way, the queries will take less time, even on the exact same hardware, and even without any RAM caching.

You can also strategically place your server where you think your customers would be. If your customers are in the west coast of the US, would it make sense to get a server from a web host that’s all the way over in the UK? No. That would add some latency. So yes, in-memory database caching has its place, but it shouldn’t be the first thing you turn to when you have performance issues.

Distributed databases – do you think that big tech companies like Facebook or Google have a single server with a single database and single users table with each and every user account in it? Of course not. A billion people use Facebook. You can’t have a billion users all connecting to a single database server. That’s where distributed databases come into play. A distributed database is a database that spans across multiple machines.

There are different ways to break up a database across multiple machines, including sharding or partitioning. In sharding, a single machine might have a single column. In the MySQL examples in this chapter, we went over having tables that have multiple columns. Sharding would be one machine per column. The other option is partitioning. Let’s say you have 10,000 user accounts and 10 machines. Instead of having all 10,000 on a single server, you could break it up – 1,000 for each server. Each server would have every column for a row, such as username, password, email etc. for a user account. But a single database would only have certain users on it.

Other options include replication, which is where multiple servers have the exact same data. This can be useful if you have a lot of data, but it isn’t changing very often.

Another option is syncing. There are multiple databases in different locations for performance reasons, and they sync with each other to get new data. One issue with this approach is consistency, which is a part of ACID. This can lead to something called eventual consistency.

  1. Distributed databases add complexity. Just like with in-memory database caching, having a distributed database is not something you should do unless it’s absolutely necessary. Sure, Facebook and Google need to do it, but they operate on a scale that most people won’t experience. Massive numbers like in FAANG companies (Facebook, Apple, Amazon, Netflix, and Google) is called web scale, but not all web developers will have “web scale” userbases. But if you make a website that only has a few registered users, don’t bother.
  2. Using any sort of distributed system, even for things that aren’t database-related, adds extra complexity. The main benefit is performance and scalability, so if you don’t need it, don’t bother.
  3. Distributed databases are a form of horizontal scaling, meaning scaling outwards. An easier way to scale, but with a lower ceiling for how much you can scale, is vertical scaling, like upgrading your server to have faster processing, more RAM, and so on.
  4. Modern cloud computing focuses at least somewhat on the concept of elastic scaling, or elasticity, meaning the ability to add more resources when you need them (such as a tweet with a link to your site or app going viral), and then scaling down when you’re back to your normal lower number of users. This saves money because you only pay for what you need. But elasticity is complicated too.
  5. Beginners should ignore distributed systems, elasticity, etc. Just use a single-node database, either a VPS or managed host that you can administrate via cPanel in a browser. Keep it simple to begin with.
  6. Laws regarding data in databases – so if you actually want to make a full stack web development project, and you store personal information about users of your site/app, you will need to be aware of some data privacy laws and regulations, such as PCI, GDPR, FISMA, PII/PA, and HIPAA. Compliance means meeting at least the basic standards for privacy and/or security for various regulations.
  7. PCI stands for Payment Card Industry, and PCI DSS stands for Payment Card Industry Data Security Standards. There are rules for how people who store payment information need to secure it. So if you have an online shop, you will either need to read up on PCI, or just use a vendor that makes ecommerce software so that you don’t need to do it yourself. If you store stuff like credit card info in a MySQL database, you would need to be PCI compliant. Compliance can be audited. Failing an audit is bad. If you’re just one person, and you’re focusing on personal projects, odds are that you won’t be directly taking CC info. If you’re working as a professional software developer, you might be developing database systems that take payment from users, and in that case, your company will need to be PCI compliant.
  8. Aside from databases, some things PCI DSS mentions include not using default passwords, installing software updates, using a firewall, running antivirus scans, monitoring your network, and things like that.

GDPR is the EU’s General Data Protection Regulation. It is well-known for being the reason why many websites say “this site uses cookies,” but not many people know of all the other facets of it. Basically, if you want customers or users in the EU, you will need to abide by GDPR rules. It pertains to privacy and security. You must make a good effort to secure data, let people know about a data breach, etc.

FISMA stands for the Federal Information Security Management Act. It’s a law that basically says that government organizations within the US need to make a decent effort to keep things secure.

PII stands for Personally Identifiable Information. Many things are considered PII, such as names, addresses, phone numbers, social security numbers, and even email addresses. The US’s Privacy Act law attempts to protect PII.

HIPAA stands for the Health Insurance Portability and Accountability Act. HIPAA attempts to make medical information more private and secure.

Does regulation stop people from hacking? Does it stop companies from making security mistakes? Do laws make security flawless? No. But while regulation isn’t perfect, it’s better than nothing.

Keep in mind that I live in the US, so I’m mainly writing about US law (aside from GDPR). If you’re in another country, you’ll have to read up on your country’s privacy and security laws. Long story short, if you put people’s information in a database, there are laws about how you can and can’t collect data, notifying people about what data you collect, how you need to secure it, what you can and can’t do with it (example: doctors can’t sell your PHI (Protected Health Information)), what you need to do if you have a data breach, and things of that nature.

You might just want to learn about coding, but storing legally-protected data in your database means you need to be aware of the laws surrounding it. If you don’t learn about this stuff, you could get in a lot of trouble.

Keep in mind: regulatory bodies governing compliance standards are mostly for businesses that make money and store information about users and customers. You don’t have to go all hardcore for a personal project, like a guestbook page on your website where people can post a comment and then it goes into your MySQL database. But even so, it never hurts to try to keep things secure. And do your due diligence and read more into the specific details of these laws and see which, if any, apply to what you’re doing with your website/app.

When you’re on a team, your company should, in theory, be in charge of making sure compliance standards are met and audits are passed. But when you’re doing solo projects, you’re in charge.

When you eventually pay for a server on the internet rather than having one at home, a managed web hosting company might take care of compliance for their customers. Some server companies, like cloud providers or VPS hosts, are hands-off and tell you that you’re in charge of your own security.

User uploads – aside from laws about privacy and security, there are also laws about user-uploaded content. For example, if you make a LAMP stack web app that lets people upload files to your server, and users upload copyrighted material or malware, you will have to quickly remove it. That being said, there are certain protections made making it clear that a user is responsible for what they upload, as opposed to the owner of a website being held accountable. However, you can’t just say that users are responsible for what they upload and then ignore lots of cases of copyright infringement. If you make a file upload site and people are constantly uploading movies to it so that people can illegally download movies for free, and you know it’s happening but choose to ignore it, then you could be held accountable. It can be good to not only have a written policy about how people can use your site, but also check what people are posting on it from time to time and delete bad things if people post things like movies or malware.

Because people can post bad things to your site if you accept comments or file uploads, you might want to consider coding functionality for banning users (by account or IP address) and logging IP addresses associated with posts/uploads. But that would be implemented in PHP, not just SQL. But you’d need to keep track of some things in your database so that the PHP could use it and determine if someone is banned or not.

The pseudocode for that could look something like this:

if (user not in banned_list) {

make_post(user_submission, user);

} else {

show_ban_message(user);

}

Congratulations! You officially completed section 11 of FreeCodingTutorials.com!

Databases are really important, so it’s really good that you learned this stuff. Being able to work with databases opens up all sorts of possibilities for coding and employment opportunities. This was a really long section, but it’s worth it to learn something that is absolutely vital for many kinds of software development jobs.

← Previous | Next →

SQL and Databases Topic List

Main Topic List

Leave a Reply

Your email address will not be published. Required fields are marked *