Data Types and Variables

Types – data can have different types. You can have a variable that’s a string, integer, or even a user-defined class. The type determines what it is and what you can do with it.

Dynamic typing – when a variable that was initially assigned a value of one data type can later be assigned a value of some other data type. Python is a dynamically typed language.

Here’s an example of dynamic typing in Python:

dynamic_example = 5
print(type(dynamic_example)) #it will say int, for integer
dynamic_example = "hello"
print(type(dynamic_example)) #now it will say str, meaning string

Not all languages are dynamically typed. While dynamic typing offers greater flexibility, it also introduces new issues like type confusion.

Static typing – in a statically typed language, a variable that is declared with a certain data type keeps that type for its entire existence. If a variable is an integer, it can’t then hold a string of text (called a string). Python is dynamically typed, but Java is statically typed. In Java, you can typecast a value, for example from an int to a float, but that only converts the value in the expression where the typecasting occurs. The variable itself keeps its declared type.

Type inference – newer versions of Java include the var keyword, which is used for static type inference. This means that you can do something like this:

var x = 5;

And Java will automatically infer, or figure out, that it’s an integer. But this is not dynamic in Java, hence static type inference. Once that variable is an integer, it can’t hold any other data type. (A variable whose inferred type is a class can still hold objects of that class’s subtypes.)

Typecasting – converting one data type to another. Maybe you have an integer, 5, but you want to convert it to a float so that it can be 5.0. You would have to typecast it to float.

If you get input from the user in Python, it’s stored as a string. But let’s say you want to make a program that gets two numbers from the user and adds them together. You can take the input from the user and assign it to variables, but then you would need to typecast from string to integer.
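Here is a minimal Python sketch of that idea. The two strings stand in for what input() would return, and the variable names are made up for illustration:

```python
# Stand-ins for what input() would return: input() always gives back a string.
first_text = "3"
second_text = "4"

# Without typecasting, + joins the two strings instead of adding numbers.
joined = first_text + second_text

# Typecast each string to an integer before adding.
total = int(first_text) + int(second_text)

# Typecasting an int to a float produces a new value; the original is untouched.
whole = 5
as_float = float(whole)

print(joined)    # 34
print(total)     # 7
print(as_float)  # 5.0
print(whole)     # 5
```

If the text can’t be read as a number, such as int("five"), Python raises a ValueError, so real programs usually check or catch that.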

typeof(), type(), and getType() – many languages have a way to let you see the type of something. JavaScript has the typeof operator. In Java, you can call the getClass() method on an object to figure out what it is, and C# has GetType(). Python uses type().

If you do this:

type(5)

This will be the output:

<class 'int'>

That means it’s an integer.

Variable – a variable consists of two parts: a name (or identifier), and a value. You can do something like this:

String myName = "Alan";

And then use it later without referring to the actual contents:

System.out.println(myName);

Notice how I didn’t use quotes around it. If you surround a variable name with quotes, you are no longer referencing the variable, but instead, you are using a string literal, which probably isn’t what you want. Conversely, if you want to print a string but don’t surround it with quotes, you might get an error that the variable you’re referencing is undefined.

I guess another implied part of it is the namespace it’s using, though often people are just using a default namespace. A namespace is where a name is valid. You can have two variables with the same name only if they are in different namespaces. In the case of namespace ambiguity, you need to explicitly state the namespace. If you try to have two variables with the same exact name when they’re in the same namespace, that’s called a namespace collision. Think of it this way: if a town has two addresses called 123 Oak Road, how will the mailman know which is which? Names for variables need to be unique. If there is a 123 Oak Road in Whateversville and a 123 Oak Road in Somethingville, that’s not a namespace collision because they are in different namespaces. But two addresses that are the exact same within the same town is a namespace collision.
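The address analogy can be sketched in Python by using two classes as two namespaces. The town names come from the analogy above, and the house descriptions are made up:

```python
# Two classes used as two separate namespaces.
class Whateversville:
    oak_road_123 = "the Smith house"

class Somethingville:
    oak_road_123 = "the Jones house"

# Same name, different namespaces: no collision, as long as the
# namespace is stated explicitly.
print(Whateversville.oak_road_123)  # the Smith house
print(Somethingville.oak_road_123)  # the Jones house

# Same name in the same namespace: Python does not raise an error here.
# The second assignment silently replaces the first.
oak_road_123 = "first house"
oak_road_123 = "second house"
print(oak_road_123)  # second house
```

Python quietly overwrites the earlier value, while a compiled language such as Java refuses to compile a second declaration of the same name in the same scope.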

In many languages, variable names are case-sensitive. So myVariable is separate from MyVariable or myvariable. Variables aren’t the only things that are case-sensitive in programming though.

Declaration and initialization – declaration means that the variable name corresponds to a place in memory. Initialization means giving the variable its first value. What happens if you use a variable without initializing it depends on the language: some languages force you to initialize it and will give an error when you try to compile or run the code. However, in C or C++, it will just use whatever was there already. There might be random garbage in RAM, so in C/C++ it’s imperative that you properly initialize things before they’re used, even if the language technically allows you to use something prior to initialization.

Here is an example of declaration:

int x;

Here is initialization:

x = 5;

The following example is a combination of both declaration and initialization:

var x = 5;

You can technically initialize something to null, which is saying that you want null rather than whatever random value is in the memory space associated with the variable you’ve declared. But I wouldn’t recommend just doing a null initialization if you’re not planning on assigning something else to it later. If you perform null initialization, or use something that has default initialization to null, you will need to use control structures that check whether it’s null and only proceed if it isn’t.

Some programming languages will let you do all sorts of bad things, but just because you can doesn’t mean you should.

new – for variables that aren’t primitive data types, you might have to use the new keyword when creating something. For example:

Person example123 = new Person("Alan");

In the above case, I’m calling the Person class’s constructor with a single String argument. In Java, String is capitalized, but in C++, string is lowercase. That’s just how it is. I didn’t come up with it.

Assignment – to give a value to something. Assignment isn’t the same as initialization, because assignment can be done even after something has been initialized. Think of initialization as like the first assignment.

Assignment example:

int x = 5;

Reassignment – to give a value to something that had a different value before.

Reassignment example:

int x = 4;
x = 10;

Primitive data type – simple kinds of data, such as integers, floats, doubles, and chars. They are primitive, unlike user-defined data types, which can be more complicated, though they often consist of multiple primitive data type attributes and a couple of functions which perform actions on primitive data types.

Booleans – a data type where the only two possible values are true or false. Some languages require them to be all lowercase, such as in Java and C++. But Python requires Booleans to be capitalized, like True or False.

Many operators will return Boolean values.

The following Python code will yield False:

print(5>10)

5 is obviously not greater than 10.

The following Python code will yield True:

print(5<10)

Floats and doubles – numbers with decimals. Doubles are double-size floats, so they take up more memory, but they can hold bigger values with more precision. Float is short for floating-point number, because the radix point “floats” and can move around. String formatting is critical for floats/doubles because if you are dealing with something like money, you could have a value of $1.25, but with no formatting, it could show up as something like 1.25000000000. And sometimes there are weird rounding errors, so you might get an answer like 3.000000000002 when you’re expecting 3.0. Sometimes, you will have different numeric types that you want to use together, such as an integer and a double, but one of them will need to be typecast – at least in some languages.
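A short Python sketch of the formatting and rounding issues described above. The values are made up, and using the decimal module for money is one common approach rather than the only one:

```python
from decimal import Decimal

price = 1.25
two_places = f"{price:.2f}"    # format to two decimal places
ten_places = f"{price:.10f}"   # what an unhelpful format can look like
print(two_places)  # 1.25
print(ten_places)  # 1.2500000000

# Binary floating point cannot store 0.1 or 0.2 exactly,
# so a tiny rounding error shows up in the sum.
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False
print(f"{total:.2f}")  # 0.30

# Decimal values built from strings keep exact decimal digits.
exact = Decimal("0.10") + Decimal("0.20")
print(exact)  # 0.30

# Python converts the int for you when mixing an int and a float.
mixed = 3 + 0.5
print(mixed)  # 3.5
```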

Radix point – the point which separates whole integer numbers from fractional parts. It’s the fancy computer science way of saying decimal point.

Signed vs. unsigned – a signed integer type can be positive or negative, but that essentially halves how big the values can be. By contrast, an unsigned integer type can’t be negative, but it can be bigger. In a signed type, the first bit indicates positive or negative, and the remaining bits are for the value. Floats and doubles are always signed. Unless you are dealing with massive numbers, a signed type will be just fine.

short – a small numeric type that takes up less memory than other kinds.

Integers – numbers that can only be multiples of 1, with no decimals. 1, 2, and 3 are integers. 0.5, 6.2, and (22/7) are not. A 32-bit integer can only support values up to about 4.29 billion if unsigned, or 2.147 billion if signed. Java’s int is 32 bits and always signed, so its maximum is 2,147,483,647.

long – if you need numbers that are bigger than what a 32-bit int can handle in Java, use a long. Java’s long is a signed 64-bit type, so it can represent values of up to around 9.2 quintillion. An unsigned 64-bit type, in languages that have one, goes up to around 18.4 quintillion. If you need to use HUGE numbers, even bigger than longs, then you will need to use Java’s BigInteger class instead.
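Those limits come straight from powers of two. A quick sketch that computes them in Python (the constant names are made up for illustration):

```python
# Largest values for common fixed-width integer sizes.
INT32_MAX = 2**31 - 1    # signed 32-bit, like Java's int
UINT32_MAX = 2**32 - 1   # unsigned 32-bit
INT64_MAX = 2**63 - 1    # signed 64-bit, like Java's long
UINT64_MAX = 2**64 - 1   # unsigned 64-bit

print(INT32_MAX)   # 2147483647
print(UINT32_MAX)  # 4294967295
print(INT64_MAX)   # 9223372036854775807
print(UINT64_MAX)  # 18446744073709551615

# Python's own int grows as needed, so going past the 64-bit limit
# does not overflow here.
bigger = INT64_MAX + 1
print(bigger)      # 9223372036854775808
```

Python’s built-in int behaves more like Java’s BigInteger than like a fixed-size int or long.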

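NaN – Not a Number. It’s a special floating-point value that results from an invalid numeric operation, such as dividing 0 by 0 or trying to convert non-numeric text into a number. If you see NaN when you’re expecting to see a number, such as in JavaScript, it usually means something went wrong earlier in the calculation.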

byte – a byte is 8 bits, and 2^8 is 256, so a byte type can hold 256 different values. In C#, a byte is a numeric data type that can only be from 0-255. In Java, a byte is signed, so it ranges from -128 to 127. In the old days, when memory was scarce, bytes were more useful. Now? You can just use an integer and not really care about saving a little bit of memory. In some languages, byte is not a numeric type per se, just a one-byte type, where the interpretation of the value can vary depending on its implementation. If this sounds more confusing than the other types mentioned here, don’t worry. You really don’t need to use bytes. It’s just good to be aware of it in case it ever comes up.

Two’s complement – a way of dealing with negative numbers in binary. You will most likely never use this in the real world, though it might be covered in a freshman computer science class. Computers use two’s complement all the time, but you don’t need to do it manually. I guess it can’t hurt to know, but at the same time, you will never implement it yourself in a software development job.

Two’s complement is quite simple: to turn a number into its negative, invert its bits, and then add one to it.

Let’s use -6 as an example. If you want to represent -6 in binary, first you have to figure out positive 6.

6 is (0*8)+(1*4)+(1*2)+(0*1), so it can be represented as 0110. But you will also need an extra bit to represent sign, being positive or negative, so now our signed positive 6 is 00110. The leading bit means positive if it’s 0 or negative if it’s 1.

Now take the number and invert it:

From 00110 to 11001.

Now add 1 to it: 11010. That’s -6 in binary, using 5 bits.

Two’s complement is useful for addition and subtraction as well as converting from negative to positive or vice versa.

In binary, if you want to subtract one number from another, you instead need to convert the number to be subtracted to negative, and then add the two numbers together.

You might be used to something like this:

   7
 - 5

But with binary, you’d have to do this:

   7 
+ (-5)
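The -6 example and the 7 + (-5) subtraction can be checked with a small Python sketch. The function names are made up, and the 5-bit width matches the worked example above:

```python
def twos_complement(value, bits):
    """Return the two's complement bit string for value, using `bits` bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given number of bits")
    # Masking with (2**bits - 1) maps a negative number onto its
    # two's complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def invert_and_add_one(bit_string):
    """The manual recipe: flip every bit, then add one."""
    bits = len(bit_string)
    flipped = "".join("1" if b == "0" else "0" for b in bit_string)
    result = (int(flipped, 2) + 1) & ((1 << bits) - 1)
    return format(result, f"0{bits}b")

print(twos_complement(6, 5))        # 00110
print(invert_and_add_one("00110"))  # 11010
print(twos_complement(-6, 5))       # 11010

# 7 - 5 done as 7 + (-5), keeping only the lowest 5 bits of the sum.
seven = int(twos_complement(7, 5), 2)        # bit pattern 00111
minus_five = int(twos_complement(-5, 5), 2)  # bit pattern 11011
difference = format((seven + minus_five) & 0b11111, "05b")
print(difference)                   # 00010, which is 2
```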

If two’s complement confuses you, don’t worry. Unless you’re studying computer science in college, you’ll never actually need to know it. It’s an academia skill, not a workplace skill. If you’re studying this on your own, outside of a university setting, then you shouldn’t waste your time on two’s complement.

Increment – to programmatically add to something, usually one. Unlike two’s complement, incrementing is very important.

Examples:

some_variable++

Or

some_variable += 1

Or

some_variable = some_variable + 1

Decrement – to decrease something.

Examples:

some_variable--

Or

some_variable -= 1

Or

some_variable = some_variable - 1

char – a string can be multiple characters, but a char is a data type that is just a single character. You can also have a char array, but that’s more for older languages instead of modern ones. You’re better off just sticking with strings unless you need something only to be one character.

Strings – strings are text. A char is a single character, but a string can be many characters. Instead of having strings, some older languages use arrays (contiguous groups) of chars.

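Important note about strings: in Java, == usually isn’t the right way to compare strings, because it checks whether two variables refer to the same object rather than whether they contain the same text. (C# is different: its == compares string contents.) In Java, use the .equals() string method to compare the contents, such as in the following example: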

String someStringVariable = "Hello";
String someOtherStringVar = "Not the same";
System.out.println(someStringVariable.equals(someOtherStringVar));

Substring – a string within a string. For example, if you have the string "hello", then "el" is one (of many) substrings within it.
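In Python, checking for and extracting substrings might look like this (a small sketch; the variable names are made up):

```python
word = "hello"

# The in operator checks whether a substring is present.
has_el = "el" in word
print(has_el)    # True

# find() returns the index where the substring starts, or -1 if absent.
position = word.find("el")
missing = word.find("xyz")
print(position)  # 1
print(missing)   # -1

# Slicing pulls a substring out by index: start at 1, stop before 3.
piece = word[1:3]
print(piece)     # el
```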

len() and length() – to get the length of a string in Python, you can use the len() function with the string as the argument, such as len(myCoolString). In Java, you’d use the .length() method, such as someString.length() to return the integer length. To get the length of an array in Java, you’d use the .length attribute, which is not a method, therefore it does not require parentheses, such as this: myCoolArray.length.

String formatting – depending on what language and formatting stuff you’re using, you can accidentally get a rounding error, due to either not having enough significant digits or because some formatting options will merely truncate numeric types, which in some cases can make it look rounded down.

Here’s an example of how to format strings in Python:

#!/usr/bin/env python3

whatever = 5.0000001

print("%.2f" % whatever)

The output is as follows:

5.00

Conversion specifiers – the above example used %f, which is something you probably haven’t seen before. Conversion specifiers are used as string formatting placeholders. Use %c for char, %s for “string” (char array), %d for signed numeric types (int or long), %u for unsigned numeric types (int or long), and %f for float and double (numbers with decimals).
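Python’s % operator accepts most of these specifiers. A short sketch with made-up values:

```python
# Each specifier is a placeholder that gets filled in by the value after %.
as_char = "%c" % "A"          # single character
as_string = "%s" % "stapler"  # string
as_int = "%d" % 42            # integer
as_float = "%.2f" % 4.987     # float, rounded to two decimal places

# Several placeholders take a tuple of values, in order.
sentence = "%s costs $%.2f" % ("stapler", 4.99)

print(as_char)    # A
print(as_string)  # stapler
print(as_int)     # 42
print(as_float)   # 4.99
print(sentence)   # stapler costs $4.99
```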

Escaping (or escape sequence) – When you have a string, it needs to be surrounded by quotes. But what if you want to have a string that contains quotation marks? It would be interpreted as a string delimiter by default, so you need to escape it.

This is incorrect:

print("I said "howdy" to him")

This is correct:

print("I said \"howdy\" to him")

The backslash indicates that the following character is literal rather than syntax.

There are a couple of exceptions though. For example, \n means newline.

Escaping input is hugely important for security; otherwise, text can potentially be treated as code and executed.
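As one illustration of escaping input, here is a sketch that uses html.escape from Python’s standard library to make user-supplied text safe to show on a web page. The input string is a made-up example, and other contexts (SQL, shell commands) need their own escaping tools:

```python
import html

# Text that a user typed in. If it were placed into a web page as-is,
# the browser would treat it as code.
user_input = '<script>alert("hi")</script>'

# html.escape replaces the special characters with harmless stand-ins.
safe = html.escape(user_input)
print(safe)  # &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```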

Line break – also known as a newline, it’s when the cursor or text goes to the next line on a screen. On Unix-like systems it’s a single line feed character, written \n, while Windows traditionally uses CRLF (Carriage Return Line Feed), written \r\n. If you use echo in a shell script, by default, it will add a newline after the text has been echoed. But maybe you don’t want that, so you’d use the -n flag to not make it use a newline afterward, like so: echo -n "hello"

In Python, if you use print(), by default, it will add a line break at the end. However, if you use something like this:

print("thing to print", end="")

Then it won’t add a line break at the end.

To print something with a line break at the end in Java, use this:

System.out.println("thing to print");

To print something without a newline after the string to be printed, use this:

System.out.print("something");

Sometimes you want line breaks.

But

sometimes,

your code’s

output

might have

too many

line breaks,

and that

can be

a problem.

Mutable vs. immutable – if something can be changed, it’s mutable. If it can’t be changed, it’s called immutable. A string is actually immutable. That means a string’s contents can’t be changed once it has been created. The variable can be reassigned, like you can say String myName = "Alice" and then reassign it with myName = "Bob", but the string itself is still immutable. Even something like myName = myName + "!" doesn’t modify the original string. It builds a new string and points the variable at it.
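Python strings are immutable too, which makes the difference easy to see next to a mutable list. A minimal sketch with made-up names:

```python
name = "Alice"

# Trying to change a character in place fails, because strings are immutable.
try:
    name[0] = "B"
except TypeError as error:
    print("strings are immutable:", error)

# "Adding" to a string builds a new string; the old one is untouched.
original = name
name = name + "!"
print(original)  # Alice
print(name)      # Alice!

# A list is mutable: changing it in place is visible through every
# name that refers to it.
letters = ["A", "l"]
alias = letters
letters[0] = "B"
print(alias)     # ['B', 'l']
```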

List – a list in Python can contain many different things. It’s sort of like an array in the sense that they’re ordered.

cool_stuff = ["Python", "Kubernetes", "Docker", "TensorFlow"]

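Dictionary – a collection of key-value pairs. Dictionaries are commonly used in Python. A key is a name, like a word in a dictionary. The value is like a definition in a dictionary. Items are looked up by key rather than by position, although since Python 3.7 a dictionary does remember the order in which items were added.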

dictionary_example = {"product_name": "stapler", "cost": 4.99}

print(dictionary_example["product_name"])

Key-value pair – a key and a value. Think real-world dictionaries, with words and definitions. You can refer to a key to get its value.
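A few more things you can do with key-value pairs, sketched with the same stapler dictionary. The "color" and "weight" keys are made up for illustration:

```python
dictionary_example = {"product_name": "stapler", "cost": 4.99}

# Refer to a key to get its value.
cost = dictionary_example["cost"]
print(cost)  # 4.99

# Assigning to a new key adds a new key-value pair.
dictionary_example["color"] = "red"

# Asking for a missing key with [] raises a KeyError, but get()
# returns None or a default you choose.
weight = dictionary_example.get("weight")
weight_or_zero = dictionary_example.get("weight", 0)
print(weight)          # None
print(weight_or_zero)  # 0

# Loop over every key-value pair.
for key, value in dictionary_example.items():
    print(key, "=", value)
```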

Array – If an egg is a variable, then a carton of eggs is an egg array. More specifically, it’s an array of type egg with a size of 12. The things stored in the carton are called the array elements. The elements in the array are accessible with the use of their indices (plural for index), from 0 to 11 – not 1-12. If you try to access an invalid index, such as 50, you will get an array index out of bounds error. You need to do something called bounds checking. You also can’t put a different type of thing in it. If it’s an egg array, you can’t put a banana in it. If you have an integer array, you can’t put a string in it.
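Bounds checking can be sketched in Python like this. A Python list stands in for the egg carton here (lists are not fixed-size typed arrays, so this only illustrates the index check), and get_egg is a made-up helper:

```python
# A carton with 12 slots, indexed 0 through 11.
egg_carton = ["egg"] * 12

def get_egg(carton, index):
    """Return the element at index, or None if the index is out of bounds."""
    if 0 <= index < len(carton):
        return carton[index]
    return None

print(get_egg(egg_carton, 0))   # egg
print(get_egg(egg_carton, 11))  # egg
print(get_egg(egg_carton, 50))  # None

# Without the check, Python raises an IndexError for an invalid index.
try:
    egg_carton[50]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)            # True
```

Python also allows negative indices that count from the end of a list. The check above deliberately treats them as invalid to match the 0 to 11 description.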

Maybe you want a ton of strings, a ton of ints, or something like that. Usually, they all have to be the same, although there is a weird workaround I don’t recommend. In Java, every class inherits from the Object superclass, so if you make an Object array, you can fill it with anything, because all other classes are descended from Object. But that’s probably not a good idea. The rule of thumb is to use an array when you need a lot of the same kind of thing; otherwise, it could be hard to keep track of what’s in each index.

In most languages, an array is contiguous and fixed size, though there are some workarounds, such as Java’s ArrayList class, or vectors in C++. But a pure array has a set size, which can be a problem in some cases.

Arrays are usually fast but inflexible. Let’s say you want to put something in between two contiguous elements in an array. How would you do that? Well, you’d have to shift everything down one. That’s not fast or easy. A different kind of data structure that lends itself well to insertion is the linked list. Linked lists let you easily insert things in between other things, but there are some downsides to them too. You will find that data structures all have their pros and cons.

Counting with 0 – in most programming languages, you start counting with 0, not 1. Which index should you start with in an array? Certainly not 1. Use the zeroth index, such as this:

eggCarton[0] = new Egg("Free range");

Multidimensional arrays – if you have a bunch of objects in a one-dimensional line, that’s a regular array, with only one dimension for addressing elements. But what about more than one? Let’s take a chess board for example. It has two dimensions for the squares.

So while a one-dimensional array element can be accessed like this:

something[2]

A two-dimensional array, such as a chessboard, might be accessed like this:

chess_board[0][4]

You might do something like this:

chess_board[0][4] = "Black Bishop";

chess_board[5][3] = "White Rook";

A three-dimensional array is created like this:

Thing[][][] three_dimensional_thing = new Thing[5][4][2];

Sometimes, you can have asymmetric multidimensional arrays (often called jagged or ragged arrays), meaning the inner arrays are not all the same size. So instead of a grid, like maybe 10×10 for a 2d array, each array within the array might be a different length.

If a language doesn’t support multidimensional arrays officially, you can sometimes take a one-dimensional array and then put one-dimensional arrays for each index/element in the original array. But that doesn’t work for every language.
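Python is one language where the array-of-arrays approach is the normal way to do it, using lists inside a list. A minimal sketch, with the board contents taken from the chess example above and the jagged values made up:

```python
# An 8x8 chessboard built as a list of 8 rows, each a list of 8 squares.
chess_board = [[None] * 8 for _ in range(8)]

chess_board[0][4] = "Black Bishop"
chess_board[5][3] = "White Rook"

print(chess_board[0][4])                      # Black Bishop
print(chess_board[5][3])                      # White Rook
print(len(chess_board), len(chess_board[0]))  # 8 8

# An asymmetric array: each inner list has a different length.
jagged = [[1], [2, 3], [4, 5, 6]]
row_lengths = [len(row) for row in jagged]
print(row_lengths)                            # [1, 2, 3]
```

The list comprehension matters: writing [[None] * 8] * 8 would make eight references to the same row, so changing one square would appear to change a whole column.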

Static – something static belongs to the class itself rather than to any one instance, so there is only one copy of it. One of my professors used the example of a high score for a video game. You might have many instances of other things, like NPCs or projectile objects, but there is only one static integer for the high score. Don’t get confused – static values can be changed. It just means there can only be one of them. Now and then you might run into issues where non-static things can’t be used in static contexts. Static things tend to play well with other static things, and non-static things are best with other non-static things. If you’re starting out, don’t worry too much about it. But sometimes you will see the static keyword come up in Java if that’s a language you’re going to learn.

Constant – something that cannot be changed. In Java, it is denoted with the keyword final, while other languages, such as C++ and JavaScript, use const. Constants should be named in all caps with underscores to separate the words in their names, such as WINDOW_WIDTH. Instead of hardcoding values, you can create constants and then put those into places instead of literal values. Then it’s easier to change one configuration-related constant and then have those changes be reflected elsewhere in the program. It makes it so you can change your initial assignment when you’re editing your code, but you can’t go and accidentally reassign it to something else later in your code.

Scope – Simply put, scope means where something is relevant. Scope is determined by things like access modifiers, what block something is in, nesting, and that kind of thing. Sometimes, if you are trying to access a variable and your IDE says it isn’t defined, then it’s a scoping issue. Something that is inaccessible in a certain place is said to be “out of scope.” Sometimes it can be hard to figure out where you should put a given variable. Try to avoid globals, and if you need to, you can create a class for something that has a getter method so that it can be accessed. Long story short, if a variable is only defined in a very specific block of code, it won’t be accessible in outer/more general blocks of code.

There are two main types of scope you should know about: global and local. Global means it’s accessible anywhere. Many people dislike global variables. Using them is often a lazy shortcut that can create problems. Not only that, but it goes against the idea of encapsulation. Sometimes, globals are unavoidable, but it’s best to not use them if it’s possible. Local variables are only accessible in a particular part of a program.
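A small Python sketch of global versus local scope. The names high_score, bonus, and play_round are made up for illustration:

```python
high_score = 100  # global: defined at the top level of the program

def play_round():
    bonus = 50    # local: only exists inside this function
    # A function can read a global variable.
    return high_score + bonus

round_total = play_round()
print(round_total)  # 150

# Outside the function, the local variable is out of scope.
try:
    print(bonus)
    bonus_visible = True
except NameError:
    bonus_visible = False
print(bonus_visible)  # False
```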

Global variable – a variable that is accessible everywhere. Many people say that globals are bad, but they can be useful now and then. Just don’t overdo it. A global variable is usually a lazy way to deal with scoping, rather than figuring out the proper scope for something. Try to have reasonable access modifiers, which can sometimes be aided by your IDE. Scope is also determined by where the declaration of something is, not just its access modifiers. Your IDE might also tell you about variable scope, or it might not. When creating variables, try to place them close to where they’re going to be used. Don’t make things global when they’re only going to be used in particular contexts.

Naming conventions – you can give names to your program files, classes, functions, variables, and things of that nature. There are different conventions for naming things. One is called snake_case, which uses all lowercase and uses underscores between words. You will see snake_case used a lot in Python. Another naming convention is camelCase, which starts with an all-lowercase word, then every word thereafter is capitalized and not separated by anything else. Class names are typically capitalized, and constants are all caps. You also shouldn’t start a name with a number.

Reference – instead of having a value or object on its own, you can create an identifier that merely references some other object.
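In Python, assigning one name to another creates a second reference to the same object, not a copy. A minimal sketch with made-up names:

```python
first = [1, 2, 3]
second = first           # second references the same list as first

second.append(4)         # a change made through one name...
print(first)             # [1, 2, 3, 4]  ...is visible through the other
print(first is second)   # True: both names refer to one object

# list() makes a separate copy, so changes no longer carry over.
separate = list(first)
separate.append(5)
print(first)             # [1, 2, 3, 4]
print(separate)          # [1, 2, 3, 4, 5]
```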

Null values – when an identifier has a null value, it’s nothing. This can be useful or bad depending on the situation. Null pointers or improper pointer initialization can lead to security issues, among other things. You can check if something is null or not and only proceed if it is not null. != means not equal to.

if (someIdentifier != null) {
    System.out.println("this identifier isn't null");
}
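Python’s version of null is None, and the usual check is written with is not. A minimal sketch, where describe is a made-up helper:

```python
def describe(some_identifier):
    """Only use the value if it is not None."""
    if some_identifier is not None:
        return "length is " + str(len(some_identifier))
    return "nothing to measure"

print(describe("Alan"))  # length is 4
print(describe(None))    # nothing to measure
```

Without the check, calling len(None) would raise a TypeError.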
