Introduction to Java – I

Historical background

C family of languages
Object Oriented Programming – OOP

JVM – Java Virtual Machine, HotSpot VM
JDK – Java Development Kit
JRE – Java Runtime Environment

Java SE vs Java EE

Unicode and UTF-8

Character Set – set of symbols and encodings.

Collation – a set of rules for comparing characters in a character set.
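
A quick illustration in Python (my own sketch, not part of the original outline) of how these ideas differ in practice:

    # The character set assigns the symbol "é" a number (its code point);
    # an encoding such as UTF-8 then maps that number to bytes.
    ch = "é"
    print(ord(ch))               # 233, i.e. code point U+00E9
    print(ch.encode("utf-8"))    # b'\xc3\xa9' - two bytes under UTF-8

    # Collation is a separate concern: a plain sort() compares raw code
    # points, so "é" (233) lands after "f" (102); a locale-aware collation
    # would instead place "é" next to "e".
    print(sorted(["e", "é", "f"]))   # ['e', 'f', 'é']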

Mastering programming

From years of watching master programmers, I have observed certain common patterns in their workflows. From years of coaching skilled journeyman programmers, I have observed the absence of those patterns. I have seen what a difference introducing the patterns can make.

Here are ways effective programmers get the most out of their precious 3e9 seconds on the planet.

The theme here is scaling your brain. The journeyman learns to solve bigger problems by solving more problems at once. The master learns to solve even bigger problems than that by solving fewer problems at once. Part of the wisdom is subdividing so that integrating the separate solutions will be a smaller problem than just solving them together.

Time

  • Slicing. Take a big project, cut it into thin slices, and rearrange the slices to suit your context. I can always slice projects finer and I can always find new permutations of the slices that meet different needs.
  • One thing at a time. We’re so focused on efficiency that we reduce the number of feedback cycles in an attempt to reduce overhead. This leads to difficult debugging situations whose expected cost is greater than the cycle overhead we avoided.
  • Make it run, make it right, make it fast. (Example of One Thing at a Time, Slicing, and Easy Changes)
  • Easy changes. When faced with a hard change, first make it easy (warning, this may be hard), then make the easy change. (e.g. slicing, one thing at a time, concentration, isolation). Example of slicing.
  • Concentration. If you need to change several elements, first rearrange the code so the change only needs to happen in one element.
  • Isolation. If you only need to change a part of an element, extract that part so the whole subelement changes.
  • Baseline Measurement. Start projects by measuring the current state of the world. This goes against our engineering instincts to start fixing things, but when you measure the baseline you will actually know whether you are fixing things.

Learning

  • Call your shot. Before you run code, predict out loud exactly what will happen.
  • Concrete hypotheses. When the program is misbehaving, articulate exactly what you think is wrong before making a change. If you have two or more hypotheses, find a differential diagnosis.
  • Remove extraneous detail. When reporting a bug, find the shortest repro steps. When isolating a bug, find the shortest test case. When using a new API, start from the most basic example. “All that stuff can’t possibly matter,” is an expensive assumption when it’s wrong. E.g. see a bug on mobile, reproduce it with curl
  • Multiple scales. Move between scales freely. Maybe this is a design problem, not a testing problem. Maybe it is a people problem, not a technology problem [cheating, this is always true].

Transcend Logic

  • Symmetry. Things that are almost the same can be divided into parts that are identical and parts that are clearly different.
  • Aesthetics. Beauty is a powerful gradient to climb. It is also a liberating gradient to flout (e.g. inlining a bunch of functions into one giant mess).
  • Rhythm. Waiting until the right moment preserves energy and avoids clutter. Act with intensity when the time comes to act.
  • Tradeoffs. All decisions are subject to tradeoffs. It’s more important to know what the decision depends on than it is to know which answer to pick today (or which answer you picked yesterday).

Risk

  • Fun list. When tangential ideas come, note them and get back to work quickly. Revisit this list when you’ve reached a stopping spot.
  • Feed Ideas. Ideas are like frightened little birds. If you scare them away they will stop coming around. When you have an idea, feed it a little. Invalidate it as quickly as you can, but from data not from a lack of self-esteem.
  • 80/15/5. Spend 80% of your time on low-risk/reasonable-payoff work. Spend 15% of your time on related high-risk/high-payoff work. Spend 5% of your time on things that tickle you, regardless of payoff. Teach the next generation to do your 80% job. By the time someone is ready to take over, one of your 15% experiments (or, less frequently, one of your 5% experiments) will have paid off and will become your new 80%. Repeat.

Conclusion

The flow in this outline seems to be from reducing risks by managing time and increasing learning to mindfully taking risks by using your whole brain and quickly triaging ideas.

Types, values, and instances

Java (and any other programming language) is modelled in terms of types and values. At the theoretical level, a value is a representation for some quantum of information, and a type is a set of values. When we say value X is an instance of type Y, we are simply saying that X is a member of the set of values that is the type Y.

So that’s what the term instance really means: it describes a relationship, not a thing.

The type system of the Java programming language supports two kinds of types: primitive types and reference types. The reference types are further divided into class types and array types. A Java object is (according to the JLS) an instance of a reference type, i.e., either an array or an instance of a class.

That’s the type theoretic view.

In practice, most Java developers treat the words “instance” and “object” as synonyms. (And that includes me when I’m trying to explain something quickly.) And most developers use “value” rather than “instance” to refer to an instance of a primitive type.
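
Though the passage above is about Java, the set-membership reading of “instance” is easy to sketch in Python (an analogy of mine – Python draws no primitive/reference distinction, so this is illustration only):

    # "X is an instance of type Y" asserts membership: X belongs to the
    # set of values that the type Y denotes.
    print(isinstance(42, int))       # True: 42 is in the set of ints

    # A value can belong to several type-sets at once, which is why
    # "instance" names a relationship between a value and a type rather
    # than a property baked into the value itself.
    print(isinstance(True, int))     # True: Python bool values are also ints
    print(isinstance(42, object))    # True: every Python value is an object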

Computational complexity

Coming soon!

Topics to be discussed:

    Complexity of problems
    Some problems are easy, some are hard
    The millennium prize problem – P vs NP
    Implications of P = NP
    Quantum computers – D Wave

Human Genome Project

The Human Genome Project was an international collaborative research project to sequence the human genome – that is, to determine the exact order of the nucleotides (chemical base units) that make up all of human DNA (deoxyribonucleic acid).

A genome is an organism’s complete set of deoxyribonucleic acid (DNA), a chemical compound that contains the genetic instructions needed to develop and direct the activities of every organism. DNA molecules are made of two twisting, paired strands. Each strand is made of four chemical units, called nucleotide bases. The bases are adenine (A), thymine (T), guanine (G) and cytosine (C). Bases on opposite strands pair specifically; an A always pairs with a T, and a C always with a G.

In short, the genome is divided into chromosomes, chromosomes contain genes, and genes are made of DNA.

Every person has a unique genome. The genome differences between two people are much smaller than the genome differences between people and our closest relatives, the chimpanzees. The human genome contains approximately 3 billion of these base pairs, which reside in the 23 pairs of chromosomes within the nucleus of all our cells. Each chromosome contains hundreds to thousands of genes, which carry the instructions for making proteins. Each of the estimated 30,000 genes in the human genome makes an average of three proteins.

Every cell in our body contains 23 pairs of chromosomes within its nucleus. One chromosome in each pair is inherited from each parent. One pair, called the sex chromosomes, determines the sex of the person – males have an XY pair whereas females have an XX pair.

Some definitions are in order:

Genome: An organism’s complete set of genetic instructions. Each genome contains all of the information needed to build that organism and allow it to grow and develop. Our genome is approximately 3,000,000,000 base pairs long and is packaged into 23 pairs of chromosomes. The human genome contains 20,687 protein-coding genes.

Genes: Genes are small sections of DNA within the genome that code for proteins. They contain the instructions for our individual characteristics – like eye and hair colour.

Chromosome: A threadlike structure in our cells, made of a long DNA molecule, wrapped around a protein scaffold. Humans have 23 pairs of chromosomes.

DNA: DNA or deoxyribonucleic acid is a long molecule that contains our unique genetic code. Like a recipe book, it holds the instructions for making all the proteins in our bodies.

The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Human DNA consists of about 3 billion bases, and more than 99 percent of those bases are the same in all people. The order, or sequence, of these bases determines the information available for building and maintaining an organism, similar to the way in which letters of the alphabet appear in a certain order to form words and sentences.
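
Because pairing is strictly complementary, one strand fully determines the other. A minimal sketch in Python (the function name and sample sequence are mine, for illustration):

    # Pairing rules from the text: A with T, C with G.
    PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def complementary_strand(strand):
        """Return the base-by-base partner of the given strand."""
        return "".join(PAIR[base] for base in strand)

    print(complementary_strand("ATGC"))  # TACG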

For more information, visit:
What is DNA?
Genetics Home Reference
National Human Genome Research Institute
Human chromosomes (Wikipedia)
Genome variations

Quote

At the heart of mathematics is pattern recognition and the joy of numerical play. What psychologists might call fluid reasoning, or mental power, is what you use when you’re struggling with a problem and don’t know what to do. This includes pattern recognition, abstract reasoning, and problem solving, and can be considered the engine powering numeracy. It is fundamental to so much of human and technological progress, as Erik Brynjolfsson and Andrew McAfee noted in The Second Machine Age. Math education, then, is really about training people to think creatively within a logical space and to solve problems.

Some definitions

Computer Science: Computer science is the scientific and practical approach to computation and its applications.

Algorithm: An algorithm is a step-by-step procedure for solving a problem (calculations).
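
Euclid’s algorithm for computing the greatest common divisor is the classic example of such a step-by-step procedure; a minimal sketch in Python:

    def gcd(a, b):
        # Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b);
        # when the remainder hits zero, a holds the greatest common divisor.
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(48, 18))  # 6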

Programming language: A programming language is a formal language (defined by a formal grammar) designed to write programs to be executed by a computer.

Grammar: A grammar or formal grammar is a set of production rules for strings in a formal language. The rules describe how to form strings from the language’s alphabet that are valid according to the language’s syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A grammar G is formally defined as a tuple (N, \Sigma, P, S) where

  •  N  is a finite set of nonterminal symbols.
  •  \Sigma  is a finite set of terminal symbols, disjoint from N.
  •  P  is a finite set of production rules, each of the form  (\Sigma \cup N)^{*} N (\Sigma \cup N)^{*} \rightarrow (\Sigma \cup N)^{*} .
  •  S \in N  is a distinguished symbol that is the start symbol, also called the sentence symbol.
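
As a small worked example (a standard textbook grammar, not from the original notes): take N = \{S\}, \Sigma = \{a, b\}, and P = \{S \rightarrow aSb, S \rightarrow ab\} with start symbol S. The derivation S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aaabbb shows that this grammar generates the language \{a^{n}b^{n} : n \geq 1\}.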

Program: A program is a list (sequence) of instructions written in a formal language that specifies how to perform a computation.

Computer: A computer is a general purpose device that can be programmed to perform computations (arithmetic and logic) automatically.

Programming paradigm: A programming paradigm is a fundamental style of computer programming, a way of building the structure and elements of computer programs. Examples are – imperative, declarative, functional, object-oriented, logic, and symbolic programming.

Imperative Programming: Imperative programming is a programming paradigm that describes computation in terms of statements that change a program state, i.e., a sequence of instructions to perform the computation (the how). In much the same way that imperative mood in natural languages expresses commands to take action, imperative programs define sequences of commands for the computer to perform.

Procedural Programming: Procedural programming is imperative programming in which the program is built from one or more procedures (also known as subroutines or functions).

Modular Programming:  Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality.

Structured Programming: Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of subroutines, block structures, and for and while loops – in contrast to simple tests and jumps such as the goto statement, which can lead to spaghetti code that is difficult both to follow and to maintain. It basically means using explicit control-flow structures rather than jumping about directly from instruction to instruction.

Functional Programming: Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. It treats functions as values and deals with the what, in contrast to imperative programming, which deals with the how. For further details, see Functional Programming vs Procedural Programming.
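
To make the how/what contrast concrete, here is a toy example of mine in Python – summing the squares of the even numbers, first imperatively, then functionally:

    numbers = [1, 2, 3, 4, 5, 6]

    # Imperative: the "how" - a sequence of statements mutating state.
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n

    # Functional: the "what" - a single expression, no mutable state.
    total_fn = sum(n * n for n in numbers if n % 2 == 0)

    print(total, total_fn)  # 56 56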

Concurrency vs Parallelism

A concurrent application is an application whose computations logically execute simultaneously because of the semantics of the application. The problem is fundamentally concurrent, i.e., concurrency is built into the problem definition. The primary reason for using concurrency is not to gain performance but that it is the simplest and most direct way to write the program.

A parallel application is an application whose computations actually execute simultaneously in order to complete a problem in less time. The problem doesn’t inherently require parallelism; it can be stated sequentially. Parallelism means running a program on multiple processors with the goal of improving performance. Ideally, this should be done invisibly and with no semantic changes.
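
A rough sketch of the distinction in Python (a toy example of mine): threads mirror a naturally concurrent structure, while a process pool runs inherently sequential work in parallel purely for speed.

    import threading
    from multiprocessing import Pool

    def fetch(name):
        # Stand-in for a blocking I/O call (e.g. a network request).
        print("fetching", name)

    def square(x):
        return x * x

    if __name__ == "__main__":
        # Concurrency: "handle several independent fetches" - threads
        # mirror the problem's structure even on a single core.
        threads = [threading.Thread(target=fetch, args=(n,)) for n in ("a", "b")]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Parallelism: squaring numbers is inherently sequential, but a
        # process pool splits the work across cores to finish sooner.
        with Pool(processes=2) as pool:
            print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]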

For more see:
Rob Pike on Concurrency vs Parallelism
What is the difference between Concurrency and Parallelism – Stackoverflow

Project Euler 26

Misc

So it’s really been a long time since I last wrote something. A lot has happened in the intervening time and I currently don’t have the luxury of time to dwell on my thoughts much. So I will keep it brief and defer a lengthy post, sort of a catharsis, for a later time. First things first: I have been learning Python and I am loving it. I rue that I didn’t encounter it earlier. Anyway, since I am learning a great many concepts each day, I thought I had better make little notes of the things I learn and save links to the resources for future reference. So here are some tidbits I have been learning over the past few days:

1) GIL – Global Interpreter Lock. It’s a feature of the CPython implementation of Python – not a part of the core language itself. Other implementations like Jython and IronPython don’t have a GIL. The CPython interpreter is not fully thread safe, so in order to support multi-threaded Python programs it employs a mechanism whereby only one thread at a time can access the interpreter. The GIL is a big mutex-type global lock (protecting, among other things, the reference counters) that must be held by the current thread before it can safely access Python objects. This ensures that only one thread executes Python bytecode at a time.

However, it is coarse-grained locking at the interpreter level (as opposed to fine-grained locking, where every separate structure has its own lock at the application level), sufficient only to keep Python’s own structures (the CPython interpreter) in a consistent state. In short, the GIL only protects the interpreter’s own internals.

In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock – by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval(); newer CPython versions instead switch on a time interval, settable via sys.setswitchinterval()). The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requested the I/O waits for the operation to complete.

Since only one Python thread can execute at a time, the multithreading amounts to time-division multiplexing; even on multi-core systems, only one core executes Python bytecode at a time while the others sit idle. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

Due to the tradeoffs between fine-grained and coarse-grained locking, it is difficult to remove the GIL without hurting the performance of the interpreter for single-threaded Python programs: there is more book-keeping to do when locking happens at a finer level, so single-threaded programs run slower. Fine-grained locking does allow greater parallelism – two threads can execute in parallel when they don’t share any resources – but it carries a much larger administrative overhead; for every line of code, you may need to acquire and release several locks.
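
The effect is easy to observe with a small timing sketch (my own illustration; exact timings vary by machine): a CPU-bound function gains nothing from a second thread under the GIL.

    import threading
    import time

    def count_down(n):
        # Pure-Python CPU-bound work: the GIL is held while this loop runs.
        while n > 0:
            n -= 1

    N = 10_000_000

    start = time.perf_counter()
    count_down(N)
    print("one thread:  %.2fs" % (time.perf_counter() - start))

    # Two threads split the same total work, yet typically finish in about
    # the same time (or longer, from lock contention): they merely take
    # turns holding the GIL instead of running in parallel.
    start = time.perf_counter()
    t1 = threading.Thread(target=count_down, args=(N // 2,))
    t2 = threading.Thread(target=count_down, args=(N // 2,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("two threads: %.2fs" % (time.perf_counter() - start))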

Some links for further details:

Why GIL? (stackoverflow)
Are locks unnecessary in multi-threaded Python?
Concurrency and Python