Principles of modelling, statistics and machine learning. Note that a model can be either mathematical or conceptual (e.g. a flowchart). See also statistics.
Modelling is about generating a distribution of results rather than a single result. A step further is to create a range of models that each generate a different type of distribution.
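As a minimal sketch of "a distribution rather than a single result", the snippet below runs a toy model many times with uncertain inputs. The model (`project_cost`), the input ranges and the number of runs are all hypothetical and chosen only for illustration.

```python
import random
import statistics

def project_cost(hours, rate):
    """Toy model (hypothetical): cost is hours worked times hourly rate."""
    return hours * rate

def simulate(n_runs=10_000, seed=0):
    """Run the model many times with uncertain inputs and collect the results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        hours = rng.gauss(100, 15)   # assumed: about 100 hours, standard deviation 15
        rate = rng.uniform(40, 60)   # assumed: rate somewhere between 40 and 60
        results.append(project_cost(hours, rate))
    return results

results = simulate()
point_estimate = project_cost(100, 50)        # the single-result answer
cuts = statistics.quantiles(results, n=20)    # cut points at 5%, 10%, ..., 95%
low, high = cuts[0], cuts[-1]                 # 5th and 95th percentiles

print(f"point estimate: {point_estimate:.0f}")
print(f"mean: {statistics.mean(results):.0f}, 90% interval: {low:.0f} to {high:.0f}")
```

The point estimate sits inside the interval, but only the interval shows how far the outcome could plausibly deviate from it.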
[toc]
Properties of good models (internal)
- Elegance or beauty. A balance between simplicity and complexity (correctness)
- Structure and cohesion.
- Variation and sensitivity.
- Generality. Something that applies to groups rather than individuals. A pattern rather than an incident.
- Fundamentals. Underlying causes. Often the infinitely great and infinitely small. E.g. going from materials to molecules to atoms to protons to quarks.
Conditions for good models (external)
- They are designed for a specific purpose. They are useful in some sense.
- Clear boundaries. Adjusted to a specific audience.
- Clear, domain-specific language. Explicit assumptions.
Guidelines
- Minimize the number of assumptions in a model. Simplicity is the greatest sophistication.
- Distinguish between systematic & random errors. And between sensitivity & specificity.
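The two distinctions in the last guideline can be made concrete with a small sketch. The readings and test outcomes below are made-up illustration data, and the function names are hypothetical.

```python
import statistics

def error_summary(measurements, true_value):
    """Split measurement error into a systematic part (bias) and a random part (spread)."""
    errors = [m - true_value for m in measurements]
    bias = statistics.mean(errors)      # systematic error: a consistent offset
    spread = statistics.stdev(errors)   # random error: scatter around that offset
    return bias, spread

def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)           # true positives
    fn = sum(a and not p for a, p in pairs)       # false negatives
    tn = sum(not a and not p for a, p in pairs)   # true negatives
    fp = sum(not a and p for a, p in pairs)       # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scale readings of a 10.0 kg reference weight.
readings = [10.4, 10.6, 10.5, 10.3, 10.7]
bias, spread = error_summary(readings, true_value=10.0)
print(f"systematic error: {bias:.2f} kg, random error: {spread:.2f} kg")

# Hypothetical outcomes: True means "condition present" / "test positive".
actual    = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]
sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity: {sens:.2f}, specificity: {spec:.2f}")
```

Here the readings sit about 0.5 kg above the reference (systematic), with a scatter of roughly 0.16 kg (random). Recalibration can remove the first; only more or better measurements reduce the second.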
Risks
- All models are wrong. A model is never equal to reality. Models make oversimplifications.
- The real world can only be explained by a multiplicity of models.
- Multiple models can be correct; they may provide different perspectives on the same phenomenon.
- Generalization gap: performance on unseen test data is typically worse than performance on the training set.
- Garbage in, garbage out: designing models requires reliable data.
- There is a difference between validation & verification.
- Optima can be global or local.
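The last risk, local versus global optima, can be illustrated with plain gradient descent on a function that has two minima. The function, learning rate and starting points are assumptions picked for the example, not a recommendation.

```python
def f(x):
    """Example objective with a deep minimum near x = -1.30 and a shallow one near x = 1.13."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Follow the negative gradient from a given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

starts = [-2.0, 2.0]
candidates = [gradient_descent(s) for s in starts]
best = min(candidates, key=f)   # multi-start: keep the lowest of the minima found

for s, x in zip(starts, candidates):
    print(f"start {s:+.1f} -> x = {x:.3f}, f(x) = {f(x):.3f}")
print(f"best of all starts: x = {best:.3f}")
```

The run started at 2.0 settles in the shallower minimum, while the run started at -2.0 reaches the deeper one. Trying several starting points is a simple, though not guaranteed, way to reduce the risk of stopping at a local optimum.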
Analogies vs. First Principles
- If an event is observed frequently, then analogies can help.
- If a process is novel but its inner workings are understood, then it can be beneficial to start from first principles.
There is no free lunch: tailoring a model to one dataset tends to reduce its performance on others (i.e. it generalizes worse).
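A rough sketch of this trade-off, under invented assumptions: the data come from a hypothetical process `y = 2x + noise`, and a simple line fit is compared with a "memorizer" that is maximally specific to its training set.

```python
import random

def make_data(n, rng):
    """Noisy samples from an assumed underlying process y = 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 2)) for x in xs]

def fit_line(data):
    """Ordinary least squares for y = a*x + b."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(data):
    """Overly specific model: return the y of the closest training x."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
train, test = make_data(100, rng), make_data(1000, rng)
line, memorizer = fit_line(train), fit_memorizer(train)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(f"{name}: train MSE {mse(model, train):.2f}, test MSE {mse(model, test):.2f}")
```

The memorizer reproduces its training set exactly, so its training error is zero, but it also reproduces the noise and therefore does worse on new data. The line fits the training set less closely and its error on new data is typically much closer to its training error.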
Models are exclusive by nature. They are biased towards a given scenario and context.
...
Learning revolves around revising your opinions.
Experiments do not require complex setups. Start simple and expand if necessary. Three types:
- Learn by observation. Then reflect on the observations.
- Iterate. Change a specific variable and measure the effect. Then repeat.
- Experiment. E.g. A/B testing.
    - Replicate multiple environments.
    - Change one of them and use the other one as a baseline.
    - Compare the results.
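The A/B steps above can be sketched as follows. The environments are simulated, the conversion rates are invented, and the comparison uses a two-proportion z-test with a pooled normal approximation, which is one common choice among several.

```python
import math
import random

def simulate_visits(n, conversion_rate, rng):
    """Hypothetical environment: each visit converts with a fixed probability."""
    return sum(rng.random() < conversion_rate for _ in range(n))

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (difference in rates, two-sided p-value) using a pooled normal approximation."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail of the standard normal
    return rate_b - rate_a, p_value

rng = random.Random(1)
n = 5000
conv_a = simulate_visits(n, 0.10, rng)   # baseline environment, left unchanged
conv_b = simulate_visits(n, 0.12, rng)   # replica with one variable changed

diff, p_value = two_proportion_z_test(conv_a, n, conv_b, n)
print(f"A: {conv_a / n:.3f}, B: {conv_b / n:.3f}, difference: {diff:+.3f}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise alone. It says nothing about whether the difference is large enough to matter, which remains a judgment call.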
See learning.
With false assumptions, any idea can be sold.
Distinguish between the inside view and the outside view. Evaluate the idea from the outside view first, and use the outcome to limit how much weight you give to any internal claims.
- Outside view: without going into details, is the source credible?
    - Is it a radical idea, or conforming to consensus?
    - Is the source a single entity, or a multiplicity? Is it dogmatic?
    - Is the scope of the idea well-defined, or is the idea applied to various domains, without a firm grounding (racing thoughts)?
    - Is there an "enemy view"? Is there an obsession over categorization of alternative ideas?
    - Are both the advantages and disadvantages considered? Is there a tradeoff or is the idea an absolute?
    - Is the channel of communication reliable? Public? Can others criticize the idea openly?
    - Is the method of communication suitable? Does the source have honest intent?
- Inside view: go in depth.
    - Are the assumptions valid?
    - Given the assumptions of the source, does the story make sense?