A good grasp of computational tools can help you collect interesting data, speed up research, avoid mistakes, and reduce frustration. For starters, here are a few recommendations as well as tools I have found handy.
Python: Python can do almost everything. Whatever you need, somebody has probably written a package for it, which is why most of my recommendations are based on Python. On Windows, I use Anaconda, a distribution that ships Python together with a set of packages and the conda package manager, and Spyder as an IDE (which you can install from within Anaconda). On Mac, I installed Python stand-alone; I write code in Sublime Text and run it from the terminal.
The standard packages for Python are pandas and numpy, which can be used for simulations, data analysis, and (in combination with matplotlib) plotting. You will find nice tutorials basically everywhere, so there is no need to explain them here. For regressions (and more), scikit-learn (imported as sklearn) is the go-to package. Try not to write the same code twice; wrap it in a function (def) where possible. Push your code to GitHub to keep track of changes and to have a backup. For deep learning, I use Keras, a high-level API that talks to TensorFlow, Google's deep learning platform.
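To make the "don't write code twice" advice concrete, here is a minimal sketch. The household-load setting, the function names, and all numbers are made up for illustration; the point is only that one function is reused for every scenario instead of copy-pasting the simulation code.

```python
import numpy as np
import pandas as pd


def simulate_daily_load(n_households, mean_kwh, std_kwh, seed):
    """Draw one day of household consumption (kWh) from a normal distribution."""
    rng = np.random.default_rng(seed)
    load = rng.normal(mean_kwh, std_kwh, size=n_households)
    return np.clip(load, 0.0, None)  # consumption cannot be negative


def summarize(scenarios):
    """Run every scenario through the same function and collect the results."""
    rows = []
    for name, params in scenarios.items():
        load = simulate_daily_load(**params)
        rows.append({"scenario": name, "mean_kwh": load.mean(), "max_kwh": load.max()})
    return pd.DataFrame(rows).set_index("scenario")


# Hypothetical scenarios; adding a third one is a single extra line.
scenarios = {
    "small_town": {"n_households": 500, "mean_kwh": 9.0, "std_kwh": 2.0, "seed": 1},
    "city_block": {"n_households": 2000, "mean_kwh": 7.0, "std_kwh": 1.5, "seed": 2},
}
summary = summarize(scenarios)
print(summary)
```

If the simulation logic changes later, it only has to be changed in one place.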
Speeding things up with computing clusters: One hurdle is time. Simulations can take a long time when run sequentially (e.g. I use simulations for market and distribution system analysis), and it can take ages until you realize that the setup you chose does not work. Therefore, absolutely use computing clusters ("high-performance computing clusters"). You can submit multiple jobs at once and run them in parallel, let them run overnight without worrying about freezing computers, Windows updates, or the like, and evaluate your results in the morning (or whenever convenient). Many universities have their own clusters or collaborate with others. For Bavaria/Germany, here is an overview of their computer clusters. Paid services are available through Amazon (AWS) and Google (Google Cloud).
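One common way to run many jobs at once is to write a single script that picks its scenario from a job index. The sketch below assumes a cluster that uses the SLURM scheduler, which sets the environment variable SLURM_ARRAY_TASK_ID for array jobs; your cluster may use a different scheduler. The parameter grid and the toy simulation are placeholders, not a real model.

```python
import itertools
import os
import random

# Hypothetical parameter grid: every combination becomes one cluster job.
PV_SHARES = [0.1, 0.3, 0.5]
EV_SHARES = [0.0, 0.2]
SCENARIOS = list(itertools.product(PV_SHARES, EV_SHARES))


def select_scenario(task_id):
    """Map a job index to one (pv_share, ev_share) combination."""
    return SCENARIOS[task_id]


def run_simulation(pv_share, ev_share, n_steps=1000, seed=0):
    """Placeholder for a real simulation: returns the mean net load."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        load = 1.0 + ev_share * rng.random()
        generation = pv_share * rng.random()
        total += load - generation
    return total / n_steps


def main():
    # SLURM sets SLURM_ARRAY_TASK_ID for array jobs; fall back to 0 on a laptop.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    pv_share, ev_share = select_scenario(task_id)
    result = run_simulation(pv_share, ev_share, seed=task_id)
    print(f"task={task_id} pv={pv_share} ev={ev_share} mean_net_load={result:.3f}")


if __name__ == "__main__":
    main()
```

Submitted as an array job with indices 0 to 5, each job would work on a different scenario, and the same script still runs locally for testing.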
Geospatial analysis: In energy and transportation, tools for geospatial analysis are often needed. On the internet, you can find so-called shapefiles, e.g. for political borders, road maps, etc. You can join this information with point data such as the coordinates of solar panels or EV charging stations. Again, the Python world offers something helpful: geopandas lets you match this data and visualize it. Getting it to run on Windows is tricky (I gave up), but on Mac/Linux it's easy. For Windows, ArcGIS is an alternative. It also offers broad functionality that saves you some nitty-gritty implementation, e.g. for hotspot analysis, regression, etc. Downside: you need a paid licence.
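To show what "joining" point data with shapes means, here is a plain-Python sketch of the underlying point-in-polygon test (ray casting). This is not how geopandas is used; geopandas does this for you on real shapefiles, and much faster. The districts and stations are invented, and real data would use longitude/latitude coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a horizontal ray from (x, y) crosses the boundary."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_regions(points, regions):
    """Attach to each named point the first region that contains it (or None)."""
    result = {}
    for name, (x, y) in points.items():
        result[name] = next(
            (region for region, poly in regions.items() if point_in_polygon(x, y, poly)),
            None,
        )
    return result


# Hypothetical data: two square "districts" and three charging stations.
regions = {
    "district_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "district_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
stations = {"s1": (1.0, 1.0), "s2": (3.5, 0.5), "s3": (5.0, 1.0)}

print(assign_regions(stations, regions))
```

Station s3 lies outside both districts and therefore gets no region, which is the kind of mismatch you want to spot before analysing the joined data.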
Power flow: I mostly work with distribution systems. For an easy power-flow implementation in Python, pandapower is a great matrix-based tool. I am, however, more familiar with GridLAB-D because I contributed to it. GridLAB-D is a tool for distribution system simulation which comes not only with power-flow capabilities but also with residential load models. The distribution system (including houses with HVAC/batteries/PV, lines, transformers, …) is represented in a glm model. What is nice is that you can interact with it from Python: for instance, you can simulate the distribution system and pause it whenever you need, e.g. to run a market, let an EV arrive, or introduce a partial system failure.
Web crawling: A lot of interesting information for energy systems is on the web but in a decentralized form, e.g. whether a charging station is occupied or what the current electricity price is. Instead of collecting this information manually, it is handy to write a web crawler. I use selenium, which lets you mimic your manual actions in a browser. Once you have developed the code, you can run it in the background and let it collect the information you require.
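Once the crawler has loaded a page (with selenium, the HTML is available as driver.page_source), the information still has to be pulled out of it. The sketch below shows only that extraction step, using Python's built-in html.parser instead of selenium so that it runs without a browser. The page snippet, the class name "station", and the data-id attribute are invented; a real website will be structured differently.

```python
from html.parser import HTMLParser


class StationStatusParser(HTMLParser):
    """Collect (station id, status) pairs from <li class="station"> elements."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "station":
            self._current_id = attrs.get("data-id")

    def handle_data(self, data):
        # Only keep text that sits directly inside a station element.
        if self._current_id is not None and data.strip():
            self.records.append((self._current_id, data.strip()))
            self._current_id = None

    def handle_endtag(self, tag):
        if tag == "li":
            self._current_id = None


def parse_station_status(html):
    """Return a dict mapping station id to its status text."""
    parser = StationStatusParser()
    parser.feed(html)
    return dict(parser.records)


# Hypothetical page snippet; a real page will have a different structure.
page = """
<ul>
  <li class="station" data-id="A1">occupied</li>
  <li class="station" data-id="B7"> free </li>
  <li class="ad">Buy our app</li>
</ul>
"""
print(parse_station_status(page))
```

Run on a schedule and stored with a timestamp, such snapshots add up to a time series of station occupancy.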
Text recognition: Sometimes, you get your data as PDFs or even pictures. If the PDF was made from a digital document (e.g. a Word document), the text is usually already embedded and machine-readable, so no OCR is needed. In that case, Python packages like PyPDF2 can help to read this text in an automated way. It is trickier, of course, if the document has been scanned and is only available as a picture. Fortunately, a great tool called layoutparser has recently been released, which recognizes text and layouts using deep learning. However, there are also more user-friendly but paid options like ABBYY, which also offer functionality like CSV export.